Human Monoliths

Conway’s Law states that any organization that designs a system will inevitably produce a design whose structure is a copy of the organization’s communication structure. In other words, software architecture becomes a reflection of team structure and communication pathways between teams. If that’s the case, then some organizations must be more likely to produce well-designed modular systems, whereas others might be doomed to build monolithic applications. Human Monoliths are teams whose structure drives them to create tightly-coupled, hard-to-maintain software. Let’s take a look at the different types of human monoliths as well as a few tips on how to avoid becoming one.

The Problem With Monoliths

A monolith is an application where multiple heterogeneous components are built and deployed as a single unit. While it’s possible to build modular monoliths, all too often we see monoliths degenerate into tightly-coupled, hard-to-maintain, slow and bug-ridden big balls of mud. This gave monoliths a pretty bad reputation and nowadays you will hardly find anyone who’ll admit to building a monolith.

The goal of software architecture is to minimize the human resources required to build and maintain a system. This is accomplished by drawing boundaries around components and ensuring that the dependency arrows between components point in one direction. Defining components with clear boundaries and APIs makes it easier to add features and introduce changes with few disruptions to the existing code and functionality.
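One way to make the "arrows point in one direction" rule concrete is to treat components as a directed graph and check it for cycles. The snippet below is a minimal sketch, not a tool the article prescribes: the component names (`ui`, `orders`, `storage`) and the `find_cycle` helper are hypothetical, and the dependency map is assumed to be maintained by hand or extracted by some other means.

```python
from typing import Dict, List


def find_cycle(deps: Dict[str, List[str]]) -> List[str]:
    """Return one dependency cycle as a list of component names, or [] if none.

    `deps` maps a component to the components it depends on (hypothetical data).
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNVISITED for name in deps}
    path: List[str] = []  # components on the current depth-first walk

    def visit(node: str) -> List[str]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in deps.get(node, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # The walk came back to a component it is still exploring.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return []

    for name in list(deps):
        if state[name] == UNVISITED:
            cycle = visit(name)
            if cycle:
                return cycle
    return []


# Arrows point one way: ui -> orders -> storage.
layered = {"ui": ["orders"], "orders": ["storage"], "storage": []}
# storage reaching back into ui closes a loop and blurs the boundaries.
tangled = {"ui": ["orders"], "orders": ["storage"], "storage": ["ui"]}

print(find_cycle(layered))
print(find_cycle(tangled))
```

A check like this could run in a build pipeline so that a new dependency pointing the "wrong way" is noticed when it is introduced rather than after the coupling has spread.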

The main problem with monoliths is the tight coupling and blurred boundaries between components. This makes changes more risky, code harder to understand, and tests more difficult to write. With the growth in complexity and coupling, developer productivity and the rate of change grind to a halt as more time is spent on fighting regressions and bugs. See this HN comment for what working on a monolith might look like.

Microservices and Conway’s Law

Monolithic architectures are especially harmful to large engineering organizations. Not only are these applications complex and slow to develop, but they also limit the pace at which the individual teams can move by tying everyone to the same deployment schedule. This is particularly problematic at consumer-centric companies, where the pace of innovation, rate of experimentation, and the ability to get new ideas to production fast are a core competitive advantage.

Overcoming these challenges drove the migration to microservices across many organizations (e.g., Netflix, Twitter, and Uber). Microservices are small services that are built around a business capability and which allow loosely coupled teams to independently deploy their code to production. It’s important to emphasize that microservices are a solution to an organizational problem, not a technical one. Their main purpose is to help scale an engineering organization by decoupling teams and allowing them to make progress independently. Granted, microservices can have other benefits such as better horizontal scalability, fault tolerance, or polyglot persistence. However, these are secondary effects and are not the main driver for microservice adoption.

This is why Conway’s Law is so important for understanding microservices: they are a direct application of the idea of aligning loosely coupled teams with the loosely coupled systems they build. Arguably, the popularization of microservice architectures can be traced to Amazon’s API Mandate, which requires teams to expose functionality through APIs and to communicate only through these APIs. Combined with the idea of 2-pizza teams, which puts a bound on how large a team can grow, microservices are a logical outcome. The goal of these changes was to reduce communication overhead by encouraging collaboration within small, tightly-knit teams while keeping communication across teams at the high level of API contracts.

In summary, microservices are the consequence of creating highly cohesive and loosely coupled teams and aligning system boundaries with team boundaries. What happens if this is not the case and we try to build microservices without considering team structure? This brings us to the main topic of this article: human monoliths.

Human Monoliths

Microservices are not a panacea. Google “distributed monoliths” and you’ll come across multiple examples of microservice systems turning into distributed big balls of mud, with some companies abandoning microservices entirely and moving back to a monolith. Tightly-coupled microservices combine the worst of both worlds: you get the slow pace of development of a monolith, but now you also pay the cost of network latency, complex failure modes, and automation and observability challenges. On top of this, you have to coordinate deployments across systems. Some team structures are more at risk of developing such systems.

Functional Teams

One of the bigger threats to microservice architectures is teams that are organized around a technical function, for example, storage, ESB, RPC, or messaging. Conway’s Law pushes such teams to build services around technologies. Remember that microservices are decomposed by business capability or subdomain to keep them decoupled and independently deployable. This is hard to achieve for technology-focused services because they are highly interdependent: any new product feature would likely touch multiple services.

Consider the example of Sprouter, a service built by Etsy to encapsulate DB access and hide the database implementation from the application layer. It was designed to allow the development team to focus on the application logic while the database team would own the stored procedures managed by Sprouter. In practice, any change to business logic now required changes to both application code and stored procedures, causing significant friction between teams. Testing became more complicated, and deployments had to be synchronized to prevent outages caused by tight coupling. Ultimately, Sprouter had to be scrapped and replaced by an ORM layer that was free of business logic.

To be clear, it is completely OK to have teams that are focused on technologies (e.g., storage, compute, service meshes, or frameworks) as long as they build libraries and platforms that have stable APIs, have an independent lifecycle, and are free of business logic.

Large Teams with Blurred Boundaries

It is great to be a part of a high-trust, low-turnover, tightly-knit team with a strong sense of identity. Also known as “jelled teams,” such teams are more productive, a pleasure to manage, and a great environment to learn in. Another task that these teams excel at is service ownership. End-to-end service ownership aligns incentives between product, development, and operations. It creates a feedback loop from production, a sense of long-term investment and, as a result, an incentive to keep tech debt and architecture decay in check through refactoring, thoughtful design, and automation. Furthermore, it helps the team become experts in their business domain and get more involved in the product development process. Finally, a team with clear boundaries is better positioned to build microservices that have clear boundaries as well.

There are many reasons why we often see blurred team boundaries and the resulting shared ownership of services, from setting up feature teams that contribute to multiple services, to letting anyone commit to a service codebase in order to overcome delivery bottlenecks, to the desire for better collaboration and knowledge sharing. While blurring team boundaries in the name of collaboration might sound and feel good, the long-term effects are harmful both to the teams and the systems they build. Compare this with object-oriented design: we build classes to collaborate, yet we also try to hide their implementation details as much as possible from the outside world. We don’t do this because we are secretive or greedy, but to reduce complexity for the collaborators of a class and to prevent coupling to the implementation details. The same logic applies to teams: we establish clear boundaries not because we don’t want to work with other teams, but because this is the only way to scale an engineering organization. Developers should be able to understand and update the code of a service without knowing anything about the internals of its peer services or the inner workings of the teams they collaborate with.
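The object-oriented comparison can be illustrated with a small sketch. Everything here is hypothetical (the `Inventory` class, its methods, and the `place_order` collaborator are invented for illustration): the point is only that the collaborator relies on a narrow public contract and never touches the internal representation, much as one team would rely on another team’s API rather than on its internals.

```python
from typing import Dict


class Inventory:
    """Public contract: restock(), available(), and reserve().

    How stock is stored is an internal detail that callers must not rely on.
    """

    def __init__(self) -> None:
        # Internal representation; free to change without affecting callers.
        self._stock: Dict[str, int] = {}

    def restock(self, sku: str, quantity: int) -> None:
        self._stock[sku] = self._stock.get(sku, 0) + quantity

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)

    def reserve(self, sku: str, quantity: int) -> bool:
        """Reserve stock if enough is available; report success or failure."""
        if self.available(sku) < quantity:
            return False
        self._stock[sku] -= quantity
        return True


def place_order(inventory: Inventory, sku: str, quantity: int) -> str:
    """A collaborator that knows only the public contract of Inventory."""
    return "confirmed" if inventory.reserve(sku, quantity) else "rejected"


inventory = Inventory()
inventory.restock("book", 2)
print(place_order(inventory, "book", 1))
print(place_order(inventory, "book", 5))
```

If `Inventory` later swapped its dictionary for a database table, `place_order` would not need to change. The analogous property for teams is that a service’s consumers keep working when the owning team reorganizes its internals.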

Breaking Up Human Monoliths

More than any particular advice on what to do, it’s important to remember the end goal of deploying microservice architectures (scaling the engineering organization) and to monitor the right metrics such as the frequency of production deployments and lead time of feature development. Services have to be independently deployable, they have to fail independently, and they have to be highly autonomous. For this to work, both services and teams that own them must focus on the end-to-end delivery of a business capability.
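As a rough illustration of the two metrics mentioned above, the sketch below computes deployment frequency and lead time from a list of deployment records. The `Deployment` record and its fields are assumptions made for this example; in particular, treating "lead time" as the time from when work on a change started to when it reached production is one possible definition, not the only one.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, NamedTuple


class Deployment(NamedTuple):
    service: str
    started_at: datetime   # assumption: when work on the change began
    deployed_at: datetime  # when the change reached production


def deployments_per_week(
    deployments: List[Deployment], window_start: datetime, window_end: datetime
) -> float:
    """Average number of production deployments per week inside the window."""
    in_window = [d for d in deployments if window_start <= d.deployed_at < window_end]
    weeks = (window_end - window_start) / timedelta(weeks=1)
    return len(in_window) / weeks


def median_lead_time(deployments: List[Deployment]) -> timedelta:
    """Median time from start of work to production deployment."""
    return median(d.deployed_at - d.started_at for d in deployments)


t0 = datetime(2024, 1, 1)
history = [
    Deployment("orders", t0, t0 + timedelta(days=1)),
    Deployment("orders", t0 + timedelta(days=2), t0 + timedelta(days=5)),
    Deployment("billing", t0 + timedelta(days=7), t0 + timedelta(days=9)),
]

print(deployments_per_week(history, t0, t0 + timedelta(weeks=2)))
print(median_lead_time(history))
```

Tracking these numbers per service and per team, rather than only in aggregate, makes it easier to see where coordinated deployments or shared ownership are slowing delivery.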