Imperfect #13: Killing the Distributed Monolith
Going faster through healthier boundaries
One anti-pattern I see a lot is the distributed monolith: a microservices architecture where teams allow a thick network of dependencies to grow among them. They become entangled in that net, unable to make progress. Everyone is busy all the time yet no visible progress is made. Deliveries grow late and customer satisfaction plummets.
The Settings page from hell
Picture a retailer website. There’s an account settings page, with links to all kinds of preferences. One team is responsible for everything on that page. Because they own it, everyone comes to them for changes.
Say there’s a team working on Deliveries. They want the user to be able to pick delivery time slots. They change their services to account for that. But they don’t own the settings page, so there’s no way to receive time slots from the user. So they go to the Account Settings team, and ask them to please add time slots. They helpfully suggest that a form be added to the Settings page, and that the User database be altered to store time slot data. When preferences are saved, a request should go out to the Delivery API in order to update them there as well. This is because we’re doing microservices, so everyone talks via APIs and stores their own state. Everything is by the book so we should be going really fast.
Because the Account team has ten other such requests on their plate, they’re late. The Delivery team starts work on something else. The Account team comes back two sprints later with something that doesn’t quite work, but it takes the Settings team half a sprint to realize it. Many meetings happen. The Account team scrambles to fix an incident, then is tasked with a high priority project. The Delivery team switches to work on something else yet again. The constant context switches cost them a lot of time and frustrate their developers. A key person leaves, taking lots of domain knowledge with them. Next time the Settings page comes back for review, no one has the full picture anymore.
The Delivery Time Slots feature ships six months later. It works, but it’s ugly, and there’s a weird bug no one can quite reproduce or fix.
Setting healthy boundaries
It takes an incredible amount of work to get engineering organizations to a point where the story above is rare. I’ve seen similar stories in organizations of all sizes, from thousand-strong corporations all the way down to projects maintained by a single developer. It’s an organizational as well as technical problem, which can only be solved through rigorous and methodical boundary setting and architectural review.
What I normally do, with varying degrees of success, is try to find ways for the work to land as close as possible to those who need the change.
If I want a change, is it quicker and more predictable to ask someone else to do some of the work, or for me to own all of it?
This question is not about avoiding work. It’s about maintaining focus and control.
A framework I like to share with the managers I support goes something like this (details vary depending on the organization):
Assuming changes are to data and/or logic:
Does the data belong uniquely in any system we own? Are, or should we be, its canonical custodians?
If so, review our data models. We might not be meeting a reasonable expectation.
Does this data already exist somewhere else?
Is there a more natural home for it?
Can those who need the change manage that data instead?
Does the logic belong uniquely in any system we own? Is it a state change to data we own, or an operation we’re responsible for?
If so, review our logic. We might not be meeting a reasonable expectation.
Does this behaviour already exist somewhere else?
Is there a more natural home for it?
Can those who need the new logic implement it themselves in a decoupled way (e.g., we agree to own the data, but we publish change events which they can listen to)?
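To make the decoupled option in that last question more concrete, here is a minimal sketch of one team owning the data and publishing change events while another team listens and applies its own logic. Everything in it is an assumption made for illustration: the in-memory `EventBus` stands in for whatever message broker an organization actually uses, and names such as `AccountService`, `DeliveryService`, `preferences.changed` and `delivery_time_slot` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical event shape; the field names are assumptions for illustration.
@dataclass(frozen=True)
class ChangeEvent:
    topic: str
    payload: Dict[str, str]


Handler = Callable[[ChangeEvent], None]


class EventBus:
    """In-memory stand-in for a real message broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, event: ChangeEvent) -> None:
        # Deliver the event to every handler registered for its topic.
        for handler in self._handlers.get(event.topic, []):
            handler(event)


class AccountService:
    """Owns the preference data and announces every change to it."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._preferences: Dict[str, Dict[str, str]] = {}

    def save_preference(self, user_id: str, key: str, value: str) -> None:
        self._preferences.setdefault(user_id, {})[key] = value
        self._bus.publish(
            ChangeEvent(
                topic="preferences.changed",
                payload={"user_id": user_id, "key": key, "value": value},
            )
        )


class DeliveryService:
    """Owns the delivery logic; reacts only to the changes it cares about."""

    def __init__(self, bus: EventBus) -> None:
        self.time_slots: Dict[str, str] = {}
        bus.subscribe("preferences.changed", self._on_preference_changed)

    def _on_preference_changed(self, event: ChangeEvent) -> None:
        if event.payload["key"] == "delivery_time_slot":
            self.time_slots[event.payload["user_id"]] = event.payload["value"]


bus = EventBus()
account = AccountService(bus)
delivery = DeliveryService(bus)

account.save_preference("user-1", "delivery_time_slot", "18:00-20:00")
account.save_preference("user-1", "newsletter", "weekly")

print(delivery.time_slots)
```

The point of the sketch is that the data owner never calls the Delivery API directly. It only announces that something changed, so new consumers can be added without a change request landing on the owning team.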
Often, teams fail to model the data that they manage in a way that anticipates reasonable future needs. When that’s the case, the right answer is often to accept the change request, and try to find time to do a proper data modelling session.
But, just as often, boundaries can and should be drawn, between teams as well as their services.
The Settings page from heaven
Following the framework above, the Settings team decides to give control back to their stakeholders. They decide that, while the data and logic for Time Slots belong to the Delivery team, it’s their responsibility to provide the tools to make that ownership possible.
So they agree to make the settings page dynamic. When it’s rendered, it gets a list of links to sub-pages from a configuration source. It then simply shows a list of those links, each pointing to a sub-page the team doesn’t own. Business, Design and UX stakeholders are free to change that list as they please, without the team’s involvement. It becomes customizable.
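A rough sketch of what such a dynamic page might look like follows. It assumes the configuration source hands back a JSON list of label/URL pairs. The format, the `SettingsLink` type, the function names and the example URLs are all hypothetical, not a description of any real system.

```python
import json
from dataclasses import dataclass
from html import escape
from typing import List


# Hypothetical config entry: a label to show and the sub-page it points to.
@dataclass(frozen=True)
class SettingsLink:
    label: str
    url: str


def load_links(config_json: str) -> List[SettingsLink]:
    """Parse the link list from a JSON document (assumed config format)."""
    entries = json.loads(config_json)
    return [SettingsLink(label=e["label"], url=e["url"]) for e in entries]


def render_settings_page(links: List[SettingsLink]) -> str:
    """Render the links as an HTML list; the sub-pages are owned elsewhere."""
    items = "\n".join(
        f'  <li><a href="{escape(link.url, quote=True)}">{escape(link.label)}</a></li>'
        for link in links
    )
    return f"<ul>\n{items}\n</ul>"


# In practice this would come from a config service or file that
# stakeholders can edit; it is inlined here to keep the sketch runnable.
CONFIG = """
[
  {"label": "Profile", "url": "/settings/profile"},
  {"label": "Delivery time slots", "url": "/delivery/settings/time-slots"}
]
"""

page = render_settings_page(load_links(CONFIG))
print(page)
```

Adding a new settings sub-page then means adding one entry to the configuration, with no code change on the Settings team’s side.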
Meanwhile, the Delivery team develops their own settings sub-page to go along with their backend API. They own the entire vertical slice of functionality, so they can plan and iterate completely independently.
The Settings team focuses on delivering a better Settings experience, in this case for internal developers.
The Delivery team focuses on a better Delivery experience for the end user.
Boundaries are clear, no one blocks anyone, and the Time Slots feature ships in two weeks. Some bugs sneak past QA, but a couple of developers get together for an hour and squash them, because they understand the entire interaction end to end.
As a friend of mine recently said, having dependencies, in code much like in life, is like being in a relationship with someone who doesn’t love you. You can’t rely on what’s coming from the other side, so your needs go unmet.
There’s an art to telling people to do the work themselves when they’re asking you to please do it. It’s important to frame this boundary-setting not in terms of us refusing work, but in terms of us helping them be independent.
Having a simple framework, and using it as early in planning as possible, is something I found helpful. If you struggle with distributed monoliths or other kinds of dependencies - hopefully not in your love life - I hope it helps you too.
Thanks for reading.