[Translation] Your distributed monoliths are plotting behind your back


Hello, Habr!

Today's translation is concerned not only, and not so much, with microservices, a topic that is on everyone's lips these days, as with a reminder of how important it is to call things by their proper names. Migrating to a microservice architecture is sometimes necessary, but, as the author emphasizes once again, it requires carefully weighing the consequences. Enjoy the read!

From time to time I come back to the same question.

What important truth do very few people agree with you on? - Peter Thiel

Before sitting down to write this post, I spent a long time trying this question on a topic that is very much in vogue today: microservices. I think I now have something to tell; some of my findings come from reflection, others from practical experience. So here goes.
Let's start with one important truth that will guide us along this path like the North Star.

Most implementations of microservices are nothing more than distributed monoliths.

Era of Monoliths

Every system begins life as a monolithic application. I will not go into detail on this topic here; much has already been written about it. However, the lion's share of what is written about monoliths focuses on issues such as developer productivity and scalability, leaving aside the most valuable asset of any Internet company: its data.

Typical monolithic application architecture

If data is so important, why does all the attention go to other topics instead? The answer, in general, is simple: because they are not as painful as the data question.
The monolith is perhaps the only stage in a system's life cycle where you:

  • fully understand your data model;
  • can operate on data consistently (assuming the database is a good fit for your use case).

In terms of data, the monolith is perfect. And since data is the most valuable asset of any company, it is better not to break up the monolith unless you have a very good reason, or a combination of such reasons, to do so. In most cases, the decisive reason is the need to scale (since we live in the real world with its inherent physical limitations).

When that moment comes, your system most likely moves into a new form: it becomes a distributed monolith.

The era of distributed monoliths

Let's say things are going well at your company and the application needs to evolve. You are landing bigger clients, and your billing and reporting requirements have changed in terms of both feature set and volume.

Getting serious about breaking up the monolith, you will, in particular, try to carve out two small services, one providing reporting and the other billing. These new services will probably expose an HTTP API and each have a dedicated database for durable state. After many commits you, like us at Unbabel, may end up with something like the following illustration.

Generalized view of the system architecture after splitting the billing and reporting services out of the main monolithic application

Everything goes according to plan.

  • The team keeps splitting the monolith into smaller systems;
  • The continuous integration/delivery pipelines run like clockwork;
  • The Kubernetes cluster is healthy, and the engineers are productive and happy.

Life is beautiful.

But what if I told you that sinister plots are being woven against you at this very moment?

Now, looking at your system, you find that the data has been spread across many different systems. You started from a stage where you had a single database storing all your data objects, and now those data objects live in different places. You might think this is not a problem, since microservices exist precisely to create abstractions and encapsulate data, hiding the system's internal complexity.

You are absolutely right. But as the scale grows, harder problems arise: now, at any moment, you may face a business requirement (for example, tracking a certain metric) that needs access to data living in more than one system.

What to do? There are many options, in fact. But you are in a hurry: you need to serve the huge cohort of customers you have recently signed up, so you have to strike a balance between "fast" and "good". After discussing the details, you decide to build an additional system that performs some ETL work to help with these downstream tasks. This system will need access to all the read replicas containing the information you need. The figure below shows how such a system could work.

A generalized example of an analytical ETL system (which we called Automatic Translation Analytics at Unbabel)
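The crux of such an ETL pass is that it joins data living in the databases of several services. Here is a minimal sketch of that idea, using in-memory sqlite3 databases as stand-ins for the read replicas; the table and field names (invoices, tasks) are illustrative assumptions, not Unbabel's actual schema.

```python
import sqlite3

# Stand-ins for the read replicas of two microservices.
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id TEXT, amount_cents INTEGER)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [("acme", 1200), ("acme", 800), ("globex", 500)])

reporting = sqlite3.connect(":memory:")
reporting.execute("CREATE TABLE tasks (customer_id TEXT, words_translated INTEGER)")
reporting.executemany("INSERT INTO tasks VALUES (?, ?)",
                      [("acme", 3000), ("globex", 1500)])

def etl_customer_metrics():
    """Combine data that lives in two different services' databases.

    Note the catch that the article describes next: this job must know
    the internal schema of every replica it reads from.
    """
    spend = dict(billing.execute(
        "SELECT customer_id, SUM(amount_cents) FROM invoices GROUP BY customer_id"))
    volume = dict(reporting.execute(
        "SELECT customer_id, SUM(words_translated) FROM tasks GROUP BY customer_id"))
    return {c: {"spend_cents": spend.get(c, 0), "words": volume.get(c, 0)}
            for c in set(spend) | set(volume)}

metrics = etl_customer_metrics()
```

Such a job is easy to stand up, which is exactly why it is attractive under time pressure, and why its hidden coupling to every internal schema only hurts later.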

At Unbabel, we used this approach because:

  • It does not affect the performance of each microservice too much;
  • It does not require major infrastructure changes (just add a new microservice);
  • We were able to meet our business requirements fairly quickly.

Experience suggests this approach remains workable for a while, until a certain scale is reached. At Unbabel it served us very well until quite recently, when we began to face ever more serious challenges. Here are some of the things that became a headache for us:

1. Data Changes

One of the main advantages of microservices is encapsulation. The internal representation of the data may change without affecting the system's clients, since they communicate through an external API. Our strategy, however, required direct access to the internal data representation, so whenever a team made even a small change to how data was represented (for example, renaming a field or changing a type from text to uuid), we also had to change and redeploy our ETL service.

2. The need to handle many different data schemas

As the number of systems we needed to connect to grew, we had to deal with ever more numerous, non-uniform ways of representing data. It became obvious that we could not scale across all these schemas, the relationships between them, and their representations.

The root of all evil

To get a complete picture of what was happening in the system, we had to fall back on a monolith-like approach. The difference was that we had not one system and one database, but dozens of such pairs, each with its own representation of the data; moreover, in some cases the same data was replicated across several systems.

I prefer to call such a system a distributed monolith. Why? Because it is completely unsuited to tracking changes in the system: the only way to derive the state of the system is to build a service that connects directly to the data stores of all the microservices. It is interesting to see how many of the giants of the Internet faced similar challenges at some point in their development. A good example I always like to give here is LinkedIn.

This is roughly the jumble that LinkedIn's data flows looked like around 2011 (source)

At this point you may be thinking: "so what are you guys going to do about all this?" The answer is simple: you need to start tracking changes, recording important actions as they happen.

Breaking a distributed monolith with Event Sourcing

Like virtually everything else in the world, systems on the Internet work by reacting to actions. For example, a request to an API may result in a new record being inserted into a database. Normally such details do not concern us much, since we care primarily about the database state being updated. But updating the database state is the consequence of a certain event (in this case, the API request). The notion of an event is simple, and yet events hold great potential: they can even be used to dismantle a distributed monolith.

An event is nothing more than an immutable fact about some modification that occurred in your system. In a microservice architecture, events become critical: they help make sense of data flows and, from them, derive the aggregate state of several systems. Every microservice that performs an action of interest to the system as a whole should emit an event, together with all the essential information about the fact that the event represents.
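As a rough sketch of what "emit an event together with the essential information" might look like in practice (the field names and the in-memory log are illustrative assumptions, standing in for a durable log such as Kafka):

```python
import json
import time
import uuid

EVENT_LOG = []  # stand-in for a durable, append-only event store

def emit(event_type, payload):
    """Record an immutable fact about what just happened in this service."""
    event = {
        "id": str(uuid.uuid4()),
        "type": event_type,
        "occurred_at": time.time(),
        "payload": payload,  # the essential information about the fact
    }
    EVENT_LOG.append(json.dumps(event))  # serialized, storage-agnostic
    return event

# A billing microservice would emit this alongside its own database write:
emit("invoice_issued", {"customer_id": "acme", "amount_cents": 1200})
```

The service still owns its database; the event is an additional, outward-facing record of the fact, not a replacement for the internal write.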

You may be asking:
"How does having microservices generate events help me with the distributed monolith problem?"

With a system that generates events, you can have a log of facts with the following properties:
  • No binding to any particular data store: events are usually serialized in formats such as JSON, Avro, or Protobuf;
  • Immutability: once an event has been emitted, it can never be changed;
  • Replayability: the state of the system at any point in time can be recovered; all you need to do is replay the event log.

Using such a log, you can derive state with any kind of application-level logic. You are no longer tied to any particular set of microservices or to the N ways data is represented within them. The single source of truth, and your only data source, is now the repository where your events are stored.
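Replaying the log to derive state can be as simple as folding over the events. This sketch uses hypothetical event types and shows two different projections built from the very same facts:

```python
# A tiny event log: each entry is an immutable fact.
events = [
    {"type": "invoice_issued", "customer": "acme", "amount": 1200},
    {"type": "invoice_issued", "customer": "globex", "amount": 500},
    {"type": "invoice_paid", "customer": "acme", "amount": 1200},
]

def project_outstanding(log):
    """One projection: unpaid balance per customer."""
    balance = {}
    for e in log:
        sign = 1 if e["type"] == "invoice_issued" else -1
        balance[e["customer"]] = balance.get(e["customer"], 0) + sign * e["amount"]
    return balance

def project_revenue(log):
    """A different projection over the same facts: total amount collected."""
    return sum(e["amount"] for e in log if e["type"] == "invoice_paid")
```

Either projection can be rebuilt from scratch at any time by replaying the log, which is exactly the replayability property listed above.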

Here are a few reasons why the event log looks to me like the right tool for breaking up the distributed monolith:

1. Single source of truth

Instead of maintaining N data sources, and the connections to (possibly multiple) different kinds of databases that they require, in this new scenario the ultimate truth lives in exactly one repository: the event log.

2. Universal Data Format

In the previous version of the system we had to deal with a multitude of data formats, since we connected directly to each database. In the new arrangement we can act much more flexibly.
Suppose you like a picture on Instagram posted by one of your friends. That action can be described as "user X liked picture P". And here is an event representing this fact:

An event following the AVO approach (Actor, Verb, Object), modeling the fact that a user liked a picture
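The original figure is not reproduced here, but such an AVO event could look roughly like this (the exact field names are an illustrative guess, not the article's original payload):

```python
import json

# "User X liked picture P", expressed as Actor-Verb-Object
event = {
    "actor": "user:X",
    "verb": "like",
    "object": "picture:P",
    "occurred_at": "2019-08-01T12:00:00Z",
}
serialized = json.dumps(event)  # storage-agnostic representation
```

Any consumer that understands the AVO shape can process this event, regardless of which service produced it or how that service stores its data internally.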

3. Loose coupling between producers and consumers

Last but not least, one of the greatest benefits of working with events is the effective decoupling of data producers from data consumers. This not only simplifies scaling but also reduces the number of dependencies between them. The only contract that remains between the systems is the event schema.
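That "only remaining contract" can be made explicit in code. Here is a minimal sketch of checking events against a shared schema; in practice the schema would usually be an Avro or Protobuf definition kept in a schema registry, not a hand-rolled dictionary like this:

```python
# The shared contract: field names and their expected types.
REQUIRED_FIELDS = {"actor": str, "verb": str, "object": str}

def conforms(event):
    """Producers and consumers agree only on this schema, nothing else."""
    return all(isinstance(event.get(name), typ)
               for name, typ in REQUIRED_FIELDS.items())
```

A producer can change its internal storage freely; as long as the events it emits still conform, no consumer has to change.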
At the beginning of this article a question was posed: what important truth do very few people agree with you on?

Let me return to it to conclude this excursion. I suspect most companies do not treat data as a first-class entity when they begin migrating to a microservice architecture. They argue that all data changes can still go through the API, but this approach ultimately leads to ever greater complication of the service itself.

I believe that the only correct approach to capturing data changes in a microservice architecture is to have the systems emit events according to a strictly defined contract. With a properly maintained event log, you can derive a data set for any set of business requirements; you just apply different rules to the same facts. In some cases, such data fragmentation can be avoided altogether if your company (and especially your product managers) treats data as a product. But that is a topic for another article.
