Service-to-system communication paradigm in the service-oriented architecture

Anton Mishchuk · Nerd For Tech · Jan 19, 2021

A service-oriented architecture (SOA) is a common way to “decrease” the complexity of a software system by splitting it into several parts. I put “decrease” in quotes because the total complexity actually always increases. Splitting the system into separate, well-defined components is a logical extension of splitting the system’s functionality into domains. When a system becomes too big to fit into stakeholders’ (engineers, product managers, etc.) heads, the only way to handle its complexity is to split it into parts. Each stakeholder then has detailed knowledge about their own part and can manage that part properly. Such separation doesn’t actually decrease the complexity of the system itself; rather, it decreases the complexity of its development and maintenance. There is always a “communication layer”, and it has its own complexity.

This extra complexity is a stumbling block on the road from monolith to SOA. In today’s reality, however, the software system behind even an average-sized business is quite a complex thing, so there is really no other way: you have to do the split at a certain stage of the system’s evolution.

The article, therefore, is not about monolith vs SOA. It is about the complexity of the “communication layer” you’ll have in your system after you (finally) switch to SOA.

Once you have made the decision, lots of new questions appear: which topology to choose, whether to use synchronous or asynchronous communication, whether you need message queues, etc. These questions are serious ones, which is why there are so many books and articles about them. They are also typically outside the domain of your application; they are about non-functional requirements, which is why we sometimes see a separate role, an architect (in the narrow sense), who takes responsibility for answering them.

But the most important thing is that, when moving from monolith to SOA, we usually allow each service to interact with its neighbors just as it did when everything was inside the monolith. We continue to build our system in the paradigm of service-to-service communication. And this is a mistake that will bring constantly increasing complexity later.

The article starts with a discussion of what this complexity is, how it appears, and where it is located. Then we’ll consider a couple of infrastructure questions that arise when we try to eliminate the complexity: “which topology is better” and “do we need message queues”. The choices are analyzed in terms of the extra complexity that appears in the system. I’m going to show that the communication infrastructure doesn’t really matter. It helps a lot in other respects, of course. But if you still follow the service-to-service paradigm, there will always be quickly growing complexity that can’t be solved at the infrastructure level. The only way to bring this growing complexity under control is to shift to the service-to-system paradigm of communication. In the final part of the article, I demonstrate how the new paradigm may be implemented and how it forces engineers to use a universal (ubiquitous) language both in code and in human-language communication.

What is the complexity of communication?

Let’s consider an artificial software system consisting of three services: Foo, Bar, and Baz. For simplicity, it’s a very symmetrical case: the services are roughly equal in terms of internal complexity, and each service communicates with all the others. There is no “infrastructure” logic; each service can directly access the others, e.g. they communicate via REST APIs.

The green bars represent the additional logic (complexity) that has to be implemented in each service in order to communicate with the others. For example, Foo -> Bar is the part of the logic implemented in Foo in order to communicate with Bar. It’s not just code that wraps HTTP requests; such wrapping may be provided by a client library written by engineers from the Bar service. It is mostly about putting together abstractions (domain knowledge) from the Foo and Bar services. An engineer who is responsible for Foo has to know a bit about Bar and has to integrate some of Bar’s abstractions into Foo.

For example, your customer management service communicates with the billing one in order to create invoices. The engineer responsible for the first service has to know some aspects of billing and has to manage the related functionality in her own code. In other words, the engineer knows everything about her own domain (customer management) and has to know “a little bit” about the other services. In terms of Foo-Bar-Baz, an engineer from Foo knows her own (Foo) domain perfectly and, at the same time, has to manage “partial knowledge” about Bar and Baz.
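To make this “partial knowledge” concrete, here is a minimal sketch of what such coupling looks like in code. Everything in it (the `BILLING_URL` host, the endpoint path, the payload fields) is a hypothetical example invented for illustration, not a real API:

```python
import requests  # assuming plain REST communication between services

# Hypothetical billing-service details the customer-management engineer
# must know and keep in sync: base URL, endpoint path, payload schema.
BILLING_URL = "http://billing.internal"

def create_invoice_for(customer: dict) -> str:
    # Billing's domain knowledge leaking into customer management:
    # which fields billing expects and which endpoint to hit.
    payload = {
        "customer_id": customer["id"],
        "amount_cents": customer["plan_price_cents"],
        "currency": "USD",
    }
    response = requests.post(f"{BILLING_URL}/invoices", json=payload, timeout=5)
    response.raise_for_status()
    return response.json()["invoice_id"]
```

If billing renames a field or moves an endpoint, this code breaks. That maintenance burden is exactly the Foo -> Bar green bar.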

This “partial knowledge” forms the core of the complexity of the Foo->Bar and Foo->Baz communication. A Foo engineer has to understand the interfaces of Bar and Baz, be aware of their changes, and modify the Foo code accordingly.

In general, Foo->Bar and Bar->Foo are not symmetrical. This is quite obvious if we think in terms of “partial knowledge”. In Foo->Bar you know Foo well and only the interface of Bar, while in Bar->Foo you know Bar well and only the interface of Foo. Therefore Foo->Bar and Bar->Foo are quite different complexities and can’t replace each other.

The communication infrastructure is extremely simple in the example above. It’s not a big deal to run services on different machines (containers) and let them communicate over the network. The real complexity lies in another dimension: in the knowledge about the specifics of the other services inside the system. This knowledge already existed, in some form, in engineers’ heads before you split the monolith into services. Folks who were working on the Foo part of the monolith were perfectly aware of the functionality of the Bar and Baz parts. But once you’ve split the system, there are no direct calls of methods and functions anymore; everything is abstracted through, e.g., a JSON API or custom client libraries. And your colleagues from Bar and Baz now sit in a separate room.

To summarize: if you have N communication directions in your SOA system, the complexity of the communication layer is basically the sum of the N complexities of each direction. In a full mesh of n services, N = n(n-1), so it grows quadratically; our three services already produce six directions. But please don’t take the statement too mathematically. We defined the complexity in terms of “partial knowledge”, and there is no proper way to calculate such a sum.

From mesh to star

Let’s see what happens to the complexity if we change the topology of the communication. The opposite of a mesh is a star. The motivation is typically the following: in the mesh topology each service knows too much, so let’s create a central hub (a broker, or a very simple API gateway) that will handle the communication between services. Then each service will know only about the Hub. Consider the case when the hub is just a very simple router without any business logic.

Note that this figure differs slightly from the previous one. There are no bars on the Hub side, and there are no arrows (directions) in the communication. That is because communication through a middleman is a bit different. For Foo-Hub, for example, there is no actual need to communicate with the Hub itself; the Hub is just a proxy. It creates the illusion that, with a simple topology change, we have helped our engineers. But actually, we haven’t.

All the requests to the Hub go to some service hidden behind it. And the Foo engineer is still perfectly aware of where each request goes, how it should be formatted, and so on. So she still carries knowledge about the domains she interacts with. The complexity related to the “partial knowledge” remains almost the same (the Foo-Hub yellow bar is equal to the two green bars Foo->Bar and Foo->Baz).
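A sketch of the same hypothetical invoice call going through a simple routing hub shows why the knowledge does not disappear (again, all hosts, paths, and fields are invented for illustration):

```python
import requests

HUB_URL = "http://hub.internal"  # assumed address of the routing hub

def create_invoice_via_hub(customer: dict) -> str:
    # The hub only changes the host we talk to. The target service,
    # its path, and its payload schema are still chosen by the caller,
    # so the "partial knowledge" about billing stays in this service.
    payload = {
        "customer_id": customer["id"],
        "amount_cents": customer["plan_price_cents"],
        "currency": "USD",
    }
    response = requests.post(f"{HUB_URL}/billing/invoices", json=payload, timeout=5)
    response.raise_for_status()
    return response.json()["invoice_id"]
```

Compared with the direct call from the earlier sketch, only the base URL changed; every billing-specific decision is still made by the Foo engineer.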

Therefore the overall communication complexity remains at least the same as with the mesh approach. Plus, you now have a new non-functional component in your system, which may itself not be very simple.

If we add a bit more business logic to the hub, we can create abstractions that introduce a common interface for the other services. This will actually help to handle the complexity, because in this case we also get a paradigm shift, from service-to-service to service-to-api-gateway. Then only the last (but huge) step remains to the final goal. But this is discussed later.

Message queues, channels, enterprise service bus, and other friends

Message queues (MQ) in all their variations are very popular tools that claim to simplify the communication between services. They differ from the routing hub in that they bring their own abstractions: queues, channels, etc. And these abstractions leak into the system logic itself: their names and types become part of the ubiquitous language of the system. Instead of operating directly with entities from particular domains, one interacts with queues and channels.

As in the case of the Hub, this creates a sense that you interact with a single communication component and, therefore, that the complexity might decrease. But, again, it’s just an illusion. Behind the scenes there is a transformation of knowledge between queues and services. On the one hand, you operate only with queues and “don’t see” the other services. But on the other hand, you have to manage knowledge about how the queues function: how they are connected to, and processed in, the scope of the particular services you interact with.
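Here is a small sketch of that leak, using RabbitMQ via the `pika` client; the broker host, the queue name, and the message shape are all assumptions made up for this example:

```python
import json
import pika  # RabbitMQ client; any broker would illustrate the same point

# The queue name is an infrastructure abstraction, yet it leaks into
# domain code: to request an invoice you must know that billing consumes
# "billing.invoice_requests" and what message shape it expects.
connection = pika.BlockingConnection(pika.ConnectionParameters("mq.internal"))
channel = connection.channel()
channel.queue_declare(queue="billing.invoice_requests", durable=True)

def request_invoice(customer_id: str, amount_cents: int) -> None:
    message = {"customer_id": customer_id, "amount_cents": amount_cents}
    channel.basic_publish(
        exchange="",
        routing_key="billing.invoice_requests",
        body=json.dumps(message),
    )
```

The caller no longer names the billing service, but the queue name and message schema encode the same partial knowledge about it.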

One may think by now that I consider hubs and queues an absolute evil that just conceals the true complexity and introduces its own. That is not my point. All the mentioned approaches are good and even necessary when you are thinking about scalability and maintainability. You may have imbalances in your system that can be fixed only with a new non-functional component. For example, some services in the system may be overwhelmed by communication (for objective reasons, or just because of poor design). In that case, queues are a good solution for balancing the communication. And, of course, it’s much easier to install new services into a system that has a well-designed communication unit.

My point is that all these additional non-functional components add complexity to the system as a whole. And, even more importantly from the engineering point of view, they add complexity to the communication of each particular service. An engineer still has to handle the “partial knowledge” about other services, plus she now has to understand the non-functional logic.

Service-to-system paradigm

The only way to make things simpler is to shift the paradigm a bit. The starting point is the same as with hubs and queues: we are going to introduce a more universal abstraction. From the point of view of a particular service, all the other services are replaced by a meta-service. And communication occurs only between the service and the meta-service.

What is this beast, the meta-service? Well, actually, it’s the whole system itself! The idea of the paradigm shift is to forget about the individual parts of the system and to interact with the system as a whole.

To build more intuition, and to show that the shift is not a big deal, let’s consider how the system is perceived by stakeholders from the business side (product managers and other non-engineers). They have no idea how many services there are in the system or how they interact with each other. They just know the external interface of the system: the entities that exist in it and the actions that can be performed on those entities.

The idea is to translate this way of thinking to the engineering side. And such thinking actually already exists. Remember the meetings with your colleagues from other departments where you discuss big things, decisions that affect all the services in the system. Because everyone has just “partial knowledge” about the other services, you tend to discuss functionality at a very high level that is meaningful for all the participants: at the system level. And you, as the engineer responsible for a particular service, understand perfectly which concepts relate to your service or to the services you’ve integrated with. So, actually, we are all well aware of the interface of the system as a whole and are able to think in such categories.

But what happens when you return to your laptop? You also return to the service-to-service paradigm: you translate this high-level language into the language of communication with particular services. For example, in order to send an invoice, you have to get billing data from one service and some user-specific configuration from another, and don’t forget to push a notification to the activity-log queue. The only way to avoid this extra translation is to fully detach yourself from such details. The only thing you should know is that there is an interface of the system that allows you to get billing data, get configuration, and post a notification. And you don’t care how this interface is implemented.
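In code, the shift looks roughly like this. It is only a sketch: the hosts, paths, and every `system.*` function are hypothetical names invented to contrast the two paradigms:

```python
import requests

# Service-to-service paradigm: the engineer orchestrates concrete services
# and must know each one's host, path, and payload (all assumed here).
BILLING_URL = "http://billing.internal"
CONFIG_URL = "http://config.internal"

def send_invoice_service_to_service(user_id: str) -> None:
    billing = requests.get(f"{BILLING_URL}/billing/{user_id}", timeout=5).json()
    config = requests.get(f"{CONFIG_URL}/configurations/{user_id}", timeout=5).json()
    deliver_invoice(billing, config)
    # ...plus a publish to the activity-log queue, omitted here

# Service-to-system paradigm: the same intent expressed against a single
# system interface; how `system` is implemented is not the caller's concern.
def send_invoice_service_to_system(system, user_id: str) -> None:
    billing = system.get_billing(user_id)
    config = system.get_configuration(user_id)
    deliver_invoice(billing, config)
    system.post_notification("invoice_sent", user_id=user_id)

def deliver_invoice(billing: dict, config: dict) -> None:
    """Rendering and sending the invoice is out of scope for this sketch."""
```

The second function reads almost exactly like the sentence a product manager would say: get billing, get configuration, send the invoice, post a notification.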

If you are fully abstracted from particular services and interact with the system as a whole, the only remaining complexity is the integration with the meta-service. And the “partial knowledge” you have about the meta-service is quite universal: it is knowledge about the system itself.

How can we achieve that?

The first candidate for the role of meta-service is a well-designed API gateway. Indeed, if we transform the simple routing hub into a general API gateway with a well-defined (and well-documented) interface, we get the meta-service. It will be a single point of interaction with the system, and its documentation will be the single source of truth about entities and actions. Therefore it will represent the interface of the whole system.
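As a rough illustration, such a gateway could start as a thin translation layer that owns all the routing knowledge, so individual services no longer need it. This Flask-based sketch is an assumption of mine, not a prescribed design; every host and path in it is made up:

```python
import requests
from flask import Flask, jsonify

app = Flask(__name__)

# The gateway is the single place that knows which internal service
# implements which part of the system interface (hypothetical hosts).
ROUTES = {
    "billing": "http://billing.internal",
    "configuration": "http://config.internal",
}

@app.get("/billing/<user_id>")
def get_billing(user_id: str):
    # A documented, stable system-level endpoint; the upstream service
    # behind it can change without callers noticing.
    upstream = requests.get(f"{ROUTES['billing']}/billing/{user_id}", timeout=5)
    return jsonify(upstream.json()), upstream.status_code
```

The point is not the routing code itself but the documented endpoints, which become the system’s public vocabulary.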

What a particular engineer should know then is just the gateway interface and how her particular service interacts with it. She doesn’t care about the other services, the topology of the communication layer, the names of queues, and other boring stuff.

There is also a sociological aspect to the interface existing as a well-documented API gateway. This documentation (endpoint specifications and their human-language descriptions) becomes a source of the ubiquitous language for engineers. Here we are talking about the language of internal communication; it might have abstractions that are quite far from the business reality. But ideally, in order to make it really ubiquitous, we may (or even must) keep the internal language very close to the business one.

At this point, one may notice a small step backward toward the monolith. We have one very, very important service, and all the others strongly depend on it. And that is actually true. We’ve indeed coupled our services to this meta-service quite seriously. But I consider this a trade-off between reducing complexity and keeping flexibility. Flexibility is critical at the early stages, but if you want to build something huge, you’ll have to decrease the number of degrees of freedom in order to understand what is going on.

Note that such a gateway is a separate application. And it might be quite a complex component, tightly coupled both with the domain and with the infrastructure logic.

There is, however, another approach. We can implement this meta-service without any separately running application. It can be completely virtual (“meta”). This is possible in systems whose services are written in the same programming language. You create a library (package) with the interface expressed in terms of code. This library is then included as a dependency in every service in the system. And all the communication goes through the library.

The library is also quite a complex piece of software. Actually, its complexity is the same as that of the gateway above. But there are two big advantages. First, because engineers are forced to use the library, it propagates the ubiquitous language more actively than endpoint specifications do. The language of the programmatic interface of the library gets into the code and into the human language of the engineers.
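A minimal sketch of what such a library could expose. The package name `system_interface`, the gateway URL, and every function here are invented for illustration; the library my colleagues and I built (see the next post) differs in its details:

```python
# system_interface/__init__.py -- a package every service depends on.
# The public functions below ARE the system's interface; transport
# details (HTTP, queues, topology) stay hidden inside the package.
import requests

_GATEWAY_URL = "http://gateway.internal"  # assumed; could be queues instead

def get_billing(user_id: str) -> dict:
    """Billing data for a user, in the terms the system as a whole defines."""
    return _get(f"/billing/{user_id}")

def get_configuration(user_id: str) -> dict:
    """User-specific configuration, expressed in system-level terms."""
    return _get(f"/configurations/{user_id}")

def post_notification(event: str, **attributes) -> None:
    """Publish a system-level notification; routing is not the caller's concern."""
    requests.post(f"{_GATEWAY_URL}/notifications",
                  json={"event": event, **attributes}, timeout=5)

def _get(path: str) -> dict:
    response = requests.get(f"{_GATEWAY_URL}{path}", timeout=5)
    response.raise_for_status()
    return response.json()
```

Because every service imports the same names, functions like `get_billing` naturally become part of how engineers talk about the system, in code and in conversation.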

The second advantage is a more aesthetic one: a universal interface expressed purely in code. A single source of truth that can be used in your code, can be tested, and can be put under version control.

In my next post, I’ll share some implementation details of the library that my colleagues and I implemented for a system in the insurance domain.
