Designing a scalable application with Elixir: from umbrella project to distributed system
Elixir/Erlang OTP abstractions enforce developers to split programs into independent parts. While “gen_servers” encapsulate parts of business logic on micro-level, “applications” present a more general (“service”) part of the system. Complex programs written in Elixir are always a collection of communicating OTP applications.
The main question appeared while developing such programs is how to split the complex system into separate parts. But the more important problem is how to organize communication between them.
In the article, I would share design principles I follow when creating more or less complex Elixir project. We will discuss how to split the project into small maintainable microservices (Elixir applications) and how to organize modules inside them using “contexts”.
But the main focus will be on designing of flexible interfaces between Elixir applications. You will see how they may be changed while scaling from simple umbrella project to distributed system. I’ll cover a few approaches: Erlang Remote procedure call, Distributed tasks, and HTTP protocol. And, as a bonus, I’ll show how one can limit concurrent access to microservices.
Umbrella project
With Elixir “umbrella project” one can split the complex logic into separate parts at the very beginning of development process. But at the same, time it allows keeping all the logic in one repo. So you can start to develop future microservices with a minimal headache.
I’ve prepared a scaffold demo project in order to demonstrate real code examples. The name of the project is “ml_tools” which states for “Machine Learning Tools”. The project allows users to apply different predictive models to their datasets and to choose the best one. Users should be able to apply different algorithms to their datasets and visualize the results.
The division of the project into several applications is quite obvious from the requirements:
datasets
— application responsible for managing data: create, read and updates datasets.utils
— a set of different utility services that preprocess and visualize data.models
— an service that implements different algorithms for predictive modeling. “Linear model”, “random forest”, “support vector machine”, etc.main
— top-level application that utilizes other applications and exposes top-level API.
Each application is started under its own supervisor, so, acts as independent service.
— — project structure — —
apps/
datasets/
lib/
datasets/
fetchers/
fetchers.ex
aws.ex
kaggle.ex
collections/
...
interfaces/
fetchers.ex
collections.ex
models/
utils/
main/
...
Having divided top-level responsibility into several parts now let’s explore each service in details. Inside every application, we need to split the code into modules or sets of modules. I prefer to define high-level modules based on contexts that are present in a specific application.
For example, datasets
application is responsible for storing collections of data in its own database and also for fetching data from different sources. So the application will have two folders in lib/datasets
directory: “collections” and “fetchers”. Each folder has .ex
file with the same name that contains a module which implements context interface and other utility modules.
Take a look at lib/datasets/fetchers
. The folder has Datasets.Fetchers
module which implements an interface for “fetchers” context — functions which return data from “AWS Public Datasets” and “Kaggle Datasets”. So, besides this module the are Datasets.Fetchers.Aws
and Datasets.Fetchers.Kaggle
that will implement access to the specific source.
The same context related division may be implemented in other applications. models
are split by a specific algorithm: Models.Lm
(Linear model) or Models.Rf
(Random Forest). utils
implements data pre-processing (Utils.PreProcessing
) and visualization (Utils.Visualization
).
And, there is, of course, a top-level (Main
) application that utilizes all the microserviсes. This application also has several contexts: Main.Zillow
module for Zillow competition related code, and Main.Screening
module for Passenger Screening Algorithm Challenge.
The Main application has other application as dependencies in Main.Mixfile
:
defp deps do
[
{:datasets, in_umbrella: true},
{:models, in_umbrella: true},
{:utils, in_umbrella: true}
]
end
Which makes the modules from different applications are available inside main
application.
So, in general, there are three levels of code organization in Elixir project:
- “Service level” — the most obvious way to split the complex system into separate Elixir applications (
datasets
,models
,utils
). - “Context level” — breaks responsibility inside particular service by implementing “context modules” (
Datasets.Fetchers
,Datasets.Collections
). - “Implementation level” — particular modules that define data-structures and functions (
Datasets.Fetchers.Aws
,Datasets.Fetchers.Kaggle
)
Umbrella project pros and cons
As mentioned above the main advantage of using “umbrella project” is that you have all the code in one place and can run it together in the development and test environment. You may play around with the whole system and, most important, write integration tests that will test components altogether. This is very important at the early stage of project development!
At the same time, you project is already split into relatively independent parts and ready for scaling.
Compare this with an approach in many other programming languages where you usually start from monolith project and then try to extract some parts to separate application. Because starting from micro-service approach tremendously complicates the development process.
But it’s time to start worrying about encapsulation!
You may have noticed that idea with including all the apps into main application dependencies is not so good. And you are right!
Elixir language doesn’t have enough constructions for proper encapsulation. There are only modules and functions (public and private). If you add another project as a dependency all the modules will be available for you, so you can call any public function. And a naive implementation of Zillow data fitting in themain
application will look like:
defmodule Main.Zillow do
def rf_fit do
Datasets.Fetchers.zillow_data
|> Utils.PreProcessing.normalize_data
|> Models.Rf.fit_model
end
end
Where Datasets.Fetchers
, Utils.PreProcessing
and Models.Rf
are modules from different applications. This freedom of thoughtless using of modules from another application will couple your services and turn the system back into a monolith!
So, there are two sides. We still want to have all the parts of the project to be accessible during development and test. But we need somehow forbid cross-application coupling.
The only way to do so is creating conventions about which functions from one application may be used in another one. And the best way is extracting all “public” functions into separate “interfaces” modules.
Interface modules
The idea is to move all the “public” application’s functions (functions that can be called by other applications) into separate modules. For example, datasets
application has special “interface” module for Fetchers
’ functions:
defmodule Datasets.Interfaces.Fetchers do
alias Datasets.Fetchers
defdelegate zillow_data, to: Fetchers
defdelegate landsat_data, to: Fetchers
end
In this simple implementation, the interface module just delegates function calls to the corresponding module. But, in the future, when we’ve decided to extract run datasets application on another node, this module will have the main part of communication logic.
Doing so with other application we can rewrite Main.Zillow module:
def rf_fit do
Datasets.Interfaces.Fetchers.zillow_data
|> Utils.Interfaces.PreProcessing.normalize_data
|> Models.Interfaces.Rf.fit_model
end
Generally speaking, the convention is: if you want call a function from another application you must do this through “interface” module.
This approach still allows easy development and testing but creates set of simple rules which protect the code from tight coupling and creates a basis for future scaling!
Scale to distributed system
Imagine that data processing become time-consuming so we decide to run models
on a separate node. So we need to remove {:models, in_umbrella: true}
dependency and run that application on another node.
If you run Elixir console (iex -S mix
) from the main
application folder you won’t have access to models
application modules anymore:
iex(1)> Models.Interfaces.Rf.fit_model(“data”)
** (UndefinedFunctionError) function Models.Interfaces.Rf.fit_model/1 is undefined (module Models.Interfaces.Rf is not available)
The code of models
application is still inside umbrella project but it is not run with the main
application so is not accessible. The models
modules and functions exist only on another node which runs this application only.
But, you know, BEAM VM designed for the distributed applications, so there are many ways to access the code run on an another machine.
:rpc
It is easy to run a function on remote node using Erlang :rpc
module. :rpc
uses Erlang Distribution Protocol for the communication between nodes.
One may reproduce simple experiment: run themain
project with --sname main
option in one terminal tab
iex --sname main -S mix
and models
project in another tab:
iex --sname models -S mix
Now you can run calculations:
iex(main@ip-192–168–1–150)1> :rpc.call(:”models@ip-192–168–1–150", Models.Interfaces.Rf, :fit_model, [“data”])%{__struct__: Models.Rf.Coefficient, a: 1, b: 2, data: “data”}
So what changes we need to make in our project to utilize this approach?
The idea is very simple, we need to add one more application to our project which implements communication logic — models_interface
.
models_interface/
config/
lib/
models_interface/
models_interface.ex
lm.ex
rf.ex
mix.ex
This is a very thin layer that helps main
to access the Models.Interface
functions. There a couple of small modules that just duplicate functions from Interfaces
modules:
defmodule ModelsInterface.Rf do
def fit_model(data) do
ModelsInterface.remote_call(Models.Interfaces.Rf, :fit_model, [data])
end
end
This module just calls Models.Interfaces.Rf.fit_model/1
function. The implementation of remote_call
is in ModelsInterface
module:
defmodule ModelsInterface do
def remote_call(module, fun, args, env \\ Mix.env) do
do_remote_call({module, fun, args}, env)
end
def remote_node do
Application.get_env(:models_interface, :node)
end
defp do_remote_call({module, fun, args}, :test) do
apply(module, fun, args)
end
defp do_remote_call({module, fun, args}, _) do
:rpc.call(remote_node(), module, fun, args)
end
end
The module gets node location from the configuration and does remote procedure call. You might see environment specific implementation of do_remote_call
, this allows to simplify testing process, we will discuss this later.
The next quick refactoring: just replace Models.Interfaces
with ModelsInterface
and we are done! Just don’t forget add models_interface
application to the dependencies of main
application.
defp deps do
[
{:datasets, in_umbrella: true},
{:models, in_umbrella: true, only: [:test]},
{:models_interface, in_umbrella: true},
{:utils, in_umbrella: true},
{:espec, "1.4.6", only: :test}
]
end
Again, I left models
dependency, but only in test environment. This allows making a direct calls to the application in test environment.
That’s it. No we are able to access models
via iex
console:
iex(main@ip-192–168–1–150)1> ModelsInterface.Rf.fit_model(“data”)%{__struct__: Models.Rf.Coefficient, a: 1, b: 2, data: “data”}
Let’s summarize! The only change we did is a new simple interfacing application. We still have all the code in one place and we still have all the tests passed!
Distributed tasks
Direct remote procedure calls are useful if you need a simple synchronous interface with another application. But if you want to effectively run asynchronous code on the remote node you’d better choose Distributed tasks.
Elixir has a specific Task.Supervisor
which can be used to dynamically supervise tasks. This supervisor will start inside the remote application and supervise tasks that execute code. Let’s use Distributed tasks for accessing datasets
application!
First of all, we need to add Task.Supervisor to children of datasets
application supervisor:
defmodule Datasets.Application do
@moduledoc false
use Application
import Supervisor.Spec
def start(_type, _args) do
children = [
supervisor(Task.Supervisor,
[[name: Datasets.Task.Supervisor]],
[restart: :temporary, shutdown: 10000])
]
opts = [strategy: :one_for_one, name: Datasets.Supervisor]
Supervisor.start_link(children, opts)
end
end
The DatasetsInterface
module (which is the separate interfacing application):
defmodule DatasetsInterface do
def spawn_task(module, fun, args, env \\ Mix.env) do
do_spawn_task({module, fun, args}, env)
end
defp do_spawn_task({module, fun, args}, :test) do
apply(module, fun, args)
end
defp do_spawn_task({module, fun, args}, _) do
Task.Supervisor.async(remote_supervisor(), module, fun, args)
|> Task.await
end
defp remote_supervisor do
{
Application.get_env(:datasets_interface, :task_supervisor),
Application.get_env(:datasets_interface, :node)
}
end
end
So we use async/await pattern here. The difference is: tasks are spawned on the remote node and are supervised by remote supervisor. The name and location of the supervisor are set in the configuration file:
config :datasets_interface,
task_supervisor: Datasets.Task.Supervisor,
node: :"models@ip-192-168-1-150"
And, again, there is the same trick with test environment!
Other protocols
RPC and Distributed tasks are built-in Erlang/Elixir abstractions that allow communicate using Elixir term without any additional serialization and deserialization. But if need to communicate with applications that are not written in Elixir you need more common approach such as HTTP protocol.
As an example, let’s implement simple HTTP interface for our utils
application. Again, the first thing we need is a new utils_interface
application:
UtilsInterface
module has the similar structure with ModelsInterface
but the do_remote_call/2 looks like:
defp do_remote_call({module, fun, args}, _) do
{:ok, resp} = HTTPoison.post(remote_url(),
serialize({module, fun, args}))
deserialize(resp.body)
end
For this example I’ve used simple Erlang term_to_binary
and binary_to_term
serialization:
defp serialize(term), do: :erlang.term_to_binary(term)
defp deserialize(data), do: :erlang.binary_to_term(data)
The utils
project needs HTTP server to listen to external requests. I’ve used cowboy
with plug
for this
defp deps do
[
{:cowboy, "~> 1.0.0"},
{:plug, "~> 1.0"},
{:espec, "1.4.6", only: :test}
]
end
The plug module which is responsible for handling requests:
defmodule Utils.Interfaces.Plug do
use Plug.Router
plug :match
plug :dispatch
post "/remote" do
{:ok, body, conn} = Plug.Conn.read_body(conn)
{module, fun, args} = deserialize(body)
result = apply(module, fun, args)
send_resp(conn, 200, serialize(result))
end
end
It just deserializes {module, fun, args}
tuple, does function call and sends a result back to the client.
And, don’t forget to start the “plug” via cowboy
server in utils
application
children = [
Plug.Adapters.Cowboy.child_spec(:http,
Utils.Interfaces.Plug, [], [port: 4001])
]
Please note, that it is not a good practice to call functions directly from deserialized data. I did it only to simplify the example. In the real world, you need more sophisticated approach!
Limiting concurrency with poolboy
The last feature I wanna describe in the post allows you to protect your application and its resources from “overflowing”. Imagine, for example, that models
application use quite a lot of memory for model fitting. So we want to limit the number of clients that want to access models
application. To do this we will create a limited pool of worker processes on the interface level using the poolboy
library.
poolboy
needs to be started byapplication supervisor:
defmodule Models.Application do
use Application
def start(_type, _args) do
pool_options = [
name: {:local, Models.Interface},
worker_module: Models.Interfaces.Worker,
size: 5, max_overflow: 5]
children = [
:poolboy.child_spec(Models.Interface, pool_options, []),
]
opts = [strategy: :one_for_one, name: Models.Supervisor]
Supervisor.start_link(children, opts)
end
end
You may see poolboy
options here: name of supervisor, worker module, size of a pool, and max_overflow.
The worker module is a simple GenServer which just calls corresponding function:
defmodule Models.Interfaces.Worker do
use GenServer
def start_link(_opts) do
GenServer.start_link(__MODULE__, :ok, [])
end
def init(:ok), do: {:ok, %{}}
def handle_call({module, fun, args}, _from, state) do
result = apply(module, fun, args)
{:reply, result, state}
end
end
And the last change is in Models.Interfaces.Rf
module. Instead of function delegation, it will spawn worker process inside pool:
defmodule Models.Interfaces.Rf do
def fit_model(data) do
with_poolboy({Models.Rf, :fit_model, [data]})
end
def with_poolboy(args) do
worker = :poolboy.checkout(Models.Interface)
result = GenServer.call(worker, args, :infinity)
:poolboy.checkin(Models.Interface, worker)
result
end
end
That’s it! Now you are absolutely sure that models
application can handle the only limited number of requests.
Conclusion
As a conclusion I wanna give you some recommendations:
- Start with microservices from the very beginning. It is very easy to do with Elixir umbrella project.
- Use “context” and “implementation” modules to organize logic inside an application.
- Think carefully about application’s interfaces. Do not allow direct calls to implementation functions between applications.
- When scaling to distributed system, place “communication” logic into the separate application. Use Erlang Distribution Protocol for communication between BEAM applications
I hope, approaches and abstractions described in the article will help you to write better code with Elixir!
Hit the 👏 if you enjoyed the article and do not hesitate to contact me if you have questions or proposals!
Have a wonderful week,
Anton