Designing a scalable application with Elixir: from umbrella project to distributed system

Published in

Matic

11 min readDec 20, 2017

Elixir/Erlang OTP abstractions enforce developers to split programs into independent parts. While “gen_servers” encapsulate parts of business logic on micro-level, “applications” present a more general (“service”) part of the system. Complex programs written in Elixir are always a collection of communicating OTP applications.

The main question appeared while developing such programs is how to split the complex system into separate parts. But the more important problem is how to organize communication between them.

In the article, I would share design principles I follow when creating more or less complex Elixir project. We will discuss how to split the project into small maintainable microservices (Elixir applications) and how to organize modules inside them using “contexts”.

But the main focus will be on designing of flexible interfaces between Elixir applications. You will see how they may be changed while scaling from simple umbrella project to distributed system. I’ll cover a few approaches: Erlang Remote procedure call, Distributed tasks, and HTTP protocol. And, as a bonus, I’ll show how one can limit concurrent access to microservices.

Umbrella project

With Elixir “umbrella project” one can split the complex logic into separate parts at the very beginning of development process. But at the same, time it allows keeping all the logic in one repo. So you can start to develop future microservices with a minimal headache.

I’ve prepared a scaffold demo project in order to demonstrate real code examples. The name of the project is “ml_tools” which states for “Machine Learning Tools”. The project allows users to apply different predictive models to their datasets and to choose the best one. Users should be able to apply different algorithms to their datasets and visualize the results.

The division of the project into several applications is quite obvious from the requirements:

datasets — application responsible for managing data: create, read and updates datasets.
utils — a set of different utility services that preprocess and visualize data.
models — an service that implements different algorithms for predictive modeling. “Linear model”, “random forest”, “support vector machine”, etc.
main — top-level application that utilizes other applications and exposes top-level API.

Each application is started under its own supervisor, so, acts as independent service.

— — project structure — —

apps/
  datasets/
    lib/
      datasets/
        fetchers/
          fetchers.ex
          aws.ex
          kaggle.ex
        collections/
          ...
        interfaces/
          fetchers.ex 
          collections.ex
  models/
  utils/
  main/
...

Having divided top-level responsibility into several parts now let’s explore each service in details. Inside every application, we need to split the code into modules or sets of modules. I prefer to define high-level modules based on contexts that are present in a specific application.

For example, datasets application is responsible for storing collections of data in its own database and also for fetching data from different sources. So the application will have two folders in lib/datasets directory: “collections” and “fetchers”. Each folder has .ex file with the same name that contains a module which implements context interface and other utility modules.

Take a look at lib/datasets/fetchers . The folder has Datasets.Fetchers module which implements an interface for “fetchers” context — functions which return data from “AWS Public Datasets” and “Kaggle Datasets”. So, besides this module the are Datasets.Fetchers.Aws and Datasets.Fetchers.Kaggle that will implement access to the specific source.

The same context related division may be implemented in other applications. models are split by a specific algorithm: Models.Lm (Linear model) or Models.Rf (Random Forest). utils implements data pre-processing (Utils.PreProcessing) and visualization (Utils.Visualization).

And, there is, of course, a top-level (Main) application that utilizes all the microserviсes. This application also has several contexts: Main.Zillow module for Zillow competition related code, and Main.Screening module for Passenger Screening Algorithm Challenge.

The Main application has other application as dependencies in Main.Mixfile:

defp deps do
  [
    {:datasets, in_umbrella: true},
    {:models, in_umbrella: true},
    {:utils, in_umbrella: true}
  ]
end

Which makes the modules from different applications are available inside main application.

So, in general, there are three levels of code organization in Elixir project:

“Service level” — the most obvious way to split the complex system into separate Elixir applications (datasets, models, utils).
“Context level” — breaks responsibility inside particular service by implementing “context modules” (Datasets.Fetchers, Datasets.Collections).
“Implementation level” — particular modules that define data-structures and functions (Datasets.Fetchers.Aws, Datasets.Fetchers.Kaggle)

Umbrella project pros and cons

As mentioned above the main advantage of using “umbrella project” is that you have all the code in one place and can run it together in the development and test environment. You may play around with the whole system and, most important, write integration tests that will test components altogether. This is very important at the early stage of project development!

At the same time, you project is already split into relatively independent parts and ready for scaling.

Compare this with an approach in many other programming languages where you usually start from monolith project and then try to extract some parts to separate application. Because starting from micro-service approach tremendously complicates the development process.

But it’s time to start worrying about encapsulation!

You may have noticed that idea with including all the apps into main application dependencies is not so good. And you are right!

Elixir language doesn’t have enough constructions for proper encapsulation. There are only modules and functions (public and private). If you add another project as a dependency all the modules will be available for you, so you can call any public function. And a naive implementation of Zillow data fitting in themain application will look like:

defmodule Main.Zillow do
  def rf_fit do
    Datasets.Fetchers.zillow_data
    |> Utils.PreProcessing.normalize_data
    |> Models.Rf.fit_model
  end
end

Where Datasets.Fetchers, Utils.PreProcessing and Models.Rf are modules from different applications. This freedom of thoughtless using of modules from another application will couple your services and turn the system back into a monolith!

So, there are two sides. We still want to have all the parts of the project to be accessible during development and test. But we need somehow forbid cross-application coupling.

The only way to do so is creating conventions about which functions from one application may be used in another one. And the best way is extracting all “public” functions into separate “interfaces” modules.

Interface modules

The idea is to move all the “public” application’s functions (functions that can be called by other applications) into separate modules. For example, datasets application has special “interface” module for Fetchers’ functions:

defmodule Datasets.Interfaces.Fetchers do
  alias Datasets.Fetchers

  defdelegate zillow_data, to: Fetchers
  defdelegate landsat_data, to: Fetchers
end

In this simple implementation, the interface module just delegates function calls to the corresponding module. But, in the future, when we’ve decided to extract run datasets application on another node, this module will have the main part of communication logic.

Doing so with other application we can rewrite Main.Zillow module:

def rf_fit do
  Datasets.Interfaces.Fetchers.zillow_data
  |> Utils.Interfaces.PreProcessing.normalize_data
  |> Models.Interfaces.Rf.fit_model
end

Generally speaking, the convention is: if you want call a function from another application you must do this through “interface” module.

This approach still allows easy development and testing but creates set of simple rules which protect the code from tight coupling and creates a basis for future scaling!

Scale to distributed system

Imagine that data processing become time-consuming so we decide to run models on a separate node. So we need to remove {:models, in_umbrella: true} dependency and run that application on another node.

If you run Elixir console (iex -S mix) from the main application folder you won’t have access to models application modules anymore:

iex(1)> Models.Interfaces.Rf.fit_model(“data”)
** (UndefinedFunctionError) function Models.Interfaces.Rf.fit_model/1 is undefined (module Models.Interfaces.Rf is not available)

The code of models application is still inside umbrella project but it is not run with the main application so is not accessible. The models modules and functions exist only on another node which runs this application only.

But, you know, BEAM VM designed for the distributed applications, so there are many ways to access the code run on an another machine.

:rpc

It is easy to run a function on remote node using Erlang :rpc module. :rpc uses Erlang Distribution Protocol for the communication between nodes.

One may reproduce simple experiment: run themain project with --sname main option in one terminal tab

iex --sname main -S mix

and models project in another tab:

iex --sname models -S mix

Now you can run calculations:

iex(main@ip-192–168–1–150)1> :rpc.call(:”models@ip-192–168–1–150", Models.Interfaces.Rf, :fit_model, [“data”])%{__struct__: Models.Rf.Coefficient, a: 1, b: 2, data: “data”}

So what changes we need to make in our project to utilize this approach?

The idea is very simple, we need to add one more application to our project which implements communication logic — models_interface.

models_interface/ 
  config/
  lib/
    models_interface/
      models_interface.ex
        lm.ex
        rf.ex
    mix.ex

This is a very thin layer that helps main to access the Models.Interface functions. There a couple of small modules that just duplicate functions from Interfaces modules:

defmodule ModelsInterface.Rf do
  def fit_model(data) do
    ModelsInterface.remote_call(Models.Interfaces.Rf, :fit_model, [data])
  end
end

This module just calls Models.Interfaces.Rf.fit_model/1 function. The implementation of remote_call is in ModelsInterface module:

defmodule ModelsInterface do
  def remote_call(module, fun, args, env \\ Mix.env) do
    do_remote_call({module, fun, args}, env)
  end

  def remote_node do
    Application.get_env(:models_interface, :node)
  end

  defp do_remote_call({module, fun, args}, :test) do
    apply(module, fun, args)
  end
  
  defp do_remote_call({module, fun, args}, _) do
    :rpc.call(remote_node(), module, fun, args)
  end
end

The module gets node location from the configuration and does remote procedure call. You might see environment specific implementation of do_remote_call , this allows to simplify testing process, we will discuss this later.

The next quick refactoring: just replace Models.Interfaces with ModelsInterface and we are done! Just don’t forget add models_interface application to the dependencies of main application.

defp deps do
  [
    {:datasets, in_umbrella: true},
    {:models, in_umbrella: true, only: [:test]},
    {:models_interface, in_umbrella: true},
    {:utils, in_umbrella: true},
    {:espec, "1.4.6", only: :test}
  ]
end

Again, I left models dependency, but only in test environment. This allows making a direct calls to the application in test environment.

That’s it. No we are able to access models via iex console:

iex(main@ip-192–168–1–150)1> ModelsInterface.Rf.fit_model(“data”)%{__struct__: Models.Rf.Coefficient, a: 1, b: 2, data: “data”}

Let’s summarize! The only change we did is a new simple interfacing application. We still have all the code in one place and we still have all the tests passed!

Distributed tasks

Direct remote procedure calls are useful if you need a simple synchronous interface with another application. But if you want to effectively run asynchronous code on the remote node you’d better choose Distributed tasks.

Elixir has a specific Task.Supervisor which can be used to dynamically supervise tasks. This supervisor will start inside the remote application and supervise tasks that execute code. Let’s use Distributed tasks for accessing datasets application!

First of all, we need to add Task.Supervisor to children of datasets application supervisor:

defmodule Datasets.Application do
  @moduledoc false

  use Application
  import Supervisor.Spec

  def start(_type, _args) do
    children = [
      supervisor(Task.Supervisor, 
        [[name: Datasets.Task.Supervisor]],
        [restart: :temporary, shutdown: 10000])
    ]

    opts = [strategy: :one_for_one, name: Datasets.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

The DatasetsInterface module (which is the separate interfacing application):

defmodule DatasetsInterface do
  def spawn_task(module, fun, args, env \\ Mix.env) do
    do_spawn_task({module, fun, args}, env)
  end

  defp do_spawn_task({module, fun, args}, :test) do
    apply(module, fun, args)
  end

  defp do_spawn_task({module, fun, args}, _) do
    Task.Supervisor.async(remote_supervisor(), module, fun, args)
    |> Task.await
  end

  defp remote_supervisor do
    {
      Application.get_env(:datasets_interface, :task_supervisor),
      Application.get_env(:datasets_interface, :node)
    }
  end
end

So we use async/await pattern here. The difference is: tasks are spawned on the remote node and are supervised by remote supervisor. The name and location of the supervisor are set in the configuration file:

config :datasets_interface,
       task_supervisor: Datasets.Task.Supervisor,
       node: :"models@ip-192-168-1-150"

And, again, there is the same trick with test environment!

Other protocols

RPC and Distributed tasks are built-in Erlang/Elixir abstractions that allow communicate using Elixir term without any additional serialization and deserialization. But if need to communicate with applications that are not written in Elixir you need more common approach such as HTTP protocol.

As an example, let’s implement simple HTTP interface for our utils application. Again, the first thing we need is a new utils_interface application:

UtilsInterface module has the similar structure with ModelsInterface but the do_remote_call/2 looks like:

defp do_remote_call({module, fun, args}, _) do
  {:ok, resp} = HTTPoison.post(remote_url(),
                               serialize({module, fun, args}))
  deserialize(resp.body)
end

For this example I’ve used simple Erlang term_to_binary and binary_to_term serialization:

defp serialize(term), do: :erlang.term_to_binary(term)
defp deserialize(data), do: :erlang.binary_to_term(data)

The utils project needs HTTP server to listen to external requests. I’ve used cowboy with plug for this

defp deps do
  [
    {:cowboy, "~> 1.0.0"},
    {:plug, "~> 1.0"},
    {:espec, "1.4.6", only: :test}
  ]
end

The plug module which is responsible for handling requests:

defmodule Utils.Interfaces.Plug do
  use Plug.Router

  plug :match
  plug :dispatch

  post "/remote" do
    {:ok, body, conn} = Plug.Conn.read_body(conn)
    {module, fun, args} = deserialize(body)
    result = apply(module, fun, args)
    send_resp(conn, 200, serialize(result))
  end
end

It just deserializes {module, fun, args} tuple, does function call and sends a result back to the client.

And, don’t forget to start the “plug” via cowboy server in utils application

children = [
  Plug.Adapters.Cowboy.child_spec(:http,  
       Utils.Interfaces.Plug, [], [port: 4001])
]

Please note, that it is not a good practice to call functions directly from deserialized data. I did it only to simplify the example. In the real world, you need more sophisticated approach!

Limiting concurrency with poolboy

The last feature I wanna describe in the post allows you to protect your application and its resources from “overflowing”. Imagine, for example, that models application use quite a lot of memory for model fitting. So we want to limit the number of clients that want to access models application. To do this we will create a limited pool of worker processes on the interface level using the poolboy library.

poolboy needs to be started byapplication supervisor:

defmodule Models.Application do
  use Application

  def start(_type, _args) do
    pool_options = [
      name: {:local, Models.Interface},
      worker_module: Models.Interfaces.Worker,
      size: 5, max_overflow: 5]

    children = [
      :poolboy.child_spec(Models.Interface, pool_options, []),
    ]

    opts = [strategy: :one_for_one, name: Models.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

You may see poolboy options here: name of supervisor, worker module, size of a pool, and max_overflow.

The worker module is a simple GenServer which just calls corresponding function:

defmodule Models.Interfaces.Worker do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, :ok, [])
  end

  def init(:ok), do: {:ok, %{}}

  def handle_call({module, fun, args}, _from, state) do
    result = apply(module, fun, args)
    {:reply, result, state}
  end
end

And the last change is in Models.Interfaces.Rf module. Instead of function delegation, it will spawn worker process inside pool:

defmodule Models.Interfaces.Rf do
  def fit_model(data) do
    with_poolboy({Models.Rf, :fit_model, [data]})
  end

  def with_poolboy(args) do
    worker = :poolboy.checkout(Models.Interface)
    result = GenServer.call(worker, args, :infinity)
    :poolboy.checkin(Models.Interface, worker)
    result
  end
end

That’s it! Now you are absolutely sure that models application can handle the only limited number of requests.

Conclusion

As a conclusion I wanna give you some recommendations:

Start with microservices from the very beginning. It is very easy to do with Elixir umbrella project.
Use “context” and “implementation” modules to organize logic inside an application.
Think carefully about application’s interfaces. Do not allow direct calls to implementation functions between applications.
When scaling to distributed system, place “communication” logic into the separate application. Use Erlang Distribution Protocol for communication between BEAM applications

I hope, approaches and abstractions described in the article will help you to write better code with Elixir!

Hit the 👏 if you enjoyed the article and do not hesitate to contact me if you have questions or proposals!

Have a wonderful week,
Anton