A Guide to Process-oriented Programming in Elixir and OTP

People like to categorize programming languages into paradigms. There are object-oriented (OO) languages, imperative languages, functional languages, etc. This can be helpful in figuring out which languages solve similar problems, and what types of problems a language is intended to solve.

In each case a paradigm generally has one “main” focus and technique that is the driving force for that family of languages:

  • In OO languages, it is the class or object as a way to encapsulate state (data) with manipulation of that state (methods).
  • In functional languages, it can be the manipulation of functions themselves or the immutable data passed from function to function.

While Elixir (and Erlang before it) are often categorized as functional languages because they exhibit the immutable data common to functional languages, I would submit they represent a separate paradigm from many functional languages. They exist and are adopted because of the existence of OTP, and so I would categorize them as process-oriented languages.

In this post, we will capture the meaning of what process-oriented programming is when using these languages, explore the differences and similarities to other paradigms, see the implications for both training and adoption, and end with a short process-oriented programming example.

What Is Process-oriented Programming?

Let’s start with a definition: Process-oriented programming is a paradigm based on Communicating Sequential Processes (CSP), originally from a paper by Tony Hoare published in 1978. It is closely related to what is popularly called the actor model of concurrency. Other languages with some relation to this original work include Occam, Limbo, and Go. The formal paper deals only with synchronous communication; most actor models (including OTP) use asynchronous communication as well. It is always possible to build synchronous communication on top of asynchronous communication, and OTP supports both forms.
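
To make the last point concrete, here is a minimal sketch of a synchronous call built on top of asynchronous message passing, using only spawn, send, and receive. The module name SyncOverAsync and the message shapes are assumptions made for illustration; they are not part of OTP or of the example project.

```elixir
defmodule SyncOverAsync do
  # Hypothetical module, for illustration only.

  # Start a process that answers {:call, from, ref, request} messages.
  def start_echo do
    spawn(fn -> loop() end)
  end

  defp loop do
    receive do
      {:call, from, ref, request} ->
        # The reply is itself an asynchronous message, tagged with the
        # caller's unique reference so it can be matched to the request.
        send(from, {:reply, ref, {:echo, request}})
        loop()
    end
  end

  # A synchronous call: send asynchronously, then block until the
  # matching reply arrives or the timeout expires.
  def call(pid, request, timeout \\ 1_000) do
    ref = make_ref()
    send(pid, {:call, self(), ref, request})

    receive do
      {:reply, ^ref, response} -> {:ok, response}
    after
      timeout -> {:error, :timeout}
    end
  end
end
```

In OTP itself, GenServer.call/3 plays the role of the synchronous form and GenServer.cast/2 the asynchronous one, so hand-written code like this is rarely needed in practice.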

Building on this history, OTP created a system for fault-tolerant computing based on communicating sequential processes. The fault-tolerance facilities come from a “let it fail” approach with solid error recovery in the form of supervisors, and from the distributed processing enabled by the actor model. “Let it fail” can be contrasted with “prevent it from failing”: the former is far easier to accommodate and has proven in OTP to be far more reliable than the latter. The reason is that the programming effort required to prevent every failure (as seen in the Java checked exception model) is much more involved and demanding.
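
As a rough illustration of “let it fail,” the sketch below puts a small GenServer under a supervisor and crashes it on purpose. The module CrashyCounter and its functions are hypothetical names chosen for this sketch; only GenServer and Supervisor come from OTP.

```elixir
defmodule CrashyCounter do
  # Hypothetical counter process used to demonstrate supervised restarts.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)

  def increment, do: GenServer.call(__MODULE__, :increment)

  # Deliberately crash the process; no defensive code is written here.
  def crash, do: GenServer.cast(__MODULE__, :crash)

  # Start the counter under a supervisor that restarts it when it fails.
  def start_under_supervisor do
    Supervisor.start_link([__MODULE__], strategy: :one_for_one)
  end

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}

  @impl true
  def handle_cast(:crash, _count), do: raise("boom")
end
```

After CrashyCounter.crash(), the supervisor starts a fresh process under the same name, and the count begins again from zero. The recovery logic lives in the supervisor, not in the counter.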

So, process-oriented programming can be defined as a paradigm in which the process structure and communication between processes of a system are the primary concerns.

Object-oriented vs. Process-oriented Programming

In object-oriented programming, the static structure of data and function is the primary concern. What methods are required to manipulate the enclosed data, and what should the connections between objects or classes be? Thus, the class diagram of UML is a prime example of this focus, as seen in Figure 1.

Figure 1: Sample UML class diagram

A common criticism of object-oriented programming is that there is no visible control flow. Because systems are composed from a large number of classes/objects defined separately, it can be difficult for a less experienced person to visualize the control flow of a system. This is especially true for systems that rely heavily on inheritance, use abstract interfaces, or have no strong typing. In most cases, the developer must memorize a large amount of the system structure to be effective (which classes have which methods, and which are used in what ways).

The strength of the object-oriented development approach is that the system can be extended to support new types of objects with limited impact on existing code, so long as the new object types conform to the expectations of the existing code.

Functional vs. Process-oriented Programming

Many functional programming languages do address concurrency in various ways, but their primary focus is immutable data passing between functions, or the creation of functions from other functions (higher order functions that generate functions). For the most part, the focus of the language is still a single address space or executable, and communications between such executables are handled in an operating system specific manner.

For example, Scala is a functional language built on the Java Virtual Machine. While it can access Java facilities for communication, communication is not an inherent part of the language. Scala is commonly used for Spark programming, but Spark is again a library used in conjunction with the language.

A strength of the functional paradigm is the ability to visualize the control flow of a system given the top-level function. The control flow is explicit in that each function calls other functions and passes all the data from one to the next. In the pure functional paradigm there are no side effects, which makes problem determination easier. The challenge with pure functional systems is that “side effects” are required to have persistent state. In well-architected systems, the persisting of state is handled at the top level of the control flow, allowing most of the system to be free of side effects.

Elixir/OTP and Process-oriented Programming

In Elixir/Erlang and OTP, the communication primitives are part of the virtual machine that executes the language. The ability to communicate between processes and between machines is built in and central to the language system. This emphasizes the importance of communication in this paradigm and in these language systems.

While the Elixir language is predominantly functional in terms of the logic expressed in the language, its use is process oriented.

What Does It Mean to Be Process-oriented?

To be process-oriented as defined in this post is to design a system first in terms of what processes exist and how they communicate. The main questions are: which processes are static and which are dynamic, which are spawned on demand to serve requests and which serve a long-running purpose, which hold shared state or part of the shared state of the system, and which features of the system are inherently concurrent. Just as OO has types of objects, and functional has types of functions, process-oriented programming has types of processes.
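
The distinction between static and dynamic processes can be sketched with standard OTP building blocks. In the example below, one long-running Agent holds shared state, while short-lived Task processes are started on demand under a DynamicSupervisor. The module ProcessTypesDemo and the names it registers are assumptions made for illustration.

```elixir
defmodule ProcessTypesDemo do
  # Hypothetical module showing two process types side by side.
  @totals __MODULE__.Totals
  @workers __MODULE__.Workers

  def start do
    children = [
      # Static, long-running process that holds shared state.
      %{id: @totals, start: {Agent, :start_link, [fn -> %{} end, [name: @totals]]}},
      # Supervisor for dynamic processes that are spawned per request.
      {DynamicSupervisor, name: @workers, strategy: :one_for_one}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  # Spawn a short-lived worker that updates the shared state and exits.
  def record(key) do
    work = fn ->
      Agent.update(@totals, fn totals -> Map.update(totals, key, 1, &(&1 + 1)) end)
    end

    DynamicSupervisor.start_child(@workers, {Task, work})
  end

  # Read the current shared state from the static process.
  def totals, do: Agent.get(@totals, & &1)
end
```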

As such, a process-oriented design is the identification of the set of process types required to solve a problem or address a need.

The aspect of time enters quickly into the design and requirements efforts. What is the lifecycle of the system? What custom needs are occasional and which are constant? Where is the load in the system and what is the expected velocity and volume? It is only after these types of considerations are understood that a process-oriented design begins to define the function of each process or the logic to be executed.

Training Implications

The implication of this categorization for training is that training should begin not with language syntax or “Hello World” examples, but with systems engineering thinking and a design focus on process allocation.

The coding concerns are secondary to the process design and allocation which are best addressed at a higher level, and involve cross-functional thinking about lifecycle, QA, DevOps, and customer business requirements. Any training course in Elixir or Erlang must (and generally does) include OTP, and should have a process orientation from the beginning, not as the “Now you can code in Elixir, so let’s do concurrency” type approach.

Adoption Implications

The implication for adoption is that the language and system are better applied to problems that require communication and/or distribution of computing. Problems that involve a single workload on a single computer are less interesting in this space, and may be better addressed with another language. Long-lived continuous processing systems are a prime target for this language because it has fault tolerance built in from the ground up.

For documentation and design work, it can be very helpful to use a graphical notation (like Figure 1 for OO languages). The suggestion for Elixir and process-oriented programming from UML would be the sequence diagram (example in Figure 2) to show temporal relationships between processes and identify which processes are involved in servicing a request. There is no UML diagram type for capturing lifecycle and process structure, but it could be represented with a simple box-and-arrow diagram of process types and their relationships. For example, Figure 3:

Figure 2: Sample UML sequence diagram

Figure 3: Sample process structure diagram

An Example of Process Orientation

Finally, we will walk through a short example of applying process orientation to a problem. Suppose we are tasked with providing a system that supports global elections. This problem was chosen because many individual activities are performed in bursts, but the aggregation or summarization of the results is desirable in real time and might see significant load.

Initial Process Design and Allocation

We can initially see that the casting of votes by each individual is a burst of traffic to the system from many discrete inputs, is not time ordered, and can have high load. To support this activity, we would want a large number of processes all collecting these inputs and forwarding them to a more central process for tabulation. These processes could be located near the populations in each country that would be generating votes, and thus provide low latency. They would retain local results, log their inputs immediately, and forward them for tabulation in batches to reduce bandwidth and overhead.
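
A collector of this kind could be sketched as a GenServer that buffers votes and forwards them in batches. This is an illustrative sketch, not the code from the example repository: the module BatchingCollector, its options, and the {:vote_batch, votes} message are all assumptions, and durable logging is only indicated by a comment.

```elixir
defmodule BatchingCollector do
  # Hypothetical vote collector that forwards votes in batches.
  use GenServer

  # Options: :forward_to (pid that receives batches), :batch_size.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def cast_vote(collector, jurisdiction, candidate),
    do: GenServer.call(collector, {:vote, jurisdiction, candidate})

  # Force any buffered votes to be forwarded now.
  def flush(collector), do: GenServer.call(collector, :flush)

  @impl true
  def init(opts) do
    {:ok,
     %{
       forward_to: Keyword.fetch!(opts, :forward_to),
       batch_size: Keyword.get(opts, :batch_size, 100),
       pending: []
     }}
  end

  @impl true
  def handle_call({:vote, jurisdiction, candidate}, _from, state) do
    # A real system would write the vote to a durable log here,
    # before acknowledging it to the caller.
    state = %{state | pending: [{jurisdiction, candidate} | state.pending]}
    state = if length(state.pending) >= state.batch_size, do: forward(state), else: state
    {:reply, :ok, state}
  end

  def handle_call(:flush, _from, state), do: {:reply, :ok, forward(state)}

  # Send the buffered votes, oldest first, as a single message.
  defp forward(%{pending: []} = state), do: state

  defp forward(state) do
    send(state.forward_to, {:vote_batch, Enum.reverse(state.pending)})
    %{state | pending: []}
  end
end
```

A production collector would likely also flush on a timer, so that a slow trickle of votes is not held back indefinitely.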

Next, we can see that there will need to be processes that track the votes in each jurisdiction in which results must be presented. Let’s assume for this example that we need to track results for each country, and within each country by province/state. To support this activity, we would want at least one process per country performing the computation and retaining the current totals, and another set for each state/province in each country. This assumes we need to be able to answer totals for country and state/province in real time or with low latency. If the results can be obtained from a database system, we might choose a different process allocation where totals are updated by transient processes. The advantage of using dedicated processes for these computations is that the results occur at the speed of memory and can be obtained with low latency.

Finally, we can see that lots and lots of people will be viewing the results. These processes can be partitioned in many ways. We may want to distribute the load by placing processes in each country responsible for that country’s results. The processes could cache the results from the computation processes to reduce query load on the computation processes, and/or the computation processes could push their results to the proper results processes on a periodic basis, when results change by a significant amount, or upon the computation process becoming idle indicating a slowed rate of change.

In all three process types, we can scale the processes independently of each other, distribute them geographically, and ensure results are never lost through active acknowledgement of data transfers between processes.

As discussed, we have begun the example with a process design independent of the business logic in each process. Where the business logic has specific requirements for data aggregation or geography, those requirements can feed back into the process allocation iteratively. Our process design so far is shown in Figure 4.

Figure 4: Initial process design

The use of separate processes to receive votes allows each vote to be received independent of any other vote, logged upon receipt, and batched to the next set of processes, reducing load on those systems significantly. For a system that consumes a large amount of data, reducing the volume of data by use of layers of processes is a common and useful pattern.

By performing the computation in an isolated set of processes, we can manage the load on those processes and ensure their stability and resource requirements.

By placing the result presentation in an isolated set of processes, we both control load to the rest of the system and allow the set of processes to be scaled dynamically for load.

Additional Requirements

Now, let’s add some complicating requirements. Let’s suppose that in each jurisdiction (country or state), the tabulation of votes can result in a proportional result, a winner-takes-all result, or no result if insufficient votes are cast relative to the population of that jurisdiction. Each jurisdiction has control over these aspects. With this change, the results of countries are not a simple aggregation of the raw vote results, but are an aggregation of the state/province results. This changes the process allocation from the original to require that results from the state/province processes feed into the country processes. If the protocol used between the vote collection and state/province processes is the same as the one used between the state/province and country processes, then the aggregation logic can be reused, but distinct processes holding the results are needed and their communication paths are different, as shown in Figure 5.
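
The three outcomes described above could be expressed as a pure function that an aggregator applies before publishing its result. The sketch below is one possible reading, with loud assumptions: the module JurisdictionPolicy, the min_turnout parameter, and the interpretation of winner-takes-all (the leading candidate is credited with every vote cast in the jurisdiction) are not taken from the example repository, and ties are not handled.

```elixir
defmodule JurisdictionPolicy do
  # Hypothetical policy module. `votes` is a map of candidate => count.
  # `min_turnout` is the assumed minimum fraction of the population
  # that must vote for any result to be reported.
  def result(votes, population, policy, min_turnout \\ 0.5)
      when policy in [:proportional, :winner_takes_all] do
    total = votes |> Map.values() |> Enum.sum()

    cond do
      # Too few votes relative to the population: no result.
      total == 0 or total < population * min_turnout ->
        :no_result

      # Proportional: pass the raw counts upward unchanged.
      policy == :proportional ->
        {:ok, votes}

      # Winner takes all: the leading candidate gets every vote cast.
      policy == :winner_takes_all ->
        {winner, _count} = Enum.max_by(votes, fn {_candidate, n} -> n end)
        {:ok, %{winner => total}}
    end
  end
end
```

For example, with 60 votes for one candidate and 40 for another in a jurisdiction of 150 people, the proportional policy reports the counts as they are, winner-takes-all reports 100 votes for the leader, and the same votes in a jurisdiction of 1,000 people yield no result.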

Figure 5: Modified process design

The Code

To complete the example, we will review an implementation of the example in Elixir/OTP. To simplify things, this example assumes a web server like Phoenix is used to process actual web requests, and those web services make requests to the processes identified above. This simplifies the example and keeps the focus on Elixir/OTP. In a production system, keeping these as separate processes has further advantages: it separates concerns, allows flexible deployment, distributes load, and reduces latency. The full source code with tests can be found at https://github.com/technomage/voting. The source is abbreviated in this post for readability. Each process below fits into an OTP supervision tree to ensure that processes are restarted on failure. See the source for more on this aspect of the example.
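
For readers unfamiliar with supervision trees, a minimal sketch of one is shown here. It is not the tree from the repository: the modules VotingSketch.Supervisor and VotingSketch.Registry are hypothetical, and the only name taken from this post is Voting.VoteTaskSupervisor, which the vote recorder below expects to exist. A Registry is included on the assumption that the via_tuple helpers used later resolve process names through one.

```elixir
defmodule VotingSketch.Supervisor do
  # Hypothetical top-level supervisor, for illustration only.
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    children = [
      # Name lookup for aggregator and presenter processes (assumed).
      {Registry, keys: :unique, name: VotingSketch.Registry},
      # Supervises the short-lived tasks that record individual votes.
      {Task.Supervisor, name: Voting.VoteTaskSupervisor}
    ]

    # Restart only the child that failed, leaving its siblings running.
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```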

Vote Recorder

This process receives votes, logs them to a persistent store, and batches the results to the aggregators. The module VoteRecorder uses Task.Supervisor to manage short-lived tasks to record each vote.

defmodule Voting.VoteRecorder do
  @moduledoc """
  This module receives votes and sends them to the proper
  aggregator. This module uses supervised tasks to ensure
  that any failure is recovered from and the vote is not
  lost.
  """

  @doc """
  Start a task to track the submittal of a vote to an
  aggregator. This is a supervised task to ensure
  completion.
  """
  def cast_vote where, who do
    Task.Supervisor.async_nolink(Voting.VoteTaskSupervisor,
      fn ->
        Voting.Aggregator.submit_vote where, who
      end)
    |> Task.await
  end
end

Vote Aggregator

This process aggregates votes within a jurisdiction, computes the result for that jurisdiction, and forwards vote summaries to the next higher process (a higher level jurisdiction, or a result presenter).

defmodule Voting.Aggregator do
  use GenStage
  ...

  @doc """
  Submit a single vote to an aggregator
  """
  def submit_vote id, candidate do
    pid = __MODULE__.via_tuple(id)
    :ok = GenStage.call pid, {:submit_vote, candidate}
  end

  @doc """
  Respond to requests
  """
  def handle_call {:submit_vote, candidate}, _from, state do
    n = state.votes[candidate] || 0
    state = %{state | votes: Map.put(state.votes, candidate, n+1)}
    {:reply, :ok, [%{state.id => state.votes}], state}
  end

  @doc """
  Handle events from subordinate aggregators
  """
  def handle_events events, _from, state do
    votes = Enum.reduce events, state.votes, fn e, votes ->
      Enum.reduce e, votes, fn {k,v}, votes ->
        Map.put(votes, k, v) # replace any entries for subordinates
      end
    end
    # Any jurisdiction specific policy would go here

    # Sum the votes by candidate for the published event
    merged = Enum.reduce votes, %{}, fn {j, jv}, votes ->
      # Each jurisdiction is summed for each candidate
      Enum.reduce jv, votes, fn {candidate, tot}, votes ->
        Logger.debug "@@@@ Votes in #{inspect j} for #{inspect candidate}: #{inspect tot}"
        n = votes[candidate] || 0
        Map.put(votes, candidate, n + tot)
      end
    end
    # Return the published event and the state which retains
    # votes by jurisdiction
    {:noreply, [%{state.id => merged}], %{state | votes: votes}}
  end
end
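
The two-step logic in handle_events (replace each subordinate’s latest counts, then sum by candidate) can be tried in isolation as plain functions. The sketch below restates that logic outside GenStage; the module MergeSketch and its function names are hypothetical, and Map.merge is used as a shorthand for the nested reductions above.

```elixir
defmodule MergeSketch do
  # Hypothetical helper module restating the aggregation logic.
  # `by_jurisdiction` maps jurisdiction => %{candidate => count}.

  # Step 1: each event replaces the stored counts for the
  # jurisdictions it mentions; other jurisdictions are kept.
  def merge_events(by_jurisdiction, events) do
    Enum.reduce(events, by_jurisdiction, fn event, acc -> Map.merge(acc, event) end)
  end

  # Step 2: sum the per-jurisdiction counts by candidate.
  def totals(by_jurisdiction) do
    Enum.reduce(by_jurisdiction, %{}, fn {_jurisdiction, counts}, acc ->
      Map.merge(acc, counts, fn _candidate, a, b -> a + b end)
    end)
  end
end
```

Because each event carries a subordinate’s full current counts and not an increment, a repeated or late event overwrites the earlier entry and does not double-count votes.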

Result Presenter

This process receives votes from an aggregator and caches those results to service requests for presenting results.

defmodule Voting.ResultPresenter do
  use GenStage
  …

  @doc """
  Handle requests for results
  """
  def handle_call :get_votes, _from, state do
    {:reply, {:ok, state.votes}, [], state}
  end

  @doc """
  Obtain the results from this presenter
  """
  def get_votes id do
    pid = Voting.ResultPresenter.via_tuple(id)
    {:ok, votes} = GenStage.call pid, :get_votes
    votes
  end

  @doc """
  Receive votes from aggregator
  """
  def handle_events events, _from, state do
    Logger.debug "@@@@ Presenter received: #{inspect events}"
    votes = Enum.reduce events, state.votes, fn v, votes ->
      Enum.reduce v, votes, fn {k,v}, votes ->
        Map.put(votes, k, v)
      end
    end
    {:noreply, [], %{state | votes: votes}}
  end
end

Takeaway

This post explored Elixir/OTP in terms of its potential as a process-oriented language, compared this to the object-oriented and functional paradigms, and reviewed the implications for training and adoption.

The post also includes a short example of applying this orientation to a sample problem. In case you’d like to review all the code, it is available on GitHub at https://github.com/technomage/voting, so you don’t have to scroll back looking for the link.

The key takeaway is to view systems as a collection of communicating processes. Plan the system from a process design point of view first, and a logic coding point of view second.

This article was originally published on Toptal.