Elixir protocols, how do they work? The Erlang perspective

2019-04-07 18:50

We will start by creating a new project to learn more about Elixir protocols:

mix new learn --module Learn
cd learn

Note: I'm using Erlang 21.3 and Elixir 1.8.1, and I have never coded in Elixir before :)

I searched for Elixir protocols and found the official documentation, which has an example for a Size protocol. I added it to the lib/learn.ex file and added some calls in the hello function to try it. It ended up looking like this:

defmodule Learn do
  @moduledoc """
  Documentation for Learn.
  """

  @doc """
  Hello world.

  ## Examples

  iex> Learn.hello()

  """
  def hello do
    Learn.Size.size("asd")
    Learn.Size.size(%{})
    Learn.Size.size({1, 2, 3})
  end

  defprotocol Size do
    @doc "Calculates the size (and not the length!) of a data structure"
    def size(data)
  end

  defimpl Size, for: BitString do
    def size(string), do: byte_size(string)
  end

  defimpl Size, for: Map do
    def size(map), do: map_size(map)
  end

  defimpl Size, for: Tuple do
    def size(tuple), do: tuple_size(tuple)
  end

end

Compiled the project:

mix compile

Opened an Elixir shell:

iex

Wrote a little script to decompile all beam files to Erlang (warning: Elixir flavored Erlang ahead!):

for f <- :filelib.wildcard('./_build/dev/lib/*/*/*.beam') do
  result = :beam_lib.chunks(f,[:abstract_code])
  {:ok,{_,[{:abstract_code,{_,ac}}]}} = result
  code = :erl_prettypr.format(:erl_syntax.form_list(ac))
  out_path = :string.replace(f, '.beam', '.erl')
  :file.write_file(out_path, code)
end

The results:

$  tree
.
├── _build
│   └── dev
│       └── lib
│           └── learn
│               ├── consolidated
│               │   ├── Elixir.Collectable.beam
│               │   ├── Elixir.Collectable.erl
│               │   ├── Elixir.Enumerable.beam
│               │   ├── Elixir.Enumerable.erl
│               │   ├── Elixir.IEx.Info.beam
│               │   ├── Elixir.IEx.Info.erl
│               │   ├── Elixir.Inspect.beam
│               │   ├── Elixir.Inspect.erl
│               │   ├── Elixir.Learn.Size.beam
│               │   ├── Elixir.Learn.Size.erl
│               │   ├── Elixir.List.Chars.beam
│               │   ├── Elixir.List.Chars.erl
│               │   ├── Elixir.String.Chars.beam
│               │   └── Elixir.String.Chars.erl
│               └── ebin
│                   ├── Elixir.Learn.beam
│                   ├── Elixir.Learn.erl
│                   ├── Elixir.Learn.Size.beam
│                   ├── Elixir.Learn.Size.BitString.beam
│                   ├── Elixir.Learn.Size.BitString.erl
│                   ├── Elixir.Learn.Size.erl
│                   ├── Elixir.Learn.Size.Map.beam
│                   ├── Elixir.Learn.Size.Map.erl
│                   ├── Elixir.Learn.Size.Tuple.beam
│                   ├── Elixir.Learn.Size.Tuple.erl
│                   └── learn.app

From the result, it seems that the compiler "consolidates" the protocols into the consolidated folder and puts the modules in ebin (with protocol implementations named after the protocol plus the type they handle).

It's also clear that all Elixir modules are prefixed with Elixir., and that if I declare a protocol inside a module, the protocol "belongs" to the module. In this case the "fully qualified name" of the protocol is Elixir.Learn.Size.

Let's start exploring what code is generated by inspecting the main module we wrote (I will clean up unneeded code from the examples):
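A quick way to check the naming from iex, as a minimal sketch: the aliases below are only used as atoms, so the modules don't need to exist for this to run.

```elixir
# An Elixir alias is just an atom with an "Elixir." prefix.
IO.inspect(Learn.Size == :"Elixir.Learn.Size")   # true
IO.inspect(Atom.to_string(Learn.Size))           # "Elixir.Learn.Size"

# Implementation modules are named protocol + type.
IO.inspect(Module.concat(Learn.Size, Tuple))     # Learn.Size.Tuple
```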

-module('Elixir.Learn').
-export([hello/0]).

hello() ->
    'Elixir.Learn.Size':size(<<"asd">>),
    'Elixir.Learn.Size':size(#{}),
    'Elixir.Learn.Size':size({1, 2, 3}).

We can see that calling a function from a protocol implies calling the desired function on the consolidated module for the protocol itself.

Let's now see what the Elixir.Learn.Size module does:

-module('Elixir.Learn.Size').
-export(['__protocol__'/1, impl_for/1, 'impl_for!'/1, size/1]).

'impl_for!'(__@1) ->
    case impl_for(__@1) of
      __@2 when __@2 =:= nil orelse __@2 =:= false ->
      erlang:error('Elixir.Protocol.UndefinedError':exception([{protocol,
                                    'Elixir.Learn.Size'},
                                   {value,
                                    __@1}]));
      __@3 -> __@3
    end.

size(__@1) -> ('impl_for!'(__@1)):size(__@1).

struct_impl_for(_) -> nil.

impl_for(#{'__struct__' := __@1})
    when erlang:is_atom(__@1) ->
    struct_impl_for(__@1);
impl_for(__@1) when erlang:is_tuple(__@1) ->
    'Elixir.Learn.Size.Tuple';
impl_for(__@1) when erlang:is_map(__@1) ->
    'Elixir.Learn.Size.Map';
impl_for(__@1) when erlang:is_bitstring(__@1) ->
    'Elixir.Learn.Size.BitString';
impl_for(_) -> nil.

'__protocol__'(module) -> 'Elixir.Learn.Size';
'__protocol__'(functions) -> [{size, 1}];
'__protocol__'('consolidated?') -> true;
'__protocol__'(impls) ->
    {consolidated,
     ['Elixir.Map', 'Elixir.BitString', 'Elixir.Tuple']}.

The exported function for the protocol (size/1) does a simple thing: it asks the impl_for!/1 function for the module that knows how to handle Learn.Size.size/1 for the given argument, and then calls that module's size/1 function:

size(__@1) -> ('impl_for!'(__@1)):size(__@1).

impl_for!/1 just calls impl_for/1 with the argument and handles the case where the value doesn't have a known implementation (impl_for/1 returns nil). In that case it raises an exception (Elixir.Protocol.UndefinedError); otherwise it returns the module name.
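These generated functions can also be called directly from Elixir. Here is a small sketch using a throwaway protocol (SizeDemo is a made-up name, not part of the project above) that can be pasted into iex or run as a script. Outside a mix build the protocol is not consolidated, but the exported functions behave the same way from the caller's side:

```elixir
# SizeDemo is a hypothetical protocol used only for this illustration.
defprotocol SizeDemo do
  def size(data)
end

defimpl SizeDemo, for: Tuple do
  def size(tuple), do: tuple_size(tuple)
end

IO.inspect(SizeDemo.impl_for({1, 2, 3}))        # SizeDemo.Tuple
IO.inspect(SizeDemo.impl_for(:an_atom))         # nil
IO.inspect(SizeDemo.__protocol__(:functions))   # [size: 1]

# impl_for!/1 raises instead of returning nil.
try do
  SizeDemo.impl_for!(:an_atom)
rescue
  e in Protocol.UndefinedError -> IO.puts(Exception.message(e))
end
```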

impl_for/1 starts by checking if the argument is an Elixir struct, which underneath is just a map with a "well known" key __struct__ that contains the type of the struct as an atom:

impl_for(#{'__struct__' := __@1})
    when erlang:is_atom(__@1) ->

If it's a struct, it calls struct_impl_for/1 with the struct type as argument:

struct_impl_for(__@1);

In our example, there's no struct that implements this protocol so the implementation of struct_impl_for/1 is simple:

struct_impl_for(_) -> nil.

After that, it tries to find the implementation for non-struct types (mostly Erlang types), using guards to check the type of the argument. If none match, it returns nil, like struct_impl_for/1:

impl_for(__@1) when erlang:is_tuple(__@1) ->
    'Elixir.Learn.Size.Tuple';

impl_for(__@1) when erlang:is_map(__@1) ->
    'Elixir.Learn.Size.Map';

impl_for(__@1) when erlang:is_bitstring(__@1) ->
    'Elixir.Learn.Size.BitString';

impl_for(_) -> nil.
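To make the dispatch easier to read, here is roughly the same logic written by hand in Elixir. This is only a sketch of the decompiled code above, not what the compiler emits: DispatchSketch and the three *Impl modules are made-up names, and it raises ArgumentError instead of Protocol.UndefinedError to keep it short.

```elixir
defmodule DispatchSketch do
  # Hypothetical stand-ins for the generated implementation modules.
  defmodule TupleImpl do
    def size(tuple), do: tuple_size(tuple)
  end

  defmodule MapImpl do
    def size(map), do: map_size(map)
  end

  defmodule BitStringImpl do
    def size(string), do: byte_size(string)
  end

  # Structs are checked first, then the built-in types via guards.
  def impl_for(%{__struct__: struct}) when is_atom(struct), do: struct_impl_for(struct)
  def impl_for(data) when is_tuple(data), do: TupleImpl
  def impl_for(data) when is_map(data), do: MapImpl
  def impl_for(data) when is_bitstring(data), do: BitStringImpl
  def impl_for(_), do: nil

  # Same idea as impl_for!/1: turn nil into an exception.
  def impl_for!(data) do
    impl_for(data) || raise(ArgumentError, "no implementation for #{inspect(data)}")
  end

  # The protocol function: look up the module, then call it.
  def size(data), do: impl_for!(data).size(data)

  # No struct implements this sketch, so every struct falls through to nil.
  defp struct_impl_for(_), do: nil
end

IO.inspect(DispatchSketch.size("asd"))       # 3
IO.inspect(DispatchSketch.size(%{}))         # 0
IO.inspect(DispatchSketch.size({1, 2, 3}))   # 3
```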

Now that we got the module that handles the protocol function for each type, let's see their implementations:

Elixir.Learn.Size.BitString:

size(_string@1) -> erlang:byte_size(_string@1).

Elixir.Learn.Size.Map:

size(_map@1) -> erlang:map_size(_map@1).

Elixir.Learn.Size.Tuple:

size(_tuple@1) -> erlang:tuple_size(_tuple@1).

Now that we have the basic call and dispatch sequence, let's add two structs and implement the protocol for them to see how it works:

I added two structs to the lib/learn.ex module:

defstruct name: "John", age: 27

defmodule User do
  defstruct name: "John", age: 27
end

Added calls to Size.size/1 in the hello/0 function:

def hello do
  Learn.Size.size("asd")
  Learn.Size.size(%{})
  Learn.Size.size({1, 2, 3})
  Learn.Size.size(%User{age: 27, name: "John"})
  Learn.Size.size(%Learn{age: 27, name: "John"})
end

And implemented the protocol Size for both structs:

defimpl Size, for: Learn do
  def size(learn), do: learn.age + 1
end

defimpl Size, for: User do
  def size(user), do: user.age + 2
end

I compiled with mix compile and pasted the script inside iex again. Let's see what changed.

The hello function looks like this:

hello() ->
        'Elixir.Learn.Size':size(<<"asd">>),
        'Elixir.Learn.Size':size(#{}),
        'Elixir.Learn.Size':size({1, 2, 3}),
        'Elixir.Learn.Size':size(#{age => 27,
                                   name => <<"John">>,
                                   '__struct__' => 'Elixir.Learn.User'}),
        'Elixir.Learn.Size':size(#{age => 27,
                                   name => <<"John">>,
                                   '__struct__' => 'Elixir.Learn'}).

Which confirms that Elixir structs are maps with a special __struct__ key.

Checking the generated files, there's a new file for our User struct (Elixir.Learn.User.erl); the other struct is defined inside Elixir.Learn.erl.

The module code relevant for the struct doesn't have anything specific to the protocols it implements:

-module('Elixir.Learn.User').
-export(['__struct__'/0, '__struct__'/1]).

'__struct__'() ->
        #{'__struct__' => 'Elixir.Learn.User', age => 27,
          name => <<"John">>}.

'__struct__'(__@1) ->
        'Elixir.Enum':reduce(__@1,
                         #{'__struct__' => 'Elixir.Learn.User', age => 27,
                           name => <<"John">>},
                         fun ({__@2, __@3}, __@4) ->
                                 maps:update(__@2, __@3, __@4)
                         end).

Almost the same code is inside Elixir.Learn.erl for the other struct.

This shows that each struct has two "constructors", one without arguments that returns a struct with the default values for all fields and one that merges the arguments on the default values.
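Both "constructors" can be called from Elixir too. A small sketch, assuming a throwaway StructDemo.User module (a made-up name) with the same fields as above:

```elixir
# StructDemo.User is a hypothetical module used only for this illustration.
defmodule StructDemo.User do
  defstruct name: "John", age: 27
end

# __struct__/0 returns the struct with its default values.
defaults = StructDemo.User.__struct__()
IO.inspect(is_map(defaults))                 # true
IO.inspect(Map.get(defaults, :__struct__))   # StructDemo.User

# __struct__/1 merges the given fields over the defaults.
older = StructDemo.User.__struct__(age: 30)
IO.inspect(Map.get(older, :age))             # 30
IO.inspect(Map.get(older, :name))            # "John"

# Kernel.struct/2 builds the same value.
IO.inspect(struct(StructDemo.User, age: 30) == older)   # true
```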

Let's see what changed on the consolidated protocol module:

struct_impl_for('Elixir.Learn.User') ->
        'Elixir.Learn.Size.Learn.User';
struct_impl_for('Elixir.Learn') ->
        'Elixir.Learn.Size.Learn';
struct_impl_for(_) -> nil.

Each struct type returns the module where the protocol is implemented. Let's see both implementations:

Elixir.Learn.Size.Learn.User.erl:

size(_user@1) ->
        case _user@1 of
          #{age := __@1} -> __@1;
          __@1 when erlang:is_map(__@1) ->
          erlang:error({badkey, age, __@1});
          __@1 -> __@1:age()
        end
          + 2.

Elixir.Learn.Size.Learn.erl:

size(_learn@1) ->
        case _learn@1 of
          #{age := __@1} -> __@1;
          __@1 when erlang:is_map(__@1) ->
          erlang:error({badkey, age, __@1});
          __@1 -> __@1:age()
        end
          + 1.

Summary:

Each Elixir protocol is compiled to its own module, whose content is the consolidated dispatch logic for that protocol.

This logic is created by getting all the defimpl statements for it and adding a function clause to the struct_impl_for/1 function if the target type is an Elixir struct and a clause to the impl_for/1 function if the target type is any other type.

The impl_for!/1 function returns the module that has the protocol implementation for the provided value, or raises if there is none.

Each protocol function asks for the module via impl_for!/1 and calls it with the given arguments.

This is just guessing, but the module indirection may be there to allow hot code reloading of protocol implementations for each type independently, without also reloading the consolidated protocol. The struct_impl_for/1 function is there to destructure the map only once.

I don't see traces of dynamic dispatch for the case where a module with a protocol implementation that was not known at consolidation time is loaded later. I need to research this further.

An extra guess, this logic on the struct field to get the age field:

case _learn@1 of
  #{age := __@1} -> __@1;
  __@1 when erlang:is_map(__@1) ->
  erlang:error({badkey, age, __@1});
  __@1 -> __@1:age()
end

Maybe it's because Elixir allows calling a zero-arity function without parentheses, so value.age could be either a field access or a function call, and that's why it looks for the field first and the function with the same name second? I'm not entirely sure, since my Elixir knowledge is basically nonexistent :)

If you have any questions or corrections, I'm @warianoguerra; my other accounts are here: https://keybase.io/marianoguerra
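The three branches of that case can be poked at from Elixir with a small sketch. DotDemo is a made-up module; this matches what I'd expect on Elixir 1.8, and newer releases may print a warning about the call without parentheses:

```elixir
# DotDemo is a hypothetical module used only for this illustration.
defmodule DotDemo do
  def age, do: 99
end

# Branch 1: the value is a map that has the key, so it's a field lookup.
data = %{age: 27}
IO.inspect(data.age)   # 27

# Branch 3: the value is not a map (here a module atom), so it's a function call.
mod = DotDemo
IO.inspect(mod.age)    # 99

# Branch 2: a map without the key raises; {badkey, ...} shows up as KeyError.
other = Map.new(name: "John")

try do
  other.age
rescue
  e in KeyError -> IO.inspect(e.key)   # :age
end
```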

Mariano Guerra's Log