Elixir protocols, how do they work? The Erlang perspective
We will start by creating a new project to learn more about Elixir protocols:

$ mix new learn
$ cd learn
Note: I'm using Erlang 21.3 and Elixir 1.8.1, and I had never coded in Elixir before :)
I searched for Elixir protocols and found the official documentation, which has an example Size protocol. I added it to the lib/learn.ex file, along with some calls in the hello function to try it out. It ended up looking like this:
defmodule Learn do
  @moduledoc """
  Documentation for Learn.
  """

  @doc """
  Hello world.

  ## Examples

      iex> Learn.hello()

  """
  def hello do
    Learn.Size.size("asd")
    Learn.Size.size(%{})
    Learn.Size.size({1, 2, 3})
  end

  defprotocol Size do
    @doc "Calculates the size (and not the length!) of a data structure"
    def size(data)
  end

  defimpl Size, for: BitString do
    def size(string), do: byte_size(string)
  end

  defimpl Size, for: Map do
    def size(map), do: map_size(map)
  end

  defimpl Size, for: Tuple do
    def size(tuple), do: tuple_size(tuple)
  end
end
Compiled the project:

$ mix compile
Opened an Elixir shell:
Wrote a little script to decompile all beam files back to Erlang source (warning: Elixir-flavored Erlang ahead!):
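for f <- :filelib.wildcard('./_build/dev/lib/*/*/*.beam') do
  # read the abstract code (Erlang AST) stored in the beam's debug info
  result = :beam_lib.chunks(f, [:abstract_code])
  {:ok, {_, [{:abstract_code, {_, ac}}]}} = result
  # pretty print the forms as Erlang source code
  code = :erl_prettypr.format(:erl_syntax.form_list(ac))
  # write it next to the beam file, with an .erl extension
  out_path = :string.replace(f, '.beam', '.erl')
  :file.write_file(out_path, code)
end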
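If you only want to look at a single module, the same beam_lib → erl_syntax → erl_prettypr pipeline should also work on a beam binary held in memory, since :beam_lib.chunks/2 accepts a binary as well as a file name. A minimal sketch, assuming the module is compiled with debug info (the Elixir default); the Demo module is a hypothetical throwaway example:

```elixir
# Compile a throwaway module in memory; Code.compile_string/1 returns
# a list of {module, beam_binary} tuples
[{mod, beam}] =
  Code.compile_string("""
  defmodule Demo do
    def double(x), do: x * 2
  end
  """)

# :beam_lib.chunks/2 works on the binary directly, no file needed
{:ok, {^mod, [abstract_code: {:raw_abstract_v1, forms}]}} =
  :beam_lib.chunks(beam, [:abstract_code])

# Pretty print the abstract forms and turn the charlist into a string
source =
  forms
  |> :erl_syntax.form_list()
  |> :erl_prettypr.format()
  |> List.to_string()

IO.puts(source)
```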
The results:
$ tree
.
├── _build
│   └── dev
│       └── lib
│           └── learn
│               ├── consolidated
│               │   ├── Elixir.Collectable.beam
│               │   ├── Elixir.Collectable.erl
│               │   ├── Elixir.Enumerable.beam
│               │   ├── Elixir.Enumerable.erl
│               │   ├── Elixir.IEx.Info.beam
│               │   ├── Elixir.IEx.Info.erl
│               │   ├── Elixir.Inspect.beam
│               │   ├── Elixir.Inspect.erl
│               │   ├── Elixir.Learn.Size.beam
│               │   ├── Elixir.Learn.Size.erl
│               │   ├── Elixir.List.Chars.beam
│               │   ├── Elixir.List.Chars.erl
│               │   ├── Elixir.String.Chars.beam
│               │   └── Elixir.String.Chars.erl
│               └── ebin
│                   ├── Elixir.Learn.beam
│                   ├── Elixir.Learn.erl
│                   ├── Elixir.Learn.Size.beam
│                   ├── Elixir.Learn.Size.BitString.beam
│                   ├── Elixir.Learn.Size.BitString.erl
│                   ├── Elixir.Learn.Size.erl
│                   ├── Elixir.Learn.Size.Map.beam
│                   ├── Elixir.Learn.Size.Map.erl
│                   ├── Elixir.Learn.Size.Tuple.beam
│                   ├── Elixir.Learn.Size.Tuple.erl
│                   └── learn.app
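From the result it seems that the build "consolidates" the protocols into the consolidated folder and puts the regular modules in ebin (an unconsolidated copy of the protocol module, Elixir.Learn.Size.beam, also stays there). Protocol implementations are named after the protocol plus the type they handle, for example Elixir.Learn.Size.Map.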
It's also clear that all Elixir modules are prefixed with Elixir., and that a protocol declared inside a module is nested under that module: in this case the fully qualified name of the protocol is Elixir.Learn.Size.
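This naming scheme can be checked from Elixir itself: a module name written as an alias is just an atom whose text starts with "Elixir.". A small sketch; none of these modules need to exist, because aliases are plain atoms:

```elixir
# An alias like Learn.Size is just the atom :"Elixir.Learn.Size"
IO.inspect(Learn.Size == :"Elixir.Learn.Size")
IO.inspect(Atom.to_string(Learn.Size))
IO.inspect(Module.split(Learn.Size))

# Joining the protocol name with a type name gives the implementation
# module name seen in ebin (Elixir.Learn.Size.BitString)
IO.inspect(Module.concat(Learn.Size, BitString))
```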
Let's start exploring the generated code by inspecting the main module we wrote (I will clean up unneeded code from the examples):
-module('Elixir.Learn').

-export([hello/0]).

hello() ->
    'Elixir.Learn.Size':size(<<"asd">>),
    'Elixir.Learn.Size':size(#{}),
    'Elixir.Learn.Size':size({1, 2, 3}).
We can see that calling a protocol function compiles to a plain call to the function of the same name on the protocol module itself, which at runtime is the consolidated one.
Let's now see what the Elixir.Learn.Size module does:
-module('Elixir.Learn.Size').

-export(['__protocol__'/1, impl_for/1, 'impl_for!'/1, size/1]).

'impl_for!'(__@1) ->
    case impl_for(__@1) of
      __@2 when __@2 =:= nil orelse __@2 =:= false ->
          erlang:error('Elixir.Protocol.UndefinedError':exception([{protocol, 'Elixir.Learn.Size'},
                                                                    {value, __@1}]));
      __@3 -> __@3
    end.

size(__@1) -> ('impl_for!'(__@1)):size(__@1).

struct_impl_for(_) -> nil.

impl_for(#{'__struct__' := __@1}) when erlang:is_atom(__@1) ->
    struct_impl_for(__@1);
impl_for(__@1) when erlang:is_tuple(__@1) ->
    'Elixir.Learn.Size.Tuple';
impl_for(__@1) when erlang:is_map(__@1) ->
    'Elixir.Learn.Size.Map';
impl_for(__@1) when erlang:is_bitstring(__@1) ->
    'Elixir.Learn.Size.BitString';
impl_for(_) -> nil.

'__protocol__'(module) -> 'Elixir.Learn.Size';
'__protocol__'(functions) -> [{size, 1}];
'__protocol__'('consolidated?') -> true;
'__protocol__'(impls) ->
    {consolidated, ['Elixir.Map', 'Elixir.BitString', 'Elixir.Tuple']}.
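The '__protocol__'/1 reflection function is exported, so it can be called from Elixir too. A small standalone sketch, assuming it is run as a plain .exs script with a top-level Size protocol (so it is Size, not Learn.Size). Nothing consolidates protocols in that setting, which is why 'consolidated?' should come back false here, unlike the Mix build above:

```elixir
defprotocol Size do
  def size(data)
end

defimpl Size, for: Tuple do
  def size(tuple), do: tuple_size(tuple)
end

# The same clauses that appear in the decompiled module
IO.inspect(Size.__protocol__(:module))
IO.inspect(Size.__protocol__(:functions))

# Plain scripts are not consolidated, so this should be false
IO.inspect(Protocol.consolidated?(Size))
```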
The exported function for the protocol (size/1) does a simple thing: it asks impl_for!/1 for the module that knows how to handle Learn.Size.size/1 for the given argument, and then calls that module's size/1 function:

size(__@1) -> ('impl_for!'(__@1)):size(__@1).
impl_for!/1 just calls impl_for/1 with the argument and handles the case where the value has no known implementation (impl_for/1 returned nil or false): in that case it raises an Elixir.Protocol.UndefinedError exception, otherwise it returns the module name.
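Since impl_for/1 and impl_for!/1 are public, this behavior can be observed without decompiling anything. A minimal standalone sketch, assuming a plain .exs script with the Size protocol redefined at the top level (so the implementation modules are Size.BitString and so on rather than Learn.Size.BitString). In a script the protocol is not consolidated, so dispatch takes the unconsolidated code path, but the observable results for these calls should match:

```elixir
defprotocol Size do
  @doc "Calculates the size (and not the length!) of a data structure"
  def size(data)
end

defimpl Size, for: BitString do
  def size(string), do: byte_size(string)
end

defimpl Size, for: Map do
  def size(map), do: map_size(map)
end

defimpl Size, for: Tuple do
  def size(tuple), do: tuple_size(tuple)
end

# impl_for/1 returns the implementation module, or nil when there is none
IO.inspect(Size.impl_for("asd"))
IO.inspect(Size.impl_for(1.5))

# size/1 goes through impl_for!/1, which raises for unsupported values
result =
  try do
    Size.size(1.5)
  rescue
    e in Protocol.UndefinedError -> {:undefined, e.protocol, e.value}
  end

IO.inspect(result)
```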
impl_for/1 starts by checking whether the argument is an Elixir struct, which underneath is just a map with a "well known" key, __struct__, holding the type of the struct (its module name) as an atom. If it is a struct, it calls struct_impl_for/1 with the struct type as argument:

impl_for(#{'__struct__' := __@1}) when erlang:is_atom(__@1) ->
    struct_impl_for(__@1);
In our example no struct implements this protocol, so struct_impl_for/1 is trivial:

struct_impl_for(_) -> nil.
After that it tries to find the implementation for non-struct types (the built-in Erlang types), using guards to check the type of the argument; if none matches, it returns nil, just like struct_impl_for/1:
impl_for(__@1) when erlang:is_tuple(__@1) ->
    'Elixir.Learn.Size.Tuple';
impl_for(__@1) when erlang:is_map(__@1) ->
    'Elixir.Learn.Size.Map';
impl_for(__@1) when erlang:is_bitstring(__@1) ->
    'Elixir.Learn.Size.BitString';
impl_for(_) -> nil.
Now that we know which module handles the protocol function for each type, let's look at their implementations:
Elixir.Learn.Size.BitString:
Elixir.Learn.Size.Map:
Elixir.Learn.Size.Tuple:
Now that we understand the basic call and dispatch sequence, let's add two structs and implement this protocol for them to see how that works:
I added two structs to the lib/learn.ex module:
Added calls to Size.size/1 in the hello/0 function:
def hello do
  Learn.Size.size("asd")
  Learn.Size.size(%{})
  Learn.Size.size({1, 2, 3})
  Learn.Size.size(%User{age: 27, name: "John"})
  Learn.Size.size(%Learn{age: 27, name: "John"})
end
And implemented the protocol Size for both structs:
defimpl Size, for: Learn do
  def size(learn), do: learn.age + 1
end

defimpl Size, for: User do
  def size(user), do: user.age + 2
end
I compiled with mix compile and pasted the script into iex again. Let's see what changed.
The hello function now looks like this:
hello() ->
    'Elixir.Learn.Size':size(<<"asd">>),
    'Elixir.Learn.Size':size(#{}),
    'Elixir.Learn.Size':size({1, 2, 3}),
    'Elixir.Learn.Size':size(#{age => 27, name => <<"John">>,
                               '__struct__' => 'Elixir.Learn.User'}),
    'Elixir.Learn.Size':size(#{age => 27, name => <<"John">>,
                               '__struct__' => 'Elixir.Learn'}).
This confirms that Elixir structs are maps with a special __struct__ key: the struct literals were expanded into plain map literals at compile time.
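The same thing can be checked from Elixir. A minimal sketch, where the top-level User module is a hypothetical stand-in for Learn.User, using the default values that show up in its decompiled code:

```elixir
# Hypothetical stand-in for Learn.User
defmodule User do
  defstruct name: "John", age: 27
end

user = %User{age: 27, name: "John"}

IO.inspect(is_map(user))
# The struct type is just the value stored under the __struct__ key
IO.inspect(user.__struct__)
# Equal to a plain map literal with the same keys, __struct__ included
IO.inspect(user == %{__struct__: User, age: 27, name: "John"})
```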
Checking the generated files, there's a new file for our User struct (Elixir.Learn.User.erl), while the other struct is defined inside Elixir.Learn.erl.
The module code relevant for the struct doesn't have anything specific to the protocols it implements:
-module('Elixir.Learn.User').

-export(['__struct__'/0, '__struct__'/1]).

'__struct__'() ->
    #{'__struct__' => 'Elixir.Learn.User', age => 27,
      name => <<"John">>}.

'__struct__'(__@1) ->
    'Elixir.Enum':reduce(__@1,
                         #{'__struct__' => 'Elixir.Learn.User', age => 27,
                           name => <<"John">>},
                         fun ({__@2, __@3}, __@4) ->
                                 maps:update(__@2, __@3, __@4)
                         end).
Almost the same code is inside Elixir.Learn.erl for the other struct.
This shows that each struct has two "constructors": one without arguments that returns a struct with the default values for all fields, and one that merges the given fields into the defaults (maps:update/3 only accepts keys that already exist, so unknown fields are rejected).
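Both constructors can be called directly. A minimal sketch, again with a hypothetical top-level User standing in for Learn.User; the unknown-key case assumes the failure surfaces in Elixir as a KeyError, which is how Elixir reports a badkey error:

```elixir
# Hypothetical stand-in for Learn.User
defmodule User do
  defstruct name: "John", age: 27
end

# __struct__/0: every field set to its default value
IO.inspect(User.__struct__())

# __struct__/1: the given fields are merged over the defaults
IO.inspect(User.__struct__(age: 30))

# Keys that are not struct fields are rejected
unknown =
  try do
    User.__struct__(email: "john@example.com")
  rescue
    KeyError -> :key_error
  end

IO.inspect(unknown)
```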
Let's see what changed in the consolidated protocol module:
struct_impl_for('Elixir.Learn.User') ->
    'Elixir.Learn.Size.Learn.User';
struct_impl_for('Elixir.Learn') ->
    'Elixir.Learn.Size.Learn';
struct_impl_for(_) -> nil.
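Because dispatch for structs only looks at the atom under __struct__, a struct without an implementation falls to the struct_impl_for(_) -> nil clause instead of reaching the Map implementation. A standalone sketch of that, assuming a plain .exs script with a top-level Size protocol and two hypothetical structs, User (which implements Size) and Point (which does not):

```elixir
defprotocol Size do
  def size(data)
end

defimpl Size, for: Map do
  def size(map), do: map_size(map)
end

# Hypothetical structs: only User implements Size
defmodule User do
  defstruct name: "John", age: 27
end

defmodule Point do
  defstruct x: 0, y: 0
end

defimpl Size, for: User do
  def size(user), do: user.age + 2
end

# Structs are routed by the atom in their __struct__ key...
IO.inspect(Size.impl_for(%User{}))
# ...so a hand-built map carrying that key goes to the same implementation
IO.inspect(Size.impl_for(%{__struct__: User, age: 1, name: "x"}))
# A struct without an implementation does not fall back to the Map one
IO.inspect(Size.impl_for(%Point{}))
# Plain maps still use the Map implementation
IO.inspect(Size.impl_for(%{x: 0, y: 0}))
```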
For each struct type it returns the module where the protocol is implemented. Let's look at both implementations:
Elixir.Learn.Size.Learn.User.erl:
size(_user@1) ->
    case _user@1 of
      #{age := __@1} -> __@1;
      __@1 when erlang:is_map(__@1) ->
          erlang:error({badkey, age, __@1});
      __@1 -> __@1:age()
    end
      + 2.
Elixir.Learn.Size.Learn.erl:
size(_learn@1) ->
    case _learn@1 of
      #{age := __@1} -> __@1;
      __@1 when erlang:is_map(__@1) ->
          erlang:error({badkey, age, __@1});
      __@1 -> __@1:age()
    end
      + 1.
Summary:
Each Elixir protocol is compiled to its own module, whose content (once consolidated) is the dispatch logic for that protocol.
This logic is generated by collecting all the defimpl statements for the protocol and adding a clause to struct_impl_for/1 when the target type is an Elixir struct, or a clause to impl_for/1 when it is any other type.
impl_for!/1 returns the module that implements the protocol for the type of the provided value, and raises Protocol.UndefinedError if there is none.
Each protocol function asks impl_for!/1 for that module and calls the function of the same name on it with the given arguments.
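Putting the pieces together, the shape of the consolidated dispatch can be approximated by hand in plain Elixir. This is only a sketch that mirrors the decompiled code, not what Elixir actually generates: ManualSize, its implementation modules and the User struct are all hypothetical names, and impl_for!/1 raises ArgumentError instead of Protocol.UndefinedError to keep it self-contained:

```elixir
# One module per implementation, like Elixir.Learn.Size.Tuple and friends
defmodule ManualSize.Tuple do
  def size(tuple), do: tuple_size(tuple)
end

defmodule ManualSize.Map do
  def size(map), do: map_size(map)
end

defmodule ManualSize.BitString do
  def size(string), do: byte_size(string)
end

defmodule ManualSize.User do
  def size(user), do: user.age + 2
end

# Hypothetical struct, like Learn.User
defmodule User do
  defstruct name: "John", age: 27
end

defmodule ManualSize do
  # The "protocol function": find the module, then call it
  def size(data), do: impl_for!(data).size(data)

  def impl_for!(data) do
    case impl_for(data) do
      impl when impl in [nil, false] ->
        raise ArgumentError, "no ManualSize implementation for #{inspect(data)}"

      impl ->
        impl
    end
  end

  # Structs first: they are maps too, so this must precede the is_map/1 clause
  def impl_for(%{__struct__: struct}) when is_atom(struct), do: struct_impl_for(struct)
  def impl_for(data) when is_tuple(data), do: ManualSize.Tuple
  def impl_for(data) when is_map(data), do: ManualSize.Map
  def impl_for(data) when is_bitstring(data), do: ManualSize.BitString
  def impl_for(_), do: nil

  # One clause per struct that has an implementation
  defp struct_impl_for(User), do: ManualSize.User
  defp struct_impl_for(_), do: nil
end

IO.inspect(ManualSize.size({1, 2, 3}))
IO.inspect(ManualSize.size(%User{}))
```

In this sketch, supporting a new struct means adding a struct_impl_for/1 clause, which matches what consolidation did for Learn.User and Learn above.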
This is just a guess, but the module indirection is probably there to allow hot code reloading of the protocol implementation for each type independently, without also having to reload the consolidated protocol. The struct_impl_for/1 function is there so the map only has to be destructured once.
I don't see any trace of dynamic dispatch for the case where a module with a protocol implementation that was not known at consolidation time gets loaded later. I need to research this further.
One more guess, about this code generated to access the age field of the struct:
case _learn@1 of
  #{age := __@1} -> __@1;
  __@1 when erlang:is_map(__@1) ->
      erlang:error({badkey, age, __@1});
  __@1 -> __@1:age()
end
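Maybe it's because Elixir allows calling a zero-arity function without parentheses, so when the value is not a map (for example a module name), value.age is treated as the call value.age()? That would explain why it looks for the field first and for a function with the same name second. I'm not entirely sure, since my Elixir knowledge is basically nonexistent :)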
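One way to test this guess is to write value.age inside a function where the compiler cannot know whether value is a map or a module, and call it with both. A minimal sketch; the Person module is hypothetical, and recent Elixir versions may print a deprecation warning for the parentheses-less module call:

```elixir
# Hypothetical module with a zero-arity function named like a field
defmodule Person do
  def age, do: 42
end

# value.age without parentheses: key lookup when value is a map,
# otherwise a call to the zero-arity function value.age()
get_age = fn value -> value.age end

IO.inspect(get_age.(%{age: 27}))
IO.inspect(get_age.(Person))

# A map without the key hits the {badkey, age, ...} branch
missing =
  try do
    get_age.(%{})
  rescue
    KeyError -> :key_error
  end

IO.inspect(missing)
```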
If you have any questions or corrections, I'm @warianoguerra; my other accounts are listed here: https://keybase.io/marianoguerra