Last year I was invited to ElixirConf Latin America in Colombia to give a talk. I proposed to also give a tutorial about Riak Core and they said that it should be in Elixir, so I started looking into Elixir to translate my Riak Core material to it.
At the same time I was learning about pretty printers, so I decided to implement a pretty printer that emits Elixir from the Erlang Abstract Syntax Tree, both as a joke for my talk and as a way to learn Elixir.
The joke didn't work, but it resulted in the prototype of Elixir Flavoured Erlang.
This year I was invited to give another talk about languages on the Erlang virtual machine at Code BEAM Brasil 2020 and I thought it would be a good idea to continue working on it and maybe announce it at the talk.
To measure progress I built some scripts that transpile the Erlang standard library to Elixir and then try to compile the resulting modules with the Elixir compiler. I would pick one compiler error, fix it and try again.
With this short feedback loop and a counter that told me how many modules compiled successfully, it was just a matter of finding errors and fixing them. At the beginning each fix would remove a lot of compiler errors and sometimes surface new ones; after a while each error was a weird corner case and progress slowed.
Some days before the talk I managed to transpile all of Erlang/OTP and 91% of the Elixir translations compiled successfully.
The result is of course Elixir Flavoured Erlang, but as a side effect I have Erlang/OTP in Elixir, so I decided to publish it too.
Enter otp.ex: Erlang/OTP transpiled to Elixir.
The objective of this repository is to allow Elixir programmers to read the Erlang code of projects they use. Most of the code compiles, but I can't ensure that it behaves identically to the original source.
While writing the readme of efe I needed an example that wasn't OTP, so I decided to also transpile a project widely used from both Erlang and Elixir: the Cowboy web server.
The ^ match operator in Elixir
In Elixir, binding a variable that is already bound rebinds it to the new value by default. If you want to pattern match on its current value instead, you have to add the ^ (pin) operator in front of it.
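A minimal sketch of the difference. The module and function names are made up for illustration.

```elixir
defmodule PinDemo do
  # Plain `=` on an already bound variable rebinds it in Elixir.
  def rebind do
    x = 1
    x = x + 1
    x
  end

  # `^x` matches against the current value of x instead of rebinding it,
  # which is what an Erlang match on a bound variable always does.
  def same_as_two?(value) do
    x = 2

    case value do
      ^x -> true
      _ -> false
    end
  end
end
```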
In Erlang, variables are bound once and after that they always pattern match. The easy part of the translation is knowing that when a variable is already bound and in match position I have to add the ^. The hard part is that I must not add the ^ on the first binding, and I have to know which variables are in match position.
For this I do a pass over the Erlang Abstract Syntax Tree that annotates each variable with whether it is already bound and whether it is in match position. In a second pass the pretty printer checks those annotations to decide whether to add the ^.
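To make the two passes concrete, here is a toy version working on a hypothetical, heavily simplified AST. It is not efe's code and not the real Erlang abstract format. It only handles sequential matches with variables, tuples and literals, and it just lowercases variable names.

```elixir
defmodule PinAnnotator do
  # Hypothetical simplified AST: a body is a list of {:match, pattern, rhs};
  # a pattern is {:var, name}, {:tuple, [pattern]} or {:lit, value}.

  # First pass: walk the body in order, tracking which variables are bound.
  def annotate(body), do: annotate(body, MapSet.new(), [])

  defp annotate([], _bound, acc), do: Enum.reverse(acc)

  defp annotate([{:match, pattern, rhs} | rest], bound, acc) do
    {annotated, bound_after} = annotate_pattern(pattern, bound, bound)
    annotate(rest, bound_after, [{:match, annotated, rhs} | acc])
  end

  # A variable bound before this pattern needs a pin. A variable repeated
  # inside the same pattern does not, because Elixir also requires repeated
  # pattern variables to match the same value.
  defp annotate_pattern({:var, name}, before, bound) do
    if MapSet.member?(before, name) do
      {{:pinned_var, name}, bound}
    else
      {{:var, name}, MapSet.put(bound, name)}
    end
  end

  defp annotate_pattern({:tuple, elems}, before, bound) do
    {annotated, bound} =
      Enum.map_reduce(elems, bound, fn elem, acc -> annotate_pattern(elem, before, acc) end)

    {{:tuple, annotated}, bound}
  end

  defp annotate_pattern({:lit, _} = lit, _before, bound), do: {lit, bound}

  # Second pass: the printer only reads the annotations.
  def to_elixir({:var, name}), do: String.downcase(name)
  def to_elixir({:pinned_var, name}), do: "^" <> String.downcase(name)
  def to_elixir({:lit, value}), do: inspect(value)
  def to_elixir({:tuple, elems}), do: "{" <> Enum.map_join(elems, ", ", &to_elixir/1) <> "}"
end
```

For an Erlang body like `X = ..., {X, Y, ok} = ...`, the second pattern prints as `{^x, y, :ok}`.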
Why don't some modules compile?
Here's a list of reasons why the remaining modules don't compile after being transpiled.
For comprehensions must start with a generator
There's a weird trick in Erlang where you can generate an empty list if a condition is false or a list with one item if a condition is true by having a list comprehension that has no generator but has a filter.
I've been told that it's an artifact of how list comprehensions used to be translated to other code in the past.
The fact is that it's valid Erlang and is used in some places in the standard library.
For simple cases efe inserts a dummy generator.
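A sketch of the idea. The `:dummy` element and the function names are assumptions, and efe's actual output may look different.

```elixir
defmodule FilterOnly do
  # Erlang: [Value || Condition] gives [Value] when Condition is true and []
  # otherwise. Elixir's `for` must start with a generator, so a one-element
  # dummy generator is placed in front of the filter.
  def maybe(condition, value) do
    for _ <- [:dummy], condition, do: value
  end
end
```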
For more complex cases with several filters I would have to analyze whether inserting a generator at the beginning changes the result, so those cases are left as is.
Erlang records don’t evaluate default expressions at definition time, Elixir defrecord does
In Erlang the default values are inserted "as is" wherever a record is created. This means that if the default is a function call it isn't evaluated when the record is defined; instead there is a function call for each instantiation of the record.
Elixir has a module, Record, for working with Erlang records through macros, but it evaluates the defaults when the record is defined. If the call doesn't always return the same value, the behavior won't be the same, and if it returns a value that can't be represented as a literal in the code, the module won't compile either.
Another issue: if the function being called is declared after the record is defined, compilation fails with an error saying that the function doesn't exist.
There could be a solution here: another module that emulates the way default values behave in Erlang (they behave like "quoted" expressions), but I don't know enough about Elixir macros to know how to do it.
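A small sketch of the definition-time evaluation, using a hypothetical `stamp` record whose default is a call to `System.unique_integer/1`. In Erlang, every `#stamp{}` would run the call again and get a new value. With `Record.defrecord` every record gets the same value.

```elixir
defmodule RecordDefaults do
  require Record

  # Erlang: -record(stamp, {created = erlang:unique_integer([positive])}).
  # Here the default is evaluated once, while this module compiles, and the
  # resulting integer is baked into the generated `stamp` macros.
  Record.defrecord(:stamp, created: System.unique_integer([:positive]))

  def new, do: stamp()
  def created(record), do: stamp(record, :created)
end
```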
Named lambda functions
In Erlang, lambda functions can have names to allow recursion; Elixir doesn't support this. There's no way to change the code automatically in a local, simple way, but it's easy to change by hand, so I decided to transpile it as if Elixir supported named lambda functions and let the compiler report an error.
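One common manual rewrite is to let the anonymous function receive itself as an argument. This is a sketch of that by-hand fix with made-up names. efe doesn't do this, as described above.

```elixir
defmodule NamedFun do
  # Erlang (no direct Elixir equivalent):
  #   Fact = fun F(0) -> 1; F(N) -> N * F(N - 1) end
  # Manual rewrite: the anonymous function is passed itself to recurse.
  def factorial(n) do
    fact = fn
      _recur, 0 -> 1
      recur, k -> k * recur.(recur, k - 1)
    end

    fact.(fact, n)
  end
end
```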
Expressions in bitstrings
In Elixir the size of a bitstring segment expects an integer or a variable, while Erlang allows any expression there. It's easy to fix by hand by binding the expression to a variable and using the variable instead. Doing it automatically might be doable, but for now I leave the expression in place and get a compiler error.
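Here is a sketch of the by-hand fix: bind the size expression to a variable first. The names `take` and `nbits` are illustrative.

```elixir
defmodule BitSize do
  # Erlang accepts an expression as the size:
  #   <<Chunk:(Len * 8)/bits, Rest/bits>> = Data
  # Manual fix: compute the size into a variable, then use the variable.
  def take(data, len) do
    nbits = len * 8
    <<chunk::bits-size(nbits), rest::bits>> = data
    {chunk, rest}
  end
end
```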
Variable defined inside scope and used outside
In Erlang, variables introduced within if, case or receive expressions are implicitly exported from their bodies, so a variable bound in every branch of a case can be used after the case.
Elixir has stricter scoping rules and doesn't allow this. It's highly discouraged in Erlang but used in some places in the standard library.
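The usual manual rewrite is to return the value from the `case` and bind it outside. This is a sketch with hypothetical names, not something efe produces.

```elixir
defmodule CaseExport do
  # Erlang (Y escapes the case):
  #   case X of 1 -> Y = one; _ -> Y = other end,
  #   {X, Y}
  # Elixir keeps bindings inside the clause, so the case returns the value
  # and the binding happens outside it.
  def classify(x) do
    y =
      case x do
        1 -> :one
        _ -> :other
      end

    {x, y}
  end
end
```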
Corner cases all the way down
Here's a list of small differences that I had to fix.
Erlang vs Elixir imports
In Erlang you can import functions from a module in multiple imports and they "add up".
In Elixir later imports for the same module "shadow" previous ones.
The solution is to group imports for the same module and emit only one import per module.
In Erlang you can import a function more than once; in Elixir it's a compiler error, so the solution is to deduplicate function imports.
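A sketch of grouping and deduplicating the imports before emitting them. The input shape `{module, [{name, arity}]}` and the generated text are assumptions for illustration, not efe's internal representation.

```elixir
defmodule ImportMerger do
  # Hypothetical input: Erlang-style imports as {module, [{name, arity}]},
  # where a module or a function may appear more than once.
  def merge(imports) do
    imports
    |> Enum.group_by(fn {mod, _funs} -> mod end, fn {_mod, funs} -> funs end)
    |> Enum.map(fn {mod, fun_lists} -> {mod, fun_lists |> List.flatten() |> Enum.uniq()} end)
    |> Enum.sort()
  end

  # Emit exactly one Elixir import per module.
  def to_elixir({mod, funs}) do
    only = Enum.map_join(funs, ", ", fn {name, arity} -> "#{name}: #{arity}" end)
    "import #{inspect(mod)}, only: [#{only}]"
  end
end
```

For example, merging `[{:lists, [map: 2]}, {:lists, [filter: 2, map: 2]}]` produces a single `import :lists, only: [map: 2, filter: 2]`.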
Auto imported functions
Lowercase variables that become keywords
Erlang variables start with an uppercase letter and Elixir variables with a lowercase one. In Erlang, variable names can't clash with language keywords, but their lowercase versions can, so I have to check whether a variable name is a keyword and add a suffix to it.
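A sketch of the check. The reserved word list is Elixir's documented one. The trailing `_` suffix and the naive first-letter lowercasing are assumptions and may not match what efe emits.

```elixir
defmodule VarNames do
  # Elixir's reserved words, per the Elixir syntax reference.
  @reserved ~w(true false nil when and or not in fn do end catch rescue after else)

  # Lowercase the first letter of an Erlang variable name and append a
  # suffix (assumed "_") when the result is a reserved word.
  def to_elixir_var(<<first::utf8, rest::binary>>) do
    name = String.downcase(<<first::utf8>>) <> rest

    if name in @reserved, do: name <> "_", else: name
  end
end
```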
Local calls and Kernel autoimports
Elixir auto-imports functions and macros from the Kernel module, and they may clash with local functions in the Erlang module being translated. For this case I detect Kernel functions and macros that are also local functions and add an expression that stops them from being auto-imported.
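A minimal sketch of the kind of expression involved, using `import Kernel, except: [...]`. The module and its local `length/1` are hypothetical.

```elixir
defmodule LocalLength do
  # This module defines its own length/1, as an Erlang module may. Without
  # the next line, the local call in call_local/1 fails to compile because
  # it conflicts with the auto-imported Kernel.length/1.
  import Kernel, except: [length: 1]

  def length(list), do: {:custom, Enum.count(list)}

  def call_local(list), do: length(list)
end
```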
Private on_load function
Erlang allows a private function to be run when the module loads, while Elixir only allowed public ones. This has been reported and fixed in Elixir but not yet released.
Function capture/calls with dynamic values
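In Erlang the syntax to pass a reference to a function (fun M:F/A) is the same whether the module, function name and arity are constants or variables.
In Elixir I had to special-case captures where any part is a variable.
Something similar happens with function calls whose module or function name is a variable.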
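One way to express both cases in Elixir uses `Function.capture/3` and `apply/3`. Treat it as a sketch: the exact forms efe emits may differ, and the wrapper names are made up.

```elixir
defmodule DynamicCalls do
  # Erlang: fun lists:reverse/1   (all parts constant)
  def static_capture, do: &:lists.reverse/1

  # Erlang: fun M:F/A   (variables) -> Function.capture/3
  def dynamic_capture(m, f, a), do: Function.capture(m, f, a)

  # Erlang: M:F(Arg)   (variables) -> apply/3
  def dynamic_call(m, f, arg), do: apply(m, f, [arg])
end
```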
Bitwise operators
In Erlang the bitwise operators (band, bor, bxor, bnot, bsl, bsr) are built in.
In Elixir they are macros from the Bitwise module.
The fix was easy, just use the module.
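A small example of the translated form. The `Flags` module is hypothetical, and it uses `import Bitwise`, which works on current Elixir versions.

```elixir
defmodule Flags do
  import Bitwise

  # Erlang: (Flags band 16#0F) bor (1 bsl 4)
  def combine(flags), do: bor(band(flags, 0x0F), bsl(1, 4))
end
```

With `0xAB` as input, `band` keeps the low nibble (11) and `bsl(1, 4)` adds 16, giving 27.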
Calling the result of an expression
In Erlang there's no extra syntax to call a function that is the result of an expression.
In Elixir the expression has to be wrapped in parentheses and a dot added before the call.
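A sketch with a made-up `make_adder/1` that returns an anonymous function:

```elixir
defmodule CallResult do
  def make_adder(n), do: fn x -> x + n end

  # Erlang: (make_adder(1))(2)
  # Elixir needs the dot to call the anonymous function the expression returns.
  def add_one_to_two, do: (make_adder(1)).(2)
end
```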
Weird function names
In Erlang, to declare or call a function whose name is not a valid identifier, the name has to be in single quotes.
In Elixir the declaration is different from the call.
When the function name is a keyword in Elixir, the declaration is the same, but a local call must be prefixed with the module to be valid syntax.
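A sketch for a name that isn't a valid identifier. It uses an unquote fragment for the declaration and a quoted remote call through `__MODULE__` for the call. The function name is hypothetical, and efe's exact output may differ.

```elixir
defmodule WeirdNames do
  # Erlang declaration: 'hello world'(Name) -> {hello, Name}.
  # Elixir declares it with an unquote fragment.
  def unquote(:"hello world")(name), do: {:hello, name}

  # Erlang call: 'hello world'(Name)
  # There is no local-call syntax for this name, so the call goes through
  # the module.
  def greet(name), do: __MODULE__."hello world"(name)
end
```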
Erlang non short circuit boolean operators
For historical reasons Erlang's boolean operators and and or don't short-circuit: they evaluate both operands before computing the result. The newer and recommended andalso and orelse operators are the short-circuit versions, but the old ones are still used in some places.
Elixir only has short-circuit versions. To solve this I replace those operators with calls to the functions in the erlang module that do the same; since function calls evaluate all their arguments before the call, both sides are always evaluated.
The problem is in guards, where only a subset of functions can be used. In Erlang and and or are allowed in guards because they are operators, but in Elixir the function calls are not. Only in this case do I replace the non short-circuit versions with the short-circuit ones: guards are expected to be side-effect free, and skipping a side-effect-free expression on the right side should not change the result of the guard.
But there's a corner case in the corner case: a guard evaluates to false if it raises an exception, so if the right side raises, the semantics will differ. But well, I tried hard enough:
2> if true orelse 1/0 -> ok end.
ok
3> if true or 1/0 -> ok end.
** exception error: no true branch found when evaluating an if expression
6> if (false andalso 1/0) == false -> ok end.
ok
7> if (false and 1/0) == false -> ok end.
** exception error: no true branch found when evaluating an if expression
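A sketch that observes the difference outside guards. It calls `:erlang.and/2` through `apply/3` and records which right-hand sides actually ran in the process dictionary. The module and helper names are made up.

```elixir
defmodule StrictBool do
  # Erlang's `and` evaluates both operands. Calling :erlang.and/2 as a plain
  # function (here via apply/3) builds the argument list first, so the right
  # side always runs. Elixir's `and` skips the right side when the left is false.
  def demo(left) do
    strict = apply(:erlang, :and, [left, mark(:strict)])
    lazy = left and mark(:lazy)
    {strict, lazy, Process.get(:evaluated, [])}
  end

  # Side effect used to observe which right-hand sides were evaluated.
  defp mark(tag) do
    Process.put(:evaluated, [tag | Process.get(:evaluated, [])])
    true
  end
end
```

With `left = false` both results are `false`, but only the strict version evaluated its right side.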
Valid character syntax
Character literals ($a in Erlang, ?a in Elixir) are a syntax convenience for writing integers. Erlang accepts more character ranges than Elixir, so it was a matter of figuring out the valid ranges and emitting the integer instead for the characters that aren't allowed.
String interpolation
Erlang doesn't support string interpolation but Elixir does, so anything coming from Erlang that looks like interpolation must be escaped, because it isn't interpolation.
Did you know that in Elixir you can interpolate in atoms?
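A small sketch of the escaping, for both a binary and a quoted atom. The names are illustrative.

```elixir
defmodule NoInterpolation do
  # Erlang <<"#{x}">> (a binary) and '#{x}' (an atom) are plain text there.
  # Escaping the # keeps Elixir from treating them as interpolation.
  def string, do: "\#{x}"
  def atom, do: :"\#{x}"

  # Without the escape Elixir interpolates, even inside a quoted atom.
  def interpolated_atom(x), do: :"#{x}_suffix"
end
```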
Constant expressions in match position
Erlang allows expressions that evaluate to a constant in match position, for example 1 + 1 in a function head. Elixir doesn't, so I had to implement a small evaluator that computes them before translating expressions.
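A toy constant folder to illustrate the idea. efe works on the Erlang abstract format, but this sketch uses Elixir's quoted AST to stay short and runnable, and it only knows a few integer operators.

```elixir
defmodule ConstFold do
  # Fold +, -, *, div and rem when both operands reduce to integers;
  # leave anything else (variables, other calls) untouched.
  def fold({op, meta, [left, right]}) when op in [:+, :-, :*, :div, :rem] do
    case {fold(left), fold(right)} do
      {l, r} when is_integer(l) and is_integer(r) -> apply(Kernel, op, [l, r])
      {l, r} -> {op, meta, [l, r]}
    end
  end

  def fold(other), do: other
end
```

For example, `ConstFold.fold(quote(do: 2 * 1024))` returns `2048`, and `x + (1 + 2)` becomes `x + 3`.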
Catch expressions
Erlang has a catch expression, which Elixir doesn't. Luckily, since in Elixir everything is an expression, I can expand it to a try/catch expression; the only downside is the extra verbosity.
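A sketch of the expansion that follows the documented semantics of Erlang's `catch`. The expression is wrapped in a zero-arity fun here only to make it reusable. A direct translation would inline the expression in the `try` instead.

```elixir
defmodule ErlCatch do
  # Emulates Erlang's `catch Expr` with try/catch.
  def catch_expr(fun) do
    try do
      fun.()
    catch
      # throw(Value): catch returns Value itself
      :throw, value -> value
      # exit(Reason): catch returns {'EXIT', Reason}
      :exit, reason -> {:EXIT, reason}
      # error(Reason): catch returns {'EXIT', {Reason, Stacktrace}}
      :error, reason -> {:EXIT, {reason, __STACKTRACE__}}
    end
  end
end
```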
Erlang/OTP as a fuzzer for the Elixir compiler
As I said, I tested efe by transpiling the Erlang standard library and trying to compile it with the Elixir compiler.
OTP has a lot of code, some of it really old and some of it using Erlang in weird ways, so in some cases I would crash the Elixir compiler or get an unexpected error that looked like undefined behavior.
I reported the ones that made sense and the Elixir team had the patience to handle them and fixed them really fast, here's a list: