How does Elixir compile/execute code?

Xavier Noria is a member of the Ruby on Rails core team, a Ruby Hero, and the proud author of Rails Contributors. We are very excited to have him as a speaker this year, and we are happy to share his latest article. Have questions? Then don't miss your chance to meet Xavier at RubyC-2017!

Introduction

Elixir always compiles and always executes source code. Both elixir and elixirc do both things.

You read that right: always, both compilation and execution. elixir compiles (in addition to executing), and elixirc executes (in addition to compiling).

Main phases of Elixir compilation

Both elixir and elixirc work the same way:

Load the contents of the file in memory.
Produce an AST from it using a custom tokenizer and yecc.
Expand macros, inline functions, …: a bunch of transformations are applied here, in what’s known as the expansion phase. That yields an expanded AST, which still conforms to the same spec.
Transform that final AST into Erlang Abstract Format, which is a standard representation of an Erlang AST using Erlang terms.
Manually build an abstract format tree for a function called __FILE__/1 in a module called elixir_compiler_X, where X is an integer, with the abstract format of the program from the step above as the function body.
Compile the result to BEAM bytecode on the fly with compile:forms/2, which returns a binary (no file is written).
Load said binary into the Erlang VM using the Erlang code server.
Call elixir_compiler_X.__FILE__/1. Since this function has your whole program as its body, the VM is effectively running the program. Put this one-liner in an .ex(s) file and you’ll see it report that function and module name: IO.inspect(:erlang.process_info(self(), :current_function)).
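
Some of these phases can be approximated from user code with public functions in the Code and Macro modules. What follows is a minimal sketch, not the compiler's internal code path: PipelineDemo is a hypothetical module name, and Code.compile_quoted/1 bundles expansion, translation to Erlang, compilation, and loading into a single call.

```elixir
source = """
defmodule PipelineDemo do
  def hello, do: :world
end
"""

# Tokenize and parse: the source text becomes an Elixir AST (nested tuples).
ast = Code.string_to_quoted!(source)
IO.inspect(elem(ast, 0))

# A taste of the expansion phase: `if` is a macro that expands to `case`.
expanded = Macro.expand_once(quote(do: if(true, do: 1)), __ENV__)
IO.puts(Macro.to_string(expanded))

# Compile the AST in memory and load it. The result is a list of
# {module, binary} tuples; nothing is written to disk.
[{module, bytecode}] = Code.compile_quoted(ast)
IO.inspect({module, is_binary(bytecode)})

# The module is now loaded in the VM and can be called.
IO.inspect(apply(module, :hello, []))
```

The binary bound to bytecode plays the role of the one returned by compile:forms/2 in the list above: it exists only in memory until something decides to write it out.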

There is some nesting in this process that explains the loop illustrated in the picture above. This is due to the way module definition is implemented, but we’ll leave it here.

Observations

Both elixir and elixirc do the same thing. elixirc executes top-level and module-level code just like elixir does; it is the same code path.

For example, you can conditionally define a function while compiling. Why? Because the code is being executed. The other way around, elixir is able to invoke functions in modules defined in the same script. Why? Because they are compiled and loaded into the VM on the fly.
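
Both directions fit in one small script. This is an illustrative sketch: the Greeter module and the GREETER_SHOUT environment variable are made-up names, not anything Elixir defines.

```elixir
defmodule Greeter do
  # This `if` is ordinary code: it runs while the module body is executed
  # during compilation, so only one of the two definitions ends up in
  # the module.
  if System.get_env("GREETER_SHOUT") == "1" do
    def greet(name), do: String.upcase("hello, #{name}!")
  else
    def greet(name), do: "hello, #{name}!"
  end
end

# By the time this line runs, Greeter is already compiled and loaded.
greeting = Greeter.greet("world")
IO.puts(greeting)
```

Running the file with elixir prints the greeting. Compiling it with elixirc prints it too, because the last lines are top-level code that gets executed either way.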

Since programs executed by elixir are compiled, they run at the speed of compiled modules. Compilation has a cost, of course, so the wall-clock time is different, but the code itself runs equally fast.

How are `elixir` and `elixirc` different?

The main difference between elixir and elixirc is that elixirc produces a .beam file per module as a side effect of module definition. It does so by dumping the binary returned by compile:forms/2. That’s about it.

Extensions in file names do not matter, .ex and .exs are only conventions.

You can also compile a file that contains five modules, and you’ll get five different .beam files, each named after the module name (regardless of the name of the file defining the modules).
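
The one-file-per-module behavior can be imitated by hand. This is a rough sketch of the idea, not how elixirc is implemented: the BeamDemo modules, the file name any_name.txt, and the temporary output directory are all hypothetical, and Code.compile_string/2 stands in for the compiler.

```elixir
source = """
defmodule BeamDemo.One do
  def id, do: 1
end

defmodule BeamDemo.Two do
  def id, do: 2
end
"""

out_dir =
  Path.join(System.tmp_dir!(), "beam_demo_#{System.unique_integer([:positive])}")

File.mkdir_p!(out_dir)

# Code.compile_string/2 returns one {module, binary} tuple per module.
# The second argument is only a file name used for metadata, and its
# extension is irrelevant.
beam_files =
  for {module, bytecode} <- Code.compile_string(source, "any_name.txt") do
    # Each file is named after the module, not after the source file.
    path = Path.join(out_dir, Atom.to_string(module) <> ".beam")
    File.write!(path, bytecode)
    Path.basename(path)
  end

IO.inspect(Enum.sort(beam_files))
# → ["Elixir.BeamDemo.One.beam", "Elixir.BeamDemo.Two.beam"]
```

The Elixir. prefix appears because that is the full atom behind a module alias such as BeamDemo.One.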

Top-level or module-level code that does not end up in a persisted module attribute or a function is gone from the .beam files. Those files contain module definitions for the VM expressed in object code. Elixir is gone at that point: they are BEAM programs that could technically have been generated by some other tool.
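
To see what survives, you can ask a compiled module what it contains. A small sketch, assuming a hypothetical module Leftovers with a made-up :note attribute:

```elixir
defmodule Leftovers do
  # Executed once, while the module is being defined; not stored anywhere.
  IO.puts("defining #{inspect(__MODULE__)}")

  # Attributes registered with persist: true are kept in the compiled module.
  Module.register_attribute(__MODULE__, :note, persist: true)
  @note "kept"

  def answer, do: 42
end

attributes = Leftovers.__info__(:attributes)
IO.inspect(Keyword.get(attributes, :note))
IO.inspect(Leftovers.__info__(:functions))
```

The IO.puts call prints while the module is being defined, and nothing in the compiled module refers to it afterwards. The persisted attribute and the function are what remain.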

PS: Thanks a lot to José Valim for reviewing a draft of this post ❤️.
