
This is my blog. More about me at marianoguerra.github.io

🦋 @marianoguerra.org 🐘 @marianoguerra@hachyderm.io 🐦 @warianoguerra

Speedrunning WebAssembly's History: A Fictional First-Person Perspective

Context

This is a true story. The events depicted in this film took place in Minnesota in 1987. At the request of the survivors, the names have been changed. Out of respect for the dead, the rest has been told exactly as it occurred.

Fargo (1996 film)

While writing WebAssembly from the Ground Up, we reached the point where we needed an introduction. We decided that each of us would write one, and then see which worked better or whether we could merge the two.

From our conversations I knew Patrick was going to write a "standard" introduction, so I decided to try something else.

In the end, Patrick's version made it into the book, so I'm posting mine here in case anyone finds it interesting.

Note that what I'm posting here is a draft, complete enough for Patrick to review and give feedback on; it would have gone through a couple of rounds of revisions and edits had it ended up in the book :)

Introduction

It's 2013. Web browsers are getting faster and enabling more use cases on the web. You work at Mozilla and start thinking about how to enable and accelerate the adoption of the web for even more use cases.

You ask yourself, "What kind of software is still not available on the web?" Some ideas come to mind:

  • Games

  • Image / Video editors, encoders and decoders

  • Compression

  • Cryptography

  • CAD

  • Scientific visualization and simulation

  • Simulators, emulators

  • IDEs

You notice a pattern: they are all compute-intensive, and most of them are written in low-level systems programming languages.

There are already many languages that compile to JavaScript. Why not try compiling existing C/C++ codebases to JavaScript and see if the result is good enough?

After all, once a C/C++ compiler has done its job, the resulting program mostly consists of calling functions, manipulating local variables, and loading and storing numbers directly in memory.

You have an idea: if you start with an existing compiler like clang and swap its last stage to emit JavaScript instead of assembly, then with the help of some shims you should be able to run some of those programs in the browser.

Since JavaScript has no direct access to memory, you decide to simulate it with an Array of numbers; pointers are compiled to numbers that index into that memory array.
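To make the idea concrete, here is a minimal TypeScript sketch, not emscripten's actual output. It assumes one Array slot per C int (no byte addressing), and the helpers `store`, `load` and `sum` are hypothetical; each one is roughly what the tiny C function in its comment might become when pointers are plain indexes into the memory array.

```typescript
// A simplified, hypothetical model of "memory as an Array of numbers".
// Assumption: one Array slot holds one C int, so pointer arithmetic counts slots, not bytes.
const MEMORY: number[] = [];
for (let i = 0; i < 1024; i++) {
  MEMORY.push(0); // zero-initialize 1024 slots of "RAM"
}

// C: void store(int *p, int v) { *p = v; }
function store(p: number, v: number): void {
  MEMORY[p] = v; // a pointer is just an index into MEMORY
}

// C: int load(int *p) { return *p; }
function load(p: number): number {
  return MEMORY[p];
}

// C: int sum(int *arr, int n) { int s = 0; for (int i = 0; i < n; i++) s += arr[i]; return s; }
function sum(arr: number, n: number): number {
  let s = 0;
  for (let i = 0; i < n; i++) {
    s += load(arr + i); // arr[i] becomes MEMORY[arr + i]
  }
  return s;
}

// Usage: lay out the C array {7, 8, 9} starting at "address" 10.
store(10, 7);
store(11, 8);
store(12, 9);
console.log(sum(10, 3)); // 24
```

Nothing in this code tells the engine that `s` is an int or that `MEMORY` only ever holds numbers, which is exactly the kind of information the translation loses.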

After some hacking you have a working prototype, which you call emscripten.

After compiling and running some C/C++ programs, you notice that they are really slow: the translation to JavaScript "throws away" a lot of information present in the original program, mainly the variable types.

The JavaScript runtime has to "rediscover" all that information by profiling and JIT compiling the code. That rediscovery takes a while and is not always accurate; it would be nice to just pass along the information you already have.

Another problem is that using an Array as memory is slow: JavaScript arrays are very flexible (they can grow, have holes, and hold values of any type), but that flexibility comes at a cost in performance.

After some conversations and collaborations you get Typed Arrays implemented: fixed-length arrays of a single numeric type, backed by a raw buffer of bytes. This speeds up memory operations a lot.
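A minimal sketch of the Typed Array approach, assuming a single 64 KiB ArrayBuffer as the C heap (the size is arbitrary). The view names HEAP8, HEAP32 and HEAPF64 follow the naming emscripten uses for its heap views; the load/store helpers are hypothetical. Unlike the plain Array, pointers here are byte addresses, and each view turns an address into an index by shifting by its element size.

```typescript
// One ArrayBuffer plays the role of the C heap; typed views alias the same bytes.
const HEAP_SIZE = 64 * 1024; // 64 KiB, an arbitrary size for this sketch
const buffer = new ArrayBuffer(HEAP_SIZE);
const HEAP8 = new Int8Array(buffer);
const HEAP32 = new Int32Array(buffer);
const HEAPF64 = new Float64Array(buffer);

// Pointers are byte addresses; >> 2 and >> 3 divide by the element size (4 and 8 bytes).
// Assumes ptr is suitably aligned (a multiple of 4 for int32, of 8 for float64).
function storeI32(ptr: number, value: number): void {
  HEAP32[ptr >> 2] = value; // the store itself wraps value to a signed 32-bit int
}

function loadI32(ptr: number): number {
  return HEAP32[ptr >> 2];
}

function storeF64(ptr: number, value: number): void {
  HEAPF64[ptr >> 3] = value;
}

function loadF64(ptr: number): number {
  return HEAPF64[ptr >> 3];
}

storeI32(8, 2 ** 32 + 5); // does not fit in 32 bits, so it wraps
console.log(loadI32(8)); // 5
storeF64(16, 1.5);
console.log(loadF64(16)); // 1.5

// The views share memory: writing -1 as an int32 sets all four of its bytes to 0xFF.
storeI32(0, -1);
console.log(HEAP8[0], HEAP8[3]); // -1 -1
```

Because every element of a typed array has the same fixed numeric type, the engine no longer has to guess what a memory slot contains.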

You notice that once the JIT has done its work, peak performance is pretty good, but getting there takes a while, and since each browser has its own engine, performance differs among them.

Using your connections at Mozilla, you start talking with the SpiderMonkey team to find a way to hint to the runtime the types of the variables emitted by the compiler.

Low-level languages have multiple numeric types while JavaScript has only one, but looking at the spec you notice that the binary bitwise operators apply the ToInt32 conversion to their operands.

You can use that to hint to the runtime that a variable is a 32-bit integer: an operation that "does nothing" to an integer, such as a bitwise OR between a value and zero (a | 0), does exactly that.
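A small sketch of the `| 0` hint in TypeScript. The function name `addInt32` is hypothetical; the pattern (coerce parameters on entry, coerce the result on exit) mirrors the style asm.js later standardized. Because `|` applies ToInt32, the result is truncated toward zero and wraps around like a 32-bit two's-complement integer.

```typescript
// `x | 0` applies ToInt32: truncate toward zero, then wrap modulo 2^32 into the signed range.
console.log(3.7 | 0); // 3
console.log(-3.7 | 0); // -3
console.log(2 ** 31 | 0); // -2147483648 (wrapped)

// Hypothetical helper: a C-style `int add(int a, int b)` written with int32 hints.
function addInt32(a: number, b: number): number {
  a = a | 0; // hint: a is a 32-bit integer
  b = b | 0; // hint: b is a 32-bit integer
  return (a + b) | 0; // hint: the result is a 32-bit integer, so overflow wraps
}

console.log(addInt32(2, 3)); // 5
console.log(addInt32(2147483647, 1)); // -2147483648, as on two's-complement hardware
```

For a value that is already a 32-bit integer the operation changes nothing, so it costs nothing in meaning while telling the engine which representation to use.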

Numbers in JavaScript are IEEE 754 double-precision values, so if the compiler emits a variable of type double you are almost done. But a JavaScript variable can hold a value of any type: a String, a Boolean, an Array, etc. How do you hint to the runtime that this variable contains a Number, and specifically a double?

JavaScript has a trick to coerce a value into a number: put a plus sign in front of it, as in +a, and the unary plus operator converts its operand to the Number type.
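A sketch of the unary plus hint, with hypothetical function names. `toDouble` takes `any` only so the TypeScript checker accepts non-number inputs; the point is that `+x` always yields a Number, which the asm.js style then uses to mark doubles.

```typescript
// Unary plus applies ToNumber to its operand.
// `any` is used only so TypeScript lets us pass strings and booleans.
function toDouble(x: any): number {
  return +x;
}

console.log(toDouble("3.5")); // 3.5
console.log(toDouble(true)); // 1
console.log(toDouble("")); // 0
console.log(toDouble("abc")); // NaN

// Hypothetical helper: a C-style `double mul(double a, double b)` written with double hints.
function mulDouble(a: number, b: number): number {
  a = +a; // hint: a is a double
  b = +b; // hint: b is a double
  return +(a * b); // hint: the result is a double
}

console.log(mulDouble(1.5, 4)); // 6
```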

Great, there's one type left: you need to hint to the runtime that a variable is a 32-bit floating point value, but you've run out of tricks.

You convince the SpiderMonkey team to introduce a new function, Math.fround, that returns the nearest 32-bit single-precision float representation of a number.
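A sketch of what Math.fround does and how it can serve as a float32 hint. `addFloat32` is a hypothetical name; rounding after every operation is what gives the code single-precision semantics, which can differ from doing the same math in doubles.

```typescript
// Math.fround rounds a double to the nearest 32-bit float (still returned as a Number).
console.log(Math.fround(1.5)); // 1.5, exactly representable in float32
console.log(Math.fround(0.1)); // 0.10000000149011612, the float32 closest to 0.1

// Hypothetical helper: a C-style `float add(float a, float b)` written with float32 hints.
function addFloat32(a: number, b: number): number {
  a = Math.fround(a); // hint: a is a float32
  b = Math.fround(b); // hint: b is a float32
  return Math.fround(a + b); // round the sum back to float32, like a single-precision add
}

// float32 has a 24-bit significand, so 2^24 + 1 cannot be represented and rounds to 2^24.
console.log(addFloat32(16777216, 1)); // 16777216
console.log(16777216 + 1); // 16777217 when computed in doubles
```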

You modify emscripten to emit the new hints and notice a significant speed improvement.

But it still takes a while for the JIT to "warm up". The hints are there, so why doesn't the JIT just optimize the code the first time it processes it?

If only there were a way to tell the JIT that a piece of code was emitted by a compiler rather than written by a human, and that it adheres to your hints, the engine could do the optimizations "Ahead of Time" instead of "Just in Time", and the code would be much faster from the start.

There's already a backward-compatible way to tell JavaScript that a piece of code adheres to a restricted subset of the language: the "use strict" directive. You could emit a similar backward-compatible directive that a runtime may opt into to do the ahead-of-time optimizations. Since the JavaScript you are emitting replaces the assembly a compiler would emit at that stage, you call the subset asm.js, and the directive becomes "use asm".
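For a feel of what such code looks like, here is a small hand-written module in the asm.js style; it is not actual emscripten output, and the module and function names are hypothetical. The "use asm" directive, the `stdlib`/`foreign`/`heap` parameters and the `| 0` annotations follow asm.js conventions, while the TypeScript type annotations are erased at compile time. An engine that doesn't validate asm.js simply runs it as ordinary JavaScript, which is the backward compatibility the directive was designed for.

```typescript
// A hand-written module in the asm.js style (hypothetical names, not emscripten output).
// TypeScript annotations are erased; what remains is plain JavaScript.
function SumModule(
  stdlib: { Int32Array: Int32ArrayConstructor },
  foreign: unknown,
  heap: ArrayBuffer
) {
  "use asm"; // opt-in directive; engines without asm.js support just ignore it

  var HEAP32 = new stdlib.Int32Array(heap);

  // int add(int a, int b)
  function add(a: number, b: number): number {
    a = a | 0; // parameter type annotation: int
    b = b | 0;
    return (a + b) | 0; // return type annotation: signed int
  }

  // int sum(int *ptr, int len): adds len int32 values starting at byte address ptr
  function sum(ptr: number, len: number): number {
    ptr = ptr | 0;
    len = len | 0;
    var total = 0; // locals are declared up front; the literal fixes their type
    var end = 0;
    end = (ptr + (len << 2)) | 0; // each int32 takes 4 bytes
    while ((ptr | 0) < (end | 0)) {
      total = (total + (HEAP32[ptr >> 2] | 0)) | 0;
      ptr = (ptr + 4) | 0;
    }
    return total | 0;
  }

  return { add: add, sum: sum };
}

// Usage: the heap is an ArrayBuffer shared between the module and the caller.
const heap = new ArrayBuffer(0x10000); // 64 KiB
new Int32Array(heap).set([1, 2, 3, 4], 0);
const mod = SumModule({ Int32Array: Int32Array }, null, heap);
console.log(mod.add(2, 3)); // 5
console.log(mod.sum(0, 4)); // 10
```

Every value's type can be read off the code without running it, which is what lets an engine compile the whole module ahead of time.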

With these backward-compatible changes you convince the SpiderMonkey team to prototype them in a branch, and the results are impressive: asm.js code runs only twice as slow as native code! After some extra rounds of optimization you get it down to 50% slower than native.

This is more than a toy; it may actually enable new kinds of applications on the web.

To show it to the world you decide to build a flashy demo: in collaboration with Epic Games, you compile Unreal Engine 3 to asm.js and run a 3D demo called Epic Citadel in the browser.

With the demo out, you start working on a specification for asm.js and try to convince other browser vendors to adopt it.

But even though the results are impressive, there are still problems. The amount of JavaScript generated is huge: a demo may produce over 40 MB of asm.js. The bottleneck moves to a new place, since downloading and parsing that much code takes a while; on mobile it can take over 20 seconds just to parse!

Since asm.js is meant to be a compiler target, and you are already convincing other browser vendors to add a new capability to their engines, you all agree that there's an opportunity to "do it right" and define a binary format that is more compact and can be decoded much faster.

A binary format also avoids the problem of JavaScript having to be two different things at the same time: a language for humans to write and a compilation target. With a new binary format that can eventually diverge from JavaScript's semantics, you can achieve the initial objective of enabling more applications to run on the web.

Since this is a compiler target for the web, and compilers usually emit assembly, the new project is named WebAssembly.
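As a small taste of the binary format that eventually shipped (the version 1 format, not the early prototypes of this story), every WebAssembly module starts with the same 8-byte header: the magic bytes `\0asm` followed by the version number 1 as a little-endian 32-bit integer. The `hasWasmHeader` helper below is hypothetical; it only checks those 8 bytes and does not validate the rest of a module.

```typescript
// Every WebAssembly binary module (format version 1) starts with these 8 bytes.
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
const WASM_VERSION = [0x01, 0x00, 0x00, 0x00]; // version 1 as a little-endian u32

// Hypothetical helper: checks only the header, not the sections that follow it.
function hasWasmHeader(bytes: Uint8Array): boolean {
  if (bytes.length < 8) {
    return false;
  }
  for (let i = 0; i < 4; i++) {
    if (bytes[i] !== WASM_MAGIC[i] || bytes[i + 4] !== WASM_VERSION[i]) {
      return false;
    }
  }
  return true;
}

// With no sections at all, the header by itself is a complete (empty) module.
const emptyModule = new Uint8Array(WASM_MAGIC.concat(WASM_VERSION));
console.log(hasWasmHeader(emptyModule)); // true
console.log(String.fromCharCode(emptyModule[1], emptyModule[2], emptyModule[3])); // asm
```

In engines that ship WebAssembly, passing `emptyModule` to `WebAssembly.validate` should return true, since a module with no sections is valid.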

After some rounds of discussion, all the browser vendors get on board, and a new collaboration to standardize WebAssembly is announced.