Hi, I'm Mariano Guerra, below is my blog, if you want to learn more about me and what I do check a summary here: marianoguerra.github.io or find me on twitter @warianoguerra or Mastodon @marianoguerra@hachyderm.io

Bootstrapping OOP Part 3: Who Parses the Parser?

In Bootstrap post-collapse OOP technology with Wasm GC (Part 2) we implemented the minimum viable runtime in raw WebAssembly to run our prelude and bootstrap a basic OOP language.

But the prelude was provided to our runtime as text, and our runtime has no parser (yet), how do we feed the prelude to it?

In Part 1 I embedded the following video:

The video description says:

Demonstration of the convoluted process of booting an Altair floppy disk using a Teletype and paper tape. This was required when the Disk Boot Loader PROM was not available in an Altair 8800

I'm pretty sure a paper tape reader and a floppy disk drive aren't going to be easy to find. Since the bootstrapping process is probably going to enter the Wasm implementation by hand, we can expect the capability to initialize its memory to some binary representation that our runtime can interpret.

The choice is between a data serialization format or bytecode for a small VM.

🤷‍♀️ Why not both?

You may have noticed that our language follows the idea of "code as data", a "program" is just a data structure that is evaluated according to some semantics specified in an environment.

Since code is data and data is code, loading a program into memory is the same as loading a data structure. Our data serialization format can also be used as the representation of a program.

VM Instructions

Our little VM is going to have 12 basic instructions:

  • PUSH_NIL
  • PUSH_1
  • PUSH_INT
  • PUSH_FLOAT
  • PUSH_STR
  • POP_NAME
  • POP_LATER
  • POP_PAIR
  • POP_SEND
  • POP_MSG
  • POP_ARRAY
  • POP_BLOCK

Let's explore them one by one:

PUSH_NIL: pushes the nil value onto the stack

PUSH_1: pushes the integer 1 onto the stack

Since 1 is going to be used as the value for true and a common value in itself it gets its own instruction.

PUSH_INT: push an integer

The value to push is in the next 8 bytes / 64 bits in the bytecode stream.

PUSH_FLOAT: push a float

Like PUSH_INT the value to push comes after the instruction, the value is encoded using the IEEE-754 Floating Point binary representation.
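As a small illustration of what that immediate looks like, here is a JavaScript sketch that encodes and decodes it. It assumes a 64-bit double stored little-endian (the byte order Wasm memory uses); the helper names encodeFloatImmediate and decodeFloatImmediate are made up for this example.

```javascript
// Hypothetical helpers, not part of the runtime: they show the 8 bytes
// that would follow a PUSH_FLOAT instruction, assuming a little-endian f64.
function encodeFloatImmediate(value) {
  const bytes = new Uint8Array(8);
  new DataView(bytes.buffer).setFloat64(0, value, true);
  return bytes;
}

function decodeFloatImmediate(bytes) {
  return new DataView(bytes.buffer, bytes.byteOffset, 8).getFloat64(0, true);
}

const encoded = encodeFloatImmediate(1.5);
console.log(Array.from(encoded)); // [0, 0, 0, 0, 0, 0, 248, 63]
console.log(decodeFloatImmediate(encoded)); // 1.5
```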

PUSH_STR: push a string

The immediate 64 bit value is split in two:

  • The top 32 bits represent the length of the string to read
  • The bottom 32 bits represent the byte offset where the string starts in memory
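
The split above can be sketched in JavaScript with BigInt. The function names are invented for illustration and both fields are assumed to be unsigned 32-bit values; the real runtime does this in Wasm.

```javascript
// Hypothetical helpers: pack and unpack the 64-bit PUSH_STR immediate.
function packStrImmediate(length, offset) {
  // top 32 bits: string length, bottom 32 bits: byte offset in memory
  return (BigInt(length) << 32n) | BigInt(offset);
}

function unpackStrImmediate(imm) {
  return {
    length: Number(imm >> 32n),
    offset: Number(imm & 0xffffffffn),
  };
}

const strImm = packStrImmediate(5, 1024);
console.log(unpackStrImmediate(strImm)); // { length: 5, offset: 1024 }
```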

POP_PAIR: pops the two top values from the stack and pushes a pair with them back to the stack

The order in which the two items are encoded allows loading a cons list in the order the values are read in the bytecode stream, for example:

1 : 2 : 3

Is encoded as:

  • PUSH_INT 3
  • PUSH_INT 2
  • POP_PAIR
  • PUSH_1
  • POP_PAIR

POP_ARRAY: pops N items from the top of the stack and pushes an array with the items in reverse order.

This operation builds the array in reverse to allow the brave humans encoding programs by hand to enter the array items in order.

The instruction first pops the array length from the top of the stack, allocates an array of that size and then pops N items setting them in the array from back to front.

To create an array with the values 10, 20, 30 the instructions are:

  • PUSH_INT 10
  • PUSH_INT 20
  • PUSH_INT 30
  • PUSH_INT 3
  • POP_ARRAY

POP_BLOCK: same as POP_ARRAY but pushes a block back instead

POP_LATER: pops the top of the stack and pushes a Later wrapping the value back onto it

POP_NAME: pops a string from the top and pushes a name using the string back

POP_MSG: pops a value (obj) and a string (verb) and pushes a message back

POP_SEND: pops a value (obj) a string (verb) and another value (subj) and pushes a send back

Notice that all "composite" types except Pair encode the values in the byte stream in the natural order they are entered from left to right.
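
To make the stack discipline concrete, here is a small JavaScript sketch of a few of the instructions. It is not the Wasm implementation: the stack is a plain array, a program is a list of [opcode, immediate] pairs, PUSH_STR carries the string itself instead of a length/offset immediate, and the object shapes are invented for illustration. It also assumes POP_PAIR uses the top of the stack as the left side, which is how the 1 : 2 : 3 example above reads.

```javascript
// Sketch only: names and value shapes are hypothetical.
const NIL = null;

function evalInstr(stack, [op, imm]) {
  switch (op) {
    case "PUSH_NIL": stack.push(NIL); break;
    case "PUSH_1": stack.push(1); break;
    case "PUSH_INT": stack.push(imm); break;
    case "PUSH_STR": stack.push(imm); break;
    case "POP_PAIR": {
      // assumption: the top of the stack becomes the left side (a)
      const a = stack.pop(), b = stack.pop();
      stack.push({ type: "Pair", a, b });
      break;
    }
    case "POP_ARRAY": {
      const n = stack.pop();
      const items = new Array(n);
      // fill back to front so items end up in the order they were pushed
      for (let i = n - 1; i >= 0; i--) items[i] = stack.pop();
      stack.push(items);
      break;
    }
    case "POP_SEND": {
      // entered as subj, verb, obj, so they come off in reverse
      const obj = stack.pop(), verb = stack.pop(), subj = stack.pop();
      stack.push({ type: "Send", subj, msg: { type: "Msg", verb, obj } });
      break;
    }
    default:
      throw new Error("unknown instruction: " + op);
  }
  return stack;
}

// Runs every instruction against a fresh stack and returns the stack.
function run(program) {
  return program.reduce(evalInstr, []);
}

// 1 : 2 : 3
console.log(JSON.stringify(run([
  ["PUSH_INT", 3], ["PUSH_INT", 2], ["POP_PAIR"], ["PUSH_1"], ["POP_PAIR"],
])[0]));

// [10, 20, 30]
console.log(run([
  ["PUSH_INT", 10], ["PUSH_INT", 20], ["PUSH_INT", 30], ["PUSH_INT", 3], ["POP_ARRAY"],
])[0]);
```

If the instructions are set up correctly the returned stack has exactly one item, the data structure we wanted to build.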

VM Implementation

With this design we can implement a set of functions that evaluate the instructions and operate on a stack, if the instructions are set up correctly the end result is a stack with a single item that represents the program or data structure we wanted to build.

The most basic function to implement is a function that takes a stack, an instruction and an optional immediate value and returns a new stack with the result of executing the instruction on the input stack.

Let's call it $vmEvalInstr, we can implement it using the br_table instruction which lets us implement a pretty basic and verbose switch statement.

On top of it we can implement a function that takes a stack and a program counter, reads the next instruction at the memory location pointed by the program counter, if the instruction needs an immediate value it loads it from the next memory position, calls $vmEvalInstr and returns the new stack and the new position for the program counter (since depending on the instruction it may advance one or more bytes).

Let's call it $vmEvalNextInstr. This function uses the multiple return feature of WebAssembly to return both values.

If the instruction is not in the valid range of instructions the function returns the stack and program counter as they were passed, this means that any instruction outside the range of valid instructions works as the halt instruction.

Finally we want to run a complete program, for that we can define a function that takes a stack, a program counter where the program should start and runs the next instruction until the returned program counter is equal to the previous one.

This means the program encountered a bad/halt instruction, for consistency let's define a standard HALT instruction with the value 255, it has no special meaning or implementation, just a standard invalid instruction that marks the end of a program.

Let's call that function $vmEvalRun.

To make sure our program terminates we should append an invalid instruction at the end, just in case the remaining memory accidentally contains valid instructions (a memory initialized all with 0s is a valid program that pushes nils forever).
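
The run loop described above can be sketched in JavaScript. This mirrors the shape of $vmEvalNextInstr and $vmEvalRun but it is not the Wasm code: only three instructions are handled, and the opcode numbers are assumptions except for PUSH_NIL = 0 (implied by the remark about zeroed memory) and HALT = 255.

```javascript
// Sketch only: PUSH_1 = 1 and PUSH_INT = 2 are made-up opcode numbers.
const PUSH_NIL = 0, PUSH_1 = 1, PUSH_INT = 2, HALT = 255;

// Returns [stack, pc]. An unknown opcode leaves both untouched,
// which is what makes it behave like a halt instruction.
function vmEvalNextInstr(stack, pc, mem) {
  const view = new DataView(mem.buffer, mem.byteOffset, mem.byteLength);
  switch (mem[pc]) {
    case PUSH_NIL:
      return [[...stack, null], pc + 1];
    case PUSH_1:
      return [[...stack, 1n], pc + 1];
    case PUSH_INT:
      // 8-byte immediate right after the opcode, little-endian like Wasm memory
      return [[...stack, view.getBigInt64(pc + 1, true)], pc + 9];
    default:
      return [stack, pc];
  }
}

// Keeps evaluating until the program counter stops advancing.
function vmEvalRun(stack, pc, mem) {
  for (;;) {
    const [nextStack, nextPc] = vmEvalNextInstr(stack, pc, mem);
    if (nextPc === pc) return nextStack; // no progress: bad/halt instruction
    stack = nextStack;
    pc = nextPc;
  }
}

const mem = new Uint8Array(16);
mem.set([PUSH_1, PUSH_INT, 42, 0, 0, 0, 0, 0, 0, 0, PUSH_NIL, HALT]);
console.log(vmEvalRun([], 0, mem)); // [ 1n, 42n, null ]
```

Without the HALT byte the zeroed tail of mem would be read as more PUSH_NIL instructions, which is exactly the accident the trailing invalid instruction protects against.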

With these three functions and one for the implementation of each instruction we are ready to load our binary encoded prelude, but we haven't defined how the stack is implemented.

To take advantage of the code we already have we can implement it as a cons list using the Pair data type.
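
A stack built out of pairs could look like this in JavaScript, with null standing in for nil and { a, b } standing in for Pair. The helper names push, peek and pop are just for this sketch.

```javascript
// Sketch: the stack is a cons list, the head is the top of the stack.
const push = (stack, value) => ({ a: value, b: stack });
const peek = (stack) => stack.a;
const pop = (stack) => stack.b;

let s = null; // empty stack
s = push(s, 10);
s = push(s, 20);
console.log(peek(s)); // 20
s = pop(s);
console.log(peek(s)); // 10
```

Pushing never mutates the previous list, so returning "a new stack" from each instruction costs one pair allocation.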

Encoding the Prelude

Now it's time to encode our prelude in binary.

Since I still have computers around I adapted the existing parser to generate an AST that can emit its binary representation: fatt.ast.js

It also does the string layout in memory for the PUSH_STR instruction for us.

With this adapted parser we can parse the prelude and get the binary bytecode back, the implementation above generates a bytecode stream 4676 bytes long.

Optimizing the Encoding for Input Happiness

Thinking about the person that may have to read and write the bytes manually we can do some small optimizations.

The first is to notice that the prelude is full of repeated symbols, the most common ones are it and that. If we define instructions for PUSH_SYM_IT and PUSH_SYM_THAT then the prelude takes 4140 bytes.

The next observation is that many integers are really small, using 8 bytes to write an integer or the length of something small is 7 bytes too many, the second optimization I implemented is to check if the instruction has the 8th bit set, if it does then the immediate read from memory is only one byte long.

I could have used a variable length encoding for numbers like Wasm does, but it would take more code to implement and more effort to encode by hand.

With this optimization the prelude takes 3811 bytes.
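
Here is a rough JavaScript sketch of how reading an instruction with a short or long integer immediate could work. The details are assumptions: bit 0x80 marks the short form, the low 7 bits hold the opcode, the one-byte immediate is unsigned, and the function is only meant to be called on instructions that take an integer immediate.

```javascript
// Sketch only: readInstr and its return shape are hypothetical.
function readInstr(mem, pc) {
  const byte = mem[pc];
  const isShort = (byte & 0x80) !== 0;
  const opcode = byte & 0x7f;
  if (isShort) {
    // short form: a single byte follows the instruction
    return { opcode, imm: BigInt(mem[pc + 1]), nextPc: pc + 2 };
  }
  // long form: a full 8-byte little-endian immediate follows
  const view = new DataView(mem.buffer, mem.byteOffset, mem.byteLength);
  return { opcode, imm: view.getBigInt64(pc + 1, true), nextPc: pc + 9 };
}

// The same value (7) for opcode 2, first in short form and then in long form.
const bytes = new Uint8Array([0x82, 7, 0x02, 7, 0, 0, 0, 0, 0, 0, 0]);
console.log(readInstr(bytes, 0)); // { opcode: 2, imm: 7n, nextPc: 2 }
console.log(readInstr(bytes, 2)); // { opcode: 2, imm: 7n, nextPc: 11 }
```

The short form takes 2 bytes instead of 9, which is where the savings come from when most integers are small.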

We could continue optimizing but you get the idea :)

Bootstrap post-collapse OOP technology with Wasm GC (Part 2)

In Bootstrap post-collapse OOP technology (Part 1) I sketched the minimal set of features a language has to provide to allow booting/bootstrapping the rest of our minimal object oriented language in itself.

But if we are bootstrapping post collapse, we have to run the bootstrapping "prelude" in some implementation on top of hardware we can find.

Collapse OS selected Z80, 8086, 6809 and 6502 machines.

In The Birth & Death of JavaScript (2014) Gary Bernhardt proposes a future where WebAssembly is the foundation of computing.

If we assume WebAssembly chips to be available we can consider writing an implementation for our language in raw WebAssembly.

If no Wasm native chips are available we can write some transpilers or emulators on top of the chips we can find while keeping a single implementation instead of multiple ones like Collapse OS does.

To bootstrap our stack we have to find/remember the language implementation, the WebAssembly bytecodes and hand translate the WebAssembly Text Format to the binary instructions, start our implementation, run the bootstrapping prelude and we are done.

The drifters are all former intellectuals. They have each memorized books should the day arrive that society comes to an end, with the survivors learning to embrace the literature of the past.

-- Fahrenheit 451

I may not be the one who remembers the whole implementation by memory, but surely I can write it again :)

Here is an implementation of the features needed to run the prelude in WebAssembly: fatt.wat.

It uses some features that are not yet part of the standard (GC, reference types) but are already available on most browsers (firefox and chrome-ish) with safari support on the way.

The implementation is ~1500 lines of Wasm Text, when compiled the wasm file weighs 5.8KB, 2.9KB gzipped.

The implementation is complete and can run a slightly adapted version of the bootstrap code presented in the previous post.

The adaptation is only to split binding of values and handlers in two different messages to make the types work.

I also moved the implementation of < for () and = for Name from the primitive environment to the bootstrap code.

As a comparison, the rpython implementation described in The proof is in the meta-tracing JIT interpreter is 759 lines of rpython (excluding the parser) and the binary weighs 6.3 MB with JIT and 619 KB without JIT.

If you want to play with it check the commands in the justfile, here are the tools and versions I used:

  • wasm-tools version 1.212.0 to compile the wat to wasm
  • deno version 1.45.5 to run the tests (with some V8 experimental flags)
  • wasm-opt version 116 to optimize the wasm file

Later versions should work fine, you may have to remove or modify some of the experimental flags.

If you would like to learn more about WebAssembly there's a book you may like: WebAssembly from the Ground Up

Bootstrap post-collapse OOP technology (Part 1)

Quoting collapseos.org:

Winter is coming and Collapse OS aims to soften the blow. It is a Forth operating system and a collection of tools and documentation with a single purpose: preserve the ability to program microcontrollers through civilizational collapse.

But how do we preserve the ability to program OOP through civilizational collapse?

In previous posts: Make Your Self, Macros and optimizations: it's just a phase and The proof is in the meta-tracing JIT interpreter I described the design and implementation of a minimal object oriented language with implementations in JavaScript and the PyPy toolchain.

In this one I will sketch the minimal set of features the language has to provide to allow booting/bootstrapping the rest of the language in itself.

For future reference this post is written with the code on this commit in mind:

From the primitive environment definition we can extract the set of methods that are required (at least for now) to define the rest:

For type Nil (the only falsy value):

  • eval returns itself
    • () 👉 ()
  • < less than (always false)
  • = equals, returns 1 if the left hand side (subject) and right hand side (object) are both nil
    • () = () 👉 1

For Int and Float types the implementations work as you would expect:

  • eval returns itself
  • <
  • =
  • +, -, *, /

For String:

  • eval returns itself
  • <
  • =
  • + string concatenation
  • size returns the string length

For Name, a binding to a value in the environment:

  • eval does a lookup for the name in the environment and returns it
    • note that the name e always returns the current environment
    • foo 👉 10 (if foo was previously bound to 10)
  • name returns the name as a string
    • @foo name () 👉 "foo"

For Later:

  • eval returns its value unevaluated
    • @ a 👉 a (returns the name, doesn't evaluate it)
    • @(a + 2) 👉 a + 2 (returns the message send, doesn't evaluate it)

For Pair:

  • eval returns a pair with a and b evaluated
  • a returns the left value
    • 10 : 20 a () 👉 10
  • b returns the right value
    • 10 : 20 b () 👉 20

For Array:

  • eval returns an array with all items evaluated
    • [1, 2, a] 👉 [1, 2, 3] (if a was bound to 3)
  • size returns the array length
    • [1, 2, 3] size () 👉 3
  • . returns the array item at the provided index
    • [1, 2, 3] . 0 👉 1
    • [1, 2, 3] . 2 👉 3

For Block:

  • eval evals all items and returns the result of the last one
    • {1, 2, 3} 👉 3

For Msg, what you would call a method, consists of a "verb" and an "object":

  • eval returns a new message with the object evaluated
    • \ + a 👉 \ + 42 (if a was bound to 42)
  • verb returns the verb (a string)
    • \ + 10 verb () 👉 "+"
  • obj returns the object (can be of any type, even another message)
    • \ + 10 obj () 👉 10

For Send, what you would call a method call, consists of a "subject" and a message like 1 + 2:

  • eval

    • evaluates the subject
    • evaluates the message
    • enters a new stack frame
    • binds the evaluated subject to it
    • binds the evaluated message to msg
    • binds the evaluated object to that
    • sends the message to the subject in the new environment
  • subj returns the subject

    • @(1 + 2) subj () 👉 1
  • msg returns the message
    • @(1 + 2) msg () 👉 \ + 2

For Frame:

  • eval returns itself
    • e 👉 returns the current environment
  • up returns the frame parent
    • e up () 👉 returns the parent of the current environment
  • eval-in evaluates the object in the subject environment
    • e eval-in 42 👉 42
  • bind binds the value b with the name in a, the object must be a pair
    • e bind "answer" : 42 👉 returns the environment
  • find does a lookup in the subject environment for the object
    • e find "answer" 👉 the value bound to answer in e
    • 42 in the bind above
  • get-type returns the type of the subject
    • e get-type 1 👉 a value representing the type for Int
    • e get-type e 👉 a value representing the type for Frame
  • new-frame returns an empty root environment
    • e new-frame () 👉 a new Frame instance

With those message handlers in the environment we can start our booting process in the language itself, first we need a way to bind new message handlers:

e find (e get-type e)
  bind "reply" : @{
    e find (e get-type (e eval-in (msg obj() a() subj())))
      bind ((msg obj() a() msg() verb()) : (msg obj() b())),
    ()
  },

This one is the most complex so don't worry if you don't understand it :)

The trick is to find in the environment the value bound to the type of the environment itself (Frame), you can think of it as the prototype of the type Frame (which is itself an instance of the Frame type 🐢).

We send the message bind "reply" : @{...} to the returned value to bind an implementation of the message reply when sent to an instance of type Frame (like the current environment).

The implementation of reply expects a pair:

  • the left side should be a message "example" of the type of message we want to reply to
  • the right side is the implementation, a value to evaluate to reply to that message

The high level implementation of reply is

e find $subjectType bind $verb : $implementation.

We get the $subjectType from the subject on the message in the argument's left side.

For example in e reply @(1 add _) : 42 we want to register an implementation for the message add for the type int, the implementation will always return 42.

We get the type by evaluating the subject 1 and asking for its type:

(e get-type (e eval-in (msg obj() a() subj())))

We get the $verb from the message on the left side of the argument:

(msg obj() a() msg() verb())

And the $implementation from the right value:

(msg obj() b())

The message's object is ignored, most of the time it will be a placeholder like _.

In a different implementation we could bind "multimethods" by using the object's type and registering the message handlers for the pair of types in the subject and object, but to keep it simple we just use the subject's type.
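
If the nested lookups are hard to follow, the idea behind reply can be modeled in a few lines of JavaScript: handlers are found by the subject's type first and by the verb second. Everything here is a simplified stand-in with invented names, and unlike the real language JavaScript can't tell 1 from 1.0, so whole numbers count as Int.

```javascript
// Sketch only: a registry of handlers keyed by subject type, then by verb.
const handlersByType = new Map();

function getType(value) {
  if (value === null) return "Nil";
  if (typeof value === "string") return "Str";
  if (Number.isInteger(value)) return "Int";
  return "Float";
}

// Like `e reply @(subject verb _) : impl`: only the subject's type matters,
// the message's object is ignored when registering.
function reply(exampleSubject, verb, impl) {
  const type = getType(exampleSubject);
  if (!handlersByType.has(type)) handlersByType.set(type, new Map());
  handlersByType.get(type).set(verb, impl);
}

function send(subject, verb, object) {
  const handler = handlersByType.get(getType(subject))?.get(verb);
  if (!handler) throw new Error(`no handler for ${verb} on ${getType(subject)}`);
  return handler(subject, object);
}

reply(1, "add", () => 42); // like: e reply @(1 add _) : 42
console.log(send(10, "add", 5)); // 42
```

Supporting multimethods in this model would mean keying the inner map by the pair (verb, object type) instead of just the verb.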

Now we can define methods!

The first one is the message -> sent to the Send type, it's a convenience method to turn:

e reply @(()  not _) : 1,

Into:

@(()  not _) -> 1,

Which in my opinion looks much nicer, the implementation:

e reply @(@(subj verb obj) -> body) : @(e reply it : that),

You can see it like pattern matching, if you see a message like this:

@(subj verb obj) -> body

Run this:

e reply it : that

Remember that the message subject is bound to it and the object to that.

With our nice syntax to define methods we are ready to start growing our language, let's start by defining a way to bind variables, the syntax is going to be @myVar is 42, which will bind 42 to the name myVar:

@(@a is _) -> @(e up () bind (it name ()) : that)

Because a call to a method enters its own frame we want to bind the variable in the parent's frame, since after returning from this method the current frame will be discarded and with it all local bindings, that's why we do e up () and then we bind the object (that) to the name of the variable in the subject.

Those are the most complex and low level implementations, now let's move to boolean operators. We don't need any in our primitive environment because we can define different implementations for different types.

Since () is the only falsy value its implementation has to return any truthy value, we choose to return 1, but it could be 0 or anything else.

For the other types we return (), since the negation of any non-falsy value is a falsy value.

@(()  not _) -> 1,
@(0   not _) -> (),
@(0.0 not _) -> (),
@(""  not _) -> (),

Moving to boolean or, if the left hand side is falsy we evaluate the right hand side, this implementation is needed for nil, since it's the only falsy value.

We evaluate the right hand side to allow for short-circuit semantics if the right hand side is wrapped in a Later, in Macros and optimizations: it's just a phase I show how to write a phase that automatically wraps the right hand side for us to get short-circuit booleans without any special syntax or special core language semantics.

For all other types or returns the left hand side:

@(()  or _) -> @(e eval-in that),
@(0   or _) -> @it,
@(0.0 or _) -> @it,
@(""  or _) -> @it,

Boolean and is the reverse, for nil it returns nil without evaluating the right hand side, for all other values it returns the result of evaluating the right hand side:

@(()  and _) -> (),
@(0   and _) -> @(e eval-in that),
@(0.0 and _) -> @(e eval-in that),
@(""  and _) -> @(e eval-in that),

Simple conditionals can be implemented with the ternary operator, the message shape is condition ? ifTrue : ifFalse, the implementation for nil evals the right side of the pair (again, to allow Later and avoid evaluating the side not taken), the rest evaluate the left side:

@(()  ? _) -> @(e eval-in (that b ())),
@(0   ? _) -> @(e eval-in (that a ())),
@(0.0 ? _) -> @(e eval-in (that a ())),
@(""  ? _) -> @(e eval-in (that a ())),
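
The handlers for not, or, and and ? above all follow from "() is the only falsy value". A rough JavaScript model, with null standing in for () and a zero-argument function standing in for a Later (the helper names are invented for this sketch):

```javascript
// Sketch only: models the semantics, not the message dispatch.
const NIL = null;
// evaluating a "Later" runs it, anything else evaluates to itself
const evalIn = (v) => (typeof v === "function" ? v() : v);

const not = (it) => (it === NIL ? 1 : NIL);
const or = (it, that) => (it === NIL ? evalIn(that) : it);
const and = (it, that) => (it === NIL ? NIL : evalIn(that));
const cond = (it, ifTrue, ifFalse) => evalIn(it === NIL ? ifFalse : ifTrue);

const boom = () => { throw new Error("should not be evaluated"); };

console.log(not(0));            // null, because 0 is truthy here
console.log(or(0, boom));       // 0, the right hand side is never evaluated
console.log(and(NIL, boom));    // null, short-circuits
console.log(cond("", 7, 8));    // 7, the empty string is truthy too
console.log(cond(NIL, boom, () => 2)); // 2, only the taken side runs
```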

Not equals can be implemented in terms of the primitive = and negation, which we defined above:

@(()  != _) -> @(it = that not()),
@(0   != _) -> @(it = that not()),
@(0.0 != _) -> @(it = that not()),
@(""  != _) -> @(it = that not()),

Greater or equals can be implemented in terms of the primitives = and < and the boolean operators defined above, we do some extra work to return the left hand side if the comparison is true to allow chaining comparisons like 3 >= 2 >= 1 👉 3:

@(()  >= _) -> @(that = it),
@(0   >= _) -> @((it = that) or (that < it) and it),
@(0.0 >= _) -> @((it = that) or (that < it) and it),
@(""  >= _) -> @((it = that) or (that < it) and it),

Similar implementation for less or equals:

@(()  <= _) -> @(that = it),
@(0   <= _) -> @((it < that) or (it = that)),
@(0.0 <= _) -> @((it < that) or (it = that)),
@(""  <= _) -> @((it < that) or (it = that)),

And greater than:

@(()  > _) -> (),
@(0   > _) -> @((it <= that not()) and it),
@(0.0 > _) -> @((it <= that not()) and it),
@(""  > _) -> @((it <= that not()) and it),

We can define empty? for strings and arrays in terms of size and =

@("" empty?()) -> @(it size () = 0),
@([] empty?()) -> @(it size () = 0),

And some sample code to test our implementations:

[
  () not(), 1 not(), 1.0 not(), "" not(),

  () or 10, 10 or 11, 1.5 or 2, "" or 1,

  () and 1, 1 and 2, 2.5 and 3, "" and 4,

  ()  ? 1 : 2,
  1   ? 3 : 4,
  1.1 ? 5 : 6,
  "!" ? 7 : 8,

  () != (), 1 != 1, 1.1 != 1.1, "" != "",
  () != 0, 1 != 2, 1.1 != 1.2, "" != ".",

  () >= (), 1 >= 1, 1.1 >= 1.1, "a" >= "a",
  3 >= 2, 1.3 >= 1.2, "c" >= "b",
  1 >= 2, 1.1 >= 1.2, "a" >= "b",

  () <= (), 1 <= 1, 1.1 <= 1.1, "a" <= "a",
  3 <= 2, 1.3 <= 1.2, "c" <= "b",
  1 <= 2, 1.1 <= 1.2, "a" <= "b",

  () > (), 1 > 1, 1.1 > 1.1, "a" > "a",
  3 > 2, 1.3 > 1.2, "c" > "b",
  1 > 2, 1.1 > 1.2, "a" > "b",

  "" empty?(),
  [] empty?(),
  "1" empty?(),
  [1] empty?()
]

And now we can move on to bootstrap civilization the OOP way :)

The proof is in the meta-tracing JIT interpreter

In the previous posts: Make Your Self and Macros and optimizations: it's just a phase I described the design and implementation of a minimal object oriented language with the implementation done in JavaScript.

When we design our language on top of a high level language we may reuse accidentally or on purpose a lot of the host language semantics and make it hard to tell if our design is our own or we are just an alternative syntax on top of the host language.

To make sure the design is sound, to show an alternative way of implementing the language and to get a nice native JIT interpreter I did what anybody would do.

I implemented an interpreter on top of rpython, the toolchain that allows pypy to be a really fast Python interpreter implemented in "Python".

Before you say "you are trying to avoid the problem of using a high level language like JavaScript by using Python?", the quotes around Python are because the language used to implement the interpreter is a restricted subset of Python called rpython, short for "restricted python" which defines itself as: "RPython is everything that our translation toolchain can accept". In the end rpython is a language that looks like a subset of python but restricts programs written with it to be statically typed, it's just that all the types are inferred, you can read more about the restrictions here: RPython Language.

The implementation is in the mclulang/pypyfatt folder, follow the instructions in the readme if you want to build it yourself, if you want to believe me, it does everything fatt.cli.js does, the parsers may have some corner cases between them but I will fix them as I see them.

RPython build output

I came for the meta-tracing JIT compiler, I stayed for the fractals

Macros and optimizations: it's just a phase

This post builds upon the previous one where I built the smallest OOP language I could based only on message dispatch. Everything is as late bound as possible, even the language semantics, the core only has basic data types, a call stack and logic for message dispatch.

In the previous post I implemented the semantics for evaluating expressions with enough message handlers to show arithmetic operations and binding lookups, here we are going to show that those semantics are not special, they can be just a phase in a sequence of phases that evaluate the same expression tree using different semantics.

To show an example of this, in this post I'm going to add two phases before what we will now call the run phase:

  • macro phase: it quotes the right thing ™️ (if not already quoted) in reply handler definitions, name bindings and implements support for short circuit boolean operators by quoting the right hand side. When this phase is present the programmer doesn't have to quote anything.

  • opt phase: it applies some basic optimizations like evaluating constant sub expressions, removing unnecessary wraps and so on.

But an important idea here is that the fixed "compile time" and "runtime" phases in most programming languages are an arbitrary restriction and we can have as many "times" as we want.

Note that the macros and optimizations implemented here assume semantics from the run phase, since the language core has no semantics, only message dispatch, there can be many implementations of semantics for the run phase, this means that any phase that comes before assumes it's running in a sequence of phases that share the same language semantics.

In a less abstract way, the new phases assume name binding and lookup; arithmetic, comparison, boolean operations and conditionals behave as you would expect, which is what's implemented in the current run phase.

The macro phase assumes that replies and is have a specific meaning, that and and or are short circuit boolean operations, otherwise the wrapping it does makes no sense. Also that ints, floats and strings evaluate to themselves with no side effects.

The opt phase assumes arithmetic, comparison and boolean operations work as expected, otherwise it can't replace them with their result.

The key takeaway from this post, and one of the objectives of this language, is to show that you can not only play with language semantics by implementing your own run phase, but also add as many extra phases as you want.

There is nothing special about these phases, they run with the following code:

function main(code) {
  console.log("> ", code);
  console.log("");

  const eToStr = bindReplies(mergeToStr({})).right();
  runPhases(
    code,
    [
      { name: "macro", e: macroPhase().right() },
      { name: "opt", e: optPhase().right() },
      { name: "run", e: runPhase().right() },
    ],
    ({ name }, input, output) => {
      console.log("#  ", name);
      console.log("in ", toStr(input, eToStr));
      console.log("out", toStr(output, eToStr));
      console.log("");
    },
  );
}

You can see the array as a second argument to runPhases, just replace them with your own and have fun!
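
To show how little machinery a phase pipeline needs, here is a simplified sketch of the idea behind runPhases. It is not the code from the repo: runPhasesSketch, the { eval } shape of each environment and the array-based toy expressions are all made up for this example, and the input is already "parsed".

```javascript
// Sketch only: each phase evaluates the output of the previous one.
function runPhasesSketch(input, phases, onPhase) {
  let current = input;
  for (const phase of phases) {
    const output = phase.e.eval(current);
    onPhase(phase, current, output);
    current = output;
  }
  return current;
}

// Toy expression: ["+", 1, 2]. Folds the addition if both sides are numbers.
const fold = (x) =>
  Array.isArray(x) && x[0] === "+" &&
  typeof x[1] === "number" && typeof x[2] === "number"
    ? x[1] + x[2]
    : x;

const result = runPhasesSketch(
  ["+", 1, 2],
  [
    { name: "macro", e: { eval: (x) => x } }, // nothing to expand
    { name: "opt", e: { eval: fold } },       // constant folding
    { name: "run", e: { eval: fold } },       // 3 evaluates to itself
  ],
  ({ name }, input, output) => console.log(name, input, "->", output),
);
console.log(result); // 3
```

Adding a phase is just adding an entry to the array, which is the whole point.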

Before starting I want to note that all the code in this post is based on the following commit: 7302325, the code in the repo may change in the future and not work the same way as in this post.

Also notice that the modules here import from fatter.js instead of fatt.js, the only difference is that fatter.js has better error handling and an extra method in Frame: getSendHandler.

We are going to see examples and explain what happens on each phase, all examples are going to look like this:

>  1 + 2

#   macro
in  1 + 2
out 1 + 2

#   opt
in  1 + 2
out 3

#   run
in  3
out 3

The first line is the input, then for each phase you will see the name of the phase in the first line (macro, opt, run) then the input to the phase (in) followed by the output (out).

In the example above the macro does nothing:

#   macro
in  1 + 2
out 1 + 2

The opt phase evaluates 1 + 2 since it's a constant message send:

#   opt
in  1 + 2
out 3

The run phase gets 3 as input which evaluates as itself.

#   run
in  3
out 3

Below the same example for floats:

>  1.5 + 1.2

#   macro
in  1.5 + 1.2
out 1.5 + 1.2

#   opt
in  1.5 + 1.2
out 2.7

#   run
in  2.7
out 2.7

The optimization for both Int and Float is the same, we define reply handlers for all arithmetic and comparison operations in the opt phase using cBinOp and cCompOp helpers:

    "+": cBinOp((a, b) => a + b),
    "-": cBinOp((a, b) => a - b),
    "*": cBinOp((a, b) => a * b),
    "/": cBinOp((a, b) => a / b),
    "=": cCompOp((a, b) => a === b),
    "!=": cCompOp((a, b) => a !== b),
    ">": cCompOp((a, b) => a > b),
    ">=": cCompOp((a, b) => a >= b),
    "<": cCompOp((a, b) => a < b),
    "<=": cCompOp((a, b) => a <= b),

The helpers do almost the same thing:

  function cBinOp(fn) {
    return (s, m) => {
      if (typeof m.obj === typeof s) {
        return fn(s, m.obj);
      } else {
        return new Send(s, m);
      }
    };
  }

  function cCompOp(fn) {
    return (s, m) => {
      if (typeof m.obj === typeof s) {
        return fn(s, m.obj) ? s : NIL;
      } else {
        return new Send(s, m);
      }
    };
  }

cBinOp checks if the subject and the message object are of the same type (which will be the type of the type handling the message) and if so it calls fn with both.

If they are not of the same type (for example in the expression 1 + a) it returns the original message Send, but notice that since this phase, like a normal "run" phase, evaluates the subject and message before calling the handler, it means that both subject and object are going to be potentially optimized before the new Send(s, m) is constructed.

This also means that if an optimization happened on the subject or object that turned them into constants this one will optimize the constant expression even further.

For example, the expression 1 + 2 + 4 is two chained message sends: (1 + 2) + 4:

  • evals the first send (1 + 2)
    • which evals the subject 1
    • and the object 2
    • calls the handler for the verb + since it's defined in the opt phase with 1 as subject and 2 as object
    • this returns 3 since the expression is constant
  • with 3 as the result it sends + 4 to it
    • which evals the subject 3
    • and the object 4
    • calls the handler for the verb + with 3 as subject and 4 as object
    • this returns 7 since the expression is constant
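
The walk above can be reproduced with a self-contained sketch. Send, Msg, Name and optEval here are simplified stand-ins written for this example, not the classes and handlers from fatter.js, and only numbers count as constants.

```javascript
// Sketch only: constant folding over chained message sends.
class Msg { constructor(verb, obj) { this.verb = verb; this.obj = obj; } }
class Send { constructor(subj, msg) { this.subj = subj; this.msg = msg; } }
class Name { constructor(value) { this.value = value; } }

const ops = new Map([
  ["+", (a, b) => a + b],
  ["*", (a, b) => a * b],
]);

function optEval(expr) {
  if (!(expr instanceof Send)) return expr; // constants and names stay as they are
  // evaluate subject and object first, so inner sends get folded before this one
  const subj = optEval(expr.subj);
  const obj = optEval(expr.msg.obj);
  const fn = ops.get(expr.msg.verb);
  if (fn && typeof subj === "number" && typeof obj === "number") {
    return fn(subj, obj);
  }
  // not constant: rebuild the send with whatever could be optimized
  return new Send(subj, new Msg(expr.msg.verb, obj));
}

// (1 + 2) + 4
console.log(optEval(new Send(new Send(1, new Msg("+", 2)), new Msg("+", 4)))); // 7

// (1 + 2) + a: the inner send is folded, the outer one is kept
const partial = optEval(
  new Send(new Send(1, new Msg("+", 2)), new Msg("+", new Name("a"))),
);
console.log(partial.subj, partial.msg.verb, partial.msg.obj.value); // 3 + a
```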

In this language comparisons return () if the comparison is "false" but return the subject if "true":

>  1 > 2

#   macro
in  1 > 2
out 1 > 2

#   opt
in  1 > 2
out ()

#   run
in  ()
out ()

This allows chaining of comparisons:

>  1 < 2 < 3

#   macro
in  1 < 2 < 3
out 1 < 2 < 3

#   opt
in  1 < 2 < 3
out 1

#   run
in  1
out 1

>  3 > 2 > 1

#   macro
in  3 > 2 > 1
out 3 > 2 > 1

#   opt
in  3 > 2 > 1
out 3

#   run
in  3
out 3

>  1 > 3 < 2

#   macro
in  1 > 3 < 2
out 1 > 3 < 2

#   opt
in  1 > 3 < 2
out ()

#   run
in  ()
out ()

A failed comparison in a chain stays failed because the optimization phase defines the following replies for comparison operations on Nil:

    ">": () => NIL,
    ">=": () => NIL,
    "<": () => NIL,
    "<=": () => NIL,

The same optimization works for strings:

>  "hello " + "joe"

#   macro
in  "hello " + "joe"
out "hello " + "joe"

#   opt
in  "hello " + "joe"
out "hello joe"

#   run
in  "hello joe"
out "hello joe"

But only implemented for the + message:

    "+": cBinOp((a, b) => a + b),

In the next example we can see that the optimization works even when the addition is a value inside a Pair:

>  (1 + 2) : 3

#   macro
in  (1 + 2) : 3
out (1 + 2) : 3

#   opt
in  (1 + 2) : 3
out 3 : 3

#   run
in  3 : 3
out 3 : 3

That's because eval for Pair in the opt phase is:

  eval: (s, _, e) => new Pair(e.eval(s.a), e.eval(s.b))

Which comes from the merge with the ident phase, it evaluates both sides of the Pair and returns a new Pair with them.

It also works inside Later:

>  @ (1 + 2)

#   macro
in  @(1 + 2)
out @(1 + 2)

#   opt
in  @(1 + 2)
out 3

#   run
in  3
out 3

The opt phase not only evaluates 1 + 2 but also unwraps the Later since it's unnecessary for a constant expression.

eval for Later in the opt phase is:

  eval(s, _, e) {
    const v = e.eval(s.value);
    return isConstantExpr(v) ? v : new Later(v);
  },

You should start to see a pattern already: the opt phase is similar to the run phase, but it only evaluates an expression if it can optimize it; if not, it returns the original. Like run, it also walks the whole expression "tree" down to the "leaves".

For Msg's obj:

>  {@foo is (1 + 42), foo}

#   macro
in  {@(foo) is (1 + 42), foo}
out {@(foo) is (1 + 42), foo}

#   opt
in  {@(foo) is (1 + 42), foo}
out {@(foo) is 43, foo}

#   run
in  {@(foo) is 43, foo}
out 43

and

>  (0 add+1 (0 + 2)) replies (0 + 1 + it + that)

#   macro
in  0 add+1 (0 + 2) replies (0 + 1 + it + that)
out @(0 add+1 (0 + 2)) replies @(0 + 1 + it + that)

#   opt
in  @(0 add+1 (0 + 2)) replies @(0 + 1 + it + that)
out @(0 add+1 2) replies @(1 + it + that)

#   run
in  @(0 add+1 2) replies @(1 + it + that)
out ()

Here the opt phase optimized 0 + 2 and 0 + 1, and the macro phase wrapped the subject and object of the replies message in a Later. The wrapping happens because replies has a handler in the macro phase:

  // no need to eval here since we were called by send
  replies: (s, m) =>
    new Send(new Later(s), new Msg(m.verb, new Later(m.obj)))

eval for Send in the opt phase is:

  eval: (s, _m, e) => {
    const subj = e.eval(s.subj),
      msg = e.eval(s.msg);
    if (e.getSendHandler(subj, msg)) {
      return e.send(subj, msg);
    } else {
      return new Send(subj, msg);
    }
  }

This one is a little more interesting: it first evaluates the subject and message and then checks if there's a handler for the message. If there is, it evaluates the message send and returns the result; if there's no message handler, it builds a new Send with the evaluated subj and msg.

This is a big difference between these two phases and the run phase: these ones leave the send as it was if they don't know what to do with it, while the run phase raises an error in that case.

Another way of seeing it is that these phases evaluate what they know how to handle and leave the rest "as is". More specifically, the macro phase expands what it knows how to expand and the optimization phase reduces what it knows how to reduce.
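To make "reduce what you know, leave the rest as is" concrete, here is a minimal sketch that doesn't use Frame at all. The function optEval and the intHandlers table are hypothetical names for this example only; the real opt phase dispatches through the environment as shown above.

```javascript
class Name { constructor(value) { this.value = value; } }
class Msg { constructor(verb, obj) { this.verb = verb; this.obj = obj; } }
class Send { constructor(subj, msg) { this.subj = subj; this.msg = msg; } }

// Hypothetical handler table: only Int (BigInt) arithmetic is known here.
const intHandlers = new Map([
  ["+", (a, b) => a + b],
  ["*", (a, b) => a * b],
]);

function optEval(v) {
  if (!(v instanceof Send)) return v; // names and constants are left alone
  const subj = optEval(v.subj);
  const msg = new Msg(v.msg.verb, optEval(v.msg.obj));
  const bothInts = typeof subj === "bigint" && typeof msg.obj === "bigint";
  const handler = bothInts ? intHandlers.get(msg.verb) : undefined;
  // Known how to reduce: do it now. Otherwise rebuild the send "as is".
  return handler ? handler(subj, msg.obj) : new Send(subj, msg);
}

// 1 + 2 + a
const expr = new Send(
  new Send(1n, new Msg("+", 2n)),
  new Msg("+", new Name("a")),
);
const out = optEval(expr);
console.log(out.subj, out.msg.verb, out.msg.obj.value);
```

The inner 1 + 2 is folded into 3, but since a is not a constant the outer send is rebuilt instead of raising an error, so the result is the expression 3 + a.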

Optimizations inside Block:

>  {1 + 2, 42, ()}

#   macro
in  {1 + 2, 42, ()}
out {1 + 2, 42, ()}

#   opt
in  {1 + 2, 42, ()}
out {3, 42, ()}

#   run
in  {3, 42, ()}
out ()

Because eval for Block in the opt phase is:

  eval: (s, _m, e) => new Block(s.value.map((item) => e.eval(item)))

The difference between eval in the opt phase and the run phase is that in the run phase it evals every item but returns the result of the last one.

In the opt phase it evals all items and returns a new Block with the evaluated items.

The eval implementation for Block in the opt phase is almost the same as the eval implementation for Array in the run phase, the only difference is that the collection is different for each.

We could do a further optimization where if all items but the last are side effect free we could return the last one because we know it won't make a difference in semantics.

A similar optimization would be to filter all Block items that are constant expressions, since their evaluation doesn't affect the result, followed by a check to see if the resulting block has one item, in that case we don't wrap it back in a Block, if the result is an empty block we return () which is the result of evaluating an empty Block.
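A rough sketch of that second idea, which is not implemented in the post. It assumes NIL is null, that constants are BigInt, number and string values, and that the last item is always kept because it is the block's result; simplifyBlock and this version of isConstantExpr are made up for the example.

```javascript
const NIL = null; // stand-in for ()

class Block { constructor(value) { this.value = value; } }
class Name { constructor(value) { this.value = value; } }

// Assumption: these are the only constant (side effect free) values.
const isConstantExpr = (v) =>
  v === NIL || ["bigint", "number", "string"].includes(typeof v);

function simplifyBlock(block) {
  const items = block.value;
  if (items.length === 0) return NIL; // an empty Block evaluates to ()
  const last = items[items.length - 1];
  // Drop constants everywhere except the last position.
  const kept = items.slice(0, -1).filter((item) => !isConstantExpr(item));
  kept.push(last);
  // A single remaining item doesn't need to be wrapped in a Block.
  return kept.length === 1 ? kept[0] : new Block(kept);
}

console.log(simplifyBlock(new Block([3n, 42n, NIL]))); // {3, 42, ()}
console.log(simplifyBlock(new Block([new Name("a"), 1n, 2n])).value.length); // {a, 1, 2}
```

With this, {3, 42, ()} would reduce to () at opt time, while {a, 1, 2} would keep a (it is not a constant) and the final 2.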

Another optimization is to pre evaluate conditional expressions where the condition is a constant, if it's Nil we can replace it with the optimization of the second item in the Pair:

>  () ? 1 : 2

#   macro
in  () ? 1 : 2
out () ? @(1 : 2)

#   opt
in  () ? @(1 : 2)
out 2

#   run
in  2
out 2

Nil has this handler for ?:

  "?": ternaryFalse

The other constants (Int, Float, Str) have this one:

  "?": ternaryTrue

Why those 3 and not all others? After all, the only falsy value is Nil...

It's the same thing we talked about in the Block optimization, we should check that all other truthy expressions are side effect free if we want to optimize them out and keep the same semantics, something doable but not done here.
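The bodies of ternaryTrue and ternaryFalse are not shown in this post, so the following is only a guess at their shape. It assumes the object of ? already is a Later holding a Pair (which is what the macro phase guarantees) and that the chosen branch is optimized further with e.eval.

```javascript
class Pair { constructor(a, b) { this.a = a; this.b = b; } }
class Later { constructor(value) { this.value = value; } }
class Msg { constructor(verb, obj) { this.verb = verb; this.obj = obj; } }

// The object of ? is expected to be @(a : b); unwrap the Later to get the Pair.
const branches = (m) => m.obj.value;

// Pick a branch and keep optimizing it (guessed implementations).
const ternaryTrue = (_s, m, e) => e.eval(branches(m).a);
const ternaryFalse = (_s, m, e) => e.eval(branches(m).b);

// Stand-in environment whose eval returns its argument unchanged.
const env = { eval: (v) => v };
const msg = new Msg("?", new Later(new Pair(1n, 2n)));

console.log(ternaryFalse(null, msg, env)); // () ? @(1 : 2)
console.log(ternaryTrue(42n, msg, env)); // 42 ? @(1 : 2)
```

The first call picks 2 and the second picks 1, in line with the opt outputs for () ? 1 : 2 and 42 ? 1 : 2 in this section.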

Notice that the macro phase did something too:

#   macro
in  () ? 1 : 2
out () ? @(1 : 2)

It noticed that the "body" of the conditional was not "quoted" with Later and it wrapped it with a @, here's the reply for ? in all types in the macro phase:

  "?": ternaryWrap

And the implementation of ternaryWrap:

  const ternaryWrap = (s, m, _e) =>
    new Send(s, new Msg(m.verb, pairToLaterOrLaterPair(m.obj)));

But it does more than just wrapping it if it's not a Later already:

function pairToLaterOrLaterPair(v) {
  if (v instanceof Pair) {
    return new Later(v);
  } else if (v instanceof Later) {
    if (v.value instanceof Pair) {
      return v;
    } else {
      console.error(
        "Expected pair or later of pair, got later of",
        getType(v.value),
        "fixing",
      );
      return new Later(new Pair(v.value, NIL));
    }
  } else {
    console.error("Expected pair or later of pair, got", getType(v), "fixing");
    return new Later(new Pair(v, NIL));
  }
}

It wraps a Pair in a Later, and forces the value to be a Pair if it's not one, even if it's already wrapped in a Later.

This fixing could be its own phase, like eslint --fix.

An example showing the case for truthy values:

>  42 ? 1 : 2

#   macro
in  42 ? 1 : 2
out 42 ? @(1 : 2)

#   opt
in  42 ? @(1 : 2)
out 1

#   run
in  1
out 1

A "fix and wrap":

>  {@a is 1, a ? 1}

#   macro
in  {@(a) is 1, a ? 1}
out {@(a) is 1, a ? @(1 : ())}

#   opt
in  {@(a) is 1, a ? @(1 : ())}
out {@(a) is 1, a ? @(1 : ())}

#   run
in  {@(a) is 1, a ? @(1 : ())}
out 1

A "fix" in an already wrapped body:

>  {@a is 1, a ? @ 1}

#   macro
in  {@(a) is 1, a ? @(1)}
out {@(a) is 1, a ? @(1 : ())}

#   opt
in  {@(a) is 1, a ? @(1 : ())}
out {@(a) is 1, a ? @(1 : ())}

#   run
in  {@(a) is 1, a ? @(1 : ())}
out 1

The next examples show how the macro and opt phases handle the short circuit boolean operations and and or:

>  () and 1

#   macro
in  () and 1
out () and @(1)

#   opt
in  () and @(1)
out ()

#   run
in  ()
out ()

The macro phase wraps the right hand side of boolean operations to make them "short circuit", with the following reply handlers for both in all types:

  and: andWrap,
  or: orWrap,

The implementations of andWrap and orWrap are the same:

  function lazyRhs(s, m, _e) {
    return new Send(s, new Msg(m.verb, maybeWrapLater(m.obj)));
  }

  const andWrap = lazyRhs,
    orWrap = lazyRhs;

The optimization phase works similarly to the optimization for arithmetic and comparison operations, it evaluates the send expression if the subject is a constant and it's enough to know what to do with the operation:

For truthy constant values the reply handler is:

    and: trueAnd,
    or: trueOr,

With the implementations of trueAnd and trueOr being:

function maybeUnwrapLater(v) {
  return v instanceof Later ? v.value : v;
}

function trueAnd(_s, m, _e) {
  return maybeUnwrapLater(m.obj);
}

function trueOr(s) {
  return s;
}

This means that for and, if the subject is truthy it returns the right hand side, unwrapped if it's a Later.

For or, if the subject is truthy it returns the subject directly, with no need to evaluate the right hand side.

The optimization that applies here is the unwrapping of the right hand side if it's inside a Later, which lets the next optimization (evaluating constant expressions) apply. More examples:

>  1 or 2 and 3

#   macro
in  1 or 2 and 3
out 1 or @(2) and @(3)

#   opt
in  1 or @(2) and @(3)
out 3

#   run
in  3
out 3

>  () or 2 and 3

#   macro
in  () or 2 and 3
out () or @(2) and @(3)

#   opt
in  () or @(2) and @(3)
out 3

#   run
in  3
out 3

>  () or 2 or 3

#   macro
in  () or 2 or 3
out () or @(2) or @(3)

#   opt
in  () or @(2) or @(3)
out 2

#   run
in  2
out 2
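The and/or examples above can be replayed with a few lines of code. This is a sketch under assumptions: handlers take (subject, object) instead of (s, m, e), NIL is null, and nilAnd / nilOr are hypothetical names for the Nil handlers, which the post doesn't show.

```javascript
const NIL = null; // stand-in for ()

class Later { constructor(value) { this.value = value; } }
const later = (v) => new Later(v);
const maybeUnwrapLater = (v) => (v instanceof Later ? v.value : v);

// Handlers for truthy subjects, same logic as trueAnd and trueOr.
const trueAnd = (_s, obj) => maybeUnwrapLater(obj);
const trueOr = (s, _obj) => s;
// Hypothetical counterparts for a Nil subject.
const nilAnd = (s, _obj) => s;
const nilOr = (_s, obj) => maybeUnwrapLater(obj);

const handlers = {
  and: (s, obj) => (s === NIL ? nilAnd : trueAnd)(s, obj),
  or: (s, obj) => (s === NIL ? nilOr : trueOr)(s, obj),
};

// Fold a chain of [verb, object] steps left to right over a constant subject.
const fold = (subject, ...steps) =>
  steps.reduce((s, [verb, obj]) => handlers[verb](s, obj), subject);

console.log(fold(1n, ["or", later(2n)], ["and", later(3n)])); // 1 or @(2) and @(3)
console.log(fold(NIL, ["or", later(2n)], ["and", later(3n)])); // () or @(2) and @(3)
console.log(fold(NIL, ["or", later(2n)], ["or", later(3n)])); // () or @(2) or @(3)
```

The results are 3, 3 and 2, the same values the opt phase produced, because each step only needs the (constant) subject to decide what to do.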

Now let's see another combination of macro and optimization phase:

>  {@a is 1, a and (1 + 2 + a) or (a * 3)}

#   macro
in  {@(a) is 1, a and (1 + 2 + a) or (a * 3)}
out {@(a) is 1, a and @(1 + 2 + a) or @(a * 3)}

#   opt
in  {@(a) is 1, a and @(1 + 2 + a) or @(a * 3)}
out {@(a) is 1, a and @(3 + a) or @(a * 3)}

Here the macro phase quotes the Name in an is expression and short circuits the boolean operations, and the opt phase optimizes the constant arithmetic expressions.

The reply handler for the is message on Name in the macro phase is:

  is: (s, m, _e) => new Send(new Later(s), m)

This last example should show you how optimizations compose because the evaluation order on all phases follows the evaluation order of the final run phase.

While there may be further optimizations it's interesting to note that both the macro and opt phases shown above are ~230 lines of code.

Some closing notes on the idea:

  • These two phases can either run "just in time" before run on each execution or they can be run once at "build time" and the result of the last one stored as the "optimized build".
  • There's no "Abstract Syntax Tree" and there are no "Intermediate Representations"
    • The "tree" is just a composition of the runtime value types
    • Each phase takes that "tree" as input and produces the same or a modification as output, all types in all phases are the same types the programmer uses.
    • 💁 is this 🦋 homoiconic?
  • There's an extra phase used in this post, did you notice it? to convert the inputs and outputs for each phase to string we run a phase that takes an expression as input and returns a string as output.
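To give an idea of that last point, here is a sketch of a "to string" pass written as a plain recursive function. That is an assumption made to keep it short: in the post it is a phase with its own handlers, and this version only covers a few types and doesn't add parentheses.

```javascript
const NIL = null; // stand-in for ()

class Pair { constructor(a, b) { this.a = a; this.b = b; } }
class Later { constructor(value) { this.value = value; } }
class Msg { constructor(verb, obj) { this.verb = verb; this.obj = obj; } }
class Send { constructor(subj, msg) { this.subj = subj; this.msg = msg; } }

// Takes an expression as input and returns a string as output.
function toStr(v) {
  if (v === NIL) return "()";
  if (typeof v === "bigint") return v.toString();
  if (typeof v === "string") return JSON.stringify(v);
  if (v instanceof Pair) return `${toStr(v.a)} : ${toStr(v.b)}`;
  if (v instanceof Later) return `@(${toStr(v.value)})`;
  if (v instanceof Send) {
    return `${toStr(v.subj)} ${v.msg.verb} ${toStr(v.msg.obj)}`;
  }
  throw new TypeError("unsupported value in this sketch");
}

console.log(toStr(new Send(NIL, new Msg("and", new Later(1n)))));
console.log(toStr(new Pair(3n, 3n)));
```

It has the same shape as the other phases: it walks the same runtime values, only the output type is different.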

Make Your Self

In Search of Maxwell's equations of Object Oriented Software

Motivation

Yes, that was the big revelation to me when I was in graduate school--when I finally understood that the half page of code on the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself. These were “Maxwell’s Equations of Software!” This is the whole world of programming in a few lines that I can put my hand over.

-- A Conversation with Alan Kay

Which are Maxwell's equations of Object Oriented software?

This is an attempt at answering that question based on the following:

OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things.

-- Dr. Alan Kay on the Meaning of “Object-Oriented Programming”

I think our confusion with objects is the problem that in our Western culture, we have a language that has very hard nouns and verbs in it. Our process words stink. It's much easier for us when we think of an object--and I have apologized profusely over the last twenty years for making up the term object-oriented, because as soon as it started to be misapplied, I realized that I should have used a much more process-oriented term for it.--The Japanese have an interesting word, which is called ma. Spelled in English, just ma. Ma is the stuff in-between what we call objects. It's the stuff we don't see, because we're focused on the nounness of things rather than the processness of things. Japanese has a more process-feel oriented way of looking at how things relate to each other. You can always tell that by looking at the size of [the] word it takes to express something that is important. Ma is very short. We have to use words like interstitial or worse to approximate what the Japanese are talking about.

-- Alan Kay at OOPSLA 1997: The Computer Revolution has not Happened Yet

Don't Bury the Lede

At the end of this post you will understand the design and code that allow you to write the following and understand why this:

{
  @MyType is ("my-type" as-type ()),
  @name is "Joe",
  MyType apply-to name,
  @(name say-hello ()) replies @("Well, hello " + it),
  (name say-hello ()) : ("Mike" say-hello ())
}

Evaluates to:

"Well, hello Joe" : "Hello, Mike!"

With a runtime that consists of a Frame class that can bind values to names and find the value for a given name, plus these two methods:

class Frame {
    // ...
    eval(v) {
      return this.send(v, new Msg("eval", this));
    }
    send(s, m) {
      return this.find(getType(s))
        .find(m.verb)
        .call(null, s, m, this);
    }
    // ...
}

And the following data types as the smallest set to make it all work:

  • Nil: similar to nil, null or Unit, the only falsy value
  • Int: JavaScript's BigInt
  • Name: a variable name
  • Pair: a pair of values, like cons
  • Msg: similar to a method, a name and an argument
  • Send: like a method call, a message sent to a value
  • Later: like Lisp's quote
  • Block: like Lisp's progn, a sequence of instructions that returns the result of the last one
  • Symbol: JavaScript's Symbol

A runnable version of this post can be found here: fatt.test.js (fatt: frame all the things)

A full implementation of the language with extra support for Array and Map in 100 lines for the runtime and 73 lines for the parser can be found here fatt.js.

A CLI to evaluate expressions can be found here fatt.cli.js, here are some expressions you can evaluate and their results:

./fatt.cli.js '42' '10 + 2' '10 + 2 - 5 * 2' '1 : 2.5 : () : "hi"' '() ? @ "true" : "false"' '\ + 2' '@(1 + 2)' '[]' '[1, 1.4, ["hi", ()]]' '"my-type" as-type ()' '@{42, ()}' '#{1: "one", "two": 2}' '#{} . "foo"' '#{"foo": "bar"} . "foo"' '{@foo is 42, foo}' '{@(0 add+1 0) replies @(it + that + 1), 1 add+1 2}'

>  42
42

>  10 + 2
12

>  10 + 2 - 5 * 2
14

>  1 : 2.5 : () : "hi"
1 : 2.5 : () : "hi"

>  () ? @ "true" : "false"
"false"

>  \ + 2
\ + 2

>  @(1 + 2)
1 + 2

>  []
[]

>  [1, 1.4, ["hi", ()]]
[1, 1.4, ["hi", ()]]

>  "my-type" as-type ()
("my-type" as-type ())

>  @{42, ()}
{42, ()}

>  #{1: "one", "two": 2}
#{1: "one", "two": 2}

>  #{} . "foo"
()

>  #{"foo": "bar"} . "foo"
"bar"

>  {@foo is 42, foo}
42

>  {@(0 add+1 0) replies @(it + that + 1), 1 add+1 2}
4

Bindings

If we had a calculator language and wanted to upgrade it to make it closer to a programming language probably the first feature would be the ability to give names to values and look them up later.

In programming languages bindings are stored in Frames in a call stack, let's try a simple example in JavaScript:

{
  let a = 10;
  let b = 20;

  test("single scope bindings", () => {
    expect(a).toBe(10);
    expect(b).toBe(20);
  });
}

Frame 1

Let's start with a simple Frame class that holds bindings in a Map and has two operations:

  • bind(name, value): store a value associated to a name
  • find(name): return the value associated with a name or undefined if not found

{
  class Frame {
    constructor() {
      this.binds = new Map();
    }

    bind(name, value) {
      this.binds.set(name, value);
      return this;
    }

    find(name) {
      return this.binds.get(name);
    }
  }

  test("single scope bindings implementation", () => {
    const env = new Frame().bind("a", 10).bind("b", 20);
    expect(env.find("a")).toBe(10);
    expect(env.find("b")).toBe(20);
  });
}

But in most languages bindings are not all in a single global namespace, in languages like JavaScript binding lookup starts in the current scope and continues in outer scopes:

{
  let a = 10;
  let b = 20;

  {
    let b = 30;

    test("nested scopes", () => {
      expect(a).toBe(10);
      expect(b).toBe(30);
    });
  }
}

Frame 2

To replicate this our new implementation of Frame gains a new attribute: up.

find starts in the current scope and if the binding is not found it continues in the scope referenced by up until the binding is found or up is null.

The method down enters a new Frame with its up attribute set to the current instance.

{
  class Frame {
    constructor(up = null) {
      this.up = up;
      this.binds = new Map();
    }

    bind(name, value) {
      this.binds.set(name, value);
      return this;
    }

    find(name) {
      const v = this.binds.get(name);
      if (v === undefined && this.up !== null) {
        return this.up.find(name);
      } else {
        return v;
      }
    }

    down() {
      return new Frame(this);
    }
  }

  test("nested scopes implementation", () => {
    const env = new Frame()
                  .bind("a", 10).bind("b", 20)
                  .down().bind("b", 30);

    expect(env.find("a")).toBe(10);
    expect(env.find("b")).toBe(30);
  });
}

But binding lookup stops for a second reason other than up being null, let's see it with an example:

{
  function f1() {
    let a = 10;
    let b = 20;
    f2();
  }

  function f2() {
    let b = 30;
    return a + b;
  }

  test("call frames", () => {
    expect(f1).toThrow();
  });
}

Frame 3

a is not available in f2 even if it was called from f1 where it was defined, this is because binding lookup stops at call frames.

We can implement this by adding a marker attribute upLimit that makes the lookup stop:

{
  class Frame {
    constructor(up = null) {
      this.up = up;
      this.upLimit = false;
      this.binds = new Map();
    }

    bind(name, value) {
      this.binds.set(name, value);
      return this;
    }

    find(name) {
      const v = this.binds.get(name);
      if (v === undefined) {
        if (this.upLimit || this.up === null) {
          return v;
        } else {
          return this.up.find(name);
        }
      } else {
        return v;
      }
    }

    down() {
      // this version of Frame has no left yet, the constructor only takes up
      return new Frame(this);
    }

    setUpLimit() {
      this.upLimit = true;
      return this;
    }
  }

And test it:

  test("call frames implementation", () => {
    const env = new Frame()
      .bind("a", 10)
      .bind("b", 20)
      .down()
      .setUpLimit()
      .bind("b", 30);

    expect(env.find("a")).toBe(undefined);
    expect(env.find("b")).toBe(30);
  });
}

Even when binding lookup stops at the first call frame boundary there are two simple examples showing that the lookup continues "somewhere else":

{
  let a = 10;

  function f() {
    return a;
  }

  test("top level and prelude bindings", () => {
    expect(f()).toBe(10);
    expect(parseInt("42", 10)).toBe(42);
  });
}

In the first case after stopping at the first call frame it "continues" the lookup with bindings available at the top level (module) scope.

In the second case it finds a value that is not bound in our program: parseInt.

This is one of the bindings that are available everywhere without the need to include them, in JavaScript you may call it the window object, in other languages it is described as a set of bindings that are automatically imported on every module, or prelude for short.

If the "look up" stops at the call frame then after reaching that point it has to go somewhere else. We could say that module and "prelude" bindings are bound "before" the bindings in the call stack. In many cultures the past is to the left, so let's continue there.

Let's add a left attribute to our Frame class and make it work in a similar way to up, start the lookup in the current Frame and continue up until upLimit or until up is null, then continue left until leftLimit or until left is null.

The right method is similar to the down method but it returns a new Frame instance that has the current frame as its left and up set to null.

We redefine down to return a new Frame instance where left is the same as the left of the current frame and up is the current frame itself.

Frame 4

{
  class Frame {
    constructor(left = null, up = null) {
      this.up = up;
      this.left = left;
      this.upLimit = false;
      this.leftLimit = false;
      this.binds = new Map();
    }

    bind(name, value) {
      this.binds.set(name, value);
      return this;
    }

    find(name) {
      const v = this.binds.get(name);
      if (v === undefined) {
        if (this.upLimit || this.up === null) {
          if (this.leftLimit || this.left === null) {
            return v;
          } else {
            return this.left.find(name);
          }
        } else {
          return this.up.find(name);
        }
      } else {
        return v;
      }
    }

    down() {
      return new Frame(this.left, this);
    }

    right() {
      return new Frame(this, null);
    }

    setUpLimit() {
      this.upLimit = true;
      return this;
    }

    setLeftLimit() {
      this.leftLimit = true;
      return this;
    }
  }

  {
    test("prelude implementation", () => {
      const env = new Frame()
        .bind("parseInt", parseInt)
        .right()
        .bind("a", 10)
        .right()
        .down()
        .setUpLimit();

      expect(env.find("parseInt")("42", 10)).toBe(42);
      expect(env.find("a")).toBe(10);
    });
  }
}

Prototype Chains

Another thing in object oriented languages that can be described as looking up bindings is "message dispatch", let's see some examples.

If we define an empty class A in JavaScript it "inherits by default" from the Object class:

{
  class A {}

  test("Object default inheritance", () => {
    expect(new A().toString()).toBe("[object Object]");
  });
}

We can emulate the lookup of toString with the Frame class as it is:

{
  test("Object default inheritance implementation", () => {
    const a = new Frame().bind("toString",
                 () => "[object Object]").right();

    expect(a.find("toString")()).toBe("[object Object]");
  });
}

We can declare a class B that defines its own toString method:

class B {
  toString() {
    return "B!";
  }
}

test("method", () => {
  expect(new B().toString()).toBe("B!");
});

We can again emulate it with the Frame class:

test("method implementation", () => {
  const b = new Frame()
    .bind("toString", () => "[object Object]")
    .right()
    .bind("toString", () => "B!");

  expect(b.find("toString")()).toBe("B!");
});

A more complicated prototype chain:

class C extends B {
  toString() {
    return "C!";
  }
}

test("method override", () => {
  expect(new C().toString()).toBe("C!");
});

test("method override implementation", () => {
  const c = new Frame()
    .bind("toString", () => "[object Object]")
    .right()
    .bind("toString", () => "B!")
    .down()
    .bind("toString", () => "C!");

  expect(c.find("toString")()).toBe("C!");
});

A class can have instance attributes, each instance binds its own attributes but looks up methods in the prototype chain:

class D extends C {
  constructor(count) {
    super();
    this.count = count;
  }
}

test("instance attributes", () => {
  const d1 = new D(10);
  const d2 = new D(20);

  expect(d1.toString()).toBe("C!");
  expect(d1.count).toBe(10);
  expect(d2.toString()).toBe("C!");
  expect(d2.count).toBe(20);
});

We can emulate this by having the prototype chain "to the left" and the instance attributes in its own scope.

test("instance attributes implementation", () => {
  const D = new Frame()
    .bind("toString", () => "[object Object]")
    .right()
    .bind("toString", () => "B!")
    .down()
    .bind("toString", () => "C!")
    .down();
  const d1 = D.down().bind("count", 10);
  const d2 = D.down().bind("count", 20);

  expect(d1.find("toString")()).toBe("C!");
  expect(d1.find("count")).toBe(10);
  expect(d2.find("toString")()).toBe("C!");
  expect(d2.find("count")).toBe(20);
});

We can draw an analogy and say that in OOP Object is a kind of "class prelude", the class hierarchy is analogous to nested modules and the instance is the call stack :)

Growing a Language

But manipulating frames directly doesn't feel like a programming language. If we want to create a really simple language on top, we should be able to at least bind and look up names and do some operations on those values, like arithmetic operations on numbers.

This is the point where most articles pull the "small Lisp or Forth interpreter" trick, but the initial motivation for this exploration was to find a small object oriented language that could be grown and expressed from a small set of primitives.

We are going to start with numbers, specifically integers, since we are implementing our language on top of JavaScript let's use BigInts.

To express a variable we can define a Name class that holds the variable name in its value attribute:

{
  class Name {
    constructor(value) {
      this.value = value;
    }

    getType() {
      return "Name";
    }
  }

The OOP way to eval a Name would be to send it the eval message.

To do that we need a Msg class that can hold eval as the verb.

Following the vocabulary of message, name and verb, the message is sent to the subject and has an object, in case of eval the object is the current scope:

Some verbs (called transitive verbs) take direct objects; some also take indirect objects. A direct object names the person or thing directly affected by the action of an active sentence. An indirect object names the entity indirectly affected

-- Wikipedia: Traditional Grammar

  class Msg {
    constructor(verb, obj) {
      this.verb = verb;
      this.obj = obj;
    }

    getType() {
      return "Msg";
    }
  }

Let's redefine Frame with just two extra methods:

  • eval(value): sends the message eval to value and returns the result
  • send(subject, message): sends the message to the subject

  class Frame {
    constructor(left = null, up = null) {
      this.up = up;
      this.left = left;
      this.upLimit = false;
      this.leftLimit = false;
      this.binds = new Map();
    }

    eval(v) {
      return this.send(v, new Msg("eval", this));
    }

    send(s, m) {
      return this
        .find(s.getType())
        .find(m.verb)
        .call(null, s, m, this);
    }

    bind(name, value) {
      this.binds.set(name, value);
      return this;
    }

    find(name) {
      const v = this.binds.get(name);
      if (v === undefined) {
        if (this.upLimit || this.up === null) {
          if (this.leftLimit || this.left === null) {
            return v;
          } else {
            return this.left.find(name);
          }
        } else {
          return this.up.find(name);
        }
      } else {
        return v;
      }
    }

    down() {
      return new Frame(this.left, this);
    }

    right() {
      return new Frame(this, null);
    }

    setUpLimit() {
      this.upLimit = true;
      return this;
    }

    setLeftLimit() {
      this.leftLimit = true;
      return this;
    }
  }

The implementation of send:

  • gets the subject type
  • looks up the type in the environment
    • the result should be a Frame instance with the "prototype" for that type
  • it does a lookup for the Msg verb in the prototype
  • calls the handler passing the subject, message and environment as arguments

We can try it by:

  • creating an instance of Name for the name "a"
  • creating a Frame that works as the prototype for Name, it holds a binding for eval that when called does a variable lookup in the environment
  • creating a Frame for the call stack that binds "a" to 42 and the type of Name (returned by a.getType()) to nameEnv
  • evaluating a in env and checking that it returns 42

  test("Name resolution with eval message", () => {
    const a = new Name("a");
    const nameEnv = new Frame()
                      .bind("eval", (s, _m, e) => e.find(s.value));
    const env = new Frame()
                      .bind("a", 42)
                      .bind(a.getType(), nameEnv);

    expect(env.eval(a)).toBe(42);
  });

We now have a language that supports bindings but we still can't express message sends in it.

Let's fix this by defining a Send class that has a subject and a message as attributes:

  class Send {
    constructor(subj, msg) {
      this.subj = subj;
      this.msg = msg;
    }

    getType() {
      return "Send";
    }
  }

Since we are going to be using BigInt as our language's Int type we will need to monkey patch BigInt's prototype with our getType method to be able to look up handlers for Ints:

  BigInt.prototype.getType = () => "Int";

Note: In the next version we are going to use Symbols to avoid monkey patching.
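A quick check of what the patch does, as a standalone sketch: a BigInt is a primitive, but property access on it goes through BigInt.prototype, so every BigInt in the program gains the method (which is also the downside of monkey patching).

```javascript
// Adds a method to every BigInt by mutating a built-in prototype.
BigInt.prototype.getType = () => "Int";

const n = 10n;
console.log(n.getType()); // looked up on BigInt.prototype
console.log((n + 32n).getType()); // results of arithmetic have it too
console.log(typeof n); // the value itself is still a primitive
```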

We can now implement message sends in our language by defining eval message handlers for:

  • Name: does a lookup for the name in the environment
  • BigInt: returns itself
  • Msg: returns a new Msg instance where the verb is the same but obj is evaluated
  • Send:
    • evaluates subject and message
    • enters a call frame
    • binds it to the subject
      • I use it instead of this to differentiate it from this and self in other OOP languages
    • binds msg to the message
    • binds that to the message's object and
    • sends the evaluated msg to the evaluated subject

To have some message to send that we can test we define a handler for the + message for Ints which does a lookup for it and adds it to the value bound to that.

There's an alternative implementation commented that directly uses s and m.obj that contain the same values.

Finally we test it by building an object that represents the expression 10 + a and check that it results in 42n since a was bound to 32n in the environment.

  test("Msg Send eval", () => {
    const nameEnv = new Frame()
                      .bind("eval", (s, _m, e) => e.find(s.value));
    const intEnv = new Frame()
      .bind("eval", (s, _m, _e) => s)
      .bind("+", (_s, _m, e) => e.find("it") + e.find("that"));
    //.bind("+", (s, m, _e) => s + m.obj);
    const msgEnv = new Frame().bind(
      "eval",
      (s, _m, e) => new Msg(s.verb, e.eval(s.obj)),
    );
    const sendEnv = new Frame().bind("eval", (s, _m, e) => {
      const subj = e.eval(s.subj),
        msg = e.eval(s.msg);
      return e
        .down()
        .setUpLimit()
        .bind("it", subj)
        .bind("msg", msg)
        .bind("that", msg.obj)
        .send(subj, msg);
    });
    const env = new Frame()
      .bind("Name", nameEnv)
      .bind("Int", intEnv)
      .bind("Msg", msgEnv)
      .bind("Send", sendEnv)
      .right()
      .bind("a", 32n);

    // 10 + a
    const expr = new Send(10n, new Msg("+", new Name("a")));
    expect(env.eval(expr)).toBe(42n);
  });
}

A Language with Syntax

Let's write a parser for our language to make it easier to test, we are going to use ohmjs:

import * as ohm from "./node_modules/ohm-js/index.mjs";

Since we are going to be Growing a Language let's create a utility function to define new languages:

function mkLang(g, s) {
  const grammar = ohm.grammar(g),
    semantics = grammar.createSemantics().addOperation("toAst", s),
    parse = (code) => {
      const matchResult = grammar.match(code);

      if (matchResult.failed()) {
        console.warn("parse failed", matchResult.message);
        return null;
      }

      return semantics(matchResult).toAst();
    },
    run = (code, e) => {
      const ast = parse(code);
      return ast ? e.eval(ast) : null;
    };

  return { run, parse };
}

Let's define our types again to use Symbols instead of monkey patching and to add a base class that allows any type to be used as a reply handler for a message by implementing the call method:

class Base {
  call(_, s, m, e) {
    return e.eval(this);
  }
}
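The reason this works is that send invokes handlers as handler.call(null, s, m, e), and a JavaScript function already has a call method with that shape. A small sketch, where Const and the env object are stand-ins invented for this example:

```javascript
class Base {
  // Same argument order as Function.prototype.call(thisArg, s, m, e).
  call(_thisArg, _s, _m, e) {
    return e.eval(this);
  }
}

// Hypothetical value type used only for this sketch.
class Const extends Base {
  constructor(value) {
    super();
    this.value = value;
  }
}

// Stand-in environment: evaluating a Const yields its wrapped value.
const env = { eval: (v) => v.value };

// Two kinds of reply handlers, invoked the same way send does it.
const fnHandler = (s, _m, _e) => s + 1n;
const valueHandler = new Const(42n);

console.log(fnHandler.call(null, 10n, null, env)); // runs the function
console.log(valueHandler.call(null, 10n, null, env)); // evaluates the value
```

So a handler can be a function that computes a reply or any value of the language, which replies with the result of evaluating itself.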

Name, Msg and Send are almost the same as before:

class Name extends Base {
  constructor(value) {
    super();
    this.value = value;
  }
}

class Msg extends Base {
  constructor(verb, obj) {
    super();
    this.verb = verb;
    this.obj = obj;
  }
}

class Send extends Base {
  constructor(subj, msg) {
    super();
    this.subj = subj;
    this.msg = msg;
  }
}

But now instead of implementing getType as a method each type is going to have a unique Symbol used to lookup its "prototype" in the call stack when looking for a message handler.

We are going to create a Symbol called typeSym to get and set the type for each object and 3 utility functions to get, set and make a type. The last one creates the type, sets it on a class and returns the type Symbol:

const typeSym = Symbol("TypeSym"),
  getType = (v) => v[typeSym],
  setType = (Cls, type) => ((Cls.prototype[typeSym] = type), type),
  mkType = (name, Cls) => setType(Cls, Symbol(name));

Let's define types for the classes we already have:

const TYPE_NAME = mkType("Name", Name),
  TYPE_MSG = mkType("Msg", Msg),
  TYPE_SEND = mkType("Send", Send),
  TYPE_INT = mkType("Int", BigInt);
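
To see what this buys us, here's a small sketch (the Point class is hypothetical, it's only there for illustration). Since the type Symbol is stored on the prototype, getType works both for class instances and for primitives like BigInt values, which look up properties on BigInt.prototype:

```javascript
const typeSym = Symbol("TypeSym"),
  getType = (v) => v[typeSym],
  setType = (Cls, type) => ((Cls.prototype[typeSym] = type), type),
  mkType = (name, Cls) => setType(Cls, Symbol(name));

// Point is a hypothetical class used only for this example
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
}

const TYPE_POINT = mkType("Point", Point),
  TYPE_INT = mkType("Int", BigInt);

const pointType = getType(new Point(1, 2)), // found on Point.prototype
  intType = getType(42n), // primitives look up BigInt.prototype
  floatType = getType(1.5); // no type registered for Number in this sketch

console.log(pointType === TYPE_POINT, intType === TYPE_INT, floatType);
// prints: true true undefined
```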

Let's redefine Frame for the last time to use getType to get the type associated with a value:

class Frame {
  constructor(left = null, up = null) {
    this.up = up;
    this.left = left;
    this.upLimit = false;
    this.leftLimit = false;
    this.binds = new Map();
  }

  eval(v) {
    return this.send(v, new Msg("eval", this));
  }

  send(s, m) {
    return this
            .find(getType(s))
            .find(m.verb)
            .call(null, s, m, this);
  }

  bind(name, value) {
    this.binds.set(name, value);
    return this;
  }

  find(name) {
    const v = this.binds.get(name);
    if (v === undefined) {
      if (this.upLimit || this.up === null) {
        if (this.leftLimit || this.left === null) {
          return v;
        } else {
          return this.left.find(name);
        }
      } else {
        return this.up.find(name);
      }
    } else {
      return v;
    }
  }

  down() {
    return new Frame(this.left, this);
  }

  right() {
    return new Frame(this, null);
  }

  setUpLimit() {
    this.upLimit = true;
    return this;
  }

  setLeftLimit() {
    this.leftLimit = true;
    return this;
  }
}
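
To get a feel for how find walks the frames, here's a sketch with a condensed copy of Frame (only the parts lookup needs, with find rewritten using early returns but the same logic). The binding names x, y and z are made up for the example:

```javascript
// Condensed copy of Frame with only what lookup needs
class Frame {
  constructor(left = null, up = null) {
    this.up = up;
    this.left = left;
    this.upLimit = false;
    this.leftLimit = false;
    this.binds = new Map();
  }
  bind(name, value) {
    this.binds.set(name, value);
    return this;
  }
  find(name) {
    const v = this.binds.get(name);
    if (v !== undefined) return v;
    if (!this.upLimit && this.up !== null) return this.up.find(name);
    if (!this.leftLimit && this.left !== null) return this.left.find(name);
    return undefined;
  }
  down() {
    return new Frame(this.left, this);
  }
  right() {
    return new Frame(this, null);
  }
  setUpLimit() {
    this.upLimit = true;
    return this;
  }
}

const prelude = new Frame().bind("x", 1), // e.g. where prototypes live
  mod = prelude.right().bind("y", 2), // a new scope to the right of prelude
  call = mod.down().bind("z", 3), // a call frame below mod
  handler = mod.down().setUpLimit(); // a frame that can't look up

console.log(call.find("z"), call.find("y"), call.find("x"));
// prints: 3 2 1
console.log(handler.find("y"), handler.find("x"));
// prints: undefined 1
```

The last line shows the effect of setUpLimit: the frame can't see bindings in the frame above it but can still reach whatever lives to the left.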

We have everything in place to create the first version of our language:

const { run: run1 } = mkLang(
  `Lang {
    Main = Send
    name = (letter | "_") (letter | "_" | digit)*
    Msg = verb Value
    verb = verbStart verbPart*
    verbStart = "+" | "-" | "*" | "/" | "-" | "%" | "&" | "<" | ">" | "!" | "?" | "." | letter
    verbPart = verbStart | digit
    Send = Value Msg*
    Value = int | name
    int = digit+
  }`,
  {
    name(_1, _2) {
      return new Name(this.sourceString);
    },
    Msg: (verb, obj) => new Msg(verb.toAst(), obj.toAst()),
    verb(_1, _2) {
      return this.sourceString;
    },
    Send: (v, msgs) =>
      msgs.children.reduce(
        (acc, msg) => new Send(acc, msg.toAst()), v.toAst()
      ),
    int(_) {
      return BigInt(this.sourceString);
    },
  },
);
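
The Send action is worth a closer look. Here's a sketch of what the reduce does for "10 + a + 4" once the children have been converted to AST nodes. It uses minimal stand-ins for Msg and Send and leaves the name a as a plain string to keep it short:

```javascript
// Minimal stand-ins for the AST classes (no Base class needed here)
class Msg {
  constructor(verb, obj) {
    this.verb = verb;
    this.obj = obj;
  }
}

class Send {
  constructor(subj, msg) {
    this.subj = subj;
    this.msg = msg;
  }
}

// "10 + a + 4": one value followed by two messages
const first = 10n,
  msgs = [new Msg("+", "a"), new Msg("+", 4n)];

const ast = msgs.reduce((acc, msg) => new Send(acc, msg), first);

// The result nests to the left: Send(Send(10, + a), + 4)
console.log(ast.subj instanceof Send, ast.subj.subj, ast.msg.verb, ast.msg.obj);
// prints: true 10n + 4n

// With no messages the value is returned as is, not wrapped in a Send
const bare = [].reduce((acc, msg) => new Send(acc, msg), first);
console.log(bare);
// prints: 10n
```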

Let's create another utility function to make type prototype definitions more readable:

function mkProto(obj) {
  const frame = new Frame();

  for (const name in obj) {
    frame.bind(name, obj[name]);
  }

  return frame;
}

And yet another function that creates a basic environment with eval handlers for Name, BigInt, Msg and Send that will be reused from now on to test our languages:

function mkEnv1() {
  return new Frame()
    .bind(TYPE_NAME,
       mkProto({ eval: (s, _m, e) => e.find(s.value) }))
    .bind(
      TYPE_INT,
      mkProto({
        eval: (s, _m, _e) => s,
        "+": (_s, _m, e) => e.find("it") + e.find("that"),
      }),
    )
    .bind(
      TYPE_MSG,
      mkProto({
       eval: (s, _m, e) => new Msg(s.verb, e.eval(s.obj))
      }),
    )
    .bind(
      TYPE_SEND,
      mkProto({
        eval(s, _m, e) {
          const subj = e.eval(s.subj),
            msg = e.eval(s.msg);
          return e
            .down()
            .setUpLimit()
            .bind("it", subj)
            .bind("msg", msg)
            .bind("that", msg.obj)
            .send(subj, msg);
        },
      }),
    );
}

Let's test some basic expressions in our first language:

test("Msg Send eval with parser", () => {
  const env = mkEnv1().right().bind("a", 32n);

  expect(run1("10 + 4", env)).toBe(14n);
  expect(run1("10 + a", env)).toBe(42n);
  expect(run1("10 + a + 4", env)).toBe(46n);
});

Conditionals

After arithmetic operations, the next feature that sets a language apart from an advanced calculator is conditional expressions. To support them we need some new types. The first one expresses false when evaluating conditions, and we can also use it to express the lack of a value. A type like this is usually called null, nil or Unit; in our language it will be called Nil and its syntax will be ().

Let's create the class and a singleton instance:

class Nil extends Base {}
const NIL = new Nil();

For conditionals we need a way to express two branches and pick one of them. For that and, as Lisp has taught us, many other reasons we are going to create the Pair type that has two fields, not car/cdr, not first/rest, not head/tail but a and b:

class Pair extends Base {
  constructor(a, b) {
    super();
    this.a = a;
    this.b = b;
  }
}

The final ingredient for conditionals is the Later type, which I will describe... later ;)

class Later extends Base {
  constructor(value) {
    super();
    this.value = value;
  }
}

Let's not forget to create the Symbols for the new types:

const TYPE_NIL = mkType("Nil", Nil),
  TYPE_PAIR = mkType("Pair", Pair),
  TYPE_LATER = mkType("Later", Later);

The second version of our language adds support for the new types:

const { run: run2 } = mkLang(
  `Lang {
    Main = Send
    nil = "(" ")"
    Pair = PairHead ":" Value
    PairHead = Scalar | Later | ParSend
    name = (letter | "_") (letter | "_" | digit)*
    Msg = verb Value
    verb = verbStart verbPart*
    verbStart = "+" | "-" | "*" | "/" | "-" | "%" | "&" | "<" | ">" | "!" | "?" | "." | letter
    verbPart = verbStart | digit
    Send = Value Msg*
    ParSend = "(" Send ")"
    Later = "@" Value
    Value = Pair | PairHead
    Scalar = int | nil | name
    int = digit+
  }`,
  {
    nil: (_o, _c) => NIL,
    Pair: (a, _, b) => new Pair(a.toAst(), b.toAst()),
    name(_1, _2) {
      return new Name(this.sourceString);
    },
    Msg: (verb, obj) => new Msg(verb.toAst(), obj.toAst()),
    verb(_1, _2) {
      return this.sourceString;
    },
    Send: (v, msgs) =>
      msgs.children.reduce(
        (acc, msg) => new Send(acc, msg.toAst()), v.toAst()
      ),
    ParSend: (_o, v, _c) => v.toAst(),
    Later: (_, v) => new Later(v.toAst()),
    int(_) {
      return BigInt(this.sourceString);
    },
  },
);

The simplest implementation for conditionals in a language with no side effects and free CPU time could be a message with the format condition ? when-true : when-false where:

  • ? is a message sent to a condition expression
  • when-true : when-false is the message object as a pair of expressions
  • a message reply handler on Nil that picks the second item of the Pair
  • an implementation for the remaining types that picks the Pair's first item

test("eager conditional", () => {
  const env = mkEnv1()
    .bind(
      TYPE_NIL,
      mkProto({
        eval: (s, _m, e) => s,
        "?": (s, m, e) => m.obj.b
      }),
    )
    .bind(
      TYPE_PAIR,
      mkProto({
        eval: (s, _m, e) => new Pair(e.eval(s.a), e.eval(s.b))
      }),
    )
    .bind(
      TYPE_INT,
      mkProto({
       eval: (s, _m, e) => s,
       "?": (s, m, e) => m.obj.a
      }),
    )
    .right();

  expect(run2("0 ? 1 : 2", env)).toBe(1n);
  expect(run2("() ? 1 : 2", env)).toBe(2n);
  expect(run2("() ? 1 : () ? 2 : 3", env)).toBe(3n);
  expect(() => run2("0 ? 1 : (1 * 2)", env)).toThrow();
});

We can see that the last test throws, this is because there's no reply handler for * defined for Ints. From this we can tell that this implementation evaluates both sides of the pair, something we probably don't want.

Let's fix this by implementing eval for Later which wraps any other value and returns the wrapped value unevaluated on eval:

test("lazy conditional", () => {
  const env = mkEnv1()
    .bind(TYPE_LATER,
       mkProto({ eval: (s, _m, e) => s.value }))
    .bind(
      TYPE_NIL,
      mkProto({
        eval: (s, _m, e) => s,
        "?": (s, m, e) => e.eval(m.obj.b)
      }),
    )
    .bind(
      TYPE_PAIR,
      mkProto({
        eval: (s, _m, e) => new Pair(e.eval(s.a), e.eval(s.b))
      }),
    )
    .bind(
      TYPE_INT,
      mkProto({
        eval: (s, _m, e) => s,
        "?": (s, m, e) => e.eval(m.obj.a)
      }),
    )
    .right();

  expect(run2("0 ? 1 : 2", env)).toBe(1n);
  expect(run2("() ? 1 : 2", env)).toBe(2n);
  expect(run2("() ? 1 : () ? 2 : 3", env)).toBe(3n);
  expect(run2("0 ? @ 1 : (1 * 2)", env)).toBe(1n);
});

With Later we can "delay" the evaluation of the pair until we know which branch we want to take.

Notice that the implementations of ? for Nil and Int now have to explicitly evaluate the branch they take.
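
If the difference is not clear, here's a rough analogy in plain JavaScript. It's not how our language does it: it uses thunks (functions with no arguments) in place of Later and null in place of Nil, and all the names are made up for the example:

```javascript
// null stands in for Nil here: it is the only "false" value
let evaluations = 0;
const costly = () => {
  evaluations += 1;
  return 2;
};

// Eager: both branches are already values when the choice is made
const eagerIf = (cond, whenTrue, whenFalse) =>
  cond !== null ? whenTrue : whenFalse;

// Lazy: branches are thunks and only the chosen one is called,
// like the ? handlers calling e.eval on the branch they pick
const lazyIf = (cond, whenTrue, whenFalse) =>
  cond !== null ? whenTrue() : whenFalse();

const eager = eagerIf(0n, 1, costly()); // costly() runs even though unused
const afterEager = evaluations;
const lazy = lazyIf(0n, () => 1, costly); // costly is never called
const afterLazy = evaluations;

console.log(eager, afterEager, lazy, afterLazy);
// prints: 1 1 1 1
```

The counter goes up once for the eager call and stays the same after the lazy one.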

Blocks

The next feature we probably want is the ability to define reply handlers in our language instead of "native" JavaScript functions.

To test this we need to be able to have more than one expression in our language.

We could do it with pairs but let's create a Block type that contains a sequence of expressions. On eval it evaluates each one in turn and returns the result of the last one:

class Block extends Base {
  constructor(value) {
    super();
    this.value = value;
  }
}

Let's add the type for Block and to avoid repeating a lot of code for small changes let's also introduce Float and Str to our language by adding their type tags and adding them to the parser:

// NOTE: we wrap string to be able to attach a Symbol at runtime further down
class Str extends String {}

const TYPE_BLOCK = mkType("Block", Block),
  TYPE_FLOAT = mkType("Float", Number),
  TYPE_STR = mkType("Str", Str);

const { run: run3 } = mkLang(
  `Lang {
    Main = Send
    nil = "(" ")"
    Pair = PairHead ":" Value
    PairHead = Block | Scalar | Later | ParSend
    name = (letter | "_") (letter | "_" | digit)*
    Block = "{" Exprs "}"
    Exprs = Send ("," Send )*
    Msg = verb Value
    MsgQuote = "\\\\" Msg
    verb = verbStart verbPart*
    verbStart = "+" | "-" | "*" | "/" | "-" | "%" | "&" | "<" | ">" | "!" | "?" | "." | letter
    verbPart = verbStart | digit
    Send = Value Msg*
    ParSend = "(" Send ")"
    Later = "@" Value
    Value = Pair | PairHead
    Scalar = float | int | str | nil | name | MsgQuote
    float = digit+ "." digit+
    int = digit+
    str = "\\\"" (~"\\\"" any)* "\\\""
  }`,
  {
    nil: (_o, _c) => NIL,
    Pair: (a, _, b) => new Pair(a.toAst(), b.toAst()),
    name(_1, _2) {
      return new Name(this.sourceString);
    },
    Block: (_o, exprs, _c) => new Block(exprs.toAst()),
    Exprs: (first, _, rest) =>
      [first.toAst()]
        .concat(rest.children.map((v) => v.toAst())),
    Msg: (verb, obj) => new Msg(verb.toAst(), obj.toAst()),
    verb(_1, _2) {
      return this.sourceString;
    },
    MsgQuote: (_, msg) => msg.toAst(),
    Send: (v, msgs) =>
      msgs.children.reduce(
        (acc, msg) => new Send(acc, msg.toAst()), v.toAst()
      ),
    ParSend: (_o, v, _c) => v.toAst(),
    Later: (_, v) => new Later(v.toAst()),
    int(_) {
      return BigInt(this.sourceString);
    },
    float(_a, _d, _b) {
      return parseFloat(this.sourceString);
    },
    str: (_1, s, _3) => new Str(s.sourceString),
  },
);

Message Reply Definition

With block support let's implement "message reply definition".

Since we are going to be using these message handlers in subsequent tests, let's define a function to create an environment that supports reply definitions:

function mkEnv2() {
  return mkEnv1()
    .bind(TYPE_LATER, mkProto({ eval: (s, _m, e) => s.value }))
    .bind(
      TYPE_BLOCK,
      mkProto({
        eval: (s, _m, e) => {
          let r = NIL;
          for (const item of s.value) {
            r = e.eval(item);
          }
          return r;
        },
      }),
    )
    .bind(
      TYPE_SEND,
      mkProto({
        eval(s, _m, e) {
          const subj = e.eval(s.subj),
            msg = e.eval(s.msg);
          return e
            .down()
            .setUpLimit()
            .bind("it", subj)
            .bind("msg", msg)
            .bind("that", msg.obj)
            .send(subj, msg);
        },
        replies(s, m, e) {
          const target = e.up.eval(s.subj),
            targetType = getType(target),
            msgVerb = s.msg.verb,
            impl = m.obj,
            proto = e.up.find(targetType);

          proto.bind(msgVerb, impl);
          return NIL;
        },
      }),
    );
}

And test it:

test("Msg Send reply definition", () => {
  const env = mkEnv2().right();

  const code = "{@(0 add+1 0) replies @(it + that + 1), 1 add+1 2}";
  expect(run3(code, env)).toBe(4n);
});

We support reply definitions by adding a handler for the replies message on the Send type. Without Later there's no way to send a message to a Send, but with it we can "quote" a Send and send a message to it. Yes, we send a message to a message send.

replies implementation:

  • takes Send's subject
  • gets its type
  • finds the current prototype for it in the environment
  • binds a handler for Send's verb using replies' object

A little convoluted, let's try again, this is the shape of an expression to define a reply to a message: @SampleSend replies @ReplyImplementation.

SampleSend is a Send expression, which we get by using Later on a Send to delay its evaluation, it's an example of the kind of expression that we want to handle.

As a reminder Send has the shape Subject Verb Object.

We take SampleSend's subject to get the type associated with the new reply.

From SampleSend we also get the verb that we want to reply to.

Finally ReplyImplementation is used as the handler for the message, which you have to quote to delay its evaluation until the message is handled.
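
Here's the same idea as a heavily simplified sketch. It assumes one Map of handlers per type instead of Frames, plain objects instead of Send and Msg, and a JavaScript function as the reply instead of a quoted expression, so the protos Map and both functions are illustrative only:

```javascript
const typeSym = Symbol("TypeSym"),
  TYPE_INT = (BigInt.prototype[typeSym] = Symbol("Int"));

// Simplification: one Map of verb -> handler per type instead of Frames
const protos = new Map([[TYPE_INT, new Map()]]);

function replies(sampleSend, impl) {
  const target = sampleSend.subj, // 1. take Send's subject
    targetType = target[typeSym], // 2. get its type
    proto = protos.get(targetType); // 3. find the prototype for that type
  proto.set(sampleSend.msg.verb, impl); // 4. bind a handler for Send's verb
}

function send(subj, msg) {
  return protos.get(subj[typeSym]).get(msg.verb)(subj, msg);
}

// @(0 add+1 0) replies @(it + that + 1), with the reply as a JS function
replies(
  { subj: 0n, msg: { verb: "add+1", obj: 0n } },
  (it, msg) => it + msg.obj + 1n,
);

const result = send(1n, { verb: "add+1", obj: 2n }); // 1 add+1 2
console.log(result);
// prints: 4n
```

Notice that the values in the sample send (the two zeros) are never used, only the subject's type and the verb matter.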

Walk and Map

We still don't have iteration. There are many ways to implement it but here's a fun set of "primitives" I've been playing with: walk and map.

  • map forwards the quoted message to the subject's items
  • walk also forwards a quoted message but it forwards the walk message itself, not the quoted one, this makes it recursive
  • scalar values implement handlers for both by sending the quoted message to themselves

test("walk and map", () => {
  function esend(s, m, e) {
    return e.eval(new Send(s, m));
  }

  function pair(a, b) {
    return new Pair(a, b);
  }

  const env = mkEnv1()
    .bind(
      TYPE_PAIR,
      mkProto({
        eval: (s, _m, e) =>
                 pair(e.eval(s.a), e.eval(s.b)),
        walk: (s, m, e) =>
                 pair(esend(s.a, m, e), esend(s.b, m, e)),
        map: (s, m, e) =>
                 pair(esend(s.a, m.obj, e), esend(s.b, m.obj, e)),
      }),
    )
    .bind(
      TYPE_INT,
      mkProto({
        eval: (s, _m, _e) => s,
        "+": (s, m, e) => e.find("it") + e.find("that"),
        walk: (s, m, e) => esend(s, m.obj, e),
        map: (s, m, e) => esend(s, m.obj, e),
      }),
    )
    .right();

  expect(run3("1 walk \\ + 2", env)).toBe(3n);
  expect(run3("1 map \\ + 2", env)).toBe(3n);

  const p1 = run3("1 : 2 map \\ + 2", env);
  expect(p1.a).toBe(3n);
  expect(p1.b).toBe(4n);

  const p2 = run3("1 : 2 : 3 walk \\ + 2", env);
  expect(p2.a).toBe(3n);
  expect(p2.b.a).toBe(4n);
  expect(p2.b.b).toBe(5n);
});
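
If message forwarding makes the difference hard to see, here's a rough equivalent in plain JavaScript. It uses ordinary functions instead of quoted messages and a minimal Pair class, so it's an analogy and not what the handlers above do internally:

```javascript
class Pair {
  constructor(a, b) {
    this.a = a;
    this.b = b;
  }
}

const isPair = (v) => v instanceof Pair;

// map: applies f to the two items of a pair, one level deep only
const map = (v, f) => (isPair(v) ? new Pair(f(v.a), f(v.b)) : f(v));

// walk: forwards itself to the items, so it reaches nested pairs too
const walk = (v, f) =>
  isPair(v) ? new Pair(walk(v.a, f), walk(v.b, f)) : f(v);

const plus2 = (n) => n + 2n;

const mapped = map(new Pair(1n, 2n), plus2); // 1 : 2 map \ + 2
const walked = walk(new Pair(1n, new Pair(2n, 3n)), plus2); // 1 : 2 : 3 walk \ + 2

console.log(mapped.a, mapped.b);
// prints: 3n 4n
console.log(walked.a, walked.b.a, walked.b.b);
// prints: 3n 4n 5n
```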

Custom Types

You may be asking "but what about user defined types?", well, glad you asked because I was planning on explaining that just about now.

We first need to bring the Symbol type to our language:

const TYPE_SYM = mkType("Symbol", Symbol);

Then we need a way to create new Symbols, instead of adding syntax for it we are going to add a reply to the as-type message for strings.

And an apply-to handler to the Symbol type to apply itself as the type to the message object.

And... that's it?

test("custom type definition", () => {
  const env = mkEnv2()
    .bind(TYPE_NIL, mkProto({ eval: (s) => s }))
    .bind(
      TYPE_PAIR,
      mkProto({
        eval: (s, _m, e) => new Pair(e.eval(s.a), e.eval(s.b)),
      }),
    )
    .bind(
      TYPE_STR,
      mkProto({
        eval: (s) => s,
        "as-type"(s, _m, e) {
          const type = Symbol(s);
          e.left.bind(
            type,
            new Frame().bind("eval", (s) => s),
          );
          return type;
        },
        "+": (s, m) => new Str(s + ("" + m.obj)),
        "say-hello": (s) => new Str(`Hello, ${s}!`),
      }),
    )
    .bind(
      TYPE_NAME,
      mkProto({
        eval: (s, _m, e) => e.find(s.value),
        is(s, m, e) {
          e.up.bind(s.value, m.obj);
          return m.obj;
        },
      }),
    )
    .bind(
      TYPE_SYM,
      mkProto({
        eval: (s) => s,
        "apply-to"(s, m) {
          m.obj[typeSym] = s;
          return s;
        },
      }),
    )
    .right();

  const pair = run3(
    `{
    @MyType is ("my-type" as-type ()),
    @name is "Joe",
     MyType apply-to name,
     @(name say-hello ()) replies @("Well, hello " + it),
     (name say-hello ()) : ("Mike" say-hello ())
   }`,
    env,
  );
  // coerce to String from Str to test
  expect("" + pair.a).toBe("Well, hello Joe");
  expect("" + pair.b).toBe("Hello, Mike!");
});

Let's go line by line:

@MyType is ("my-type" as-type ())

Define a new type with label "my-type" and bind it to the name MyType.

Notice that until now we had no way to bind new values in the environment. To fix that we defined a handler for is in the Name type that binds the message object to the current name in the environment. Since each message handler enters its own call frame, we bind it in the parent frame.

@name is "Joe"

Bind the String "Joe" to the name name.

MyType apply-to name

Apply the type bound in MyType to the value bound in name.
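
This works because apply-to sets typeSym directly on the value, and an own property shadows the one inherited from the prototype. A small sketch of just that mechanism, outside of our language:

```javascript
const typeSym = Symbol("TypeSym");

class Str extends String {}
const TYPE_STR = (Str.prototype[typeSym] = Symbol("Str"));

const MyType = Symbol("my-type"),
  joe = new Str("Joe"),
  mike = new Str("Mike");

// What the apply-to handler does to its message object
joe[typeSym] = MyType;

// joe now has its own type, other Str values still use the prototype's
console.log(joe[typeSym] === MyType, mike[typeSym] === TYPE_STR, `${joe}`);
// prints: true true Joe
```

This is also why strings are wrapped in Str: the wrapper is an object, so it can hold its own typeSym property.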

@(name say-hello ()) replies @("Well, hello " + it)

Define a reply for the say-hello message for the type in name (notice that replies evaluates the subject in the current environment before getting the type, so the type is not Name but our new type).

(name say-hello ()) : ("Mike" say-hello ())

Return a pair with a being the result of say-hello in our type and b the same message but on a String.

Future Work

That's all for now, here's a list of things I want to improve, explore, expand on this topic:

  • how to support closures in Frame
  • somehow support reply handlers that don't enter a call frame for lightweight things like arithmetic operators
  • since the language semantics are "late bound" in the environment we can do multiple passes on the same program by having different environments with different semantics and having the result of evaluating a program in one environment be the input of the next "phase", why stop at "compile time" and "runtime"?
  • explore the "emergent behavior" of the current design:
    • since a type message handler is in the stack we can "shadow" type methods per module or even in the call stack
    • alternative where prototype lookup is dynamically scoped and not lexically scoped, either by default or with a different syntax
  • have Frame as a first class type, and base objects on it?
  • make bind override the value in the right scope if the binding already exists
  • make send dispatch not only on subject's type but also on the message object type too

Speedrunning WebAssembly's History: A Fictional First-Person Perspective

Context

This is a true story. The events depicted in this film took place in Minnesota in 1987. At the request of the survivors, the names have been changed. Out of respect for the dead, the rest has been told exactly as it occurred.

- Fargo (1996 film)

While writing WebAssembly from the Ground Up we reached the point where we needed to write an introduction. We decided to write one each and see which one worked better or if we could merge both into one.

From the conversations I knew Patrick was going to write a "Standard" introduction so I decided to try something else.

At the end Patrick's version ended up in the book so I'm posting mine here in case anyone finds it interesting.

Note that what I'm posting here was a draft, complete enough for Patrick to check and provide feedback. It would have gone through a couple of rounds of revisions and edits if it had ended up in the book :)

Introduction

It's 2013, web browsers are getting faster and enabling more use cases to be available on the web, you work at Mozilla and start thinking how to enable and accelerate the adoption of the web for more use cases.

You think "What kind of software is still not available on the web?", some ideas come to mind:

  • Games

  • Image / Video editors, encoders and decoders

  • Compression

  • Cryptography

  • CAD

  • Scientific visualization and simulation

  • Simulators, emulators

  • IDEs

You notice a pattern, they are all "compute intensive" and most of them are written in low level/system programming languages.

There are already many languages that compile to JavaScript, why not try to compile existing C/C++ codebases to JavaScript and see if it's enough?

After all when a C/C++ compiler does its job the result is mostly about calling functions, manipulating local variables and loading and storing numbers directly in memory.

You have an idea, if you start with an existing compiler like clang and swap the last stage to emit JavaScript instead of assembly, with the help of some shims you should be able to run some of those programs in the browser.

Since JavaScript doesn't have direct access to memory you decide to simulate memory by using an Array of numbers, pointers are compiled to numbers that are used to index into the memory array.

After some hacking you have a working prototype, you decide to call it emscripten.

After compiling and running some C/C++ programs you notice that they are really slow, the translation to JavaScript "throws away" a lot of information that is present in the original program, mainly the variable types.

The JavaScript runtime has to "rediscover" all that information by profiling and JITing, that rediscovery takes a while and is not always perfect, it would be nice to just pass the information you already have.

Another problem is that using an Array as memory is slow, JavaScript arrays are really flexible, but that comes at the cost of performance.

After some conversations and collaborations you get Typed Arrays implemented, this speeds up the memory operations a lot.
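
A sketch of what memory as a typed array could look like. The HEAP32 name and the ptr >> 2 indexing follow the convention emscripten's output is known for, but the sum function and the values are made up for the example:

```javascript
// 64 KiB of simulated memory; HEAP32 follows emscripten's naming convention
const buffer = new ArrayBuffer(64 * 1024),
  HEAP32 = new Int32Array(buffer);

// Roughly what `int sum(int *p, int n)` could compile to: the pointer is
// a byte offset, so it is shifted by 2 (divided by 4) to index 32 bit slots
function sum(ptr, n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    total = (total + HEAP32[(ptr + i * 4) >> 2]) | 0;
  }
  return total;
}

const ptr = 16; // a "pointer" is just a number
HEAP32[ptr >> 2] = 10; // *ptr = 10
HEAP32[(ptr + 4) >> 2] = 32; // *(ptr + 1) = 32 in C pointer arithmetic

const total = sum(ptr, 2);
console.log(total);
// prints: 42
```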

You notice that after the JIT has done its work peak performance is pretty good, but it takes a while and since each browser has its own engine the performance differs among them.

Using your connections at Mozilla you start talking with the SpiderMonkey team to find a way to hint the runtime about the types of variables emitted by the compiler.

Low level languages have multiple numeric types, JavaScript has only one, but looking at the spec you notice that binary bitwise operators apply the ToInt32 conversion to the operands.

You can use that to hint the runtime that a variable is a 32 bit integer: an operation that "does nothing", like a bitwise or between a value and zero (a | 0), does exactly that.

Numbers in JavaScript are internally represented as doubles. If the compiler emits a variable of type double then you are almost done, but a variable can be of any type, like a String, a boolean, an Array etc. How do you hint to the runtime that this variable contains a Number and specifically a double?

In JavaScript there's a trick to coerce any value into a number, just put the plus sign in front of a variable like +a and it will convert its operand to the Number type.

Great, there's one type left, now you need to hint the runtime that a variable is a 32 bit floating point value, but you ran out of tricks.

You convince the SpiderMonkey team to introduce a new function Math.fround that returns the nearest 32-bit single precision float representation of a number.

With the new hints in place you modify emscripten to emit the hints and notice a significant speed improvement.
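
The three hints side by side, as a small sketch (the helper names are made up, the operators and Math.fround are standard JavaScript):

```javascript
const asInt = (a) => a | 0, // ToInt32: 32 bit signed integer
  asDouble = (a) => +a, // ToNumber: double
  asFloat = (a) => Math.fround(a); // nearest 32 bit float

console.log(asInt(3.7), asInt(-3.7), asInt(2 ** 32 + 5));
// prints: 3 -3 5
console.log(asDouble("42"), asDouble(true));
// prints: 42 1
console.log(asFloat(5.5), asFloat(0.1) === 0.1);
// prints: 5.5 false
```

5.5 fits exactly in a 32 bit float, 0.1 doesn't, so rounding it to single precision gives a slightly different number.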

But still it takes a while for the JIT to "warm up", the hints are there, why doesn't the JIT just optimize them the first time they are processed?

If only there was a way to hint the JIT that a piece of code is not written by humans but emitted by a compiler and that it adheres to your hints, it would be much faster and it could do the optimizations "Ahead of Time" instead of "Just in Time".

There's already a backward compatible way to hint JavaScript that a piece of code adheres to a restricted subset of JavaScript, "use strict", you could emit a similar backward compatible directive that a runtime may opt into and do the ahead of time optimizations, since the JavaScript you are emitting is replacing the assembly a compiler would emit at that stage you decide to call it asm.js, then the directive is defined as "use asm".
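
A minimal sketch of the shape such a module could take. MiniModule and its functions are hypothetical and much smaller than anything a compiler would emit; an engine that knows about the directive may compile it ahead of time, any other engine runs it as plain JavaScript with the same results:

```javascript
// Hypothetical module written in the asm.js style
function MiniModule(stdlib) {
  "use asm";

  function add(a, b) {
    a = a | 0; // parameter hint: 32 bit integer
    b = b | 0;
    return (a + b) | 0; // return hint: 32 bit integer
  }

  function half(x) {
    x = +x; // parameter hint: double
    return +(x / 2.0); // return hint: double
  }

  return { add: add, half: half };
}

const mini = MiniModule(globalThis);

console.log(mini.add(2, 3), mini.half(5));
// prints: 5 2.5
console.log(mini.add(2147483647, 1));
// prints: -2147483648
```

The last line shows the integer semantics C programs expect: the addition wraps around at 32 bits instead of growing into a double.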

With these backward compatible changes you convince the SpiderMonkey team to prototype the changes in a branch and the results are impressive, asm.js code runs only twice as slow as native code! After some extra rounds of optimizations you get it to 50% slower than native.

This is more than a toy, this may actually enable new types of applications on the web.

To show it to the world you decide to do a flashy demo, collaborating with Epic Games by compiling Unreal Engine 3 to asm.js and running a 3D demo in the browser called Epic Citadel.

With the demo out you start working to specify asm.js and try to convince other browsers to adopt it.

But even if the results are impressive there are still problems, the amount of JavaScript generated is huge, a demo may generate over 40 MBs of asm.js, the bottleneck now moves to a new place, downloading and parsing that amount of code takes a while, on mobile it can take over 20 seconds just to parse!

Since asm.js is meant to be a compiler target and you are in the process of convincing other browser vendors to add a new capability to their engines you all agree that there's an opportunity to "do it correctly" and define a binary format that is more compact and can be decoded much faster.

By defining a binary format you also avoid the problem of JavaScript having to be two different things at the same time, a language for humans to write and a compile target. With a new binary format that can eventually diverge from JavaScript's semantics you can achieve the initial objective of enabling more applications to run on the web.

Since this is a compiler target for the web, and compilers usually emit assembly, the new project is named WebAssembly.

After some rounds of discussions all browser vendors get on board and a new collaboration is announced to standardize WebAssembly.

Playing with Code: Programming-Adjacent Games

Some weeks ago I was working on some explorable explanation for binary operators and I started thinking if there was any interesting work on making programming interactive, interesting, even fun.

I asked on twitter, mastodon and the future of coding slack the following question:

Do you know any games where the core game mechanic is about programming? Things like Zachtronics games, factorio or Robotopia

Below is an edited summary of the answers I got.

Thanks to Ivan Reese, Cameron Yick, Joe Nash, Jeffrey Tao, George Campbell, Daniel Sosebee, Kartik Agaram, Richard Carlsson, Asbjorn, Janne Auki and Dragan Okanovic for the contributions.

Quadrilateral Cowboy

In Quadrilateral Cowboy, the player takes the role of a computer hacker in the 1980s, armed with a "top-of-the-line hacking deck outfitted with a 56.6k modem and a staggering 256k RAM".

The game is played from the first-person perspective. The player acts as the hacker overseeing one or more adept agents that have missions to infiltrate buildings and steal documents.

Human Resource Machine

Human Resource Machine is a visual programming-based puzzle video game developed by Tomorrow Corporation.

Human Resource Machine uses the concept of a corporate office worker assigned to perform tasks that involve moving objects between an inbox, an outbox, and to and from storage areas as a metaphor for assembly language concepts. The player works through some forty puzzles in constructing a program to complete a specific task.

Dreams

Dreams is a game creation system video game developed by Media Molecule.

Players can create and play user-generated content in the forms of games, audiovisual experiences and game assets, which can be shared or remixed to be used in other players' creations.

LightBot

Solve Puzzles using Programming!

LightBot is a puzzle game based on coding; it secretly teaches you programming logic as you play!

LightBot was designed with first-time coders. It's been played by over 20 million kids and has been used by tens of thousands of teachers worldwide.

Zoombinis (series)

Zoombinis was a series of educational puzzle computer games that were originally developed by TERC and published by Broderbund.

The series consists of three games: Logical Journey of the Zoombinis (1996), Zoombinis: Mountain Rescue (2001), and Zoombinis: Island Odyssey (2002). Logical Journey was remade as Zoombinis for modern operating systems in 2015. The series focuses on the Zoombinis, small blue creatures each with different appearances and personalities, which the player must guide through strange puzzle-filled lands.

TwilioQuest

TwilioQuest is an educational video game designed to teach a new generation of developers how to change the world with code.

TwilioQuest prepares you for real-world programming by helping you configure a local development environment and introducing tools used by professional programmers around the world. From learning how to use your terminal, to coding in Python, JavaScript, and PHP, TwilioQuest will help you develop practical engineering skills.

From the author:

I worked on a now defunct programming education game (TwilioQuest) and myself and another dev used to stream gameplay and interviews with programming game devs, there's still some of the vods kicking around at twitch.tv/twilioquest, including a chat with Zach of Zachtronics

Nintendo Game Builder Garage

In Game Builder Garage, the player uses a visual programming language centralized on the concept of creatures called Nodon. The Nodon represent various facets of input, game output, logic, and on-screen objects, such as a Stick Nodon that reports input from the Joy-Con analog stick or a Person Nodon that represents an on-screen character. The player builds a program by adding Nodon and making connections between the various nodes on Nodon, such as connecting the Stick Nodon to the Person Nodon as to tie the analog stick to movement of the character on-screen. Nodon are available to interface nearly all features of the Switch and Joy-Con, including the infrared sensors and motion controls.

The game features a lesson mode to guide the player through using the Nodon language and to help them understand some of the principles of game development through a series of seven built-in games that the player can create.

Rabbids Coding

Across 32 levels, players are tasked with cleaning up a spaceship that has been overrun by Rabbids, which is achieved by providing simple instructions to a Rabbid wearing a mind-control device.

Players drag instructions for their Rabbid from a menu and place them in order, before pressing the play button to test them out.

The goal in each level is to provide the simplest instructions possible to complete the task. Eventually players will unlock a sandbox environment, allowing them to explore and play with the instructions as they see fit.

Signal State

Set in a post-apocalyptic future, The Signal State challenges you with complex puzzles inspired by modular synthesizers. Repair machines, rebuild an abandoned farm, and be part of a revolution that will change the fate of agriculture once and for all.

Battlesnake

A competitive game where your code is the controller.

All you need is a web server that responds to the Battlesnake API.

Develop your own algorithm to find food, stay alive, and eliminate others. Battlesnakes are controlled by a web server you deploy, running the code you write.

Shenzhen I/O

Shenzhen I/O is a puzzle video game set in the near future in which players assume the role of an electronics engineer who has emigrated to Shenzhen, China to work for fictional technology company Shenzhen Longteng Electronics. The player is tasked with creating products for clients, which involves constructing circuits and then writing code to run them. The programming language used in the game is similar to assembly language and the circuit elements resemble simplified versions of real-world electronics.

The game allows players to create their own challenges by writing Lua scripts.

Exapunks

Exapunks takes place in an alternate timeline in the year 1997. The fictional world of Exapunks is heavily computerized, and a disease called "the phage" is ravaging the population, turning the bodies of those affected into computerized components. The player takes on the role of Moss, a hacker who breaks into computer systems in order to afford a $700/day drug to slow the progress of his phage affliction. His hacking missions are given to him by a mysterious artificial intelligence known as EMBER-2.

Each mission takes place inside a network of interconnected and specialized computer systems. Using programmable software agents called EXAs, the player must accomplish each given task by writing computer code to cleverly manipulate the data stored on the network's systems. The EXAs' instruction set features a few simple opcodes for movement, data processing, network messaging, and interfacing with files and registers. Due to their limited memory capacity, these tasks often require several agents working together in a highly coordinated fashion. EXA units also have the ability to replicate themselves inside the network. Typical missions include retrieving data from secured storage systems, hacking into company databases, and causing an automated teller machine to dispense free cash. Some puzzles also require the player to hack Moss's body to maintain his health. Some puzzles challenge the player to hacker battles, where they must pit their EXAs against an opponent's agents, for example altering a television station's program to broadcast Moss' content instead.

Last Call BBS

A collection of ideas that weren’t big enough for full games in their own right but are still absolute bangers. It has a game that I think a lot of people here will get a kick out of, called “X’BPGH: The Forbidden Path”, which is kind of a cellular automata programming game where the rules of the automata are obscured by the eldritch horror dressings of the whole thing.

Boot up your Z5 Powerlance and dial into Last Call BBS, the last game from Zachtronics! The Barkeep’s loaded up his retro computer with a full set of puzzle games for you to download and play. No need to worry about copy protection, they’re all fully cracked and ready to enjoy!

SpaceTraders

SpaceTraders is an API-based game where you acquire and manage a fleet of ships to explore, trade, and fight your way across the galaxy. Use any programming language with our API to control the most powerful fleet in the universe.

Baba is You

Baba Is You is a puzzle game where the rules you have to follow are present as physical objects in the game world. By manipulating the rules, you can change how the game works, repurpose things you find in the levels and cause surprising interactions!

TIS-100

TIS-100 is an open-ended programming game by Zachtronics, the creators of SpaceChem and Infinifactory, in which you rewrite corrupted code segments to repair the TIS-100 and unlock its secrets. It’s the assembly language programming game you never asked for!

SineRider

Graphing equations is the core mechanic.

SineRider is a game about love and graphing, built by a global team of teenagers at Hack Club

Synthesis

I think Synthesis is something that doesn't look like programming but seems to me like programming in a deep way. It seems similar to SineRider in that respect, which I love, but man it gets difficult quickly. Somebody should graph the learning curve of SineRider within SineRider :)

shapez.io

Shapez is a relaxed game in which you have to build factories for the automated production of geometric shapes. As the level increases, the shapes become more and more complex, and you have to spread out on the infinite map.

Turing Complete

Turing Complete is a game about computer science. If you enjoy the thrill of figuring things out and those moments where a deeper perspective is revealed about something you thought you understood, this game is for you.

ComputerCraft

ComputerCraft is a mod created for Minecraft by dan200 that adds Computers, Monitors, Modems, Turtles and more! ComputerCraft's Computers and Turtles are programmed with the easy-to-learn Lua programming language. You can use Redstone, RedPower or even MineFactory Reloaded alongside your devices for the best experience.

Screeps

It's an open-source game for programmers, wherein the core mechanic is programming your units' AI. You control your colony by writing JavaScript.

Brawl.AI

The idea is: Surely there are some really smart people who can write the bot to beat all other bots. That could be you!

Here, on this site, you can write bots that play a turn-based-squad-based game inspired by XCOM. Especially the tactical layer. There is no strategy layer on this site as that is very game dependent.

Duskers

In Duskers you pilot drones into derelict spaceships to find the means to survive and piece together how the universe became a giant graveyard. In film terms it's The Road meets the first Alien movie. In game terms: It's a roguelike with elements of dungeon crawling and real time strategy, but in a survival horror setting that focuses on subterfuge, and adapting to survive.

Features:

  • Gritty retro digital atmosphere
  • Use a Command Line Interface to control drones & ship systems
  • Explore procedurally generated derelict ships and universe
  • Upgrade and modify drones with the salvage you find
  • Discover ship logs and piece together what happened
  • Find creative ways out of bad situations using tools and your environment

Old-school Games

Check Category:Programming games for a complete list.

The Incredible Machine (series - Rube Goldberg machines)

The Incredible Machine (TIM) is a series of video games in which players create a series of Rube Goldberg devices.

The general goal of the games is to create a series of Rube Goldberg devices: arrange a given collection of objects in a needlessly complex fashion so as to perform some simple task, such as "put the ball into a box" or "start a mixer and turn on a fan". Available objects range from simple ropes and pulleys to electrical generators, bowling balls, and even cats and mice to humans, most of which have specific interactions with or reactions to other objects: for example, mice will run towards nearby cheese, and light sources placed next to a magnifying glass will ignite wicks. Levels have a set of fixed objects that cannot be moved by the player, and the player must solve the puzzle by carefully arranging a provided set of objects around the fixed items. There is also a "freeform" option that allows the user to "play" with all the objects with no set goal or to also build their own puzzles with goals for other players to attempt to solve.

Rocky's Boots

Rocky's Boots is an educational logic puzzle game by Warren Robinett and Leslie Grimm, published by The Learning Company in 1982.

It was one of the first educational software products for personal computers to successfully use an interactive graphical simulation as a learning environment.

The object of the beginning part of Rocky's Boots is to use a mechanical boot to kick a series of objects (purple or green squares, diamonds, circles, or crosses) off a conveyor belt; each object will score some number of points, possibly negative. To ensure that the boot only kicks the positive objects, the player must connect a series of logic gates to the boot.

Robot Odyssey

Robot Odyssey is a programming game developed by Mike Wallace and Dr. Leslie Grimm and published by The Learning Company in December 1984.

It is a sequel to Rocky's Boots, and it was released for the Apple II, TRS-80 Color Computer, and MS-DOS.

The player is readying for bed when, suddenly, they fall through the floor into an underground city of robots, Robotropolis. The player begins in the sewers of the city with three programmable robots, and must make their way to the top of the city to try to find their way home again.

Core War

Core War is a 1984 programming game created by D. G. Jones and A. K. Dewdney in which two or more battle programs (called "warriors") compete for control of a virtual computer. These battle programs are written in an abstract assembly language called Redcode. The standards for the language and the virtual machine were initially set by the International Core Wars Society (ICWS), but later standards were determined by community consensus.

Tierra (computer simulation)

Tierra is a computer simulation developed by ecologist Thomas S. Ray in the early 1990s in which computer programs compete for time (central processing unit (CPU) time) and space (access to main memory). In this context, the computer programs in Tierra are considered to be evolvable and can mutate, self-replicate and recombine.

Halite AI Programming Competition

Halite is an open-source computer programming contest developed by the hedge fund/tech firm Two Sigma in partnership with a team at Cornell Tech. Programmers can see the game environment and learn everything they need to know about the game. Participants are asked to build bots in whichever language they choose to compete on a two-dimensional virtual battle field.

Ruby Warrior

Game written in Ruby for learning Ruby and artificial intelligence.

You play as a warrior climbing a tall tower to reach the precious Ruby at the top level. On each floor you need to write a Ruby script to instruct the warrior to battle enemies, rescue captives, and reach the stairs. You have some idea of what each floor contains, but you never know for certain what will happen. You must give the Warrior enough artificial intelligence up-front to find his own way.

Programming-Adjacent or Game-Adjacent

MockMechanics (arguably a game)

StarEdit or other level-editors that ship with a game

Pixel Starships

Pixel Starships has a rule-based AI engine that I think is pretty cool. You create a bunch of these rules for each crew member and give them an ordering. The highest-precedence rule whose condition is currently fulfilled is the one the crew member will take.
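The rule ordering described above can be sketched in a few lines. This is an illustration of the idea, not the game's actual engine: the rule list, the state fields (`hp`, `enemy_visible`) and the action names are all made up for the example.

```python
# Hypothetical crew AI: rules are (condition, action) pairs listed from
# highest to lowest precedence. The first rule whose condition holds wins.
CREW_RULES = [
    (lambda state: state["hp"] < 30, "go to medbay"),
    (lambda state: state["enemy_visible"], "attack enemy"),
    (lambda state: True, "repair ship"),  # fallback, always applicable
]


def pick_action(rules, state):
    """Return the action of the highest-precedence rule that currently applies."""
    for condition, action in rules:
        if condition(state):
            return action
    return None  # no rule matched


print(pick_action(CREW_RULES, {"hp": 20, "enemy_visible": True}))
```

Reordering the list changes the behavior without touching any rule, which is most of what makes this style of AI pleasant to tinker with.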

Games Big Enough to Require Explanation

Board Games

Robogem

A programming game, designed to teach kids (6+) to program. Players try to collect gems by programming a robot on the board.

The robot takes three commands: move forward, turn left and turn right. Players string these commands to guide the robots. Advanced games include functions that can contain many moves which can then be repeated when wanted.

First player to collect three gems and return them to their home base wins the game.
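The three commands plus reusable functions amount to a tiny interpreter. A rough sketch, assuming a grid with x growing to the east and y to the north; the command names, the `run` function and the way functions are stored in a dictionary are my own choices, not part of the board game.

```python
# Headings in clockwise order: north, east, south, west.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def run(program, functions=None, start=(0, 0), heading=0):
    """Execute a list of commands and return the final ((x, y), heading).

    `functions` maps a name to a list of commands, so a name appearing in
    `program` replays that whole sequence (no recursion guard in this sketch).
    """
    functions = functions or {}
    x, y = start
    for command in program:
        if command in functions:
            (x, y), heading = run(functions[command], functions, (x, y), heading)
        elif command == "forward":
            dx, dy = HEADINGS[heading]
            x, y = x + dx, y + dy
        elif command == "left":
            heading = (heading - 1) % 4
        elif command == "right":
            heading = (heading + 1) % 4
        else:
            raise ValueError(f"unknown command: {command}")
    return (x, y), heading


# A function "dash" that moves two cells, used twice around a left turn.
print(run(["dash", "left", "dash"], {"dash": ["forward", "forward"]}))
```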

Comments

Cameron Yick:

On a meta note, this question made me think about what elements of game design are “enough” to qualify as programming, e.g.:

  • Opportunity for emergent behavior? (Conway)
  • Ability to manage state / control flow?
  • System simulations (city / tycoon) - you achieve goals by modifying the environment rather than the agents
  • Has elements that can be optimized or automated
  • No single path to “win”, but some are quantitatively better than others

Sorting like an ant

Edited from the transcript for readability:

Ants actually have a really cool grouping algorithm. One of the things they do is keep their larvae sorted by age. Frequently the larvae get disordered, because the colony had to move or something, and the ants re-sort them through seemingly random behavior: they just randomly walk around, pick up larvae and put them down, and over time the larvae end up perfectly sorted. When this was investigated, it turned out that if an ant finds a larva giving off a particular scent in an area that has none of that scent around it, it will tend to pick it up, and if it happens to be carrying a larva with a particular scent and gets into an area that's high in that scent from other larvae, it will tend to put it down. Those simple rules, over time, will sort amazingly efficiently.

From Will Wright's Dynamics for Designers
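The two rules from the talk (pick up a larva whose scent is rare nearby, put it down where that scent is common) are easy to play with in code. Below is a toy version with a single ant on a ring of cells; the probabilities, the neighbourhood radius and the function names are assumptions of mine, not something taken from the talk or from real ant studies.

```python
import random


def same_scent_fraction(cells, i, scent, radius=2):
    """Fraction of the cells within `radius` of position i (excluding i) that hold `scent`."""
    n = len(cells)
    neighbours = [cells[(i + d) % n] for d in range(-radius, radius + 1) if d != 0]
    return sum(1 for c in neighbours if c == scent) / len(neighbours)


def ant_sort(cells, steps=20000, seed=0):
    """One ant random-walks over a ring of cells (None means empty) and moves larvae."""
    rng = random.Random(seed)
    cells = list(cells)
    n = len(cells)
    pos, carrying = rng.randrange(n), None
    for _ in range(steps):
        pos = (pos + rng.choice((-1, 1))) % n
        here = cells[pos]
        if carrying is None and here is not None:
            # The rarer this scent is nearby, the likelier the pick-up.
            if rng.random() > same_scent_fraction(cells, pos, here):
                carrying, cells[pos] = here, None
        elif carrying is not None and here is None:
            # The more common the carried scent is nearby, the likelier the drop.
            if rng.random() < same_scent_fraction(cells, pos, carrying):
                cells[pos], carrying = carrying, None
    if carrying is not None:
        # Out of steps while carrying: put the larva in the first free cell.
        cells[cells.index(None)] = carrying
    return cells


nursery = ["old", "young", None, "old", "young", None, "old", "young", None, "old"]
print(ant_sort(nursery))
```

No ant ever looks at the whole nursery; whatever grouping emerges comes only from local scent and a random walk.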

How do ants know the queen is dead?

Edited from the transcript for readability:

Ants have this local communication where they drop pheromones, and these change the state of nearby ants that smell them, telling them "Oh, food is nearby" or "it's time to attack", or triggering a "time to do nesting" instinct.

So they have this very elaborate horizontal propagation of information through pheromones, but they do this other thing called trophallaxis. What happens is an ant will come up to another one and basically vomit up food, and then the other ant slurps it up. Ants in fact have two stomachs: they have a private stomach and they have a social stomach, which they use to share food with every other ant. What this ends up being is their global variables; it's like a global bulletin board system for the entire colony. For instance, the Queen generates a very particular chemical that no other ant produces, and she does trophallaxis just like all the other ants. Over the course of maybe two hours that little trace of chemical will disseminate through the entire colony, and if that chemical ever evaporates because the Queen died or was lost, the entire ant colony instantly knows that they've lost their queen, and they start a whole new behavior promoting a new queen. So ants actually have a balance between global and local communication in the way they propagate information.

From Will Wright's Dynamics for Designers
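The "global variable" reading of trophallaxis can be sketched as a gossip process: random pairs of ants mix the contents of their social stomachs, the queen keeps topping up her marker chemical, and the marker fades a little each round. The decay factor, the threshold and the function names below are invented for the illustration and are not measured values.

```python
import random


def share_round(levels, rng, queen=None, decay=0.9):
    """One round of trophallaxis over per-ant levels of the queen's chemical.

    `queen` is the index of the queen, or None once she is gone.
    """
    levels = [v * decay for v in levels]  # the chemical fades over time
    if queen is not None:
        levels[queen] = 1.0  # only the queen produces it
    order = list(range(len(levels)))
    rng.shuffle(order)
    # Pair ants up at random; each pair ends up with the average of both.
    for a, b in zip(order[::2], order[1::2]):
        levels[a] = levels[b] = (levels[a] + levels[b]) / 2
    return levels


def queen_missing(levels, threshold=0.01):
    """The colony 'notices' once every ant's level has dropped below the threshold."""
    return all(v < threshold for v in levels)


rng = random.Random(0)
colony = [0.0] * 20
for _ in range(50):  # queen alive: the marker spreads through sharing
    colony = share_round(colony, rng, queen=0)
rounds = 0
while not queen_missing(colony):  # queen gone: the marker only decays
    colony = share_round(colony, rng, queen=None)
    rounds += 1
print("rounds until the colony notices:", rounds)
```

No ant needs to meet the queen directly: the averaging spreads her marker, and its absence is detected everywhere at roughly the same time.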