Bootstrap post-collapse OOP technology (Part 1)

2024-08-09 08:07

Winter is coming and Collapse OS aims to soften the blow. It is a Forth operating system and a collection of tools and documentation with a single purpose: preserve the ability to program microcontrollers through civilizational collapse.

But how do we preserve the ability to program OOP through civilizational collapse?

In previous posts: Make Your Self, Macros and optimizations: it's just a phase and The proof is in the meta-tracing JIT interpreter I described the design and implementation of a minimal object oriented language with implementations in JavaScript and the PyPy toolchain.

In this one I will sketch the minimal set of features the language has to provide to allow booting/bootstrapping the rest of the language in itself.

For future reference this post is written with the code on this commit in mind:

The implementation of the core language runtime is at fatter.js
The grammar is at fatter.js:147
The code testing the bootstrap script is at fatt.boot.test.js:206

From the primitive environment definition we can extract the set of methods that are required (at least for now) to define the rest:

For type Nil (the only falsy value):

eval returns itself
- () 👉 ()
< less than (always false)
= equals, returns 1 if the left hand side (subject) and right hand side (object) are both nil
- () = () 👉 1

For Int and Float types the implementations work as you would expect:

eval returns itself
<
=
+, -, *, /

For String:

eval returns itself
<
=
+ string concatenation
size returns the string length

For Name, a binding to a value in the environment:

eval does a lookup for the name in the environment and returns it
- note that the name e always returns the current environment
- foo 👉 10 (if foo was previously bound to 10)
name returns the name as a string
- @foo name () 👉 "foo"

For Later:

eval returns its value unevaluated
- @ a 👉 a (returns the name, doesn't evaluate it)
- @(a + 2) 👉 a + 2 (returns the message send, doesn't evaluate it)

For Pair:

eval returns a pair with a and b evaluated
a returns the left value
- 10 : 20 a () 👉 10
b returns the right value
- 10 : 20 b () 👉 20

For Array:

eval returns an array with all items evaluated
- [1, 2, a] 👉 [1, 2, 3] (if a was bound to 3)
size returns the array length
- [1, 2, 3] size () 👉 3
. returns the array item at the provided index
- [1, 2, 3] . 0 👉 1
- [1, 2, 3] . 2 👉 3

For Block:

eval evals all items and returns the result of the last one
- {1, 2, 3} 👉 3

For Msg, what you would call a method, consist of a "verb" and an "object":

eval returns a new message with the object evaluated
- \ + a 👉 \ + 42 (if a was bound to 42)
verb returns the verb (a string)
- \ + 10 verb () 👉 "+"
obj returns the object (can be of any type, even another message)
- \ + 10 obj () 👉 10

For Send, what you would call a method call, consists of a "subject" and a message like 1 + 2:

eval
- evaluates the subject
- evaluates the message
- enters a new stack frame
- binds the evaluated subject to it
- binds the evaluated message to msg
- binds the evaluated object to that
- sends the message to the subject in the new environment
subj returns the subject
- @(1 + 2) subj () 👉 1
msg returns the message
- @(1 + 2) msg () 👉 \ + 2

For Frame:

eval returns itself
- e 👉 returns the current environment
up returns the frame parent
- e up () 👉 returns the parent of the current environment
eval-in evaluates the object in the subject environment
- e eval-in 42 👉 42
bind binds the value b with the name in a, the subject must be a pair
- e bind "answer" : 42 👉 returns the environment
find does a lookup in the subject environment for the object
- e find "answer" 👉 the value bound to answer in e
- 42 in the bind above
get-type returns the type of the subject
- e get-type 1 👉 a value representing the type for Int
- e get-type e 👉 a value representing the type for Frame
new-frame returns an empty root environment
- e new-frame () 👉 a new Frame instance

With those message handlers in the environment we can start our booting process in the language itself, first we need a way to bind new message handlers:

e find (e get-type e)
  bind "reply" : @{
    e find (e get-type (e eval-in (msg obj() a() subj())))
      bind ((msg obj() a() msg() verb()) : (msg obj() b())),
    ()
  },

This one is the most complex so don't worry if you don't understand it :)

The trick is to find in the environment the value bound to the type of the environment itself (Frame), you can think of it as the prototype of the type Frame (which is itself an instance of the Frame type 🐢).

We send the message bind "reply" : @{...} to the returned value to bind an implementation of the message reply when sent to an instance of type Frame (like the current environment).

The implementation of reply expects a pair:

the left side should be a message "example" of the type of message we want to reply to
the right side is the implementation, a value to evaluate to reply to that message

The high level implementation of reply is

e find $subjectType bind $verb : $implementation.

We get the $subjectType from the subject on the message in the argument's left side.

For example in e reply @(1 add _) : 42 we want to register an implementation for the message add for the type int, the implementation will always return 42.

We get the type by evaluating the subject 1 and asking for its type:

(e get-type (e eval-in (msg obj() a() subj())))

We get the $verb from the message on the left side of the argument:

((msg obj() a() msg() verb())

And the $implementation from the right value:

(msg obj() b()))

The message's object is ignored, most of the time it will be a placeholder like _.

In a different implementation we could bind "multimethods" by using the object's type and registering the message handlers for the pair of types in the subject and object, but to keep it simple we just use the subject's type.

Now we can define methods!

The first one is the message -> sent to the Send type, it's a convenience method to turn:

e reply @(()  not _) : 1,

Into:

@(()  not _) -> 1,

Which in my opinion looks much nicer, the implementation:

e reply @(@(subj verb obj) -> body) : @(e reply it : that),

You can see it like pattern matching, if you see a message like this:

@(subj verb obj) -> body

Run this:

e reply it : that

Remember that the message subject is bound to it and the object to that.

With our nice syntax to define methods we are ready to start growing our language, let's start by defining a way to bind variables, the syntax is going to be @myVar is 42, which will bind 42 to the name myVar:

@(@a is _) -> @(e up () bind (it name ()) : that)

Because a call to a method enters its own frame we want to bind the variable in the parent's frame, since after returning from this method the current frame will be discarded and with it all local bindings, that's why we do e up () and then we bind the object (that) to the name of the variable in the subject.

Those are the most complex and low level implementations, now let's move to boolean operators. We don't need any in our primitive environment because we can define different implementations for different types.

Since () is the only falsy value its implementation has to return any truthy value, we choose to return 1, but it could be 0 or anything else.

For the other types we return (), since the negation of any non-falsy value is a falsy value.

@(()  not _) -> 1,
@(0   not _) -> (),
@(0.0 not _) -> (),
@(""  not _) -> (),

Moving to boolean or, if the left hand side is falsy we evaluate the right hand side, this implementation is needed for nil, since it's the only falsy value.

We evaluate the right hand side to allow for short-circuit semantics if the right hand side is wrapped in a Later, in Macros and optimizations: it's just a phase I show how to write a phase that automatically wraps the right hand side for us to get short-circuit booleans without any special syntax or special core language semantics.

For all other types or returns the left hand side:

@(()  or _) -> @(e eval-in that),
@(0   or _) -> @it,
@(0.0 or _) -> @it,
@(""  or _) -> @it,

Boolean and is the reverse, for nil it returns nil without evaluating the right hand side, for all other values it returns the result of evaluating the right hand side:

@(()  and _) -> (),
@(0   and _) -> @(e eval-in that),
@(0.0 and _) -> @(e eval-in that),
@(""  and _) -> @(e eval-in that),

Simple conditionals can be implemented with the ternary operator, the message shape is condition ? ifTrue : ifFalse, the implementation for nil evals the right side of the pair (again, to allow Later and avoid evaluating the side not taken), the rest evaluate the left side:

@(()  ? _) -> @(e eval-in (that b ())),
@(0   ? _) -> @(e eval-in (that a ())),
@(0.0 ? _) -> @(e eval-in (that a ())),
@(""  ? _) -> @(e eval-in (that a ())),

Not equals can be implemented in terms of the primitive = and negation, which we defined above:

@(()  != _) -> @(it = that not()),
@(0   != _) -> @(it = that not()),
@(0.0 != _) -> @(it = that not()),
@(""  != _) -> @(it = that not()),

Greater or equals can be implemented in terms of the primitives = and < and the boolean operators defined above, we do some extra work to return the left hand side if the comparison is true to allow chaining comparisons like 3 >= 2 >= 1 👉 3:

@(()  >= _) -> @(that = it),
@(0   >= _) -> @((it = that) or (that < it) and it),
@(0.0 >= _) -> @((it = that) or (that < it) and it),
@(""  >= _) -> @((it = that) or (that < it) and it),

Similar implementation for less or equals:

@(()  <= _) -> @(that = it),
@(0   <= _) -> @((it < that) or (it = that)),
@(0.0 <= _) -> @((it < that) or (it = that)),
@(""  <= _) -> @((it < that) or (it = that)),

And greater than:

@(()  > _) -> (),
@(0   > _) -> @((it <= that not()) and it),
@(0.0 > _) -> @((it <= that not()) and it),
@(""  > _) -> @((it <= that not()) and it),

We can define empty? for strings and arrays in terms of size and =

@("" empty?()) -> @(it size () = 0),
@([] empty?()) -> @(it size () = 0),

And some sample code to test our implementations:

[
  () not(), 1 not(), 1.0 not(), "" not(),

  () or 10, 10 or 11, 1.5 or 2, "" or 1,

  () and 1, 1 and 2, 2.5 and 3, "" and 4,

  ()  ? 1 : 2,
  1   ? 3 : 4,
  1.1 ? 5 : 6,
  "!" ? 7 : 8,

  () != (), 1 != 1, 1.1 != 1.1, "" != "",
  () != 0, 1 != 2, 1.1 != 1.2, "" != ".",

  () >= (), 1 >= 1, 1.1 >= 1.1, "a" >= "a",
  3 >= 2, 1.3 >= 1.2, "c" >= "b",
  1 >= 2, 1.1 >= 1.2, "a" >= "b",

  () <= (), 1 <= 1, 1.1 <= 1.1, "a" <= "a",
  3 <= 2, 1.3 <= 1.2, "c" <= "b",
  1 <= 2, 1.1 <= 1.2, "a" <= "b",

  () > (), 1 > 1, 1.1 > 1.1, "a" > "a",
  3 > 2, 1.3 > 1.2, "c" > "b",
  1 > 2, 1.1 > 1.2, "a" > "b",

  "" empty?(),
  [] empty?(),
  "1" empty?(),
  [1] empty?()
]

And now we can move on to bootstrap civilization the OOP way :)

Mariano Guerra's Log