Bootstrap post-collapse OOP technology (Part 1)
Quoting collapseos.org:
Winter is coming and Collapse OS aims to soften the blow. It is a Forth operating system and a collection of tools and documentation with a single purpose: preserve the ability to program microcontrollers through civilizational collapse.
But how do we preserve the ability to program OOP through civilizational collapse?
In previous posts: Make Your Self, Macros and optimizations: it's just a phase and The proof is in the meta-tracing JIT interpreter I described the design and implementation of a minimal object oriented language with implementations in JavaScript and the PyPy toolchain.
In this one I will sketch the minimal set of features the language has to provide to allow booting/bootstrapping the rest of the language in itself.
For future reference this post is written with the code on this commit in mind:
- The implementation of the core language runtime is at fatter.js
- The grammar is at fatter.js:147
- The code testing the bootstrap script is at fatt.boot.test.js:206
From the primitive environment definition we can extract the set of methods that are required (at least for now) to define the rest:
For type Nil (the only falsy value):
-
eval
returns itself-
()
👉()
-
-
<
less than (always false) -
=
equals, returns1
if the left hand side (subject) and right hand side (object) are both nil-
() = ()
👉1
-
For Int and Float types the implementations work as you would expect:
-
eval
returns itself <
=
-
+
,-
,*
,/
For String:
-
eval
returns itself <
=
-
+
string concatenation -
size
returns the string length
For Name, a binding to a value in the environment:
-
eval
does a lookup for the name in the environment and returns it- note that the name
e
always returns the current environment -
foo
👉10
(iffoo
was previously bound to10
)
- note that the name
-
name
returns the name as a string-
@foo name ()
👉"foo"
-
For Later:
-
eval
returns its value unevaluated-
@ a
👉a
(returns the name, doesn't evaluate it) -
@(a + 2)
👉a + 2
(returns the message send, doesn't evaluate it)
-
For Pair:
-
eval
returns a pair witha
andb
evaluated -
a
returns the left value-
10 : 20 a ()
👉10
-
-
b
returns the right value-
10 : 20 b ()
👉20
-
For Array:
-
eval
returns an array with all items evaluated-
[1, 2, a]
👉[1, 2, 3]
(ifa
was bound to3
)
-
-
size
returns the array length-
[1, 2, 3] size ()
👉3
-
-
.
returns the array item at the provided index-
[1, 2, 3] . 0
👉1
-
[1, 2, 3] . 2
👉3
-
For Block:
-
eval
evals all items and returns the result of the last one-
{1, 2, 3}
👉3
-
For Msg, what you would call a method, consist of a "verb" and an "object":
-
eval
returns a new message with the object evaluated-
\ + a
👉\ + 42
(ifa
was bound to42
)
-
-
verb
returns the verb (a string)-
\ + 10 verb ()
👉"+"
-
-
obj
returns the object (can be of any type, even another message)-
\ + 10 obj ()
👉10
-
For Send, what you would call a method call, consists of a "subject" and a message like 1 + 2
:
-
eval
- evaluates the subject
- evaluates the message
- enters a new stack frame
- binds the evaluated subject to
it
- binds the evaluated message to
msg
- binds the evaluated object to
that
- sends the message to the subject in the new environment
-
subj
returns the subject-
@(1 + 2) subj ()
👉1
-
-
msg
returns the message-
@(1 + 2) msg ()
👉\ + 2
-
For Frame:
-
eval
returns itself-
e
👉 returns the current environment
-
-
up
returns the frame parent-
e up ()
👉 returns the parent of the current environment
-
-
eval-in
evaluates the object in the subject environment-
e eval-in 42
👉42
-
-
bind
binds the valueb
with the name ina
, the subject must be a pair-
e bind "answer" : 42
👉 returns the environment
-
-
find
does a lookup in the subject environment for the object-
e find "answer"
👉 the value bound toanswer
ine
-
42
in thebind
above
-
-
get-type
returns the type of the subject-
e get-type 1
👉 a value representing the type for Int -
e get-type e
👉 a value representing the type for Frame
-
-
new-frame
returns an empty root environment-
e new-frame ()
👉 a new Frame instance
-
With those message handlers in the environment we can start our booting process in the language itself, first we need a way to bind new message handlers:
e find (e get-type e) bind "reply" : @{ e find (e get-type (e eval-in (msg obj() a() subj()))) bind ((msg obj() a() msg() verb()) : (msg obj() b())), () },
This one is the most complex so don't worry if you don't understand it :)
The trick is to find in the environment the value bound to the type of the environment itself (Frame
), you can think of it as the prototype of the type Frame
(which is itself an instance of the Frame
type 🐢).
We send the message bind "reply" : @{...}
to the returned value to bind an implementation of the message reply
when sent to an instance of type Frame
(like the current environment).
The implementation of reply
expects a pair:
- the left side should be a message "example" of the type of message we want to reply to
- the right side is the implementation, a value to evaluate to reply to that message
The high level implementation of reply
is
e find $subjectType bind $verb : $implementation
.
We get the $subjectType
from the subject on the message in the argument's left side.
For example in e reply @(1 add _) : 42
we want to register an implementation for the message add
for the type int, the implementation will always return 42
.
We get the type by evaluating the subject 1
and asking for its type:
(e get-type (e eval-in (msg obj() a() subj())))
We get the $verb
from the message on the left side of the argument:
((msg obj() a() msg() verb())
And the $implementation
from the right value:
(msg obj() b()))
The message's object is ignored, most of the time it will be a placeholder like _
.
In a different implementation we could bind "multimethods" by using the object's type and registering the message handlers for the pair of types in the subject and object, but to keep it simple we just use the subject's type.
Now we can define methods!
The first one is the message ->
sent to the Send
type, it's a convenience method to turn:
e reply @(() not _) : 1,
Into:
@(() not _) -> 1,
Which in my opinion looks much nicer, the implementation:
e reply @(@(subj verb obj) -> body) : @(e reply it : that),
You can see it like pattern matching, if you see a message like this:
@(subj verb obj) -> body
Run this:
e reply it : that
Remember that the message subject is bound to it
and the object to that
.
With our nice syntax to define methods we are ready to start growing our language, let's start by defining a way to bind variables, the syntax is going to be @myVar is 42
, which will bind 42
to the name myVar
:
@(@a is _) -> @(e up () bind (it name ()) : that)
Because a call to a method enters its own frame we want to bind the variable in the parent's frame, since after returning from this method the current frame will be discarded and with it all local bindings, that's why we do e up ()
and then we bind the object (that
) to the name of the variable in the subject.
Those are the most complex and low level implementations, now let's move to boolean operators. We don't need any in our primitive environment because we can define different implementations for different types.
Since ()
is the only falsy value its implementation has to return any truthy value, we choose to return 1
, but it could be 0
or anything else.
For the other types we return ()
, since the negation of any non-falsy value is a falsy value.
@(() not _) -> 1, @(0 not _) -> (), @(0.0 not _) -> (), @("" not _) -> (),
Moving to boolean or
, if the left hand side is falsy we evaluate the right hand side, this implementation is needed for nil, since it's the only falsy value.
We evaluate the right hand side to allow for short-circuit semantics if the right hand side is wrapped in a Later
, in Macros and optimizations: it's just a phase I show how to write a phase that automatically wraps the right hand side for us to get short-circuit booleans without any special syntax or special core language semantics.
For all other types or
returns the left hand side:
@(() or _) -> @(e eval-in that), @(0 or _) -> @it, @(0.0 or _) -> @it, @("" or _) -> @it,
Boolean and
is the reverse, for nil it returns nil without evaluating the right hand side, for all other values it returns the result of evaluating the right hand side:
@(() and _) -> (), @(0 and _) -> @(e eval-in that), @(0.0 and _) -> @(e eval-in that), @("" and _) -> @(e eval-in that),
Simple conditionals can be implemented with the ternary operator, the message shape is condition ? ifTrue : ifFalse
, the implementation for nil evals the right side of the pair (again, to allow Later
and avoid evaluating the side not taken), the rest evaluate the left side:
@(() ? _) -> @(e eval-in (that b ())), @(0 ? _) -> @(e eval-in (that a ())), @(0.0 ? _) -> @(e eval-in (that a ())), @("" ? _) -> @(e eval-in (that a ())),
Not equals can be implemented in terms of the primitive =
and negation, which we defined above:
@(() != _) -> @(it = that not()), @(0 != _) -> @(it = that not()), @(0.0 != _) -> @(it = that not()), @("" != _) -> @(it = that not()),
Greater or equals can be implemented in terms of the primitives =
and <
and the boolean operators defined above, we do some extra work to return the left hand side if the comparison is true to allow chaining comparisons like 3 >= 2 >= 1
👉 3
:
@(() >= _) -> @(that = it), @(0 >= _) -> @((it = that) or (that < it) and it), @(0.0 >= _) -> @((it = that) or (that < it) and it), @("" >= _) -> @((it = that) or (that < it) and it),
Similar implementation for less or equals:
@(() <= _) -> @(that = it), @(0 <= _) -> @((it < that) or (it = that)), @(0.0 <= _) -> @((it < that) or (it = that)), @("" <= _) -> @((it < that) or (it = that)),
And greater than:
@(() > _) -> (), @(0 > _) -> @((it <= that not()) and it), @(0.0 > _) -> @((it <= that not()) and it), @("" > _) -> @((it <= that not()) and it),
We can define empty?
for strings and arrays in terms of size
and =
@("" empty?()) -> @(it size () = 0), @([] empty?()) -> @(it size () = 0),
And some sample code to test our implementations:
[ () not(), 1 not(), 1.0 not(), "" not(), () or 10, 10 or 11, 1.5 or 2, "" or 1, () and 1, 1 and 2, 2.5 and 3, "" and 4, () ? 1 : 2, 1 ? 3 : 4, 1.1 ? 5 : 6, "!" ? 7 : 8, () != (), 1 != 1, 1.1 != 1.1, "" != "", () != 0, 1 != 2, 1.1 != 1.2, "" != ".", () >= (), 1 >= 1, 1.1 >= 1.1, "a" >= "a", 3 >= 2, 1.3 >= 1.2, "c" >= "b", 1 >= 2, 1.1 >= 1.2, "a" >= "b", () <= (), 1 <= 1, 1.1 <= 1.1, "a" <= "a", 3 <= 2, 1.3 <= 1.2, "c" <= "b", 1 <= 2, 1.1 <= 1.2, "a" <= "b", () > (), 1 > 1, 1.1 > 1.1, "a" > "a", 3 > 2, 1.3 > 1.2, "c" > "b", 1 > 2, 1.1 > 1.2, "a" > "b", "" empty?(), [] empty?(), "1" empty?(), [1] empty?() ]
And now we can move on to bootstrap civilization the OOP way :)