Tuesday, December 21, 2021

SectorLISP: Lisp With GC in 436 Bytes

Justine Tunney (Hacker News):

The SectorLISP project has achieved its goal of creating a LISP that’s tiny enough to fit in the master boot sector of a floppy disk. To the best of our knowledge, this is the tiniest LISP to date. Since a master boot record is only 512 bytes, that means LISP is now tied with FORTH to be the most lightweight high-level programming language in the world.

[…]

One of the most important code size saving techniques has been to avoid the temptation of defining data structures in such a way that NIL is encoded as zero. For example, if the lowest bit of a word is a flag bit for telling atoms apart from cons, then that bit must be 1 for cons cells since NIL is an atom. In that case, all words representing cons cells effectively become a misaligned pointer and extra code needs to be written so the 1 bit can be cleared before addressing memory. Avoiding those address calculation woes by defining atoms as oddly-numbered words is far more profitable than avoiding explicit NIL compares.

Justine Tunney (tweet, Hacker News):

There’s been many changes over the past few months that made it possible to shave away another hundred bytes from the i8086 assembly implementation. It left plenty of room to add a 40 byte garbage collector.

[…]

It works by saving the position of the cons stack before and after evaluation. Those values are called A and B. It then decreases the cx cons stack pointer further by recursively copying the Eval result. The new stack position is called C. The memory between B and C is then copied up to A. Once that happens, the new cons stack position becomes A - B + C. The purpose of this operation is to discard all the cons cells that got created which aren’t part of the result, because we know for certain they can’t be accessed anymore (assuming functions aren’t added which mutate cells).

[…]

Similar to how a Chess game may unfold very differently if a piece is moved to an unintended adjacent square, an x86 program can take on an entirely different meaning if the instruction pointer becomes off by one. We were able to use this to our advantage, since that lets us code functions in such a way that they overlap with one another.

1 Comment RSS · Twitter

The first time I saw the off-by-one IP trick was on the 6502, in the BBC micro's ROM disassembly. It's useful in tiny programming. You're much more limited on the 68000 (instructions start at an even address) or ARM.

It also has a practical consequence when writing a debugger. How do you know which instructions precede the current one? People generally want to know what the previous instructions were when debugging, so you have to display them. There's a trick to make it work, even if you don't have symbols...

Leave a Comment