S-Expression Parsing in Lime

Description

This is an exciting time in Lime development! I've been working a lot on string parsing in Lime, and I've finally gotten to the point where I was able to implement a simple S-Expression parser in Lime.

An S-expression parser is a good idiomatic example of programming in a given language, and as the development of LimeOS continues, a self-hosted Lime runtime (Lime, written in Lime) will be needed to dynamically run Lime programs stored in memory and typed by the user.

The Code

Strategy

I decided to go with a relatively simple procedure for parsing S-expressions, taken from Peter Norvig's wonderful (How to Write a (Lisp) Interpreter (in Python)) (the article that first got me into Lisp!). In Python, this is simply:

def parse(string):
    return string.replace("(", " ( ").replace(")", " ) ").split()

At first, this looks a bit odd. Working it out on paper however shows that it indeed works:

   "(this (is an (s expression)))"
-> " ( this  ( is an  ( s expression ) ) ) "
-> ["(", "this", "(", "is", "an", "(", "s", "expression", ")", ")", ")"]

However, str.replace and str.split aren't implemented in Lime...yet!

string_split

First, I needed to implement a string_split method to split strings by spaces. I decided to go with the following strategy:

(define (string_split string separator)
        (filter is_nonempty_string (_string_split string separator "" (list 0))))

(define (_string_split string separator buf arr)
          (if (= string "") (list_append arr buf)
                 (if (string_startswith string separator)
                     (_string_split (string_ltrim (string_length separator) string) separator "" (list_append arr buf))
                     (_string_split (string_cdr string) separator (string_add buf (string_car string)) arr))))

In the main recursive function, _string_split (the regular string_split exists just to set the default arguments of buf and arr to default values), the flow is as follows:

  • if we have no more characters to parse, return the list we've created
  • otherwise,
  • if the current string in memory starts with the delimiter, remove the delimiter and add a new empty item to the list
  • otherwise, add the character at the beginning of the string to the previous element in the iterated list

This performs relatively well. One thing I did have to account for was concurrent delimiters. That is, when the string an item is being separated by occurs directly next to itself 2 or more times. For example, string_split not taking this into account would have the following behavior:

   (string_split "a b  c")
-> [a, b, , c]

which doesn't make good heuristic sense: separating words by spaces should also separate by blocks of space. Therefore, I defined some pretty simple functions, is_empty_string and is_nonempty_string to get around this. Before the result of string_split is returned, I use the filter function and is_nonempty_string to only return the results that are nonempty.

string_replace

The string iteration algorithm for string_replace is almost exactly the same as that for string_split:

(define (string_replace string search replacement)
        (_string_replace string search replacement ""))

(define (_string_replace string search replacement buf)
        (if (= string "") buf
            (if (string_startswith string search)
                (_string_replace (string_ltrim (string_length search) string) search replacement (string_add buf replacement))
                (_string_replace (string_cdr string) search replacement (string_add buf (string_car string))))))

However, instead of adding elements to a list of split elements, it simply traverses the string, and when it hits the delimiter, it replaces that delimiter with the replacement string and jumps to the character immediately after the replacement string.

Bringing It All Together

The final program I wrote, sexp.lime, looks like:

;;
;; [sexp.lime]
;;
;; A simple S-expression parser, written in Lime.
;; Copyright (C) 2018, Liam Schumm
;;

(define (parse string) (string_split
                       (string_replace (string_replace string
                                        ")" " ) ")
                                        "(" " ( ")
                       " "))
(define (main args) (print (parse (!! 0 args))))

Compiling this program with llc and running it then yields:

$ llc sexp.lime
$ ./a '(+ (- 3 4))
[(, +, (, -, 3, 4, ), )]

A properly parsed S-expression!

Lime System Information

This article was written and tested on Lime alpha, commit 53f8ae1, under 64-bit Arch Linux, kernel 4.18.5-arch1-1-ARCH.

Lime Roadmap.

I'm in an independent study class at my high school, and for it we're supposed to come up with a "roadmap" for the projects we're working on. I'm working on Lime as a part of this class, so I was supposed to write out a plan with times. Here's what I came up with

  • September 20: Get bootloader/os written in Lime working
  • October 20: Get lambda lifting and higher order functions working
  • December 20: Present at QED (a local high school math conference)
  • January 20: Run benchmarks and comparisons to C, Chez Scheme, and Haskell to determine performance
  • February 20: Get basic optimizer working (constant folding, deforestation, other standard techniques)
  • March 20: Fully Lisp hygenic macro system

This is subject to change! I'll edit this post if I change my schedule.

Lime Update 1

Tomorrow's my first day of school, so I'll probably have less development and YouTube time. However, I'll get a chance to do some more 3D printing and robotics.

I've been working on some development for Lime, my programming language. I've added a few new features:

Multi-Line if Statements

You can now write if statements with multiple conditionals and clauses. For example, the following code:

(define (print x)
        (if (equal (type x) 0) (iprint x)
            (if (equal (type x) 1) (map $print x)
                (if (equal (type x) 2) (sprint x)
                    (if (equal (type x) 3) (if x (sprint "true") (sprint "false"))
                        (if (equal (type x) 4) (fprint x)
                    (exception "print: unknown type.")))))))

could be refactored as:

(define (print x)
        (if (equal (type x) 0) (iprint x)
            (equal (type x) 1) (map $print x)
            (equal (type x) 2) (sprint x)
            (equal (type x) 3) (if x (sprint "true") (sprint "false"))
            (equal (type x) 4) (fprint x)
            (exception "print: unknown type.")))

Originally, I wanted to make case/elif statements that would be compiled to nested if statements with macros, but purely functional macros don't exactly have the functionality you need to do that.

Floating Point Operations

For some strange reason, floating point numbers don't work properly in Lime. x87's FPU instruction set is a million times more complicated than x86 already is. The operation

(print (/ 3.0 2.0))                                                                                           

returns the value

1.4261667296

while integer operations like

(print (+ 3.0 2.0))

returns

5.0

Because Lime objects must be passed around on an integer stack, they're stored in 4 32 bit integers:

4   # the first number, 4, indicator of the number of integers required to store the float
4   # 4, the type ID of floating point numbers
667 # the main integer
1   # number of times to divide by 10
0   # 0 if the number is postive, 1 if it's negative

thus, extensive rounding occurs, truncating a lot of decimal places. I'm not sure if inaccuracy this high could be a result of floating point error propogation or if there's an error in my code.

YouTube

Video as a Medium of Communication

YouTube videos have been a really important resource to me throughout my adventures learning to code. Video as a medium harnesses both sight and hearing, as opposed to just a static page, and allows a much greater level of detail and replicability (every keypress, application, mouse click, etc.) for technical guides.

Throughout my adventures teaching coding I've been trying to look for a good medium to convey tutorials.

Comparison of Teaching Mediums

Text

I've undoubtably used so many text-only resources for coding. StackOverflow is all text, Medium blog posts about setting up your Webpack config are all text. Man pages and all documentation is text. Text is incredibly convenient, incredibly dynamic, indexible, and incredibly easy to serve. However, while writing and distributing text content is easy, distributing good text content is very difficult. In today's digital age, humans aren't really used to reading long blocks of text, and if your tutorial isn't very engaging while also being detailed, it's easy to get derailed and totally lost.

Lecture

The history of lecturing goes back to the times of medieval universities. A critical aspect of this lecturing was the transfer of the instructor's original source to a student's handwritten notes. In this case, lecturing was the only way to transfer information before the invention of the printing press.

A 1736 engraving, by William Hogarth, titled "Scholars at a Lecture". Students are creammed into a lecture room, discordantly moving about and making various faces, and taking notes.

Lecturing's value, as well as its downfall, comes from pacing. All students must follow the pace of the lecturer. This helps a teacher ensure a class of students are all at the same level; however, falling behind or losing track of a lecture then can render the rest of the lecture useless. Depending on the student, their strengths, and their prerequisite knowledge, lecturing can often spend most time on content a student already knows, and the least on the areas with which they're least familiar.

However, lecturing is easy for a teacher, and allows them an efficient means of communication through physical presence, and lectures with demonstration and audience interaction can connect the student, teacher, and class together.

Rembrandt's "The Anatomy Lecture of Dr. Nicolaes Tulp". Shows many medieval students gathered around Dr. Nicolaes Tulp, observing his dissection of a human body

Video

Video is somewhere in between text and lecturing. Video captivates the student a lot more than plain text does, and with pausing (you can't pause a lecture) and YouTube speed controls, it's possible to adapt video pretty closely to a student's pace. However, video isn't indexable and searchable the way text is, so to find a video snippet explaining one small technical detail may take a lot of searching and watching, and possibly not the result you're looking for.

Video requires a lot of investment on the part of the teacher to create, record, produce, and script, and can really only follow one path of reasoning. However, it's considerably more accessible than physical teaching (limited to students at schools offering the classes they want to take), and through modern visual effects and editing can be a lot more condensed and punchy than a lecturer talking. Therefore, I think they're a great medium for specifically focused, project-based, coding tutorials!

My First Video

Setting Up Equipment

Remember what I said about "a lot of investment on the part of the teacher"? Yeah...

Turns out, I had a Behringer Xenyx 502 preamp,

A Behringer Xenyx 502 preamp

a Shure SM57,

A Shure SM57 microphone

and a mic stand lying around. Setting this up was pretty easy. Plug the microphone into the mic port on my preamp (it accepts a mic and two stereo line-in), use an adapter to connect the stereo output to a standard size AUX microphone plug. Unfortunately, most computers (my Thinkpad T430s and MacBook Air included) have an audio port that serves as both the microphone and headphone port.

A comparison of iPhone headphones, with an integrated mic versus a standard audio-only headphone.

This AUX port uses a fourth pin on the top for the microphone. Adapting a microphone-only AUX cable to a combo one requires a dongle like this one:

An adapter going from separate microphone and headphone AUX cables to a single 4-pin AUX plug

Overall, I'm happy with my setup.

A photo of my desk with computers, monitors, lamp, preamp, and the microphone on its stand.

Recording

I decided to test out my new setup by doing a tutorial video to create a calculator app in HTML, CSS, and Javascript.

I have a new appreciation for the patience and tenacity of YouTubers. I had to re-record the tutorial about 15 times because of fumbling up the intro, writing the code in a way I realized was confusing or didn't work, and because I forgot to turn on my audio recorder for a take. The few takeaways I have are:

  • Speak slowly and clearly.
  • Don't be afraid to re-take a part.
  • Come up with a script to roughly follow, and try out the tutorial once (i.e., write the code and narrate).

Now the next challenge: editing. I've been able to clip together the audio and video, speed up some parts, and enhance the vocals with iMovie. Editing up a good intro is surprisingly difficult. I'm not releasing this first video, but I'll keep trying different things until I'm happy with the result.

Segmented Integers

Introduction

I've been working on Tequila, a NASM emulator in JavaScript. One problem I ran into pretty quickly is x86's register subdivision. In x86, you have 4 general-purpose registers.

A diagram of x86 registers.

Each register can hold up to 64 bits of memory. However you can address parts of this 64 bits. For example, if the register rax was initialized to 64 bits of zeros, the instruction

MOV eax, 11

would fit the decimal value 11 (1011 in binary) into the second 32 bits of rax. Similarly,

MOV al, 3

would move 11 into the last 8 bytes of rax. After modifying a part of the register, the full register may still be addressed. For example, if rax was initialized to all zeroes,

MOV al, 3

would move the decimal value 3 into the end of rax. Since all the leading bits are zero, the total value of register rax will also be 3. Then,

MOV ah, 2

would load 10 into ah. Thus, ignoring leading zeroes, the value of rax would be:

00000010 00000011

yielding a value of 515.

Note that you can only address the upper and lower half of eax. You cannot address the upper half of rax. This was a design choice in x86 to save space in the instruction set (less addressible register names).

Implementation

Binary-Based Integers

First, what I needed to do was create a base integer type that would get and set integer values from a binary store. I implement the h/l registers using this class. I call this SizedInteger.

I chose to store these values in pure binary instead of as an array of bytes. Since all addressible sizes in x86 are a multiple of 8 bits, this would be a better decision for performance and memory efficiency. I chose to stick with binary as a design choice.

The implementation is as follows:

class SizedInteger {
    constructor (size) {
    this.size = size                       // store the size of the register, in bits
    this.contents = '0'.repeat(this.size)  // fill the value store with 0's to initialize
    }

    get () {
    return parseInt(this.contents, 2)      // parse the binary store as a binary integer
    }

    set (n) {
    let x = (n % (2 ** this.size)).toString(2)

    // take up the full size with binary, padding with 0, so that when this integer
    // is appended to others in memory, they take up exactly the size allocated
    let leftpad = this.size - x.length
    this.contents = '0'.repeat(leftpad) + x
    }
}

Segmented Integers

Now for the good stuff: segmented integers!

class SegmentedInteger {

    // a segmented integer is constructed with two SizedInteger objects,
    // r(egister)1 and r(egister)2
    constructor (r1, r2) {
    this.r1 = r1
    this.r2 = r2
    }

    get contents () {
    return this.r1.contents + this.r2.contents
    }

    set contents (x) {
    this.r1.contents = x.slice(0, this.r1.size)
    this.r2.contents = x.slice(this.r1.size, this.r1.size + this.r2.size)
    }

    get size () {
    return this.r1.size + this.r2.size
    }

    get () {
    return parseInt(this.r1.contents + this.r2.contents, 2)
    }

    set(n) {
    let combined_size = this.r1.size + this.r2.size
    let x  = (n % (2 ** combined_size)).toString(2)
    let leftpad = combined_size - x.length
    x = '0'.repeat(leftpad) + x

    // fill r1, then r2
    this.r1.contents = x.slice(0, this.r1.size)
    this.r2.contents = x.slice(this.r1.size, this.r1.size + this.r2.size)
    }

}

The implementation of SegmentedIntegers was essentially the same as that for SizedIntegers; I simply had to split the binary store between two registers. Because the constructor takes in SizedInteger objects, the objects the constructor is passed references to can be manipulated seperately from the SegmentedInteger.

I used ECMAScript 5's new syntax for getters and setters so that SegmentedIntegers exposed an identical API and functioned exactly the same as SizedIntegers, both so that you could make a SegmentedInteger of a SegmentedInteger, but also to simpify my code for Tequila.

Testing

Here's some test code:

> ah = new SizedInteger(16)
> al = new SizedInteger(16)
> eax = new SegmentedInteger(ah, al)
> rax = new SegmentedInteger(new SizedInteger(32), eax)
> eax.set(9)
'00000000000000000000000000001001'
> eax.get()
9
> al.contents
'0000000000001001'
> ah.contents
'0000000000000000'
> al.set(0)
> eax.get()
0

Welcome

Welcome to my blog!

Here I'll be writing personal things, opinions on tech, and tutorials for various programming things; basically a dumping ground for all my thoughts, technical and not.