S-Expression Parsing in Lime

Description

This is an exciting time in Lime development! I've been working a lot on string parsing in Lime, and I've finally gotten to the point where I was able to implement a simple S-Expression parser in Lime.

An S-expression parser is a good idiomatic example of programming in a given language, and as the development of LimeOS continues, a self-hosted Lime runtime (Lime, written in Lime) will be needed to dynamically run Lime programs stored in memory and typed by the user.

The Code

Strategy

I decided to go with a relatively simple procedure for parsing S-expressions, taken from Peter Norvig's wonderful (How to Write a (Lisp) Interpreter (in Python)) (the article that first got me into Lisp!). In Python, this is simply:

def parse(string):
    return string.replace("(", " ( ").replace(")", " ) ").split()

At first, this looks a bit odd. Working it out on paper however shows that it indeed works:

   "(this (is an (s expression)))"
-> " ( this  ( is an  ( s expression ) ) ) "
-> ["(", "this", "(", "is", "an", "(", "s", "expression", ")", ")", ")"]

However, str.replace and str.split aren't implemented in Lime...yet!

string_split

First, I needed to implement a string_split method to split strings by spaces. I decided to go with the following strategy:

(define (string_split string separator)
        (filter is_nonempty_string (_string_split string separator "" (list 0))))

(define (_string_split string separator buf arr)
          (if (= string "") (list_append arr buf)
                 (if (string_startswith string separator)
                     (_string_split (string_ltrim (string_length separator) string) separator "" (list_append arr buf))
                     (_string_split (string_cdr string) separator (string_add buf (string_car string)) arr))))

In the main recursive function, _string_split (the regular string_split exists just to set the default arguments of buf and arr to default values), the flow is as follows:

  • if we have no more characters to parse, return the list we've created
  • otherwise,
  • if the current string in memory starts with the delimiter, remove the delimiter and add a new empty item to the list
  • otherwise, add the character at the beginning of the string to the previous element in the iterated list

This performs relatively well. One thing I did have to account for was concurrent delimiters. That is, when the string an item is being separated by occurs directly next to itself 2 or more times. For example, string_split not taking this into account would have the following behavior:

   (string_split "a b  c")
-> [a, b, , c]

which doesn't make good heuristic sense: separating words by spaces should also separate by blocks of space. Therefore, I defined some pretty simple functions, is_empty_string and is_nonempty_string to get around this. Before the result of string_split is returned, I use the filter function and is_nonempty_string to only return the results that are nonempty.

string_replace

The string iteration algorithm for string_replace is almost exactly the same as that for string_split:

(define (string_replace string search replacement)
        (_string_replace string search replacement ""))

(define (_string_replace string search replacement buf)
        (if (= string "") buf
            (if (string_startswith string search)
                (_string_replace (string_ltrim (string_length search) string) search replacement (string_add buf replacement))
                (_string_replace (string_cdr string) search replacement (string_add buf (string_car string))))))

However, instead of adding elements to a list of split elements, it simply traverses the string, and when it hits the delimiter, it replaces that delimiter with the replacement string and jumps to the character immediately after the replacement string.

Bringing It All Together

The final program I wrote, sexp.lime, looks like:

;;
;; [sexp.lime]
;;
;; A simple S-expression parser, written in Lime.
;; Copyright (C) 2018, Liam Schumm
;;

(define (parse string) (string_split
                       (string_replace (string_replace string
                                        ")" " ) ")
                                        "(" " ( ")
                       " "))
(define (main args) (print (parse (!! 0 args))))

Compiling this program with llc and running it then yields:

$ llc sexp.lime
$ ./a '(+ (- 3 4))
[(, +, (, -, 3, 4, ), )]

A properly parsed S-expression!

Lime System Information

This article was written and tested on Lime alpha, commit 53f8ae1, under 64-bit Arch Linux, kernel 4.18.5-arch1-1-ARCH.