This is an exciting time in Lime development! I've been working a lot on string parsing in Lime, and I've finally gotten to the point where I was able to implement a simple S-Expression parser in Lime.
An S-expression parser is a good idiomatic example of programming in a given language, and as the development of LimeOS continues, a self-hosted Lime runtime (Lime, written in Lime) will be needed to dynamically run Lime programs stored in memory and typed by the user.
I decided to go with a relatively simple procedure for parsing S-expressions, taken from Peter Norvig's wonderful (How to Write a (Lisp) Interpreter (in Python)) (the article that first got me into Lisp!). In Python, this is simply:
return string.replace("(", " ( ").replace(")", " ) ").split()
At first, this looks a bit odd. Working it out on paper however shows that it indeed works:
"(this (is an (s expression)))"
-> " ( this ( is an ( s expression ) ) ) "
-> ["(", "this", "(", "is", "an", "(", "s", "expression", ")", ")", ")"]
str.split aren't implemented in Lime...yet!
First, I needed to implement a
string_split method to split strings by spaces. I decided to go with the following strategy:
(define (string_split string separator)
(filter is_nonempty_string (_string_split string separator "" (list 0))))
(define (_string_split string separator buf arr)
(if (= string "") (list_append arr buf)
(if (string_startswith string separator)
(_string_split (string_ltrim (string_length separator) string) separator "" (list_append arr buf))
(_string_split (string_cdr string) separator (string_add buf (string_car string)) arr))))
In the main recursive function,
_string_split (the regular
string_split exists just to set the default arguments of
arr to default values), the flow is as follows:
- if we have no more characters to parse, return the list we've created
- if the current string in memory starts with the delimiter, remove the delimiter and add a new empty item to the list
- otherwise, add the character at the beginning of the string to the previous element in the iterated list
This performs relatively well. One thing I did have to account for was concurrent delimiters. That is, when the string an item is being separated by occurs directly next to itself 2 or more times. For example,
string_split not taking this into account would have the following behavior:
(string_split "a b c")
-> [a, b, , c]
which doesn't make good heuristic sense: separating words by spaces should also separate by blocks of space. Therefore, I defined some pretty simple functions,
is_nonempty_string to get around this. Before the result of
string_split is returned, I use the
filter function and
is_nonempty_string to only return the results that are nonempty.
The string iteration algorithm for
string_replace is almost exactly the same as that for
(define (string_replace string search replacement)
(_string_replace string search replacement ""))
(define (_string_replace string search replacement buf)
(if (= string "") buf
(if (string_startswith string search)
(_string_replace (string_ltrim (string_length search) string) search replacement (string_add buf replacement))
(_string_replace (string_cdr string) search replacement (string_add buf (string_car string))))))
However, instead of adding elements to a list of split elements, it simply traverses the string, and when it hits the delimiter, it replaces that delimiter with the replacement string and jumps to the character immediately after the replacement string.
Bringing It All Together
The final program I wrote,
sexp.lime, looks like:
;; A simple S-expression parser, written in Lime.
;; Copyright (C) 2018, Liam Schumm
(define (parse string) (string_split
(string_replace (string_replace string
")" " ) ")
"(" " ( ")
(define (main args) (print (parse (!! 0 args))))
Compiling this program with
llc and running it then yields:
$ llc sexp.lime
$ ./a '(+ (- 3 4))
[(, +, (, -, 3, 4, ), )]
A properly parsed S-expression!
Lime System Information
This article was written and tested on Lime alpha, commit
53f8ae1, under 64-bit Arch Linux, kernel