RacketCon 2018

I'm attended RacketCon from September 29-30th this 2018. Here are my talk/conference notes and biased thoughts.

Talks

DSSL2: A #lang For Data Structures Students

DSSL2 is a language designed by a professor at Northwestern to provide a good environment for data structures students. It uses a Python-like syntax to be familiar and accessible to students from a wide range of backgrounds. The language looks like this:

#lang dssl2

def insert!(t, k):
    if empty?(t): new_node(k)
    elif zero?(random(size(t) + 1)):
        root_insert!(t, k)
    elif k < t.key:
        t.left = insert!(t.left, k)
        fix_size!(t)
        t
    elif k > t.key:
        t.right = insert!(t.right, k)
        fix_size!(t)
        t
    else: t

and is constructed entirely using the extensive Racket macro system.

  • It's a really effective DSL because it uses existing knowledge of students, and has been successful in undergrad data structures course.
  • There's no existing standard of who knows what in undergrad engineering (aside from 10 weeks of C++) so they'll have to learn something else anyways, unless the professor wants to manage 10 different languages for each assignment.
  • DSSL2 is also effective in that it essentially forces semantic programming (no mutation, etc.), helping guide them in every step of their programming.

Bringing Back Design By Numbers

Design By Numbers is a pretty old language initially designed for artists who wished to learn programming. Some Racket folks have re-implement Design By Numbers in Racket. Code is simple and looks like plain english:

#lang dbn

Paper 0
Pen 100

Command HL Y {
   Set X 25
   Line X Y X Y
}

HL 50

and supports a lot of different language constructs.

  • I think Design By numbers wasn't particularly successful because it doesn't follow semantic guidelines at all.
  • It has inconsistent capitalization, brackets are not scope, and a really odd standard library.
  • While it's difficult for beginners to learn and understand semantics, making a confusing language will make doing anything advanced at all nonobvious to experienced programmers and especially beginners.

Language-Oriented Programming

Language oriented programming is a term that comes up a lot in the Lisp community. Language-Oriented Programming refers to the paradigm of creating DSLs (Domain-Specific Languages) to serve specific tasks: either an entire language (Racket is a DSL built on top of its core primitives) or a small language for a specific task (see DSSL2 above).

How To Ask For The Time In Racket

Gregor is a date (not datetime!) library for Racket.

  • Gregor's (today) and (now) differ because (today) only holds the data for the current day; not time, utc offset, etc. etc.
  • new Date("2001-01-01") returns something different from the actual day, like Fri Dec 31 1999 18:00:00 GMT-0600 (Central Standard Time), because it interprets this as UTC and localizes it.
  • Many other bugs and caveats like this exist. As such, it's useful to only store the date when you want to store a date, not the accompanying time.

A Racket Story

  • DSLs can be really useful for specific coding classes.
  • #lang minecraft-lang
  • #lang vr
  • Specifically difficult languages (e.g., microcontrollers) can greatly benefit from having a language for them written in Racket, by removing the learning curve for a new environment and providing a similar programming interface (Racket) for each task.
  • Cool Racket Packages
  • Protocol Buffers
  • Generates binary objects storing arbitrary data, and can serialize and deserialize automatically. Useful for network transfers.
  • feature-profile
  • LISP is typically slow, since that's not it's focus.
  • Profiling is underused.
  • Built-in profile functionality in Racket is statistical and typically operaties on limited data.
  • The feature-profile module can operate on individual features. While this can lead to micro-optimizations, it also allows for a greater benchmarking granularity that can result in optimization.
  • for-acc
  • User-specified accumulators with automated post-processing.
  • Now part of the base distribution!
  • fc or find-collection
  • Find a module on disk :)
  • handy
  • A mish-mash library. Just no.

Racket For Everyone (Else)

Teaching programming to people who aren't from a computer science/math background can be difficult, but really fruitful (see next section "Digital Ricoeur").

  • A majority of computer science tutorials use mathematical concepts so math people can understand, but this is a huge barrier to entry.
  • DSLs are a good way to provide small interactive deliverables.
  • Scribble and Frog, specifically, for making academic papers and personal blogs.
  • Dr. Racket significantly lowers the barrier to entry, and has great suppotr for application-y interactive projects.

Digital Ricoeur

Digital Ricoeur is an online archive of Paul Ricoeur's works.

  • Paul Ricoeur published in a lot of different journals, so as a result no one university has access to all of his works
  • What's something that can easily search a lot of sources? Computers!
  • Copyright prevents one organization from digitizing and distributing all works
  • Digital Ricoeur compiles lists of records to tell academics where to look for a specific source
  • DSL application: a DSL to notate publications and publication queries

SINBAD

Sinbad (automated Structure InfereNce and Binding of Data) is a library to access and import data uniformly from online RSS/XML feeds. Examples are:

(require sinbad)

(define faa ;; fetch airport status information
    (sail-to "http://services.faa.gov/airport/status/ATL?format=application/xml"
        (cache-timeout 300)  ; refresh every 5 minutes
        (manifest)           ; display schema for available data upon load
        (load)))

(define-struct port (name loc condition delay?))

(fetch faa (make-port "Name" "State" "Weather/Weather" "Delay"))

=> (make-port "Hartsfield-Jackson Atlanta International" "Georgia" "Fair" #false)

;;
;; do more complicated stuff with large datasets
;;

(define vehicle-data-source
    (sail-to "http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip"
        (load) (manifest)))

(define-struct auto (make model year trans city-mpg hwy-mpg))

;; pull a random data record...
(fetch-random vehicle-data-source
    (make-auto "make" "model" "year" "trany" "city08" "highway08"))

;; get all ~40,000 (!) of them...
(define all-vehicles
    (fetch vehicle-data-source
        (make-auto "make" "model" "year" "trany" "city08" "highway08")))

(length all-vehicles)
  • Get data feeds online, and get RSS/XML link, by requesting sail-to a URL.
  • Add in labels, and it'll fetch them out of the XML.
  • Easily load fetched data into a Racket struct.

Cassius: Verified Web Pages

Cassius is a mathematical formalization of web page layout.

  • How do you verify the appearance of a webpage?
  • Webpages are programs: taking inputs (font size, horizontal size, vertical size, etc.), and returning visual output.
  • W3C unittests are out of date and often don't even work.
  • For those with low vision, ensuring full accessibility compliance for websites is really important.
  • Cassius formalizes web page layout as a mathematical system and uses existing theorem-proving techniques to verify webpage layout.

Petri-Net Flavored Places

A petri net or transition net diagrams the flow and behavior of a distributed system.

Dataflow Network Programming with Neuron

Neuron is a compositional framework to make network software development easy and fun.

  • What if Facebook likes could actually contribute something?
  • Users can give computing power! But how do you computationally facilitate this?
  • Neuron defines two constructs:
  • Sockets
  • Structure to pair input/output ports.
  • Codecs
  • Reader and writer with encoder and decoder functions.

State of Racket

Racket's moving to a Chez scheme base! Spaghetti-ish core code (written in C) is being replaced with Chez Scheme code. To compile Chez scheme, just run

make cs

instead of

make

when building Racket.

FARM

FARM (Functional Art, Music, Modeling, and Design) (yes I know it's not an acronym, I didn't choose it) was a night of algorithmic and functionally-generated music. I didn't get a chance to go to the talks on how the algorithms worked (they happened at the same time as RacketCon), but they sounded pretty cool. Livecoding was also a major part of the event, and another cool application of DSLs!

Workshops

syntax/parse

  • Canonical reference for learning Lisp macros: Macro-By-Example.
  • racket/private/template - good example of macros
  • ~? operator
x = #'apple
#'(eat (~? x 'nothing'))
= (eat apple) if x is defined
otherwise, (eat 'nothing')

Web Development in Racket

I went to a workshop with the amazing Jesse Alama about the state of web development in Racket. Here were a few standout things:

  • sweet.js provides the goodness of Racket-like hygienic macros to unholy JavaScript.
  • db is a nice ORM.
  • Here is a nice intro to web development in Racket workshop written using the standard Node/Flask/etc. method of URL dispatchers.

Unrelated

Prime Checking in NASM

Primality Checking

It's very difficult to quickly determine the prime factors of a number because there's no known deterministic algorithm to do so. Therefore, the only method to determine prime factors and primality is a brute force approach with as much information as possible.

Simple Brute-Force

It's rather trivial to show that if a number n has multiple prime factors, every prime factor must be strictly smaller than n. Therefore, the simplest algorithm to check the primality of a number starts n and loops all the way down to 2, stopping before it reaches 1 (every number is divisible by 1). The pseudocode for this is:

function is_prime(n) {
         i = n
         while i > 1 {
               if (n % i) == 0 {
                     // n is divisible by i, abort
                     return false
               }
               i -= 1
         }
         return true // we've made it through
}

Although this algorithm has to iterate through lots of possible divisors, it does work.

Bounded Brute-Force

A less trivial fact is that non-prime number must have at least one prime factor less than or equal to its square root. We proceed by contradiction: multiple factors must exist, since the number is composite, each being larger than the square root, whose product will then at least be larger than the product of the square root with itself, and thus larger than the given number itself.

Therefore, we may set an upper bound of sqrt(n) on our set of divisors, while still ensuring that at least 1 exists. This is promising, because as the number grows, its square root grows more and more slowly, making our algorithm more and more performant for larger integers. Thus, we can use our same is_prime implementation, with a significantly smaller upper bound:

function is_prime(n) {
         i = sqrt(n)
         while i > 1 {
               if (n % i) == 0 {
                     // n is divisible by i, abort
                     return false
               }
               i -= 1
         }
         return true // we've made it through
}

One more trick we can apply is checking first if a number is even. If a number is even and it isn't 2, the number is divisible by 2 and is thus not prime. We'll apply this trick before we start doing any type of intensive computation:

function is_prime(n) {
         if (i % 2) == 0 {
            if i == 2 { return true  }
            else      { return false }
         }
         i = sqrt(n)
         while i > 1 {
               if (n % i) == 0 {
                     // n is divisible by i, abort
                     return false
               }
               i -= 1
         }
         return true // we've made it through
}

Babylonian Bounded Brute-Force

Now, hold on a second... NASM doesn't have a convenient means of taking the square root truncated to an integer. To calculate the square root would necessitate loading to/from the FPU, which is very computationally intensive. However, we may instead use the Babylonian approximation for the square root of a number instead of the square root; which will still satisfyingly reduce the size of our test space. Recall the Babylonian method to calculate the square root of \(N\):

\begin{equation*} f(x) = \frac{x + N / x}{2} \end{equation*}

Repeated application of \(f\) onto any seed (number to start with) will result in a number that gets closer and closer to the true value of \(\sqrt{N}\). If \(x = \sqrt{N}\) exactly, then \(f\) will return the same value:

\begin{equation*} f(\sqrt{N}) = \frac{\sqrt{N} + N / \sqrt{N}}{2} = \frac{\sqrt{N} + \sqrt{N}}{2} = \sqrt{N}. \end{equation*}

For our purposes, we chose the seed \(x = N / 2\). Calculating one iteration of the Babylonian method then yields:

\begin{equation*} f(x) = \frac{(N / 2) + N / (N / 2)}{2} = \frac{(N / 2) + 2}{2} = \frac{N + 4}{4}. \end{equation*}

Using this convenient expression instead rids our algorithm of those pesky square roots:

function is_prime(n) {
         if (i % 2) == 0 {
            if i == 2 { return true  }
            else      { return false }
         }
         i = (n + 4) / 4
         while i > 1 {
               if (n % i) == 0 {
                     // n is divisible by i, abort
                     return false
               }
               i -= 1
         }
         return true // we've made it through
}

NASM Implementation

Below is the final implementation, in NASM, of the proposed prime checking algorithm. The number to check is passed in on register eax and the result, 0 (not prime) or 1 (prime) is returned in the same eax register.

is_prime:
        ;; save registers
        push ebx ;; store the value of eax, ebx, ecx, edx \
        push ecx ;; so functions called before and after won't have \
        push edx ;; values changed

       ;; even/odd check
       test eax, 1
       cmp eax, 2
       je is_prime_end
       jmp not_prime_end

       ;; save the value of eax in ebx
       mov ebx, eax

       ;; calculate our square root approximation
       add eax, 4 ;; eax = eax + 4

       mov ecx, 4 ;; set dividend to 0
       mov edx, 0 ;; set remainder to 0
       div ecx    ;; divide eax by ecx and put rounded result in eax

       ;; store our iteration variable in ecx
       mov ecx, eax

       is_prime_loop:
                ;; if ecx is 1 and we haven't found
                ;; a factor, it's prime
                cmp ecx, 1
                je is_prime_end

                ;; restore eax to original value
                mov eax, ebx

                ;; divide by ecx
                mov edx, 0 ;; set remainder to 0
                div ecx    ;; divide eax by ecx, leaving remainder in edx

                ;; if edx is 0, there's no remainder and we found a divisor
                cmp edx, 0
                je is_not_prime

                ;; decrement iterator
                dec ecx

                ;; loop through again
                jmp is_prime_loop

       is_not_prime:
                ;; restore values of registers
                pop edx
                pop ecx
                pop ebx

                ;; return value 0 in eax
                mov eax, 0
                ret

       is_prime:
                ;; restore values of registers
                pop edx
                pop ecx
                pop ebx

                ;; return value 1 in eax
                mov eax, 1
                ret

System Information

I ran this code on my Linux machine using the Netwide Assembler (NASM). Note that any other assembler may be used (the Intel instruction set is the same), but syntax may differ.

S-Expression Parsing in Lime

Description

This is an exciting time in Lime development! I've been working a lot on string parsing in Lime, and I've finally gotten to the point where I was able to implement a simple S-Expression parser in Lime.

An S-expression parser is a good idiomatic example of programming in a given language, and as the development of LimeOS continues, a self-hosted Lime runtime (Lime, written in Lime) will be needed to dynamically run Lime programs stored in memory and typed by the user.

The Code

Strategy

I decided to go with a relatively simple procedure for parsing S-expressions, taken from Peter Norvig's wonderful (How to Write a (Lisp) Interpreter (in Python)) (the article that first got me into Lisp!). In Python, this is simply:

def parse(string):
    return string.replace("(", " ( ").replace(")", " ) ").split()

At first, this looks a bit odd. Working it out on paper however shows that it indeed works:

   "(this (is an (s expression)))"
-> " ( this  ( is an  ( s expression ) ) ) "
-> ["(", "this", "(", "is", "an", "(", "s", "expression", ")", ")", ")"]

However, str.replace and str.split aren't implemented in Lime...yet!

string_split

First, I needed to implement a string_split method to split strings by spaces. I decided to go with the following strategy:

(define (string_split string separator)
        (filter is_nonempty_string (_string_split string separator "" (list 0))))

(define (_string_split string separator buf arr)
          (if (= string "") (list_append arr buf)
                 (if (string_startswith string separator)
                     (_string_split (string_ltrim (string_length separator) string) separator "" (list_append arr buf))
                     (_string_split (string_cdr string) separator (string_add buf (string_car string)) arr))))

In the main recursive function, _string_split (the regular string_split exists just to set the default arguments of buf and arr to default values), the flow is as follows:

  • if we have no more characters to parse, return the list we've created
  • otherwise,
  • if the current string in memory starts with the delimiter, remove the delimiter and add a new empty item to the list
  • otherwise, add the character at the beginning of the string to the previous element in the iterated list

This performs relatively well. One thing I did have to account for was concurrent delimiters. That is, when the string an item is being separated by occurs directly next to itself 2 or more times. For example, string_split not taking this into account would have the following behavior:

   (string_split "a b  c")
-> [a, b, , c]

which doesn't make good heuristic sense: separating words by spaces should also separate by blocks of space. Therefore, I defined some pretty simple functions, is_empty_string and is_nonempty_string to get around this. Before the result of string_split is returned, I use the filter function and is_nonempty_string to only return the results that are nonempty.

string_replace

The string iteration algorithm for string_replace is almost exactly the same as that for string_split:

(define (string_replace string search replacement)
        (_string_replace string search replacement ""))

(define (_string_replace string search replacement buf)
        (if (= string "") buf
            (if (string_startswith string search)
                (_string_replace (string_ltrim (string_length search) string) search replacement (string_add buf replacement))
                (_string_replace (string_cdr string) search replacement (string_add buf (string_car string))))))

However, instead of adding elements to a list of split elements, it simply traverses the string, and when it hits the delimiter, it replaces that delimiter with the replacement string and jumps to the character immediately after the replacement string.

Bringing It All Together

The final program I wrote, sexp.lime, looks like:

;;
;; [sexp.lime]
;;
;; A simple S-expression parser, written in Lime.
;; Copyright (C) 2018, Liam Schumm
;;

(define (parse string) (string_split
                       (string_replace (string_replace string
                                        ")" " ) ")
                                        "(" " ( ")
                       " "))
(define (main args) (print (parse (!! 0 args))))

Compiling this program with llc and running it then yields:

$ llc sexp.lime
$ ./a '(+ (- 3 4))
[(, +, (, -, 3, 4, ), )]

A properly parsed S-expression!

Lime System Information

This article was written and tested on Lime alpha, commit 53f8ae1, under 64-bit Arch Linux, kernel 4.18.5-arch1-1-ARCH.

Lime Roadmap.

I'm in an independent study class at my high school, and for it we're supposed to come up with a "roadmap" for the projects we're working on. I'm working on Lime as a part of this class, so I was supposed to write out a plan with times. Here's what I came up with

  • September 20: Get bootloader/os written in Lime working
  • October 20: Get lambda lifting and higher order functions working
  • December 20: Present at QED (a local high school math conference)
  • January 20: Run benchmarks and comparisons to C, Chez Scheme, and Haskell to determine performance
  • February 20: Get basic optimizer working (constant folding, deforestation, other standard techniques)
  • March 20: Fully Lisp hygenic macro system

This is subject to change! I'll edit this post if I change my schedule.

Lime Update 1

Tomorrow's my first day of school, so I'll probably have less development and YouTube time. However, I'll get a chance to do some more 3D printing and robotics.

I've been working on some development for Lime, my programming language. I've added a few new features:

Multi-Line if Statements

You can now write if statements with multiple conditionals and clauses. For example, the following code:

(define (print x)
        (if (equal (type x) 0) (iprint x)
            (if (equal (type x) 1) (map $print x)
                (if (equal (type x) 2) (sprint x)
                    (if (equal (type x) 3) (if x (sprint "true") (sprint "false"))
                        (if (equal (type x) 4) (fprint x)
                    (exception "print: unknown type.")))))))

could be refactored as:

(define (print x)
        (if (equal (type x) 0) (iprint x)
            (equal (type x) 1) (map $print x)
            (equal (type x) 2) (sprint x)
            (equal (type x) 3) (if x (sprint "true") (sprint "false"))
            (equal (type x) 4) (fprint x)
            (exception "print: unknown type.")))

Originally, I wanted to make case/elif statements that would be compiled to nested if statements with macros, but purely functional macros don't exactly have the functionality you need to do that.

Floating Point Operations

For some strange reason, floating point numbers don't work properly in Lime. x87's FPU instruction set is a million times more complicated than x86 already is. The operation

(print (/ 3.0 2.0))                                                                                           

returns the value

1.4261667296

while integer operations like

(print (+ 3.0 2.0))

returns

5.0

Because Lime objects must be passed around on an integer stack, they're stored in 4 32 bit integers:

4   # the first number, 4, indicator of the number of integers required to store the float
4   # 4, the type ID of floating point numbers
667 # the main integer
1   # number of times to divide by 10
0   # 0 if the number is postive, 1 if it's negative

thus, extensive rounding occurs, truncating a lot of decimal places. I'm not sure if inaccuracy this high could be a result of floating point error propogation or if there's an error in my code.

YouTube

Video as a Medium of Communication

YouTube videos have been a really important resource to me throughout my adventures learning to code. Video as a medium harnesses both sight and hearing, as opposed to just a static page, and allows a much greater level of detail and replicability (every keypress, application, mouse click, etc.) for technical guides.

Throughout my adventures teaching coding I've been trying to look for a good medium to convey tutorials.

Comparison of Teaching Mediums

Text

I've undoubtably used so many text-only resources for coding. StackOverflow is all text, Medium blog posts about setting up your Webpack config are all text. Man pages and all documentation is text. Text is incredibly convenient, incredibly dynamic, indexible, and incredibly easy to serve. However, while writing and distributing text content is easy, distributing good text content is very difficult. In today's digital age, humans aren't really used to reading long blocks of text, and if your tutorial isn't very engaging while also being detailed, it's easy to get derailed and totally lost.

Lecture

The history of lecturing goes back to the times of medieval universities. A critical aspect of this lecturing was the transfer of the instructor's original source to a student's handwritten notes. In this case, lecturing was the only way to transfer information before the invention of the printing press.

A 1736 engraving, by William Hogarth, titled "Scholars at a Lecture". Students are creammed into a lecture room, discordantly moving about and making various faces, and taking notes.

Lecturing's value, as well as its downfall, comes from pacing. All students must follow the pace of the lecturer. This helps a teacher ensure a class of students are all at the same level; however, falling behind or losing track of a lecture then can render the rest of the lecture useless. Depending on the student, their strengths, and their prerequisite knowledge, lecturing can often spend most time on content a student already knows, and the least on the areas with which they're least familiar.

However, lecturing is easy for a teacher, and allows them an efficient means of communication through physical presence, and lectures with demonstration and audience interaction can connect the student, teacher, and class together.

Rembrandt's "The Anatomy Lecture of Dr. Nicolaes Tulp". Shows many medieval students gathered around Dr. Nicolaes Tulp, observing his dissection of a human body

Video

Video is somewhere in between text and lecturing. Video captivates the student a lot more than plain text does, and with pausing (you can't pause a lecture) and YouTube speed controls, it's possible to adapt video pretty closely to a student's pace. However, video isn't indexable and searchable the way text is, so to find a video snippet explaining one small technical detail may take a lot of searching and watching, and possibly not the result you're looking for.

Video requires a lot of investment on the part of the teacher to create, record, produce, and script, and can really only follow one path of reasoning. However, it's considerably more accessible than physical teaching (limited to students at schools offering the classes they want to take), and through modern visual effects and editing can be a lot more condensed and punchy than a lecturer talking. Therefore, I think they're a great medium for specifically focused, project-based, coding tutorials!

My First Video

Setting Up Equipment

Remember what I said about "a lot of investment on the part of the teacher"? Yeah...

Turns out, I had a Behringer Xenyx 502 preamp,

A Behringer Xenyx 502 preamp

a Shure SM57,

A Shure SM57 microphone

and a mic stand lying around. Setting this up was pretty easy. Plug the microphone into the mic port on my preamp (it accepts a mic and two stereo line-in), use an adapter to connect the stereo output to a standard size AUX microphone plug. Unfortunately, most computers (my Thinkpad T430s and MacBook Air included) have an audio port that serves as both the microphone and headphone port.

A comparison of iPhone headphones, with an integrated mic versus a standard audio-only headphone.

This AUX port uses a fourth pin on the top for the microphone. Adapting a microphone-only AUX cable to a combo one requires a dongle like this one:

An adapter going from separate microphone and headphone AUX cables to a single 4-pin AUX plug

Overall, I'm happy with my setup.

A photo of my desk with computers, monitors, lamp, preamp, and the microphone on its stand.

Recording

I decided to test out my new setup by doing a tutorial video to create a calculator app in HTML, CSS, and Javascript.

I have a new appreciation for the patience and tenacity of YouTubers. I had to re-record the tutorial about 15 times because of fumbling up the intro, writing the code in a way I realized was confusing or didn't work, and because I forgot to turn on my audio recorder for a take. The few takeaways I have are:

  • Speak slowly and clearly.
  • Don't be afraid to re-take a part.
  • Come up with a script to roughly follow, and try out the tutorial once (i.e., write the code and narrate).

Now the next challenge: editing. I've been able to clip together the audio and video, speed up some parts, and enhance the vocals with iMovie. Editing up a good intro is surprisingly difficult. I'm not releasing this first video, but I'll keep trying different things until I'm happy with the result.

Segmented Integers

Introduction

I've been working on Tequila, a NASM emulator in JavaScript. One problem I ran into pretty quickly is x86's register subdivision. In x86, you have 4 general-purpose registers.

A diagram of x86 registers.

Each register can hold up to 64 bits of memory. However you can address parts of this 64 bits. For example, if the register rax was initialized to 64 bits of zeros, the instruction

MOV eax, 11

would fit the decimal value 11 (1011 in binary) into the second 32 bits of rax. Similarly,

MOV al, 3

would move 11 into the last 8 bytes of rax. After modifying a part of the register, the full register may still be addressed. For example, if rax was initialized to all zeroes,

MOV al, 3

would move the decimal value 3 into the end of rax. Since all the leading bits are zero, the total value of register rax will also be 3. Then,

MOV ah, 2

would load 10 into ah. Thus, ignoring leading zeroes, the value of rax would be:

00000010 00000011

yielding a value of 515.

Note that you can only address the upper and lower half of eax. You cannot address the upper half of rax. This was a design choice in x86 to save space in the instruction set (less addressible register names).

Implementation

Binary-Based Integers

First, what I needed to do was create a base integer type that would get and set integer values from a binary store. I implement the h/l registers using this class. I call this SizedInteger.

I chose to store these values in pure binary instead of as an array of bytes. Since all addressible sizes in x86 are a multiple of 8 bits, this would be a better decision for performance and memory efficiency. I chose to stick with binary as a design choice.

The implementation is as follows:

class SizedInteger {
    constructor (size) {
    this.size = size                       // store the size of the register, in bits
    this.contents = '0'.repeat(this.size)  // fill the value store with 0's to initialize
    }

    get () {
    return parseInt(this.contents, 2)      // parse the binary store as a binary integer
    }

    set (n) {
    let x = (n % (2 ** this.size)).toString(2)

    // take up the full size with binary, padding with 0, so that when this integer
    // is appended to others in memory, they take up exactly the size allocated
    let leftpad = this.size - x.length
    this.contents = '0'.repeat(leftpad) + x
    }
}

Segmented Integers

Now for the good stuff: segmented integers!

class SegmentedInteger {

    // a segmented integer is constructed with two SizedInteger objects,
    // r(egister)1 and r(egister)2
    constructor (r1, r2) {
    this.r1 = r1
    this.r2 = r2
    }

    get contents () {
    return this.r1.contents + this.r2.contents
    }

    set contents (x) {
    this.r1.contents = x.slice(0, this.r1.size)
    this.r2.contents = x.slice(this.r1.size, this.r1.size + this.r2.size)
    }

    get size () {
    return this.r1.size + this.r2.size
    }

    get () {
    return parseInt(this.r1.contents + this.r2.contents, 2)
    }

    set(n) {
    let combined_size = this.r1.size + this.r2.size
    let x  = (n % (2 ** combined_size)).toString(2)
    let leftpad = combined_size - x.length
    x = '0'.repeat(leftpad) + x

    // fill r1, then r2
    this.r1.contents = x.slice(0, this.r1.size)
    this.r2.contents = x.slice(this.r1.size, this.r1.size + this.r2.size)
    }

}

The implementation of SegmentedIntegers was essentially the same as that for SizedIntegers; I simply had to split the binary store between two registers. Because the constructor takes in SizedInteger objects, the objects the constructor is passed references to can be manipulated seperately from the SegmentedInteger.

I used ECMAScript 5's new syntax for getters and setters so that SegmentedIntegers exposed an identical API and functioned exactly the same as SizedIntegers, both so that you could make a SegmentedInteger of a SegmentedInteger, but also to simpify my code for Tequila.

Testing

Here's some test code:

> ah = new SizedInteger(16)
> al = new SizedInteger(16)
> eax = new SegmentedInteger(ah, al)
> rax = new SegmentedInteger(new SizedInteger(32), eax)
> eax.set(9)
'00000000000000000000000000001001'
> eax.get()
9
> al.contents
'0000000000001001'
> ah.contents
'0000000000000000'
> al.set(0)
> eax.get()
0

Welcome

Welcome to my blog!

Here I'll be writing personal things, opinions on tech, and tutorials for various programming things; basically a dumping ground for all my thoughts, technical and not.