MNIST Digit Recognition

Before starting this tutorial, make sure you have keras and sklearn installed.

$ pip3 install keras sklearn

I've recently been getting into machine learning. One of the canonical problems in machine learning (particularly neural networks) is the MNIST dataset (Modified National Institute of Standards and Technology database); a collection of handwritten digits and their values.

Conveniently, this dataset is included in sklearn. However, the formatting is a little funky: we'll have to parse it more a bit later.

from sklearn import datasets
digits = datasets.load_digits()

Then, let's take a look at this dataset (always a good idea for any data):

import matplotlib.pyplot as plt
plt.gray()
plt.matshow(digits.images[0])
https://d33wubrfki0l68.cloudfront.net/bfae03fbac788a95b9bd9dffc5c4e7faa9396373/a2937/images/mnist-example.png

Let's split the data into an X and Y (in and out):

X = digits.data
Y = digits.targets

Now, a few things to get this data ready for a neural network. First, we're going to want to format each of these images as a tensor, or a n-dimensional matrix (in this case n=3). For example:

[
  [ [0], [128], [30], ...],
  [[12],  [65], [19], ...],
  ...
]

would be an image in our dataset. Each of these dimensions holds important information: the first separates rows in the image, the second separates pixels within each of these rows, and the third separates the components of the color of that pixel (usually red, green, and blue). Since the MNIST dataset is only black and white, we don't really use the last one; however it's something that Keras expects for image data.

Currently, the image data comes as a list of lists of pixels:

[[0, 128, 30, ...], [12, 65, 19, ...], ...]

If we print X.shape, we'll get:

(1797, 64)

The first dimension is the number of these lists, the second is the number of elements in each of these lists. The dimensions of our final array should have the same first dimension (1797), since it has the same amount of examples. However, we'll want to break this other dimension three other dimensions. To reshape it, we'll use this command:

X = digits.data.reshape(-1, 1, 8, 8)

A -1 in the .reshape method is a placeholder; it tells Numpy to not touch the 1797 dimension.

Neural networks use functions that operate in the range [0, 1]. Therefore, we'll have to fit the range of all numbers in X between 0 and 1. Dividing by the maximum value will achieve this:

X *= (1.0/np.max(X))

Now, time for some Python-fu. The outputs in the dataset are the actual numbers (e.g., 0, 1, 2, ...). While one option for normalizing these data would be to divide each of these numbers by 9, resulting in scalars between 0 and 1 (like above), neural networks will operate significantly better if you make the output as multidimensional vectors.

So, we'll be transforming scalars, like 3 or 9 to 10-dimensional normalized vectors like so:

3 => [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
9 => [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

Here's the code I wrote:

new_Y = []
for y in Y:
    n = [0] * 10
    n[y] = 1
    new_Y.append(n)
Y = np.array(new_Y)

Finally, we'll have to convert this Y tensor from int64 to float64:

Y = Y.astype(np.float64)

Now, let's partition the data into a training set and a testing set. It's important to test the model on a separate dataset, since models can learn trends in datasets that aren't necessarily the actual thing we want to predict; therefore they may get good

X_train = X[:-200]
Y_train = Y[:-200]

X_test = X[-200:]
Y_test = Y[-200:]

Now, let's configure the network in Keras. We'll be using a Sequential model.

model = Sequential()
model.add(Conv2D(32, kernel_size=3,
                     activation='relu',
                     input_shape=(1,8,8),
                     data_format='channels_first'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

rmsp = optimizers.RMSprop(lr=0.001)
model.compile(loss='logcosh', optimizer=rmsp, metrics=['accuracy'])

The last components here-the loss function and optimizer-can be tuned for better performance on particular datasets. It's good to try a few options for these.

Now, let's train the model:

model.fit(X_train, Y_train, epochs=150, batch_size=10)

Finally, let's evaluate our model on the test portion of our dataset:

print(model.evaluate(X_test, Y_test))

The first number it prints will be the loss, the second will be the accuracy.

Here's all the complete code:

from sklearn import datasets
import matplotlib.pyplot as plt

digits = datasets.load_digits()

plt.gray()
plt.matshow(digits.images[0])

X = digits.data
Y = digits.targets

X = digits.data.reshape(-1, 1, 8, 8)


X *= (1.0/np.max(X))

new_Y = []
for y in Y:
       n = [0] * 10
       n[y] = 1
       new_Y.append(n)
Y = np.array(new_Y)
Y = Y.astype(np.float64)

X_train = X[:-200]
Y_train = Y[:-200]

X_test = X[-200:]
Y_test = Y[-200:]

model = Sequential()
model.add(Conv2D(32, kernel_size=3,
                     activation='relu',
                     input_shape=(1,8,8),
                     data_format='channels_first'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

rmsp = optimizers.RMSprop(lr=0.001)
model.compile(loss='logcosh', optimizer=rmsp, metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=150, batch_size=10)

print(model.evaluate(X_test, Y_test))

Parentheses Puzzle

Over my spring break, I've been trying to get more into network security. One amazing resource I've found for an introduction to hacking is a website called picoCTF. It's a CTF (Capture The Flag), which is a collection of hacking exercises. Originally it was a competition, but it's now open to anybody signing up.

IF YOU DON'T WANT SPOILERS, DON'T READ THIS ARTICLE!

One of the later problems in this competition is as follows:

Can you understand the language and answer the questions to retrieve the flag? Connect to the service with nc 2018shell.picoctf.com 8672

I connected to the server using netcat as directed. Upon connecting, I got the following prompt:

Rules:
() + () = ()()                                      => [combine]
((())) + () = ((())())                              => [absorb-right]
() + ((())) = (()(()))                              => [absorb-left]
(())(()) + () = (())(()())                          => [combined-absorb-right]
() + (())(()) = (()())(())                          => [combined-absorb-left]
(())(()) + ((())) = ((())(())(()))                  => [absorb-combined-right]
((())) + (())(()) = ((())(())(()))                  => [absorb-combined-left]
() + (()) + ((())) = (()()) + ((())) = ((()())(())) => [left-associative]

Example:
(()) + () = () + (()) = (()())

Let's start with a warmup.
(()(())) + (()) = ???

>

As a Lisp developer, I felt that I needed to solve this problem. I was a little frustrated that it wasn't really a formal set of rules (rather, a collection of examples). Additionally, looking through messes of tiny parentheses makes it near impossible to debug large output.

Initially, I assumed that the only thing affecting how an addition operation was performed was whether one (or both) of the operations was (). For example, in this expression:

(()) + () => (() [()] ) => (()())

The () gets absorbed on the right. Likewise,

() + ((()()())) => ( [()] (()()())) => (()(()()()))

When both arguments are (), we apply the [combine] rule:

() + () => ()()

This was pretty simple to implement. I wrote the following Lime code for the operations:

(function combine (str1:str str2:str) -> str
          (sjoin str1 str2))

(function absorb_left (str1:str str2:str) -> str
          (sjoin "(" (sjoin
              str1
              (sslice str2 1)))))

(function absorb_right (str1:str str2:str) -> str
          (sjoin (sjoin
                 (sslice str1 0 (isub (slength str1) 1))
                 str2) ")")

My rules were working, when I encountered this example:

()() + () = ()()()

According to my rules, the result should be ()(()). However, this isn't the case. I then thought that the importance was in adjacent parentheses, i.e.:

...() + ()... = ...()()...

However, this only worked about 50% of the time. Getting frustrated, I googled the problem and found a great clue from Redditor u/hielkew:

So, the rule I found was that absorption takes place by the element that has the largest parenthesis depth. For #6 for example (())(()) goes only 2 deep and ((())) goes 3 deep, so the latter will absorb the former.

This rule matches all of my earlier rules. To check max parenthese depth, I wrote a simple recursive Lime function:

(function max_depth_ (str:str depth:int current_max:int) -> int
          (if (sequal str "")
              current_max
              (if (sequal (sslice str 0 1) "(")
                  (max_depth_ (sslice str 1) (iadd depth 1) current_max)
                  (if (sequal (sslice str 0 1) ")")
                      (if (igreater depth current_max)
                          (max_depth_ (sslice str 1) (isub depth 1) depth)
                          (max_depth_ (sslice str 1) (isub depth 1) current_max))
                      -1))))

(function max_depth (str:str) -> int
          (max_depth_ str 0 0))

Wrapping it all up, here's the final Lime program.

(function combine (str1:str str2:str) -> str
          (sjoin str1 str2))

(function absorb_left (str1:str str2:str) -> str
          (sjoin "(" (sjoin
                 str1
                 (sslice str2 1))))

(function absorb_right (str1:str str2:str) -> str
          (sjoin (sjoin
                 (sslice str1 0 (isub (slength str1) 1))
                 str2) ")"))

(function max_depth_ (str:str depth:int current_max:int) -> int
          (if (sequal str "")
              current_max
              (if (sequal (sslice str 0 1) "(")
                  (max_depth_ (sslice str 1) (iadd depth 1) current_max)
                  (if (sequal (sslice str 0 1) ")")
                      (if (igreater depth current_max)
                          (max_depth_ (sslice str 1) (isub depth 1) depth)
                          (max_depth_ (sslice str 1) (isub depth 1) current_max))
                      -1))))

(function max_depth (str:str) -> int
          (max_depth_ str 0 0))

(function op (str1:str str2:str) -> str
          (if (igreater (max_depth str1) (max_depth str2))
              (absorb_right str1 str2)
              (if (igreater (max_depth str2) (max_depth str1))
                  (absorb_left str1 str2)
                  (combine str1 str2))))

(function compute (in:str) -> str (foldl op (ssplit in " + ")))
(function main () (sprint (compute "() + ()() +...")))

This article was written as of Lime 1.4.2. Code examples may not work in newer versions.

On Senioritis

Guest article by Liam Schumm; liamschumm.com.

I've had a peculiar experience with "senioritis". Freshman year I took calculus, a class with many seniors. Many of my friends graduated in the next couple years, and I felt like I got early-onset senioritis. As somebody who grew up homeschooled, I've never really liked structured education, and as a result I never liked high school that much, and found it difficult to do menial worksheets and classwork that weren't engaging to me. Over my 4 years, I gradually learned the art of extrinsic motivation: how to do things I didn't want to do, keep track of assignments with no particular relevance to me or my life, and all the other "skills" school teaches you.

Something that struck me sophomore year was a poster in a teacher's room of a "senior declaration of incompetence." After thinking it over, I don't find it funny because after 4 years of high school and all school before that, the blame for a lack of educational competence should not be on the student, but instead on the teacher and on the broader educational system. There's fundamentally a contradiction in adults and teachers getting mad at students for not trying once the incentives to achieve in school are taken away. It's the teachers' fault for not creating curriculums that aren't interesting when they aren't driven by grades. If classes in school were truly engaging enough to students, "senioritis" would not exist.

Even among high-performing individuals attending highly-ranked universities, the same trend of a loss of motivation occurs, indicating that success or drive isn't some kind of reflection of seniors' lack of drive. In fact, I've observed that the change in committment to school is greater among high-performing individuals: graduating students who committed to Harvard or Yale stopped attending the clubs they started and Georgetown students who stopped showing up to school (sorry Jorge! :) ).

Again, regardless of an individual student's individual passion or initiative, the psychological effect of residual pressure of an entire life of education is not negligable. Many kids spend their entire life attending school, summer camp, school, summer camp, and so on for as long as they can remember. Since middle school, students start prepping for tests to get into the right high school, and in high school they prep for getting into the right college. Second semester senior year is the first time in many peoples' recent lives that immense pressure and definition of self is lifted.

As a result, even in interesting classes or in extracurriculars senioritis is absolutely justified. No amount of passion in a subject can remedy physical exhaustion: due to the neurological effects of being stressed all the time, the physical effects, or even the literal exhaustion caused by chronic sleep deprivation throughout the American high school system. In my case, I feel like I've become more productive senior year after the pressure of college was lifted. After I got admitted to UMass Amherst, I've been able to redistribute time from classwork to things I'm actually passionate about. However, for many people who haven't known a life without pressure and driven by intrinsic motivation before, I absolutely understand senioritis. I think it should be treated as an important developmental period in many peoples' lives, and I advocate for self-exploration during that time, or taking a gap year to further explore ones' self.

In the end, I think that senioritis is a broader reflection of failure of the educational system. When the incentives to achieve are removed, students universally stop trying to overachieve---and that's not a bad thing.

Debugging in Lime

Most coders think debugging software is about fixing a mistake, but that’s bullshit. Debugging’s actually all about finding the bug, about understanding why the bug was there to begin with, about knowing that its existence was no accident. It came to you to deliver a message, like an unconscious bubble floating to the surface, popping with a revelation you’ve secretly known all along.

Mr. Robot, "eps1.2d3bug.mkv"

Since one of the important things about Lime is continuity and completeness, I decided I'd implement a debugging system.

It feels hypocritical to work on a debugger, given that I've never really used one. I've tried a couple times, but I always fall back to printf debugging. Turns out I'm not the only one:

I don't like debuggers. Never have, probably never will. I use gdb all the time, but I tend to use it not as a debugger, but as a disassembler on steroids that you can program. None of the arguments for a kernel debugger has touched me in the least. And trust me, over the years I've heard quite a lot of them.

excerpt from linux kernel mailing list, from https://lkml.org/lkml/2000/9/6/65.

I definitely agree with Torvalds' view. Since debugging is essentially the process of understanding

The principles I'd like to follow in designing debugging architecture are:

  • Provide all the context necessary to understand the program's execution state at a breakpoint; but
  • Only provide the context explicitly requested by the user.

In my view, the ideal system that allows the user to precisely specify what information to view or operations to perform is a programming language itself. I've realized that "printf debugging" is so appealing to me because it is programmed inline with my code.

Here's a few ideas I have for how to implement debugging systems in Lime.

Bracket Expressions

(function abc () (f ?[a b c])

Bracket expressions, using ?[] would dump the context of a function application; the names of all the arguments, their values, as well as the value of the function pointer.

Variable Dumping

(iprint ?a)

Variables can be dumped by prefacing a certain reference or variable with a ? character.

Breakpoints

A dump expression can be converted into a breakpoint by changing the ? to a ?!. When executing the Lime program, the dump will execute, and the code may be stepped through.

Lamby

This last weekend at Hack Ridge, I worked on a project called Lamby. Lamby is a super simple platform for serverless, function-driven development. It works by defining individual API routes as functions which return string values. I'm really happy with the progress that I made; the app is currently live on https://www.lamby.xyz.

Getting started with lamby

In lamby, the basic unit of computation is a function. Each "function" has a main procedure:

def main(session):
       return "<h1>test</h1>"

This procedure returns a string which is then returned when the route to the function has been GETed.

The Session object

In lamby, the session variable that is passed to every function is a mutable dictionary that is shared between all sessions of all functions.

The Request object

In Lamby, all POST functions are passed a request object that has information about the request that has been performed. An example request object looks like:

{'path': '/query/example',
'args': {'a': 2},
'method': 'POST',
'form': {}}

Using modules

Lamby allows for the creation of shared modules which can then be referenced in functions. For example, if you made the following module:

def double(input):
      return input * 2

and named it test, then in a function you could reference it with the include statement:

include test
def main(session):
      return double("abc")

and use its functions.

Implementation

I built a simple server using Flask, uWSGI, nginx, and Docker. The website itself is a simple Flask application, run through uWSGI and served through nginx. The main application runs as a Docker container, but is also able to spin up sibling containers for each API call, so arbitrary code can be adequately sandboxed.

Future Goals

During the hackathon, I did end up implementing the Stripe API, with the goal of monetizing API requests. I'm considering pursing expanding Lamby to encompass and containerize all of the necessities to make functional web apps.

Algebraic Typing in Lime

Before I started working on Lime, and was only familiar with statically typed languages in the context of Java and C, I thought type systems were really limiting. However, the incredible capability of type systems of languages like Haskell and Rust (Rust's is even turing complete!) have proven this very wrong.

A lot of these innovations come from a field of mathematics called category theory. The fundamental object of category theory is a directed graph called a category:

A drawing showing the objects X, Y, and Z connected by f, g, and f of g. X is mapped to Y by f, Y is mapped to Z by g, and Z is connected to X by g of f.

The nodes of these graphs are called objects, and the edges are called morphisms (hence the term polymorphism).

One of the most important parts of modern type systems is something called algebraic typing. Algebraic typing allows for functions that can take multiple forms, based on an algebraic expression. For example, the car command which returns the first element of a list must have a form for int$, str$, int$$, and so on, cannot be defined through polymorphism: there would be infinitely many forms. Instead, it can be defined algebraically:

car (list:A$) -> A

where A is a placeholder, which may be substituted for any type. Therefore, car will be defined for lists of all types.

More complicated expressions can be done with container types like dictionaries:

dget (dict:[A->B] key:A) -> B

In this example, the theoretical dget command (to get a value from a key in a dictionary) accepts dictionaries mapping any type to any other type, accepts a key of the key type of the dictionary, and returns a value with the type of the value type of the dictionary.

Additionally, since Lime is a first-order language, all functions have explicit types signatures, declaring their input and output types and arity. Functions that operate on other functions can have super explicit type signatures. For example, the map command's signature is:

map (lambda[A->B] A$) -> B$

which will ensure that the list being mapped is of the input type of the function, and will make the output type of map a list of the output type of the function.

The new type system will be merged in 1.2.0.

My MacOS Configuration

This article outlines my configuration for my Mac. All of my dotfiles, as well as a copy of this guide, are available at https://gitlab.com/lschumm/dotfiles.

OS Installation

  • Erase the main disk, erase and rename as Macintosh HD and format it as plain APFS.
  • Then, choose the Reinstall MacOS option on the new disk.
  • On installation make sure to choose: - "Don't transfer any information now" on the Migration Assistant page - Sign in with Apple ID - Set username to lschumm instead of liamschumm; but do not allow Apple ID to reset this password - Under "Express Set-Up", choose "Customize Settings" - Enable Location Services - Disable analytics - Enable Siri - Do not "store files from Documents and Desktop in iCloud Drive" - Enable FileVault; but do not allow iCloud password to unlock the disk - Choose Light theme

OS Configuration

Install XCode

Go into the App Store and download XCode.

To install the XCode command line tools, run:

xcode-select --install
sudo xcodebuild -license

Install Homebrew

Use the brew default installer.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Change hostname

Under System Preferences > Sharing, change the hostname to the preferred name of your machine (I use the last names of famous computer scientists):

Shows the Computer Name field of the Sharing preferences pane with the text set to my computer name: :code:`dijkstra`.

Set up Git credentials

To set up your git name and email, run

git config --global --edit

Install Python3

MacOS (as of Mojave) installs Python 2.7.10, which is a reasonable version of Python 2 to have. To install Python 3, I recommend brew:

brew install python3

Install python2-pip

When brew installed python3, it also installed pip3. However, MacOS' default version of Python does not include pip. It does include easy_install, which can be used to install pip:

sudo easy_install pip

Install the latest version of Emacs

MacOS (as of Mojave) installs Emacs 22.1.1, which is ancient. Install emacs26 with brew:

brew install emacs

App Installation

I like installing apps through brew cask because command line stuff is great, and it allows all of my applications to be centrally managed by a utility. Nothing is in the Mac App Store.

brew cask install telegram
brew cask install gimp
brew cask install spotify

Dock Configuration

I personally don't really like the Dock, because it takes up space on my small 11 inch screen. I use Spotlight to open applications, and Command + Tab to switch applications/see what's open. This command will set the Dock autohide delay to 1000 seconds, so you have to hover over the bottom of the screen for 1000 seconds to display the dock.

If you really need your dock, Command + Option + D will manually open it; press it again to hide it.

defaults write com.apple.dock autohide-delay -float 1000; killall Dock

Disabling Gatekeeper

Gatekeeper is that annoying "can't be opened because it is from an unidentified developer" thing. To disaple it, run:

sudo spctl --master-disable

I'm a responsible user and don't download random binaries and execute them; I don't like MacOS second guessing me.

Setting up File Hierarchy

I actually like to remap my home directory to Desktop, so it's visible on my screen. I remap my HOME environment variable to ~/Desktop.

mkdir Desktop/research
mkdir Desktop/blog

Run dotfiles installer

cd Desktop
cd dotfiles
./install

Install wakatime

Install wakatime globally using pip:

sudo pip install wakatime

Then, install the wakatime-mode package in emacs. First run:

M-x package-refresh-contents [RET]

to fetch the latest versions of packages. Then, install the package:

M-x package-install [RET] wakatime-mode [RET]

Make Finder show all file extensions

Not showing file extensions on some files is really needlessly confusing. Disable it in Finder > Preferences. I also like disabling warnings when changing extensions.

Shows the :code:`Finder > Prefences` pane with "Show all filename extensions" selected, and "Show warning before changing an extension" unselected.

Set key repeat rate

I like setting my key repeat rate and key repeat time to very fast settings. You can set this under System Preferences > Keyboard > Keyboard.

Shows the :code:`System Preferences > Keyboard > Keyboard` menu with "Key Repeat" set to the "Fast" setting, and "Delay Until Repeat" set to the "Short" setting.

Enable meta key in terminal

To use option as the meta key in Terminal, enable it in Terminal > Preferences > Profiles > Keyboard:

Shows the :code:`Terminal > Preferences > Profiles > Keyboard` pane with "Use Option as Meta key" selected"

Types in Lime

Lime is a strongly, strictly typed language.

Default Types

The default atomic types in Lime are:

  • bool (binary truth value, true or false)
  • int (integer, e.g., -12)
  • float (floating-point number, e.g., 2.0)
  • str (string, e.g., "abc")

Lists can be made out of any type, even lists themselves, by appending a $ token. For example, a list of int`s have the signature :code:`int$, and a list of int$`s will have the signature :code:`int$$.

Custom Types

Defining a struct

To define a struct, use the struct directive at the top level of a program.

(struct person first_name:str last_name:str age:int)

This will also define a new type, person.

Initializing a struct

To initialize a struct, use the new keyword, which will initialize and return a new instance of the struct specified.

(new person first_name "Liam" last_name "Schumm" age 17)

You can specify only a subset of the attributes when initializing, by not including the keys for those items:

(new person first_name "Liam" age 17)

Getting values of a struct

To get the values of a struct, use the get command. For example,

(get example_person first_name)

would return the first_name attribute of example_person.

Setting values of a struct

To set the values of a struct, use the set command. For example,

(set example_person (first_name 2))

would set the first_name attribute of example_person to 2, and return a reference to the object.

Type Signatures

In Lime, all functions must have a type specified for their return value and for their inputs. The type signature BNF is as following:

type           ::= (int|float|str...)
                 | type$
function_name  ::= [A-z]+
type_signature ::= function_name:type (type*)

Polymorphism

Lime has polymorphism, aka "overriding". This allows you to define a single function with different behavior when it is passed different types. For example,

(defun increment:float (x:float) (fadd x 1.0))
(defun increment:int   (x:int)   (iadd x 1))

would define the increment function for ints and floats. This can also be used for optional arguments, like so:

(defun sinput:str () (sinput ":"))
(defun sinput:str (prompt:str) (...c implementation...))

if a prompt is not explicitly defined, it will default to ":".

The any type

The any type allows methods to be specified that take arbitrary combinations of arguments that match a specific expression. For example, the lget command is defined to operate on lists of any and return a singular element of type any. The type signature looks like:

lget:any (arr:any$)

Reimplementing Lime in C

In my quest for minimalism, I decided I needed to stop carrying around two laptops---a Linux one that I used for developing Lime, and a MacOS one iOS development (and schoolwork). Turns out all of the Lime code I wrote used 32 bit assembler. The Mach kernel doesn't support 32-bit addressing, rendering all of that code unusable. Instead of emulating, I decided to totally revamp and rewrite the Lime language from scratch, this time starting in C instead of in Assembler. I realized that a lot of the work I put in developing an Assembler version focused on small-scale operations that gcc/clang would know how to do better than I could. structs and formal data types also mean that I have to spend less time developing a typing system and reimplementing basic operations, and more time actually working on interesting language features.

I took my time with implementing this, and came up with a pretty neat implementation, still utilizing my original stack-based design for the language.

I first implemented a basic stack as an array of longs. Integers are stored as long values, while strings/floats/etc.'s pointers are stored.

I then created a struct for a lambda function with partial application support:

typedef void (*func)();

typedef struct {
       int resolved;
       func function;
       int args_left;
       long* args;
       int args_len;
} lambda;

resolved is 0 if more arguments must be supplied, 1 if it has been processed, applied to the arguments, and the args pointer has been free()d.

The Lime language outputs three basic instructions: push, pop, and resolve. push and pop mutate the stack, and resolve pops a lambda pointer off the stack and applies it to the supplied arguments. My implementation for resolve was:

void resolve(int n) {
       f = *(lambda*)pop();
       if (f.args_left == 0) {
              f.function();
              f.resolved = 1;
       }

       for (int i = 0; i < n; i++) {
              f = apply_lambda(f, pop());
       }
       if (!f.resolved) {
              push((long)&f);
              fp = malloc(sizeof(lambda));
              f = *fp;
       }
}

As a result, Lime compiler output, instead of being a mess of assembler, now looks like, from the following Lime code:

System Message: ERROR/3 (<string>, line 54)

Error in "code" directive: maximum 1 argument(s) allowed, 3 supplied.

.. code:: c
   #include "lib/lime.h"

   int add3() {
          int stack_base_ptr = stack_ptr;
          push((long)3);
          push((long)get_arg(0, stack_base_ptr));
          push(make_lambda((void (*)())iadd, 2));
          resolve(2);
          clear_args(1, stack_base_ptr);
          return 0;
   }

   int main() {
          int stack_base_ptr = stack_ptr;
          push((long)2);
          push((long)1);
          push(make_lambda((void (*)())iadd, 2));
          resolve(2);

          push(make_lambda((void (*)())add3, 1));
          resolve(1);

          push(make_lambda((void (*)())iprint, 1));
          resolve(1);
          clear_args(0, stack_base_ptr);
          return 0;
   }

It's still a bit messy, with lots of ugly casting code and meaningless int return values; but with some macro (and potentially preprocessor) magic it'll get even more understandable. Instead of needing to convert between type systems and calling conventions for Lime, it'll be pretty simple to build a compatability layer to interoperate C and Lime functions, so I can leverage C's standard library in the development and use of Lime.

A Reasonable, Minimal Arch Linux Configuration

This post is a guide for setting up Arch Linux with reasonable defaults. Since this also doubles as documentation for my personal computer configuration, I'll update this post as I make changes.

Please follow this guide at your own risk. Arch is a hobbyist distro, and I am not responsible for any damage you may do to, or data you may lose from, your computer.

Disk Partitioning and Encryption

Overwriting the Disk

First, writing random data to the disk will ensure security by overwriting previous data on the disk, and also making it more difficult to determine the used size of the encrypted volume. The shred command will overwrite the disk with random data (make sure you've backed up your hard drive!):

$ shred --verbose --random-source =/dev/urandom --iterations=3 /dev/sda

Note that with 3 iterations, this may take around 7 hours.

Configuring Partitions

Next, we'll configure our two partitions. The first partition will be a 100 megabyte boot partition, and the second will hold all of our data, encrypted.

$ cfdisk /dev/sda
New > Partition Size: 100M > primary > Bootable
New > Partition Size: [RET] > primary

Configuring Encryption

On the second partition we created, we'll use cryptsetup to initialize an encrypted volume (which may then be mounted like any other device or drive):

$ cryptsetup --verbose --cipher aes-xts-plain64 --key-size 512 --hash sha512 --iter-time 5000 --use-random luksFormat /dev/sda2

Then, we'll mount the second encrypted partition under the name cryptroot.

$ cryptsetup open --type luks /dev/sda2 cryptroot

Partitioning Disks

Format the boot partition as ext4:

$ mkfs.ext4 /dev/sda1

and then format our main (encrypted) partition also as ext4:

$ mkfs.ext4 /dev/mapper/cryptroot

Then, mount the newly created filesystems:

$ mount -t ext4 /dev/mapper/cryptroot /mnt
$ mkdir -p /mnt/boot
$ mount -t ext4 /dev/sda1 /mnt/boot

Installing the Base System

To install the base system and the base developer utilities (required for building packages from the Arch User Repository), use Arch's package manager pacman to write these to our new root partition:

$ pacstrap -i /mnt base base-devel

Then, you'll need to generate something called an fstab. The fstab (FileSystem TABle) tells the computer which partitions to mount at boot:

$ genfstab -U -p /mnt >> /mnt/etc/fstab

Then, changeroot into the new root partition (tell the Linux kernel to use the new filesystem as the root one), so you can operate in the new OS:

$ arch-chroot /mnt

Configuring Locale

Locale is an important parameter of Linux. It tells applications how to render text in the language specified, how to display dates, and other things.

Since you're reading this guide in English, I'm going to assume you'll want US English, UTF-8 encoding. Open up your locale.conf:

$ nano /etc/locale.gen

and uncomment the line that reads:

#en_US.UTF-8 UTF-8

so that it now says:

en_US.UTF-8 UTF-8

Then, generate your newly configured locale:

$ locale-gen

Create your locale.conf:

$ echo LANG=en_US.UTF-8 > /etc/locale.conf

And tell your current running shell to use this new locale:

$ export LANG=en_US.UTF-8

Setting Time Zone

To set your time zone, make a symlink from one of the timezones in /usr/share/zoneinfo to /etc/localtime, like so:

$ ln -s /usr/share/zoneinfo/America/Chicago /etc/localtime

If you're not in Chicago, shell autocomplete (or ls /usr/share/zoneinfo) can help you find your appropriate timezone.

Set your hardware clock time to UTC:

$ hwclock --systohc --utc

Setting Hostname

To set your hostname, just write to the /etc/hostname file with the desired name:

echo dreammachine > /etc/hostname

Configuring Users

First, we'll need to configure a password for the root user. Enter your desired password after running:

$ passwd root

To add a new user with reasonable defaults, use the following like (with username replaced with your desired username):

useradd -m -g users -G wheel,games,power,optical,storage,scanner,lp,audio,video -s /bin/bash username

The -G flag indicates which groups you'd like this new user to be a part of, which indicates what kinds of operations they can perform on the system. wheel is the standard name for users who can use the sudo command (coming from the old-timey phrase "big wheel" meaning somebody important). Other groups are not necessarily necessary, but are standard and may be used by other software.

To set a password for this new user, use the passwd command we used earlier with the username of the user you just created:

passwd lschumm

Configuring grub

GNU GRUB (GNU GRand Unified Bootloader), is a simple libre bootloader. We'll need to install this in order to tell the computer how to boot the new OS.

First, you'll need to install the grub and grub-bios packages on the new system:

$ pacman -S grub grub-bios

Then, you'll need to tell grub that you're using an encrypted drive. Open up your grub configuration:

$ nano /etc/default/grub

and change

GRUB_CMDLINE_LINUX=""

to

GRUB_CMDLINE_LINUX="cryptdevice=/dev/sda2:cryptroot"

Configuring mkinitcpio

mkinitcpio generates the initial ramdisk loaded by Linux. You'll also need to tell this that you're using an encrypted disk by adding encrypt to it's list of "hooks". Open mkinitcpio's configuration file:

$ nano /etc/mkinitcpio.conf

and change this line:

HOOKS="base udev autodetect modconf block filesystems keyboard fsck"

to

HOOKS="base udev autodetect modconf block encrypt filesystems keyboard fsck"

Then, regenerate the initramfs image by running the mkinitcpio utility again:

$ mkinitcpio -p linux

Preparing for First Boot

To wrap up all of our configuration, first let's finish the GRUB configuration:

$ grub-install --recheck /dev/sda
$ grub-mkconfig --output /boot/grub/grub.cf

Then exit the system and reboot:

$ exit
$ umount -R /mnt/boot
$ umount -R /mnt
$ cryptsetup close cryptroot
$ systemctl reboot

First Boot

Now that our system is booting, let's get started with configuration. Since we'll have to run pacman and we haven't configured sudo yet, log in as root.

Connecting to the Internet

For this guide, you'll need an ethernet cable. With your computer plugged in to this cable, run:

$ dhcpcd

This will get an IP for your computer and connect you to the internet.

Give your packages a quick update:

$ pacman -Syy

And then give your system an update:

$ pacman -Syu

And reboot:

$ reboot

Configuring Wifi

To get Wi-Fi working with the default wifi-menu interface, you'll have to install the dialog and wpa_supplicant packages:

$ pacman -S dialog wpa_supplicant

Launching wifi-menu then scans for networks and lets you connect:

$ wifi-menu

Configuring sudo

First, install sudo (it's not included by default in arch base or base-devel):

$ pacman -S sudo

After that's finished installing, you'll need to tell sudo to let users of group wheel use it. Open /etc/sudoers with nano:

$ nano /etc/sudoers

and uncomment the line which says:

# %wheel ALL=(ALL) ALL

This allows wheel users to run execute all commands with root privileges.

Reboot, and log in as your user account.

Configuring i3 and Xorg

First, install Xorg:

$ sudo pacman -S xorg-server xorg-server-utils xorg-xinit

Then, install the i3 window manager:

$ sudo pacman -S i3-wm

And tell Xorg to launch i3 on startup:

$ echo "exec i3" > ~/.xinitrc

To start the window manager, just run:

$ startx

Other Configuration

All of my app-specific configuration is stored in public configuration files on my GitLab.

Credits

  • Most of the initial setup for this guide was taken from this HowtoForge tutorial with added explanation and notes.