Lifted from the interpreter

Right now, I am working on a toy language, called Wuji. It is in the stage of development where it is becoming FUN! And by FUN! I mean some design puzzles are starting to show up.

One idea that keeps popping up for me is this - "should this function be a builtin, or implemented in the language itself?".

(print)

The most recent example was println. Here is the source for my original code.

LispValue builtinPrintln(LispValue args, Env *env) {
  size_t len = arrayLen(args.list);

  for (size_t i = 0; i < len; i++) {
    LispValue v = arrayIndex(args.list, i);
    if (i > 0)
      printf(" ");

    // Print strings without quotes, everything else with quotes
    if (v.kind == STRING) {
      printf("%s", v.string);
    } else {
      char *s = lispValueToString(&v);
      printf("%s", s);
      free(s);
    }
  }
  printf("\n");

  return mkEmpty();
}

This code iterates over each argument, and prints out its string representation. It's not bad. I had to make a special case for quotes, otherwise "Børge" would print to "\"Børge\"", which was not nice.

What bothered me was that this function was not "primitive". It could be used for printing several elements, or one element. It could be used for printing strings, or LispValues. It also adds a " " space between each element. Oh, and it also prints a newline. That is not optional.

At first I just wanted to cut the newline, and add a function (println) that would just append a newline, something like this:

In order to do that though, we need a function to concatanate strings. For that, we implement 'str':

LispValue builtinStr(LispValue args, Env *env) {
  size_t nargs = arrayLen(args.list);

  StringBuilder *sb = new_builder(256);

  for (size_t i = 0; i < nargs; i++) {
    LispValue arg = arrayIndex(args.list, i);

    if (arg.kind == STRING) {
      add_to(sb, "%s", arg.string);
    } else {
      char *str_val = lispValueToString(&arg);
      add_to(sb, "%s", str_val);
      free(str_val);
    }
  }

  char *result = strdup(to_string(sb));
  free_builder(sb);

  return mkString(result);
}

Now, we can combine these two builtins in a nice lisp definition:

(defn println [& args]
  (print (apply str (append args "\n"))))

This function receives its arguments in a list, appends "\n" to that list, and joins it together to a single string

◯ (println "Numbers: " [7 22 144])
; => "Numbers: [7 22 144]"

Well.. now it seems kind of redundant for the builtinPrint to turn every value into a string, doesn't it? builtinStr already does that work!

That means we can simplify builtinPrint to simply print its argument

LispValue builtinPrint(LispValue args, Env *env) {
  size_t len = arrayLen(args.list);
  if (len != 1)
    return mkError("print-line: expected exactly 1 argument (string)");

  LispValue v = arrayIndex(args.list, 0);
  if (v.kind != STRING)
    return mkError("print-line: argument must be a string");

  printf("%s", v.string);
  return mkEmpty();
}

Isn't that nice? We have pulled apart println to two concepts - printing strings and concatenating strings. Each of these primitives are more useful tools than their parent 'println' was.

But what about format strings? Using two primitives, 'split' and 'str', we can build up a simple 'format' function. It's source is a bit too long to paste, but can be read here. We can now do nice template formating on strings, all implemented in Wuji!

◯ (format "Cool numbers -> {} <-" [7 22 144])
; => "Cool numbers: -> [7 22 144] <-"

In addition, it is super easy to implement a print-fmt function using this:

(defn print-fmt [& args]
  (print (apply format args)))

◯ (print-fmt "Cool numbers -> {} <-" [7 22 144])
; "Cool numbers: -> [7 22 144] <-"

I really like this pattern of pulling complexity out from the interpreter and into the language itself. It forces you to rethink what primitives are useful utilities to build other functions later.

(add)

The same sort of pattern appeared when looking at the arithmetic and boolean operators. Each of the operators did "the lisp thing" where the function can take variadic elements:

LispValue builtinAdd(LispValue args, Env *env) {
  int sum = 0;

  for (size_t i = 0; i < arrayLen(args.list); i++) {
    LispValue v = arrayIndex(args.list, i);

    if (v.kind != NUMBER) {
      return mkErrorf("+: expected number, got %s", typeName(v.kind));
    }
    sum += v.number;
  }

  return mkNumber(sum);
}

◯ (+ 1 2 3 4 5)
; => 15

The implementation became very repetitive, as each single binary operator had to implement this complexity… In addition, some of the operators, like '-' and the boolean ops, have special properties, so I could not just implement a big wrapper for all of them. A better solution was to implement more distinct primitives again. I implemented each function as a proper binary function. In fact, I wrapped them all up in a single binary operator dispatch:

LispValue builtinAdd(LispValue args, Env *env) {
  return builtinBinOp(BIN_ADD, "add", args);
}

However, this does not answer the question of how to make this function variadic again! We have simply created a worse function The clue to solving this is that we need the 'variadic argument' syntax from Clojure:

(defn variadic-example [arg & args]
        (print-fmt "arg={}\nargs={}\n" arg args))

◯ (variadic-example 1 2 3 4 5)
; arg=1
; args=(2 3 4 5)

This allows us to parse some (or all!) arguments to a function as its own list of values, instead of names values. The function can then iterate over that list. In fact, we also used this super power to define the 'println' and 'print-fmt' functions earlier. Sorry for not explaining it then!

Now then, implementing the operator '+' is as simple as folding with 'add' over the list of arguments, like this:

(defn + [& nums]
  (cond
    (null? nums) 0
    (null? (cdr nums)) (car nums)
    :else (foldl add (car nums) (cdr nums))))

This same pattern can be repeated for all the binary operators, instead of being repeated in the interpreter! Isn't that nice!?

It also means we can special rules, for example a single argument '-' returning the negative of its argument

(defn - [& nums]
  (cond
    (null? (cdr nums)) (sub 0 (car nums))
    :else (foldl sub (car nums) (cdr nums))))

◯ (- 2 1)
; => 1

◯ (- 1)
; => -1

(why?)

One question has kept popping up as I work on these design changes - why? What is the purpose of doing this? I know enough about CPUs, interpreters and compilers to know that this is devestating for performance. Also, the overall footprint of the codebase is not exactly going down simply by moving functionality from the interpreter to the language itself. It is however a very satisfying excercise in design. In solidifying primitives. In seeing what language features are useful, and how to combine them. I am heavily plagiarizing design ideas from Rich Hickey, but implementing them by myself and seeing how they can be put together is a rewarding process.