Skip to content

Latest commit

 

History

History
1358 lines (1091 loc) · 46.8 KB

README.adoc

File metadata and controls

1358 lines (1091 loc) · 46.8 KB

LISP1.5 implementation on Gauche

1. M-expressions

When LISP was in its infancy, it wasn’t wrapped with so many parentheses which would later become its most memorable characteristic. At least in front of people’s eyes.

An early LISP program looked like this:

member[a;l] = [null[l] → NIL;
               eq[a;car[l]] → l;
               T → member[a;cdr[l]]]

S-expressions were also used, but only to denote data.

member[C;(A B C D E)]

LISP programmers back then read and wrote programs in this form, on paper. Yes, program sources were meant to be read by humans. In order for computers to understand programs, one must have prepared a deck of cards with holes punched according to BCD character code. The character set didn’t have many punctuations and it was impossible to punch M-expressions directly. Instead, people translated M-expressions into S-expressions, which was already used to describe nested list structures. In other words, people fed the machine the internal representation of syntax tree directly.

Eventually, people got used to write programs in S-expresions directly, and M-expressions were faded into oblivion.

However, when we read papers of that era, we see M-expressions everywhere. Now that we can write M-expressions in a text file, let’s make the computer understand it directly!

1.1. M-expression syntax

Here’s a summary of M-expression sytnax:

  • Identifiers are in all lowercase. Literal symbols are in all uppercase. No quote in M-expression. Literal list can be written using parentheses, e.g. (A B (C D . E)), without a quote.

  • Function call is written as fn[arg1;arg2;…​]

  • Conditional is [test1 -> expr1; test2 -> expr2;…​]. It works just like cond. Unicode arrow character (U+2192) can be used in place of ->.

  • Symbol NIL is used for list terminator. Hence cons[A;NIL] yields (A).

  • Literal function is written as lambda[[arg;..];expr]. This lambda form itself doesn’t have a value — it must be called with arguments to take effect.

  • Local recursive function can be written as label[name;lambda[[arg;…​];expr]], where you can use name in the expr to refer to itself.

  • In toplevel, you can define a function by name[arg;…​] = expr.

An M-expression can be translated into an S-expression as follows:

  • Literal data (symbols and lists) are embedded in the output by (QUOTE <literal>).

  • Function call fn[arg1;arg2;…​] becomes (FN ARG1 ARG2 …​)`. We didn’t have lower case characters on computers back then.

  • Conditional [test1 -> expr1; test2 -> expr2;…​] becomes (COND (TEST1 EXPR1) (TEST2 EXPR2) …​).

  • Literal function lambda[[arg;..];expr] becomes (LAMBDA (ARG …​) EXPR).

  • Local recursive function label[name;lambda[[arg;…​];expr]] becomes (LABEL NAME (LAMBDA (ARG …​) EXPR)).

1.2. Parsing a single M-expression

Module LISP1.5.mexpr implements parsers. A procedure parse-mexpr takes a string or an input port, and parses one M-expression and returns an S-expression.

gosh> ,u LISP1.5.mexpr
gosh> (parse-mexpr "cons[(A . B);C]")
(CONS (QUOTE (A . B)) (QUOTE C))
gosh> (parse-mexpr "lambda[[x];[eq[NIL;x]→T; T→F]]")
(LAMBDA (X) (COND ((EQ (QUOTE NIL) X) (QUOTE T)) ((QUOTE T) (QUOTE F))))

See how liteals (upper-case names and lists) are QUOTE -d once parsed.

💡

If you try the above example on your REPL, make sure your terminal character encoding matches your Gauche setting (it’s utf-8 by default).

Interestingly, there seemed no direct translation of definition syntax name[arg;…​] = expr into an S-expression. When a user translates a prorgam in M-expressions into S-expressions to punch the cards, she gathers all the definitions into a call of DEFINE pseudo function (see the note below).

For our purpose, we want the parser to yield a single S-expression from one M-expression. So, we return ($= …​) form as the result of parsing a defintion:

gosh> (parse-mexpr "null[x] = [eq[NIL;x]→T; T→F]]")
($= (NULL X) (COND ((EQ (QUOTE NIL) X) (QUOTE T)) ((QUOTE T) (QUOTE F))))

(The preceding $ of $= indicates it is an internal or implementation-dependent feature.)

The parse-mexpr function only parses one M-expression. If the input may contain more than one M-expression, use parse-mexprs instead, which reads input up to EOF and returns an lseq of result S-expressions.

1.3. Comments and other exhancements

We also add these extensions, for the convenience.

1.3.1. Comments

There’s no comment syntax defined in M-expressions. Since it’s for humans to read, you could freely intermix code and natural language descriptions. For our purpose, we make a hash sign # to the end of line is a comment. (We avoid ;, for it is used as a separator.)

1.3.2. Extended COND form

In Appendix B of LISP 1.5 Programmer’s Manual, they use a pseudo extension of conditional expression for concise explanation, in which you can access the result of test expression from the expression in that branch. Such extension wasn’t formalized and the actual code is written in assembly language instead of M-expressions. But for our purpose it’ll be convnient to support such extension.

It is to allow a conditional expression to have the following clause:

test => fun

Here, fun must be a LAMBDA form that takes one argument, or an expression that yield a function. First, test is evaluated, and if it yiels a true value (a value neither NIL nor F), the value is passed to the function. It’s the same as Scheme’s cond feature with =>.

We’ll explain the actual use case and implementation of this extension when we get to the full toplevel environment support.

1.4. Writing source in M-expression:

With Gauche’s reader directive feature, you can write source in M-expressions, as follows:

;;
;; Scheme comments
;;
(use LISP1.5.mexpr)
#!m-expr

# M-expression function definitions
function1[arg;...] = expression1
function2[arg;...] = expression2
...

For our purpose, we want to treat M-expressions as our source code, and the parser returns a single S-expression as a result. So we introduce our own extension.

($TOPLEVEL <toplevel-form> ...)

<toplevel-form> : ($= <expr> <expr> [<key>])
                | <expr>

When read, the entire source is wrapped in $TOPLEVEL form. Inside it, each toplevel form becomes either ($= <expr> <expr>) (in case of definition) or just an <expr> in case of toplevel function call. This $TOPLEVEL form is merely our parser’s way to wrap the result, and its interpretation depends on the caller of the parser; it doesn’t mean we’ll have a special form called $TOPLEVEL.

ℹ️

In the actual use case, all definitions in a program were gathered and translated into the following form to be punched:

DEFINE ((
(NAME (LAMBDA (ARG ...) EXPR))
(NAME (LAMBDA (ARG ...) EXPR))
...
))

This is actially a special syntax to execute a function call on toplevel. It takes a form FUNC (ARG …​), where ARG s are implicitly quoted. The function DEFINE takes one argument, which is a form of ((NAME LAMBDA-EXPR) …​).

If you want to perform some calculation, you list the call of the function after the DEFINE form, as follows:

DEFINE ((
 ... definitions ..
(DOSOMETHING (LABMDA (ARG ...) EXPR))
))

DOSOMETHING (ARG ...)

Examples are shown in p.15 and pp.48-51 of LISP1.5 Programmer’s Manual.

The #!m-expr directive translates those M-expressions into a LISP1.5 DEFINE form:

($TOPLEVEL
 ($= (FUNCTION1 ARG ...) EXPRESSION1)
 ($= (FUNCTION2 ARG ...) EXPRESSION2)
  ...)

Note that you have to have definitions of $TOPLEVEL and other primitive LISP1.5 forms before loading the source file; The LISP1.5.mexpr module only handles parsing.

We provide several implementations of those LISP1.5 primitives, which we’ll show you in the following chapters.

2. A Universal LISP Function

2.1. Running EVAL with minimal axioms

Section 1.6 of "LISP 1.5 Programmer’s Manual" is one of the pinnacles of the document. They show how to implement Lisp interpreter on top of Lisp systems. They call it "a Universal LISP function".

We write out their code in mx/eval.mx.

What’s interesting about it is that you only need a handful of functions and syntaxes to run the interpreter. We define those minimal set of primitives in LISP1/5/axioms.scm. It provides the definition of the following primitives: CAR, CDR, CONS, ATOM, EQ, QUOTE, and COND, as well as a definition of $TOPLEVEL to handle toplevel forms.

To try the eval function, first use the axioms module, then load the eval.mx file. Assuming you have load path set to the top directory of LISP1.5 source, you can say the following in the gosh REPL:

gosh> ,u LISP1.5.axioms
gosh> ,l mx/eval.mx
#t

Or, you can start gosh with loading necessary modules (this assumes you’re in the top directory of LISP1.5 source):

$ gosh -I. -u LISP1.5.axioms -l mx/eval.mx

On the gosh prompt, you can call EVAL. The first argument is the S-expression to evaluate, and the second argument is the environment (assoc list of symbols and values):

gosh> (EVAL '(CONS (CAR (QUOTE (X . Y))) (QUOTE Z)) 'NIL)
(X . Z)

Be aware of the difference of ' (quote) and QUOTE. The former one is recognized by Gauche. The latter one is recognized by EVAL.

If you prefer, you can write M-expressions using read-time constructor #,(m-expr "…​"):

gosh> (EVAL '#,(m-expr "cons[car[(X . Y)];Z]") 'NIL)
(X . Z)

Following is a bit more convoluted example. It defines append as a recursive funciton using LABEL, and calls it with two arguments, (A B C) and (X Y Z):

gosh> (EVAL '#,(m-expr "label[append;lambda[[xs;r];\
                               [eq[xs;NIL] -> r;\
                                T -> cons[car[xs];append[cdr[xs];r]]]]]\
                        [(A B C);(X Y Z)]")
            'NIL)
(A B C X Y Z)

This interpreter only knows the minimal 7 primitives: CAR, CDR, CONS, ATOM, EQ, QUOTE, and COND. To refer to anything other than that, you have to pass them in the environment argument.

The following example reverses a list, using the definition of NULL, APPEND and REVERSE given to the environment:

gosh> (EVAL '#,(m-expr "reverse[(A B C D E F G)]")
            '((NULL . #,(m-expr "lambda[[x];[eq[x;NIL] -> T; T -> F]]"))
              (APPEND . #,(m-expr "lambda[[xs;r];\
                                     [eq[xs;NIL] -> r;\
                                      T -> cons[car[xs];append[cdr[xs];r]]]]"))
              (REVERSE . #,(m-expr "lambda[[xs];\
                                      [null[xs] -> NIL;\
                                       T -> append[reverse[cdr[xs]];cons[car[xs];NIL]]]]"))
             ))
(G F D C B A)
ℹ️

We need to provide the function NULL in the environment, since the one defined in eval.mx exists in the world of Gauche, and is not visible from the world of EVAL.

💡

When you refer to an identifier that’s neither one of the built-in primitive nor the one given in the environment, you’ll get an error like the following:

*** ERROR: pair required, but got NIL
Stack Trace:
_______________________________________
  0  (car x)
        at "./LISP1/5/axioms.scm":9
  1  (CAR X)
        [unknown location]
  2  (CAAR A)
        [unknown location]
  3  (EQUAL (CAAR A) X)
        [unknown location]
  4  (ASSOC E A)
        [unknown location]
  5  (EVAL FN A)
        [unknown location]
...

The code searches the environment alist by ASSOC, hits the end of the alist without finding it and complains. Remember, we have minimal interpreter and there’s no fancy error handling mechanism.

2.2. Going Metacircular

Since the universal LISP function defined in eval.mx understands the primitives required to interpret functions in eval.mx, you can use our EVAL to evaluate eval.mx to run EVAL on top of EVAL — now you’re running a metacircular interpreter!

You might have noticed though, that axioms.scm provides $TOPLEVELS, which is missing in eval.mx. In our context of discussing metacircular interpreter, $TOPLEVELS appears as a result of parsing M-expression definitions, and should be understood as a meta-language to direct the set-up, rather than an integrated part of the language (one way to think of it is that if other primitives are C built-ins then $TOPLEVELS is #pragma or Makefile — they belong to a different layer.)

Of course, it is more convenient to have an ability in the core language to add new toplevel definitions, and we’ll deal with it later. For now, let’s stick to the 7 primitives.

In order to run EVAL inside EVAL, we need to prepare the definitions in eval.mx as an environment alist passed to outer EVAL. Run the following command in the toplevel source directory:

$ gosh tools/mexpr-env.scm mx/eval.mx

It reads eval.mx and prints the definitions in an alist. Copy the output, then start gosh again, read axioms and load eval.mx, and evaluate the EVAL expression, passing the copied alist as the environment (don’t forget the quote before the alist!):

gosh> ,u LISP1.5.axioms
gosh> ,l mx/eval.mx
#t
gosh> (EVAL '(EVAL (QUOTE (CAR (QUOTE (X . Y)))) (QUOTE NIL))
            '...<<here, copy & paste the output of mexpr-env.scm>>)
X

The result X is the result of (CAR (QUOTE (X . Y))), computed by the EVAL function implemented in LISP1.5, not the underlying Gauche.

If cut&pasting the environment alist is too tedious, mexpr-env.scm can create a definition of an auxiliary function EVAL*, which calls EVAL with the environment that has all the definitions in the given source file. Run mexpr-env.scm with -e option, and save the result in lisp/eval.lisp:

$ gosh tools/mexpr-env.scm -e mx/eval.mx > lisp/eval.lisp
💡

Instead of manually executing tools/mexpr-env.scm, you can run the standard build process (./configur && make) and all the converted files are placed under lisp/.

We use suffix lisp to indicate it is not a Scheme code (even though Gauche can understand it after using LISP1.5.axioms). The created lisp/eval.lisp looks as follows:

($TOPLEVELS ($= (EVAL* X) (EVAL X '...<<environment defined in eval.mx>>...
)))

That is, it defines EVAL* which takes one LISP1.5 expression and evaluates it under the enviornment where all the definitions in eval.mx is visible.

The created eval.lisp can be loaded to gosh after using LISP1.5.axioms. Together with mx/eval.mx, you can run EVAL on top of EVAL:

$ gosh -I. -uLISP1.5.axioms -lmx/eval.mx -leval-star.lisp
gosh> (EVAL* '#,(m-expr"eval[(CONS (QUOTE X) (QUOTE Y));NIL]"))
(X . Y)

This time we used M-expression in the inner call. It’s the same as writing '(EVAL (QUOTE (CONS (QUOTE X) (QUOTE Y))) (QUOTE NIL)).

Let’s recap what’s happening. The outer EVAL (via EVAL*) is executed by Gauche, using the initially loaded eval.mx. The inner EVAL is interpreted by the outer EVAL, using the enviornment created by mexpr-env.scm. And the expression (CONS (QUOTE X) (QUOTE Y)) is interpreted by the inner EVAL:

        +----------------------------+
        | (CONS (QUOTE X) (QUOTE Y)) |
        +----------------------------+
        |           EVAL             |  ; inner EVAL
        +----------------------------+
        |           EVAL             |  ; outer EVAL
        +----------------------------+
        |          Gauche            |
        +----------------------------+

If it is not obvious, try it with an altered environment. For example, edit the eval.lisp created above to change the inner EVAL recognizes KWOTE instead of QUOTE. There’s only one place to change:

 (EVAL
  LAMBDA
  (E A)
  (COND
   ((ATOM E) (CDR (ASSOC E A)))
   ((ATOM (CAR E))
    (COND ((EQ (CAR E) (QUOTE KWOTE)) (CADR E))
                              ^^^^^
     ((EQ (CAR E) (QUOTE COND)) (EVCON (CDR E) A))
     ((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A))))
   ((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A))))

(Leave other QUOTE intact, for they are recognized by the outer EVAL).

Now, try it:

(EVAL* '(EVAL (QUOTE (CONS (KWOTE X) (KWOTE Y))) (QUOTE NIL)))
  => (X . Y)

The two QUOTE s are recognized by the outer EVAL, and the two KWOTE s are recognized by the inner EVAL. Furthermore, the ' (quote) is recognized by Gauche.

2.3. Having FUN with ARG

(If you know what we’ll talk about from the section title, you can skip this section. Yes, it’s just about that.)

One advantage of having a simple language with a concise interpreter is that we can tweak it easily.

In the universal EVAL, a function is represented as a literal list whose car is LAMBDA. It is a powerful idea—​now you can have a function as a first-class citizen of the language, that you can construct it, pass it to another function, and return it from another funciton. However, it has a flaw.

Let’s try a failure case and see if we can fix it.

Consider MAPCAR function, which takes a function and a list, and returns a list of results of the function applied to each element of the given list (that is, Scheme’s map function):

mapcar[fn;x] = [null[x] -> NIL;
                T -> cons[fn[car[x]];mapcar[fn;cdr[x]]]]

It is in mx/mapcar.mx. You can’t load it directly into Gauche, however. Treating a list starting with LAMBDA as a function is a feature of EVAL, not Gauche. We have to make EVAL understand the above definition.

We can use the same technique we used in the metacircular interpreter — that is, translate the definition of MAPCAR above into an enviroment alist. We also need the definition of NULL, so let’s combine eval.mx together with mapcar.mx. It can be done with the following command line:

$ gosh tools/mexpr-env.scm -e mx/eval.mx mx/mapcar.mx > lisp/mapcar.lisp

Alternatively, run ./configure then make in the toplevel source directory.

Once you have lisp/mapcar.lisp, you can load it (after mx/eval.mx) and you can call MAPCAR inside EVAL*:

$ gosh -I. -uLISP1.5.axioms
gosh> ,l mx/eval.mx
#t
gosh> ,l lisp/mapcar.lisp
#t
gosh> (EVAL* '(MAPCAR (QUOTE (LAMBDA (X) (CONS X (QUOTE Y)))) (QUOTE (A B C))))
((A . Y) (B . Y) (C . Y))
gosh> (EVAL* '#,(m-expr "mapcar[(LAMBDA (X) (CONS X (QUOTE Y)));(A B C)]"))
((A . Y) (B . Y) (C . Y))

So far, so good.

Now, Let’s try nesting MAPCAR. We’ll do equivalent to the following Scheme code:

(map (lambda (x) (map (lambda (y) (cons x y)) '(p q r))) '(a b c))
  => (((a . p) (a . q) (a . r)) ((b . p) (b . q) (b . r)) ((c . p) (c . q) (c . r)))

Here’s LISP1.5 version:

(EVAL* '(MAPCAR (QUOTE (LAMBDA (X)
                         (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
                                 (QUOTE (P Q R)))))
                (QUOTE (A B C))))
  => ((((P Q R) . P) ((Q R) . Q) ((R) . R)) (((P Q R) . P) ((Q R) . Q) ((R) . R)) (((P Q R) . P) ((Q R) . Q) ((R) . R)))

Oops, what happened? Let’s examine the details. Outer MAPCAR receives two actual parameters, (LAMBDA (X) …​) and (A B C) (QUOTE s are stripped when arguments are evaluated by evlis before calling the function). They are bound to the local parameters, FN and X, respectively. In other words, the body of MAPCAR:

[null[x] -> NIL;
 T -> cons[fn[car[x]];mapcar[fn;cdr[x]]]]

is evaluated with the following environment:

((FN . (LAMBDA (X)
         (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
                 (QUOTE (P Q R)))))
 (X . (A B C)))

Since X is not NIL, evaluation goes to cons[…​] branch. The first argument is fn[car[x]], so first car[x] is evaluated and yields A, fn evaluated to the outer LAMBDA form and we call it with A. The body of inner LAMBDA form, which is the inner MAPCAR call, is evaluated with the following environment (Keep in mind that the new local bindings are inserted in front of outer environment):

((X . A)
 (FN . (LAMBDA (X)
         (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
                 (QUOTE (P Q R)))))
 (X . (A B C)))

Inner MAPCAR gets (LAMBDA (Y) (CONS X Y)) and (P Q R) as two actual parameters, which are bound to MAPCAR 's formal paramter FN and X again, and the environment under which innter MAPCAR 's body is evaluated looks like this:

((FN . (LAMBDA (Y) (CONS X Y)))
 (X . (P Q R))
 (X . A)
 (FN . (LAMBDA (X)
         (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
                 (QUOTE (P Q R)))))
 (X . (A B C)))

Finally, innter LAMBDA is called — first, P as the actual parameter, which is bound to Y. Hence the body of the inner LAMBDA, which is (CONS X Y), is evaluated under the following environment:

((Y . P)
 (FN . (LAMBDA (Y) (CONS X Y)))
 (X . (P Q R))                                (1)
 (X . A)                                      (2)
 (FN . (LAMBDA (X)
         (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
                 (QUOTE (P Q R)))))
 (X . (A B C)))                               (3)
  1. Argument for the inner MAPCAR

  2. Argument for the outer LAMBDA

  3. Argument for the outer MAPCAR

Now it is clear why it didn’t work. When we write the initial nested MAPCAR form, we expect that X in the innermost expression (CONS X Y) refer to the formal parameter of the outer LAMBDA. But it is shadowed by the formal parameter of the MAPCAR.

This is a well-known problem, and in lambda calculus it is avoided by renaming the parameter names to avoid conflict. In our case, if we rename the formal parameter of inner LAMBDA to something different from the formal parameter of MAPCAR, it works as expected:

(EVAL* '(MAPCAR (QUOTE (LAMBDA (Z)                                  (1)
                         (MAPCAR (QUOTE (LAMBDA (Y) (CONS Z Y)))
                                 (QUOTE (P Q R)))))
                (QUOTE (A B C))))
 => (((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
  1. We use Z to avoid confclit with MAPCAR 's X.

However, we can’t possibly avoid all potential conflict manually, and renaming all formal parameters programatically to unique ones can be costly.

LISP1.5 employed another way to solve this problem. Instead of passing LAMBDA form quoted, it introduced another form, called FUNCTION. The rule is that whenever you pass a function as an argument, you wrap it with FUNCTION instead of QUOTE. With this rule, our call of nested MAPCAR would look like this:

(EVAL* '(MAPCAR (FUNCTION (LAMBDA (X)
                            (MAPCAR (FUNCTION (LAMBDA (Y) (CONS X Y)))
                                    (QUOTE (P Q R)))))
                (QUOTE (A B C))))

Now we modify our universal LISP function to deal with FUNCTION. We only need to change two lines. First, make EVAL understand (FUNCTION <fn>) form. Whenver it sees the form, it just returns a list (FUNARG <fn> <env>), where <env> is the evaluation enviornment:

eval[e;a] =
  [atom[e] -> cdr[assoc[e;a]];
   atom[car[e]] -> [eq[car[e];QUOTE] -> cadr[e];
                    eq[car[e];FUNCTION] -> cons[FUNARG;cons[cadr[e];cons[a;NIL]]]; (1)
                    eq[car[e];COND] -> evcon[cdr[e];a];
                    T -> apply[car[e];evlis[cdr[e];a];a]];
   T -> apply[car[e];evlis[cdr[e];a];a]]
  1. If we see (FUNCTION <fn>) form, wrap the function and the current environment in FUNARG form, as (FUNARG <fn> <env>).

Then, in APPLY, we call <fn> with the rememberd <env> instead of the passed environment:

apply[fn;x;a] =
  [atom[fn] -> [eq[fn;CAR] -> caar[x];
                eq[fn;CDR] -> cdar[x];
                eq[fn;CONS] -> cons[car[x];cadr[x]];
                eq[fn;ATOM] -> atom[car[x]];
                eq[fn;EQ] -> eq[car[x];cadr[x]];
                T -> apply[eval[fn;a];x;a]];
   eq[car[fn];FUNARG] -> apply[cadr[fn];x;caddr[fn]];                  (1)
   eq[car[fn];LAMBDA] -> eval[caddr[fn];pairlis[cadr[fn];x;a]];
   eq[car[fn];LABEL] -> apply[caddr[fn];x;cons[cons[cadr[fn];caddr[fn]];a]]]
  1. Apply the wrapped function in the rememberd environment

The changed definitions are in mx/funarg.mx. You can load it and see it addresses the issue (which has been called FUNARG problem).

$ gosh -I. -u LISP1.5.axioms -l mx/funarg.mx
gosh> ,l lisp/mapcar.lisp
#t
gosh> (EVAL* '(MAPCAR (FUNCTION (LAMBDA (X)
                         (MAPCAR (FUNCTION (LAMBDA (Y) (CONS X Y)))
                                 (QUOTE (P Q R)))))
                (QUOTE (A B C))))
(((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
ℹ️

Did you notice that you actually did’t need FUNCTION? Instead of introducing another form, you can let EVAL create FUNARG when it sees a bare LAMBDA form. The definition will look like this:

eval[e;a] =
  [atom[e] -> cdr[assoc[e;a]];
   atom[car[e]] -> [eq[car[e];QUOTE] -> cadr[e];
                    eq[car[e];LAMBDA] -> cons[FUNARG;cons[e;cons[a;NIL]]];
                    eq[car[e];COND] -> evcon[cdr[e];a];
                    T -> apply[car[e];evlis[cdr[e];a];a]];
   T -> apply[car[e];evlis[cdr[e];a];a]]

The updated definition is in mx/funarg-lambda.mx. Using it, calling MAPCAR becomes quite simpler:

$ gosh -I. -u LISP1.5.axioms -l mx/funarg-lambda.mx
gosh> ,l lisp/mapcar.lisp
#t
gosh> (EVAL* '(MAPCAR (LAMBDA (X)
                        (MAPCAR (LAMBDA (Y) (CONS X Y))
                                (QUOTE (P Q R))))
                      (QUOTE (A B C))))
(((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))

This idea was realized by Sussman and Steele in 1975, as a dialect Scheme. The first paper of Scheme stated it at the beginning:

SCHEME is essentially a full-funarg LISP.  LAMBDA expressions need
not be QUOTEd, FUNCTIONed, or *FUNCTIONed when passed as arguments or
returned as values; they will evaluate to closures themselves.

2.4. Symbols and toplevel environment

So far, our EVAL requires any bindings to be provided via the environment argument. Preprocessing the source with mexpr-env.scm was a remedy, but it’s still troublesome. So our next step is to add a toplevel environment, that keeps global bindings of symbols.

The easiest way is to keep a global table, and when we search a variable binding via ASSOC (in the first branch of EVAL), we also look up the table when we didn’t find any local bindings.

However, LISP1.5 took a bit different approach. Since its symbol had a property list, or plist, which could hold arbitrary key-value pairs, so I suspect it was natural to store the global value of the symbol in its plist. In fact, even the name of a symbol was merely one of its properties. In LISP1.5, a symbol was just another type of list where the car of its head was marked with a special value (-1).

ℹ️

A property list (plist) associates keys to values, much like an associative list (alist), but its structure alternates keys and values. For example, if key A has value APPLE and key B has a value BANANA, it can be represented with the following alist and plist, respectively:

;; alist
((A . APPLE) (B . BANANA))

;; plist
(A APPLE B BANANA)

The number of cons cells used are the same. We’re not sure why LISP1.5 creators used plist for symbol properties, while they used alist for environment in EVAL.

In our minimal infrastructure (LISP1/5/axioms.scm) we just used Gauche symbols for LISP symbols. It might be interesting, though, to reproduce what LISP1.5 did — using a list to implement symbols!

That is, from now on, our LISP symbol is a pair whose car is a special marker. We use Gauche symbol ATOM. From LISP world, a LISP symbol is an unbreakable unit (hence it is called atom), so the marker is never be visible. Under the hood, in Gauche level, we can break an atom to access its internal structure. It is as if LISP world deals with chemical reactions and Gauche world deals with nuclear reactions.

In LISP symbols, its name is stored as a value of the property PNAME. Since the property list is scanned by LISP function, we have to use LISP symbols as the property key. For the name itself, we use a Scheme string; in real LISP1.5, the name is stored in a special way and treated specially (there wasn’t a string type).

Thus, LISP symbol PNAME has the following structure in Gauche:

(define *PNAME* '#0=(ATOM #0# "PNAME"))

The #0= notation is a Scheme way to write a circular structure. The symbol PNAME has a propoerty list, in which the key PNAME is associated to the name "PNAME". Note that they LISP symbol PNAME itself doesn’t have a global value.

The global value of symbols is stored as a propery value with the key APVAL. So we need the LISP symbol APVAL, which looks like the following in Gauche. APVAL itself doesn’t have a global value either:

(define *APVAL* `(ATOM ,*PNAME* "APVAL"))

Once we have PNAME and APVAL, we can define NIL, whose name is "NIL" and value is itself. We can’t use #0= notation this time, since we have to construct the list using values of *PNAME\* etc.

(define *NIL* (rlet1 nil (list 'ATOM *PNAME* "NIL" *APVAL*)
                (set! (cddddr nil) (list nil))))

Here’s how *NIL* looks like in Gauche world. 1=(ATOM #1 "PNAME") is LISP symbol PNAME, and (ATOM 1 "APVAL") is LISP symbol APVAL. Remember we’re looking at the internal of atoms — from LISP world, this is just a symbol NIL.

gosh> *NIL*
#0=(ATOM #1=(ATOM #1# "PNAME") "NIL" (ATOM #1# "APVAL") #0#)

We can define several symbols in this way. See LISP1/5/runtime.scm for all the predefined symbols.

Let’s start building infrastructure. Our LISP world only have symbols and cons cells so far (we’ll add numbers later). We can define $atom? and $cons? as follows (The $ indicates it deals with LISP objects):

(define ($atom? obj) (and (pair? obj) (eq? (car obj) 'ATOM)))
(define ($cons? obj) (and (pair? obj) (not (eq? (car obj) 'ATOM))))

Then we can define $lisp->scheme, which converts LISP data structure into Scheme data structure, handy for debugging. We map NIL inside the structure into Scheme empty list, so that list structure can be printed naturally (instead of having . NIL) at the end.) We also convert non-LISP object into a string #[…​].

(define ($lisp->scheme obj)
  (define (rec obj)
    (cond [(eq? obj *NIL*) '()]
          [($atom? obj) (string->symbol (cadr (member *PNAME* (cdr obj))))]
          [(pair? obj) (cons (rec (car obj)) (rec (cdr obj)))]
          [else (format "#[~s]" obj)]))
  (if (eq? obj *NIL*)
    'NIL
    (rec obj)))

It’s also handy to have $scheme->lisp, which converts Scheme structure into LISP structure. One important point: We want to keep symbol’s eq -ness, that is, LISP symbols with the same name can be compared with eq. So we keep a hashtable to map Scheme symbol to LISP symbols.

(define *obtable* (hash-table-r7 eq-comparator
                                 'NIL *NIL*
                                 'PNAME *PNAME*
                                 'APVAL *APVAL*))

(define ($scheme->lisp obj)
  (cond [(null? obj) *NIL*]
        [(symbol? obj) (or (hash-table-get *obtable* obj #f)
                           (rlet1 s (list 'ATOM *PNAME* (symbol->string obj))
                             (hash-table-put! *obtable* obj s)))]
        [(pair? obj) (cons ($scheme->lisp (car obj))
                           ($scheme->lisp (cdr obj)))]
        [else (errorf "Cannot convert ~s to LISP" obj)]))

Let’s try them. Converting Scheme (A B C D E) into LISP results somewhat scary structure, but converting it back shows it’s nothing to be afraid of:

gosh> ($scheme->lisp '(A B C D E))
((ATOM #0=(ATOM #0# "PNAME") "A") (ATOM #0# "B") (ATOM #0# "C")
 (ATOM #0# "D") (ATOM #0# "E") . #1=(ATOM #0# "NIL" (ATOM #0# "APVAL") #1#))
gosh> ($lisp->scheme *1)
(A B C D E)

Not all global values are stored in APVAL property. LISP1.5 uses several different keys, depending on the type of the value. APVAL is used when a symbol is used as a variable, and other keys are used when a symbol is used in the function position of the function call.

Key Value

APVAL

The value is a LISP object.

EXPR

The value is a LISP-defined function (LAMBDA or FUNARG form). The arguments are evaluated before passed to it.

FEXPR

The value is a LISP-defined function (LAMBDA or FUNARG form). The arguments are not evaluated, and passed as a single list.

SUBR

The value is a native function (written in assembly in the acutal LISP1.5, written in Gauche in our case). The arguments are evaluated before passed it.

FSUBR

The value is a native function (written in assembly in the acutal LISP1.5, written in Gauche in our case). The arguments are not evaluated, and passed as a single list.

It is worth to mention that EXPR form receives fixed-number of arguments. If you want to write a function in LISP that takes variable number of arguments, you have to make it FEXPR, and evaluate the given list of arguments by yourself.

ℹ️

Lisp dialects can be categorized to either Lisp-1 or Lisp-2. They are not versions, but about namespaces.

Lisp-1 unifies function and variable namespaces, so in the function call syntax, the function name is looked up the same way as variable look-up. Scheme is Lisp-1.

Lisp-2 have separate namespaces for functions and variables. You can use the argument named list, and it is treated separately from the function list. When you need to call a function stored in a variable, you need to use an extra function, funcall. Common Lisp is Lisp-2.

This design of having different keys for function call and variable makes LISP1.5 a Lisp-2. However, interestingly, to call a function stored in a variable you can place the variable in the function position, without funcall, just like Scheme. So, coincidentally, we can say LISP1.5 is somewhat between Lisp-1 and Lisp-2.

2.5. Enhancement of M-expressions

The EVAL procedure that uses symbol’s property lists are shown in Appending B of "LISP1.5 Programmer’s Manual". However, it contains some pseudo code which were actually implemented in the assembly. Although we can rewrite the code in pure LISP1.5, it would be pretty verbose; instead, we enhance our M-expressions a bit so that the pseudo code in Appendix B can be written naturally in our implementation.

Specifically, we allow this clause in the cond form:

test => proc

The test expression is evaluated, and if it yields neither NIL nor F, the procedure proc is called with the result of test as the sole argument. It is the same as Scheme’s cond. You can also use Unicode character (U+21d2) in place of .

We also allow λ (U+03bb) in place of lambda, for conciseness and similarity to the listing in "LISP1.5 Programmer’s Manual".

Here’s a Scheme’s filter-map written in our M-expression:

filtermap[pred;lis] =
   [null[lis] → NIL;
    pred[car[lis]] ⇒ λ[[x];cons[x;filtermap[pred;cdr[lis]]]];
    T → filtermap[pred;cdr[lis]]]

It works as follows:

filtermap[atom;(A (B) C (D) E)]
  ⇒ (A C E)

2.6. Bridging worlds

As we did in our first version with axioms.scm and eval.mx, we want to keep Scheme code minimal and write the rest of the system in LISP itself. We also want to write so-called standard libraries in LISP, too.

When you write language X in the language X itself, you have to be epecially careful which world you’re dealing with. Before proceeding, let’s recap the layered structure we saw in the previous sections.

  • In axioms.scm, we defined minimal operators in Scheme to run LISP 1.5. It is the bottom world, or the Basement. We can see all the mechanics that runs the LISP system from the Basenment.

  • Then we loaded eval.mx, which is written in LISP 1.5 itself. At this time though, the functions in eval.mx, such as NULL, ASSOC or EVAL, are actually Gauche variables, bound to Gauche procedures; The DEFINE macro in axioms.scm translates LISP 1.5 definitions into Gauche definitons. The functions in eval.mx doesn’t know about Gauche, even though they themselves are running as Gauche procedures. We’re in the Ground Floor.

  • Then we processed eval.mx with mexpr-env.scm to produce eval.lisp. It has EVAL*, which is still Ground Floor function. It takes a LISP1.5 expression and evaluates it. The expression passed to EVAL* lives in the First Floor, above the Ground Floor. As we’ve seen, the habitants in the First Floor knows nothing about the Ground Floor or the Basement, except the bindings passed as the environment.

Now, in our revised runtime, difference between the Basement and the Ground Floor becomes wider: A LISP symbol is an unbreakable atom in the Ground Floor, but it’s just a pair in the Basement.

We do need a few conduits between the floors, so that the upper floor can access the functionality of lower floor:

  • SUBR and FSUBR are functions implemented in the basement. In the original LISP1.5, they were implemented in assembly language. In our case, they are written in Gauche. In order to invoke those functions from the ground floor, we need an additional primitive, CALLSUBR, which takes the instance of SUBR or FSUBR and arguments, to call it.

  • Atoms in LISP world now has structure---somewhat like that an atom in original sense was the minimal unbreakable building block of the universe, then mankind found it has electrons and nucleus in it. We do want to treat LISP atoms as unbreakable entity for most of the time, except when we want to access its property list.

  • The previous version of EVAL doesn’t have error handling mechanism. For usability, we need some minimal mechanism to signal an error. Theoretically, a sophisticated error handling mechanism can be implemented fully in LISP layer---e.g. we can define a special "error" value, and whenever something yields an error value, we let all the expressions that got the error value just returns it, so that the error value propagates to the final result. However, it is convenient to stop and examine the evaluator at the moment when error occurs, and such a mechanism needs access to the basement. For now, we provide ERROR procedure as another primitive.

  • The previous version of EVAL didn’t have literal fuction (lambda form) in EVAL code itself---the lambda form is dealt by EVAL, but the Basement that executes EVAL doesn’t need to know that. Now that, for our convenience, we use several lambda forms in the definition of EVAL, so the Basement needs to deal with them, too.

The revised runtime Basement routines---replacing axioms.scm is in LISP1/5/runtime.scm:

(define (CAR x) (if (null? (car x)) *NIL* (car x)))
(define (CDR x) (if (null? (cdr x)) *NIL* (cdr x)))
(define (CONS x y) (cons x (if (eq? y *NIL*) '() y)))
(define (ATOM x) (if ($atom? x) *T* *F*))
(define (EQ x y) (if (eq? x y) *T* *F*))
(define (CALLSUBR subr args) (apply subr args))
(define (ERROR obj) (error "Meta*LISP Error:" ($lisp->scheme obj)))
(define T *T*)
(define F *NIL*)
(define NIL *NIL*)

(define-syntax LAMBDA lambda)
(define-syntax QUOTE
  (syntax-rules ()
    [(_ x) ($scheme->lisp 'x)]))
(define-syntax COND
  (syntax-rules (=>)
    [(_) *NIL*]
    [(_ (test expr) . more)
     (let ([t test])
       (if (or (eq? t *NIL*) (eq? t *F*))
         (COND . more)
         expr))]
    [(_ (test => proc) . more)          ; extension
     (let ([t test])
       (if (or (eq? t *NIL*) (eq? t *F*))
         (COND . more)
         (proc t)))]))

(define-syntax $TOPLEVELS
  (syntax-rules ($=)
    [(_ ($= (name args ...) expr) ...)
     (begin (define name
              (let ([lsym ($scheme->lisp 'name)]
                    [lfn ($scheme->lisp '(LAMBDA (args ...) expr))])
                (set! (cdr lsym) `(,($scheme->lisp 'EXPR) ,lfn ,@(cdr lsym)))
                (lambda (args ...) expr)))
            ...)]))

When we read the definition of new EVAL in Gauche, the literals are passed as Gauche’s liteals. We treat it as LISP1.5 literals, hence our QUOTE form in the basement translates Gauche literals to LISP1.5 literals by $scheme→lisp.

The COND is also enhanced to handle our extension.

The $TOPLEVELS now not only defines Gauche procedures, but also registers the defined form to the LISP1.5 symbol’s EXPR property. That is, if we load this M-expression on top of the Basement:

caar[x] = car[car[x]]

It defines (1) a Gauche procedure CAAR, and (2) it registers EXPR property of LISP1.5 symbol CAAR with S-expression (LAMBDA (X) (CAR (CAR X))).

The runtime.scm also defines SUBR property of several symbols that are used as primitives:

(define-syntax defglobal
  (syntax-rules ()
    [(_ var key val)
     (let1 lsym ($scheme->lisp 'var)
       (set! (cdr lsym) `(,($scheme->lisp key) ,val ,@(cdr lsym))))]))

(defglobal CAR 'SUBR CAR)
(defglobal CDR 'SUBR CDR)
(defglobal CONS 'SUBR CONS)
(defglobal ATOM 'SUBR ATOM)
(defglobal EQ 'SUBR EQ)
(defglobal ERROR 'SUBR ERROR)
(defglobal CALLSUBR 'SUBR CALLSUBR)

2.7. Revised EVAL with global environment:

Now we write EVAL that understands the global environment. First we need a couple of auxiliary procedures.

Previously, we used assoc to search local bindings in the local environment. We didn’t consider an error there, so if you use undefined variable it yielded Gauche error. The following sassoc takes an extra thunk (a procedure with no arguments) and invokes it when the item x isn’t found in an associative list a:

sassoc[x;a;thunk] =
  [null[a] → thunk[];
   equal[caar[a];x] → car[a];
   T → sassoc[x;cdr[a];thunk]]

Another function is to get the property value with the key y in the symbol x. As we explained above, we access symbol’s property list just using car and cdr:

get[x;y] =
  [null[x] → NIL;
   eq[car[x];y] → cadr[x];
   T → get[cdr[x];y]]

Note: Since a property list alternates keys and values, it must loop with cddr---skipping a value. However, LISP1.5 Programmer’s Manual lists the above code, just looping with cdr. I don’t know if it was just an overlook or they just reused exisitng get procedure.

One consequence of that choice is that we can’t simply store the symbol’s global value as APVAL’s value, since if that value happens to be a symbol such as `EXPR, it’ll confuse get procedure. So the `APVAL’s value is wrapped with a list.

Now, we can write revised APPLY and EVAL that maintaines the global environment in symbols' propetry lists:

apply[fn;args;a] =
  [null[fn] → NIL;
   atom[fn] → [get[fn;EXPR] ⇒ λ[[e];apply[e;args;a]];
                get[fn;SUBR] ⇒ λ[[s];callsubr[s;args]];
                T → apply[cdr[sassoc[fn;a;λ[[];error[A2]]]];args;a]];
   eq[car[fn];LABEL] → apply[caddr[fn];args;cons[cons[cadr[fn];caddr[fn]];a]];
   eq[car[fn];FUNARG] → apply[cadr[fn];args;caddr[fn]];
   eq[car[fn];LAMBDA] → eval[caddr[fn];pairlis[cadr[fn];args;a]];
   T → apply[eval[fn;a];args;a]]

eval[form;a] =
  [null[form] → NIL;
   atom[form] → [get[form;APVAL] ⇒ λ[[v];car[v]];
                  T → cdr[sassoc[form;a;λ[[];error[A8]]]]];
   eq[car[form];QUOTE] → cadr[form];
   eq[car[form];FUNCTION] → cons[FUNARG;cons[cadr[form];cons[a;NIL]]];
   eq[car[form];COND] → evcon[cdr[form];a];
   atom[car[form]] → [get[car[form];EXPR]
                         ⇒ λ[[e];apply[e;evlis[cdr[form];a];a]];
                       get[car[form];FEXPR]
                         ⇒ λ[[f];apply[f;cons[cdr[form];cons[a;NIL]];a]];
                       get[car[form];SUBR]
                         ⇒ λ[[s];callsubr[s;evlis[cdr[form];a]]];
                       get[car[form];FSUBR]
                         ⇒ λ[[f];callsubr[f;cons[cdr[form];cons[a;NIL]]]];
                       T → eval[cons[cdr[sassoc[car[form];a;λ[[];error[A9]]]];
                                     cdr[form]];
                                a]];
   T → apply[car[form];evlis[cdr[form];a];a]]

This is pretty close to what is shown in the Appendix B of "LISP1.5 Programmer’s Manual".

Note that We have a couple of calls of ERROR. The argument is an error code.

  • error[A2] : A symbol is used as a procedure but doesn’t have a binding.

  • error[A8] : A symbol is used as a variable but doesn’t have a binding.

  • error[A9] : A symbol is used as a procedure or syntax but doesn’t have a binding.

Another curious point. Check the atom branch of eval:

   atom[form] → [get[form;APVAL] ⇒ λ[[v];car[v]];
                  T → cdr[sassoc[form;a;λ[[];error[A8]]]]];

It first accesses symbol’s APVAL property, then searches the environment. That is, symbol’s global value takes precedence from local values.

You can run this version of EVAL by loading mx/genv.mx into Gauche. Note that EVAL now accepts LISP1.5 data, and returns LISP1.5 data. You have to convert them to Gauche’s data back and forth.

Here, we just evaluate the global variable F, whose value is NIL:

gosh> ($lisp->scheme (EVAL ($scheme->lisp 'F) ($scheme->lisp '())))
NIL

The following code defines REVERSE function locally and calls it. Note that null and append are already stored as EXPR property of those symbols when we loaded genv.mx, so we don’t need to provide them in the environment:

gosh> ($lisp->scheme
        (EVAL ($scheme->lisp '#,(m-expr "reverse[(A B C D E F G)]"))
              ($scheme->lisp
               '((REVERSE . #,(m-expr "lambda[[xs];\
                                         [null[xs] -> NIL;\
                                         T -> append[reverse[cdr[xs]];cons[car[xs];NIL]]]]"))))))

(G F E D C B A)