François-René Rideau (fare) wrote,
François-René Rideau

A tale of many nests

This short essay will tell you about my favorite macro, nest, discuss the modularity of syntax extension, and use the implementation of that macro as an illustration for how to use defmacro, syntax-rules and syntax-case, providing along the way a comparison between these respective macro definition systems.

Using the nest macro

When I started using Scheme as my main Lisp, the first macro I wrote was the nest macro. What macro? The nest macro. The one that in Common Lisp helps my code avoid drifting hopelessly to the right as I nest binding form inside binding form... by doing the nesting for me. To illustrate the kind of issues that I'm concerned with, consider the Common Lisp code snippet below:

(multiple-value-bind (a1 b1 p1) (foo1)
  (with-open-file (f1 p1 ...)
    (let ((x1 (read f1)))
      (when x1
        (multiple-value-bind (a2 b2 p2) (foo2)
          (with-open-file (f2 p2 ...)
            (let ((x2 (read f2)))
              (when x2
                (bar x1 x2 ...))))))))

You're doing the same thing twice, but because of all those binding forms, the code moves right, the symmetry is broken, and the line limit makes you cut your lines more and more as you nest more forms, until you run out of space. It's really ugly. So much so that it makes me miss block-oriented languages in the ALGOL tradition (like C, OCaml or Python) where all the bindings (at least simple ones) go at the same level of indentation, and symmetry is preserved between consecutive bindings, and the line limit isn't an increasing threat as my functions get more complex.

Of course, it is always better when you can break down your functions into simpler chunks, at which point it doesn't matter that you move a little bit to the right, because no function is long enough for this right shift to matter much. However, when you are really computing some correspondence between two (or more) sets of entities, there's no way around doing a lot of nested bindings before you have all the input elements aligned together in a way that you can even start your computation. That's what happened for instance with my famous macros implementing the isomorphisms between pure and stateful data structures and between interface-passing-style (typeclasses) and object-oriented style (classes): the more complex macros, to "classify" interfaces, had up to 18 levels of nesting. That's a lot of indentation, that reflects as much context to fit in your limited brain registers at once (and indeed, it took me over a month to complete the first in that series of macros). Happily there is a lot of symmetry, that will be more readily apparent if only your code doesn't have to get indented so much.

The traditional solution to this approach, in Common Lisp, was to invent a "mother of all" binding macro, that could replace all the other ones at once: Attempts at providing such universal binding form include metabang-bind, let+, at least one attempt internal at ITA that I saw, and probably many more attempts that I don't know about, not to mention pattern matchers like my old fare-matcher and its better replacements optima and trivia. Now, the problem with this approach is that whoever writes this universal binding form must offer a way to supersede each and every binding form in the language. But since the language is extensible and people keep defining new binding forms (especially using CALL-WITH-* style), the task is an endless, sisyphean endeavor. What more, it is also a modularity nightmare, as there is no clear responsible party for each of the N*M extensions of N universal binding forms to each match each of M new kinds of bindings. I believe the best universal binding system in town these days is trivia, the extensible pattern matcher; but even it has only limited mind share. (Of course, I'm partial to pattern matchers: when, decades ago, I switched from OCaml to Common Lisp, I missed pattern-matching so much that the first thing I did was to write the first ML-style pattern matcher for Common Lisp; which could of course be done within the language using macros.)

As I was discussing this topic a few years ago, and potential extensions to macro-expanders to support capture of syntactic "block" continuations so they would work well with python-like syntax, my then colleague Marco Baringer, of arnesi fame, told me about this beautiful, simple solution that he knew of: the nest macro. This one remarkable macro already supports all binding forms past, present and future, without anyone having to write any code whatsoever to extend it — because it trivially embraces the syntax of them all with one weird trick: recognizing that all binding macros end with a body of code inside which the variable are bound, and nesting each form passed to the macro inside the body of the previous form, at the end. Thus, for instance the above snippet is flattened this way:

 (multiple-value-bind (a1 b1 p1) (foo1))
 (with-open-file (f1 p1 ...))
 (let ((x1 (read f1))))
 (when x1)
 (multiple-value-bind (a2 b2 p2) (foo2))
 (with-open-file (f2 p2 ...))
 (let ((x2 (read f2))))
 (when x2)
 (bar x1 x2 ...))

Notice how all these binding forms are now neatly indented at the same level! Remark that each of the forms is closed before its body ends (sometimes before its body begins), said body being present or completed in the next form. And see how it also works for forms that are not binding forms, but still provide context for the body inside, such as the when forms. And note that if you didn't want your when form to wrap around all the rest, but simply to contribute a side-effect before the rest is evaluated, then you might want to wrap your when form inside a progn form (in Common Lisp) or begin form (in Scheme), that would provide this sequential behavior. Thus the nest macro works not only with binding forms, but with all kinds of expressions. Better yet, the nest macro even works with forms that are not expressions stricto sensu (though the overall form has to be an expression for the macro to be expanded)! For instance, it will work with case or match clauses:

 (for-each! list1) (lambda (elem1))
 (match elem1) ((list x1 y1 z1 ...))
 (for-each! list2) (lambda (elem2))
 (match elem2) ((list x2 y2 z2 ...))
 (begin (assert (equal x1 x2)) ...)
 (case (foo y1 y2) ((A B C ...) easy-caseA ...) ((D E F ...) easy-caseD ...))
 ((G H I ...) (hard-caseG main-body ...)) ...)

Notice how the (list x1 y1 z1 ...) or the (G H I ...) are not expressions but one a matching pattern and the other a list of cases. Each of their syntactic roles is different from that of normal expressions, and each follows its own distinct grammar. That's some additional expressive power that the nest macro has in Lisp that other more sophisticated macros do not have in Lisp or in any other language. Notice also how this style allows to detach the lambda expressions and their bindings from the rest of their body (which covers the rest of the arguments to the nest macro), and to place the head of these lambda expressions right next to the calling expression that calls them and binds the lambda variables: thus it becomes very clear that elem1 will be bound to each of the elements in list1, or that x2 y2 z2 will be bound to elements of a list that matches the contents of elem2. Meanwhile, the body below each of these forms doesn't have to care what kind of form it is the body of. This is all a beautiful division of labor, with power, expressiveness, brevity, relevance, symmetry, etc. All that for the price of understanding one simple macro.

I like this macro so much that I made it available as UIOP:NEST, as part of UIOP, the Common Lisp "Utilities for Implementation- and OS- Portability", a library is transcluded in ASDF, the build system used by nearly all software written in Common Lisp today. Thus, every contemporary Common Lisp program can assume that this macro is readily available. And ASDF itself is making good use of the macro: the macro shines not just to keep indentation in check, but also in conjunction with Common Lisp reader conditionals, so that some wrapping forms are only used on relevant implementations and not other implementations.

Implementing the nest macro

But just how simple is the nest macro? So simple it's literally a one liner in Common Lisp:

(defmacro nest (&rest r) (reduce (lambda (o i) `(,@o ,i)) r :from-end t))

That is, it is just a right fold (hence the :from-end t) on the list of forms r, to nest each (rightmost) form i into the end of the previous one o.

Now, moving to Scheme, the benefits of nest are about the same, just better, since there are even more higher-order function that call functions; if only we reorganize the order of their arguments so the function comes last (as with the for-each! variant of the standard for-each function above), then we can easily chain together the bindings and bodies of all these forms with the nest macro. And so, soon enough, I wrote a Scheme version of the nest macro. Here was my first, naive, attempt; can you tell what was wrong with it without reading what follows?

(define-syntax nest
  (syntax-rules ()
    ((nest (x ...) y z ...) (x ... (nest y z ...)))
    ((nest x) x)))

This macro uses the simple but limited pattern-matching macro-defining macro syntax-rules. It recurses into each of the forms to nest by inserting itself as the head of successive shorter sub-lists of forms, that will recursively expand it, until a single form is left that is returned as the innermost form with nothing to nest in it. And the problem with this simple macro is... that the recursive inner form (nest y z ...) will only expand this inner nest if it is an expression, i.e. the kind of form that gets evaluated into a value by the evaluator (whether based on an interpreter, compiler, JIT, or whatever else), and corresponds to a single non-terminal of the language grammar. Therefore, the macro won't work when (y z ...) is a case or match clause, a type level expression, or anything but a normal expression (I was tempted to say regular expression or normal form, but these are terms of art with their own entrenched meaning). And so began my quest for a correct implementation of nest.

The difficulty here is that you really want to fold-right, starting with the inner form and bubbling up inside each consecutive outer form; but what was trivial to express recursively, yet not quite correct, was that fold-left above, assuming each nested form was an expression. My first correct solution used two macros as follows:

(define-syntax Rnest
  (syntax-rules ()
    ;;((_ () ()) ()) ;; This case is an error, actually
    ((_ (rev ...) (one more ...)) (Rnest (one rev ...) (more ...))) ;; reverse the outer form list
    ((_ (x (y ...) z ...) ()) (Rnest ((y ... x) z ...) ())) ;; recursively nest
    ((_ (x) ()) x))) ;; return
(define-syntax nest
  (syntax-rules ()
    ((_ x ...) (Rnest () (x ...)))))

The Rnest macro that does the job of nesting the forms, in two phases: first by reversing the list by accumulating its elements one by one into an accumulator list; and second by folding left on the accumulated list. Each step is tail recursive so the macro always remains in control of the expansion until the end, without having to rely on any sub-form to itself be an expression that partakes in the expansion protocol (syntax-rules, like maybe all but one macro systems, only allows macro-expansion for one kind of grammatical non-terminal, the expression; the only exception I know to this rule is Racket's syntax/parse, where multiple kind of grammatical non-terminals each can have their own macro extensions). The reverse step will be familiar to anyone who ever tried to prove correct an implementation of reverse in e.g. Coq; or to prove correct an implementation of append, which often involves the append-reverse function.

Now, exposing the binding to the Rnest macro above isn't very hygienic. Ideally, you'd like to have Rnest itself be lexically defined, such that it is seen by nest but not anything else. Here is how I eventually did it, after someone tipped me that (... ...) was the proper way of quoting the ellipsis ... so it could be present in nested macros:

(define-syntax nest
  (syntax-rules ()
    ((_ x ...)
           (syntax-rules ()
             ((_ (xx (... ...)) (y z (... ...))) (r (y xx (... ...)) (z (... ...)))) ;; reverse the list
             ((_ (xx (y (... ...)) z (... ...)) ()) (r ((y (... ...) xx) z (... ...)) ())) ;; nest
             ((_ (xx) ()) xx)))) ;; bottom case
       (r () (x ...))))))

Note how it's essentially the same macro as previously, except for some ugly renamings: the internal version of Rnest is just called r, but its ellipsis has to be quoted as (... ...), and its variable x has to be renamed xx not to clash with the outer x that further demands to be used with an ellipsis (unless I suppose you quote it everywhere as (... x)). So, this hygienic version of nest using syntax-rules works but is particularly ugly. That said, it is not quite as ugly as what I had to go through before I was told how to quote the ellipsis...

Without quoting the ellipsis, you can still use syntax-rules to define the nest macro, but now you have to get creative, and use tail-recursion only with continuation-passing style so that all your transformations are done without requiring expansion from any subform, none of which might be an expression. That's where we actually use this tail-recursive append-reverse macro rev-app (in the body of the macro, at the bottom of the definition); its continuation will be calling the left-folding macro nest-rev that nests the reversed list of forms. It's all straightforward if you know continuation-passing style applied to macros (or even just to functions in general; macros being just source-transforming functions):

(define-syntax nest
  (syntax-rules ()
    ((_ . forms)
           (syntax-rules ()
             ((_ . args) (error "nest error" 'args))))
          (rev-app ;; k ctx lst acc ==> k ctx ,(append (reverse lst) acc))
           (syntax-rules ()
             ((_ k ctx (hd . tl) rev) (rev-app k ctx tl (hd . rev)))
             ((_ k ctx () rev) (k ctx rev))
             ((_ k ctx x ()) (k ctx x))))
          (app ;; k ctx l1 l2 ==> (k ctx ,(append l1 l2))
           (syntax-rules ()
             ((_ k ctx l1 l2) (rev-app app-ret (k ctx l2) l1 ()))))
          (app-ret ;; (k ctx l2) rev-l1 ==> (k ctx ,(append (reverse rev-l1) l2))
           (syntax-rules ()
             ((_ (k ctx l2) revl1) (rev-app k ctx revl1 l2))))
          (nest-rev ;; given the reverse list of forms, setup the recursion
           (syntax-rules ()
             ((_ () ()) (nest-error))
             ((_ () (final . more)) (nest-rev2 more final))))
          (nest-rev2 ;; recurse
           (syntax-rules ()
             ((_ () done) done)
             ((_ (form . more) done) (app nest-rev2 more form (done))))))
       (rev-app nest-rev () forms ())))))

However, straightforward or not, this macro CPS work is extremely tedious; and if you want to do non-trivial processing in this style, you'll have to develop a library of macros in continuation-passing style to mirror each of the list-transforming functions you might have wanted to use if only you had the full power of the language while meta-programming. And then debugging meta-programs written this way will be atrocious, lacking adequate debugging support from your regular tools (the only exception here being Racket, that sports dedicated support for debugging macros). This horrible situation of having a braindamaged meta-programming language completely disconnected from your base language in which you must reinvent all data structures and libraries from scratch, without proper tooling, of course is remindful of template metaprogramming in C++, which, dreadful as it is, is still one of the more powerful of blub languages with respect to metaprogramming (then there is compile-time reflection in Java, but by the time you've handled all the boilerplate to do the simplest of things, you'll either have forgotten why you were doing it in the first place, or will have committed suicide in disgust — unless you embrace the Greenspunning and re-create Clojure or such).

Of course, if you're willing to assume that your nesting level will still remain small, and that macro-expansion of nested forms won't be a significant drag on your compilation time, then you could use this much simpler version that uses an O(n²) algorithm instead of an O(n) algorithm, by expressing your fold-right in a more direct though less efficient way (implementation courtesy of gwatt on IRC #Scheme):

(define-syntax nest
  (syntax-rules ()
    ((nest x) x)
    ((nest x ... (y ...) z) (nest x ... (y ... z)))))

Now, all this may have finished convincing you that while syntax-rules makes it simple to write simple macros, it might not be the best tool to write more elaborate macros that do not fit its simplistic assumptions. It is then time to unleash a more powerful macro-defining macro, syntax-case. Syntax-case, like syntax-rules, is hygienic, in that it tracks source location and naming contexts, so that you don't have to do it manually and carefully insert gensym everywhere; but unlike syntax-rules, it is not limited to a simple pattern language, it allows for metaprogramming using the very same language. Here is a straightforward version of the nest macro using syntax-case:

(define-syntax (nest stx)
  (syntax-case stx ()
    ((nest . xs)
     (let loop ((forms (syntax->list #'xs)))
        ((null? forms) #'xs)
        ((null? (cdr forms)) (car forms))
        (else #`(#,@(syntax->list (car forms)) #,(loop (cdr forms)))))))))

The above loop follows the same naive approach as we were initially trying to use with syntax-rules: it manually does a left fold on the list of forms; unlike the syntax-rules version, though, it works even if the forms are not expressions, because we recurse directly inside the macro-expanding function, rather than by hoping that the next form will be an expression that recursively macro-expands. Recursion is much easier and nicer to use with syntax-case, because you have the full power of your language as a meta-language, instead of an ad hoc term-rewrite engine.

Now, if the #`(#,@ characters looked like line noise to you, they were quasisyntax and unsyntax-splicing, the syntax-case analogue to quasiquote and unquote-splicing that you use with Common Lisp style macros. But if quasisyntax is alien to you or unimplemented in your Scheme, you can also manipulate the syntax directly, using the datum->syntax and syntax->list primitives:

(define-syntax (nest stx)
  (syntax-case stx ()
    ((nest . xs)
     (let loop ((forms (syntax->list #'xs)))
        ((null? forms) #'xs)
        ((null? (cdr forms)) (car forms))
        (else (datum->syntax #'nest
                (append (syntax->list (car forms)) (list (loop (cdr forms)))))))))))

Of course, instead of doing the recursion manually, you could explicitly use a left fold, just like the reduce in Common Lisp (there again thanks to gwatt for his help):

(define-syntax (nest stx)
  (syntax-case stx ()
    ((nest outer ... inner)
     (foldl (lambda (o i) #`(#,@o #,i)) #'inner (reverse (syntax->list #'(outer ...)))))))

And there again, we can write the same thing without quasisyntax:

(define-syntax (nest stx)
  (syntax-case stx ()
    ((nest outer ... inner)
     (foldl (lambda (o i) (datum->syntax #'nest `(,@(syntax->list o) ,i)))
            #'inner (reverse (syntax->list #'(outer ...)))))))

And of course we can directly use a right fold, instead of a left fold on the reverse. That's very similar to the Common Lisp macro, just with some extra wrapping and unwrapping to maintain hygiene.

(define-syntax (nest stx)
  (syntax-case stx ()
    ((nest outer ... inner)
     (foldr (lambda (o i) #`(#,@o #,i)) #'inner (syntax->list #'(outer ...))))))

And as always, we can do it without quasisyntax, instead using quasiquote to implicitly express a call to the append function:

(define-syntax (nest stx)
  (syntax-case stx ()
    ((nest outer ... inner)
     (foldr (lambda (o i) (datum->syntax #'nest `(,@(syntax->list o) ,i)))
            #'inner (syntax->list #'(outer ...))))))

Now the macro is so simple, with a single trivial pattern to match, that you could even write the expander directly without syntax-case:

(define-syntax (nest stx)
  (let ((forms (syntax->list stx)))
    (let loop ((more (cdr forms)))
       ((null? more) #'stx)
       ((null? (cdr more)) (car more))
       (else (datum->syntax (car forms) ;; in Racket, stx would do
               (append (syntax->list (car more)) (list (loop (cdr more))))))))))

Also, you could use syntax->datum instead of syntax->list but that would needlessly lose syntax location data on the recursive objects:

(define-syntax (nest stx)
  (syntax-case stx ()
    ((nest . xs)
     (datum->syntax #'nest ;; Gerbil wants an identifier; Racket works well with stx.
       (let ((forms (reverse (syntax->datum #'xs))))
         (let loop ((acc (car forms)) (more (cdr forms)))
           (if (null? more)
             (loop `(,@(car more) ,acc) (cdr more)))))))))

Last but not least, here is the version I actually use in my code, as proposed by vyzo. It uses ellipses to do the appending directly on syntax; to achieve this, it first uses with-syntax to establish a binding between the macro's runtime variable o and i and the macro's compile-time syntax variables outer and inner; outer can then use the ellipsis to express the appending (note that Gerbil unlike Racket does not need the (syntax->list ...) wrapper):

(define-syntax (nest stx)
  (syntax-case stx ()
    ((nest outer ... inner)
     (foldr (lambda (o i)
              (with-syntax (((outer ...) o)
                            (inner i))
                #'(outer ... inner)))
            #'inner (syntax->list #'(outer ...))))))

So there, we've seen a simple macro, nest, how it interestingly trivializes a problem that others tried very hard to solve with extremely elaborate macros, how it can be implemented in three different widely used macro systems, and what are some issues with writing macros in Scheme rather than Lisp — the price you pay for hygiene (mind that just because I do not discuss the benefits does not mean there aren't such very valuable benefits). Note that this macro is pretty much the worst case scenario when translating a Common Lisp macro into a Scheme macro: it doesn't use any gensym so doesn't benefit from any of the hygiene machinery, and its pattern is just a bit off from what is easy with syntax-rules. Yet in the end, it isn't too hard to translate it. Translating it was a good exercise in learning Scheme macro systems.

Tags: #scheme, en, lisp, macros, modularity, programming languages, scheme, software

  • Comparative lessons of French vs US voting processes

    In France, there are always enough polling stations. Schools and town halls are polling stations. More people whose ballots to count? That's…

  • Slave Reparation Racism

    Regarding Slave reparations: About everyone in Europe was a slave (in Latin, "servus" — serf) since Diocletian's socialist "reforms". Emancipation…

  • Emotiology / Émotiologie

    People who call for "unity" as the only imaginable solution to violence thereby admit that they are incapable of imagining non-violence towards…

  • Post a new comment


    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded