September 15th, 2008

eyes black and white

Metaprogramming from the ground up: avoid C

Long ago, assembly languages were endowed with expressive macroprocessing facilities. But it still sucked to write in non-portable languages, each with its own incompatible proprietary metalanguage. And thus, I wanted to metaprogram in something reasonably portable, which at the time pretty much meant C. The first obvious choice was to look at what the standard C pre-processor, CPP, offered.

So as to prevent people from shooting themselves in the foot, the language designers made sure the macro expansion algorithm would terminate, by disabling recursion: a macro name that shows up inside its own expansion is not expanded again. Some people, notably hbaker, thought of using #include as a recursion mechanism. Unfortunately, this isn't enough, because you cannot store infinite state in a CPP program: there is a finite number of variable-setting clauses in a program, each to a fixed variable known at compile-time, which leads to a finite number of variables usable in tests. There is a finite number of test statements, each combining into a boolean a finite number of variables, in a computation restricted to operators of modular arithmetic; variables being expanded as lexical text can actually expand to something that has more combinations than a fixed-precision integer, but up to reduction by some arithmetic operations, there is still but a finite number of observable grammatical states that a variable can take.
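To make the no-recursion rule concrete, here is a minimal sketch in C. The names COUNT, STR and count_expansion are mine, made up for illustration. The macro COUNT mentions itself in its own body, and the preprocessor expands it exactly once, leaving the inner COUNT alone.

```c
#include <string.h>

/* Two-step stringification: STR expands its argument first,
   then STR_ turns the resulting tokens into a string literal. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* A macro that names itself in its own body (hypothetical example). */
#define COUNT (1 + COUNT)

/* Returns the text that COUNT expands to. The inner COUNT is left
   unexpanded, so the expansion stops after a single step. */
const char *count_expansion(void) {
    return STR(COUNT);
}
```

Calling count_expansion() from a main of your own should give you back the string "(1 + COUNT)": one step of expansion, and then the preprocessor refuses to go any further.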

All in all, it is impossible to write a useful non-trivial metaprogram in CPP. But that doesn't mean it is impossible to write trivial harmful metaprograms in CPP, as is easily demonstrated in my counter-example die_die_stupid_c_compiler.c. So CPP is but one more example of a fascist bondage and discipline meta-language. At the same time, the C++ language was slowly extending itself with a meta-programming system, its template language, that soon enough became weakly "Turing equivalent", and allowed wizards to write metaprograms to do all kinds of wonderful things. Except that this metalanguage was a pure functional language completely disconnected from C++ itself, extremely hard to debug -- you pretty much have to re-develop all meta-level libraries from scratch in a completely new language to do non-trivial metaprograms, and cannot reuse libraries across language levels to bootstrap new functionality. Yet another misguided design.
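On the "trivial harmful" point about CPP made earlier in this paragraph, here is one way such a thing can look. This is my own small sketch, not the contents of die_die_stupid_c_compiler.c, and the names D0 to D10 and blowup_length are made up. Each macro doubles the previous one, so the size of the expansion is exponential in the number of definitions.

```c
#include <stddef.h>

/* Each level expands to two copies of the level below it.
   No recursion is involved, so termination is guaranteed,
   but the output doubles with every extra line. */
#define D0 "x"
#define D1 D0 D0
#define D2 D1 D1
#define D3 D2 D2
#define D4 D3 D3
#define D5 D4 D4
#define D6 D5 D5
#define D7 D6 D6
#define D8 D7 D7
#define D9 D8 D8
#define D10 D9 D9

/* Adjacent string literals are concatenated by the compiler, so D10
   becomes one literal; sizeof counts its characters plus the final NUL. */
size_t blowup_length(void) {
    return sizeof(D10) - 1;
}
```

Eleven short lines already produce 2^10 = 1024 characters of string; a couple dozen more definitions of the same shape would ask the compiler for gigabytes. I kept the depth low here on purpose so that the sketch stays harmless.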

The correct approach was OpenC++, which provides metaprogramming in the same recursively-bootstrappable meta-language as C++. But by the time you get there, you understand that C++ is not a language you want to use for metaprogramming anyway. Like Perl, C++ is a Swiss Army chainsaw of a programming language. Unlike Perl, it's got all the blades simultaneously and permanently cast in a fixed half-open position. Don't turn it on. If you want to metaprogram a language in itself, you'll do yourself a favor by instead choosing Lisp, OCaml, Oz, Haskell, Erlang, or any other HOT language.

As for C, it's rather bad as a portable assembly language, as it doesn't handle continuations or multiple-value returns, doesn't give you the precise access to the memory model and temporary variables that precise garbage collection requires, etc. I will spare you my pitiful attempts at metaprogramming it with m4 (don't try it -- m4 really sucks, and being better than CPP is a very low bar; however you may look at ThisForth for a relative success at using it). Tom Lord did interesting things with the hard part of metaprogramming: not just generative but analytic, too. He achieved the automatic verification of some GC invariants in the C layer of his Scheme implementation -- and that convinced me that even when done the best possible way, metaprogramming C still sucks. The CPP layer makes it hard to reason about actual source, and the language has lots of arbitrary semantics that make it hard to reason about where side effects happen unless you have intimate knowledge of the compiler. But there is no way to access this knowledge unless you re-write your own compiler, at which point, why choose C?

These days, LLVM seems to be the main thing as far as a portable low-level language target for metaprogramming goes (unless you join the dark side and drink the .NET kool-aid). And if you don't care as much for mainstream and portability, you could try to go the way of Factor or your own COLA and build a system from the ground up around sound metaprogramming principles.