eyes black and white

EVAL-WHEN considered harmful to your mental health

In my adventures building Common Lisp software, I have had to deal, more than I wish I had, with something that is largely misunderstood because it is completely crazy: EVAL-WHEN. In the hope that the loss of my sanity might be redeemed, however fractionally, by the slightest enlightenment of my betters, I thought I would write up my conclusions. Dumping the topic on paper will also hopefully allow me to empty my mind of the sheer horror.

Common Lisp features the notion that processing the code from source text to effective meaning happens in many stages that may or may not be interleaved or distinguished: read-time, macro-expansion-time, compile-time, load-time, execution-time. EVAL-WHEN allows you to control up to a point what gets evaluated in the latter three "times".

The interleaving between the evaluation stages comes from the fact that code processing happens one form at a time, with the result of each stage being passed on to the next stage before the next form is read. This allows for any side-effect by the processor to be effective by the time the next form is processed. However, which of compile-time, load-time and execution-time happens, which is deferred, which doesn't happen, and how they interleave with each other and with macro-expansion-time may vary wildly depending on whether you're typing code at the REPL, LOADing a Lisp source file, COMPILE-FILEing it then LOADing the compiled file, or EVALing or COMPILEing a form.

Read-time always happens first. The Lisp processor reads a stream of text (an unfortunate choice of specification, INTERLISP and Smalltalk fans may tell you) and dumps CONS cells to represent Lisp code in S-Expression forms (an unfortunate choice of representation, PLT Scheme fans will tell you). The *READTABLE*, through its macro-characters and its dispatch-macro-characters such as #., allows you to specify code that is evaluated at read-time, which actually enables you (though in a painful way) to locally or globally compile code that uses any syntax of your choice (see how kpreid does it for E).
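As a minimal sketch of read-time evaluation, here is the #. dispatch macro character at work. The function name READ-TIME-DEMO is made up for illustration, and the sketch assumes the standard readtable is current.

```lisp
;; #. evaluates the following form while reading (only when *READ-EVAL* is
;; true); what lands in the resulting S-expression is the value, not the form.
(defun read-time-demo ()
  "Read a form in which one sum is computed at read-time and one is not."
  (let ((*read-eval* t))
    (values (read-from-string "(list #.(+ 1 2) (+ 1 2))"))))

;; (read-time-demo) returns the list (LIST 3 (+ 1 2)):
;; the first sum is already gone by the time macro-expansion or evaluation starts.
```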

Macro-expansion time happens second, separately on each toplevel form read by the reader at read-time. Macro-expansion is specified thanks to DEFMACRO, DEFINE-SYMBOL-MACRO and their lexical variants MACROLET and SYMBOL-MACROLET, as well as the awesome DEFINE-SETF-EXPANDER (and friend DEFINE-MODIFY-MACRO), but also by the tricky DEFINE-COMPILER-MACRO and the not-so-great *MACROEXPAND-HOOK*. Common Lisp macros are at the same time all-powerful (being able to transform any source code into any other source code) and all-ignorant (being unable to track the surrounding lexical context or original source location, and limited in how they may analyze the enclosed lexical content short of reimplementing a full code walker for CL and any implementation-defined extension that you or the libraries you use might depend on). This causes minor but nagging issues with lack of hygiene and difficulty tracing errors, but more importantly, it makes it impossible to write non-local transformations in a modular way, short of reinventing a whole new system to replace the current one or somehow sit on top of it (but then one might find that because the *MACROEXPAND-HOOK* isn't called on special forms and normal syntax, one needs to shadow a lot of standard forms and/or hack the reader altogether if one wants to intercept everything without requiring the user to wrap each form in a magic wrapper).
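To make the hygiene complaint concrete, here is a small sketch. All names (SWAP-UNHYGIENIC, SWAP-SAFE and the two demo functions) are hypothetical; the standard ROTATEF already does this job properly.

```lisp
;; The expansion introduces a variable named TMP, which captures any TMP
;; the caller happens to pass in.
(defmacro swap-unhygienic (a b)
  `(let ((tmp ,a))
     (setf ,a ,b
           ,b tmp)))

;; The usual workaround: generate a fresh uninterned symbol at expansion time.
(defmacro swap-safe (a b)
  (let ((tmp (gensym "TMP")))
    `(let ((,tmp ,a))
       (setf ,a ,b
             ,b ,tmp))))

(defun unhygienic-demo ()
  (let ((tmp 1) (y 2))
    (swap-unhygienic tmp y)   ; the macro's TMP shadows ours: nothing is swapped
    (list tmp y)))

(defun hygienic-demo ()
  (let ((tmp 1) (y 2))
    (swap-safe tmp y)
    (list tmp y)))

;; (unhygienic-demo) returns (1 2); (hygienic-demo) returns (2 1).
```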

Then the processor goes through some stages of evaluation: either compilation that may or may not be followed by loading, or direct execution that may or may not involve some kind of compilation. Whichever stages happen then may or may not be themselves interleaved with the macro-expansion of the given form itself: it is allowed for the implementation to start compiling or executing part of a form while the rest of it isn't macroexpanded, or to fully macroexpand before it compiles (though full macroexpansion does require at least some level of code-walking and lexical scope analysis to distinguish actual macros from other forms that happen to use the same symbols as macros).

If you are typing code at the REPL, LOADing a Lisp source file, or EVALuating some form, then the execution-time happens and only it happens, with all its side-effects. Any EVAL-WHEN turns into a PROGN if it contains the :EXECUTE clause, otherwise into NIL. Note that this may or may not be interleaved with macro-expansion, so that for instance, SBCL may start evaluating (when nil (foo)) and reduce it to NIL without ever expanding the macro (foo), so that if you were expecting this expansion to happen and cause side-effects, you'll be surprised to find it won't (as we were when testing ASDF-DEPENDENCY-GROVEL).
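The PROGN-or-NIL rule can be checked with a short sketch. *EVAL-WHEN-LOG*, NOTE and EVAL-DEMO are hypothetical names used only for this illustration.

```lisp
(defvar *eval-when-log* '()
  "Hypothetical log of the EVAL-WHEN bodies that actually ran.")

(defun note (tag)
  "Record TAG in the log and return it."
  (push tag *eval-when-log*)
  tag)

(defun eval-demo ()
  "EVAL one EVAL-WHEN with :EXECUTE and one without; return both values and the log."
  (setf *eval-when-log* '())
  (list (eval '(eval-when (:execute) (note :execute-clause)))
        (eval '(eval-when (:compile-toplevel :load-toplevel) (note :no-execute)))
        (reverse *eval-when-log*)))

;; (eval-demo) returns (:EXECUTE-CLAUSE NIL (:EXECUTE-CLAUSE)):
;; the first form behaved like PROGN, the second like NIL and never ran its body.
```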

If you are calling COMPILE on some code, said code is also to be run in a future :EXECUTE-time and EVAL-WHEN behaves as above. Because the code to be compiled has to be a function (named or anonymous), there is no toplevel, and :COMPILE-TOPLEVEL or :LOAD-TOPLEVEL clauses are irrelevant and ignored. If I understand correctly, the compiler is also allowed to not expand macros when it can statically prove that they are in an unreachable piece of code; however, in practice, the way compilers are typically written in passes, macros are often fully expanded by the time any advanced analysis happens based on which such dead code could be eliminated.
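A similar sketch for COMPILE, where every EVAL-WHEN is necessarily non-toplevel (COMPILE-DEMO is a made-up name):

```lisp
(defun compile-demo ()
  "Compile an anonymous function containing two EVAL-WHEN forms and call it."
  (let ((f (compile nil
                    '(lambda ()
                       (list (eval-when (:execute) :ran)
                             (eval-when (:compile-toplevel :load-toplevel) :never))))))
    (funcall f)))

;; (compile-demo) returns (:RAN NIL): only the :EXECUTE clause matters here.
```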

Where EVAL-WHEN behaves differently is when you are using COMPILE-FILE and when you LOAD the FASL resulting from such a COMPILE-FILE. In these cases, each toplevel form after expansion (or interleaved with it) is processed in a way that separates effects that happen at compile-time from effects that are to happen at load-time. The :COMPILE-TOPLEVEL clause of an EVAL-WHEN indicates that all enclosed effects are to happen at compile-time (i.e. they happen in the current image, and they are also dumped in the CFASL if you use them in SBCL or later, so they may happen each time said CFASL is loaded, in this image or a future one). The :LOAD-TOPLEVEL clause of an EVAL-WHEN indicates that all enclosed effects are to happen at load-time (i.e. they are dumped in the FASL and will happen each time said FASL is LOADed, in this image or a future one, but they do not happen in the current image unless the :COMPILE-TOPLEVEL clause was also specified). Some special forms have effects at both compile-time and load-time, such as IN-PACKAGE that changes the current package at both times, or DEFVAR that will declare the variable special at compile-time in addition to declaring it and optionally setting it at load-time, etc. The :EXECUTE clause is ignored at the toplevel (but it and only it remains meaningful in non-toplevel subforms).
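The following sketch makes the three situations observable. It writes a small source file, LOADs it as source, COMPILE-FILEs it, then LOADs the resulting FASL, recording which EVAL-WHEN bodies ran each time. The names *SITUATIONS*, *DEMO-FORMS*, WRITE-DEMO-SOURCE and RUN-DEMO are hypothetical, as is the /tmp path in the usage note, and the sketch assumes the standard readtable is current.

```lisp
(defvar *situations* '()
  "Hypothetical log: which EVAL-WHEN bodies have run in this image.")

(defparameter *demo-forms*
  '((eval-when (:compile-toplevel) (push :compile-toplevel *situations*))
    (eval-when (:load-toplevel)    (push :load-toplevel *situations*))
    (eval-when (:execute)          (push :execute *situations*))))

(defun write-demo-source (path)
  "Write *DEMO-FORMS* to PATH with fully qualified symbols, so that the file
reads back identically whatever *PACKAGE* is current."
  (with-open-file (out path :direction :output :if-exists :supersede)
    (with-standard-io-syntax
      (let ((*package* (find-package :keyword)))
        (dolist (form *demo-forms*)
          (prin1 form out)
          (terpri out))))))

(defun run-demo (path)
  "Return what ran during LOAD of the source, COMPILE-FILE, and LOAD of the FASL."
  (write-demo-source path)
  (flet ((log-of (thunk)
           (setf *situations* '())
           (funcall thunk)
           (reverse *situations*)))
    (let* ((from-source (log-of (lambda () (load path))))
           (fasl nil)
           (at-compile (log-of (lambda () (setf fasl (compile-file path)))))
           (at-load (log-of (lambda () (load fasl)))))
      (list :load-source from-source
            :compile-file at-compile
            :load-fasl at-load))))
```

Calling (run-demo "/tmp/eval-when-demo.lisp") should return (:LOAD-SOURCE (:EXECUTE) :COMPILE-FILE (:COMPILE-TOPLEVEL) :LOAD-FASL (:LOAD-TOPLEVEL)): each clause fires in exactly one of the three ways of processing the file, which is why single-clause EVAL-WHENs mean different things depending on how the user builds your code.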

In practice (and if you only need to remember one thing about EVAL-WHEN, this is it), only three combinations are safe, and of those, only one is useful: (EVAL-WHEN (:COMPILE-TOPLEVEL :LOAD-TOPLEVEL :EXECUTE) ...) should wrap things that need to be available in the compilation environment as well as the target environment, such as the functions, variables and side-effects used by macros themselves; it is the only combination that is both useful and safe.
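A minimal sketch of that one useful combination: a helper function called by a macro at expansion time, in the same file as the macro's uses. MAKE-ACCESSOR-NAME, DEFINE-GETTER and POINT-X are hypothetical names.

```lisp
;; Without the EVAL-WHEN, COMPILE-FILE would only compile this DEFUN for
;; load-time, and expanding DEFINE-GETTER further down the same file would
;; fail because the helper is not yet defined in the compilation environment.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun make-accessor-name (prefix slot)
    "Build the symbol PREFIX-SLOT in the current package."
    (intern (concatenate 'string (symbol-name prefix) "-" (symbol-name slot)))))

(defmacro define-getter (prefix slot)
  "Define PREFIX-SLOT as a function reading the :SLOT entry of a plist."
  `(defun ,(make-accessor-name prefix slot) (plist)
     (getf plist ,(intern (symbol-name slot) :keyword))))

(define-getter point x)

;; (point-x '(:x 1 :y 2)) returns 1
```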

(:LOAD-TOPLEVEL :EXECUTE) is safe, but it is the implicit default around any toplevel form and thus you never need to specify it explicitly, except possibly to restrict the effects of a surrounding EVAL-WHEN when used inside a macro that may expand to such thing (which would be poor style).

(:COMPILE-TOPLEVEL :EXECUTE) is the last safe combination, but its utility is very restricted. It is to be used for side-effects you only want in the compilation environment, most notably local modification of the readtable. Now, such a side effect will happen when you originally compile the file, and unless it applies to some locally bound variable like *READTABLE*, it will persist in the session where you compile but not in further sessions that are restarted from already compiled FASLs, causing non-deterministic havoc in incremental ASDF builds. Note that modifications to the readtable object itself (as opposed to a copy of it) will indeed be such "maybe escaping, maybe not" side effects; if you're going to customize the readtable, I recommend you spare yourself some aggravation and use named-readtables. File-system access to compute anything non-deterministic is similarly in very bad taste (if deterministic, you may want to do it at read-time, and if not, you may want to defer it until image-dump-time). One redeeming context for the use of this combination for lasting side-effects is when you use XCVB while enabling SBCL's CFASL mechanism (present since SBCL), which isn't portable to other implementations (yet), but will guarantee that the enclosed side-effects will be replayed before compiling each file that depends on yours, and none other. To summarize, this combination, while safe, is for very special restricted usage patterns. Unless you're an expert, don't try this combination; and don't even think of using other combinations.
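For illustration, here is a sketch of the file-local readtable pattern, written as the contents of a source file to be LOADed or COMPILE-FILEd (pasting it at the REPL would instead change the REPL's readtable). The bracket syntax and the function THREE-SQUARES are made up for the example; named-readtables remains the less aggravating route.

```lisp
(eval-when (:compile-toplevel :execute)
  ;; Install a fresh copy first: LOAD and COMPILE-FILE rebind *READTABLE*,
  ;; so this assignment is undone at the end of the file, and the readtable
  ;; object shared with the rest of the image is never mutated.
  (setf *readtable* (copy-readtable))
  ;; [a b c] reads as (LIST a b c).
  (set-macro-character #\[
                       (lambda (stream char)
                         (declare (ignore char))
                         (cons 'list (read-delimited-list #\] stream t))))
  ;; Make ] a terminating character, like the closing parenthesis.
  (set-macro-character #\] (get-macro-character #\))))

(defun three-squares ()
  [(* 1 1) (* 2 2) (* 3 3)])

;; (three-squares) returns (1 4 9)
```

Nothing from the EVAL-WHEN body ends up in the FASL: the compiled THREE-SQUARES contains an ordinary call to LIST, so loading the FASL in a fresh image needs no bracket syntax at all.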

Indeed all other combinations are bogus, except imaginably in the guts of some fairly low-level optimization macro, because there will always be cases when they won't do what you think, depending on how the source file is being processed. That is, depending on his needs, the user may either LOAD the source or COMPILE-FILE it then LOAD the FASL, or he may only LOAD the FASL in a new image to incrementally re-compile a previously compiled system with ASDF, and your code should behave reasonably well and mean the same thing in all these cases.

Finally, when a FASL or CFASL file is loaded, the dumped side-effects happen: symbols are interned, LOAD-TIME-VALUEs are computed, functions, variables and macros are defined, any toplevel side-effect happens, etc. Note that read-time and macro-expansion-time side-effects are considered neither compile-time nor load-time side-effects and thus should not be present in either CFASL or FASL. This is actually a feature, as it allows one to use side effects while reading or macro-expanding, without the same side-effects having to be dumped then later replayed at load-time. Indeed they won't be dumped with SBCL and other sane implementations, though there might or might not conceivably exist crazier implementations. However, if there are side-effects you do at macro-expansion-time that you want to persist and be replayed at compile-time (when using CFASLs) or load-time (when using FASLs), then your macros should typically expand to code that includes the same side-effects at compile-time and/or load-time through the proper use of EVAL-WHEN, in addition to (or in replacement of) said code being evaluated at macro-expansion-time (if needed). While migrating a big project from ASDF to XCVB, I have notably had to debug a macro that was using (EVAL (DEFCLASS ...)) and a FINALIZE-INHERITANCE at macroexpansion time so as to be able to invoke the MOP and query the parsed class, but was failing to include the DEFCLASS in the expansion, thus working in a build from clean, but causing failures when building incrementally from ASDF, or when building deterministically from XCVB, as some macro-expansions in further files were expecting the class to have been defined and its inheritance finalized, which they would not be when loading from FASLs.
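A sketch of the recommended shape, with hypothetical names (*REGISTERED-SHAPES*, DEFINE-SHAPE): the side-effect lives in the expansion, wrapped in EVAL-WHEN, rather than in the macro function itself.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defvar *registered-shapes* '()
    "Hypothetical registry consulted by later macros and by runtime code."))

;; A macro that did (pushnew name *registered-shapes*) in its own body would
;; only affect the image doing the expansion; an image that merely loads the
;; FASL would see an empty registry. Putting the side-effect in the expansion
;; makes it happen at compile-time and again whenever the FASL is loaded.
(defmacro define-shape (name)
  `(progn
     (eval-when (:compile-toplevel :load-toplevel :execute)
       (pushnew ',name *registered-shapes*))
     ',name))

(define-shape circle)
(define-shape square)

;; *registered-shapes* now contains both CIRCLE and SQUARE, whether this file
;; was LOADed as source or compiled and loaded as a FASL in a fresh image.
```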

In conclusion, EVAL-WHEN is a tool that makes it easy to shoot yourself in the foot, but only has one legitimate usage (two if using XCVB). You should understand the few but important cases when you need it, namely for functions and variables that will be used at macroexpansion-time. My recommendation, though, if you like to do non-trivial metaprogramming, would be to avoid the primitive madness of Common Lisp, and use a modern language where staged compilation has some well-defined semantics, such as the PLT Scheme module language, OCaml's camlp4, and other systems that preprocess files deterministically. PLT macros have the additional advantages of being hygienic, and playing well with source locations for the debugger and other tools, etc.



One wish

Hey Faré,
Long time no see.
While I really enjoy your CL rants, I always wonder why you're still using it then ;-)
I think I could really enjoy a post about you explaining this little inconsistency.


Re: One wish

I'm being paid to use it!

But yes, I'm starting to get a bit tired of it.


Re: One wish

Nothing special - me too.

So no more "ITA Software, a fine employer of Lisp hackers (disclaimer: I work there)" from you? ;-)

I sympathize with your wish for a more sane IT world - but don't really feel much of this when working with CL. EVAL-WHEN never occurred to me as a fundamental problem and I somewhat fail to follow what you are actually arguing for (or against). My guess is (please correct me if I'm wrong) that you have some strict ideas about how XCVB should work, which collide with how CL works. It seemed that you found it to be bad that CL defines protocols to access the different stages of compilation and execution (or that these stages are visible). A special kind of purist lispers would now object to the notion "protocols" and call them "just some primitive hooks". CL is not a good language for a purist's mind; I think growing into CL will naturally lead to a point where the question arises, why this thing is done in that and not in another way. Sometimes the answer is "because it's better", often it is "because it doesn't matter", but there are even cases where one can argue "because they didn't know better". Faré - you're a bright guy and even if *I* sometimes get the feeling you overcomplexify things - I still think your input would be very welcome in a process leading to solid progress toward a CLTL3.



(No, nothing else needs to be said.)
I've always found EVAL-WHEN to be some mystical element of CL. Lots of parts of CL feel hackish, but I think EVAL-WHEN is the worst. The complication of your explanation is a testament to the confusion it causes.


only :COMPILE-TOPLEVEL also useful for declarations too

Like in (eval-when (:compile-toplevel) (declaim (optimize speed)))

Re: only :COMPILE-TOPLEVEL also useful for declarations too

You probably really mean (:COMPILE-TOPLEVEL :EXECUTE) here. Otherwise, LOADing the source won't take the declaration into account.

Oh yes, no clause at all (EVAL-WHEN () ...) is also a safe but wholly useless combination.


Re: only :COMPILE-TOPLEVEL also useful for declarations too

According to the HyperSpec, :execute is not relevant for a top-level eval-when where :compile-toplevel is specified.

See the table at point 5 of http://www.franz.com/support/documentation/8.1/ansicl/subsubse/processi.htm


Re: only :COMPILE-TOPLEVEL also useful for declarations too

I never LOAD sources.

"I never LOAD sources"

Maybe YOU don't, but someone else will. That is, whoever will try to convert your code from ASDF to XCVB using ASDF-DEPENDENCY-GROVEL.

An Emulator Design Pattern

User redline6561 referenced to your post from An Emulator Design Pattern saying: [...] is a bit hairy, particularly the EVAL-WHEN [...]

Making eval-when redundant

Just some aside note: sometimes splitting the file in two or more pieces makes eval-when redundant. E.g. if function f1 is used by macro m2 which is in turn used in function f3, I split my asdf system into f1.lisp, m2.lisp and f3.lisp and use :serial t (in fact I always use :serial t with asdf). Two files can be sufficient in this situation, too.

Re: Making eval-when redundant

Indeed. I was going to retort that this trick doesn't work if you only load cfasl's before you compile something. But it's not like anyone has used XCVB for years, and these days Bazel uses loading from source with a specially tuned interpreter to be even faster than using cfasls would be.

Re: Making eval-when redundant

I never used XCVB, but at a glance, splitting the system and using :build-depends-on should work there. However, it can impose an even heavier burden than eval-when :)

My take on the right direction to parallelising ASDF

I have always thought we should not rely on hand-coded dependency management for compiling/loading files. The dependencies should be discovered automatically. (I don't discuss system-level dependencies at this point, only file-level, but the discussion below might extend to them.)

As you mentioned, CL has various processing phases and allows implicit dependencies caused by the side effects. This means that the static-analysis based approaches are doomed.

I would rather use a systematic, trial-and-error approach to search for a (sequential or parallel) plan to process files. It should save the successful plan to a cache inside the directory (or within the asdf definition by overwriting the file). The result is easy to distribute via Quicklisp.

The search should start from the most common `:serial t` mode and refine the partial order plan by merging some operations. Each trial should be run in a separate process since compiling and loading files alters the lisp image. The search should be anytime i.e. you can terminate the search at any time you feel satisfied and obtain the current best plan.

The files need no declaration nor additional manual tagging that specifies the dependency. Given the implicit side effects, I believe this is the only feasible approach to achieving a parallel build system in CL. In fact, parallelism always implies non-determinism; we should accept it. There are also various methods for reducing the failure rate. You can also reduce the parallelism to achieve more robustness.

October 2017


