François-René Rideau (fare) wrote,

Common Lisp as a Scripting Language, 2015 edition

The first computer I used had about 2KB of RAM. The other day, I compiled a 2KB Common Lisp script into a 16MB executable to get its startup (and total execution) time down from 2s to subjectively instantaneous, and that didn't bother me in the least, for my current computer has 8GB of working memory and over 100GB of persistent storage. But it did bother me that it didn't bother me, for 16MB was also the memory of the first computer on which I felt I wasn't RAM-starved: I could run an X server, an Emacs editor and a shell terminal simultaneously without swapping! Now an entire comfortable software development universe could be casually wasted over a stupid optimization, one that I have to care about because software systems still suck. And to think that before sentientkind reaches its Malthusian future, code bumming will have become a popular activity again...

Background (skip to the next paragraph if you don't care for hardware war stories): I just returned my many-years-old work laptop (a Lenovo ThinkPad X230) because of various hardware issues I was starting to experience. The main one was a bad connection with the batteries that at times caused the machine to shut down at the least auspicious moment. On top of that came the traditional overheating, and a wifi card that often failed to connect, requiring the wpa_supplicant daemon to be killed. I liked the ThinkPad form factor a lot, but my employer wasn't offering ThinkPads anymore, so I opted instead for a slim HP EliteBook Folio 1040. Its form factor is obviously inspired by the MacBook Air, except that it runs a Linux system of which I am master. Now, the EliteBook has a particularly bad touchpad, even worse than the ThinkPad's at being triggered all the time by my thumb as I type. I decided to disable it immediately, just as I eventually did with the ThinkPad; however, unlike the ThinkPad, the EliteBook doesn't have a "clit" interface to supplement the touchpad. Therefore I had to toggle the touchpad on and off instead of permanently disabling it. A Google search quickly found a shell script to toggle the touchpad, and instructions on how to map Penguin-Space to it (the Penguin is Super, much more so than the Windows it replaces). But the shell script frankly made me puke, and I decided to rewrite it in Lisp, which yielded a very nice program less than 2KB long...

Indeed, for several years now, I've been peddling the use of Common Lisp as a scripting language. It offers a combination of syntactic abstraction, higher-order functions and an advanced object system, and a relatively simple semantic model allowing for efficient compilation. It has robust compilers with decent performance and portability to all platforms that matter, and it supports interactive debugging and structured editing. All of these put it years ahead of the other dynamically typed scripting languages in common use (shell, perl, python, ruby, javascript), even though it was initially developed years or decades before them. However, until recently, it was missing a few bits to be usable as a scripting language, and I am proud of having hammered the last few nails into the coffin. Those bits are zero configuration when looking for source libraries, zero management when storing compiled outputs, portable invocation from other programs, and portable invocation of other programs. If you implement these, you too can make your favorite programming language suitable for "scripting".

Well, I recently added an extra nail to the coffin, one that addresses the remaining tradeoff between startup time and memory occupancy. It is now possible to easily share a dumped image between all the scripts you need, to achieve instant startup without massive bloat of either working memory or persistent storage. Admittedly, you could already do this semi-portably on SBCL and CCL using Xach's buildapp. But now you can do it fully portably on all implementations, using the same cl-launch utility that you would use to invoke the program as a script.

The portable way to write a Common Lisp script is to use cl-launch, typically via a #!/usr/bin/cl script specification line when you're using Unix. However, when launching a script this way, even a relatively simple program can take one to several seconds to start. The Lisp implementation locates all the object files, loads them into memory, and links the code into symbol, class and method tables. This takes a non-negligible amount of time even when the files were precompiled, because implementations were never optimized to make it fast. Indeed, the typical Lisp hacker only recompiles and reloads one file at a time at his interactive REPL, and rarely reloads all the files from scratch. I installed ASDF 3.1.4 over the one provided by SBCL using the provided install-asdf.lisp script. I also used the provided cl-source-registry-cache.lisp script to avoid searching through my quite large collection of CL source code. With both, I could get the startup time down to around 0.7 or 0.8 s, but that was still too much. Such a latency is fine for computation-intensive and/or long-running programs, for which startup time doesn't matter. But it makes this solution totally impractical for interactive scripts, where execution latency is all-important. Other scripting languages, while inferior as languages, at least start up instantaneously in subjective time.

/bin/sh and perl each execute an empty command in about 5 ms of wall-clock time, python in about 18 ms (all timings and sizes are rough averages and estimates on my current Linux x86-64 laptop). Without my portability infrastructure, you can also do the same with sbcl in 10 ms or clisp in 15 ms, but then you lose portability. You are then either restricted to not using any software library, or back in non-portable configuration and compilation hell, in addition to having the same slow loading issue. Even with such a startup pause, Common Lisp might remain somewhat suitable for scripting, unlike the vast majority of compiled programming languages, which require an explicit compilation step with non-trivial configuration of source and object files. Still, it is unsuitable for producing scripts destined for use as instantaneous interactive commands outside its own autistic interactive development environment.
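To reproduce rough startup figures like these on your own machine, a small harness can time a command from launch to exit. The following is a sketch in Python. It treats `sh -c true`, `perl -e 1`, `python -c pass`, and sbcl with its `--noinform --non-interactive --no-sysinit --no-userinit` options as stand-ins for "an empty command". It reports the median wall-clock time over several runs. Spawning processes from Python adds a little overhead of its own, so treat the numbers as rough.

```python
import shutil
import statistics
import subprocess
import sys
import time
from typing import Dict, List


def startup_ms(cmd: List[str], runs: int = 20) -> float:
    """Median wall-clock time, in milliseconds, to run `cmd` to completion."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output: we only care how long start-up-and-exit takes.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to an occasional cold-cache run.
    return statistics.median(samples)


# Rough equivalents of "an empty command" for a few languages.
EMPTY_COMMANDS: Dict[str, List[str]] = {
    "sh": ["sh", "-c", "true"],
    "perl": ["perl", "-e", "1"],
    "python": [sys.executable, "-c", "pass"],
    "sbcl": ["sbcl", "--noinform", "--non-interactive",
             "--no-sysinit", "--no-userinit"],
}

if __name__ == "__main__":
    for name, cmd in EMPTY_COMMANDS.items():
        if shutil.which(cmd[0]) is None:
            print(f"{name}: not installed, skipped")
            continue
        print(f"{name}: {startup_ms(cmd):.1f} ms")
```

To compare against a dumped image, add its path as another entry in the table.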

Now, all serious Common Lisp implementations also allow you to dump a memory image, with all the code already loaded and linked. Such images start quite fast: about 20 ms for a fully loaded image on sbcl, about 35 ms on clisp. And you can portably dump an image using my cl-launch utility by just adding --output /path/to/executable --dump ! to the very same command you'd use to start a script. Thus, at the expense of an extra but trivial build step that takes many seconds, but only once, you can portably transform your slow-starting scripts into a precompiled executable. That executable will have a startup time competitive with other scripting languages, and efficiency competitive with other compiled languages.

The problem is that such an image has a significant overhead in terms of space. An empty cl-launch program has an image of size 13MB on CLISP, 28MB on CCL, or 52MB on SBCL (which isn't that bad when you consider that this contains the entire compiler and basic libraries; GCC is bigger than that!). An image with all the code I want loaded takes 27MB on CLISP, 50MB on CCL, 82MB on SBCL. A poly-deca-megabyte image file is no big deal in itself: the biggest of these images is 1% of the memory of my laptop, so by today's standards it's a small additive overhead. But if you need one image per script, then 80MB of memory to execute a 2KB script is a multiplicative factor of 40,000 in memory waste. That is not acceptable if, like me, you want to replace lots of small shell scripts with Common Lisp code. Compare that to the incremental space expenditure for each additional 1KB of scripting code, which is typically between 1KB and 10KB of additional image size, a reasonable factor of 1 to 10. This suggests an obvious solution: share the image-dumping expenditure between all your CL scripts. The space overhead then goes back to a negligible additive overhead plus a reasonable multiplicative factor, instead of an outrageous multiplicative factor.
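Here is a back-of-the-envelope version of this argument, using the rough SBCL figures above and decimal units (1 MB = 1000 KB). Call B the size of a fully loaded image (about 80 MB), s the size of one script (about 2 KB), N the number of scripts, and k, between 1 and 10, the number of image bytes added per byte of script code:

```latex
\[
\frac{B}{s} = \frac{80\,000\ \text{KB}}{2\ \text{KB}} = 40\,000
\]
\[
S_{\text{separate}}(N) \approx N B, \qquad
S_{\text{shared}}(N) \approx B + k N s, \qquad
\frac{S_{\text{shared}}(N)}{N} \approx \frac{B}{N} + k s
\]
```

With s = 2 KB, the per-script term k s is only 2 to 20 KB, so the amortized cost is dominated by B/N and shrinks as scripts are added. For instance, the November 2015 figure of 44 scripts in a 95 MB image works out to about 95/44 ≈ 2.2 MB per script. Separate images would instead cost roughly 80 MB per script.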

BusyBox popularized the old concept of a multi-call binary: a single executable program that behaves differently depending on the name under which it was invoked. By using multiple symbolic links (or hard links) to the same program, you can replace multiple different binaries with a single one, benefitting from both the sharing effects of dynamic linking and the optimizations of static linking. The same can be done for Common Lisp code. Xach's buildapp already let you do that on SBCL, and later on CCL, using its --dispatched-entry option. I just enriched cl-launch 4.1.2 to support the very same interface on all its 12+ supported implementations (well, the same interface, modulo a different treatment of corner cases). I already share the same executable for 7 personal-use scripts, and will only use CL for new scripts while slowly migrating all my old ones. [November 2015 update: now 44 personal scripts in a 95MB SBCL image that starts in 16ms]

The feature took a hundred lines of code in total, including comments, documentation and a new cl-launch-dispatch.asd file. The Lisp support for this feature is only loaded on demand if you use --dispatch-entry, at which point loading a tiny additional ASDF system is essentially free. I love how Common Lisp lets me implement this feature in such a modular way. Here is the documentation:

  • If option -DE --dispatch-entry is used, then the next argument must follow the format NAME/ENTRY, where NAME is a name that the program may be invoked as (the basename of the uiop:argv0 argument), and ENTRY is a function to be invoked as if by --entry when that is the case. Support for option -DE --dispatch-entry is delegated to a dispatch library, distributed with cl-launch but not part of cl-launch itself, by
    1. registering a dependency on the dispatch library as if --system cl-launch-dispatch had been specified (unless it already was).
    2. if neither --restart nor --entry was specified yet, registering a default entry function as if by --entry cl-launch-dispatch:dispatch-entry.
    3. registering an init-form that registers the dispatch entry as if (cl-launch-dispatch:register-name/entry "NAME/ENTRY" :PACKAGE) had been specified where PACKAGE is the current package. See the documentation of said library for further details.
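The dispatch behavior documented above can be modeled in a few lines. The following is a hypothetical Python sketch, not cl-launch-dispatch's actual Lisp code. Its names only echo the documented register-name/entry and dispatch-entry. It parses "NAME/ENTRY" specs into a table keyed by invocation name, and picks the entry from the basename of argv[0], as a BusyBox-style multi-call binary does. The `resolve` callback stands in for looking ENTRY up in a package. The example entries `hello` and `count_args` are made up for illustration.

```python
import os
import sys
from typing import Callable, Dict, List

# An entry takes the command-line arguments (without argv[0]) and returns
# an exit code, much like a function passed to --entry.
Entry = Callable[[List[str]], int]

# Invocation name (basename of argv[0]) -> entry function.
_entries: Dict[str, Entry] = {}


def register_name_entry(spec: str, resolve: Callable[[str], Entry]) -> None:
    """Parse a "NAME/ENTRY" spec and register NAME -> resolve(ENTRY).

    NAME is a basename, so it cannot contain "/": splitting on the first
    "/" is enough. `resolve` stands in for looking ENTRY up in a package.
    """
    name, sep, entry = spec.partition("/")
    if not sep or not name or not entry:
        raise ValueError(f"expected NAME/ENTRY, got {spec!r}")
    _entries[name] = resolve(entry)


def dispatch_entry(argv: List[str]) -> int:
    """Run the entry registered for the name this program was invoked as."""
    name = os.path.basename(argv[0])
    entry = _entries.get(name)
    if entry is None:
        known = ", ".join(sorted(_entries)) or "none"
        print(f"{name}: unknown command name (known: {known})", file=sys.stderr)
        return 2
    return entry(argv[1:])


# Hypothetical example entries, registered under two invocation names.
def hello(args: List[str]) -> int:
    print("hello", *args)
    return 0


def count_args(args: List[str]) -> int:
    print(len(args))
    return 0


FUNCTIONS: Dict[str, Entry] = {"hello": hello, "count-args": count_args}
register_name_entry("hi/hello", FUNCTIONS.__getitem__)
register_name_entry("nargs/count-args", FUNCTIONS.__getitem__)
```

Install this as one executable ending in `sys.exit(dispatch_entry(sys.argv))`, and add symlinks named `hi` and `nargs` pointing to it. It then behaves as two commands. For example, `dispatch_entry(["/usr/local/bin/hi", "world"])` prints `hello world` and returns 0. An unknown name is reported as an error here; cl-launch's own treatment of such corner cases may differ.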

Now, this is a great workaround, but it doesn't fully solve the original issue. To completely solve it, an obvious strategy would be for some implementation to radically optimize the loading of compiled objects so that it actually becomes fast. These are the so-called FASL files, for FASt Loading, which some joke should be renamed SLOw Loading. For instance, the compiler could produce a prelinked object that makes optimistic assumptions: that it knows the load address, and that there will be no conflict in symbol tables, class and method definitions, etc. At runtime, it would then patch only a minimal set of pointers in the usual case. Doing this for all 12+ implementations is not feasible, but one suffices, say SBCL or CCL. Alternatively, an "incremental image" feature might do, whereby one could dump all the symbols in some set of packages and not others, with their associated functions, classes, etc. That would require a minor change in programmers' habits, though, so it is less likely to happen. But any such complete solution will require hacking into the guts of a CL implementation, and that's no small undertaking.
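To make the prelinking idea concrete, here is a toy Python model, not a description of how any actual FASL loader works. All names are hypothetical. An object records the load address the compiler assumed, which of its words hold absolute addresses, and which symbols it defines. When the guess about the address is right, nothing is patched; when it is wrong, only the recorded pointers are adjusted. When a symbol conflict breaks the optimistic assumptions, the model gives up, so that the caller can fall back to the ordinary, slow loading path.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PrelinkedObject:
    assumed_base: int               # load address the compiler bet on
    words: List[int]                # code/data, as laid out at assumed_base
    relocations: List[int]          # indices of words holding absolute addresses
    defined_symbols: Set[str] = field(default_factory=set)


def fast_load(obj: PrelinkedObject, actual_base: int,
              image_symbols: Set[str]) -> Optional[List[int]]:
    """Load `obj` at `actual_base`, or return None if the optimistic
    assumptions fail and the regular loader must be used instead.

    On success, the object's symbols are added to `image_symbols`."""
    # Assumption 1: nothing already defined in the running image is redefined.
    if obj.defined_symbols & image_symbols:
        return None
    loaded = list(obj.words)
    # Assumption 2: the object lands where the compiler expected.
    # If not, shift only the recorded pointers by the same delta.
    if actual_base != obj.assumed_base:
        delta = actual_base - obj.assumed_base
        for i in obj.relocations:
            loaded[i] += delta
    image_symbols |= obj.defined_symbols
    return loaded
```

In the usual case, the cost of loading is a copy plus a set intersection, with no per-pointer work at all.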

Assuming we are not going to improve the underlying implementations, a more long-winded "solution" might be to extend the workaround until it becomes a solution: enabling the automatic sharing of executables between all the programs that matter. The old Common-Lisp-Controller from Debian could be resurrected to create shared images and/or shared executables for software installed by the system's package manager. A similar mechanism could declaratively manage all the programs of a given user, possibly layered on top of the former when available. This might require some tweaks to ASDF so that it doesn't try to rebuild pre-built software from system-managed directories when using system-managed implementations. It would still compile the usual way when there is a user-specified upgrade, when the software wasn't built, or when the implementation isn't system-managed. Importantly, there must not be an insecure writable system-wide FASL cache. ASDF must instead revert to a per-user cache whenever write access is required, or somehow talk to a trusted daemon that compiles trusted sources with trusted compilers. This workaround through system management is somewhat ugly, though.
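The cache policy described above can be sketched as a small decision function. This Python model is hypothetical: the directory layout, the `.fasl` suffix and the parameter names are assumptions, not ASDF's actual output-translation API. It reuses a system-built object only when four conditions hold: the software lives under the system source tree, the implementation is system-managed, the user hasn't requested an upgrade, and the prebuilt file is up to date. Every other case would require writing, so it points at a per-user cache instead.

```python
from pathlib import Path
from typing import Tuple


def fasl_location(source: Path, system_src: Path, system_fasl: Path,
                  user_cache: Path, impl_system_managed: bool,
                  user_upgrade: bool) -> Tuple[Path, bool]:
    """Return (fasl_path, must_compile) for `source`.

    A path under `system_fasl` is only ever returned for reading; whenever
    compilation (hence writing) is needed, the path is in `user_cache`."""
    try:
        rel = source.relative_to(system_src)
    except ValueError:
        rel = None  # not software installed by the system package manager
    if rel is not None and impl_system_managed and not user_upgrade:
        prebuilt = system_fasl / rel.with_suffix(".fasl")
        if prebuilt.exists() and prebuilt.stat().st_mtime >= source.stat().st_mtime:
            return prebuilt, False  # trusted, read-only, up to date
    # Everything else is compiled into the per-user cache, mirroring the
    # source path (relative to the system tree, or to the filesystem root).
    tail = rel if rel is not None else source.relative_to(source.anchor)
    return user_cache / tail.with_suffix(".fasl"), True
```

Keeping the decision in a single function makes the security property easy to audit: no branch can return a system-wide path together with `must_compile=True`.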

Note that these issues do not affect Common Lisp developers who run the functionality provided by these scripts from the Common Lisp REPL; they can already do that. They only affect users who run this functionality from the shell command line or from some other external non-Lisp program. For a Common Lisp developer with such a use case, the solution is now trivial thanks to this new cl-launch feature. But these issues do make it hard for people to publish scripts that will "just work" for end-users, an end-user being someone who shan't be required to manage an installation or configuration step. These end-users will have to either suffer a multi-second pause at startup, or be burdened with a poly-deca-megabyte executable for every script or set of related scripts they use. And so, the temporary conclusion is this: Common Lisp is in many ways far ahead of the competition as a low-overhead "scripting language". However, it currently has one issue that puts it at a crucial disadvantage against that competition: deployment to end-users.

PS: Examples are available in my GitHub repo, with some common functionality in another of my repos.

    Tags: en, lisp, scripting
