Who's responsible for that moving part?

Common Lisp pathnames have long been a source of frustration for users and implementers alike. I'll argue that the deep reason for this frustration is that the Common Lisp standardization process declared a fixed interface to a moving piece of functionality, leaving users' needs unmet with no one in charge of addressing the discrepancy.

The Common Lisp standard was built in the 1980s by gathering language experts, most of them representing implementation vendors, describing what was common practice, bringing some uniformity where there were arbitrary differences, and cleaning up the easy or obvious problems. Any irreconcilable differences were simply left as unspecified implementation dependencies.

What that means for programmers is that if they restrict themselves to the behaviors specified in the standard or in a widely accepted extension of it, they can write programs that are portable from one implementation to other existing and future implementations. If they don't care much about portability, then the standard is a wholly irrelevant document to them; they are better off reading the manual of the particular implementation they are using directly. But if they do care, it is important that they avoid unspecified, non-portable situations.

Most of what the language standard describes is basic tools to build algorithms; this is a somewhat self-contained field, and it allows for arbitrary decisions that become correct by fiat. Should (CAR NIL) return NIL, signal an error, or remain unspecified? Common Lisp says "return NIL", and tries to make other choices consistent with that decision; it could have gone another way (and does, in other languages), but there's not much point arguing about it. On the one hand, it allows for a lot of programming puns; and Common Lispers enjoy punning. As Henry Baker puts it, "[A] Computer [programming] language is inherently a pun -- [it] needs to be interpreted by both men & machines." On the other hand, it makes it harder to distinguish meaningful programs from nonsense; but Common Lispers don't care to make that easy, which isn't today's issue. One could also reproach Common Lisp for not being declarative enough, but most programming languages are based on operational semantics anyway, so this is also not to be debated today.
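
For instance, the standard-mandated behavior of the empty list under CAR and CDR is exactly such an arbitrary-but-consistent decision, and it underlies many of those puns:

    ;; NIL, the empty list, is a valid argument to CAR and CDR:
    (car nil)   ; => NIL
    (cdr nil)   ; => NIL
    ;; so list-walking code needs no special case for running off the end:
    (defun second-or-nil (list)
      (car (cdr list)))    ; (second-or-nil '(1)) => NIL, not an error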

What matters today is that some parts of a programming language are not an arbitrary choice left to the language designer. They answer the need that programmers have to designate concepts that describe things outside the program being written: interfaces to the rest of the world, I/O devices and the routines that drive them, programming conventions and the libraries that implement them. That such interfaces are needed is an inescapable fact of reality; yet that fact is often blanked out in discussions about programming language design: interfaces to legacy systems, foreign functions, and existing libraries are often thrown in after the fact without much thought, careful design, or understanding of their impact on the formal properties otherwise claimed to be achieved by using the designed programming language.

If all the software in the world were to be written in said language, on top of the functionality provided by the standard, then indeed it wouldn't be necessary to interface with other software, for there would be no such other software (but it would be all the more necessary to have a good way to program modularly in said language). People who try to program everything in an autarkic world may ignore the notion of interface, at the cost of having to mind every possible issue ever minded by anyone else whose code they could have used but must do without, and of doing away with features they don't have time to code themselves. But for most programmers, and every day more so as code gets written to handle an exponentially increasing number of concerns, ignoring other software is not an option. And so it is important that their programming language handle interfaces to foreign (and native) functionality.

Now, file system access is precisely an example of such foreign functionality, as the Common Lisp standard never claimed to encompass operating systems, to specify the precise semantics of how data persists, or any such thing. Quite the contrary: the standard tried to accommodate the fact that Lisp programs would run under such vastly different environments as Unix, Genera, VMS, MSDOS, Windows, Multics, embedded systems, and any future operating system. The goal is all good and fine; but the bone of contention lies in how to properly implement it.

The Common Lisp standard contains three chapters specifying in great detail the structure of pathnames used to designate files, the semantics of opening a file, the operations permitted, etc., yet at the same time leaving plenty of leeway for implementations to match or not match the respective underlying operating systems.
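
Concretely, the standardized structure boils down to six components, readable with the standard accessors shown below; the values in the comments are what a typical Unix implementation returns, but the exact parse is itself implementation-defined:

    (let ((p (pathname "/home/user/notes.txt")))
      (list (pathname-host p)        ; implementation-dependent
            (pathname-device p)      ; often NIL or :UNSPECIFIC on Unix
            (pathname-directory p)   ; typically (:ABSOLUTE "home" "user")
            (pathname-name p)        ; typically "notes"
            (pathname-type p)        ; typically "txt"
            (pathname-version p)))   ; typically NIL or :NEWEST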

Because the standard only specifies a "lowest common denominator", programmers who strictly adhere to the discipline of only using portable behavior find themselves unable to use any but the most basic filesystem functionality available to users of other programming languages. Because the specified functionality is relatively abstract, there is a gap between what the underlying operating system provides and what the language implementation exposes; language implementers have to spend a lot of resources bridging that gap one way, and language users who accept implementation-specific extensions to go beyond what the standard offers have to spend just as many resources bridging that gap the other way around when they want to interact with both the underlying system and the standardized services. Language implementers and users are thus pitted one against the other in a huge waste of resources. Because there is a degree of arbitrariness in the way the gap is bridged, multiple language implementations on top of the same operating system are not interoperable, and the work of abstraction and reversion has to be done twice for each pair of language implementation and operating system.

The Common Lisp standard includes a lot of things that don't mean much in modern operating systems: file versioning (people nowadays choose from a wide range of versioning tools available in user space, with much more functionality delivered in a much more flexible way than what filesystems used to provide in a hardcoded way); a now murky division of pathnames into host, device, directory, name and type that non-portably gets in the way of handling actual pathnames (each component may or may not make sense depending on the system); a mostly useless, crippled "logical pathname" facility (the only portable use of which lies in somewhat abstracting the location of source code in Lisp-only projects); and a semi-useful "wildcard" mechanism. All of it is both over-specified in the concrete representation of pathnames and under-specified as far as syntax, semantics and extensibility mechanisms are concerned (which for the portable user equates to "specified to be unusable").

The same standard crucially lacks support for binary filenames that don't obviously map to or from characters in the age of Unicode, for pathnames that are not yet known to point to files or directories, for symlinks especially, for multiplexing between many I/O channels, for advanced globbing and pathname selection techniques, for all kinds of security permissions and additional attributes that files have in modern operating systems and that one has to care for, etc. Many of these evolutions the standardizers indeed couldn't possibly have anticipated, though many were already well underway when the standard was finally stamped. What the standardizers could and probably did anticipate was that evolutions would happen that would make their document obsolete.
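
To make the under-specification concrete, here are two innocuous-looking expressions whose results legitimately differ between conforming implementations, since the standard leaves namestring parsing to each of them; the behaviors noted in the comments are illustrations, not an exhaustive survey:

    (pathname-name (pathname ".bashrc"))
    ;; Depending on the implementation, the leading dot may be part of the
    ;; name (name ".bashrc", type NIL) or start the type (name NIL, type
    ;; "bashrc") -- both parses are conforming.

    (wild-pathname-p (pathname "foo*.txt"))
    ;; True where #\* in a namestring is parsed as a wildcard, false where
    ;; it is treated as an ordinary character -- again, both are conforming.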

It is a fine and honest thing to humbly admit incompetence. But from this statement of incompetence, the conclusion should not have been to specify a half-baked version of a file system interface in the standard. The conclusion should have been to omit file system access from the standard, and to either explicitly delegate it to an existing competent authority, or call for other such future authorities to complete the standardized work. Maybe the standards committee could have acquired competence by spawning a permanent subcommittee to keep that aspect of the language constantly up to date with technological progress in the field and ensure that users' needs are always covered. But admitting incompetence and then claiming authority regardless was a great disservice done to the whole community.

The correct way to not standardize would have been to acknowledge that each operating system offers its own interface, and to recommend that each Lisp system provide direct access to that interface. Inasmuch as such interfaces may themselves be expressed in other programming languages, the correct way to standardize would have been to standardize on foreign function interfaces that allow calling any such functionality from Lisp, instead of requiring a new extension to the standard with every new library written in a foreign language.

Happily, where the language standards committee failed, free software projects have recently (decades later) picked up the baton and done the Right Thing(tm). CFFI standardizes interfacing to foreign (C) language functions across all implementations and platforms. IOLIB gives you direct access to operating system functions on many platforms, with the intent of supporting all meaningful platforms and building meaningful portable interfaces where they actually make sense. Hopefully, IOLIB will standardize, for each platform, on the operating system's native representation of pathnames as the one concrete representation that makes sense (i.e. under Unix, wrapped foreign C strings), and provide accessors that abstract away the concrete representation and allow for implementation-independent (but somewhat OS-dependent) syntax and semantics for dealing with pathnames.
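
As a taste of what that looks like in practice, here is a minimal sketch of reaching a piece of OS functionality the standard ignores (symlinks) through CFFI; the Lisp-side names, the fixed buffer size, and the mapping of size_t and ssize_t onto :unsigned-long and :long are assumptions of this sketch, not anything prescribed by CFFI or POSIX:

    ;; Hypothetical wrapper around the POSIX readlink(2) call.
    (cffi:defcfun ("readlink" %readlink) :long   ; ssize_t assumed to fit in a :long
      (path :string)
      (buf :pointer)
      (bufsiz :unsigned-long))                   ; size_t assumed to be :unsigned-long

    (defun read-symlink (path &optional (bufsize 4096))
      ;; Return the target of symbolic link PATH as a string, or NIL on error.
      ;; A sketch only: real code would retry on truncation and mind filename encodings.
      (cffi:with-foreign-object (buf :char bufsize)
        (let ((n (%readlink path buf bufsize)))
          (when (plusp n)
            (cffi:foreign-string-to-lisp buf :count n)))))

The point is not this particular wrapper, but that nothing beyond a working FFI is needed to reach whatever the operating system actually offers.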

Using such tools, instead of having to implement your desired functionality once per implementation per OS, you only have to implement it once per OS. Instead of having to use detached abstractions that correspond to nothing and are unable to express either what you want or what the system wants, you can build the simplest path from what you actually want to what the various systems you need to use actually want. Along the way, you will most certainly build simplifying constructions that better factor the system given its actual constraints; but these will be abstractions anchored in reality, not detached ones.

Providing the building blocks on top of which users can themselves implement the missing bits that necessarily exist in any language: that would be following the principle of user-driven language extensibility. Unhappily, whereas the Common Lisp standardizers got it mostly right for syntax, they got it completely wrong for system access.

Standards are tools in a wide-ranging social phenomenon of coordination between many programmers. They are contracts separating the partaking programmers into two sets with well-defined responsibilities: implementers and users of a programming language. They are contracts you can opt into if you think they will bring you value, and opt out of without penalty if you think they won't. Because not everyone has the same expectations about value, people will prefer some contracts to others, resulting in a wide variety of standards. (Note how much better this variety of contracts is than the forced uniformity of statute, whereby the same binding terms are imposed on everyone, whatever their static and dynamic expectations. Uniformity isn't good in itself, but Statists don't quite understand that.)

Because some people understand that expectations change, fast, as situations evolve, they constantly adapt the contracts they use. Java Specification Requests, Python Enhancement Proposals, Scheme Requests For Implementation, etc., are ways in which some programming language communities try to keep their language relevant with respect to the wide-ranging and moving challenges of programming. (That's why you shouldn't be afraid of service companies changing their terms of service as such; but you should be afraid of legislators and judges widening the meaning of implicit consent or enforcing privileges of intellectual "property" upon you.) That's how some programming languages may be extremely rigid when you consider static snapshots of them, yet remain relevant because their communities are somewhat flexible in their dynamic adaptation. This, combined with open-source implementations that allow one to experiment with what one cares for until the community follows (or doesn't), can bring enough adaptability to inextensible languages to make them more bearable than easily extensible languages that lack actual libraries and extensions to deal with one's real-world concerns.

Once you understand programming language documents not as mere technical documents, but as social tools to coordinate people, you can critique them from a new perspective. Who promises to whom to take responsibility for what, in exchange for what? What about the contract brings value to both parties, and what doesn't? These are questions you should be asking when working on programming language design. Contracts are useful when they promote an efficient division of labor, whereby those more competent in a specialized field easily take responsibility for work in said field to the benefit of others, at low cost. Contracts are harmful when they create unnecessary work, when they assign responsibilities to the wrong people, when they impede future improvement in the division of labor. Design your contracts carefully; include what works, exclude what doesn't.

Comments

The stuff about statists and judges seems minimally relevant to this screed.

An offer you can't refuse

It's about the fact that there's worse than a bad contract you can refuse to opt into: a bad statute that you can't opt out of. "I'm going to make you an offer you can't refuse..."

I could possibly have articulated the idea better.

(Anonymous)

CL-FAD?

There is also http://weitz.de/cl-fad/ - yeah, not on the same level as IOLIB+CFFI - but better than *defaults*... that reminds me... *default-pathname-defaults* - gah!

Re: CL-FAD?

I looked at it. From not handling byte-oriented filenames or symlinks to confusingly calling "breadth-first" what is actually "inspect *before* you recurse", it is broken in more ways than I care to describe. It is not something you want to rely upon, much less standardize.

(Anonymous)

So true

You got it right. Common Lisp has stagnated.

(Anonymous)

As one who has never quite understood the incessant whining about CL's pathnames, I was glad to see examples of problems mentioned in your post, but I must admit I do not quite agree.

While lack of I/O channel multiplexing is a significant problem, I cannot see that it has anything to do with pathnames.

That pathnames contain attributes such as host and device that make little sense in some prevalent OSes may be slightly odd, but it is hardly what this is all about. The type attribute, on the other hand, makes perfect sense and is used by most desktop environments that I know of; from KDE and GNOME on Linux through OS X to Windows, the type of a file is a rather important attribute.

That the wildcard concept only goes part of the way to the full globbing power of (say) Bash is true, but it is one of the things that can relatively easily be solved through portable libraries, and one might also find that there could be multiple good ways of selecting sets of files.

The symbolic link reference also had me puzzled. Why should the pathname concept be concerned with such OS-level attributes of a name? At least in UNIX, a symbolic link is a name like any other, it just has some special attributes. A hard link is even harder to tell from a "normal" name.

It is unclear to me whether you want portability or not. If we give up on portability, the standard is fine as it is, just ignore the pathnames part. If you really want portability, you will also have to acknowledge that there will be some lowest common denominator flavour to it; there are, after all, real differences between the underlying systems.

-- Christian Lynbech

(Anonymous)

CL pathnames are seriously underspecified, which means that there are totally legitimate corner cases where they behave differently across existing CL implementations. Also, they unnecessarily conflate two loosely related concepts, file names and pattern matching, creating exactly such corner cases.
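
A quick sketch of that conflation, using only standard operators (how much of it triggers depends on whether the implementation parses #\* in namestrings as a wildcard):

    (wild-pathname-p #p"*.lisp")                ; a "name" that is really a pattern
    (pathname-match-p #p"foo.lisp" #p"*.lisp")  ; one pathname matched against another
    (directory #p"*.lisp")                      ; the same object then drives globbing
    ;; A file whose actual name contains #\* becomes awkward to designate portably.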

You're right that the content type is an important attribute of a file, but it has nothing to do with the file's name: GNOME and KDE both scan files' headers to detect the content type, meaning that you can easily have an MP3 file named README.txt, for instance. As usual, Window$ sucks in this particular respect. If you're looking for a good example, that would be BeOS, which stored the file's MIME type as an extended attribute in the filesystem.