
eyes black and white

Databases vs Programming Languages

Rahul recently pointed me to a nice 2005 review of datamodels in the database world by Stonebraker and Hellerstein: What Goes Around Comes Around. This paper is quite insightful, and I believe a good overview of the field, but it falls into the usual traps shared by most database practitioners. This got me thinking about what programming languages and databases have to learn from each other.

Unlike most people (and not just in the field of databases), Stonebraker and Hellerstein also have an uncommonly good understanding of human factors in the communication of meaning and of economic factors in the adoption of software, and of how these often override technical concerns. They show particular insight in their discussion of Schemas, XML, and the inadequacies of "semantic web" concepts when automating business transactions.

Like most database people, these authors understand the importance of Data; they understand that Data is both precious and fragile against loss, corruption and bit rot. Meanwhile, most computer scientists are still struggling with the concepts of Persistence, Robustness and Evolution: schools teach how to program with self-contained one-off toys that have short-lived data and that no one depends on; all the focus is on the (important) immediate algorithmic aspect, but little is said about the (equally important) long-range aspects of software engineering. Precious Data needs to persist across runs of individual programs and across power cycles of the machine; it needs to survive crashes of individual programs and crashes of the machine; it needs to resist corruption by buggy programs or by race conditions amongst programs accessing it; its shape itself will evolve to meet new requirements, and although this phenomenon is much slower than the evolution of programs modifying data, the old data needs to be preserved through these changes, and somehow the system must continue to run all along. Yet by and large, programming languages and operating systems, even when they offer concurrent programming capabilities, do not offer proper support for transactions on persistent data; whatever support exists for transactions often comes in clumsy and brittle libraries, and evolution is almost never supported at all (even less so with statically typed languages).
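To make these requirements concrete, here is a minimal sketch using Python's standard sqlite3 module. The account table, the transfer function and its crash flag are all hypothetical names invented for illustration; none of this comes from the paper. It shows a simulated crash in the middle of an update leaving the data untouched, and then a change in the shape of the data that preserves the old records.

```python
import sqlite3

def transfer(conn, src, dst, amount, crash=False):
    """Move `amount` between two rows atomically.

    `crash` is a hypothetical switch that simulates a bug between the two updates.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if crash:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
    except RuntimeError:
        pass  # the half-done update has already been rolled back

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))

# An in-memory database keeps the sketch self-contained;
# a file path here would make the data persist across runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30, crash=True)
after_crash = balances(conn)      # unchanged: the debit was rolled back

transfer(conn, "alice", "bob", 30)
after_success = balances(conn)    # both updates applied together

# Evolution: the shape of the data changes, the old rows are preserved.
conn.execute(
    "ALTER TABLE account ADD COLUMN currency TEXT NOT NULL DEFAULT 'USD'")
conn.commit()
evolved = conn.execute(
    "SELECT name, balance, currency FROM account ORDER BY name").fetchall()
```

Note that the transactional guarantee here comes from a library bolted onto the language rather than from the language itself, which is rather the point of the paragraph above.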

However, also like most database people, these authors are not trained in semantics, and they can't seem to fathom the notion and the possibility of abstraction as a general concept. Instead, they speak here of queries to operate on sets of records, there of logical vs physical data independence, there again of keeping things simple, of user-defined types and functions, of standards, etc., and they seem to think of language expressiveness as a cute feature but not all that important (or at least they are content registering that the market values it little). More generally, they do not understand the notion of a programming language, and think they can get away with throwing together features for their database interface and achieve a satisfactory design that will be used by application writers in a language-independent way. Yet the whole notion of language independence boasted by database designers (as well as designers of operating systems and other infrastructure) is but the pride these self-ignorant mono-linguists take in not calling their own barking a language. To them, language is a slur for what application programmers use. Little do they realize that the database interfaces they are offering are programming languages indeed, and that their ignoring the hard-earned lessons of programming language design imposes a high cost upon themselves and their users.

Most importantly, most database people deliberately try to ignore the dynamics of the algorithms that manipulate data, but instead have mostly read-only views of data for which they design fancy query sub-languages; they fail to recognize the importance of concepts of ownership, of intensionality vs extensionality, etc. Because of the limitations they impose, application writers have to retrofit these concepts in ways the consistency of which is not taken into account by the otherwise sacrosanct integrity management of the database. The whole discussion about datamodels is thus poisoned by an attitude based on the wholly absurd premise that a datamodel is a modular aspect of an application that can be factored away from the rest of the system.

This attitude is not so problematic in the case of database gurus like the authors of said article, who are able to adapt the internals of the databases they develop to extend their datamodels to fit the needs of application writers. But lesser database specialists, who do not develop and extend databases, reduce all data to some poor datamodels, where data relationships are something static and cast in stone; they consider computations that will happen on the data as something unfathomable, irrelevant, unworthy of interest, modularized away. As a result, they insist on alleged simplifications, normalizations and representations that simplify, normalize or represent only the small part of the system that they oversee, at the expense of hugely increased complexity in the rest of the system and of communication problems between developers.

The worst kind of datamodellers are those data bureaucrats who code neither application nor database infrastructure, but who imagine themselves the masters and keepers of some datamodel that has a value independent of the rest of the system. They spend their time slowing down development with bureaucratic processes and time wasted using their pitiful tools and pseudo-languages, contributing nothing but complications and gratuitous dependencies for those who manipulate the data, have the domain expertise, and actually understand what the data is all about.

To be fair, blindness and bureaucracy are not the exclusive attribute of Data guys. There are plenty of such horrors amongst Code guys. Blind Coders will lightly consider data persistence as well as all I/O to be architecturally unimportant ancillary tasks that can be factored away from code. Code Bureaucrats will insist that everyone should use their one blessed Language and Implementation, strictly respect their Object Model and Programming Methodology, and follow canned templates or graphical tools to export and document interfaces or models that they have to bless. The worst amongst them will declare that they own the API and add hurdles and gratuitous backward-compatibility constraints to the already difficult task of developing software, without contributing anything to the bottom line of building a working system. None of these people will understand the big picture, the social issues of development, the burden their decisions impose upon others, the cost of their folly to the group, and least of all the possibility of automating away all the rigid and stupid rules that constitute the cherished meat of their own petty bureaucratic job.

In the end, the fields of programming languages and databases contain complementary lessons. Software engineers should learn from both. And more importantly, they should expand their views to the dynamics of the whole system rather than a small static aspect of it. That, and avoid bureaucrats.


Yup. :)
The database people are forced to carve the database in stone. As soon as the application grows big enough and there is enough data to be useful, hierarchy takes over and developers are shunned away. Suddenly the database is not just a means to support the application but an entity in itself, which will be used for safety, mining and, most important of all, REPORTS. Then coders must have all their changes go through multilevel councils for approval of even the simplest changes.

BTW, the link for What Goes Around Comes Around is outdated; the new one seems to be http://pages.cs.wisc.edu/~cs764-1/datamodel.pdf


August 2015


