Thursday, 22 October 2009

Context is sticky

Code reuse is one of the holy grails of the software engineering movement. Across the world, developers are frantically reinventing the wheel. The web groans under the weight of piles of functionally equivalent PHP applications for rendering the contents of database tables.

If a larger portion of this torrent of code could be reused then an enormous amount of effort could be saved. Perhaps this effort could be diverted into improving the software quality and we could finally make a dent in the software crisis.

But though everyone has been talking about code reuse for decades, there has been very little progress.

The code that has enjoyed a significant degree of reuse has been specifically designed for that purpose. Frameworks, libraries and plugin architectures are widespread. Even the mighty operating system exists to share functionality between applications. But serendipitous reuse of code that was originally designed to solve a singular problem is rare.

I think that the reason that code reuse is hard is the same reason that the semantic web has failed to materialise. This makes sense, because code is just a particular kind of semantic content.

As Clay Shirky has argued, the semantic web is a problematic ambition because it requires a universal worldview. The semantic web project envisages that information interoperability will be achieved by employing universal data formats. But data formats are contingent on worldview, which can never be universal. Shirky takes genetics as an example:
"It would be relatively easy, for example, to encode a description of genes in XML, but it would be impossible to get a universal standard for such a description, because biologists are still arguing about what a gene actually is. There are several competing standards for describing genetic information, and the semantic divergence is an artifact of a real conversation among biologists. You can't get a standard til you have an agreement, and you can't force an agreement to exist where none actually does."
Even something as apparently clear-cut as genetic science resists universal semantic presentation because the data is contaminated by its original context.

The opinions, prejudices, needs and worldview of a programmer are imprinted on their code to a far greater degree. That class you wrote the other day to process form values assumes that every field has exactly one value. The HTML the form was displayed in uses classes unique to your site's CSS. And the coding standards the class conforms to differ from standard PHP conventions because your organisation wants to achieve consistency with its .NET projects.
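To make the first of those assumptions concrete, here is a minimal sketch in Python rather than PHP. The function name `process_form` and the sample query strings are hypothetical, invented purely for illustration. The only library call is `parse_qs` from the standard library.

```python
from urllib.parse import parse_qs


def process_form(query_string):
    """Hypothetical form processor that assumes one value per field."""
    parsed = parse_qs(query_string)  # maps each field name to a list of values
    # The embedded assumption: only the first value of each field matters.
    return {name: values[0] for name, values in parsed.items()}


# Works as intended in its original context: one value per field.
single = process_form("name=Ada&colour=red")

# In someone else's context (a multi-select field) it silently loses data.
multi = process_form("name=Ada&colour=red&colour=blue")
```

Nothing in the function's signature warns a would-be reuser that the second colour is dropped. The assumption only becomes visible once the code meets a context it was not written for.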

You might be able to shoehorn this code into the next project you complete for the same organisation, but there is little chance of your form-processing class ever being used by someone else entirely. The cleaner and more decoupled your code is, the more use it might be to someone else, but you cannot entirely erase the imprint of its original context because context is what gives your code meaning.

The way you can best foster reuse is to engineer a situation where the worldview embedded in your code is adopted by the reuser. Take Firefox as an example. The core functionality of the browser is leveraged by thousands of plugin developers. But the API these extensions work with was laid down by the developers of Firefox and has meaning only in the context of the Firefox browser.

A cross-browser extension API would be very convenient, but the task of creating a plugin model that would apply as well to Chrome as to Firefox would be gargantuan. Witness how difficult it is to even get HTML and CSS to render the same in more than one browser. A cross-browser API would take the compatibility issues from the DOM and spread them to every aspect of the browsing experience.

Commonly-used frameworks also owe their success to prescribing a worldview. The only painless way to work with a framework is to follow the Rails way, the Django way or the Drupal way. To reuse someone else's code you must make concessions to their way of doing things.

There are a couple of current developments in software engineering that will help with the code reuse problem. Test-driven development helps to make the assumptions embedded in code explicit by describing them using unit tests. The referential transparency fostered by the functional programming paradigm controls context by quarantining side effects.
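Both ideas can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: the function `first_values` and its tests are hypothetical, and the tests are plain assertions rather than the output of any particular testing framework.

```python
def first_values(fields):
    """Keep the first value of each field.

    Referentially transparent: the result depends only on the argument,
    and nothing outside the function is read or modified.
    """
    return {name: values[0] for name, values in fields.items()}


def test_single_value_fields():
    assert first_values({"name": ["Ada"]}) == {"name": "Ada"}


def test_extra_values_are_dropped():
    # The one-value-per-field assumption, written down where a reuser can see it.
    assert first_values({"colour": ["red", "blue"]}) == {"colour": "red"}


def test_input_is_not_mutated():
    fields = {"colour": ["red", "blue"]}
    first_values(fields)
    assert fields == {"colour": ["red", "blue"]}


# Run the tests directly; a test runner would normally collect them instead.
test_single_value_fields()
test_extra_values_are_dropped()
test_input_is_not_mutated()
```

The tests do not remove the assumption. They only record it, which is about as much as a reuser from another context can hope for.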

But code reuse will always be intrinsically hard because context is sticky.

Sunday, 11 October 2009

Vision is a feature

A few weeks ago Mark Whiting and I had a brief Twitter conversation about his suggestion that as design quality increases the designer disappears. He went on to suggest that the formalism by which we recognise a designer's work is as much an imperfection of the design as a feature.

I was not so sure. There are definitely instances where the designer's mark seems to contribute to the design. Programming languages are a good example. Ruby would not be what it is without the strength of Matz's personal vision.

On the other hand, I do get annoyed when a designer's vanity tempts them to graffiti their signature onto a design that would have been better left alone. I'm thinking here of 'clever' designs like teapots with two spouts.

The difference between these two scenarios is, in my opinion, whether or not the design space is convergent. I mean the term in the same sense as convergent evolution. In a convergent design space, the differences between designs will gradually disappear over time as individual designers become more successful at approximating the best solution to the problem at hand.

In such a domain, it follows that any deviation from the one true design is noise. The designer's personal touch therefore detracts from their attempt to produce good design. A double-spouted teapot might help the designer express their individuality, but the result is just slightly less convenient tea.

However, it's rare to find a design space where a Platonic 'best' design exists. When have the various stakeholders in the construction of a new building ever agreed what is best? And to revisit my earlier example, which language is 'best' is one of the most common topics of programming flame wars.

Designers usually have to balance competing interests. How much should the finished product cost? What kind of user/customer should it be optimised for? What about older users/customers, or ones with disabilities? And not least, when is the deadline for the completed design? How designers balance these interests will inevitably affect the design. There is rarely any objective way to balance these subjective interests, so there is rarely an objective best design.

In such open design spaces, the designer's vision serves an important purpose - coherence. There are so many elements in a complicated design that it can be hard to take them in all at once. A strong authorial vision helps users/customers by giving them a guide to predict and/or remember the designer's choices.

Many Ruby admirers speak of the Principle of Least Surprise. Ruby is comparatively easy to learn and understand because its design choices aim to produce the least astonishment in the programmer. But since every programmer comes from a different background, they will each have different expectations and standards of astonishment.

So more precisely, Ruby was designed according to the Principle of Matz's Least Surprise. Once the programmer gets a handle on Matz's programming aesthetic, they can make educated guesses about parts of the language that they have not yet encountered.

So in conclusion, the formalism by which we recognise a designer's work is a feature because it makes complicated design simpler to understand.