"Let us concentrate on explaining to human beings what we want a computer to do"
Monday, 15 February 2010
Descartes on Unit Testing
For he who attempts to view a multitude of objects with the one and the same glance, sees none of them distinctly; and similarly the man who is wont to attend to many things at the same time by a single act of thought is confused in mind.
- René Descartes, 'Rules for the direction of the mind' in Key Philosophical Writings
Saturday, 6 February 2010
Deconstructing The Savoy
The software industry has long looked to the construction industry for inspiration.
We appropriate its vocabulary - programmers "build" software designed by "architects". We draw on its ideas - the seminal Gang of Four book adapted architect Christopher Alexander's concept of design patterns for use in software construction. And the discipline of software engineering was founded on a desire to employ civil engineering practices to help us build complex software systems.
For me, the most striking similarity between the two industries is the frequency of budget blowouts and schedule overruns. The great thing about this for software developers is that it gives us a tangible way of describing our otherwise inexplicable travails and catastrophes to ordinary people.
Yesterday in The London Evening Standard I read an article about the renovations of The Savoy, the famous London hotel. It read almost word-for-word like a story about an overly ambitious IT migration project.
When The Savoy closed on 15 December 2007 for a planned 16-month, £100 million makeover, it was hoped the hotel would quickly resume its status, buffed and restored to its former glory.
In the planning stages, optimism rules. The project is so large and complex that we can't possibly plan accurately, so in the absence of evidence one way or the other we assume the best.
The original £100 million estimate for the work has been ripped up and although operators Fairmont Hotels and Resorts will not disclose the actual figure, the 15 months of work suggests it could be close to double that sum.
Once work begins, the fragility of the initial estimates is exposed. Often, the 'estimate' of how much a project will cost is as much based on the depth of the client's pockets as the actual effort required to get the job done (which of course no one knows in advance anyway).
Part of the problem was The Savoy's unplanned, organic growth.
We have to live with the sins of those who came before us. In my experience, the quality of a system's legacy code base has more impact on a project than the inherent difficulty of the project in question.
Although we had done two years of planning and tried to assess the level of issues behind the walls, it's only when you close the doors and open it up that you realise the amount of work is much more serious and extensive than first envisaged.
Unsurprisingly, once you are up to your elbows in a system's viscera, you have a much better idea about what you're in for. Exploratory surgery is the only way to be certain of how long changes will take.
"What was an open courtyard suddenly became a room, with a mix of internal and external walls."
A system accrues idiosyncrasies because it is inevitably patched, hacked and enhanced. Modules that were designed with one use-case in mind are re-purposed as business needs change. And scar tissue accumulates.
Digging up the roadway in Savoy Place, off the Strand entrance - still the only place in the UK where one must legally drive on the right - he found a huge gulley running around the perimeter, instead of a solid foundation. "We don't even know what it's for."
Users will adapt to visible peculiarities. They may even grow attached to them, even if the rationale for them has become obsolete (cars drive on the right in Savoy Place so that hansom cab drivers could open the door for their customers without leaving their seat).
But much more frightening are the dull, blank blocks of obsolete code that loom like monoliths erected by a vanished civilisation. No one remembers what they were originally intended to do, and you can never be quite sure that the earth won't mysteriously stop turning if they are ever removed.
The huge expense and loss of revenue mean The Savoy has to "hit the ground running" when it reopens if the money is ever to be recouped.
Counterintuitively, the level of optimism rises as the schedule slips. It's very tempting to think that although we fell behind in phase 1, we can make up the time in phase 2. However, it's much more likely that if one part of a project runs into trouble, the rest will too.
As a manager at one said: "Hotel travellers are very promiscuous, they will, as it were, sleep around. While you are off the scene many will have happily moved on and it could take years to get them back."
Software users are even more promiscuous, especially on the web. They will cheerfully see your competitors behind your back, even in the good times. They will not tolerate a prolonged outage and they will complain loudly if your service is unavailable even for an hour.
But I am not entirely pessimistic about the software development process. There is one important attribute that software has that buildings don't - malleability.
We are able to follow agile methodologies and incrementally improve our programs. We don't have to follow The Savoy's example and attempt to implement an enormous modification in one go. We can embrace change and use the information we gather along the way to improve the end product. And if we refactor as we develop, we can reduce the amount of technical debt we bequeath to our successors.
Thursday, 14 January 2010
Clean code
At the beginning of Clean Code, Uncle Bob Martin enlists various well-respected programmers to explain what code cleanliness means to them. These luminaries include Bjarne Stroustrup (inventor of C++) and Ward Cunningham (inventor of the wiki).
Here is my definition:
Clean code imposes minimal impedance between the reader and the intent of the author. It contains little accidental complexity and its meaning can be easily understood, verified and manipulated.
When I work with clean code I have a sensation of reaching through the code to directly engage with the system's concepts.
When I work with unclean code my vision is clouded by weak naming, murky structure, inadequate commenting, convoluted dependencies and duplicated logic. Unclean code makes me afraid, because I cannot predict or understand the consequences of my changes.
Friday, 11 December 2009
Thinking from the G.U.T.
A mathematical physicist friend of mine once said that he looks forward to the day when physicists will produce a Grand Unified Theory (GUT) that will consolidate the strong nuclear force, the weak nuclear force and electromagnetism into a single interaction. This would be a stepping stone to the creation of a Theory of Everything that would also assimilate gravitation and thus unite all the strands of modern physics.
I was struck by his certainty that such a theory is possible. He seemed to be making a scientific prediction based on an aesthetic sensibility - unified theories are more beautiful therefore a unified theory is correct.
Unification is certainly an important part of the progress of a science. Occam's razor is a well-established principle for judging the utility of a theory. Entia non sunt multiplicanda praeter necessitatem - entities are not to be multiplied more than is necessary. In other words, the simplest theory that fits the data is the best.
When Sadi Carnot showed the equivalence of heat and mechanical work it was a victory for physics because physicists could now explain natural phenomena using fewer entities. At a superficial level, his discovery was useful because students now had to tax their brains with fewer concepts in order to understand both heat and motion.
But Carnot's unification had also produced a theory that was more correct than earlier theories that thought of heat as a substance called "caloric". The mechanical theory of heat turned out to explain more phenomena than Carnot had originally considered. The laws of thermodynamics could not have been formulated without Carnot's insight.
Why should the simpler theory prove more correct?
Marcus Hutter believes that understanding is fundamentally an act of mental unification. The Hutter prize offers a reward for anyone able to produce a better compression of Wikipedia. A high compression ratio requires a deep understanding of the corpus - in this case a snapshot of human knowledge. If unification is in some sense equivalent to understanding then a unified theory is more likely to be correct because it is a better approximation of the phenomena in question.
(Interestingly, Hutter is also an advocate for a physical Theory of Everything)
However, it is a leap of blind optimism to assume that a G.U.T. is possible just because if it existed it would be useful, beautiful and likely to yield further insight. Desirability does not imply feasibility.
In The Mythical Man-Month, Fred Brooks draws a distinction between essential and accidental complexity in software systems that is very pertinent to the possibility of a G.U.T.
Accidental complexity is caused by deficiencies in the solution. This kind of complexity can be eliminated by improving the approach to the problem. I would argue that the separation of heat and mechanics was an example of accidental complexity caused by a lack of understanding of the nature of heat. The mechanical theory of heat was a successful simplification because it removed complexity that was never part of the phenomenon itself.
Essential complexity, on the other hand, is inherent in the problem. It is impossible to build a solution that is less complex than the problem it is designed to solve.
The universe, like any other corpus, has an uncomputable Kolmogorov complexity that limits how simple a correct theory of physics can be. Though we cannot ever know the essential complexity of the universe, it does have one. There is an unknown and absolute limit to the unifying efforts of physics, so we cannot ever be sure that further unification will be possible.
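One way to make the compression argument concrete is the minimal sketch below. It uses Python's standard zlib module as a stand-in for an ideal compressor, so the compressed size is only an upper bound on Kolmogorov complexity, which itself cannot be computed. The helper name compressed_size and both sample inputs are illustrative assumptions, and have nothing to do with how the Hutter prize is actually judged.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level.

    This is only an upper bound on the data's Kolmogorov complexity:
    a cleverer compressor might do better, and the true value is uncomputable.
    """
    return len(zlib.compress(data, 9))


# A highly regular "universe": one short rule ("repeat ab") explains all of it.
structured = b"ab" * 5000

# An irregular one: pseudo-random bytes from a fixed seed, so runs are repeatable.
# Strictly, the seeded generator is itself a short description of these bytes;
# zlib simply cannot find it, which is why its output is only an upper bound.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

if __name__ == "__main__":
    print("structured:", len(structured), "->", compressed_size(structured))
    print("noisy:     ", len(noisy), "->", compressed_size(noisy))
```

Both inputs are the same length, but the regular one shrinks to a tiny fraction of its size while the irregular one barely shrinks at all. No amount of cleverness in the "theory" can push a corpus below its essential complexity.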
Perhaps physics will encounter new phenomena that require new multiplication of entities to explain. Perhaps we are close to the limit and though we might incrementally simplify our theories we will never be able to reduce physics to fewer than four fundamental interactions.
We cannot hope to make our theories more unified than the phenomena they describe and still hope to make them correct. As Albert Einstein said (my italics):
Make everything as simple as possible, but not simpler
Thursday, 22 October 2009
Context is sticky
Code reuse is one of the holy grails of the software engineering movement. Across the world, developers are frantically reinventing the wheel. The web groans under the weight of piles of functionally equivalent PHP applications for rendering the contents of database tables.
If a larger portion of this torrent of code could be reused then an enormous amount of effort could be saved. Perhaps this effort could be diverted into improving the software quality and we could finally make a dent in the software crisis.
But though everyone has been talking about code reuse for decades, there has been very little progress.
The code that has enjoyed a significant degree of reuse has been specifically designed for that purpose. Frameworks, libraries and plugin architectures are widespread. Even the mighty operating system exists to share functionality between applications. But serendipitous reuse of code that was originally designed to solve a singular problem is rare.
I think that the reason that code reuse is hard is the same reason that the semantic web has failed to materialise. This makes sense, because code is just a particular kind of semantic content.
As Clay Shirky has argued, the semantic web is a problematic ambition because it requires a universal worldview. The semantic web project envisages that information interoperability will be achieved by employing universal data formats. But data formats are contingent on worldview, which can never be universal. Shirky takes genetics as an example:
It would be relatively easy, for example, to encode a description of genes in XML, but it would be impossible to get a universal standard for such a description, because biologists are still arguing about what a gene actually is. There are several competing standards for describing genetic information, and the semantic divergence is an artifact of a real conversation among biologists. You can't get a standard til you have an agreement, and you can't force an agreement to exist where none actually does.
Even something as apparently clear-cut as genetic science resists universal semantic presentation because the data is contaminated by its original context.
The opinions, prejudices, needs and worldview of a programmer are imprinted on their code to a far greater degree. That class you wrote the other day to process form values assumes that every field has exactly one value. The HTML the form was displayed in uses classes unique to your site's CSS. And the coding standards the class conforms to differ from standard PHP conventions because your organisation wants to achieve consistency with its .NET projects.
You might be able to shoehorn this code into the next project you complete for the same organisation, but there is little chance of your form-processing class ever being used by someone else entirely. The cleaner and more decoupled your code is, the more use it might be to someone else, but you cannot entirely erase the imprint of its original context because context is what gives your code meaning.
The way you can best foster reuse is to engineer a situation where the worldview embedded in your code is adopted by the reuser. Take Firefox as an example. The core functionality of the browser is leveraged by thousands of plugin developers. But the API these extensions work with was laid down by the developers of Firefox and has meaning only in the context of the Firefox browser.
A cross-browser extension API would be very convenient, but the task of creating a plugin model that would apply as well to Chrome as to Firefox would be gargantuan. Witness how difficult it is to even get HTML and CSS to render the same in more than one browser. A cross-browser API would take the compatibility issues from the DOM and spread them to every aspect of the browsing experience.
Commonly-used frameworks also owe their success to prescribing a worldview. The only painless way to work with a framework is to follow the Rails way, the Django way or the Drupal way. To reuse someone else's code you must make concessions to their way of doing things.
There are a couple of current developments in software engineering that will help with the code reuse problem. Test-driven development helps to make the assumptions embedded in code explicit by describing them using unit tests. The referential transparency fostered by the functional programming paradigm controls context by quarantining side-effects.
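As a rough illustration of both ideas, here is a minimal Python sketch of the form-processing example from earlier. The function name single_valued_fields and the test names are hypothetical; the only library call is urllib.parse.parse_qsl from the standard library. The function is referentially transparent (same input, same output, no side effects), and the tests spell out the one-value-per-field assumption so that a would-be reuser can see it at a glance.

```python
from urllib.parse import parse_qsl


def single_valued_fields(query: str) -> dict:
    """Parse a URL-encoded form body, assuming one value per field.

    Pure function: it reads nothing but its argument and changes nothing
    outside itself, so its context is limited to what is written here.
    """
    fields = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        if name in fields:
            # The embedded assumption, made loud instead of silent.
            raise ValueError(f"field {name!r} has more than one value")
        fields[name] = value
    return fields


def test_each_field_has_exactly_one_value():
    parsed = single_valued_fields("name=Ada&lang=en")
    assert parsed == {"name": "Ada", "lang": "en"}


def test_repeated_field_is_rejected():
    try:
        single_valued_fields("tag=a&tag=b")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a repeated field")
```

Someone whose forms contain multi-select fields can read the second test and know immediately that this code does not share their worldview.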
But code reuse will always be intrinsically hard because context is sticky.
Sunday, 11 October 2009
Vision is a feature
A few weeks ago Mark Whiting and I had a brief Twitter conversation about his suggestion that
"as design quality increases the designer disappears."
He went on to suggest "that the formalism we when recognising a designer's work is as much an imperfection of the design as a feature."
I was not so sure. There are definitely instances where the designer's mark seems to contribute to the design. Programming languages are a good example. Ruby would not be what it is without the strength of Matz's personal vision.
On the other hand, I do get annoyed when a designer's vanity tempts them to graffiti their signature onto a design that would have been better left alone. I'm thinking here of 'clever' designs like teapots with two spouts.
The difference between these two scenarios is, in my opinion, whether or not the design space is convergent. I mean the term in the same sense as convergent evolution. In a convergent design space, the differences between designs will gradually disappear over time as individual designers are gradually more successful at approximating the best solution to the problem at hand.
In such a domain, it follows that any deviation from the one true design is noise. The designer's personal touch therefore detracts from their attempt to produce good design. A double-spouted teapot might help the designer express their individuality, but the result is just slightly less convenient tea.
However, it's rare to find a design space where a Platonic 'best' design exists. When have the various stakeholders in the construction of a new building ever agreed what is best? And to revisit my earlier example, which language is 'best' is one of the most common topics of programming flame wars.
Designers usually have to balance competing interests. How much should the finished product cost? What kind of user/customer should it be optimised for? What about older users/customers, or ones with disabilities? And not least, when is the deadline for the completed design? How designers balance these interests will inevitably affect the design. There is rarely any objective way to balance these subjective interests, so there is rarely an objective best design.
In such open design spaces, the designer's vision serves an important purpose - coherence. There are so many elements in a complicated design that it can be hard to take them in all at once. A strong authorial vision helps users/customers by giving them a guide to predict and/or remember the designer's choices.
Many Ruby admirers speak of the Principle of Least Surprise. Ruby is comparatively easy to learn and understand because its design choices aim to produce the least astonishment in the programmer. But since every programmer comes from a different background, they will each have different expectations and standards of astonishment.
So more precisely, Ruby was designed according to the Principle of Matz's Least Surprise. Once the programmer gets a handle on Matz's programming aesthetic, they can make educated guesses about parts of the language that they have not yet encountered.
So in conclusion, the formalism we when recognising a designer's work is a feature because it makes understanding complicated design simpler.
Friday, 4 September 2009
Using your browser to run an .msi as admin
Programmers should not use administrator accounts when developing. They should use accounts with the same privileges as the end users of the software. This minimises the chance of permission-related "it works on my machine" bugs occurring.
Trouble is, developers frequently need to install programs. The nice way to handle this is to use something like sudo (for *nix systems). A specific command can be executed with raised permissions, but for the rest of the time the user operates with normal privileges.
However, some operating systems (like Windows XP) do not fully support the sudo approach. There is a command known as "runas", but this does not work in all circumstances. In particular, it is not available for .msi installer files.
If you are running Windows XP on a non-administrator account, need to install an .msi and have the password of an administrator account, you do not have to take the trouble to log out and log back in. The following workaround lets you use your web browser as an .msi launcher and bypass the restriction:
- Use runas to launch your browser with admin privileges
- Open the .msi in your browser, either from the web or your local filesystem
- What you do next depends on what browser you use. In Firefox, double-click the .msi in the download window, which will launch it - as admin!
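The first step above can be sketched as a small Python wrapper around runas. This is a sketch under assumptions: the account name Administrator and the Firefox install path are placeholders for whatever applies on your machine, and build_runas_command is a hypothetical helper. The only runas option used is /user:, and runas itself prompts for the administrator password in the console.

```python
import subprocess
import sys


def build_runas_command(admin_account: str, program_path: str) -> list:
    """Return the argument list for launching program_path via runas."""
    return ["runas", f"/user:{admin_account}", program_path]


def launch_browser_as_admin() -> None:
    # Both values below are assumptions; adjust them for your machine.
    command = build_runas_command(
        "Administrator",
        r"C:\Program Files\Mozilla Firefox\firefox.exe",
    )
    # runas asks for the account's password before starting the program.
    subprocess.run(command, check=True)


# Only attempt the launch on Windows, where runas exists.
if __name__ == "__main__" and sys.platform == "win32":
    launch_browser_as_admin()
```

Close the elevated browser as soon as the installer has finished.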
Needless to say, use this trick sparingly. Running your browser as administrator all the time is almost as bad as developing under an admin account.