Sunday 20 March 2022

The unreasonable effectiveness of data-oriented programming

In 2009, I left my graduate job in Australia and moved to the UK with my British partner. I'd been working as a programmer for about two years. The economy was reeling from the 2008 crash, so I took whatever programming jobs I was offered. And what I was offered was PHP, specifically Drupal.

Ever since I stumbled across programming during an electrical engineering degree at Melbourne University, I've loved programming languages. I love their differences. I love their quirks. Most of all I love the model of the world that is embedded in every programming language's design choices.

I loved C's memory manipulation. I loved Haskell's elegant types. I loved Prolog's single-minded obsession with modelling every computation using reversible predicates (it reminded me of Wittgenstein and Russell). I still remember the warm glow I felt leaving the lecture on software design where I learned that Java's interfaces could be used to implement polymorphism without inheritance.

But I didn't love PHP. I knew it as a language that grew rather than was designed. I thought it was ugly. I was a snob. But I knew how to program in it. "What is there even to know?" I thought.

The killer apps of the PHP world were all open-source content management systems (CMSes). Drupal. WordPress. Joomla. MediaWiki, which still powers Wikipedia today. You had to hand it to PHP. It was good at getting content in front of people.

Drupal 6 was my tool of choice. I'd used it at university and luckily it was quite a common requirement of the digital agency jobs that I found in the south-west of England. I learned it. I learned it by buying the book and reading it cover to cover and I learned it by FTPing files directly onto production servers and seeing what happened.

To learn Drupal meant to learn to use its extensions. That's where the power of the PHP CMSes lay. There were extensions for everything and the community culture encouraged sharing and open sourcing the extensions that you made to other people's extensions.

To use and write Drupal extensions meant to understand the hook system. The hook system is a kind of reflection-based event system. Extensions implement a hook via specially-named functions. When the hook is activated, all functions with a matching name will be called.

But what would they be called with? What kind of arguments would be passed to the hook functions? PHP inherited convenient syntax for nested associative maps from Perl, so Drupal used those. You'd read the documentation to discover what keys to expect in the associative map and which ones you were expected to return. Sometimes you'd make a mistake, but you could usually figure out what had gone wrong by inspecting the input and output associative maps.
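
To make the naming convention concrete, here is a rough sketch of the idea, written in Clojure rather than PHP. It is not Drupal's actual implementation: the module names, the hook name and the `invoke-all` helper are all invented for illustration, and threading one map through every implementation is a simplification.

```clojure
;; Hypothetical sketch of a name-based hook system. Nothing here is Drupal's API.

;; The "modules" that are switched on, in the order their hooks should run.
(def enabled-modules ["blog" "comments"])

;; A module implements a hook by defining a function called <module>-<hook>.
;; Hook functions accept an associative map and return an associative map.
(defn blog-page-alter [page]
  (assoc page :title (str "Blog: " (:title page))))

(defn comments-page-alter [page]
  (update page :links conj "Add a comment"))

;; Remember the namespace the hook functions were defined in.
(def home-ns *ns*)

(defn implementations
  "Look up, by name, every enabled module's function for the given hook."
  [hook]
  (keep #(ns-resolve home-ns (symbol (str % "-" hook))) enabled-modules))

(defn invoke-all
  "Pass the data through each implementation of the hook in turn."
  [hook data]
  (reduce (fn [acc f] (f acc)) data (implementations hook)))

(invoke-all "page-alter" {:title "Home" :links []})
;; => {:title "Blog: Home", :links ["Add a comment"]}
```

Because the input and output are plain maps, working out what went wrong is mostly a matter of printing the map before and after each function.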

I remember being uneasy that our code was just functions that accepted and returned data. It felt like cheating somehow. "If we were doing this properly," I'd tell my coworkers with the scintillating arrogance of a programmer with two years' professional experience, "we'd be grouping these functions into classes and encapsulating this data in objects." "Yeah?" they said patiently, "And how would that be better?" I had no reply (I know there are replies but I didn't have them at the time).

Later, I got into Clojure. I learned it so that I could use Jeff Rose and Sam Aaron's Overtone audio environment, which let me synthesise sound using programming (Sam now works on Sonic Pi). I started attending and then organising the London Clojure Dojo.

I learned a lot of other things about Clojure. I learned about the power of the REPL and then about using VimClojure to evaluate pieces of code inside the editor itself (I recommend Fireplace today for Vim users). It was always easier to experiment with pieces of code that accepted data that I could directly enter and returned data I could easily inspect.

I loved James Reeves's Ring, which gave me a clear programming model for handling HTTP requests. An HTTP request is an associative map, and if I apply a series of functions to it I can turn it into an HTTP response, which is also an associative map. So simple it almost felt like cheating.
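
A minimal sketch of that model, using plain Clojure with no Ring dependency. The map keys follow Ring's conventions (`:request-method`, `:uri`, `:status`, `:headers`, `:body`), but the handler, the wrapper and the `X-Handled-By` header are made up for illustration.

```clojure
;; A handler is just a function from a request map to a response map.
(defn handler [request]
  {:status  200
   :headers {"Content-Type" "text/plain"}
   :body    (str "You asked for " (:uri request))})

;; A wrapper takes a handler and returns a new handler, here one that
;; adds a (hypothetical) header to whatever response comes back.
(defn wrap-handled-by [handler]
  (fn [request]
    (assoc-in (handler request) [:headers "X-Handled-By"] "sketch")))

(def app (wrap-handled-by handler))

;; No server is needed to try it out: pass in a map, inspect the map returned.
(app {:request-method :get :uri "/songs" :headers {}})
```

The last line is the whole appeal: the request can be typed straight into the REPL and the response read straight back.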

I was lucky enough to meet Jeff, Sam and James at the dojo. Inspired by Ring, I wrote a library called Leipzig that applied this data-oriented style to Overtone programs. I carved out a niche for myself on the functional programming conference circuit speaking about music theory modeled using little data-in data-out functional programs.

Through all this, I didn't have a good way of explaining what this style of programming was. It definitely wasn't object-oriented. It didn't quite seem to be functional programming either, at least not functional programming as I'd practised it writing Haskell. Someone who thinks in types might describe it in negative terms, as functional programming without the guard rails and affordances of types. But I didn't experience this data-first approach as an absence of anything. It felt to me like a different style of programming altogether, one with its own strengths, weaknesses, tricks and traps.

In Data-Oriented Programming, Yehonathan Sharvit gives a comprehensive account of what it means to write programs data-first. I haven't read anything like it before. You can watch Rich Hickey's talks and get some of the ideas that Yehonathan covers. You can read books on Clojure and other functional programming languages. But nowhere else have I ever read a complete description of what it means to put data at the heart of your programs.

Yehonathan uses JavaScript for his examples but the specific programming language is not the point. He writes about techniques including separating code and data, validation and state management that you can implement in any programming language.

Data-oriented programming starts with data modeling and treats functions as connectors that get you from one format to another. Unlike objects and higher-order functions, it offers a model that can be extended beyond individual programs to the system level. And it offers the flexibility to create designs more lucid and more opaque than any other style I've programmed with. It's not cheating and it does require discipline. Data-oriented programming is great when done right. But how can you do it right if no one tells you how to do it?

That's why I'm really glad that Yehonathan has taken the time to write this book, still in Manning's early access programme. Data-oriented programming is a thing, a different thing to other styles of programming. And now there's a reference detailing what it is.

Saturday 14 January 2017

Expressive types, not oppressive types

Uncle Bob wrote a recent post in which he warns programmers against the "dark path" some modern languages have taken - that is to "double down" on static typing. He cites Swift and Kotlin as examples, though his argument is meant to be interpreted more generally.
I share many points of view with Uncle Bob. I find the dynamically typed Clojure programming language beautiful and expressive - most of my personal projects are written in Clojure. I think that TDD (test-driven design) is a valuable and important discipline - I work for an agile consulting company where most of our projects include helping clients to get better at testing.

But I disagree strongly with the way Uncle Bob frames this discussion on static types.

Uncle Bob looks at advanced type systems and sees them as more oppressive rather than more expressive. Being able to describe whether or not a function can return null is an opportunity, not a constraint. Being able to use types to describe your code's intent is an opportunity, not a constraint. Being able to reason about the behaviour of a function based on its type signature is an opportunity, not a constraint.
The kicker is that this is almost exactly the fallacy about TDD that we have railed against for years. We call it "test-driven design" because we know that evolving code in response to examples is a great way to inform a design. Folks who have not learnt to listen to their unit tests see them as nagging constraints that prevent them from writing code in the way they'd like. A master of TDD uses tests as feedback for their design.

Anyone who sees unit tests as mere "checks" that make changing code needlessly difficult isn't getting the most out of test-driven design. Anyone who sees static types as mere "checks" that make changing code needlessly difficult isn't getting the most out of type-driven design.
Based on his post Uncle Bob falls into the latter category. He sees types as ad hoc antidotes for specific mistakes rather than tools for thought - "Every time there’s a new kind of bug, we add a language feature to prevent that kind of bug."
If that's Uncle Bob's experience of Swift and Kotlin, he should try Elm. Or F#. Or Haskell. If his experience is anything like mine, he would find that more sophisticated types lead to less ad hockery, not more.

In a follow-up post, Uncle Bob is explicit about what he wants in a programming language - "There is a balance point after which every step down The Dark Path increases the cost over the benefit. I think Java and C# have done a reasonable job at hovering near the balance point."
I couldn't disagree more. Java and C# have two of the most onerous and least beneficial type systems. Their complexity and absence of type inference force excessive bookkeeping on the programmer. They lack basic features like sum types, which denies the programmer an important expressive idiom.
Java and C# represent the nadir of the type system trade-off, not the zenith. Type systems are tools. Better tools help us write better code. We should welcome each and every advance in the tools we use to do our job, because frankly we could do a lot better than what we have now.

To argue that employing more expressive types is a "dark path" that leads developers away from personal responsibility isn't accurate or helpful.

Saturday 5 November 2016

Computational Musicology, ????, Profit

This year I had the pleasure of attending FARM at ICFP. As well as demoing Klangmeister, I gave a paper on what computational musicology means for the study of music. The abstract is as follows:

In this paper I examine the relationship that complexity theory and
disjunctive sequences have to music, music-generating programs
and literary works. I then apply these ideas by devising a program
to generate an infinite ‘Copyright Infringement Song’ that contains
all other songs within it. I adopt literary modes of analysis and
presentation, which I motivate by arguing that music is a cultural
and artistic phenomenon rather than a natural one.

The full paper is available online via the ACM.

Most of the FARM papers focused more on general analysis of the structure of music than the interpretation of the meaning of specific pieces. I find the general analytic approach fascinating, but as I argue in my paper, I think computational musicology can be more than that.

I'm grateful to the FARM organisers for accepting a work that is a little loose with the genre conventions of a computer science paper. ICFP is a great conference, but its usual standard of worthwhile research is inherited from mathematics and the sciences. With notable exceptions like James Noble's work on postmodern programming, I don't see many examples of academics employing computational thinking for humanities research.

The paper is based on a talk I gave at Strange Loop last year called Kolmogorov Music.

Monday 19 September 2016

Music as code talks

I've been giving talks about music theory and code for a few years now, so I thought I'd collect them all together in one place. They are based on Overtone and my Leipzig music composition library.
  1. Functional Composition, about music theory from sine waves through to canons, given at Lambda Jam 2013 (code).
  2. Kolmogorov Music, about music and complexity theory, given at Strange Loop 2015 (code).
  3. Dueling Keyboards, about temperament and tuning systems, given at Clojure eXchange 2015 (code).
  4. Klangmeister, about my online live coding environment, given at FlatMap 2016 (code).
  5. African Polyphony and Polyrhythm, about music from the Central African Republic, given at Strange Loop 2016 (code). Slides are online.
  6. It Ain't Necessarily So, about the psychology of musical perception, given at Curry On 2018 (code). Slides and demo are online.
  7. Birdsong-as-code, about the music theory of birdsong, given at Strange Loop 2023 (code).
 If you're interested in my personal music, check out Whelmed (code) or this performance with keytar accompaniment.

Sunday 3 July 2016

Falsehoods programmers believe about music

In the spirit of Patrick McKenzie's great post on falsehoods programmers believe about names, I am trying to write an equivalent one for music. Any false assumption that might be made in codifying music is a candidate for inclusion. Suggestions are very welcome.
  1. Music can be written down.
  2. Okay, maybe not with European notation, but there'll be a specialist notation for that kind of music.
  3. Music is finite in duration.
  4. Music has a composer.
  5. Music is about harmony.
  6. Music uses scales.
  7. Music uses equal temperament.
  8. Music uses tones and semitones.
  9. Music and dance are separate activities.
  10. Playing and listening to music are separate activities.
  11. Musicians can play their part separately from the overall composition.
  12. Music is performed by professional musicians.

Sunday 26 June 2016


Douglas Hofstadter's Gödel, Escher, Bach is one of my favourite books. Commonly referred to as GEB, this book is a mesmerising meditation on consciousness, mathematics and creativity. The central idea is that of a "strange loop", in which the same message is interpreted on multiple semantic levels.

GEB was the first place I came across the idea of a musical canon. A canon is a beautifully austere form of composition that was popular in the Baroque period. A canon consists of a dux part which sets out the base melody accompanied by a comes part which is some kind of transformation of the dux.

Here is the structure of a canon described using the programming language Clojure. f stands for the transformation selected by the composer.

(defn canon [f notes]
  (->> notes
       (with (f notes))))

For example, the comes might be formed by delaying the dux by a bar and raising every note by a third. In my talk Functional Composition I show how computer code can be used to explain music theory, focussing on JS Bach's Canone alla Quarta from the Goldberg Variations. Canone alla Quarta is an unusually complex and beautiful canon where the transformation is composed of a delay of three beats (a simple canon), a reflection (a mirror canon) and a pitch transposition down a fourth (an interval canon).

Here is the transformation from Canone alla Quarta written in Clojure. comp is a Clojure function for composing multiple transformations together.

(defn canone-alla-quarta [notes]
  (->> notes
       (canon
         (comp (interval -3) mirror (simple 3)))))
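
To see what the composed transformation does to actual notes, here is a self-contained sketch that runs without Leipzig. The definitions of `simple`, `interval`, `mirror` and `canon` below are simplified stand-ins written for this example, not Leipzig's own code, and they assume that a note is a map with a `:time` in beats and a `:pitch` in scale steps.

```clojure
;; Simplified stand-ins for the canon functions, assuming notes look like
;; {:time 0 :pitch 2}. These are illustrative, not Leipzig's implementation.

(defn simple [wait]
  ;; Delay every note by `wait` beats.
  (fn [notes] (map #(update % :time + wait) notes)))

(defn interval [steps]
  ;; Transpose every note by `steps` scale steps.
  (fn [notes] (map #(update % :pitch + steps) notes)))

(defn mirror [notes]
  ;; Reflect every pitch around zero.
  (map #(update % :pitch -) notes))

(defn canon [f notes]
  ;; Play the dux together with its transformation, ordered by time.
  (sort-by :time (concat notes (f notes))))

(defn canone-alla-quarta [notes]
  ;; comp applies right to left: delay, then mirror, then transpose.
  (canon (comp (interval -3) mirror (simple 3)) notes))

(canone-alla-quarta [{:time 0 :pitch 0} {:time 1 :pitch 2}])
;; => ({:time 0, :pitch 0} {:time 1, :pitch 2}
;;     {:time 3, :pitch -3} {:time 4, :pitch -5})
```

The first two notes of the result are the dux, untouched. The last two are the comes: three beats later, upside down and three steps lower.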

I was working on a talk for last year's Strange Loop programming conference (itself a reference to Hofstadter's work) and I decided that I wanted to create my own canon as a tribute to GEB as a finale. Rather than use an ordinary musical transformation for my comes, I wanted to pick something that spoke to the idea of composing music with computer code. I also wanted to incorporate GEB's theme of interpreting messages on multiple levels.

I took the letters G, E and B, and used the ASCII codes that represent these letters as though they were MIDI pitch codes. This gave me my dux. I then took the same three letters and interpreted them as the musical notes G, E and B. This gave me my comes. I had obtained a canon based not on musical concepts like delay or transposition, but on encoding schemes used in computer programming.

(defn canone-alla-geb [notes]
  (->> notes
       (canon
         #(where :pitch ascii->midi %))))
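
The two readings of the letters can be worked through in a few lines. This is a sketch of the arithmetic only: the function and table names are invented for the example, and placing the note names in the octave starting at middle C (MIDI pitch 60) is an assumption.

```clojure
;; Two ways of reading the same three letters. Names are illustrative only.

(def geb [\G \E \B])

;; Reading 1: use each letter's ASCII code directly as a MIDI pitch number.
(defn letter->ascii-pitch [letter]
  (int letter))

;; Reading 2: treat each letter as a note name. The octave is an assumption:
;; these are the white keys from middle C (MIDI pitch 60) upwards.
(def note-name->midi
  {\C 60 \D 62 \E 64 \F 65 \G 67 \A 69 \B 71})

(map letter->ascii-pitch geb) ; => (71 69 66)
(map note-name->midi geb)     ; => (67 64 71)
```

Under the first reading the letters G, E and B sound as MIDI pitches 71, 69 and 66, which are the notes B, A and F sharp. Under the second they sound as the notes they name.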

I elaborated the harmonies provided by this canon into a complete track, composed via computer code. The dux and the comes are joined by various other parts, some using polyrhythms to generate apparent complexity from underlying simplicity.

Eventually, the dux and the comes are accompanied by a third canonic voice, in which the names of Gödel, Escher and Bach are read out by a text-to-speech program. So the theme of three notes G, E and B becomes a canon of three voices: one musical, one technical and one alluding to the three great creative spirits Gödel, Escher and Bach.

Listen to the recording.

Read the code.

Watch the talk.

Sunday 17 May 2015

Lanham on explicit data dependencies

Who's kicking who?

- Richard Lanham, Revising Prose