Sunday, 20 March 2022

The unreasonable effectiveness of data-oriented programming

In 2009, I left my graduate job in Australia and moved to the UK with my British partner. I'd been working as a programmer for about two years. The economy was reeling from the 2008 crash, so I took whatever programming jobs I was offered. And what I was offered was PHP, specifically Drupal.

Ever since I stumbled across programming during an electrical engineering degree at Melbourne University, I've loved programming languages. I love their differences. I love their quirks. Most of all I love the model of the world that is embedded in every programming language's design choices.

I loved C's memory manipulation. I loved Haskell's elegant types. I loved Prolog's single-minded obsession with modelling every computation using reversible predicates (it reminded me of Wittgenstein and Russell). I still remember the warm glow I felt leaving the lecture on software design where I learned that Java's interfaces could be used to implement polymorphism without inheritance.

But I didn't love PHP. I knew it as a language that grew rather than was designed. I thought it was ugly. I was a snob. But I knew how to program in it. "What is there even to know?" I thought.

The killer apps of the PHP world were all open-source content management systems (CMSes). Drupal. WordPress. Joomla. MediaWiki, which still powers Wikipedia today. You had to hand it to PHP. It was good at getting content in front of people.

Drupal 6 was my tool of choice. I'd used it at university and luckily it was quite a common requirement of the digital agency jobs that I found in the south-west of England. I learned it. I learned it by buying the book and reading it cover to cover and I learned it by FTPing files directly onto production servers and seeing what happened.

To learn Drupal meant to learn to use its extensions. That's where the power of the PHP CMSes lay. There were extensions for everything and the community culture encouraged sharing and open sourcing the extensions that you made to other people's extensions.

To use and write Drupal extensions meant to understand the hook system. The hook system is a kind of reflection-based event system. Extensions implement a hook via specially-named functions. When the hook is activated, all functions with a matching name will be called.

But what would they be called with? What kind of arguments would be passed to the hook functions? PHP inherited convenient syntax for nested associative maps from Perl, so Drupal used those. You'd read the documentation to discover what keys to expect in the associative map and which ones you were expected to return. Sometimes you'd make a mistake, but you could usually figure out what had gone wrong by inspecting the input and output associative maps.
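To make the mechanism concrete, here is a minimal sketch of name-based hook dispatch. It is written in Python rather than PHP, and it is not Drupal's actual API: the extension names, the hook name, the `invoke_all` helper and the rule for merging results are all invented for illustration. PHP would look the functions up by name in the global function table; this sketch assumes a class used as a namespace instead.

```python
class Extensions:
    """Hypothetical extensions: plain functions named <extension>_<hook>."""

    @staticmethod
    def blog_node_view(node):
        # Accepts a nested associative map, returns a map of additions.
        return {"links": ["Read more"]}

    @staticmethod
    def comments_node_view(node):
        return {
            "links": ["Add comment"],
            "comment_count": len(node.get("comments", [])),
        }


ENABLED_EXTENSIONS = ["blog", "comments", "search"]


def invoke_all(hook, data):
    """Call every function named <extension>_<hook> and merge what they return."""
    merged = {}
    for extension in ENABLED_EXTENSIONS:
        # Reflection: find the implementation purely by its name.
        implementation = getattr(Extensions, f"{extension}_{hook}", None)
        if implementation is None:
            continue  # this extension does not implement the hook
        for key, value in implementation(data).items():
            if isinstance(value, list):
                merged.setdefault(key, []).extend(value)
            else:
                merged[key] = value
    return merged


node = {"title": "Hello", "comments": [{"body": "First!"}]}
result = invoke_all("node_view", node)
print(result)
```

Everything that crosses the boundary is a plain map, so when something goes wrong you can print the input and the output and see exactly what each extension contributed.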

I remember being uneasy that our code was just functions that accepted and returned data. It felt like cheating somehow. "If we were doing this properly," I'd tell my coworkers with the scintillating arrogance of a programmer with two years' professional experience, "we'd be grouping these functions into classes and encapsulating this data in objects." "Yeah?" they said patiently, "And how would that be better?" I had no reply (I know there are replies but I didn't have them at the time).

Later, I got into Clojure. I learned it so that I could use Jeff Rose and Sam Aaron's Overtone audio environment, which let me synthesise sound using programming (Sam now works on Sonic Pi). I started attending and then organising the London Clojure Dojo.

I learned a lot of other things about Clojure. I learned about the power of the REPL and then about using VimClojure to evaluate pieces of code inside the editor itself (I recommend Fireplace today for Vim users). It was always easier to experiment with pieces of code that accepted data that I could directly enter and returned data I could easily inspect.

I loved James Reeves's Ring, which gave me a clear programming model for handling HTTP requests. An HTTP request is an associative map, and if I apply a series of functions to it I can turn it into an HTTP response, which is also an associative map. So simple it almost felt like cheating.
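Here is a rough Python sketch of that model. It borrows the shape of Ring's maps (a request with a method and a URI, a response with a status, headers and a body) but it is not Ring itself: the key names, `handler`, `wrap_params` and `wrap_content_type` are assumptions made up for this example.

```python
from urllib.parse import parse_qsl


def handler(request):
    """A handler is just a function from a request map to a response map."""
    name = request["params"].get("name", "world")
    return {"status": 200, "headers": {}, "body": f"Hello, {name}!"}


def wrap_params(next_handler):
    """Parse the query string into request["params"] before calling on."""
    def wrapped(request):
        params = dict(parse_qsl(request.get("query_string", "")))
        return next_handler({**request, "params": params})
    return wrapped


def wrap_content_type(next_handler, content_type):
    """Add a Content-Type header to whatever response comes back."""
    def wrapped(request):
        response = next_handler(request)
        headers = {**response["headers"], "Content-Type": content_type}
        return {**response, "headers": headers}
    return wrapped


# The application is a composition of functions over plain maps.
app = wrap_params(wrap_content_type(handler, "text/plain"))

request = {"request_method": "get", "uri": "/greet", "query_string": "name=Ada"}
response = app(request)
print(response)
```

Because both ends are plain data, I can type a request straight into a REPL and read the response straight back out, with no server and no mock objects involved.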

I was lucky enough to meet Jeff, Sam and James at the dojo. Inspired by Ring, I wrote a library called Leipzig that applied this data-oriented style to Overtone programs. I carved out a niche for myself on the functional programming conference circuit speaking about music theory modeled using little data-in data-out functional programs.

Through all this, I didn't have a good way of explaining what this style of programming was. It definitely wasn't object-oriented. It didn't quite seem to be functional programming either, at least not functional programming as I'd practised it writing Haskell. Someone who thinks in types might describe it in negative terms, as functional programming without the guard rails and affordances of types. But I didn't experience this data-first approach as an absence of anything. It felt to me like a different style of programming altogether, one with its own strengths, weaknesses, tricks and traps.

In Data-Oriented Programming, Yehonathan Sharvit gives a comprehensive account of what it means to write programs data-first. I haven't read anything like it before. You can watch Rich Hickey's talks and get some of the ideas that Yehonathan covers. You can read books on Clojure and other functional programming languages. But nowhere else have I ever read a complete description of what it means to put data at the heart of your programs.

Yehonathan uses JavaScript for his examples but the specific programming language is not the point. He writes about techniques including separating code and data, validation and state management that you can implement in any programming language.
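To give a flavour of those three techniques, here is a small sketch of my own in Python. It is not taken from the book: the library-membership data, `MEMBER_SCHEMA`, `validation_errors` and `add_member` are all hypothetical names, and the validation is far cruder than anything you would use in production.

```python
# Data: plain nested maps, with no behaviour attached.
library = {
    "name": "The smallest library",
    "members": {"ada": {"email": "ada@example.com", "blocked": False}},
}

# The schema is data too: key name -> expected type.
MEMBER_SCHEMA = {"email": str, "blocked": bool}


def validation_errors(record, schema):
    """Validation is a function over data that returns more data."""
    errors = []
    for key, expected in schema.items():
        if key not in record:
            errors.append(f"missing key: {key}")
        elif not isinstance(record[key], expected):
            errors.append(f"{key} should be {expected.__name__}")
    return errors


def add_member(library, member_id, member):
    """Code: a stateless function that returns a new version of the data."""
    errors = validation_errors(member, MEMBER_SCHEMA)
    if errors:
        raise ValueError("; ".join(errors))
    members = {**library["members"], member_id: member}
    return {**library, "members": members}


# State management: the old value is untouched, the new value replaces it.
updated = add_member(
    library, "grace", {"email": "grace@example.com", "blocked": False}
)
print(sorted(updated["members"]))
print(validation_errors({"email": 5}, MEMBER_SCHEMA))
```

The function never mutates its argument, so the previous state of the system is still there to inspect, compare or roll back to.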

Data-oriented programming starts with data modeling and treats functions as connectors that get you from one format to another. Unlike objects and higher-order functions, it offers a model that can be extended beyond individual programs to the system level. And it offers the flexibility to create designs more lucid and more opaque than any other style I've programmed with. It's not cheating and it does require discipline. Data-oriented programming is great when done right. But how can you do it right if no one tells you how to do it?

That's why I'm really glad that Yehonathan has taken the time to write this book, still in Manning's early access programme. Data-oriented programming is a thing, a different thing to other styles of programming. And now there's a reference detailing what it is.