Sunday, March 29, 2015

Simplicity

Lately, I've been thinking a lot about the importance of simplicity in software. I can remember a time in my career when I considered a single system that does everything to be ideal; I dreamed of building monolithic applications that met every possible need my users may have, and I searched for all-encompassing frameworks that eliminated the need for any other dependencies and abstracted away as many challenges as possible. Why import several small libraries into my application when I could adopt a framework that has everything included?

As time has passed and I've written more software, though, I have come to realize what so many programmers figured out before me: software works best when it focus on doing one thing well. And the corollary: software that tries to do too many things will not excel at all of them.

I've seen this philosophy discussed more often recently, particularly in the backlash to monolithic frameworks like JSF and AngularJS; I haven't yet decided if simplicity is gaining popularity, or if I'm just noticing simplicity more now that I better appreciate it. But even if simplicity is gaining traction in software development, it's not a new concept. Dijkstra himself wrote in 1975:

Simplicity is prerequisite for reliability. Edsger Dijkstra, How do we tell truths that might hurt?

Or consider Unix, around in various forms since the 1960s. Utilities in Unix-like systems (like Linux and Mac OS X) have a tendency to do one thing only, but do it very well. There are utilities like:

  • find - searches a directory structure for files matching an expression
  • xargs - executes an arbitrary command using the output of a previous command as arguments
  • egrep - searches for text matching a given regular expression
  • identify - retrieves image metadata (part of ImageMagick)
  • cut - extracts segments of text
  • tar - creates file archives

Individually, these are fairly simple programs. They are often useful individually, but they don't do terribly complex things.

But say you want to search through your file system and gather all of your winter holiday season pictures from any year into a single archive? Well, maybe you can find some big monolothic program that will do it for you; you might have to pay for it, it surely does way more than you need, and yet still probably won't do exactly what you want. Or instead, you can compose several simple programs to accomplish your goal:

find . -iname '*.jp*g' -print0 \ | xargs -0 -L 1 -I @ identify -format '%[EXIF:DateTime] %d/%f\n' @ \ | egrep '^[[:digit:]]{4}:12' \ | cut -d' ' -f3- \ | tar -cf december.tar -T -

This uses find to list all files ending in jpg/jpeg, then xargs to pass each file to identify, which lists the date each image was taken and its name. Then egrep filters the list down to images taken in December of any year, and cut trims the output to just list the December file names. Finally, tar takes that list of files and puts them in an archive. Again, none of these individual tasks was particularly complex; when these simple programs were composed together, though, we were able to do something rather complex.

As a programmer, it's important to understand the difference between writing complex software and writing software that does complex things. Complex software is difficult to write, difficult to test, and more likely to break. Instead, we should write many pieces of simple, thoroughly tested software, and compose these simple pieces of software to do complex things.

If you're writing utility libraries, make them as tightly focused as possible, instead of writing some do-it-all super utility belt -- leave those to Batman. If you're writing applications, make sure you're writing tight, well-tested modular code; module systems like Browserify can really help on the client side. Keep it simple.

Update: added -print0 to find and -0 to xargs to properly handle spaces in file names. Thanks Graingert!