Sunday, March 29, 2015

Simplicity

Lately, I've been thinking a lot about the importance of simplicity in software. I can remember a time in my career when I considered a single system that does everything to be ideal; I dreamed of building monolithic applications that met every possible need my users may have, and I searched for all-encompassing frameworks that eliminated the need for any other dependencies and abstracted away as many challenges as possible. Why import several small libraries into my application when I could adopt a framework that has everything included?

As time has passed and I've written more software, though, I have come to realize what so many programmers figured out before me: software works best when it focus on doing one thing well. And the corollary: software that tries to do too many things will not excel at all of them.

I've seen this philosophy discussed more often recently, particularly in the backlash to monolithic frameworks like JSF and AngularJS; I haven't yet decided if simplicity is gaining popularity, or if I'm just noticing simplicity more now that I better appreciate it. But even if simplicity is gaining traction in software development, it's not a new concept. Dijkstra himself wrote in 1975:

Simplicity is prerequisite for reliability. Edsger Dijkstra, How do we tell truths that might hurt?

Or consider Unix, around in various forms since the 1960s. Utilities in Unix-like systems (like Linux and Mac OS X) have a tendency to do one thing only, but do it very well. There are utilities like:

  • find - searches a directory structure for files matching an expression
  • xargs - exectues an arbitrary command using the output of a previous command as arguments
  • egrep - searches for text matching a given regular expression
  • identify - retrieves image metadata (part of ImageMagick)
  • cut - extracts segments of text
  • tar - creates file archives

Individually, these are fairly simple programs. They are often useful individually, but they don't do terribly complex things.

But say you want to search through your file system and gather all of your winter holiday season pictures from any year into a single archive? Well, maybe you can find some big monolothic program that will do it for you; you might have to pay for it, it surely does way more than you need, and yet still probably won't do exactly what you want. Or instead, you can compose several simple programs to accomplish your goal:

find . -iname '*.jp*g' \ | xargs -L 1 -I @ identify -format '%[EXIF:DateTime] %d/%f\n' @ \ | egrep '^[[:digit:]]{4}:12' \ | cut -d' ' -f3- \ | tar -cf december.tar -T -

This uses find to list all files ending in jpg/jpeg, then xargs to pass each file to identify, which lists the date each image was taken and its name. Then egrep filters the list down to images taken in December of any year, and cut trims the output to just list the December file names. Finally, tar takes that list of files and puts them in an archive. Again, none of these individual tasks was particularly complex; when these simple programs were composed together, though, we were able to do something rather complex.

As a programmer, it's important to understand the difference between writing complex software and writing software that does complex things. Complex software is difficult to write, difficult to test, and more likely to break. Instead, we should write many pieces of simple, thoroughly tested software, and compose these simple pieces of software to do complex things.

If you're writing utility libraries, make them as tightly focused as possible, instead of writing some do-it-all super utility belt -- leave those to Batman. If you're writing applications, make sure you're writing tight, well-tested modular code; module systems like Browserify can really help on the client side. Keep it simple.

Tuesday, September 30, 2014

Response from Rep. Scott Perry (R-PA) on Net Neutrality

I contacted Scott Perry (R-PA), my U.S. Congressman, to encourage him to support net neutrality. Like Pat Toomey (R-PA), my U.S. Senator, he opposes it:

Thank you for contacting me regarding the Federal Communications Commission's (FCC) newly proposed net neutrality rules. I appreciate learning your views on this issue.

I'm committed to doing everything possible to ensure a transparent, accountable, and openly available Internet. In the last year, I voted for Representative Justin Amash's amendment to end NSA mass surveillance (H.Amdt. 413) and supported the Digital Accountability and Transparency (DATA) Act (H.R. 2061), which requires information on all federal spending to be posted to a comprehensive and searchable website.

As you know, earlier this year the U.S. Court of Appeals for the District of Columbia struck down portions of the FCC's 2010 Open Internet Order, ruling that the agency's rules overstepped its regulatory authority under current law. As a result, FCC Chairman Tom Wheeler proposed a new set of net neutrality rules on May 15, 2014. While these rules are not finalized, the FCC is encouraging interested individuals and groups to comment on the proposal, either through the FCC website or by sending an e-mail to openinternet@fcc.gov.

Know that I support a hands-off approach to the Internet. Since the Telecommunications Act of 1996 was passed, new technologies and advancements in telecommunications have rapidly developed due to the limited government regulation of Internet traffic and services. According to the FCC National Broadband Plan, this unrestricted free market has provided broadband to over 95% of Americans without government intervention or interference. Please know that I will continue to support policies that preserve the Internet's open structure to promote innovation, spur economic growth, and protect the free exchange of ideas.

Once again, thank you for contacting me. I appreciate your concerns and welcome your continued feedback. Please visit my website at perry.house.gov to submit further questions/comments or to sign up for my e-newsletter, Facebook page, and/or Twitter updates.

Scott Perry
Member of Congress

It was rather difficult find Rep. Perry's position on net neutrality. I couldn't find any news articles or press releases, and I never received a response to my emails, tweets, or Facebook messages. I finally called his Washington D.C. office; I spoke with a very polite and helpful staffer, and they sent me the email above.

Whatever your position is on net neutrality, if you are a U.S. citizen, I encourage you to contact your senators and representatives and let them know where you stand. We may not all have the financial resources to compete with large telecoms, but we do still have the votes.

Sunday, September 28, 2014

New programming techniques and the productivity curve

Though I love learning new programming techniques and technologies, I often struggle to make them a part of my normal development processes. For example, it took years before I finally started using regular expressions on a normal basis. The reason? The productivity curve:

You may have seen a chart like this before; the productivity curve is used to show one of the challenges of software delivery. The general idea is that when you first use a new product (or technology), your productivity always plummets in the short term. No matter how much better the new stuff is, you will be less productive using it at first, because you were more familiar with the old stuff. However, as you become more proficient with the new technology, your productivity gradually increases, and eventually your productivity will be greater than it was with the old technology (assuming the new technology is actually better).

So the key is to push through that initial dip, and eventually you get to the increased productivity that comes with mastering the new technology. Easy, right? Well, if your company just replaced your HR system, you have little choice in the matter; you’ll get through that productivity dip because you have no other choice.

However, if you’ve just decided you want to pick up a new programming technique, it’s a little more difficult. Before I got the hang of regular expressions, it was simply far faster for me to whip together something ugly using things like StringTokenizers and substring. Nobody cared if I was using regular expressions, but they did care if I took 10 times as long to complete a task.

On the rare occasions where I needed to make a change to existing regular expressions, I’d end up slogging through references and online tutorials to try and make sense out of it. Each time, after finally figuring it out, I’d tell myself that I was going to remember how these worked next time. But since I hadn’t made regular expressions a part of my standard toolchain, I’d forget everything I learned, and I would be lost again next time I needed to use them.

I eventually got the hang of regular expressions through forced practice and forced application.

Forced practice was straightforward enough: I dedicated time to reading references, running through tutorials, and solving regex challenges. This practice was an important step in building the skills, but I had tried this to a certain degree before. I’d spend a few hours practicing regular expressions, and feel like I was getting the hang of it. But time would pass, and the next time I needed to fix a regex, I’d have forgotten most of it again.

The key for me was forced application of regular expressions. Like the poor HR staff who learn a new HR system because they have no other choice, I forced myself to use regular expressions any time they could be useful, even though it would take me longer than a quick hack with stuff I was already familiar with. Good unit tests helped a lot; though I was not confident in my ability to write bug-free regular expressions, I was confident in my ability to write thorough tests.

As the productivity curve predicted, I was definitely less efficient at first. But as I forced myself to use regular expressions instead of hacking something together more quickly, I gradually improved. And before long, I was past the productivity dip, having finally gotten the hang of regular expressions. Now they are an essential part of my development toolchain, and I am a better programmer.

I know this isn’t really earth shattering advice; my five year old son has figured out that you get better at something the more you do it. But when you find some new technology or technique you want to make part of your development process, I think it’s important to go into it recognizing that not only will it take time to master it, but you will almost certainly be less productive in the short-term than you were before. And I think that if you are prepared for the productivity dip and are willing to accept it, you can push through the difficult times and master it. In the long run, you will be a better developer for it.

Monday, September 15, 2014

Response from Sen. Pat Toomey (R-PA) on Net Neutrality

I emailed Pat Toomey (R-PA), my U.S. Senator, to encourage him to support net neutrality. As you can see from his response, he is opposed to it:

Thank you for contacting me about Federal Communications Commissions' (FCC) net neutrality regulation. I appreciate knowing your thoughts on this issue.

As you may know, on December 21, 2010, the FCC adopted an Open Internet Order, better known as "net neutrality," that imposed new federal regulations on the types of services Internet providers could sell. Verizon Communications sued the FCC arguing that the regulations were too stringent and went beyond the agency's authority.

On January 14, 2014, in the case Verizon Communications Inc. v. FCC, the D.C. Circuit Court of Appeals struck down the FCC's net neutrality regulation. The Court stated that the FCC did not have the statuary authority to compel a broadband provider to follow the Open Internet Order.

I understand the concerns expressed by those who support net neutrality regulations; however, I also believe that such federal mandates would unduly inhibit this industry's innovation, investment in new technology, and job creation. Moreover, the Internet and online content have thrived in the United States without net neutrality regulations, which throws into question the need for more government intervention. Although there is currently no legislation before the Senate addressing net neutrality, please be assured that I will keep your thoughts on net neutrality in mind, should the Senate begin consideration of open internet legislation.

Thank you again for your correspondence. Please do not hesitate to contact me in the future if I can be of assistance.

Sincerely,

Pat Toomey
U.S. Senator, Pennsylvania

Whatever your position is on net neutrality, if you are a U.S. citizen, I encourage you to contact your senators and representatives and let them know where you stand. We may not all have the financial resources to compete with large telecoms, but we do still have the votes.

Wednesday, August 27, 2014

Yes, you CAN unit test client side code (video)

When developers think about unit testing web applications, we often focus on server-side code. We may think that testing client-side code (JavaScript, HTML, CSS) either can’t be done or is limited to isolated scenarios. But with the right tools and programming techniques, we can achieve the same rigorous test quality in our client code that we expect from our server code.

In this Web Conference at Penn State presentation, I demonstrated how to use tools like Mocha and PhantomJS to build rigorous client tests. I also discussed programming techniques to make client code more easily testable.

All sample code is available on GitHub.

Tuesday, April 15, 2014

Solve your problem by almost asking a question on StackOverflow

If you were to look at my StackOverflow profile, you would see that I’ve only asked a handful of questions in the years that I’ve been a member. However, this is not because I don’t run into programming problems or don’t think to ask questions. In the process of formulating a good StackOverflow question, I usually find the answer myself.

First of all, it’s embarrassing to spend time typing up a question and have someone respond with a Google search providing dozens of hits that provide the answer. So before I even start typing, you’d better believe I Google the hell out of the problem. And many times, I find what I'm looking for.

Assuming Google fails me, now I start explaining my problem. Since my audience is a group of complete strangers that have no knowledge of my application, I can’t simply say “I’m getting a SplinesNotReticulatedException from my FlangerFactory.” Even if I happen to be working on an open source project, I can’t expect the StackOverflow community to take the time to understand my full code base in depth. Thus, I am forced to write a small example demonstrating the problem. This is a fantastic way to narrow down the problem, because if I think the problem is with Dojo, but I can’t write a small example outside of my code base to reproduce the problem, then I’m most likely not dealing with a Dojo bug. In fact, there’s a good chance I’m looking in the wrong area of my code base. And if I can reproduce the problem, then maybe I can file a bug report with Dojo.

Now I have a simple code example that helpful members of the community can work from. So that they don’t waste their time running into the same walls I did, I explain exactly what I’ve tried thus far that hasn’t worked. This saves a lot of back-and-forth “nope, I tried that and it doesn’t work” comments. And in the process of documenting what I’ve already tried, I often come up with new ideas. And sometimes, one of those ideas actually solves the problem.

Of course, while I used StackOverflow as an example, this process really works with any community and medium, including emailing my coworkers. I respect the time of those who help me, so I make every effort to ask for help in the most efficient manner possible. In every forum I use, I always follow this process:

  1. Thoroughly search Google (and any other available non-human resources).
  2. Write a concise example demonstrating the problem.
  3. Explain any failed attempts to solve the problem.

More often than not, I solve the problem myself by going through this process. And if I don’t discover the solution myself, then I have constructed a proper question that gives me the best chance of getting the help I need from my community.

Update: as several Redditers have pointed out, if you've documented a problem and found a solution, you should share it! StackOverflow actually encourages posting and immediately answering your own question, and I imagine most communities feel the same way. My focus in this post was on discussing how to properly research and document a problem before asking for help, and highlighting how often you could end up solving the problem yourself, but I left out the importance of sharing that information with others.

I wouldn't necessarily recommend posting your solution if it's "oops, I found a missing comma in one of my modules", and I don't think we need any more StackOverflow questions that could easily be solved via a Google search (that probably points back to a dozen StackOverflow questions). But if your solution could actually help others, then go ahead and publish it in whatever forum you like, or put it up on your personal blog, or email it to your coworkers...

Thursday, October 31, 2013

Using Node.js streams to massage data into the format you want

Google provides some pretty cool flu data in CSV format, and I wanted to display that in a chart at Dash. However, the raw data isn't quite right for my needs:
  1. It has a bunch of intro/header text (copyright stuff, description of the data, etc), and Dash needs just the raw data.
  2. It shows dozens of states/regions/cities, and I just want to show overall U.S. data and my home state.
Fortunately, Dash can read data from any publicly accessible endpoint, so I decided to throw together a quick Node.js app to massage the data into what I needed. The most straightforward solution was probably to load the whole file, read through it line by line, build up an array of data, then write it out. And since the data feed is currently just under 400KB, maybe that would have been alright. But a better pattern (and more fun, IMO) is to take advantage of Node Streams. As long as we use streams throughout the entire process, we can make sure that only a small buffer is kept in memory at any given time.

If you just want to see the full app, it's on GitHub. Otherwise, read on to see my thought process.

Filter out the intro/header text

First, we'll write a stream that filters out the copyright/overview stuff and passes on the rest:

var stream = require('stream')
  , util = require('util')

function CleanIntro(options) {
  stream.Transform.call(this, options)
}

util.inherits(CleanIntro, stream.Transform)

CleanIntro.prototype._transform = function (chunk, enc, cb) {
  if (this.readingData) {
    this.push(chunk, enc)
  } else {
    // Ignore all text until we find a line beginning with 'Date,''
    var start = chunk.toString().search(/^Date,/m)
    if (start !== -1) {
      this.readingData = true
      this.push(chunk.slice(start), enc)
    }
  }
  cb()
}

A Transform stream simply takes data that was piped in from another stream, does whatever it wants to it, then pushes whatever it wants back out. In our case, we're just ignoring anything before the actual data begins, then pushing the rest of the data back out. Easy.

Parse the CSV data

Now that we have a filter to get just the raw CSV data, we can start parsing it. There are lots of CSV parsing libraries out there; I like csv-stream because, well, it's a stream. So our basic process is to make the HTTP request, pipe it to our header-cleaning filter, then pipe it to csv-stream and start working with the data:
var request = require('request')
  , csv = require('csv-stream')
  , util = require('util')
  , _ = require('lodash')
  , moment = require('moment')
  , OutStream = require('./out-stream')
  , CleanIntroFilter = require('./clean-intro-filter')

// Returns a Stream that emits CSV records from Google Flu Trends.
// options:
//   - regions: an array of regions for which data should be generated.
//     See http://www.google.org/flutrends/us/data.txt for possible values
module.exports = function (options) {
  options = _.extend({
    regions: ['United States']
  }, options)

  var earliest = moment().subtract('years', 1)

  request('http://www.google.org/flutrends/us/data.txt')
    .pipe(new CleanIntroFilter())
    .pipe(csv.createStream({}))
    .on('error',function(err){
        // Oops, got an error
    })
    .on('data',function(data) {
      var date = moment(data.Date)

      // Only return data from the past year
      if (date.isAfter(earliest) || date.isSame(earliest)) {
        // Let's build the output String...
        console.log(data.Date + ',' + _.map(options.regions, function (region) {
          return data[region]
        }).join())
      }
    })
    .on('end', function () {
      // Okay we're done, now what?
    })
}

Alright, now we're getting close. We've built the CSV output, but now what do we do with it? Push it all into an array and return that? NO! Remember, we'll lose the slim memory benefits of streams if we don't keep using them the whole way through.

Write out to another Stream

Instead, let's just make our own writeable stream:
var stream = require('stream')

var OutStream = function() {
  stream.Transform.call(this,{objectMode: false})
}

OutStream.prototype = Object.create(
  stream.Transform.prototype, {constructor: {value: OutStream}} )

OutStream.prototype._transform = function(chunk, encoding, callback) {
  this.push(chunk, encoding)
  callback && callback()
}

OutStream.prototype.write = function () {
  this._transform.apply(this, arguments)
}

OutStream.prototype.end = function () {
  this._transform.apply(this, arguments)
  this.emit('end')
}

And now our parsing function can return that stream and write to it:
module.exports = function (options) {
  options = _.extend({
    regions: ['United States']
  }, options)

  var out = new OutStream()
  out.write('Date,' + options.regions.join())

  var earliest = moment().subtract('years', 1)

  request('http://www.google.org/flutrends/us/data.txt')
    .pipe(new CleanIntroFilter())
    .pipe(csv.createStream({}))
    .on('error',function(err){
        out.emit('error', err)
    })
    .on('data',function(data) {
      var date = moment(data.Date)

      // Only return data from the past year
      if (date.isAfter(earliest) || date.isSame(earliest)) {
        out.write(data.Date + ',' + _.map(options.regions, function (region) {
          return data[region]
        }).join())
      }
    })
    .on('end', function () {
      out.end()
    })

  return out
}

Serve it up

Finally, we'll use Express to expose our data as a web endpoint:
var express = require('express')
  , data = require('./lib/data')
  , _ = require('lodash')

var app = express()

app.get('/', function(req, res){
  var options = {}

  if (req.query.region) {
    options.regions = _.isArray(req.query.region) ? req.query.region : [req.query.region]
  }

  res.setHeader('Content-Type', 'text/csv')

  data(options)
    .on('data', function (data) {
      res.write(data)
      res.write('\n')
    })
    .on('end', function (data) {
      res.end()
    })
    .on('error', function (err) {
      console.log('error: ', error)
    })
})

var port = process.env.PORT || 5000
app.listen(port)
console.log('Listening on port ' + port)

Once again, note that as soon as we get data from our stream, we manipulate and write it out to another stream (the HTTP response, in this case). This keeps us from holding a lot of data in memory unnecessarily.

Now if we fire up the server, we can use curl to see it in action:

$ curl 'http://localhost:5000'
Date,United States
2012-11-04,2492
2012-11-11,2913
2012-11-18,3040
2012-11-25,3641
2012-12-02,4427
[and so on]

$ curl 'http://localhost:5000?region=United%20States&region=Pennsylvania'
Date,United States,Pennsylvania
2012-11-04,2492,2579
2012-11-11,2913,2889
2012-11-18,3040,2785
2012-11-25,3641,3248
2012-12-02,4427,3679
[and so on]

As long the server is running someplace that is accessible to the public, we can head on over to Dash and plug it into a Custom Chart widget, giving us something like this:
Hey, looks like December and January are big months for the flu in the U.S. Who knew?

Want to try this yourself? The full source for this app is on GitHub, along with step-by-step instructions for running the project and creating a widget in Dash. Enjoy!