Jerry on Java: 2015

Friday, April 3, 2015

5 simple rules for securely storing passwords

Far too frequently, systems are hacked and their user databases are compromised. And there are far too many cases where the database contains plain text passwords, poorly hashed passwords, or two-way encrypted passwords, despite the wealth of resources available on how to properly store user credentials. And it's not just legacy databases; just this week, I saw a reddit thread with at least one developer advocating custom hashing functions and "security through obscurity".

Though cryptography is a complex subject, you don't need to be an expert to be a good steward of your users' credentials. Just follow these simple rules when selecting a scheme:¹

Use a proven one-way hashing algorithm that has been publicly reviewed by security experts, such as bcrypt, PBKDF2, or scrypt.
Don't attempt to write your own hashing algorithm. You can't rely on "security by obscurity"; you must assume the attacker has your source code, and can attack your inferior algorithm.²
I mean it, don't write your own hashing algorithm! Just use one of the ones from Rule #1. They are free, available on virtually any platform, and have been proven secure by the best minds in security.
Seriously, why are you even still reading this? Rule #1 is all you need!
Alright, if you are really going to ignore Rule #1, please at least notify your users that you are not safely storing their passwords, and in the event of a database breach, their passwords will become public knowledge.

I don't like to deal in absolutes, particularly in software development, but I truly feel strongly about this. There is no situation in which you should be developing your own password hashing algorithm!³ Here's an example of using PBKDF2 in Java without importing any external libraries; if Node.js is your thing, there's a module for bcrypt; there's a Ruby gem for scrypt; hell, even PHP has a built-in password hashing function that uses bcrypt. Really, virtually any language has a freely available library for any of those three algorithms. Just Google "your-language-of-choice bcrypt OR scrypt OR pbkdf2" and you'll find plenty of options.

Any hash function that has not been through rigorous public peer review is almost certainly not properly designed. If Niels Provos himself — one of the creators of bcrypt — was on my team and wanted to use a new hash function he created in private, I would pass on it and use bcrypt. Although I doubt Mr. Provos would be foolish enough to suggest using such an algorithm; he may have created the next great password hashing algorithm, but he would likely realize that it could not be relied upon until it had been evaluated by his fellow security experts.

So when you need to store user credentials, just please, please hash them with bcrypt, scrypt, or PBKDF2. It's easy. Don't try to be clever.

¹ Even better: don't store user credentials if you don't have to. Consider using an OAuth provider, or your organization's internal identity system. It saves some effort, and saves your users from remembering yet another set of credentials.^↩

² This is essentially Kerckhoff's principle: "The system must not require secrecy and can be stolen by the enemy without causing trouble"^↩

³ Okay, I suppose if you are entering something like the Password Hashing Competition or doing academic research, that's fine. But you still shouldn't be using it in a real application until it has been through thorough public review.^↩

Sunday, March 29, 2015

Simplicity

Lately, I've been thinking a lot about the importance of simplicity in software. I can remember a time in my career when I considered a single system that does everything to be ideal; I dreamed of building monolithic applications that met every possible need my users may have, and I searched for all-encompassing frameworks that eliminated the need for any other dependencies and abstracted away as many challenges as possible. Why import several small libraries into my application when I could adopt a framework that has everything included?

As time has passed and I've written more software, though, I have come to realize what so many programmers figured out before me: software works best when it focus on doing one thing well. And the corollary: software that tries to do too many things will not excel at all of them.

I've seen this philosophy discussed more often recently, particularly in the backlash to monolithic frameworks like JSF and AngularJS; I haven't yet decided if simplicity is gaining popularity, or if I'm just noticing simplicity more now that I better appreciate it. But even if simplicity is gaining traction in software development, it's not a new concept. Dijkstra himself wrote in 1975:

Simplicity is prerequisite for reliability. Edsger Dijkstra, How do we tell truths that might hurt?

Or consider Unix, around in various forms since the 1960s. Utilities in Unix-like systems (like Linux and Mac OS X) have a tendency to do one thing only, but do it very well. There are utilities like:

find - searches a directory structure for files matching an expression
xargs - executes an arbitrary command using the output of a previous command as arguments
egrep - searches for text matching a given regular expression
identify - retrieves image metadata (part of ImageMagick)
cut - extracts segments of text
tar - creates file archives

Individually, these are fairly simple programs. They are often useful individually, but they don't do terribly complex things.

But say you want to search through your file system and gather all of your winter holiday season pictures from any year into a single archive? Well, maybe you can find some big monolothic program that will do it for you; you might have to pay for it, it surely does way more than you need, and yet still probably won't do exactly what you want. Or instead, you can compose several simple programs to accomplish your goal:

find . -iname '*.jp*g' -print0 \
 | xargs -0 -L 1 -I @ identify -format '%[EXIF:DateTime] %d/%f\n' @ \
 | egrep '^[[:digit:]]{4}:12' \
 | cut -d' ' -f3- \
 | tar -cf december.tar -T -

This uses find to list all files ending in jpg/jpeg, then xargs to pass each file to identify, which lists the date each image was taken and its name. Then egrep filters the list down to images taken in December of any year, and cut trims the output to just list the December file names. Finally, tar takes that list of files and puts them in an archive. Again, none of these individual tasks was particularly complex; when these simple programs were composed together, though, we were able to do something rather complex.

As a programmer, it's important to understand the difference between writing complex software and writing software that does complex things. Complex software is difficult to write, difficult to test, and more likely to break. Instead, we should write many pieces of simple, thoroughly tested software, and compose these simple pieces of software to do complex things.

If you're writing utility libraries, make them as tightly focused as possible, instead of writing some do-it-all super utility belt -- leave those to Batman. If you're writing applications, make sure you're writing tight, well-tested modular code; module systems like Browserify can really help on the client side. Keep it simple.

Update: added -print0 to find and -0 to xargs to properly handle spaces in file names. Thanks Graingert!