A cryptographic hash is a fingerprint of a document or other stream of data. A cryptographic hash uniquely identifies that stream of characters within a certain realm of possibility. Typical uses are cryptographic hashes of files to ensure a successful download without corruption, and the storage of cryptographic hashes of a password in place of the password itself. The algorithms discussed here will be MD5, SHA and bcrypt.
Do not confuse cryptographic hashes with Ruby hashes, which are used to store key/value pairs. Ruby hashes are named for the non-cryptographic hashes they do to organize the objects stored in them in "buckets," and have nothing to do with cryptographic hashes.
By far the most common place you'll see hashes used is in storing and authenticating passwords. Storing cleartext passwords is bad, any intruder that steals your database will have access to every one's accounts. This problem is compounded by the fact that many people use the same password on multiple accounts. So, instead of storing the actual password, authentication databases should store only the cryptographic hash of the password. When a user presents a password for authentication, a new hash is generated from the submitted password and the two hashes are compared.
In the past, MD5 has been the standard, but it should be avoided today. There are theoretical and practical weakneses in the algorithm, and brute force and rainbow table attacks can break MD5 passwords quickly. It is, however, available to use.
More recently SHA-1 and SHA-2 are the standards. It's a stronger and more modern algorithm and produces a longer hash. And, typically against the brute force attackers (who simply try every possibly combination of letters, numbers and serials until they find the password), takes slightly longer to calculate. However, even this is considered weak by today's standards. SHA-3 is currently in the words, but not yet available.
The strongest and safest way to store passwords using Ruby would probably be using the ruby-bcrypt gem. It uses the bcrypt algorithm, which in turn uses the Blowfish encryption algorithm to generate hashes. Not only is it cryptographically sound (MD5 and SHA both have known weaknesses), but it also integrates a method that thwarts brute force attacks (a constant problem for all hashing algorithms). One of the only other algorithms out there to feature this is the yet to be finished SHA-3. So for the most modern and secure feature set, bcrypt is recommended but it's far from an industry-wide standard.
The Types of Attacks You'll Face
Imagine you run a web application with hundreds of users. Users supply a username, email address and password. Someone has broken into your web application and stolen the database with all of this information. But you were prepared and passwords are stored using cryptographic hashes. What can the attacker do to obtain the cleartext passwords?
- Dictionary Attacks - Historically, these have been the most common. Simply go through a list of dictionary words, compute hashes for each one (using the same algorithm as your password hashes, of course) and compare them. Attackers will build dictionaries of English words (or other languages, depending on the locality), names, places, etc combined with numbers, spelling variations and capitalization. Running a dictionary attack is often effective and takes just seconds.
- Rainbow Tables - Once affordable harddrives reached sizes of 100GB or so, rainbow tables became a popular option for attackers. Their operation is simple, compute the password hash for every single combination of letters, numbers and symbols and store it all on the hard drive. When you get a hash, look up that hash and see what plaintext you used to store it. Attackers are able to crack entire password databases in just a few minutes with a rainbow table. However, salts (discussed below) largely defeat rainbow tables.
- Brute Force - Often the weapon of last resort for an attacker, readily available and cheap multi-core and multi-CPU systems as well as very fast graphics cards able to compute many different hashes in parallel have made brute force attacks feasible. They don't need to pre-compute millions of hashes for a rainbow table, their computers are fast enough that they can simply try every password one could possibly type to find a match. This is currently the largest threat to password security you will face, and it only gets worse as CPUs and particularly graphics cards (which are massively parallel computers, immensely faster at tasks such as this) get faster.
There are two primary countermeasures you can use to prevent your password hashes from being cracked: the salt and rounds. Both can be used with any algorithm, and implemented with just a few lines of code.
A salt is an extra string of randomly generated characters added to the password before it is hashed. It is stored in cleartext along with the password hash. When the hash is first created, the password and the salt are hashed. So if Alice registers on your site and submits the password pass123 (an extremely weak password, by the way) your site would first generate a salt. For example 9jauq7295, any string or random characters will do (as long as each user has a different salt). It would then generate a hash of the string pass123:9jauq7295 (the colon is only there by convention) and store it in the database along with the salt it generated. Next time Alice tries to log in she submits her password, the salt is retrieved from the database, a hash is generated and compared to the hash stored in the database. If they are the same, Alice is authenticated and logs in successfully.
Salts exist to make rainbow table attacks impossible. While it may be possible to store the hashes of every combination of passwords up to 8 characters, no attacker can store the hashes of every password up to 32 characters. It would probably take more storage than is available on the planet. Using long and randomly generated salts for each user shuts rainbow tables down. It has the added benefit of masking users that use the same password. So if Bob also signs up for the site and uses the password pass123, since he'll have a different randomly generated salt his password hash will be different than Alice's, even though their password is the same.
Another countermeasure that can be used is the round. This just means to calculate the hash more than once. Calculate the first hash using the salt as you would normally do, but then calculate the hash of that hash, and then the hash of that hash and so on. While MD5, SHA-1 and the SHA-2 algorithms don't support this natively, you can implement it by calling the hashing method multiple times. The bcrypt algorithm supports multiple rounds natively, as will SHA-3 when it's released. The objective of using multiple hashing rounds is to slow brute forcing down enough that it becomes impractical.
And of course enforcing a strong password policy with your users is usually a good idea. Don't allow them to create passwords like "pass123" or "qwerty." Have a minimum length, require them to use lowercase, uppercase, numbers and possibly symbols. Though password policies are often seen as a nuisance and may not be a good idea for casual web applications. How strong of a policy to enforce is up to you.