Code Download: Hash Winforms project (10 kb)
The web attacks against sites like Gawker, the Sony Playstation Network, various Sony sites, and even the FBI-affiliated Infragard made some big headlines. Just this month, Anonymous released passwords of BART officers online.
My frustration with these attacks was how quickly the attackers were able to post users’ credentials online. Recovering those passwords should have taken much longer, and quite possibly should have been impossible, even with stolen data.
Your users, whether they should be or not, might be using the same user name and password on several key sites like Facebook, Twitter, their bank (checking or credit card), or a government site. Maybe your state’s DMV is creating user accounts now and someone could request a new ID card using the name and password reused on a simple blog.
Defense In Depth
This post and its follow-ups do not claim to be the last word on web site security. They are merely the beginning: I am going to take a small piece of this larger world and explore it with you. You should still be defending your servers, your databases, and your code using all means appropriate for your needs.
Today’s Example and Assumptions
For the purposes of today’s story, we are going to assume the database with all of your users’ login credentials (username and password) has been copied and stolen. I’m making this assumption because, just in the last year, we’ve seen passwords recovered again and again, so the raw data is clearly being stolen.
The thief is now staring at your database in, let’s say, SQL Server Management Studio. Do you want the thief to see this?
Even if the data were encrypted, what if your encryption key were stolen in the attack? And even if it wasn’t, how long before it can be cracked, giving the thief access to all of the plain passwords in one fell swoop? My suspicion is that, in the recent cases where passwords appeared on the Internet quickly, the passwords were sitting there in plaintext. I could be wrong about this: a clever attacker facing weak encryption (or even weak usage of strong encryption) might recover the key in a day or a week, making the whole database of credentials plain to read and post for all to see.
Don’t Store The User’s Password
At the very least, we can make that credential database so cumbersome as to be nearly worthless. We don’t need to store the user’s actual password and we don’t even need to store that password with reversible encryption. When you think about it, you don’t actually care what a user’s password is. All you care is that you can prove the user is likely who they say they are.
We can do this by storing hashes of the user’s password. This is the result of running a password through a one-way cryptographic hash function and storing what comes out the other side. Technically, the result of using a one-way hash function is called a “digest”, but many people call them “hashes” or “hash values”.
Let me add a disclaimer before I go on: you must do more than just hash the password and store the result. I want to explain the technique first so you understand why it works, but don’t stop reading here and go implement it. There is more to think about than just hashing the user’s password.
Let’s look at some code to see what I mean. If you haven’t downloaded the Winforms Hash project linked to at the top of the article, do so now. There is not much code here at all, but it is the key demonstration of how you can calculate a hash value from a password, or any plain text, using built-in .NET classes and methods.
When you run the sample, you should see the dialog below. You can enter a password and just click Generate Hash to see a result of running it through either the SHA1, SHA256, or SHA512 functions. Leave the Salt field blank for now.
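If you don’t have the sample handy, the same experiment can be sketched in a few lines. This is a minimal sketch in Python rather than the .NET code from the download; the function name `hash_password` and the UTF-8 encoding are my assumptions, not something taken from the sample project.

```python
import hashlib

def hash_password(password: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of the UTF-8 encoded password (no salt yet)."""
    hasher = hashlib.new(algorithm)           # e.g. "sha1", "sha256", "sha512"
    hasher.update(password.encode("utf-8"))   # hash functions work on bytes
    return hasher.hexdigest()

# Show the digest each algorithm produces for the same input.
for name in ("sha1", "sha256", "sha512"):
    print(name, hash_password("password", name))
```

Notice that the digest length depends only on the algorithm, never on the length of the password you typed.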
Cryptographic hash functions are designed to produce hash values that have lots of entropy and few meaningful collisions. “Lots of entropy” means the output looks random, so it is highly unlikely for a hash value to be “easy to guess” like “AAAAAAAAAAAAAAAA”. “Few meaningful collisions” means that it should be extremely hard or impossibly time consuming to figure out some alternate plaintext that produces the same hash value.
But notice, first and foremost, I now have a target hash value I can use to verify your password. I no longer care about your password!
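To make that concrete, here is a rough sketch of a login check that only ever compares digests. It is written in Python as a stand-in for the .NET sample, and the names `digest` and `verify` are hypothetical. It is deliberately still unsalted, which is not yet good enough to deploy.

```python
import hashlib
import hmac

def digest(password: str) -> str:
    """Hex SHA-256 digest of the password (unsalted, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def verify(candidate: str, stored_digest: str) -> bool:
    """True if the candidate password hashes to the stored digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest(candidate), stored_digest)

# At sign-up we keep only the digest; the password itself is discarded.
stored = digest("s3cret!")

print(verify("s3cret!", stored))  # correct password
print(verify("S3cret!", stored))  # wrong password
```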
Still Not Good Enough
So I could change my credential database to look like this:
This is a positive step, because we no longer have the actual password stored. Without any other tools, a thief would have to start a brute force attack. There might be a piece of software that starts guessing all password combinations until one of them produces a hash in this table. Finally, your users might help you out if they use complex passwords, which increases the time it takes to search for a match. It could take days or months to recover the first password, and if you’ve detected the attack, you have time to alert your users before any of their passwords leak out to the Internet… and the nightly news.
However, a clever thief knows there are only so many solid hash functions in the world, and they might decide to precompute a database table with lots of possible hash values computed from plaintext. Then, when your database is taken, they might just scan against that table.
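We commonly call such a table a Rainbow Table. (Strictly speaking, a rainbow table is a space-saving variant of this idea that stores chains of hashes rather than every plaintext and hash pair, but the effect for the defender is the same.)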
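The precomputation attack can be sketched with an ordinary dictionary. This Python sketch uses a tiny, made-up word list and made-up user rows. A real rainbow table is far larger and stored more compactly, so treat this as the simplest possible lookup-table version of the idea.

```python
import hashlib

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Built once, ahead of time, by the attacker (hypothetical word list).
common_passwords = ["123456", "password", "letmein", "qwerty"]
precomputed = {sha1_hex(p): p for p in common_passwords}

# Rows from a stolen credential table holding unsalted SHA-1 hashes.
stolen_rows = [
    ("alice", sha1_hex("letmein")),
    ("bob", sha1_hex("Tr0ub4dor&3")),
]

def recover(rows, table):
    """Map each user to the recovered password, or None if not in the table."""
    return {user: table.get(hash_value) for user, hash_value in rows}

print(recover(stolen_rows, precomputed))
```

Each recovery is one dictionary lookup. No hashing happens at attack time, which is exactly why the precomputed approach is so fast.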
Now, creating your own rainbow table, with all possible password combinations for an algorithm like SHA1, could take months or years. If you wanted to cover all of the possible “keyspace” for 8 or 12 or 16 character passwords, it could take more computing power than a thief has access to. That should make you feel a little better… that is, until you consider the Internet.
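A little arithmetic shows why covering the whole keyspace is so expensive. The sketch below, in Python, assumes passwords drawn from the 95 printable ASCII characters and a purely hypothetical attacker who computes one billion hashes per second. Both numbers are my assumptions for illustration, not measurements.

```python
PRINTABLE_ASCII = 95          # assumed alphabet size
SECONDS_PER_YEAR = 365 * 24 * 3600

def keyspace(length: int, alphabet_size: int = PRINTABLE_ASCII) -> int:
    """Number of distinct passwords of exactly this length."""
    return alphabet_size ** length

def years_to_exhaust(length: int, guesses_per_second: float) -> float:
    """Years needed to hash every password of this length once."""
    return keyspace(length) / guesses_per_second / SECONDS_PER_YEAR

for length in (8, 12, 16):
    years = years_to_exhaust(length, 1e9)   # hypothetical hash rate
    print(f"{length} chars: {keyspace(length):.3e} passwords, {years:.3g} years")
```

Every extra character multiplies the work by the alphabet size, so the jump from 8 to 12 to 16 characters is enormous.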
This is just one of many places where you could download complete rainbow tables that cover a lot of the passwords your users might choose. What should scare you is that with a rainbow table, the time to recover a password drops from a long-running brute force attack to a single lookup against an indexed table.
So it isn’t enough to just hash the password and store that. We need to make these rainbow tables as useless as possible.
A Dose of Salt
What you can do is add a random value, called a salt, to the password before hashing it. The salt is generated once for each user and does not change afterward. You save both the hash value and the salt. When the user logs back in, you retrieve the salt, add it to the password they typed, and verify you still get the same result.
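The store-then-verify round trip might look something like this. It is a Python sketch rather than the .NET sample; the 16-byte salt, the choice of SHA-256, prepending the salt to the password bytes, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

def create_credential(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store for a new user."""
    salt = secrets.token_bytes(16)   # random, generated once per user
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def verify(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare digests."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = create_credential("hunter2")
print(verify("hunter2", salt, stored))   # correct password
print(verify("hunter3", salt, stored))   # wrong password
```

Only the salt and the digest would go into the credential table. The password itself is never written anywhere.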
This is a mitigation against Rainbow Tables because the rainbow table was precomputed with hashes created without any additional data (the salt). The attacker would be forced back into a brute forcing technique, even if they were looking at the database below.
In this example, the salt is stored alongside the password hash, and there’s no question that an attacker can start the brute force process of testing every possible password with the salt added. However, that might take days or months. And when that process is finished, the attacker will have one credential. The process doesn’t scale because each user has a different salt.
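Here is a small Python sketch of why the attack stops scaling. Two hypothetical users pick the same weak password, yet their stored hashes differ, and a table precomputed without salts matches neither of them. The user names, word list, and salt size are made up for the example.

```python
import hashlib
import secrets

def salted_sha256(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# Attacker's table, precomputed without any salt.
precomputed = {
    hashlib.sha256(p.encode("utf-8")).hexdigest(): p
    for p in ("password", "letmein", "qwerty")
}

# Both users choose "letmein", but each gets a unique random salt.
users = {}
for name in ("alice", "bob"):
    salt = secrets.token_bytes(16)
    users[name] = (salt, salted_sha256("letmein", salt))

# Look up every stored hash in the precomputed table.
hits = [name for name, (_, stored) in users.items() if stored in precomputed]

print(users["alice"][1] == users["bob"][1])  # same password, different hashes
print(hits)                                  # users cracked via the table
```

To crack either account, the attacker has to redo the guessing work with that user’s salt, and none of that work carries over to the next user.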
Just make sure you generate the salt from what we call “cryptographically strong random numbers”. That means don’t use the System.Random class for this. .NET provides the RNGCryptoServiceProvider class in the System.Security.Cryptography namespace to help.
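The same distinction exists in most platforms. As a Python analogy (not the .NET API itself), the `secrets` module draws from the operating system’s cryptographic random source, while the general-purpose `random` module is a seeded generator whose output can be reproduced. The helper names below are hypothetical.

```python
import random
import secrets

def new_salt(num_bytes: int = 16) -> bytes:
    """Salt from the OS cryptographic random source (the kind you want)."""
    return secrets.token_bytes(num_bytes)

def predictable_value(seed: int) -> int:
    """What NOT to use: a seeded general-purpose generator."""
    return random.Random(seed).getrandbits(128)

print(new_salt().hex())
# Anyone who learns or guesses the seed gets the very same "random" value.
print(predictable_value(42) == predictable_value(42))
```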
Test for yourself!
You can use the Winforms project posted at the top of this entry to test out the idea of hashing a password with some strong random salt. It prevents attackers who have stolen your database from posting your users’ credentials immediately and from using rainbow tables to figure out the passwords quickly.
If you are implementing your own password storage and verification, take a look at the code in the Winforms Hash project and learn more about using salted hashes as an alternative to storing passwords that you can decrypt. You don’t actually need to know your users’ passwords. You just need to know they are correct when they log in!