Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
Last time we were considering what happens if an attacker gets access to your server's password file. If the passwords themselves are stored in the file, then the attacker's work is done. If they're hashed and then stored, and the hash algorithm is strong, then there's not much to do other than to hash every string and look through the password file for that hash. If there's a match, then you've discovered the user's password.
You don't have to look through the vast space of strings in alphabetical order of course. An attacker will start with a dictionary of likely password strings. We want to find some way to make that attacker work harder. Setting a policy which disallows common dictionary words as passwords would be a good idea. Another technique is to spice up the hashes a bit with some salt.
For every user name we generate a random unique string of some fixed length. That string is called the “salt”. We now store the username, the salt and the hash of the string formed by concatenating the user’s password to the salt. If user Alpha's password is "bigsecret" and the salt is "Q3vd" then we'll hash "Q3vdbigsecret".
Since every user has their own unique random salt, two users who happen to have the same password get different salted hashes. And the dictionary attack is foiled; the attacker cannot compute the hashes of every word in a dictionary once and then check every hash in the table for matches anymore. Rather, the attacker is going to have to re-hash the entire dictionary anew for every salt. A determined attacker who has compromised the server will have to mount an entire new dictionary attack against every user’s salted hash, rather than being able to quickly scan the list for known hashes.
Salting essentially makes it less feasible to attack every user at once when the password file is compromised; the attacker must start a whole new attack for each user. Still, given enough time and weak passwords, an attacker can recover passwords.
In this system the client sends the username and password to the server, the server appends the password to the salt, hashes the result, and compares the result to the salted hash in the table.
This answers the original question posed by the JOS poster; the salt can be public because it is just a random string. Ideally, both the salt and the salted hash would be kept private so that an attacker would not be able to mount a dictionary attack against that salt. But there is no way to deduce any information whatsoever just from the salt.
And of course, it's better to not get into this situation in the first place -- don't allow your password list to be stolen! But it's a good idea for a security system to not rely on other security systems for its own security. We call this idea "defense in depth". You want to make the attacker have to do many impossible things to compromise your security, so that if just one of those impossible things turns out to be possible after all, you're not sunk.
But what about the fact that the password goes over the wire in the clear, where anyone can eavesdrop? That's now the weak point of this system. Can we do something about that? Tune in next time and we'll see what we can come up with.
Some real balls being talked about MD5
Salting only strengthens against rainbow table attacks - not brute force attacks. With a brute force attack a binary salt of say 32 bits increases the search space by 32 bits... true... but thats not going to stop a hacker.
Now heres where I open a whole can of worms.
Conventional wisdom says that rainbow tables are the threat. Thats balls. A REAL hacker will brute force because its FASTER than a rainbow table lookup. Yes! I said it's faster than a rainbow table ... And yes, right now you're probably all calling me an idiot. Thats fine - I'm used to that.
The difference here is that you're thinking in a Von Neumann architecture. Offload the task to an FPGA and the situation flips upside down. Each potential hashed password (with or without SALT) requires only one run through the functional 64 blocks of the MD5 algo. Due to the way the algo is constructed we get most of the processing for free (ROL's are free, and the ABCD rearranging is free - since in an FPGA we do this with ONLY the signal path, no gate delay)
Not only this, by pipelining, we can have 64 passwords in various stages of processing at the SAME TIME in a single MD5 pipeline. And we can fit perhaps two or four pipelines per FPGA device. The result is that the device is so much faster at cracking unsalted passwords than a rainbow table lookup... AND it fits in a matchbox.
Now, since we are doing the entire block in one 64th of a stage, the SALT adds no additional overhead to round processing speed. We can search a few trillion hashes per second with or without SALT... so, instead, we're talking about a longer search and not a slower search... and since FPGA's are cheap and the whole process is scaleable we can build a device based on a 4x4 array of FPGA's and farm out the task across each. Indeed, there is no limit to how fast we can search the space.
Then you can stack those 4x4 blades and add to them as funds permit. For not a great deal of capital you can crack 32bit salted hashes without any problem - in a matter of days. And the interface ? Well, we cheated ... we seed the entire thing through a four wire JTAG to set up the ranges for each device on the boundary cells before starting her up... takes about 60 seconds to start up a 64x FPGA device ....
After starting it up you get a latency of around 1500 ns before the pipelines fill up and the first hashes are spat out... then one every 20ns for each pipeline. (Thats MD5 128 pipelines handling over 8000 MD5's simultaneously and comparing 128 hashes every 20 nanoseconds). Seriously, salted hashes are no problem at all.
For the helpless, 128 hashes per 20ns is around 6.4 bn per second. Thats a cool 23,040,000,000,000 per hour. And once you factor a useable alpha/num/symbol set into it... well, it stretches much further than you'd think.
The reason for this scary performance is that the MD5 algorithm fits a logic based design really well with no iterative steps. Also, there are no multiplication or division (Which are costly in logic) and the adds, well, towards the end of the block most of them are redundant and reduce to single hardwired constant adds. Essentially theres a whole lot of XORing and fixed ROTATES ... and of course, a fixed rotate is a wiring problem and not a logic problem, so those carry no gate delay. we're left then with XORs and ADDs.
With the ADDS using a carry lookahead the stage timing is real fast, and of course XOR's are very cheap in terms of added gate delay.
The people who say MD5 is reasonably safe if salted tend to be looking at the world through Von Neumanns glasses... and lets face it, Von Neumann engines, even the most modern multi-cored processors, are slow and a very inefficient use of transistors. Yes, they may play half-life well... they may produce beautifully rendered landscapes in Bryce... but when it comes to raw crunching you can pay less money and crank out a a few million hashes in the time it takes a PC to stagger through round one (of sixty-four) of the first hash.
Now, the next argument I'm bound to hit is 'But who has time and skills to use FPGAs' ... easy. Hackers. Not the little script kiddies hanging out waiting for published exploits and coveting other peoples PoC code. The REAL hackers. The ones that ain't scared off by apparent complexity but are attracted by it.
And as for governments and large corporations. Well, they can move that same FPGA design over to a much faster ASIC process... crank out a hundred thousand of the buggers, and crack SALTED MD5 in realtime faster and cheaper than a cluster of fully popped Crays can even dream of.
But the great thing about my FPGA cluster is that it can crack RC4 one day, WEP the next and MD5 the day after... and then I can close the case and take it with me. Try that with a Cray!
So sorry, but all your hash are belong to us!
Anon (Posted via TOR, So, IP me!)
A recent question I got about the .NET CLR's hashing algorithm for strings is apropos of our discussion