
By now, you've probably heard that LinkedIn's passwords have
been allegedly compromised. I first heard about this from a
Norwegian website earlier today. Here is what we know now:
- LinkedIn has not confirmed the leak and currently doesn't
understand how the hack could have happened, but there is a 271 MB
file of alleged SHA-1 hashes floating around with LinkedIn's name
on it.
- The hash digests are unsalted SHA-1 hashes.
Technical mumbo jumbo is to follow, if you're just worried about
your password you should simply change your password. Don't use any
website that offers to check if your password has been leaked by
typing your password into their website. These may be completely
legit, but they're probably trying to ride on your fear and steal
your password. You should also change your password on any other
site that you used that password on.
If you'd like to see the list of hashes for research purposes
you can download it
here (warning: 171 MB).
This is particularly interesting to for a number of reasons.
Right now I'm in Sofia, Bulgaria teaching a room full of developers
secure coding best practices and just yesterday we had talked about
proper handling of passwords and other sensitive data. We walked
through the spectrum of poor password handling practices:

Obviously, worst is on the left to best on the right.
To understand the risk let's walk through a quick rundown of
what hashing is and how LinkedIn is storing your passwords.
Hashing
is a one-way mathematic function that can take any input and map it
to a smaller output (digest). The nice thing about hashing is that
if you use the same input you are guaranteed to get the same output
or digest. If the input changes, even in the slightest bit, your
digest will change drastically. Another nice thing about hashing is
that the digest doesn't give you any information about the input,
so it is not possible to reverse a hash.
We can use hashing algorithms to make it easier to safer to
store passwords. Instead of storing the plaintext password in the
database, which everybody agrees is bad, right? We can
store the digest of the password in the database. Then when you
type in your password I'll calculate the hash of what you typed in
and compare it with what I have on file for you. If they match then
you're in, if they don't I'll kick you out!
There are multiple ways, or algorithms, to "hash" text, some are
better than others, only "cryptographic" hash functions should be
used for anything security related, so I'll only talk about those
here. Again, we have a spectrum of worst to best (hint, if you're
thinking about using a hashing algorithm that isn't on this list...
don't):
MD5 <-> SHA-1 <-> SHA-256224 <->
SHA-512/384
The first two are broken and should not be used. Note, LinkedIn
used one of the broken ones.
Hopefully as you're reading this you're thinking "Joe! You said
earlier that if give only the output of a hash you can't get back
to the original text, so if LinkedIn was only storing the hash
digest of my password I should be safe, right!?"
Ah true! However, that's assuming I attack the hashes by trying
to break the hash algorithm directly. I've got a few quick
computers, and a bit of time. I also know a bit about how people
choose passwords, and how bad they are at it. So instead of trying
to break the hashing algorithm, I'm going to simply get a list of
every password I can get my hands on and calculate the output for
each one. I'll keep track of the password on the left and the
output I generate on the right. Next I'll simply lookup each digest
from the list in the massive table I just generated and get the
passwords from there.
This type of attack is called a lookup table, a more efficient
version that is slightly more difficult to explain is the Rainbow Table
attack and it theoretically will work for any password, it can take
a lot of work for really long and/or really random passwords
because there are so many combinations. This is why I mentioned
I'll just calculate the digests for the top passwords I already
know about. This way I don't have to do an exhaustive search.
OK, so now maybe you're getting nervous, because your password
of LvBieber isn't looking so great right now.
Since you used upper and lower case letters there are 52
combinations in each of the 8 locations above. That means there are
528 or 53,459,728,531,456 combinations. "Wow," you think
"that's a lot of combinations!" Slow down there, tiger. On my,
mostly, regular computer I can calculate about 680 million
possibilities per second. I can crack your password (and all the
other 52 trillion passwords in about 22 hours (Before lunch, if I
get a few friends to help).
What could LinkedIn have done to protect you from your own poor
password choice? Well, they could have required a Password Policy,
but everybody seems to hate those. They could have also added Salt. No, not
that salt, this Salt.
In software we call a chunk of random data that we add to
passwords "salt." Since your password is so easily guessable it's
likely it already exists in somebody's Rainbow table so the lookup
would be really quick and easy. We want to make them work for it.
So for each user I generate, say, 10 extra random characters to add
to each password. This means I generate some random characters
"7%bKeVm!fN" and add that to your password turning it into
LvBieber7%bKeVm!fN If I do this for every user the
hacker has to generate a rainbow table for each user independently.
Since I have to store the salt in plaintext along side your
password hash, since I'm the only one that knows it and I have to
use it to generate your digest to validate your password. Well
that's better than plaintext or just plain hashing, right? I bet we
can do better though, right? Of course we can.
There are a few key
derivation algorithms, PBKDF2, scrypt, and bcrypt, that add an
element of "work" to calculating a digest. The idea is that it's
not such a big deal for a website to spend half a second
calculating your password digest each time you login, but if an
attacker can only calculate two passwords per second that makes any
rainbow table attack infeasible. PBKDF2 does this by hashing the
output of the previous digest thousands of times, bcrypt and scrypt
use some fancy cryptography to do this a bit more elegantly, but
the result is about the same. Both bcrypt and PBKDF2 have been well
implemented in many languages.
One thing to note. Don't ever try to
implement your own cryptographic solutions. Never, ever, ever,
ever, should you attempt this. Cryptographers are smarter than you
and me and everybody I know put together. They spend all day
thinking about these things, and even then it takes a team years to
create new secure algorithms that is then peer reviewed and proven
to be secure. It does not happen with a clever insight after four
cups of coffee and an espresso.
Edit: A few comments and edits that my friends
at Reddit have pointed out. I suppose this turned into a bit more
of a technical article than I was originally intending, so I may
have skipped over a couple of things that I would have otherwise
touched on.
- I should have mentioned explicitly that everything but PBKDF2,
bcrypt, and scrypt should be considered bad practice. Encoding is
no better than plaintext, at all. Encoding does not use keys and
does not provide security.
- When I say encrypted, I mean that you are trying to encrypt the
password such that it can be decrypted later, which is a bad
practice. We want a one-way function such that no date can ever be
recovered. Keys can be compromised, and there is no benefit to
using a reversible algorithm for passwords, it just increases your
risk.
- The reason why I consider encrypting a little bit
better than hashing is the massive popularity of rainbow tables and
hash databases available on the
internet. Poor passwords don't stand a chance against this if the
salt isn't used.
- I didn't (and don't) want to get into the specifics of how
PBKDF2, bcrypt and scrypt work, but they all increase the amount of
"work" a computer has to do to create the same hash digest. This
slows you you when verifying a user's password, but also slows the
attacker down from generating their tables which is good! Work can
be introduced purely by CPU usage, or in the case of scrypt memory
usage as well.
- All hashing algorithms will fail. Full Stop. Currently, if you
are designing a new system we recommend using SHA256 or greater. MD5 is
considered completely broken, SHA1
is considered well on its way out. If you have a system that
uses either MD5 or SHA1 consider making changes as soon as
possible.
Security Innovation, the company I work for, is hiring! If you
aren't completely and utterly excited about the job you currently
have come work for us. We have some of the best perks in the
industry and you'll be having as much fun as you've ever had at
work.
To Apply send an e-mail to jobs@securityinnovation.com
or try your hand at our challenge website at http://challenge.si.vc if you get
stuck on the challenge just send an e-mail to the e-mail address
above and we'll give you a hint. The challenge is supposed to be
fun, so have fun with it!
Discuss on Reddit