My name is Joe Basirico, I help people build secure software. Learn more »

What LinkedIn Should Have Done with Your Passwords

6/6/2012 - Posted by joe

LinkedIn Logo

By now, you've probably heard that LinkedIn's passwords have been allegedly compromised. I first heard about this from a Norwegian website earlier today. Here is what we know now:

  • LinkedIn has not confirmed the leak and currently doesn't understand how the hack could have happened, but there is a 271 MB file of alleged SHA-1 hashes floating around with LinkedIn's name on it.
  • The hash digests are unsalted SHA-1 hashes.

Technical mumbo jumbo is to follow, if you're just worried about your password you should simply change your password. Don't use any website that offers to check if your password has been leaked by typing your password into their website. These may be completely legit, but they're probably trying to ride on your fear and steal your password. You should also change your password on any other site that you used that password on.

If you'd like to see the list of hashes for research purposes you can download it here (warning: 171 MB).

This is particularly interesting to for a number of reasons. Right now I'm in Sofia, Bulgaria teaching a room full of developers secure coding best practices and just yesterday we had talked about proper handling of passwords and other sensitive data. We walked through the spectrum of poor password handling practices:

Plaintext <-> Encoded <-> Hashed <-> Encrypted <-> Hashed with Salt <-> bcrypt/PBKDF2

Obviously, worst is on the left to best on the right.

To understand the risk let's walk through a quick rundown of what hashing is and how LinkedIn is storing your passwords.

Hashing is a one-way mathematic function that can take any input and map it to a smaller output (digest). The nice thing about hashing is that if you use the same input you are guaranteed to get the same output or digest. If the input changes, even in the slightest bit, your digest will change drastically. Another nice thing about hashing is that the digest doesn't give you any information about the input, so it is not possible to reverse a hash.

We can use hashing algorithms to make it easier to safer to store passwords. Instead of storing the plaintext password in the database, which everybody agrees is bad, right? We can store the digest of the password in the database. Then when you type in your password I'll calculate the hash of what you typed in and compare it with what I have on file for you. If they match then you're in, if they don't I'll kick you out!

There are multiple ways, or algorithms, to "hash" text, some are better than others, only "cryptographic" hash functions should be used for anything security related, so I'll only talk about those here. Again, we have a spectrum of worst to best (hint, if you're thinking about using a hashing algorithm that isn't on this list... don't):

MD5 <-> SHA-1 <-> SHA-256224 <-> SHA-512/384

The first two are broken and should not be used. Note, LinkedIn used one of the broken ones.

Hopefully as you're reading this you're thinking "Joe! You said earlier that if give only the output of a hash you can't get back to the original text, so if LinkedIn was only storing the hash digest of my password I should be safe, right!?"

Ah true! However, that's assuming I attack the hashes by trying to break the hash algorithm directly. I've got a few quick computers, and a bit of time. I also know a bit about how people choose passwords, and how bad they are at it. So instead of trying to break the hashing algorithm, I'm going to simply get a list of every password I can get my hands on and calculate the output for each one. I'll keep track of the password on the left and the output I generate on the right. Next I'll simply lookup each digest from the list in the massive table I just generated and get the passwords from there.

This type of attack is called a lookup table, a more efficient version that is slightly more difficult to explain is the Rainbow Table attack and it theoretically will work for any password, it can take a lot of work for really long and/or really random passwords because there are so many combinations. This is why I mentioned I'll just calculate the digests for the top passwords I already know about. This way I don't have to do an exhaustive search.

OK, so now maybe you're getting nervous, because your password of LvBieber isn't looking so great right now. Since you used upper and lower case letters there are 52 combinations in each of the 8 locations above. That means there are 528 or 53,459,728,531,456 combinations. "Wow," you think "that's a lot of combinations!" Slow down there, tiger. On my, mostly, regular computer I can calculate about 680 million possibilities per second. I can crack your password (and all the other 52 trillion passwords in about 22 hours (Before lunch, if I get a few friends to help).

What could LinkedIn have done to protect you from your own poor password choice? Well, they could have required a Password Policy, but everybody seems to hate those. They could have also added Salt. No, not that salt, this Salt.

In software we call a chunk of random data that we add to passwords "salt." Since your password is so easily guessable it's likely it already exists in somebody's Rainbow table so the lookup would be really quick and easy. We want to make them work for it. So for each user I generate, say, 10 extra random characters to add to each password. This means I generate some random characters "7%bKeVm!fN" and add that to your password turning it into LvBieber7%bKeVm!fN If I do this for every user the hacker has to generate a rainbow table for each user independently. Since I have to store the salt in plaintext along side your password hash, since I'm the only one that knows it and I have to use it to generate your digest to validate your password. Well that's better than plaintext or just plain hashing, right? I bet we can do better though, right? Of course we can.

There are a few key derivation algorithms, PBKDF2, scrypt, and bcrypt, that add an element of "work" to calculating a digest. The idea is that it's not such a big deal for a website to spend half a second calculating your password digest each time you login, but if an attacker can only calculate two passwords per second that makes any rainbow table attack infeasible. PBKDF2 does this by hashing the output of the previous digest thousands of times, bcrypt and scrypt use some fancy cryptography to do this a bit more elegantly, but the result is about the same. Both bcrypt and PBKDF2 have been well implemented in many languages.

One thing to note. Don't ever try to implement your own cryptographic solutions. Never, ever, ever, ever, should you attempt this. Cryptographers are smarter than you and me and everybody I know put together. They spend all day thinking about these things, and even then it takes a team years to create new secure algorithms that is then peer reviewed and proven to be secure. It does not happen with a clever insight after four cups of coffee and an espresso.

Edit: A few comments and edits that my friends at Reddit have pointed out. I suppose this turned into a bit more of a technical article than I was originally intending, so I may have skipped over a couple of things that I would have otherwise touched on.

  • I should have mentioned explicitly that everything but PBKDF2, bcrypt, and scrypt should be considered bad practice. Encoding is no better than plaintext, at all. Encoding does not use keys and does not provide security.
  • When I say encrypted, I mean that you are trying to encrypt the password such that it can be decrypted later, which is a bad practice. We want a one-way function such that no date can ever be recovered. Keys can be compromised, and there is no benefit to using a reversible algorithm for passwords, it just increases your risk.
    • The reason why I consider encrypting a little bit better than hashing is the massive popularity of rainbow tables and hash databases available on the internet. Poor passwords don't stand a chance against this if the salt isn't used.
  • I didn't (and don't) want to get into the specifics of how PBKDF2, bcrypt and scrypt work, but they all increase the amount of "work" a computer has to do to create the same hash digest. This slows you you when verifying a user's password, but also slows the attacker down from generating their tables which is good! Work can be introduced purely by CPU usage, or in the case of scrypt memory usage as well.
  • All hashing algorithms will fail. Full Stop. Currently, if you are designing a new system we recommend using SHA256 or greater. MD5 is considered completely broken, SHA1 is considered well on its way out. If you have a system that uses either MD5 or SHA1 consider making changes as soon as possible.

Security Innovation, the company I work for, is hiring! If you aren't completely and utterly excited about the job you currently have come work for us. We have some of the best perks in the industry and you'll be having as much fun as you've ever had at work.

To Apply send an e-mail to jobs@securityinnovation.com or try your hand at our challenge website at http://challenge.si.vc if you get stuck on the challenge just send an e-mail to the e-mail address above and we'll give you a hint. The challenge is supposed to be fun, so have fun with it!

Discuss on Reddit