Modern online services (websites and platforms) do not allow unlimited login attempts, as that has long been recognized as a vulnerability: only a small number of wrong passwords (typically 3 to 5) can be tried before the account is locked for some time, making direct brute-force attacks infeasible. This, however, does not protect a platform against being compromised. When attackers break into a system and extract the password database, entire blocks of credentials (usernames, passwords, and possibly other sensitive information) are exfiltrated: that is a data breach.
Hashing and rainbows
Data breaches are frequent and often of catastrophic proportions. To protect users' credentials from a breach, it is standard practice not to store passwords in cleartext, or better, not to store passwords at all. Instead, when a username/password pair is created or updated in a system, the password is passed through a hash: a one-way function that maps data to a fixed-length value. Under the assumption that the chosen hash function is collision-resistant (that is, it is computationally infeasible to find two different inputs that produce the same output), the platform database stores just the hash, which reveals nothing about the original password except that it matches when the password is passed through the hash function again.
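As a minimal sketch of the fixed-length and repeatable behaviour described above, the snippet below hashes two inputs of very different lengths. SHA-256 is used here only because it ships with Python's standard library; it is a fast general-purpose hash chosen for illustration, not a recommendation for password storage, and the helper name sha256_hex is an assumption of this example.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Return the SHA-256 digest of a password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


short_digest = sha256_hex("abc")
long_digest = sha256_hex("a much longer passphrase with many words")

# Both digests have the same length (64 hex characters, i.e. 256 bits),
# regardless of how long the input was.
print(len(short_digest), len(long_digest))

# Hashing the same input again yields the same digest.
print(sha256_hex("abc") == short_digest)
```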
Once attackers have a copy of the password database, they can attempt to crack it with a dictionary of known passwords, or by systematically generating every possible candidate string, which is commonly known as a brute-force attack. As the passwords are hashed, however, both attacks require computing the hash of every candidate in real time. Since a good password hash function is deliberately slow (a very desirable property, as the cost of a slow function is negligible during normal authentication), this can take too long to be practical.
To circumvent this problem, the attacker may rely on rainbow tables: very large, precomputed tables that cache the output of cryptographic hash functions. Rainbow tables are fast because they spare the attacker most of the hash computation at attack time; the trade-off is their size (tens of gigabytes or more), but that has become less of a problem as storage has become a cheap commodity and large media are now mainstream.
With rainbow tables, an attacker can quickly perform a reverse lookup from a hashed value and obtain the corresponding password. This is possible because a hash function is deterministic: for the same function, a given password will always result in the same hash.
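The reverse lookup can be illustrated with a deliberately simplified sketch. It builds a plain precomputed dictionary from hash to password, which is the simplest form of the idea; real rainbow tables use chains of hashes and reduction functions to trade storage for computation, and that machinery is not shown here. The candidate list, the usernames, and the leaked records are all hypothetical.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """Hash a password with no salt (the weak scheme being illustrated)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Precomputation: done once, ahead of time, and reusable against any
# database that stores unsalted hashes from the same function.
candidates = ["123456", "password", "qwerty", "letmein"]
lookup = {unsalted_hash(p): p for p in candidates}

# Hypothetical leaked rows: username -> unsalted hash.
leaked = {
    "alice": unsalted_hash("letmein"),
    "bob": unsalted_hash("Xk9#vT2q!mP"),
}

# Attack time: no hashing at all, just dictionary lookups.
recovered = {user: lookup.get(digest) for user, digest in leaked.items()}
print(recovered)  # {'alice': 'letmein', 'bob': None}
```

Only the password that appears in the precomputed list is recovered; the lookup itself costs almost nothing, which is what makes the approach attractive when hashes are unsalted.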
Salt and pepper
Rainbow table attacks can be thwarted by the use of a salt: a fixed-length, cryptographically strong random value that is used as an additional input to the hash function, concatenated to the beginning or the end of the password. As each password has its own salt, the function produces a different hash for every stored credential, even when two users chose the same password.
This technique prevents a rainbow table attack because the table would have to be recomputed for each salt, which makes precomputation infeasible as long as each password has a unique salt.
Salts are stored in cleartext along with hashes and usernames in the server database. At login time the system looks up the username, appends the salt to the provided password, calculates the hash, and authenticates the user if the result matches the stored hash.
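The registration and login steps just described could be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: it uses PBKDF2 from Python's standard library, which takes the salt as a separate parameter instead of literally concatenating it to the password; the in-memory users dictionary stands in for the server database; and the iteration count is an illustrative value, not a tuned recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor (assumption), tune for real use
SALT_BYTES = 32

# Stand-in for the server database: username -> (salt, hash).
users: dict[str, tuple[bytes, bytes]] = {}


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    """Generate a fresh random salt and store it in cleartext next to the hash."""
    salt = os.urandom(SALT_BYTES)
    users[username] = (salt, hash_password(password, salt))


def login(username: str, password: str) -> bool:
    """Look up the salt, recompute the hash, and compare in constant time."""
    record = users.get(username)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, hash_password(password, salt))


register("alice", "correct horse")
register("bob", "correct horse")

# Same password, different salts, therefore different stored hashes.
print(users["alice"][1] != users["bob"][1])
print(login("alice", "correct horse"), login("alice", "wrong guess"))
```

Note that a new salt is generated on every call to register, so a user who re-registers or updates to the same password still ends up with a different stored hash.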
According to the OWASP guidelines, the robustness of this solution depends on two factors:
- Salts must be unique for each stored credential.
Using the same salt for all passwords would largely defeat the purpose of salting, as two identical passwords with the same salt still result in the same hash, and a single precomputed table would cover the whole database. Similarly, salts tied to the user rather than to the credential would be insecure, because if a user recycled the same password after a password update, the resulting hash would be the same.
- Salts must be cryptographically strong random data, 32 or 64 bytes long.
A longer salt increases the computational cost of attacking the passwords and, at the same time, the space required to store rainbow tables.
It is interesting to observe that this scheme does not rely on hiding, encrypting, or obfuscating the salt, which is stored in cleartext. That is because the purpose of salting is to make precomputed attacks such as rainbow tables ineffective, not to keep a secret. Salts will be exfiltrated along with the hashes in case of a data breach, and knowing them does not weaken the hashes; it does, however, leave the attacker free to attack each hash individually with a dictionary or brute force.
For this reason, an additional layer of protection, called a pepper, is often implemented.
The pepper is similar to the salt but it has two key differences:
- the pepper is shared between all stored passwords, rather than being unique like a salt
- the pepper is not stored in the database, unlike the salts.
While salts have no requirement for secrecy and are commonly stored alongside hashes, the pepper will be stored separately to keep it secure in case of a database breach.
According to the OWASP guidelines, the pepper should be randomly generated and at least 32 characters long; due to its sensitive nature, it should be stored in a configuration file with restricted permissions, in the secure storage APIs provided by the operating system, or, even better, in a Hardware Security Module (HSM).
Two methods are commonly used to implement the pepper. In the simplest case, it is used similarly to the salt, by concatenating it to the password before hashing. A more secure option is to hash the passwords as usual and then encrypt the hashes with a symmetric encryption key before storing them in the database, with the key acting as the pepper; this second method allows the pepper to be rotated if it is compromised.
As noted by Wikipedia, "by including pepper in the hash, one can have the advantages of both methods: uncrackable passwords so long as the pepper remains unknown to the attacker, and even if the pepper is breached, an attacker still has to brute force the hashes".
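The simpler of the two pepper methods described above (concatenating the pepper to the password before hashing) might look like the sketch below. Several things here are assumptions made for illustration: the environment variable name PASSWORD_PEPPER is hypothetical, the random fallback exists only so the example runs on its own and would be wrong in production (the pepper must stay the same across restarts), and PBKDF2 with an illustrative iteration count stands in for whichever slow hash a real system uses. The encryption-based method is not shown.

```python
import hashlib
import hmac
import os
import secrets

ITERATIONS = 100_000  # illustrative work factor (assumption)


def load_pepper() -> bytes:
    """Read the shared pepper from outside the database.

    PASSWORD_PEPPER is a hypothetical variable name. The random fallback
    is for this demo only.
    """
    value = os.environ.get("PASSWORD_PEPPER")
    if value is None:
        return secrets.token_bytes(32)
    return value.encode("utf-8")


def hash_with_pepper(password: str, salt: bytes, pepper: bytes) -> bytes:
    """Concatenate the pepper to the password, then apply the salted hash."""
    peppered = password.encode("utf-8") + pepper
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, ITERATIONS)


pepper = load_pepper()
salt = os.urandom(32)  # per-credential salt, stored in the database
stored = hash_with_pepper("correct horse", salt, pepper)

# The legitimate server knows the pepper, so verification succeeds.
server_ok = hmac.compare_digest(
    stored, hash_with_pepper("correct horse", salt, pepper)
)

# An attacker holding the database (salt and hash) but not the pepper
# cannot confirm even a correct password guess.
guessed_pepper = secrets.token_bytes(32)
attacker_ok = hmac.compare_digest(
    stored, hash_with_pepper("correct horse", salt, guessed_pepper)
)

print(server_ok, attacker_ok)  # True False
```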
Password exposure is caused by bad practice
While the methods described above provide a reasonable level of security for storing passwords and other data, unfortunately they are not always implemented correctly (and oftentimes some of them are not implemented at all). A glance at the list of the most notable website breaches on HaveIBeenPwned shows that the greatest damage has been suffered by sites that did not follow the guidelines and best practices for data security, despite these having been well established for more than two decades. It is quite safe to say that when passwords and sensitive user data are exposed, it comes down to negligent implementation and lack of care by the companies that we trust with our data.