Top Security Interview Question: Best Practices for Storing Passwords in a Database

Explained in 3 mins.

Position: junior security engineer, fresh graduate.

Let’s start the interview.

Me: “You mentioned that you have a strong grasp of data security. Could you provide some examples of specific areas within data security?"

A: "Absolutely. For instance, when we develop a system, there will be a user module involved. When we save passwords in the user table within the database, we must encrypt them prior to storage."

Me: “Are you certain it's encryption and not hashing?”

A: “Yes.”

Me: "So, where are the keys kept?"

A: “What key?”

Me: “The encryption key you used for the password— is it an asymmetric or symmetric encryption method? Additionally, does each user have a unique key, or is the key shared?”

A: "Hmm. We aren't utilizing all of these. It must be hashing, then."

Me: “Sure thing. Can you help me understand why it's necessary to hash it before storing it instead of just keeping the plain text?”

A: “Yes, because we aim to ensure safety. When we validate the password, it cannot be sent as plaintext from the UI to the server for validation.”

Me: "When registering a user, is it necessary to send both the password and confirmation password in plaintext?"

A: “Yes, but that’s the only situation where we send plaintext.”

Me: "Alright, I understand. So, you believe that storing hash values in the database helps reduce the risk of sending sensitive data as plaintext during API calls?"

A: “Exactly.”

Me: “So in your opinion, does storing the hash value also make sense in case someone breaches the database?”

A: “Oh yes. With plaintext, hackers could see the original password. Storing a hash value can stop this from happening.”

Me: "Yes, but it's not sufficient. Someone could still use a rainbow table to look up the original value from the precomputed hashes it contains. Are you familiar with the concept of salt?"

A: "Yes, I've heard about that. Using a random string for the value can enhance the security of password storage."

Me: "Great. Are you familiar with how it functions?"

A: “Not really.”

Me: "So, imagine someone managed to access the database through an SQL injection or another method. They might use a rainbow table to try and crack it. A rainbow table is precomputed from plain passwords, though, and each stored value here is the hash of the password combined with a random string, so the precomputed entries no longer match and the attacker would have to build a new table for every salt. This makes it much harder to recover the original password. Essentially, we've made it far more difficult for an unauthorized person to get hold of your 'one for all' password used for logging into your email, social media, and even bank account."

A: “I see.”

Me: “What hashing algorithm is your current system utilizing?”

A: "As we all know, md5 isn't secure, so we've decided to use alternatives like sha3."

Me: "Can you explain to me why MD5 isn't considered secure anymore? Also, why would you choose SHA3 for hashing passwords? From what I understand, SHA3 and constructions built on it, such as HMAC-SHA3, aren't specifically meant for password hashing but are general-purpose primitives used for other things like message authentication and signatures."

A: “MD5 has been proven unsafe; it has already been in the news. But what is a KDF?”

Me: "KDF stands for key derivation function. Essentially, it is an algorithm that takes a password (plus a salt) and produces a hash value or key, usually in a deliberately expensive way. It's true that there are reports saying MD5 isn't secure. But do you understand the reasons behind that?"

A: “Hmm not really.”

Me: “Basically, MD5 has weak collision resistance, and more importantly for passwords, it is extremely fast to compute, so unsalted MD5 hashes are easy targets for brute-force and rainbow table attacks.”

A: “What is rainbow table?”

Me: “A rainbow table is essentially a pre-made collection of hash chains that originate from plaintext passwords. These chains are created by repeatedly applying a hash function to an initial plaintext password and then mapping the resulting hash value to another candidate password using reduction functions. Commonly, rainbow tables are employed in password cracking attacks where an attacker tries to match hashed passwords from a compromised database against the rainbow table to identify the corresponding plaintext passwords.”

A: “I see.”

Me: “So, can you explain why you're using SHA3 for password hashing?”

A: “It’s safer than MD5 I think.”

Me: “Yes, that’s true. However, there are better hashing options for passwords. For instance, Argon2, which won the Password Hashing Competition in 2015, is a superior choice compared to SHA3. Argon2 lets you configure the number of iterations and the amount of RAM it uses, so generating a single hash deliberately costs more time and memory. This makes it significantly harder for attackers to brute-force. On the other hand, SHA3 is typically used for other applications, such as digital signatures, where balancing security and performance is crucial.”

…

We should go through each part of this interview step by step.

It's common knowledge that storing passwords as plaintext in this day and age is a big no-no.

However, it's not just about minimizing the risk of transmitting the password in plaintext during certain API calls.
The primary worry arises when databases are accessed by unauthorized individuals.
Once the credentials leak, attackers use them for "credential stuffing": trying the same username and password pairs against other services, which could include accounts for email, bank cards, social media, school, and even government services.

Why use hashing rather than encryption?

The crucial distinction is reversibility. Encryption can be undone, allowing the original plaintext password to be retrieved with the correct key. Anyone who obtains that key can recover every password, so it remains a risk even when the keys are well protected. A hash, by contrast, is one-way.
Furthermore, managing encryption keys (storage, rotation, access control) can lead to considerable expenses and intricate challenges.

Salt introduces an element of randomness to every hashed password, thereby increasing the computational difficulty for attackers attempting to break passwords with precomputed resources like rainbow tables.
Without the use of salt, attackers could easily guess passwords by matching hash values to an extensive collection of precomputed hashes.
Furthermore, a salt should be distinct for each password, not merely per user. This ensures that if two users happen to choose the same password, their hashed values will still differ because of the individual salts. This practice helps thwart attackers from easily spotting duplicate passwords.
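To make the salt argument concrete, here is a minimal Python sketch. It assumes PBKDF2 from the standard library as the password hash, and the names `hash_password`, `verify_password`, and the parameter values are made up for illustration, not taken from any particular system.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Hypothetical parameters chosen for illustration only.
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest), using a fresh random salt for this password."""
    salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


# An attacker's precomputed table covers only unsalted hashes.
precomputed = {hashlib.sha256(p.encode()).hexdigest(): p for p in ["123456", "hunter2"]}
leaked_unsalted = hashlib.sha256(b"hunter2").hexdigest()
cracked = precomputed.get(leaked_unsalted)  # a plain dictionary lookup

# Two users with the same password get different salts, hence different digests.
salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")
```

The salt is stored next to the digest and is not secret. Its job is only to make each stored hash unique, so that a table computed in advance no longer applies.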

Why not simply use SHA3, given that it offers better security than MD5? Beyond just password protection, hashing finds its application in various scenarios, each with different requirements.

Data integrity (e.g., SHA3): Hashing algorithms such as SHA3 are frequently employed to create unique fingerprints for files, helping to preserve their integrity. These algorithms must balance speed and resistance to hash collisions. Although SHA3 provides strong protection against collisions, it also delivers satisfactory performance for tasks related to file fingerprinting.
Checksum (e.g., CRC32): When speed is crucial, especially for ensuring data integrity during network transfers, algorithms like CRC32 are highly effective. CRC32 is commonly employed to rapidly detect any modifications to a file during transmission, focusing on efficiency rather than collision resistance.
Key derivation (e.g., Argon2, bcrypt, PBKDF2): When it comes to securely storing passwords or creating cryptographic keys, specialized key derivation functions (KDFs) such as Argon2, bcrypt, and PBKDF2 are highly recommended. These algorithms are designed to slow down the hashing process deliberately, adding computational cost so that every brute-force guess becomes expensive.
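The three categories above can be put side by side in a few lines of Python. This is only a sketch using the standard library; the sample data, the fixed salt, and the iteration count are placeholders picked for the demo.

```python
import hashlib
import zlib

data = b"example file contents"
tampered = b"Example file contents"  # one byte changed

# Data integrity: a SHA3-256 fingerprint (256 bits, collision resistant).
fingerprint = hashlib.sha3_256(data).hexdigest()

# Checksum: CRC32 is a 32-bit value meant to catch accidental corruption,
# not deliberate tampering.
checksum = zlib.crc32(data)

# Password storage: a deliberately slow KDF with a salt.
# The salt is fixed here only to keep the demo short; use os.urandom in practice.
salt = b"\x00" * 16
stored = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 100_000)

print(fingerprint)
print(f"{checksum:08x}")
print(stored.hex())
```

All three calls turn bytes into a short value, but only the last one is built to be expensive for an attacker to repeat.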

Although a general-purpose hash like SHA3 works well for integrity checks and digital signatures, it is designed to be fast, which is the opposite of the deliberate slowness that password hashing needs.

KDFs are essential for password hashing because they deliberately slow down the hashing process. This intentional delay is accomplished by performing many iterations of hash computations, with each round feeding on the output of the previous one. The technical specifics differ between algorithms and can be intricate, but the common idea is that the work is repeated over thousands of rounds (or more) and cannot be skipped.
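A rough way to see the effect of iterations is to time one plain hash against an iterated one. The `stretch` function below is a toy written for this post to show the "feed the output back in" idea and should not be used in production; `hashlib.pbkdf2_hmac` is the real standard-library KDF. The round count is an arbitrary assumption.

```python
import hashlib
import time


def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Toy key stretching: each round hashes the previous round's output."""
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest + salt + password).digest()
    return digest


def timed(fn) -> float:
    """Return the wall-clock seconds taken by a single call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


password = b"correct horse battery staple"
salt = b"illustrative-salt"  # fixed for the demo; real salts are random
ROUNDS = 200_000             # hypothetical work factor

one_hash = timed(lambda: hashlib.sha3_256(salt + password).digest())
toy_kdf = timed(lambda: stretch(password, salt, ROUNDS))
real_kdf = timed(lambda: hashlib.pbkdf2_hmac("sha256", password, salt, ROUNDS))

print(f"single SHA3-256:        {one_hash:.6f}s")
print(f"toy stretch x {ROUNDS}: {toy_kdf:.6f}s")
print(f"PBKDF2 x {ROUNDS}:      {real_kdf:.6f}s")
```

The exact numbers depend on the machine, but the gap is the point: a legitimate login pays the cost once, while an attacker pays it for every single guess.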

The main objectives of implementing KDFs in passwords are:

Brute-force resistance: By adding computational overhead, KDFs slow down every guess and make it much more difficult for attackers to brute-force passwords. The extra time and resources needed to compute hash values serve as a strong deterrent against quick dictionary or rainbow table attacks.
Resource consumption: Some KDFs, Argon2 in particular, also demand a considerable amount of memory in addition to CPU time while hashing. This need for extensive resources adds another layer of difficulty for attackers, who must dedicate significant computational power and time to break hashed passwords.
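The memory side can be sketched too. Argon2 is not part of Python's standard library, so this example swaps in scrypt, another memory-hard KDF that `hashlib` exposes when Python is built against a recent OpenSSL. The cost parameters are illustrative assumptions, not recommendations.

```python
import hashlib
import os

# Hypothetical cost parameters for illustration; real deployments tune these.
N = 2**14  # CPU/memory cost, must be a power of two
R = 8      # block size
P = 1      # parallelism

salt = os.urandom(16)
digest = hashlib.scrypt(b"correct horse", salt=salt, n=N, r=R, p=P, dklen=32)

# scrypt needs roughly 128 * N * R bytes of RAM for each hash it computes.
approx_memory_mib = 128 * N * R / (1024 * 1024)
print(f"approx. memory per hash: {approx_memory_mib:.0f} MiB")
```

Raising `N` increases both time and memory per guess, which is what makes large-scale parallel cracking on GPUs or custom hardware more expensive.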

A quick explanation of dictionary tables and rainbow tables.

Dictionary table. Before the advent of the rainbow table, attackers would precompute the hash of every candidate password and store the pairs in what is called a “dictionary table.” Once they gain access to the target database, they look up each stolen hash in the table to uncover the corresponding original password. This illustrates why incorporating salt is crucial: a salted hash will not appear in a table that was precomputed without that salt.
However, a dictionary table grows rapidly with the number of candidate passwords and can accumulate a massive number of rows, rendering it impractical to use.
Rainbow table. The rainbow table can be thought of as a more space-efficient form of a dictionary table, trading storage for extra computation at lookup time.
A rainbow table uses two kinds of functions. The first, represented as H, is the hash function that derives a hash value from a plaintext input. The second, denoted as R, is a reduction function that maps a hash value to another candidate plaintext. It is not an inverse of H.
The table holds many rows, with each one representing a "hash chain". Each chain begins with a plaintext and then repeatedly applies H and R. Only the first and last values of each chain are stored.
To crack a stolen hash, the attacker applies R and H to it repeatedly until the result matches a stored chain endpoint, then regenerates that chain from its starting plaintext to find the password whose hash equals the stolen one.
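Here is a toy rainbow table in Python to show how H, R, and the chains fit together. Everything in it is an assumption made for the demo: the password space is 4-digit PINs, H is unsalted SHA-256, the reduction function and chain length are arbitrary, and real tables are vastly larger.

```python
import hashlib

SPACE = 10_000  # toy password space: 4-digit PINs
CHAIN_LEN = 50  # hypothetical chain length


def H(plain: str) -> str:
    """Hash function: plaintext -> hex digest."""
    return hashlib.sha256(plain.encode()).hexdigest()


def R(digest: str, step: int) -> str:
    """Reduction: map a hash to some candidate PIN (not an inverse of H)."""
    return f"{(int(digest, 16) + step) % SPACE:04d}"


def build_table(starts):
    """Store only endpoint -> start for each chain."""
    table = {}
    for start in starts:
        plain = start
        for step in range(CHAIN_LEN):
            plain = R(H(plain), step)
        table[plain] = start
    return table


def lookup(table, target):
    """Try to find a plaintext whose hash equals target."""
    # Guess which column the target hash sits in, then walk to the chain end.
    for col in range(CHAIN_LEN - 1, -1, -1):
        digest = target
        for step in range(col, CHAIN_LEN - 1):
            digest = H(R(digest, step))
        end = R(digest, CHAIN_LEN - 1)
        if end in table:
            # Rebuild the chain from its start to locate the matching plaintext.
            plain = table[end]
            for step in range(CHAIN_LEN):
                if H(plain) == target:
                    return plain
                plain = R(H(plain), step)
    return None  # not covered by this table


table = build_table(["0000", "1234", "9999"])
secret = R(H("9999"), 0)  # a PIN that lies on the last chain
recovered = lookup(table, H(secret))
```

Three stored rows cover up to 150 PINs here, which is the space saving compared with a dictionary table. If the hashes were salted, the attacker would need a separate table for every salt value.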

Why shouldn't MD5 (or SHA1) be used for hashing password values? And if MD5 can still be applied somewhere, what are the scenarios where it's appropriate? It’s crucial to understand that MD5 and SHA1 are no longer suitable for hashing passwords. These algorithms are vulnerable to attacks, making them insecure for protecting sensitive information like passwords. MD5 is sometimes still suggested for less security-critical applications, such as checksums and data integrity verifications, but even there better options exist.

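The simple explanation: weak hash collision resistance.
In 2004, researchers demonstrated practical collisions in MD5.
SHA1 followed: a theoretical collision attack was published in 2005, and Google demonstrated a practical collision in 2017.
For password storage in particular, both algorithms are also far too fast to compute, which makes brute-forcing cheap.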
What if we consider using MD5 for tasks like fingerprints or checksums? Unfortunately, that's not advisable either. It's been shown that two distinct files can be crafted to produce the same MD5 hash, so fingerprints should use a collision-resistant hash such as SHA3. For plain checksums, CRC32 is a better choice due to its significantly faster performance.
MD5 should never be used because there are no valid use cases for it.