At Ubiq Security we focus on data security and on making it easier for developers to incorporate encryption into their applications. As part of our work, we spend time on Slack, Stack Overflow, Reddit, etc., and we see several common mistakes that cause security vulnerabilities yet are easy to resolve. While we don’t think any developer wants to make an insecure product, it is easy to understand how developers not experienced in data security might not realize the impact of grabbing some sample code from the Internet and incorporating it into their application. I often say that writing programs incorporating encryption or data security is not like other software development. Just because an application runs doesn’t mean you are done or that your application is secure.
Common Mistake #1 - Inadvertently reducing the range of a hashed value
I have lost count of how many times I have seen someone use sha256 thinking they are creating a 256-bit value stored in 32 bytes when they are actually creating a 128-bit value stored in 32 bytes.
Look at the code snippet below.
KEY = sha256(<some-value>).hex.substr(0,32)
Since the snippet begins with sha256, the developer’s intent appears to be to create a 256-bit key stored in 32 bytes. However, this code snippet takes a 256-bit value and reduces it to only 128 bits.
You may have already identified the problem, but let’s walk through it.
- sha256 produces a 32-byte value with each byte having 256 unique values, meaning 256^32 possible values (which is the same as 2^256).
- The next step in the code is encoding the results of the hash into hexadecimal, which returns a string of 64 characters, with each character having 16 unique values: a hex character [0-9a-f]. This means the string has 16^64 possible values (which is the same as 256^32), so we haven’t lost any precision yet.
- The issue is the substr(0,32), which takes only the first 32 hex characters in the string, meaning that KEY has only 16^32 possible values (which is the same as 2^128).
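The counting in the steps above can be checked directly. The following is a minimal sketch using Python's standard hashlib module; the input value and the variable names are placeholders chosen for illustration.

```python
import hashlib

# Placeholder input; any byte string behaves the same way.
digest = hashlib.sha256(b"example input").digest()   # raw bytes
hex_string = digest.hex()                            # hex encoding of the same value
truncated = hex_string[:32]                          # what substr(0,32) keeps

print(len(digest))                   # 32 bytes  -> 256 bits
print(len(hex_string))               # 64 hex characters, still 256 bits
print(len(bytes.fromhex(truncated))) # 16 bytes  -> only 128 bits survive

# The same counts expressed as numbers of possible values.
assert 256**32 == 16**64 == 2**256
assert 16**32 == 2**128
```

Note that the truncated string still occupies 32 characters, which is why it can be mistaken for a 32-byte key even though it carries only 128 bits.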
Potential Impact: Simply put, the impact is probably catastrophic. Instead of deriving a 256-bit value, the code produces only 128 bits. The following article explains the difference between 128- and 256-bit encryption keys, but for simple reference, there are more possible 256-bit values than 128-bit values by a factor of 2^128, which is roughly 3.4 × 10^38 (about a 3 followed by 38 zeros).
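For a sense of scale, the ratio between the two key spaces can be computed with Python's arbitrary-precision integers. This is only an arithmetic illustration; the variable name is arbitrary.

```python
# Ratio between the number of 256-bit values and the number of 128-bit values.
ratio = 2**256 // 2**128

print(ratio == 2**128)   # True
print(f"{ratio:.1e}")    # 3.4e+38
```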
Solution: Besides more thorough code reviews? Don’t think of keys or hashes as strings of characters. Think of them as what they are: arrays of bytes. If you need to debug, print, display, or transmit the value, convert it to the necessary form only at that time.
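Using Python as an example:
import hashlib

# Wrong: slicing the hex string keeps only 128 bits of the digest
key = hashlib.sha256(b"<somevalue>").hexdigest()[0:32]
# debug code
print(key)

# Right: keep the raw 32-byte digest and convert to hex only for display
key = hashlib.sha256(b"<somevalue>").digest()
# debug code
print(key.hex())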
Common Mistake #2 - Using a Hash rather than HMAC
This one is a little more subtle, especially to those who are new to using cryptographic algorithms. Look at the code snippet below.
signature = sha256(Key + Message)
The forum topics and related code indicate an intent to create a signature as a way to verify message authenticity and integrity. Unfortunately, most of the common hashing algorithms, such as SHA256, are vulnerable to a length extension attack which, simply stated, means:
Hash(Key + Message) can be used to derive Hash(Key + Message + extra) even if the secret Key value is not known.
Potential Impact: The receiver cannot detect that the message has been altered. An attacker can intercept a message and its signature, append data to the message, derive a valid signature for the extended message, and forward both to the receiver, who has no way to detect the modification.
Not all hash functions are subject to a length extension attack, but unfortunately, the ones that are include SHA256 and many of the other common hash functions.
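To make the attack concrete, here is a sketch of a length extension against sha256(Key + Message). It assumes the attacker knows the message, the signature, and the length of the key (which can be guessed by trial). The key, message, helper names, and the small pure-Python SHA-256 compression function are all illustrative; the compression function is needed only because hashlib does not allow resuming from a chosen internal state.

```python
import hashlib
import struct

# SHA-256 round constants (FIPS 180-4).
K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
]
MASK = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """Apply the SHA-256 compression function to one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, [a, b, c, d, e, f, g, h])]

def sha256_padding(length):
    """Padding SHA-256 appends to an input of `length` bytes."""
    return b"\x80" + b"\x00" * ((55 - length) % 64) + struct.pack(">Q", length * 8)

def extend(digest, original_len, extra):
    """Forge a digest for original + glue padding + extra without the key."""
    glue = sha256_padding(original_len)
    state = list(struct.unpack(">8I", digest))   # resume from the published hash
    total_len = original_len + len(glue) + len(extra)
    data = extra + sha256_padding(total_len)
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return glue, struct.pack(">8I", *state)

# Sender (illustrative values): signs with the vulnerable construction.
key = b"secret-key-unknown-to-attacker"
message = b"amount=100&to=alice"
signature = hashlib.sha256(key + message).digest()

# Attacker: uses only message, signature, and the key length.
extra = b"&to=mallory"
glue, forged_signature = extend(signature, len(key) + len(message), extra)
forged_message = message + glue + extra

# Receiver: the naive check accepts the forged message.
print(hashlib.sha256(key + forged_message).digest() == forged_signature)
```

The forged message contains the original padding bytes between the original message and the appended data, so whether the attack is useful in practice depends on how the receiver parses the message.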
Solution: Use the right function – HMAC
HMAC – This stands for Hash-based Message Authentication Code which is used to prove message authenticity and that the message hasn’t been altered between a sender and receiver. Fortunately, the HMAC function is explicitly designed to produce a signature and is not susceptible to a length extension attack, regardless of the hashing algorithm.
An HMAC typically has three inputs: a Key, a Message, and a hashing algorithm. The key and message are combined and processed with the hashing algorithm to produce the message signature. As long as the Key value remains secret, it is computationally infeasible to alter the message or forge a signature without detection.
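Using Python as an example:
import hashlib
import hmac

# Wrong: a plain hash of Key + Message is open to length extension
signature = hashlib.sha256(Key + Message).digest()

# Right: HMAC keyed with Key (Key and Message are bytes)
signature = hmac.new(Key, Message, hashlib.sha256).digest()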
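As a fuller sketch of how the key and message are combined, the block below builds HMAC-SHA256 by hand following the construction in RFC 2104 and compares it with Python's hmac module. The function names, key, and message are hypothetical and exist only for illustration; in real code, use the library function rather than the manual version.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks

def manual_hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Illustrative HMAC-SHA256: two nested hashes with padded keys."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()      # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")        # short keys are zero-padded
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

def verify(key: bytes, message: bytes, signature: bytes) -> bool:
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

key = b"shared-secret"                 # hypothetical key
message = b"amount=100&to=alice"       # hypothetical message
signature = hmac.new(key, message, hashlib.sha256).digest()

print(manual_hmac_sha256(key, message) == signature)        # True
print(verify(key, message, signature))                      # True
print(verify(key, message + b"&to=mallory", signature))     # False
```

The outer hash is what defeats length extension: the attacker never sees the inner digest as a resumable state for the keyed outer computation. Comparing with hmac.compare_digest avoids leaking information through timing differences.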
Common Mistake #3 - Inadvertently allowing hash data collisions
Let’s face it, one of the great features of a good secure hashing algorithm is collision resistance. This means it is extremely unlikely that two different inputs will produce the same hashed output. However, hashing algorithms are deterministic, meaning that for the same input, they will always produce the same output.
Consider this code snippet:
hash(<customer-name> + <customer-id>)
On the surface, this looks fine. The customer supplies their name, and maybe we force all customer names in the DB to be unique. The customer-id comes from a database sequence and is guaranteed to be unique, so the developer may incorrectly assume that the result of the hash will be unique as well. Unfortunately, because the customer provides one of these values and the two values are concatenated without a delimiter, the customer has some control over the hash input and can potentially induce hash collisions. All of the following statements produce identical results even though each customer-name and customer-id pair is distinct.
hashlib.sha256(b"acme-tech" + b"1234").digest()
hashlib.sha256(b"acme-tech1" + b"234").digest()
hashlib.sha256(b"acme-tech123" + b"4").digest()
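The collision is easy to confirm. This short sketch hashes the three name and id pairs from the statements above and counts the distinct digests; the variable names are illustrative.

```python
import hashlib

# Three distinct (customer-name, customer-id) pairs from the example above.
pairs = [
    (b"acme-tech", b"1234"),
    (b"acme-tech1", b"234"),
    (b"acme-tech123", b"4"),
]

# Each pair concatenates to the same bytes, b"acme-tech1234".
digests = {hashlib.sha256(name + cust_id).hexdigest() for name, cust_id in pairs}

print(len(set(pairs)))   # 3 distinct inputs
print(len(digests))      # 1 distinct hash
```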
Potential Impact: We should all remember that we live in a world where an attacker’s goal is not always to get your data. Sometimes they just want to disrupt your business or prevent you from accessing your own data. If a data collision causes your system to break and they can induce this condition, then they have succeeded. Don’t make it easier for them.
Solution: Developers need to be very careful when hashing values where the user has any direct control over even one part of the hash input. Simply replacing the customer name with a UUID stored in the DB would solve this issue. The UUID may be known to the customer, but it isn’t under their control, and because every UUID has the same length, the boundary between the two values is unambiguous. The risk of an attacker-influenced hash collision is avoided.
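# Wrong: customer_name is user-controlled and variable-length (all values are bytes)
value = hashlib.sha256(customer_name + customer_id).digest()
# Right: customer_uuid is assigned by the system and fixed-length
value = hashlib.sha256(customer_uuid + customer_id).digest()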
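A slightly fuller sketch of the UUID approach follows. It assumes the UUID is generated by the application with Python's uuid module and that the customer id is an integer; the function name customer_hash and the sample values are hypothetical.

```python
import hashlib
import uuid

def customer_hash(customer_uuid: uuid.UUID, customer_id: int) -> bytes:
    """Hash a system-assigned UUID together with the customer id."""
    # UUID.bytes is always exactly 16 bytes, so the split between the two
    # fields is fixed and cannot be shifted by user-supplied input.
    return hashlib.sha256(customer_uuid.bytes + str(customer_id).encode()).digest()

# Hypothetical customer record: the UUID is generated server-side.
customer_uuid = uuid.uuid4()

print(len(customer_uuid.bytes))                                               # 16
print(customer_hash(customer_uuid, 1234) == customer_hash(customer_uuid, 1234))  # True
print(customer_hash(customer_uuid, 1234) == customer_hash(customer_uuid, 234))   # False
```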
Although this article focuses on some common mistakes and their relatively simple solutions, it easily could have listed many more that are much less obvious or more difficult to resolve. You may start to get an appreciation of how even a slight mistake or oversight can introduce data security vulnerabilities into your application. The Ubiq Platform is designed to avoid the complexity and mistakes that are related to data encryption. Ubiq constantly monitors the latest research related to cryptographic algorithms, including encryption and hashing, and any newly discovered vulnerabilities, and it incorporates the newest information into the platform, ensuring your data remains as secure as possible.
To see the Ubiq Platform in action, check out this demo.