How Random is Random?
Image credit: Isabella and Louisa Fischer on Unsplash
Sometimes I like to refer to my computer as a big on-off switch. After all, at its most basic level, that’s exactly what it is; that’s exactly what it does. A computer is really nothing more than several (tens of billions to be more accurate) transistors that exist, for practical purposes, in one of two states: on or off. These transistors exist in one of these two states according to well defined rules, based on their inputs, because if they didn’t, your computer wouldn’t behave according to any sort of logical rules and would therefore be useless as anything other than a large, expensive paperweight. If a computer always produces the same output for a given input, though, how is it that it can produce random numbers?
The short answer is that it can’t. As always, the short answer doesn’t tell the whole story, but before we get into the long answer, it would be useful to understand what kinds of things random numbers are used for in computing. Random numbers can be useful for everything from playing your MP3 collection in “shuffle” mode to choosing which packets to drop in a congested network to generating passwords and keys for cryptographic use. Looking back over that list, it’s easy to see the increasing importance of the quality of the random numbers chosen for each potential use. For example, if your MP3 player selected tracks 2, 4, 6, and 8 followed by 7, 5, 3, and 1, that wouldn’t be very random, but it would suffice for the purposes of playing your music in a new order to spice things up. On the other hand, if the keys to secure your data were chosen in a similar manner, it wouldn’t be long before an enterprising individual discovered this deficiency and walked off with your data.
In computing, we can divide random numbers into 3 classes: pseudo-random numbers, cryptographically secure pseudo-random numbers, and (truly) random numbers. Truly random numbers typically come from outside hardware devices that can convert atmospheric noise into numeric values or from measuring the timing around keyboard presses, mouse movements, or other device inputs. For the purposes of this article, however, we are interested in the remaining two classes of (pseudo) random numbers. Pseudo-random number generators (PRNGs) create random numbers in a way that is not random but that, at least at first glance, appears to be. For example, the POSIX specification page for the rand() function which is available in nearly every programming language provides this example (in C):
static unsigned long next = 1;
int rand(void)
{
next = next * 1103515245 + 12345;
return((unsigned) (next/65536) % 32768) ;
}
The above algorithm is what is known as a linear congruential generator. It can generate numbers that will pass statistical tests for randomness, but it has a several of drawbacks:
- It will always generate the same sequence of numbers in the same order.
- The sequence of numbers will eventually repeat.
- The internal state of the algorithm and therefore all future outputs can be determined from observation of (enough of) the previous outputs.
- If the internal state of the algorithm is known, then all previous outputs can be determined.
A generator of this type would be good for shuffling cards for your computer game of solitaire, but it would be catastrophically bad for shuffling cards in a poker game where money is on the line.
This is where a cryptographically secure pseudo random number generator (CSPRNG) comes in. CSPRNG’s are designed to overcome the shortcomings associated with a PRNG, namely:
- Given some number of outputs from the generator, the next output cannot be determined[i], and
- Previous outputs cannot be determined if the internal state of the generator becomes known
Many cryptographic operations like ciphers and hash functions derive their security from the fact that their outputs appear random (in the cryptographically secure sense), regardless of the input. As such they are used as the basis of several CSPRNG schemes. There are numerous schemes based on these cryptographic primitives; however, NIST has standardized some of them in its Special Publication 800-90A. The original version of this document contained four schemes: one based on hashes, one based on hash-based message authentication codes (HMAC), one based on block ciphers, and one based on elliptic curves. The last scheme in this list was eventually withdrawn due to backdoors that had been placed into it by the NSA. The block cipher variant has since been found to deliver output that is distinguishable from random data under certain conditions; however, its underlying design along with that of the hash and HMAC-based variants remain proven methods of generating cryptographically secure random data when used correctly.
All of this is to say that obtaining random data for use in data security is far more involved than simply calling whatever random number generation function is provided by your favorite programming language. It’s even more complicated when your code needs to run on different platforms with different libraries and interfaces. The Ubiq platform’s API’s can simplify and improve your application’s data security by not requiring your application to supply the random data required for cryptographic operations. Ubiq is constantly monitoring the latest research related to random number generation and use and incorporating it into the platform so that your data remains as secure as possible.
To see the Ubiq Platform in action, check out this demo.
[1] In practice, this requirement is relaxed to “cannot be determined in polynomial time” which is a fancy way of saying a “reasonable” amount of time for a computer.