The Random Facts of Coding (or so it seems)

Fernando Vicaria

What is a Random Number?

Just so we can set the tone of what I have planned for this blog I decided to start with a very quick definition and then let you think while I prepare the next one...

 

A random number can not be predicted in advance. Thus, we can define only what a random number is not, not what it is. Random numbers can be produced by physical processes, such as throwing a dice, flipping a coin or counting intervals between radioactive decay events. By itself, software can't generate truly random numbers; instead, it creates what are called pseudo random numbers, starting from a single random seed.

 

To generate a random sequence, we start with a certain seed, and then we iteratively apply some mathematical transformation to it (see Listing 1 below), progressively extracting a random sequence. A sequence is considered random when no prediction can be made about it and no simple description can be found. But from the simple fact that we generate this sequence from a definite transformation, then a description of it will always exist.

 

 

long Seed = a;

 

long  _Random()

{  

    Seed = Seed * b + c;

    return Seed >> d;

}

 

Listing 1: A simple implementation for the LCRNG algorithm.

 

The values of a, b, c and d are fixed constants.

The formula is simple and deterministic and will always result in the same sequence of numbers as long as you don’t modify the values of the constants.

 

The second line in the body of the function is optional and will only shift the result by a specified number of bits to narrow the result to a fixed size. For instance you could shift the result by 56 bits to force the values returned to always be between 0 and 255 (or use anyother type with different sizes for that matter).

 

This type of algorithm is known as Linear Congruential Random Number Generator or LCRNG for short. It consists of three or more fixed constants, usually large prime numbers, which have certain mathematical properties.

 

Bad Seeds

 

In a general way, a bad seed is one that can be guessed after a relatively small number of tries or in a short interval of time. If a hacker (or an Adversary, to use a cryptography term) can predict the value of the constants used in your random algorithm, for instance the values of a, b, c and d from Listing 1, the seed can be computed and the whole scheme is compromised.

 

A typical example of this is some earlier implementations of SSL key generation in web browsers. They would encrypt the message in a large sequence of random numbers.

Some of those seeds generators would use the time of the day as a means of obtaining the values for the constants used in their algorithm.

The basic algorithm can usually be reversed-engineered using a disassembler and the time of the day can be known to within a certain precision often better than a second, which means that there are only one million possible choices for the microsecond part (and often a lot less because the clocks on most computer systems do not have a true microsecond resolution). See Table 1 for the resolution of some Windows API functions.

 

Function

Unit

Resolution

Now, Time, Timer

seconds

1 second

GetTickCount

milliseconds

10 ms

TimeGetTime

milliseconds

10 ms

QueryPerformanceCounter

milliseconds

< 1 ms

Table 1: Different time accuracy offered by Windows.

 

The values shown in Table 1 do not take into account the overheads of calling those functions. For example, it takes 5 to 10us (micro seconds) to call QueryPerformanceCounter, because it has to do several port I/O instructions to read the computer’s clock counter.

 

Note: If you decide to use QueryPerformanceCounter remember that it doesn't run at the same rate on all machines,

 

Another good example of a flawed seed generator algorithm is one that uses the process id (or PID) of an application. You should never consider PID’s a secret. These PID’s are usually present in message packets from the system.

 

You can see that generating a good random number is very hard, and as Donald Knuth once said, “A random number generator should not be chosen at random”. A good random process is not just a complicated or an obscure procedure. Another famous quote says “compromise of the system should not inconvenience the correspondents”. That means the system should remain secure even if all the details about its inner workings, down to the actual source code, are known to all.

 

Entropy = Randomness

 

In general the importance of generating a good sequence of random numbers is a bit more relaxed when we are working with mathematical or scientific applications given that we only need to fool our simulations or modeling programs into thinking that true random numbers have been used. This approach is not possible (or at least not wise) when writing security applications (to fool a strong-willed hacker or the NSA you need to do a lot better than simply shuffling your numbers).

 

Pseudo random numbers as the name itself implies are not really random. They just look random. As we saw earlier they are pseudo (or knowable) because they come from a mathematical function.

 

To the increase the randomness of our selected numbers we should collect as much entropy as possible before using a system. Anything that is determined by external factors can be and should be used as input, such as the time between keystrokes, the timing of disk interrupts, number of network packages arrived, and the like. On a multi-user system, the number of page faults, the number of disk read/writes, the time to wait until eligible to get the next time slice, and other such information that depends on the overall activity in a manner that is hard to predict or precisely control.

 

Typically a cryptographically sound “message digest”, such as MD5 or SHA (the Secure Hash Algorithm) is computed over a entropy pool and used as your next random number or seed.

 

It’s a good idea to have a tenth to a fifth as much entropy as the length of the key. For example if you are generating a 1024-bit key, it’s a good idea to have between 100 and 200 bits of entropy. More of course is always better but as you would expect it’s also slower. Sources of entropy usually carry a significant overhead.

 

Note: By one bit of entropy I mean the equivalent of a true random event with two, and only two, possible outcomes (such as a coin-flip for example).

 

There are many ways to get true random numbers. Some methods include making hardware devices that generate noise, observing cosmic ray flux, and observing light emissions from trapped mercury atoms. They're great in theory and in some sorts of practice, and there are some very high-quality random generator chips out there but most of the time we have to work with existing equipment.

 

Here is a list of ways to get entropic numbers:

 

  • User input: A few examples are time between keystrokes, mouse movements etc. Users are among the most entropic (meaning unpredictable) things there are. The obvious drawbacks are that it takes time to get entropic responses from the user, and if you have no user (in the case of a server), this doesn't help. Nonetheless, if you do have one, you should use user inputs, as these are the hardest for the adversary to acquire or spoof.
  • Hardware: If you have no user, you have to use your hardware for a stream of entropy. This is annoying, but not impossible. Some methods you might use are: clock, hard disk, video etc. Most of these devices will give you a source of nearly true random data if you know how/where to get them.
  • Network: There are all sorts of things on a network that are unpredictable. The downside of these methods is that they are also accessible to other people as they are to you.

Microsoft Cryptography API

 

Starting from PIII processors Intel has added support for a Security Driver that provides software applications the ability to access the Firmware Hub's hardware Random Number Generator. This RNG is based on sampling the thermal noise in resistors. It uses SHA-1 as mixing function for the outputs. It also runs some FIPS 140-1 autotests.

 

A number of Microsoft Operating Systems are currently supported by Intel and you can take advantage of this feature using the Microsoft’s Cryptography API.

 

To start using this RNG in your programs you will have to first download the driver from Intel and then read the directions presented in the CryptoAPI documentation in the Platform SDK. 

 

Some people consider this the ultimate solution. Others are afraid because all the entropy comes from one source (albeit a good one).

 

Note: In the SDK you can find some good example on how to use the CryptoAPI.

 

I will stop now and leave the testing of randomness for the next posting when we will look into makes a sequence of numbers a good or a bad apporximation of a "random" sequence.

 

Further Reading

 

• Using and Creating Cryptographic - Quality Random Numbers:

www.merrymeet.com/jon/usingrandom.html
• random.org:

http://www.random.org/essay.html.
• Generating a Truly Random Number by Leif Svalgaard:

http://cobolreport.com/columnists/leif/
• Use Query Performance Counterto Time Code:

http://support.microsoft.com/default.aspx?scid=http://support.microsoft.com:80/support/kb/articles/Q172/3/38.asp&NoWebContent=1
• The Cryptography API, or How to  Keep a Secret  by Robert Coleridge:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncapi/html/msdn_cryptapi.asp
• Intel Security Driver:

http://www.intel.com/design/software/drivers/platform/security.htm

Published Wednesday, October 06, 2004 4:41 PM by fvicaria

Comments

 

damien morton said:

A good paper of random numbers can be found here:

"Random Generators and Normal Numbers"
David H. Bailey and Richard E. Crandall
http://crd.lbl.gov/~dhbailey/dhbpapers/bcnormal-em.pdf

"We call a real number b-normal if, qualitatively speaking, its base-b digits are “truly random.” For example, in the decimal expansion of a number that is 10-normal, the digit 7 must appear 1/10 of the time, the string 783 must appear 1/1000 of the time, and so on.

It is remarkable that in spite of the elegance of the classical notion of normality, and the sobering fact that almost all real numbers
are absolutely normal (meaning b-normal for every b = 2, 3, . . . )"
October 6, 2004 5:50 PM
 

asciiarmor.com Blog » Blog Archive » java.util.UUID mini-FAQ said:

December 7, 2006 9:34 PM
 

The Random Facts of Coding or so it seems What is a Random Number | Paid Surveys said:

May 29, 2009 6:58 PM
 

The Random Facts of Coding or so it seems What is a Random Number | Portable Greenhouse said:

June 1, 2009 7:20 AM
 

The Random Facts of Coding or so it seems What is a Random Number | home lighting said:

June 13, 2009 9:53 PM
 

The Random Facts of Coding or so it seems What is a Random Number | garden decor said:

June 19, 2009 3:07 AM
Anonymous comments are disabled

© 2010 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker