Choosing a Pseudo-Random Number Generator

A random number generator is an important part of many software for a surprising number of applications including cryptography, numerical simulation and video games. I wish to talk a little bit about my choices for the pseudo-random number generator (noted as RN, RNG, or PRNG), because it illustrates some of the issues that I have with software and documentation. In this post, a random number usually refers to a uniformly distributed discrete random number. Think of a die: each face has a equal probability of coming up. In my case, I desire a RNG that spits out numbers in the range 0-99, interpreted as percents of event occurrence.

The pseudo-random number generator lies at the heart of the combat system of the RPG I want to develop, determining who lives and who dies. Computers are by nature deterministic machines: they tend to do exactly as they are told by humans. Producing randomness, no matter how ones defines it is not a trivial problem. One way to get random numbers from computers is by clever algorithm design: “pseudo random numbers” that are “virtually indistinguishable” from “truly random numbers” can be produced this way. Such algorithms have advantages of being orders of magnitude faster than other methods.

Most programming software’s standard libraries come pre-packaged with a number of RNG. For example C/C++ has the rand() function. It is known to be a BAD RNG, see the “rand() Considered Harmful” talk by Stephan T. Lavavej, easily found on Youtube. Already, I start to be annoyed. If it is a bad RNG, why leave it implemented like this? Why not change it?

Anyhow, in the talk, the speaker recommends using the standard library: create a std::mt19937 object, inject it in the std::uniform_int_distribution object with chosen boundaries and BAM: better random numbers than rand().

Now, the mt in the std::mt19937 refers to the Mersenne Twister, a very popular RNG. It was originally developped by two Japanese researchers and published in 1998 in the scientific article: “Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator with many advantageous properties.”. Check it out, and Wikipedia for a more approachable summary.

Now this is all well and good, but a final issue still persists. The Mersenne Twister produces random integers in either the interval [0, 2^32-1] (32 bits) or [0, 2^64-1] (64 bits). And this is where I found there was a critical flaw: how does one scale the outputted RN to any range? For the purposes of my game, I do not need a random number between 0 and 2^32-1, I need a random number between 0 and 99! The previously rand() Considered Harfmul videos does present some naive attempts at scaling that are biased and should be avoided, pointing to the fact that this problem is very deep and complex. But how does one do it properly, then? It can’t be so easy! Mathematics isn’t easy, and neither are statistics or probability.

Scouring the internet, I found rumors that the standard does not specify the implementation of the “scaling” algorithm! Great! Exactly what I wish to know more about, unspecified! Then, in the GCC documentation and implementation of random, this is not properly explained. Not only is the libstdc++ documentation’s formatting horrible, but it seems to miss critical information. The source code itself is opaque and the algorithm not commented enough to help understand anything.

After more research, I did find a scientific article named “Fast Random Integer Generation in an Interval” which was exactly what I was looking for: algorithm explanation with detailed mathemagics. Not only could I implement these and test the myself, I could finally get some understanding of what (probably) is going on in the standard c++ library. All these annoyances and frustrations finally convinced me to forget the standard library and try implementing functions myself.

Even though it might be useful, I hate using software that is so badly documented or so hard to understand. Software should not be a black box, especially when dealing with complex subjects like statistics. This time, the standard library does not deserve me using it. Another third-party library and some homemade code proved more performant, efficient, and better suited my needs.

P.S. I thought about implementing the Mersenne Twister myself. I still might do it, later. But for now, I think using the TinyMT library, implemented by the RNG’s original authors to be satisfactory. Then, I used the article I used previously to test scaling algorithms myself. After some testing, I discovered the TinyMT implementation with homemade scaling algorithm was up to 5 times faster than the std library, and probably uses way less memory. It is sometimes useful to do things yourself to better suit your needs. I appreciate the C++ std library, but it lacks clear explanation about the “scaling” algorithms of the std::uniform_int_distribution family.

Leave a comment