Random test data generation

05/30/2007

I am not a big fan of static test data, so I wrote an article, published in this month's issue of Software Testing and Performance magazine, outlining one approach for generating random string data (the basic concepts can also be used for generating other types of random data).

Unfortunately, it appears that some of the numbers got a little screwed up, and the printer did not superscript the exponents correctly, so the numbers in the third paragraph probably look pretty strange. So, to clarify, the paragraph should read:

Using only the characters 'A' - 'Z', the total number of possible character combinations for a filename with an 8-letter base name and a 3-letter extension is 26⁸ + 26³, or 208,827,082,152. If we were assigned to test long filenames on a Windows platform using only ASCII characters (see Table 1), the number of possibilities increases because there are 86 possible characters we can use in a valid filename or extension; with a maximum base filename length of 251 characters and a 3-character extension, the total is 86²⁵¹ + 86³. Trust me, that is one big number.

(NOTE: There have been several assertions regarding the above formula for determining the number of tests, so here is the explanation. Essentially, the Windows file system treats the base filename and the file extension as 2 separate components, and there is no interaction or dependency between these two components. (For example, we cannot save a file as CON.txt, but we can save a file as myFile.CON.) Since there are no dependencies between the base filename component and the extension component, they are treated as 2 independent parameters. Mathematically, this results in 26⁸ + 26³, or 208,827,082,152 tests, if we elect to test all possible combinations of the base filename component with a nominal valid extension, and then test all possible extension combinations with a nominal valid base filename. One could argue that we could combine the 17,576 unique 3-character extension combinations with various combinations of the 8-character base filename component to reduce the overall number of tests by 17,576; however, I choose not to use that approach and instead test each parameter independently. If we mistakenly assumed a dependency or inter-relationship between the base filename and extension components of a filename on the Windows platform, testing all combinations (26⁸ * 26³, or simply 26¹¹) would result in 3,670,135,659,905,624 redundant tests (if we could do exhaustive testing). This is where in-depth knowledge of the ‘system’ really pays off.)
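
The arithmetic in the note is easy to check with a few lines of Python. This is only a quick sketch; the variable names are mine and are not from the article.

```python
# Alphabet size and component lengths for the 8.3 filename example ('A' - 'Z' only).
ALPHABET = 26
BASE_LEN = 8
EXT_LEN = 3

base_names = ALPHABET ** BASE_LEN   # every 8-letter base filename
extensions = ALPHABET ** EXT_LEN    # every 3-letter extension

# Components tested independently: each one is varied against a nominal partner.
independent = base_names + extensions

# Components (mistakenly) treated as dependent: every base name with every extension.
all_pairs = base_names * extensions

# Extra tests that the dependent model adds over the independent model.
redundant = all_pairs - independent

print(f"{independent:,}")  # 208,827,082,152
print(f"{all_pairs:,}")    # 3,670,344,486,987,776
print(f"{redundant:,}")    # 3,670,135,659,905,624
```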

Of course, the filename length and extension length are variable. Also, 251 characters assumes a base filename component length from the root directory (it does not take into account the MAX_PATH constant). So, the total number of combinations using only ASCII characters is much greater, because the count for every possible base filename length with a 'default' 3-letter extension from the root directory is actually 86²⁵¹ + 86²⁵⁰ + 86²⁴⁹ + 86²⁴⁸ + 86²⁴⁷ ... + 86¹. Then, of course, vary the length of the extension and the total number of combinations increases even further. But all this is only to give some sense of the magnitude of the testing problem.
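
To get a feel for that sum, Python's arbitrary-precision integers can evaluate it exactly. The sketch below assumes the figures used above (86 usable characters, base filename lengths from 1 to 251); the names are illustrative only.

```python
# Figures taken from the text above.
CHARSET_SIZE = 86
MAX_BASE_LEN = 251

# Sum of 86^k for every base filename length k from 1 to 251.
total = sum(CHARSET_SIZE ** k for k in range(1, MAX_BASE_LEN + 1))

# The same value from the closed form of a geometric series:
# r + r^2 + ... + r^n = (r^(n+1) - r) / (r - 1)
closed_form = (CHARSET_SIZE ** (MAX_BASE_LEN + 1) - CHARSET_SIZE) // (CHARSET_SIZE - 1)

assert total == closed_form
print(f"The total has {len(str(total))} decimal digits")
```

Note that the longest length dominates: the shorter lengths combined add only a little over 1% to 86²⁵¹.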

Also, the equivalence class table (Table 2) is simplified and does not include reserved device names. For example, Windows will/should prevent a user from saving a file named LPT1, COM6, CON, etc. (The behavior for saving filenames composed of reserved device names is different on Windows XP and Windows Vista. Vista finally got it right!)
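
If you generate random filenames, you will probably want to put reserved device names into their own equivalence class. Here is a minimal sketch of such a check. The list of names is the commonly documented set and is my assumption, not something taken from Table 2, and `has_reserved_base_name` is a hypothetical helper, not a Windows API.

```python
# Assumed list of reserved device names (commonly documented set, not from the article).
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def has_reserved_base_name(filename: str) -> bool:
    """Return True if the part before the first '.' is a reserved device name."""
    base = filename.partition(".")[0]
    # Device names are matched case-insensitively here.
    return base.upper() in RESERVED_DEVICE_NAMES

print(has_reserved_base_name("CON.txt"))     # True
print(has_reserved_base_name("myFile.CON"))  # False
print(has_reserved_base_name("lpt1"))        # True
```

Only the base filename component is checked, which matches the CON.txt versus myFile.CON example above.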

Unfortunately, I did not get a chance to read the edited copy before it went to print, but I think the basic idea comes through. I hope you find value in using intelligent random test data in your testing, and I would be interested in hearing your feedback.
