

Babel - A 'new' random Unicode string generator test tool

For some time I have wanted to add surrogate pair character support to a tool I developed called GString, and this week I finally found the time to do that work and more! While developing the methods for surrogate pair support, I rewrote (refactored, in developer parlance) some of the existing methods to reduce complexity. And wouldn't you know it... the simple act of refactoring exposed some otherwise hard-to-find defects (and one pretty obvious one). I discovered these defects because I had to approach the problem space from a different perspective, and that perspective (working primarily with int types instead of char types) exposed the problems.
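
Here is a minimal Python sketch of why the int perspective matters. It is not Babel's actual code, and the function name `utf16_units_to_code_points` is hypothetical. A 16-bit char view of a string sees a supplementary character as two separate surrogate values. An int code-point view sees one value in the range U+10000 through U+10FFFF.

```python
def utf16_units_to_code_points(units: list[int]) -> list[int]:
    """Combine 16-bit UTF-16 code units into int code points.

    Hypothetical helper for illustration; raises ValueError on an unpaired surrogate.
    """
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            # Any other unit is a BMP code point on its own.
            result.append(unit)
            i += 1
    return result


# Use Python's own UTF-16 codec to obtain the 16-bit "char" view of a string.
text = "A\U0001F600"
raw = text.encode("utf-16-le")
units = [int.from_bytes(raw[j:j + 2], "little") for j in range(0, len(raw), 2)]
print([hex(u) for u in units])                                # char view: 3 units
print([hex(cp) for cp in utf16_units_to_code_points(units)])  # int view: 2 code points
```

A range check applied to each 16-bit unit would test 0xD83D and 0xDE00 separately. A check applied to the combined int tests 0x1F600, which is the value that actually identifies the character.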

So, I decided to retire the GString code base and port what I could into a new tool named Babel (and this is my shameless plug for that tool). I know it is not 'customer friendly' to rename a tool, especially one that comes with a library for test automation, because the 'customer' now has to change their references in order to use the functionality in the new DLL. However, the name Babel seems more fitting for the purpose of this tool, which is to generate random characters across the Unicode spectrum of language scripts. Besides, the Groovy language for the Java platform also has a class called GString, and I didn't want to cause any confusion. :-)

The obvious bug fixed in Babel was a problem that occurred when generating characters in the ASCII-only range. For some bizarre reason I neglected to exclude Japanese half-width katakana characters. For an even more bizarre reason I failed to find the bug, which is a really good example of why unit testing only goes so far and we really need a second set of eyes for sufficient testing. One of the less obvious defects was that the tool excluded the range of code points from U+1A20 to U+1AFF instead of U+1B80 to U+1CFF. This was a classic boundary bug! Without a formal code review, this one would probably never have been found. The other less obvious defect was that the program failed to exclude some valid Unicode code points that have not been assigned a character, even when the user chose to exclude unassigned code points (again, a problem similar to the one described above).
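
Both of the less obvious defects come down to deciding which code points a generator may return. The following is a minimal Python sketch, not Babel's implementation. The names `EXCLUDED_RANGES`, `is_excluded`, `is_unassigned`, and `random_char` are hypothetical, and the exclusion list is only illustrative.

The sketch makes three choices explicit:
- Each excluded range is stored as an inclusive (first, last) pair, which makes the boundaries easy to review.
- "Unassigned" means general category `Cn` in the Unicode Character Database bundled with the running Python, which may be a different Unicode version than the one a given tool targets.
- Candidate code points are chosen by simple rejection sampling.

```python
import random
import unicodedata

# Hypothetical, illustrative exclusion list; both bounds are inclusive.
EXCLUDED_RANGES = [
    (0xD800, 0xDFFF),  # surrogate code points are not characters on their own
    (0xFF61, 0xFF9F),  # halfwidth CJK punctuation and halfwidth katakana
]


def is_excluded(cp: int, ranges=EXCLUDED_RANGES) -> bool:
    """True if cp falls inside any inclusive (first, last) range."""
    return any(first <= cp <= last for first, last in ranges)


def is_unassigned(cp: int) -> bool:
    """True if cp has general category Cn (unassigned) in Python's bundled UCD."""
    return unicodedata.category(chr(cp)) == "Cn"


def random_char(lo: int, hi: int, exclude_unassigned: bool = True,
                rng=random, max_tries: int = 10_000) -> str:
    """Rejection-sample one character from the inclusive range [lo, hi]."""
    for _ in range(max_tries):
        cp = rng.randint(lo, hi)  # randint includes both endpoints
        if is_excluded(cp):
            continue
        if exclude_unassigned and is_unassigned(cp):
            continue
        return chr(cp)
    raise ValueError("no acceptable code point found in the requested range")


rng = random.Random(42)
print("".join(random_char(0xFF00, 0xFFEF, rng=rng) for _ in range(10)))
```

A cheap guard against the kind of boundary bug described above is a unit test that probes the first and last code point of each range, along with the code points just outside it.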

The good news is that these defects are now fixed, and the new Babel tool also includes optional support for Unicode surrogate pair characters in the range U+10000 through U+10FFFF. I also included a feature to save the output to a text file, so you no longer have to copy and paste it. The installation package includes a desktop tool, a DLL for test automation, and the user's guide, and can be found at Testing Mentor.
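
A character in the range U+10000 through U+10FFFF is stored in UTF-16 as a surrogate pair. The arithmetic is defined by the Unicode standard:
- Subtract 0x10000 to get a 20-bit offset.
- Add the top 10 bits of the offset to 0xD800 to get the high surrogate.
- Add the bottom 10 bits to 0xDC00 to get the low surrogate.

The Python sketch below (with the hypothetical name `to_surrogate_pair`) shows that calculation. It is not taken from Babel.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not in the range U+10000..U+10FFFF")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


for cp in (0x10000, 0x1F600, 0x10FFFF):
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}")
# U+10000 -> D800 DC00
# U+1F600 -> D83D DE00
# U+10FFFF -> DBFF DFFF
```

The result can be cross-checked against Python's built-in codec with `chr(cp).encode("utf-16-be")`, which produces the same four bytes.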

If you encounter any problems using the tool, or if you have any feedback, please let me know. Enjoy!