Hello. My name is Pengpeng Wang, and I am a new QA member on the VC++ IDE team. I joined Microsoft and the VC++ IDE team just over four months ago, and these four months have been quite an experience. It still seems like yesterday that I was struggling with my first .NET “hello world” assignment. My mentor, Alvin (yes, the Alvin Chardon who was talking about Excel), was “evil” enough to have me write my “hello world” so that it spoke four different languages and was properly localized (all the language strings are specified in the resources and shipped with the application). Then things happened pretty fast: I began to learn different UI testing technologies, fix test bugs, work on the testing infrastructure, and analyze and fix the test-result auto-analysis tool …
Today, I am going to talk about a recent project I am involved in: fuzz testing the IDE, as part of the division-wide security testing for Orcas. First, a little background on security testing. Simply put, security testing tries to detect security issues in the product, such as buffer overruns, integer overflows, memory spikes, and CPU spikes. It is critical to our customers, and I believe the following example explains why.
In 2003, the Blaster worm infected over one million computers. It exploited a security hole in the Microsoft RPC service (http://support.microsoft.com/kb/826955). The hole came down to two lines of code in RPCSS:
while (*pwszTemp != L'\\')
*pwszServerName++ = *pwszTemp++;
An attacker who controls the input (the data pwszTemp points to) can leave out the terminating backslash, overrun the destination buffer, and plant, for instance, a pointer to a malicious program. When the process later follows that pointer: boom! This helps explain why Microsoft and our division have a big push on security testing.
Fuzz testing is one effective type of security testing. It works by fuzzing the inputs (for the IDE, examples are VC++ project files (.vcproj) and resource files (.rc)), feeding the fuzzed input to the program under test, and watching for any security issues. By fuzzing the inputs, I mean randomly taking one or several variable-length portions of a valid input and replacing them with something else. This something else could be 0x00000000, 0xFFFFFFFF, !@#$, or maybe my “hello world” in four different languages ☺ The idea is that when the tested program consumes and parses the fuzzed input, it may hit another buffer overrun.

As you can imagine, this fuzzing method is not very smart (and indeed, we call it dumb fuzzing), since it randomly chooses what to fuzz and how, so the chances of detecting security issues are low. The smarter way is to understand the structure of the input and fuzz it accordingly. For example, a .vcproj file is an XML file, so it makes sense to fuzz tags and contents as separate portions. Chances are (speaking from the black-box testing point of view) this will hit different code paths and provide better test coverage. Of course, nothing prevents you from doing 20% dumb fuzzing and 80% smart fuzzing, if you feel more comfortable that way.
I am doing the fuzz testing for the .vcproj, .bsc, .rc and some other files. For example, for the .rc file, I wrote a component-based automation test that asks the IDE to open the fuzzed .rc file, parse it, and walk through all the resource items in it. I used a fuzzing tool that mutates a valid input, feeds it to the automation test, and detects security issues in the background while the automation runs. This is repeated 100,000 times. Some of my tests are still running … guess I will have to go check their status.
Visual C++ IDE Team
Could you say something about why fuzz testing parsers for source code is a good investment?
For example, other options might be to use parsers that are already extensively fuzz-tested (Microsoft certainly has plenty of XML parsers, in the case of vcproj files for example) or to ensure that the process parsing the files is never elevated, making it a less useful attack vector (there's no reason for the project system to run as administrator).
Frankly, Visual Studio seems like an unlikely avenue of attack compared with the programming systems in productivity apps or the script engines that ship with the OS for example, and if the source code is untrusted there would appear to be more subtle security issues. Unlike RPC, DevEnv is not an internet-facing system service, so the analogy doesn't seem to hold.
While we all agree that more security is a ++good thing, I'd argue from my position of ignorance that this testing doesn't (or shouldn't) increase the effective level of security, only robustness, and might not be doing even that in the most expedient way (sln files still aren't valid XML, so do I assume they have their own parser too? Why?).
I'd welcome your insight into when to, er, refine the existing code, and when to look at the bigger picture.