A lot of people have asked Greg and me about automated testing. This is an important subject for me since I feel like it's the most important part of my job. I'm a smart guy and I know a lot about software development, so it's clearly not the best use of my time to click on the same button looking for the same dialog box every single day. Part of smart testing is delegating those kinds of tasks away so I can spend my time on harder problems. And computers are a great place to delegate repetitive work.

That's really what automated testing is about. I try to get computers to do my job for me. One of the ways I describe my goal as a tester is to put myself out of a job - meaning that I automate my entire job. This is, of course, unachievable so I don't worry about losing my job. But it's a good vision. My short term goal is always to automate the parts of my job I find most annoying with the selfish idea of not having to do annoying stuff any more.

With people new to automated testing, that's always how I frame it. Start small: pick an easy, menial task that you have to do all the time, then figure out how to have a computer do it for you. This compounds nicely, since getting rid of that first task frees up time to automate more and more annoying, repetitive tasks. With all that reclaimed time you can go and focus on testing more interesting parts of your software.

That last paragraph makes it sound like writing automated tests is easy, when in fact it's typically quite hard. There are some fundamentally hard problems in this space, and those are what I'm going to talk about most here. Lots of test tools try to help out with these problems in different ways, and I highly recommend finding a tool that works for you. Unfortunately, all the tools I use here at Microsoft are custom-built internal tools (it's one of the advantages of a huge development team: we have people whose full-time jobs are developing test tools for us), so I can't really recommend specific tools. But I will talk about the two major parts of an automated test, the major problems involved, and some of the general ways to solve those problems. Hopefully it will be valuable as a way to better understand automated testing and as a way to help choose your own test tools. As a side note, implementing automated tests for a text-based or API-based system is really pretty easy, so I'm going to focus on a full UI application - which is where the interesting issues are.

Automated testing can be broken into two big pieces: driving your program and validating the results.

Driving Your Program

This concept is pretty basic: if you want to test that pushing the spell check button starts a spell check session, your test needs some way to push the spell check button. But execution can be much trickier.

As far as I know there are really just three ways to do this. You can override the system and programmatically move the mouse to a set of screen coordinates, then send a click event. You can directly call the internal API that the button click event handler calls (or an external API if your software provides it, we use these a lot in Office testing.) Or you can build a crazy system that hooks into the application and does it through some kind of voodoo magic.

I'll admit that the systems I use are the last one. I don't understand exactly how they work, hence the vague description. They're the best way to do it from a testing standpoint, but hard to build. The other two options have serious drawbacks.

Calling into the API is good because it's easy. Calling an API function from your test code is a piece of cake, just add in a function call. But then you aren't actually testing the UI of your application. Sure, you can call the API for functionality testing, then every now and then click the button manually to be sure the right dialog opens. Rationally this really should work great, but a lot of testing exists outside the rational space. There might be lots of bugs that happen when the user goes through the button instead of directly calling the API (don't ask me, more voodoo magic, but I have lots of real life experience that backs me up here.) And here's the critical part - the vast majority of your users will use your software through the UI, not the API. So those bugs you miss by just going through the API will be high exposure bugs. These won't happen all the time, but they're the kind of things you really don't want to miss, especially if you were counting on your automation to be testing that part of the program.

I don't want to discount working through the API. I do it a lot since it's an easy way to exercise the functionality of the program. Just don't get dependent on it. Remember that if your automation is working this way you're getting no testing coverage on your UI. And you'll have to do that by hand.

Simulating the mouse is good because it's working the UI the whole time, but it has its own set of problems. The real issue here is reliability. You have to know the coordinates that you're trying to click beforehand. This is doable, but lots of things can make those coordinates change at runtime. Is the window maximized? What's the screen resolution? Is the start menu on the bottom or the left side of the screen? Did the last guy rearrange the toolbars? These are all things that will change the absolute location of your UI. And I'm not even getting into trying to predict where a dialog window will pop up (it's typically where it was when it last closed, and who knows what the last guy was doing.)

The good news is there are tricks around a lot of these issues. The first key is to always run at the same screen resolution on all your automated test systems (note: there are bugs you could be missing here, but we won't worry about that now - those are beyond the scope of your automation anyway.) I also like to have my first automated test action be maximizing the program (hint: sending the key sequence alt-<space>-x to the program will do this.) This takes care of most of the big issues, but small things can still come up.

The really sophisticated way to handle this is to use relative positioning. If your developers are nice they can build in some test hooks for you so you can ask the application where it is (or get an HWND and ask the OS yourself.) This even works for child windows: you can ask a toolbar where it is, and the dialog-positioning problem I mentioned is easily solved the same way. If you know that the 'file -> new' button is always at (25, 100) inside the main toolbar, it doesn't matter if the application is maximized or if the last user moved all the toolbars around. Just ask the main toolbar where it is and tack on (25, 100) - then click there.
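The relative-positioning idea can be sketched in a few lines. This is a minimal illustration, not a real test hook: `window_origin` is a hypothetical stand-in for asking the application (or the OS, via an HWND) where a window is.

```python
# Sketch of relative positioning. window_origin() is a hypothetical
# stand-in for a test hook that reports a window's top-left corner
# in screen coordinates.

def window_origin(window):
    """Ask where the window currently is (stubbed with a dict here)."""
    return window["x"], window["y"]

def click_point(window, offset_x, offset_y):
    """Absolute screen coordinates for a control at a fixed offset
    inside the given window, no matter where that window has moved."""
    x, y = window_origin(window)
    return x + offset_x, y + offset_y

# The 'file -> new' button is always at (25, 100) inside the main toolbar.
toolbar = {"x": 400, "y": 50}           # wherever the last user left it
print(click_point(toolbar, 25, 100))    # -> (425, 150)
```

If the toolbar moves, only `window_origin` returns something different; the offset stays valid.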

So this has an advantage over just exercising the APIs, since you're using the UI the whole time - but it also has a disadvantage: it's a lot of work.

No one method will be right for all the cases you're trying to automate. Think hard about the goals of each automation case and choose the right way to drive your application to meet those goals (to back this up, I've written many specs for test case automation.)

Results Verification

So you've figured out the right way to drive your program, and you have this great test case, but after you've told your program to do stuff you need to have a way to know if it did the right thing. This is the verification step in your automation, and every automated script needs this.

You have three options. You can fake it, do it yourself, or use some kind of visual comparison tool.

Faking verification is great. I don't mean that you just make up the answers, I'm talking about making some assumptions about the functionality of your program and the specific functionality this automated test is checking (once again, having a well defined scope is critical.) For example when I was writing automation for the spelling engine in Visio I wrote a test that typed some misspelled text into a shape: “teh “. This should get autocorrected to “the “. But it's hard to programmatically check if “the “ was correctly rendered to the screen. Instead, I went and asked the shape for the text inside it and just did a string compare with my expected result.
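That autocorrect check can be sketched in a few lines. This is a toy version, with a stub `Shape` class standing in for Visio's real object model and a trivial replacement rule standing in for the spelling engine:

```python
# Sketch of "faked" verification: ask the object model for the text
# instead of checking what was rendered on screen. Shape is a stub
# standing in for the real application object.

class Shape:
    def __init__(self):
        self.text = ""

    def type_text(self, text):
        # Stand-in for driving the app; a trivial replacement rule
        # plays the role of the real autocorrect engine.
        self.text = text.replace("teh ", "the ")

shape = Shape()
shape.type_text("teh ")
assert shape.text == "the ", "autocorrect verification failed"
print("pass")
```

The string compare verifies the model changed; whether the corrected text was redrawn correctly is explicitly outside this test's scope.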

There are lots of bugs this scenario can (and did) miss. The big one is that when the spelling engine auto-corrects the text, it has to tell someone it changed so the new text gets redrawn. Every now and then we'd have a recurring bug where this didn't happen, and my test didn't catch it. But I didn't define my test scope to catch that; I even explicitly called out in the test description that it wouldn't catch that issue. This way I still get automated coverage on some stuff, but don't get complacent thinking that just because I have automation I don't have to manually check that area. The key to “faking it” is to limit the functionality you're testing and make explicit assumptions that other things will work right. Then you know that you aren't testing those other things and they still need manual coverage.

There are other ways to fake verification too. One of my favorites is a test case that says “if the program doesn't crash while executing this case, it's a pass.” Don't laugh, this can be really valuable. I use it mostly in stress case scenarios. I do something really atrociously bad to the program like add a couple hundred thousand shapes to a Visio diagram, or set the font size to 18000 pt. I don't really have an expectation for what working right means at that point and it would be hard to check, I just want to make sure things don't blow up. Once again the key to this kind of fake-it verification is having very narrow parameters on what the test is really testing, and understanding what isn't being tested.
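A “doesn't crash = pass” harness is tiny to write. Here's a hedged sketch; `add_shapes` is a hypothetical stand-in for whatever atrocious thing you're doing to the program:

```python
# Sketch of a "doesn't crash = pass" stress case. The only expectation
# is that the operation completes without raising; whether the result
# is *correct* is explicitly out of scope for this test.

def stress_case(action, *args):
    try:
        action(*args)
    except Exception as exc:
        return f"FAIL: {exc!r}"
    return "PASS"

def add_shapes(diagram, count):
    # Stand-in for something atrociously bad, like adding a couple
    # hundred thousand shapes to a Visio diagram.
    diagram.extend(range(count))

diagram = []
print(stress_case(add_shapes, diagram, 200_000))  # PASS as long as nothing blows up
```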

The second method is to just do the verification yourself. This is basically the automation world's easy way out (cheating, even). But it's sometimes the most cost effective way to go. An example of this is a big suite of tests that drive the program, and when they get to a state where they want to see if the right thing happened, instead of doing a verification each test just grabs a bitmap of the screen and saves it off somewhere. Then you come along and click through all the bitmaps, quickly checking if they all look right. This has real advantages over manual testing, especially if you're working with test cases that have complicated set-ups (lots of things to manipulate and settings to change.) You basically save all the time you would have spent clicking buttons and instead just make sure the end results are correct. Most importantly, you don't have to implement a way to programmatically compare two pictures in a meaningful way (more on this later.)
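The save-it-for-a-human pattern looks something like this. `capture_screen` is a stub here; in a real harness it would be whatever screen-grab facility your platform or tool provides:

```python
# Sketch of "do it yourself" verification: instead of comparing images
# programmatically, each test saves a screenshot for a human to flip
# through later. capture_screen() is a stub for a real screen grab.

import os

def capture_screen():
    return b"...bitmap bytes..."   # placeholder for real capture code

def save_checkpoint(test_name, out_dir="screens"):
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{test_name}.bmp")
    with open(path, "wb") as f:
        f.write(capture_screen())
    return path

print(save_checkpoint("spellcheck_dialog_opens"))
```

Naming each file after its test case is what makes the later flip-through fast: you know exactly what each image is supposed to show.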

You pay for that cost saving in other places though. You don't get automated logs which can track which tests pass and fail over time. You don't get your automation to a point where you don't have to think about it and just get notified when something goes wrong. And you still have to spend a lot of time flipping through images (which, as you may guess, is exceptionally boring.)

Overall I only recommend this if you're working in a space where doing programmatic visual comparison is really hard (like some advanced rendering scenarios, or times when your program's output is non-deterministic) or if you're not building a lot of automation and the cost of finding/buying/learning a visual comparison tool is greater than the cost of you sitting there and looking at bitmaps for the full product cycle and beyond.

Using a visual comparison tool is clearly the best, but also the hardest. Here your automated test gets to a place where it wants to check the state, looks at the screen, and compares that image to a master it has stored away somewhere. This process suffers from a lot of the problems that the mouse pointer option has. Lots of things can change the way your program gets rendered on screen, and visual comparison tools are notoriously picky about any small changes at all. Did that toolbar get moved by the last user, just a couple of pixels? Too bad, the comparison fails.

Fortunately advanced comparison tools (or smart test scripts) can dodge a lot of these issues. The methods for this are similar to the smart methods of driving an application. For example, you could only get a picture of the toolbar you care about and compare those two pictures - then it doesn't matter where it moved to as long as it rendered correctly. Another way to use this is to only compare the canvas of your application (where the content goes: the drawing page in Visio, the writing page in Word, etc.) That way you don't care about the toolbars or the start menu, or other kinds of UI (except they can change the size of your canvas, be careful about that.)
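The compare-only-what-you-care-about idea can be shown with a toy pixel grid. Real tools work on actual bitmaps, of course; this sketch just demonstrates why cropping to a region makes the comparison robust to unrelated UI changes:

```python
# Sketch of a tolerant visual comparison: compare only the region you
# care about (e.g. the canvas), here represented as a grid of pixels.

def crop(image, x, y, w, h):
    """Extract a w-by-h region whose top-left corner is at (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]

def regions_match(actual, master, region):
    x, y, w, h = region
    return crop(actual, x, y, w, h) == crop(master, x, y, w, h)

master = [[0] * 8 for _ in range(8)]
actual = [row[:] for row in master]
actual[0][7] = 1   # a pixel in the toolbar area changed...

# ...but the canvas region (rows 2-7) still matches the master.
print(regions_match(actual, master, (0, 2, 8, 6)))   # -> True
print(regions_match(actual, master, (0, 0, 8, 8)))   # full screen -> False
```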

This is clearly a complicated issue. And it's the most problematic part of automated testing for standard user applications. Fortunately, test tool suites are working hard on these problems, trying to come up with reasonable solutions. I know the internal Microsoft tools I use are getting much better at this; unfortunately, I don't know what outside industry tools are doing in this space.

Other Thoughts

Of course, just figuring out how to drive your program and check the results doesn't get you an automated test suite. There is still a lot of work in designing those tests. For me, this is the hardest part of automation. Deciding what the tests should do is much more difficult than figuring out how to do it.

I guess the easiest way to say this is: writing automation is a valuable skill, but writing really good, reusable, and effective automation will make you a super star. I'm still working on that second part.

But I promise as I get better at it I'll write about it here. Until then, I hope this helps, and if you have any questions I'd love to hear them.


Chris Dickens
Microsoft Office Test