With Beta 2, most developers inside Microsoft have started playing with Vista. Among a lot of uber-cool features, one of the most interesting is UIAutomation. This is the next-gen assistive/UI-automation technology that succeeds Active Accessibility (MSAA).

I tried it out a bit and I liked its power and reach. However, the programming model is very unconventional and unintuitive. I really hope that the folks have done enough user studies to find out whether it's understandable by average developers.

Let's take a look.


I don't have the official statement, but my guess would be that MSAA failed to keep up with expectations. MSAA served its purpose well. However, it was designed for assistive purposes, to help visually impaired people or people with other disabilities. Soon everyone started using it to drive/automate UI and ran into all sorts of limitations. The fact that it was not documented well added to the confusion and led to a lot of incorrectly implemented pieces of UI.

UIA was designed from the ground up to serve both automation and assistive needs.

What is UIA

It's a layer over all MS UI technologies and provides a single interface to automate all kinds of user interfaces. You can potentially write generic code to automate an application and run the same bits to drive an application written in Avalon (WPF), WinForms, ASP.NET (inside IE), Win32, etc. WPF controls implement UIA natively; all the others implement MSAA, and UIA reads into all of them to provide a seamless experience.

UIA has a plugin provider model where you can register client-side providers that act as bridges and allow UIA to read into any assistive technology out there. So potentially you can create a bridge that allows UIA to read into Java Accessible applications. Currently it works with WPF, Win32, MSAA and Office controls. Since a lot of frameworks like Flash support MSAA, they should work with UIA seamlessly.


Multiple features make UIA very interesting, especially the fact that it represents the whole desktop as a singly rooted tree. It's not relevant which technology is used to implement a UI; everything is expressed in terms of AutomationElements in the desktop tree. So Visual Studio, which is a native Win32 IDE, comes under the desktop node and all the managed pieces inside VS appear as its children. If you use Crossbow to host XAML (Avalon) inside a WinForms control, it'll appear seamlessly in that tree. All the background work of converting technology-specific details is done by UIA.
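
To get a feel for that single tree, here is a minimal sketch that prints the desktop node and its top-level windows. The class name TreeDump and the depth limit are my own choices for illustration, and it assumes the project references UIAutomationClient.dll and UIAutomationTypes.dll. Elements can disappear while you walk the tree, so a real tool would also want to handle ElementNotAvailableException.

```csharp
using System;
using System.Windows.Automation;

// Illustrative sketch only; TreeDump is a hypothetical name.
static class TreeDump
{
    // Prints the name and control type of an element, then recurses
    // into its children until maxDepth is reached.
    static void Dump(AutomationElement element, int depth, int maxDepth)
    {
        Console.WriteLine("{0}{1} [{2}]",
            new string(' ', depth * 2),
            element.Current.Name,
            element.Current.ControlType.ProgrammaticName);

        if (depth >= maxDepth) return;

        // The control view skips purely decorative elements.
        TreeWalker walker = TreeWalker.ControlViewWalker;
        for (AutomationElement child = walker.GetFirstChild(element);
             child != null;
             child = walker.GetNextSibling(child))
        {
            Dump(child, depth + 1, maxDepth);
        }
    }

    static void Main()
    {
        // Depth 1: the desktop root plus its top-level windows.
        Dump(AutomationElement.RootElement, 0, 1);
    }
}
```

Raising the depth limit shows the children of each window in the same tree, whatever technology drew them, but walking the whole desktop can be slow.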

UIA also provides a way to search the tree to locate nodes based on search criteria. It has its own query mechanism.

UIA supports notification of user actions using Events similar to the MSAA WinEvents.
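
As a rough sketch of what subscribing looks like, the following listens for Invoke events (button clicks and the like) and prints the name of the element that raised them. The class name InvokeWatcher is made up for this example. Listening on the whole desktop subtree is done here only to keep the sketch short; in practice you would probably narrow it down to a single window.

```csharp
using System;
using System.Windows.Automation;

// Illustrative sketch only; InvokeWatcher is a hypothetical name.
static class InvokeWatcher
{
    // Called by UIA whenever an element in scope raises the Invoked event.
    static void OnInvoked(object sender, AutomationEventArgs e)
    {
        AutomationElement source = sender as AutomationElement;
        if (source == null) return;
        try
        {
            Console.WriteLine("Invoked: " + source.Current.Name);
        }
        catch (ElementNotAvailableException)
        {
            // The element went away before we could read its name.
        }
    }

    static void Main()
    {
        // Assumption for brevity: listen on the entire desktop subtree.
        Automation.AddAutomationEventHandler(
            InvokePattern.InvokedEvent,
            AutomationElement.RootElement,
            TreeScope.Subtree,
            OnInvoked);

        Console.WriteLine("Listening; press Enter to stop.");
        Console.ReadLine();

        Automation.RemoveAllEventHandlers();
    }
}
```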


The primary step in automating UI is to locate specific controls that match search criteria and then drive them. We'll first define two overloads of a function named GetElement that locates a control based on the criteria passed to it.

// Search for an element just by name, starting from root
private static AutomationElement GetElement(AutomationElement root, string name, bool recursive)
{
    PropertyCondition condName = new PropertyCondition(AutomationElement.NameProperty, name);
    return root.FindFirst(recursive ? TreeScope.Descendants : TreeScope.Children, condName);
}

// Search for the first occurrence of a given type of control
private static AutomationElement GetElement(AutomationElement root, ControlType controlType)
{
    PropertyCondition condType = new PropertyCondition(AutomationElement.ControlTypeProperty, controlType);
    return root.FindFirst(TreeScope.Descendants, condType);
}

So the first one gets me any control with a given name (as in label, text) and the second gives me the first control of a given type, like ControlType.Button.

Even though I don't use it here, it's trivial to combine multiple conditions. So in case I wanted to use a condition that combines both of the above, I would have done

AndCondition andCond = new AndCondition(condName, condType);
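
For illustration, this is roughly how such a combined condition could be wrapped up as one more GetElement overload. The Finder class and the overload itself are hypothetical; I don't use them anywhere else in this post.

```csharp
using System.Windows.Automation;

// Illustrative sketch only; Finder and this overload are hypothetical.
static class Finder
{
    // Finds the first descendant of root that has BOTH the given name
    // and the given control type. Returns null when nothing matches.
    public static AutomationElement GetElement(
        AutomationElement root, string name, ControlType controlType)
    {
        Condition condName =
            new PropertyCondition(AutomationElement.NameProperty, name);
        Condition condType =
            new PropertyCondition(AutomationElement.ControlTypeProperty, controlType);

        return root.FindFirst(TreeScope.Descendants,
                              new AndCondition(condName, condType));
    }
}
```

A call like Finder.GetElement(window, "7", ControlType.Button) would then skip anything named "7" that is not a button.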

Another interesting feature in UIA is that each AutomationElement supports patterns. The supported patterns vary from control to control and can be used to drive them. So a button has an Invoke pattern, which is equivalent to clicking it, and edit boxes have a Value pattern that can be used to set values in them. So the following clicks a button and sets the value of an edit box

AutomationElement btn = GetElement(...);
InvokePattern invPattern = btn.GetCurrentPattern(InvokePattern.Pattern) as InvokePattern;
invPattern.Invoke(); // same as clicking the button

AutomationElement edit = GetElement(...);
ValuePattern valPattern = edit.GetCurrentPattern(ValuePattern.Pattern) as ValuePattern;
valPattern.SetValue("some text"); // illustrative value
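
Since the supported patterns differ per control, it helps to ask an element what it supports before trying to drive it. The sketch below lists the patterns of the first top-level window it finds. The class name PatternLister and the choice of window are arbitrary and only there for illustration.

```csharp
using System;
using System.Windows.Automation;

// Illustrative sketch only; PatternLister is a hypothetical name.
static class PatternLister
{
    static void Main()
    {
        // Pick the first child of the desktop, whatever it happens to be.
        AutomationElement window = AutomationElement.RootElement.FindFirst(
            TreeScope.Children, Condition.TrueCondition);

        if (window == null)
        {
            Console.WriteLine("No top-level window found.");
            return;
        }

        Console.WriteLine("Patterns supported by '" + window.Current.Name + "':");
        foreach (AutomationPattern pattern in window.GetSupportedPatterns())
        {
            // PatternName gives a readable name for the pattern identifier.
            Console.WriteLine("  " + Automation.PatternName(pattern));
        }
    }
}
```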


So with all of the above I can locate any button in, say, a calculator window and drive it. We can use the following code to drive Calculator to do the calculation 7 * 7 - 7 = 42. At the end we'll also verify that the result in the calculator indeed matches the expected, most important answer: 42. The code expects Calculator to be already running.

// Needs: using System.Diagnostics; using System.Windows.Automation;
static void Main(string[] args)
{
    // Locate calculator window
    AutomationElement calculator = GetElement(AutomationElement.RootElement, "Calculator", false);

    // locate the button 7 and hit it
    AutomationElement btn7 = GetElement(calculator, "7", true);
    InvokePattern btn7InvPat = btn7.GetCurrentPattern(InvokePattern.Pattern) as InvokePattern;
    btn7InvPat.Invoke();

    // locate and invoke *
    AutomationElement btnMult = GetElement(calculator, "*", true);
    InvokePattern btnMultInvPat = btnMult.GetCurrentPattern(InvokePattern.Pattern) as InvokePattern;
    btnMultInvPat.Invoke();

    // hit on 7 again
    btn7InvPat.Invoke();

    // locate and invoke -
    AutomationElement btnMinus = GetElement(calculator, "-", true);
    InvokePattern btnMinusInvPat = btnMinus.GetCurrentPattern(InvokePattern.Pattern) as InvokePattern;
    btnMinusInvPat.Invoke();

    // hit on 7 again
    btn7InvPat.Invoke();

    // locate and invoke =
    AutomationElement btnEq = GetElement(calculator, "=", true);
    InvokePattern btnEqInvPat = btnEq.GetCurrentPattern(InvokePattern.Pattern) as InvokePattern;
    btnEqInvPat.Invoke();

    // get the result edit box and verify the value is indeed 42
    // (the expected string "42." includes the trailing decimal point)
    AutomationElement editResult = GetElement(calculator, ControlType.Edit);
    string result = editResult.GetCurrentPropertyValue(ValuePattern.ValueProperty, false).ToString();
    Debug.Assert(result == "42.", result);
}

Code Quality

The UIA code looks very ugly and is not strongly typed. With generics in place there is no excuse for writing a framework that requires casting almost every other line. I think I'll use another post to discuss why I do not like the code.
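
To make the casting complaint a bit more concrete, here is a sketch of the kind of generic wrapper I have in mind. PatternHelper and GetPattern are hypothetical names of my own, not part of UIA; the helper simply hides the cast behind a type parameter and returns null when the element doesn't support the pattern.

```csharp
using System.Windows.Automation;

// Illustrative sketch only; PatternHelper is NOT part of UIA.
static class PatternHelper
{
    // Returns the requested pattern typed as T, or null if the element
    // does not support it (or the pattern object is of another type).
    public static T GetPattern<T>(AutomationElement element, AutomationPattern patternId)
        where T : BasePattern
    {
        object pattern;
        if (element.TryGetCurrentPattern(patternId, out pattern))
        {
            return pattern as T;
        }
        return null;
    }
}
```

With it, the button code would read PatternHelper.GetPattern<InvokePattern>(btn, InvokePattern.Pattern) and the cast disappears from the calling code, though the caller still has to pass a pattern identifier that matches the type.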