
Let’s Talk About Touch (Part2)

In part 1 I introduced several of the key pieces in the Windows Mobile 6.5 gesture story, including the basics of what a touch gesture is and how the system identifies and delivers gesture messages to your app. I finished up by introducing you to the Physics Engine, so let’s continue from there.

Natural Physical Interaction

I talked about the new Physics Engine component briefly in part 1 but I skipped over a couple of important bits, so let me cover them off here.

Panning on a list of data is relatively simple to implement because the expected behaviour is fairly obvious, especially when the input device is a finger or thumb – it should work just like sliding a piece of paper around on a slippery desktop surface. However modelling the scroll gesture at the end of a pan sequence is a little harder to get right because the expectations are not so clear cut: what speed should the data move at, does the speed decay and if so at what rate?

The touch team did quite a bit of research into what the ‘right’ i.e. most natural response should be and captured the corresponding deceleration algorithm in the Physics Engine. We then built apps around that and presented users with the results so we could fine tune the algorithm parameters.

It was evident from this research that the human eye is very sensitive to the movement and deceleration used in the scroll animation, and it’s important to ensure the response to a scroll gesture is always predictable and the feedback is instantaneous – this helped us focus our performance tuning efforts when working on our controls in the OS.

The Physics Engine is also used to drive several other animations, such as snapping to an item in a list (more on this later) and the rubber band effect, where the velocity of a scroll gesture would otherwise take the display position outside the visible content region.

The key point is: when you are implementing UI that moves content in response to a scroll gesture, use the Physics Engine to drive the animation so the experience stays consistent and predictable for the user.
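To make the "does the speed decay, and at what rate?" question concrete, here is a minimal sketch of a decelerating scroll. It assumes a simple exponential velocity decay; the DecayParams struct, the SimulateScroll name and the parameter values are all made up for illustration and are not the Physics Engine's actual algorithm or tuning.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical parameters, chosen only for illustration; the real Physics
// Engine tuning values are not given in this post.
struct DecayParams {
    double frictionPerSecond;  // fraction of velocity lost per second (0..1)
    double stopSpeed;          // pixels/second below which the animation ends
};

// Returns the content position (in pixels) for each frame of a flick that
// starts at startPos with startVelocity (pixels/second).
std::vector<double> SimulateScroll(double startPos, double startVelocity,
                                   double frameSeconds, const DecayParams& p)
{
    assert(frameSeconds > 0.0);
    assert(p.frictionPerSecond > 0.0 && p.frictionPerSecond < 1.0);
    assert(p.stopSpeed > 0.0);

    std::vector<double> positions;
    double pos = startPos;
    double velocity = startVelocity;

    // Velocity multiplier applied once per frame (always between 0 and 1).
    const double decay = std::pow(1.0 - p.frictionPerSecond, frameSeconds);

    while (std::fabs(velocity) >= p.stopSpeed) {
        pos += velocity * frameSeconds;  // move by this frame's distance
        velocity *= decay;               // then slow down for the next frame
        positions.push_back(pos);
    }
    return positions;
}
```

Each frame covers a little less distance than the one before, which is the slowing-to-a-stop feel the text describes. In a real application the engine computes these positions for you.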

What do you call the space that’s not client and not ‘non-client’?

Extending the concept of touchable content gives rise to some scenarios new to Windows Mobile. Imagine an application showing a list of items with a nice scroll bar to indicate the relative position of the visible content. Through gesture support the user can freely navigate up and down the list by direct manipulation (i.e. without touching the scrollbar), but what happens at the limit of the content? Extending the idea of direct manipulation means we really want the top and bottom of the content to be visually flexible so the user can pan the list beyond its limit and see a clear indication of the top of the document – showing a physical document border for example – and then on release see a smooth animation back to the edge of the document.

We’ve always had the concept of client space where data is drawn, and non client space like the menu, title bar, border etc. But this is something new! The space uncovered by going beyond the list limit doesn’t have a name – it’s not really client space because it’s beyond the client limit (and beyond the scroll bar range!) but it’s definitely not non-client because it’s drawn in the client area. In 6.5 we didn’t formally define this space. However, when updating the controls and applications to become touch aware we had to solve the problem of what to draw in this space, which was not as straightforward as it sounds when you consider that most of our controls can be owner draw or support background themes, and were originally designed to work well with a scrollbar.

If you are planning to support direct manipulation via gestures and make use of the Physics Engine then make sure your controls can support the concept of this new window space. There is an option to disable rubber banding in the Physics Engine and this will stop your window from uncovering this new space – but be aware doing this is likely to degrade the user experience.
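As a rough sketch of what "supporting this new space" means for a paint routine, the helper below works out how many rows of the view show real content and how many are uncovered when the content position goes beyond its limits. The FrameLayout struct and LayoutFrame function are hypothetical names, and the sketch only covers the vertical axis.

```cpp
#include <algorithm>
#include <cassert>

// Vertical layout of one frame, measured in view (client) rows.
struct FrameLayout {
    int gapTop;       // uncovered rows above the content
    int contentRows;  // rows that show real content
    int gapBottom;    // uncovered rows below the content
};

// scrollY is the content row shown at the top of the view. While rubber
// banding it may be negative or larger than contentHeight - viewHeight.
FrameLayout LayoutFrame(int scrollY, int contentHeight, int viewHeight)
{
    assert(contentHeight >= 0 && viewHeight >= 0);

    // Content occupies view rows [-scrollY, contentHeight - scrollY);
    // clip that span to the view.
    const int top = std::max(0, std::min(-scrollY, viewHeight));
    const int bottom = std::max(top, std::min(contentHeight - scrollY, viewHeight));

    return FrameLayout{top, bottom - top, viewHeight - bottom};
}
```

A paint routine could fill the gapTop and gapBottom rows with a background or document border and draw the list items only into the rows in between.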

Item Snapping

While we’re on the topic of the Physics Engine, another feature it exposes is item snapping. Somewhere in Windows’ distant past scrollbars were introduced to support a smaller display area than the actual data needed. And they have served us well for many years. Scrollbars allow the data space and the view space dimensions to be provided while allowing some of the physical display details to remain hidden, such as the real number of pixels needed to draw a piece of the screen.

There are lots of list style controls out there that use the scrollbar in this way because it’s very convenient: the scroll range is specified as the number of items, the current scroll position represents the item number, and the page size is the number of items that can be visually displayed. The built in listbox and listview controls work in this way.

Unfortunately this approach causes some problems when introducing direct manipulation, specifically the pan gesture. The user wants to see pixel by pixel movement in response to the touch input but there aren’t actually any valid scrollbar positions available to represent each pixel position because the scroll range is in items and each item is more than one pixel. That may not be such a problem while the gesture is in progress – round the scroll position to the nearest item, after all it’s just an indicator at this point – but when the session ends the user is unlikely to have aligned the item exactly on the pixel boundary required to match the scrollbar.

The Physics Engine supports the concept of item snapping to help solve this problem. The item height / width can be specified when creating the Physics Engine, and then all animations are guaranteed to end exactly on an item boundary. In the case of a scroll the end point of the scroll is either stretched or shrunk to the nearest boundary and the animation adjusted to make this look smooth and predictable. The Physics Engine can also be used with a zero initial velocity to provide an item snap animation for when the user ends a pan session without a flick.

Here is a snippet from the Physics Engine sample in the DTK showing how to use item snapping:

{
    PHYSICSENGINEINIT initState = {sizeof(initState)};
...

    initState.dwEngineType              = 0;
    initState.dwFlags                   = 0;
    initState.lInitialVelocity          = -nTransitionSpeed;
    initState.dwInitialAngle            = nTransitionAngle;
    initState.bXAxisMovementMode        = PHYSICSENGINE_MOVEMENT_MODE_DECELERATE;
    initState.bYAxisMovementMode        = PHYSICSENGINE_MOVEMENT_MODE_DECELERATE;
    initState.bXAxisBoundaryMode        = PHYSICSENGINE_BOUNDARY_MODE_RUBBERBAND;
    initState.bYAxisBoundaryMode        = PHYSICSENGINE_BOUNDARY_MODE_RUBBERBAND;
    GetClientRect(hwnd, &rctClient);
    initState.rcBoundary.left           = 0;
    initState.rcBoundary.right          = rctClient.right + g_nMaxXExtent;
    initState.rcBoundary.top            = 0;
    initState.rcBoundary.bottom         = rctClient.bottom + g_nMaxYExtent;
    initState.sizeView.cx               = rctClient.right;
    initState.sizeView.cy               = rctClient.bottom;
    initState.ptInitialPosition.x       = g_nXPos;
    initState.ptInitialPosition.y       = g_nYPos;
    initState.sizeItem.cx               = 100;
    initState.sizeItem.cy               = 100;

    // create the physics engine and store it
    if (SUCCEEDED(TKCreatePhysicsEngine(&initState, &g_hPhysicsEngine)))
...


In this code the item height is set to 100, indicating that individual scrollbar values represent 100 pixels of screen real estate. If you are writing new code I would recommend you design it to support per-pixel scrolling from the outset and keep the scroll range in pixels – it just makes things a bit easier for you the developer and slightly more natural for the user. But this feature is really useful if you are updating ‘legacy’ code to support touch.
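The Physics Engine does the snapping for you once sizeItem is set, but the underlying arithmetic is simple. The sketch below shows one way to round a projected end point to the nearest item boundary; SnapToItem is a hypothetical helper, not a DTK function, and the engine's own rounding rules may differ.

```cpp
#include <cassert>

// Rounds a pixel position to the nearest multiple of itemSize.
// Halfway positions round away from zero. Negative positions are handled
// separately because C++ integer division truncates toward zero.
int SnapToItem(int pos, int itemSize)
{
    assert(itemSize > 0);
    const int half = itemSize / 2;
    if (pos >= 0) {
        return ((pos + half) / itemSize) * itemSize;
    }
    return -(((-pos + half) / itemSize) * itemSize);
}
```

With an item size of 100 as in the snippet above, a scroll that would naturally stop at pixel 149 ends at 100 instead, and one that would stop at 150 ends at 200.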

WAGI (Window Auto Gesture Interface) - Make it Simple

I was writing a touch presentation for some of our Asian partners and put a ‘small’ sample together (native code) showing the basics for implementing direct manipulation with gestures. Walking through the code in front of the partners it dawned on me, as I was searching the pages and pages of source, that maybe we need to work on simplifying some of the common scenarios. The Physics Engine interface was revamped and we’ve introduced the new WindowAutoGesture APIs that provide a very simple way of implementing the most common direct manipulation scenarios.

WindowAutoGesture Interface (WAGI) provides configurable gesture handling for individual windows, taking away the complexity of dealing directly with the gesture messages and the physics engine. WAGI is implemented as part of the window manager and handles the pan and scroll gestures on the window’s behalf, creating and driving the Physics Engine as appropriate. WAGI then instructs the application where to draw content through custom animation messages.

The window remains responsible for drawing its content and updating its scrollbar in response to animation commands from WAGI. The WAGI API was originally designed to go the extra step and take control of content drawing and scrollbars as well but there wasn’t time in the schedule to implement this for 6.5.

You enable and configure the WAGI behaviour for a specific window by calling the new TKSetWindowAutoGesture(). Gestures are delivered to the WAGI infrastructure through DefWindowProc(), so the application must ensure all unprocessed messages - the scroll and pan gestures specifically - are appropriately passed to DefWindowProc(). WAGI then processes the gesture and delivers the appropriate animation messages back to the window.

The most significant restriction with WAGI is that the window must have a scroll style (WS_VSCROLL or WS_HSCROLL) set and a scroll range greater than the visible area, i.e. range > page size, which means there must be visible scrollbars on your window. WAGI supports gestures in one or both axes and dynamically detects the scroll range, adjusting its behaviour appropriately.

WAGI can operate in two modes: one manipulates the application by directly simulating scrollbar messages such as WM_VSCROLL and WM_HSCROLL, and the other works through private animation messages to tell the window the pixel positions to draw content to. In both modes WAGI also provides notification messages to tell the window when touch interaction starts and ends – this is relevant for the focus issue discussed later.

Scrollbar Manipulation Mode

Scrollbar manipulation mode makes adding touch support to existing code that already supports scrollbar navigation very easy – just set up the WAGI information via a single call to TKSetWindowAutoGesture() and the existing scroll logic will take care of everything else. However there are a couple of reasons why you might want to consider upgrading to the animation message mode of WAGI:

·         If the scrollbar range for the existing window is something other than pixel positions then the movement of the UI can appear quite granular and jumpy because the scrollbar can only be manipulated to represent whole item positions and can’t animate through the interim pixels positions if they exist.

·         There is no way when using scrollbars to make the content draw beyond its scroll limit, i.e. when rubber banding at the end of the list after the user has flicked fast on the screen, or when the user drags to the top of the list and beyond to see the limit of the list. Scrollbars have min & max values that bound the content and for lots of reasons it’s not possible to set the current position outside those positions.

Scrollbar manipulation mode is selected by setting nOwnerAnimateMessage to zero.

Animation Message Mode

This mode is selected by setting nOwnerAnimateMessage to a valid window message value equal to or above WM_USER. This becomes the message id that carries the animation information back to the window. When an animation message is received the window must call TKGetAnimateMessageInfo() to get the x, y coordinates of the top left of the display area. TKGetAnimateMessageInfo() also identifies the type of animation, and although there is currently only one, make sure you check this for AMI_ANIMATION_SCROLL because this may be extended in future releases. Then you need to force a redraw of the window using the new x, y content position, not forgetting to update the scroll bar position to the nearest item.

This mode supports much more flexibility because the scroll range is not a limiting factor, so per pixel positioning is possible. Also in this mode the x, y pixel position might be outside the window’s data area, uncovering the new client area space I talked about before, so you need to update the paint routine to make sure it copes with this.
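The "update the scroll bar position to the nearest item" step can be sketched in isolation. The helper below takes the vertical pixel position as a plain int (in a real handler it would come from TKGetAnimateMessageInfo()) and returns a scrollbar position plus the amount by which the content has gone beyond its limits. ScrollUpdate and ToScrollUpdate are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>

// Result of translating a pixel position into scrollbar terms.
struct ScrollUpdate {
    int scrollPos;   // nearest item index, clamped to the valid scroll range
    int overscroll;  // pixels beyond the content: < 0 above the top, > 0 past the end
};

// pixelPos   - content pixel row to draw at the top of the view (may be out of range)
// itemHeight - height of one list item in pixels
// itemCount  - total number of items in the list
// pageItems  - number of whole items that fit in the view
ScrollUpdate ToScrollUpdate(int pixelPos, int itemHeight, int itemCount, int pageItems)
{
    assert(itemHeight > 0 && itemCount >= 0 && pageItems >= 0);

    const int maxItem = std::max(0, itemCount - pageItems);
    const int maxPixel = maxItem * itemHeight;

    ScrollUpdate u{};  // both fields start at zero
    if (pixelPos < 0) {
        u.overscroll = pixelPos;
    } else if (pixelPos > maxPixel) {
        u.overscroll = pixelPos - maxPixel;
    }

    // The scrollbar cannot represent positions outside its range, so clamp
    // first and then round to the nearest whole item.
    const int clamped = std::min(std::max(pixelPos, 0), maxPixel);
    u.scrollPos = (clamped + itemHeight / 2) / itemHeight;
    return u;
}
```

The window would draw its content at pixelPos (including any uncovered space) while setting the scrollbar to scrollPos, which keeps the scrollbar as an indicator rather than the source of truth.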

There are a number of other options available through WAGI, like disabling scroll or pan gesture support, limiting the extent that a user can go beyond the window content (including 0% if you don’t want to solve that problem), and the magical Lock Axes option. Lock Axes isn’t some medieval battle cry, unfortunately it’s something much more mundane. This option only makes sense when both the horizontal and vertical axes are scrollable, and it tells WAGI to ignore the scroll gestures unless they are roughly along one of the two axes, i.e. you will only get left/ right or up/down scroll movement with this option turned on. You can see behaviour very similar to this in IE6 on WM 6.5 when scrolling around the page.
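To illustrate the idea behind Lock Axes, here is a small sketch that decides whether a movement is "roughly along" an axis. It works on a movement vector rather than a gesture angle, and the tolerance parameter is an assumption; the post does not say how WAGI makes this decision or what tolerance it uses.

```cpp
#include <cassert>
#include <cmath>

enum class LockedAxis { None, Horizontal, Vertical };

// Classifies a movement vector (dx, dy) as horizontal, vertical, or neither.
// toleranceDegrees is how far the direction may deviate from an axis. It is
// a made-up parameter and must stay below 45 so the two axes cannot overlap.
LockedAxis ClassifyScroll(double dx, double dy, double toleranceDegrees)
{
    assert(toleranceDegrees > 0.0 && toleranceDegrees < 45.0);
    if (dx == 0.0 && dy == 0.0) {
        return LockedAxis::None;  // no movement, no direction
    }

    const double kPi = 3.14159265358979323846;
    // Angle between the movement and the horizontal axis, folded into 0..90.
    const double degrees = std::atan2(std::fabs(dy), std::fabs(dx)) * 180.0 / kPi;

    if (degrees <= toleranceDegrees) {
        return LockedAxis::Horizontal;
    }
    if (degrees >= 90.0 - toleranceDegrees) {
        return LockedAxis::Vertical;
    }
    return LockedAxis::None;  // diagonal: ignored when axes are locked
}
```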

See the WAGSample project in the DTK for more details.

Item Focus

Direct manipulation through gestures brings in a whole bunch of issues around focus. Windows traditionally encourages applications to support a region or point of focus so that a user can interact with the application without being forced to use a pointing device like a mouse or a stylus, or now a finger. Focus is traditionally set on the mouse-down or touch-down event, but with direct manipulation the application cannot tell which gesture the user intends to make until some time after that initial event. The user may be starting a pan or scroll gesture instead of a tap, and moving the focus at the start of every pan gesture is more than likely not what the user wants. Also, what should happen if the user hits a hardware key, like a delete key or context menu key, while the control is animating in response to a gesture message like a scroll? Should it act on the item with focus at that moment, or stop the animation and move the item into view first, or something else?

To help solve this problem on many of the inbuilt controls we have moved the point at which focus is chosen from the initial input to the end of the gesture sequence. Visually there are some subtle differences from 6.1 but the overall experience should feel right for the user.

So here are a couple of recommendations for your application design:

·         The ideal solution is to design your touch based applications to work without focus. However it’s important to ensure the application navigation is accessible through the direction pad (DPad) keys and if you are planning to use marketplace then the application must also work on pre 6.5 images.

·         If you have a custom list style control that has a concept of focus, move focus selection to mouse-up events (or even better use the select gesture - see the last point in the list); the inbuilt Listbox and Listview controls already do this. That way you can detect whether a gesture happened and ignore the focus change. The exception here is the hold gesture, which probably needs to move focus to the item under the gesture before displaying the context menu.

·         During a gesture session like a pan or scroll animation, any hardware key should interrupt the animation but not action the focused item. If you use WAGI the animation interruption is done for you, but the application code must still block the associated action.

·         This is fairly specific but may be relevant for you: if the focused item is off the screen and the user presses a DPad key, move the focus to an item onscreen instead of scrolling the list to show the current / next / prev focus item – have a look at how the listview does it in something like the Outlook email client; we spent quite a bit of time getting this right.

·         If you can, use only gesture messages or only mouse messages but not both. Focus issues get easier to solve if you can stick with just one or the other.
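The "move focus selection to the end of the touch" recommendation can be sketched as a small state tracker. This is an illustration only: FocusTracker and its method names are hypothetical, and a real control would call them from its mouse and gesture message handlers.

```cpp
#include <cassert>

// Tracks one touch session at a time for a list control and decides when
// focus is allowed to move.
class FocusTracker {
public:
    explicit FocusTracker(int initialFocus) : focus_(initialFocus) {}

    // Touch/mouse down: remember the item but do not move focus yet.
    void OnTouchDown(int itemUnderFinger) {
        assert(itemUnderFinger >= 0);
        pending_ = itemUnderFinger;
        gestureSeen_ = false;
        active_ = true;
    }

    // A pan or scroll gesture arrived: this touch is not a simple tap.
    void OnPanOrScroll() {
        if (active_) gestureSeen_ = true;
    }

    // Hold is the exception: move focus now, before a context menu is shown.
    void OnHold() {
        if (active_) {
            focus_ = pending_;
            gestureSeen_ = true;
        }
    }

    // Touch/mouse up: only a plain tap changes focus.
    void OnTouchUp() {
        if (active_ && !gestureSeen_) focus_ = pending_;
        active_ = false;
    }

    int focusedItem() const { return focus_; }

private:
    int focus_;
    int pending_ = -1;
    bool gestureSeen_ = false;
    bool active_ = false;
};
```

A tap on item 5 moves focus to 5; a pan that happens to start on item 9 leaves focus where it was.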

Which 6.5 Controls are Gesture Aware?

The primary OS controls we updated are these:

·         Listview

·         Listbox (includes combo)

·         Webview

·         Treeview

·         Tab (scroll left / right to change page)

We also updated a number of applications that have their own custom controls such as the Getting Started app and IE.

Why do you need to know? Well, possibly more for interest than anything else, but you do need to be more aware of this if you subclass any of the above controls. Listview, treeview, webview all use WAGI to drive the animations which means they require all gesture messages to get to the DefWindowProc() and also they use WM_USER+x for the WAGI animation and status messages that must get through to the control. I know this is “programming 101” but we’ve seen this cause problems a few times already. Remember, if you subclass a control you should not be overloading messages in the WM_USER range;  this range is private for the original window proc to use.

Getting the Right Frame Rate

Direct manipulation by the user demands a fast response – users are surprisingly adept at spotting UI lag. However a fast response does not translate directly to a demand for a blistering frame rate, in fact the user experience is much better with a slower, more consistent frame rate than a much faster but less consistent rate. In our testing the key factors that came out top are:

1>     The initial response to a touch input must be fast, in the order of 50ms but the faster the better.

2>     The frame rate must be as consistent as possible.

3>     With a consistent frame rate above ~15 fps our own research showed the untrained eye cannot easily distinguish between small differences in frame rate, i.e. 20fps vs 25fps (note: this was not exhaustive research and there are likely more scientifically sound results available, but this was sufficient for our needs)

We found that many of the legacy controls were not designed with rapid frame updates in mind, but with some relatively minor adjustments and performance optimizations we were able to hit our target frame rates with ease. Here are some of the things I applied while optimizing listview:

·         As with all optimization it’s vital to have the right measurements in place so you can ensure you are spending time optimizing the right bits.

·         It’s possible to receive touch input / gesture messages at a much faster rate than is required to update the screen. To keep the frame rate consistent it’s important to separate the gesture input from the frame update. For WAGI we use GESTURE_ANIMATION_FRAME_DELAY_MS which is found in the gesturephysicsengine.h from the DTK. Currently this is set to 34ms which when used in a timer gives rise to a frame rate of close to 25 FPS, and WAGI will deliver animation updates no more often than this. For non WAGI gesture handling I would recommend you use the same frame delay counter and aggregate pan messages between the frame events. The frame counter actually works against us when measuring the initial gesture response time because the first update is made a minimum of 40ms after the first gesture message. We made a number of changes to help improve the situation, and now WAGI will deliver the first animation request as soon as a PAN delta has been detected - the GID_BEGIN message was also modified to include the initial position of the touch point thus enabling the first GID_PAN to trigger a frame update. The key point here is: do as little as possible in the first frame update in order to keep the lag to a minimum.

·         Don’t do anything unnecessary in the update loop. This one sounds like plain common sense but there are some subtleties worth pointing out. It’s ok to use a special case drawing loop when processing gesture animation especially if you have high detail / high cost pieces of drawing code. For example if you have a list of contacts that also shows online presence information it doesn’t make sense to wait for an update of the presence information for each frame – wait until the gesture interaction has finished and then update the screen with the slower information. I got quite an improvement here by reducing the frequency of scrollbar drawing during the gesture animation – take a look at the scrollbar in the outlook client when you flick up / down on a long list of emails, you will likely not have noticed before but it actually updates on a much lower frequency than the frame rate.

·         Judicious use of an off-screen cache can also boost performance significantly. Take care here because an off-screen bitmap takes memory from the GWES heap, which is a scarce, shared resource pool, so don’t go and create a 4 MB cache just for your screen. However if you are likely to redraw the same area of screen from the original data for several frames of an animation then it may be worth using a small cache. The key thing here is to carefully measure the benefits to the application performance and be pragmatic about the results.
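The advice above about separating gesture input from frame updates can be sketched as a small accumulator: pan messages only add to a pending delta, and a frame is produced at most once per frame delay. PanAggregator is a hypothetical class written for this illustration; the frame delay is passed in so the sketch builds without the DTK header (the text quotes 34ms for GESTURE_ANIMATION_FRAME_DELAY_MS).

```cpp
#include <cassert>

// Collects pan deltas and hands them out at a steady frame interval.
class PanAggregator {
public:
    explicit PanAggregator(unsigned frameDelayMs) : frameDelayMs_(frameDelayMs) {
        assert(frameDelayMs > 0);
    }

    // Call for every pan message. Cheap: no drawing happens here.
    void AddPan(int dx, int dy) {
        pendingX_ += dx;
        pendingY_ += dy;
        hasPending_ = true;
    }

    // Call from the frame timer (or after AddPan) with the current tick count
    // in milliseconds. Returns true and fills outDx/outDy when a frame should
    // be drawn. The first frame is released immediately to keep the initial
    // response fast; later frames wait for the frame delay.
    bool TakeFrame(unsigned nowMs, int& outDx, int& outDy) {
        if (!hasPending_) return false;
        if (drawnOnce_ && nowMs - lastFrameMs_ < frameDelayMs_) return false;

        outDx = pendingX_;
        outDy = pendingY_;
        pendingX_ = 0;
        pendingY_ = 0;
        hasPending_ = false;
        lastFrameMs_ = nowMs;
        drawnOnce_ = true;
        return true;
    }

private:
    unsigned frameDelayMs_;
    unsigned lastFrameMs_ = 0;
    int pendingX_ = 0;
    int pendingY_ = 0;
    bool hasPending_ = false;
    bool drawnOnce_ = false;
};
```

Several pan messages arriving inside one frame interval are merged into a single redraw, so the frame rate stays consistent no matter how fast the input arrives.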

Managed Code

You may have noticed that there is currently no update to support gestures in managed code for CF 2.0 or 3.5 at this point. As a managed code developer there are still some options open to you. CF controls are implemented in terms of the OS controls so if you are using any of the updated common controls in your managed app then you will have gesture support built in. However if you want to gesture enable a bespoke / custom control then you are required to interop and interact with the gesture architecture that way.

To help you get this up and running there are a couple of projects I’m aware of that will help. Firstly Maarten Struys has recently posted managed code versions of the DTK samples showing pretty much everything you need to get started. Take a look at his blog post here. In addition we’ve been working on a set of simple managed code extension classes that can be used to access gesture messages, the physics engine and WAGI from managed code. It’s not quite ready yet, but I will post more details when it’s baked.

What I’ve not covered:

There are a couple of things I’ve not covered here, like gestures other than touch, gesture enabling forms, touch filter, what’s in the DTK etc. I might post more on these in the near future.

I’m due to deliver a session on WM 6.5 gestures at the upcoming EMEA TechEd in Berlin in November and I’m hoping to get Maarten to join me and share some real field experience of using gestures.

Anyway that’s about it on touch from me. The UK team has moved onto other things now but hopefully it’s stuff I can blog about more readily.

Marcus

Posted by marcpe | 0 Comments

Windows Mobile 6.5 Touch Gesture docs go live

The doc team has just published our touch gesture docs for Windows Mobile 6.5

Check them out here: http://msdn.microsoft.com/en-us/library/ee220920.aspx

I'm still working on part 2 of my touch blog post; hopefully I'll get it done next week.

Marcus

Posted by marcpe | 1 Comments

Let’s Talk About Touch (Part1)

Windows Mobile (and CE) has supported touch screen interfaces since the beginning but the release of Windows Mobile 6.5 brings something new to the platform: gesture support. Gestures are intended to be a more natural way of interacting with the device through the touch screen, making more of an emotional connection between the user and the applications under the finger or stylus. Technically gestures are a collection of input points generated by touching the screen in patterns that the system recognizes. However the touch solution is much more than just the gestures, it’s also about the animation and interaction that take place as a result of the gesture input, for example smoothly scrolling a list of items or ensuring a UI item remains fixed to the ‘finger’ to give the illusion that the user is directly manipulating the screen content as if it were tangible.

What’s the difference between gestures and simple mouse input?

At first glance there appears to be a lot of commonality between raw mouse messages and gesture messages, such as select = mouse click, pan = mouse move and double select = double click. However the gesture recognition code is designed to handle quite a different set of input limitations from the mouse input. Primarily the mouse input for mobile devices is expected to originate with a pointing device like a stylus or a physical mouse; however gesture messages are expected to originate from a variety of sources such as using a finger or thumb, or by shaking the device and even broader input like smile recognition from a camera. Most users will initially experience gestures through the touch screen input and the majority of the work in 6.5 has been around getting finger input and response right.

Using a stylus or a mouse results in surprisingly accurate touch data which in turn makes small screen controls a viable user experience. In this situation tolerances for click, double click and tap’n’hold can be very small.

However when using a finger instead of a stylus several things have to change – for example the tolerances for click, double click, and tap’n’hold need to grow significantly to handle the huge variety of finger shapes and sizes found out in the wilds of human kind. Additionally when moving your finger across the screen the shape pressed against the screen changes due to the angle the finger is at. This often leads to unexpected input points at the end of a pan input that can cause misinterpretation of the movement.

A Word about Screens

Touch enabled Windows Mobile devices traditionally sport a plastic tipped stylus and have a touch screen based on resistive technology.

In brief resistive screen technology is based on two layers of transparent conducting material (Indium Tin Oxide or ITO) separated by an air gap held apart with tiny insulating plastic beads. Pressing the screen deforms the two sheets and makes contact between them and from the change in resistance the screen firmware can identify where the stylus has been placed. There are lots of variations on this technology.

Resistive screens have several killer properties: they are cheap, very accurate for a stylus, and they can continue to work in quite hostile environments i.e. dirty screens.

However they do suffer in other areas: they require an amount of force to deform the screen and make contact between the conducting layers; because of the multi plastic layers placed on top of the display and the air gap, some brightness is always lost; cheap & readily available traditional resistive screens really only support a single touch point - more advanced digital resistive sensors have been demonstrated which do support multiple touch points, but this is a future development; it’s quite tough to get more information beyond just the point location i.e. size of the touch area; and durability can be an issue due to the use of moving parts – i.e. deformation of the screen.

Another touch technology that has rapidly gained in popularity is capacitive (as found in the iPhone and Android G1). This technology works by continually measuring the capacitive property of different areas of the screen. When conducting material such as a finger is placed on the screen, its capacitive properties change and the screen driver can determine where the finger is based upon the changes.

Capacitive technology has several advantages: zero pressure is required to make an input because nothing needs to be deformed and this leads to a much more natural interface experience; although additional material is laid onto the screen, there is no air gap so optical clarity is much improved reducing the need for backlighting making power draw lower; multiple touch points can be supported; things like touch size and pressure can be extrapolated from the capacitive data.

However they do suffer in other areas: in general the cost is currently higher than the equivalent resistive screen; supporting a stylus is hard because it must be made of conducting material and must make sufficient contact to change the capacitive property of the screen; in several areas the accuracy tends to be lower than resistive e.g. around the edges of the screen, combined with the lack of a stylus and lower sample rates makes things like handwriting input very hard.

There are other input technologies developing all the time, but at the moment these two represent nearly all the market for mobile devices.

Windows Mobile 6.5 has primarily been designed for resistive screens because some input areas still rely on small controls and require a high level of input accuracy that can’t be easily achieved with a finger and require a stylus; however some device manufacturers are considering options to ship capacitive screens.

Looking forward the mobile team is considering how to address these issues and support many more screen types including capacitive.

What Gestures are supported?

In Windows Mobile 6.5 we have implemented five primary gestures:

·         Select: the user taps on the screen for less time than a specific threshold, and movement is less than a threshold distance.

·         Double select: a second select is detected within a timeout period of the first one.

·         Hold: the user taps on the screen for more time than a specific threshold.

·         Pan: once the distance moved exceeds a threshold, all touch movement is represented as a pan gesture.

·         Scroll: generated at the end of a touch session if the preceding points are roughly linear and exceed a minimum speed.
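The threshold rules above can be sketched as a tiny classifier for a single touch. This is only an illustration of the definitions, not the recognizer's code: the names and the threshold struct are hypothetical, and double select and scroll are left out because they depend on the previous tap and on the speed of the final points.

```cpp
#include <cassert>

enum class TouchGesture { Select, Hold, Pan };

// Hypothetical thresholds; the real values are owned by the gesture recognizer.
struct GestureThresholds {
    unsigned holdTimeMs;  // staying down longer than this without moving is a Hold
    int moveDistancePx;   // moving further than this turns the touch into a Pan
};

// Classifies one touch from how long it lasted and the furthest distance it
// moved from the touch-down point.
TouchGesture ClassifyTouch(unsigned durationMs, int maxDistancePx,
                           const GestureThresholds& t)
{
    assert(maxDistancePx >= 0 && t.moveDistancePx >= 0);

    if (maxDistancePx > t.moveDistancePx) {
        return TouchGesture::Pan;     // movement wins over timing
    }
    if (durationMs > t.holdTimeMs) {
        return TouchGesture::Hold;    // stayed put for long enough
    }
    return TouchGesture::Select;      // short and still: a tap
}
```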

 

Gestures are delivered using a new message, WM_GESTURE, which is accompanied by the gesture ID and a handle that can be used to get the rest of the gesture data, like the angle and velocity of a scroll or the location of a pan gesture, through the GetGestureInfo() API. Windows 7 for the desktop uses this same message and at the moment offers a slightly different set of gestures from those available on mobile, so be careful when searching MSDN docs to get the right ones (at this time the MSDN mobile docs haven’t yet been published).

 

How do gestures work then?

There are a couple of things you need to know when working with gestures directly:

·         Gestures and mouse messages are not intended to be interchangeable. In WM 6.5 you will probably get away with using mouse messages instead of select or double select gestures, but moving forward that’s highly likely to change as new hardware is designed to take advantage of the touch infrastructure – touch is designed to allow separate areas for touch and mouse input, so imagine a device with a mouse pad area as well as a touchable screen where the touch screen only generates gestures.

Ideally you should write your code to work either with mouse messages, or with gestures but not both at the same time.

·         Gestures are always delivered to the window under the initial input – i.e. touch down location. You’ve probably never thought about this for mouse messages but it makes total sense that all mouse messages are delivered to the window directly under the mouse at the point a mouse event happens (unless delivery is forced to a specific window using SetCapture()).

For Gestures it’s a bit different. If the user wants to send a scroll gesture to a specific area of the screen the touch input may start in the ‘target’ window area but because it takes time and distance to describe a scroll gesture the end of the gesture might happen in a completely different window somewhere else on the screen. So the gesture engine code remembers where the initial point was and ensures the scroll gets delivered there as well. Same for a hold – the input may ‘wander’ under the finger, but the hold is sent to the window under the initial input point.

If for some reason the window under the initial input gets destroyed the whole of that input ‘session’ will be lost. It will only start again after the finger has been lifted and placed down again.

·         We’ve also added some special routing for the WM_GESTURE message to help maximise the size of the touchable area. If we get a WM_GESTURE message in DefWindowProc()  it means the target window didn’t process it either because it doesn’t support touch at all, or because the specific gesture means nothing to the control. DefWindowProc() will send the WM_GESTURE message to the parent window in case there is a larger control that will support the gesture. Consider the example of a form with labels on it – a pan gesture means nothing to the individual control, however the form itself can reasonably respond to the pan and move the whole form around in response.

So here’s something to watch out for: Don’t send gesture messages from parent to child window. We’ve put loop protection to stop a stack overflow but it’s still very inefficient to hit this and I’m sure you could find some way of overcoming the protection if you try!

Can I extend the list of gestures?

No, not at this point although it’s something we might consider in the future.

How do I use them?

There are a couple of examples that shipped in the Windows Mobile 6.5 Developer Tool Kit showing how to use the WM_GESTURE message. The basics are here:

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    switch (message)
    {
        case WM_GESTURE:
        {
            // Tracks whether this window dealt with the gesture itself.
            BOOL fHandled = FALSE;
            GESTUREINFO gi = {sizeof(gi)};

            // Go get the gesture - will return FALSE if the gesture engine is not present in the system.
            if (TKGetGestureInfo(reinterpret_cast<HGESTUREINFO>(lParam), &gi))
            {
                switch (wParam)
                {
                    case GID_PAN:
                    {
                        ...
                        fHandled = TRUE;
                    }
                    break;

                    case GID_SCROLL:
                    {
                        ...
                        fHandled = TRUE;
                    }
                    break;
                }
            }

            // Pass unhandled gestures on so DefWindowProc can route them to the parent window.
            if (!fHandled)
            {
                return DefWindowProc(hWnd, message, wParam, lParam);
            }
        }
        break;

        default:
            return DefWindowProc(hWnd, message, wParam, lParam);
    }

    return 0;
}

What’s physics got to do with anything?

So far I’ve only covered half the story... or maybe it’s less than half because the user can only be aware of the response to gestures and without the right response there can be no real connection with the device.

The key point is that the device presents consistent responses across all applications so the user becomes confident in their interaction. So the expected responses are these:

·         Select and double select: drill or action on the selected item.

·         Hold: bring up a context menu.

·         Pan: content under the finger moves in direct proportion to the movement of the finger, i.e. direct manipulation.

·         Scroll: content under the finger continues to move in the direction of the last pan and at the same velocity, decaying to a halt over time.

 

From this we can see there is really only one gesture that requires any sort of physics driven response, and that’s the scroll gesture. What we need is a way to implement a consistent movement in response to the gesture. To make this possible we’ve implemented a number of routines in the Physics Engine that allow the caller to identify the shape of the data area and the client area, then input the speed and angle of the scroll gesture (both available from GetGestureInfo()) and query over time the location of the client area until it comes to rest.

There are a number of animation ‘modes’ available to the Physics Engine beyond just deceleration, and these are internally combined to move the location of content to the extent of the data area and then to change from one mode to another, e.g. decelerate to rubber band, so that the final resting place of the animation is always with valid data showing.

By implementing this behaviour in a central Physics Engine module, each touchable UI component can expose consistent and natural feedback to the user. This is a key to raising user confidence in the device and achieving an emotional connection with the experience.
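To give a feel for the "input a speed, then query the position over time" idea, here is a minimal sketch in plain C++. It is not the Physics Engine API and not the algorithm the touch team tuned: the ScrollAnimation struct, PositionAt, AtRest and the friction value are all hypothetical, the decay is assumed to be a simple exponential, and the sketch works in one dimension and clamps at the edge of the data area instead of rubber-banding.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for a one-dimensional scroll animation.
// None of these names come from the Windows Mobile Physics Engine.
struct ScrollAnimation {
    double startPos;  // content offset when the scroll gesture arrived, in pixels
    double velocity;  // initial velocity taken from the gesture, in pixels per second
    double friction;  // assumed decay constant per second (a tuning value, not from the OS)
    double minPos;    // smallest valid offset of the data area
    double maxPos;    // largest valid offset of the data area
};

// Offset of the content t seconds after the gesture.
// Velocity is assumed to decay as v(t) = v0 * exp(-friction * t), so the
// distance travelled is v0 / friction * (1 - exp(-friction * t)).
double PositionAt(const ScrollAnimation& a, double t) {
    double travelled = a.velocity / a.friction * (1.0 - std::exp(-a.friction * t));
    // Clamp to the data area instead of rubber-banding, to keep the sketch short.
    return std::max(a.minPos, std::min(a.maxPos, a.startPos + travelled));
}

// True once the remaining speed is too small to notice.
bool AtRest(const ScrollAnimation& a, double t, double minSpeed = 1.0) {
    return std::fabs(a.velocity) * std::exp(-a.friction * t) < minSpeed;
}
```

A control using something like this would call PositionAt on each animation tick and stop the timer once AtRest returns true. The real engine does considerably more, including the mode changes described above.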

Take a look at the PhysicsEngineSample in the Windows Mobile 6.5 Developer Tool Kit for more information.

That’s enough for part 1

This is already a bit long, so let me break off now and I will post another update soon covering WindowAutoGesture, Managed Code and some more stuff about the resource kit. Oh and I will share things we learned while optimizing 6.5 touch related animations.
Posted by marcpe | 5 Comments

Samsung i640v

Information about the new Samsung i640v device has started to appear on various blogs and news pages:

Engadget, Unwired, MSMobileNews and even a short video on YouTube.

It is a compact, elegant device with a slide out qwerty keypad and a thumb wheel incorporating D-Pad functions, Bluetooth 2.0 +EDR, 2.4 inch QVGA (landscape) screen and HSDPA capable radio which supports up to 3.6 Mbps data rate! Unfortunately the claims of GPS and WiFi are untrue - nice idea tho!

This device will be available through Vodafone and like the Palm 500v, is also running the Vodafone Terminal Platform Program (TPP) version of Windows Mobile 6.

Posted by marcpe | 9 Comments

Default GPRS setting

Way back in the misty past - October 03, 2005 to be precise - I posted about configuring networks and GPRS entries on Windows Mobile.

I left an open question about how to configure the default GPRS entry the device should use for a specific network destination. For example if I configure 3 different GPRS entries that all connect the device to The Internet meta-network, which one will the device choose? Well there is a default that gets shown in the UI but at the time I couldn't figure out the config settings needed to programmatically set or change this.

Recently I had a question from Sam Mannix asking if I ever solved this and found a way of setting these defaults. Well yes I did, and here is how:

<wap-provisioningdoc>

    <characteristic type="CM_Planner">

        <characteristic type="PreferredConnections">

            <parm name="{436EF144-B4FB-4863-A041-8F905A62C572}" value="Internet GPRS" />

            <parm name="{7022E968-5A97-4051-BC1C-C578E2FBA5D9}" value="WAP GPRS" />

        </characteristic>

    </characteristic>

</wap-provisioningdoc>

The CM_Planner Configuration Service Provider (CSP) is responsible for maintaining the default connection entries for each meta-network. In the above XML I'm setting the default GPRS entry for the Internet meta-network {436EF144-B4FB-4863-A041-8F905A62C572} and for the WAP meta-network {7022E968-5A97-4051-BC1C-C578E2FBA5D9}.

The default value string, value="xxx", identifies the display name of the GPRS entry. So for example if I had:

<wap-provisioningdoc>

    <characteristic type="CM_GPRSEntries">
        <characteristic type="Internet GPRS">
            <parm name="DestId" value="{436EF144-B4FB-4863-A041-8F905A62C572}" />
            <characteristic type="DevSpecificCellular">
                <parm name="GPRSInfoValid" value="1" />
                <parm name="GPRSInfoAccessPointName" value="MyInternetAPN" />
            </characteristic>
        </characteristic>

        <characteristic type="Alternate GPRS">
            <parm name="DestId" value="{436EF144-B4FB-4863-A041-8F905A62C572}" />
            <characteristic type="DevSpecificCellular">
                <parm name="GPRSInfoValid" value="1" />
                <parm name="GPRSInfoAccessPointName" value="MyOtherAPN" />
            </characteristic>
        </characteristic>
    </characteristic>

</wap-provisioningdoc>

The "Internet GPRS" entry would be the default and Connection Manager will choose that entry over the "Alternate GPRS" entry.

Note: this is true for Windows Mobile 6 Standard. I haven't confirmed this is true for Professional as well, but it's very likely.

Marcus

Posted by marcpe | 3 Comments

Palm Treo 500v Announced

Following the launch party last night (no, I didn't get an invite :( ), Palm now have online details of their new 500v device, available through Vodafone.

The device will be available to buy from 1st October - price details not available yet - and will ship in UK, Germany, Italy, Ireland, Netherlands, Portugal, Spain and South Africa.

Specs and pics available here on the Palm web site.

This is the first announced device running the Vodafone Terminal Platform Program (TPP) version of Windows Mobile 6 - i.e. my team's product. Wohoo!

Posted by marcpe | 2 Comments

Connection Manager


We’ve been working on a bunch of very tricky connection issues recently that came to light when creating a network configuration setup for a Mobile Operator (MO). The trickiness comes as a result of the complex network topology and the requirements and limitations from the MO:

1>     The MO network supports only one PDP context (explanation later)

2>     The device is required to connect to different APN destinations depending on the traffic type – for example certain streaming content must go through a proxy server, but MMS traffic is only available from a dedicated APN.

PDP Contexts

First of all let me explain a PDP context. PDP (Packet Data Protocol) context refers to the instance of shared session state between a handset radio and base station software. It contains important information such as APN and IP address. You can get a fuller explanation of PDP context on Wikipedia here.

For a GSM device to send and receive data it must first establish a PDP context. The context is established by the device making a request to the base station, passing the name of the desired APN. The base station will typically forward the PDP request along with the APN and handset IMSI (SIM number) into the MO’s billing network so that access can be verified – e.g. SIM is allowed (i.e. has a valid contract, and enough credit) to make a data connection to that APN. On success the base station will reply to the handset supplying extra information such as the handset IP address, at which point the context is ready to go.

Multiple PDP Contexts

You might wonder why a handset needs more than one PDP context. In some situations the device must use more than one APN in order to send and receive data to the right network endpoints – for example if the MO applies billing and controls data access through specific APN’s, or has secure information behind a dedicated, protected APN. Once a PDP context has been established the parameters of the context cannot be changed – this means the APN endpoint cannot be dynamically updated, and instead the context must be dropped and a new context created.

Earlier I mentioned that a request to set up a PDP context typically routes through to the MO’s server infrastructure for validation. This process takes time and, depending on the MO’s infrastructure, can take up to 20 seconds to create a context (although the average time inside the MO’s network is more like 5 seconds). For performance alone it’s desirable to allow more than one PDP context, especially when the MO requires multiple APN’s for things like billing.

Single PDP Contexts

For some of the first generation 3G networks, the base station software doesn’t support multiple PDP contexts, and when the handset needs to change APN in this scenario the existing PDP context must be dropped before a new one can be requested.

At this time the vast majority of 2.5G networks support multiple PDP contexts.

The available number of contexts that Windows Mobile can use is defined by the base station capabilities combined with the radio hardware capabilities. Only when both handset and base station support multiple simultaneous PDP contexts is there a possibility to use this feature.

Many new handsets support ~3 contexts.

Connection Manager

Ok so in a single PDP context scenario there might be a small delay for the user – not great, but acceptable in the unusual situation where the APN needs to be switched. But what’s all this got to do with connection manager, I hear you ask.

For many (less capable? :) ) smartphone handsets not running Windows Mobile, when the setup requires multiple APN’s to be configured, the common solution is for each application to ask the user to select the network APN to use for that application. Additionally once the application is running it will typically expect to hold that connection exclusively until the application is closed by the user, at which time the connection is dropped and another application can be started and connect to a different APN.

Windows Mobile implements Connection Manager that takes the complexity of choosing a connection away from the user… after all, who should really know which connection is the right one, Mobile Operator or user? Hey if the MO / application doesn’t know which connection to use, the user hasn’t got a hope! The Windows Mobile approach allows more flexibility for applications and enables the concept of ‘background’ or ‘always-connected’ applications without writing a large amount of logic for each application.

For details of Connection Manager (CM) check out the Windows Mobile SDK documentation or online here.

Modeling the MO’s Network

Connection manager requires the MO’s network topology to be modeled using a number of settings.  The basic components of this model are as follows:

1>     There are a number of ‘Meta’ Networks defined as GUID’s in the registry. These identify connection destinations such as ‘the internet’ or ‘work’ or ‘Secure Wap’. There is a list of meta network guids published in the public documentation and used to identify common destinations such as ‘Internet’, however the list is fully extensible. (Use CM_Networks CSP for provisioning)

2>     There are a number of GPRS entries in the registry that define the information required to make a data connection through a specific APN. These settings contain information like user name, password and APN name. Additionally these settings also identify a destination meta network – this is the network destination that will be available by connecting to the APN. The simplest case would be something like a GPRS entry that connects to the internet network. (Use CM_GPRSEntries CSP for provisioning)

3>     There can also be a number of WiFi and dial-up entries to define other connections that could be made to reach a meta-network.

4>     Additionally there can be a number of VPN and Proxy entries. These are slightly different types of connection because a VPN or a proxy enables a connection from one network to be changed into a connection to another network. For example using a VPN might allow an internet connection to become a work network connection. So these entries have both a source meta-network and a destination meta-network.

One more configuration setting needs to be mentioned, although it is not strictly part of modeling the network topology, and that’s the Mappings table (use the CM_Mappings CSP for provisioning). This table allows applications to defer the choice of meta-network required for a particular resource or URL to the MO or OEM that configures the device. Without the Mappings table an application developer must either hard wire the destination GUID into code, or provide a configuration option for the user to see and select the meta-network – back to the old issue of relying on the user!

The Mappings table contains an ordered list of URL patterns matched with a meta network GUID and can be interrogated by using the Connection Manager API ConnMgrMapURL. The idea is that an application will pass the required URL to ConnMgrMapURL, and that API in turn will interrogate the table, starting at the first entry and moving through the table until a match for the URL is found or the table ends. When a match has been located the associated meta network GUID is returned. The calling code can then use this meta network in a call to ConnMgrEstablishConnection when trying to connect.

This type of lookup is a requirement for a general purpose application like IE Mobile or Windows Media Player, where any number of URL’s could be supplied. However it’s also a great way for other applications to ensure they support the widest range of network topologies.

Here is an example of what the CM_Mappings table might look like:

<wap-provisioningdoc>

       <characteristic type="CM_Mappings">

              <characteristic type="501">

                     <parm name="Pattern" value="*://*/*.3gp"/>

                     <parm name="Network" value="{D3B2D798-9E69-4B65-A75B-6DDFBECEAAAA}"/>

              </characteristic>

              <characteristic type="610">

                     <parm name="Pattern" value="*://*.operator.*"/>

                     <parm name="Network" value="{7022E968-5A97-4051-BC1C-C578E2FBA5D9}"/>

              </characteristic>

              <characteristic type="536870912">

                     <parm name="Pattern" value="wsp://*/*"/>

                     <parm name="Network" value="{7022E968-5A97-4051-BC1C-C578E2FBA5D9}"/>

              </characteristic>

              <characteristic type="553648128">

                     <parm name="Pattern" value="wsps://*/*"/>

                     <parm name="Network" value="{F28D1F74-72BE-4394-A4A7-4E296219390C}"/>

              </characteristic>

              <characteristic type="570425344">

                     <parm name="Pattern" value="*://*.*/*"/>

                     <parm name="Network" value="{436EF144-B4FB-4863-A041-8F905A62C572}"/>

              </characteristic>

              <characteristic type="587202560">

                     <parm name="Pattern" value="*://*/*"/>

                     <parm name="Network" value="{A1182988-0D73-439E-87AD-2A5B369F808B}"/>

              </characteristic>

       </characteristic>

</wap-provisioningdoc>

 

The numeric type value defines the order that the mappings will be examined, starting at 0 and going up, so entry “501” will be examined before “610” and so on. The pattern parameter is not a full regular expression but allows quite a range of flexibility. ‘*’ is used as the wildcard character, so for example “*://*.operator.*/*” will map any protocol and any address that contains the text pattern ‘.operator.’ followed by a ‘/’ and any trailing page name. For example the following URL strings would match this destination network:

·         “http://www.operator.com/”

·         “rtsp://rtsp.operator.media.com/20070707_ABHDD3227DD/today_news1.sd”

·         “https://my.long.name.operator.da/index.aspx”
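The ordered, first-match-wins lookup described above can be sketched in a few lines of plain C++. This is only an illustration of the idea: WildcardMatch, Mapping, MappingTable and MapUrl are hypothetical names, the matcher treats ‘*’ as "any run of characters" and is case sensitive, and none of it should be read as how ConnMgrMapURL is actually implemented.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Returns true when text matches pattern, where '*' matches any run of
// characters (including none). Illustrative only; this is not the matcher
// Connection Manager uses internally.
bool WildcardMatch(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0;
    std::size_t starP = std::string::npos, starT = 0;
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++;   // remember the star and first try matching it against nothing
            starT = t;
        } else if (p < pattern.size() && pattern[p] == text[t]) {
            ++p;
            ++t;
        } else if (starP != std::string::npos) {
            p = starP + 1; // backtrack: let the last star absorb one more character
            t = ++starT;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match nothing
    return p == pattern.size();
}

struct Mapping {
    std::string pattern;
    std::string network;  // meta-network GUID, kept as text for this sketch
};

// std::map iterates in ascending key order, which mirrors the numeric
// "type" ordering of the CM_Mappings entries.
using MappingTable = std::map<unsigned long, Mapping>;

// Returns the GUID of the first matching entry, or an empty string if nothing matches.
std::string MapUrl(const MappingTable& table, const std::string& url) {
    for (const auto& entry : table) {
        if (WildcardMatch(entry.second.pattern, url)) return entry.second.network;
    }
    return std::string();
}
```

Loading the three example URLs above into a table that holds the “*://*.operator.*/*” pattern at position 610 returns the WAP meta-network GUID for each of them, while a URL ending in “.3gp” is caught earlier by entry 501.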

 

If you want to examine the mappings for your device, run the following XML via RapiConfig.exe:

<wap-provisioningdoc>

    <characteristic-query type="CM_Mappings" />

</wap-provisioningdoc>

How Connection Manager makes connections

Ok, once we’ve got a model for the network, Connection Manager can now do its magic and take away much of the pain of choosing connections.

When my code needs to make a connection I call ConnMgrEstablishConnection and pass it a populated CONNMGR_CONNECTIONINFO structure. The important bits are as follows:

DWORD dwFlags;

Specify stuff like Proxy Aware. If at all possible, write proxy aware code (HTTP proxy is usually enough). It’s not hard! Just make use of the InternetOpen or InternetOpenUrl WinInet API’s and it’s all pretty much done for you!

 

DWORD dwPriority;

The priority is very important if you want your app to play well with other applications on the device.

 

BOOL bExclusive;

Stops connection manager from sharing the physical connection with other applications even if they request the same destination meta network. Use with care!

 

BOOL bDisabled;

Useful to determine if CM can even find a way of connecting to your meta network. But setting this flag means this request will never result in a connection happening.

 

GUID guidDestNet;

Your meta network destination returned from ConnMgrMapURL().

 

HWND hWnd; UINT uMsg; LPARAM lParam;

Where to send status update messages – e.g. when the state of a connection changes – along with what message to send and what lParam value to add in.

When Connection Manager is called it checks the parameters and uses the Connection Request (CR) data to track the life of the connection, releasing the data when ConnMgrReleaseConnection is called for the CR. The following is a summary of the main steps that Connection Manager takes when processing a CR:

1>  Verify that the destination meta network GUID can be reached and work out the required connection path.  

This is where Connection Manager looks at the network settings and meta-network destination, GPRS / WiFi entries and proxy settings to find the best set of connections that could be used to satisfy the CR. It’s worth noting that if your app does not tell Connection Manager that it supports proxies then no proxy server entries will be used when calculating the required path… so make sure you add proxy support to your code and to the CM request! Many of the .NET CF managed classes (HttpWebRequest for example) already support proxy connections – I’ve got a little sample that I might post later to show this working.

Connection Manager can choose to use more than one connection entry in order to satisfy your request, for example in order to get to ‘Work’ network there might be a GPRS connection plus a VPN connection. Connection Manager can also choose a different path depending on the current connection state of the device, for example if the device is not connected to any network CM might choose a direct GPRS connection. However if the device is currently connected to a Work network then it might choose just to reply with proxy information if available. There is quite a bit of complexity in this process, so check the docs if you want more detail.

If Connection Manager cannot find an appropriate path to the destination, the application will be notified and no further steps taken.

2> State transition journey begins

Every connection request that is accepted by ConnMgrEstablishConnection will travel through a finite number of transient states before eventually ending up at one of the 5 or so non-transient, closed states (I consider the ‘connected’ state to also be transient). Once a connection is in a non-transient state that CR will never again change status, so if the calling application needs another connection it needs to close the existing CR and request a new CR.

These state transitions are reported back to the calling application via windows messages using the parameters supplied to the connection request. The first transition takes place right after the connection has been requested.

Adam Dyba is the expert on this stuff and although his blog posts are rare, there is a very useful description here of state transitions including a diagram, which is worth many thousands of words!

3> Consider all active CR’s and calculate the next steps

Once a path has been identified for the connection, the next action is to trigger a resource allocation process. This involves CM first sorting all CR’s by request time (newest first) within connection priority order. Resources are then allocated to the connections from the top of the list down. Therefore the most recent, highest priority CR gets connected first, and the oldest, lowest priority CR is last in the list.

Let me pull out a couple of things from this:

·         Going back to PDP contexts, if only one PDP context is available then only the top CR in the list will get connected. If a new request is lower priority it will receive a WAITINGFORRESOURCE message to say that the network connection is busy. There are some additional complexities here, for example if a lower priority CR is requesting the same meta-network as the CR at the top of the list – take a look at the docs for a more complete explanation of all the different situations that are supported.

 

·         If more than one PDP context is available then Connection Manager can try to connect a number of CR’s, starting at the top of the list. Again, if there are more CR’s than PDP contexts the remaining CR’s will receive WAITINGFORRESOURCE notification messages.

 

·         New Connection Requests will always take priority over existing Connection Requests if they are at the same priority.

Think about this for a sec and you will realize that’s exactly what users want, but there’s a trap here for an unwary dev to fall into. Let’s say my application needs to be ‘always connected’ so that updates can flow back and forth to the server as required. Also my application requires connection to a private APN that offers no ‘internet’ access.

My application connects and starts to transfer data back and forth, but the user gets bored and fires up a web browser to see how well her stock portfolio is doing. To access general internet content a different, unprotected APN connection is configured, so the browser issues a CR for the new destination meta-network.

If the connected base station and radio hardware support multiple PDP contexts then the second CR will also be connected and both apps can quite happily run side-by-side.

However if there is only one PDP context, then the newer browser CR will trump my application’s request and Connection Manager will force closed the existing CR to the private APN, then connect the newer browser connection. Aha! But I wrote my application to be resilient to connection failures, knowing that the user is likely to be traveling a lot. As soon as the application detects that the connection has been lost (it receives a state transition from CM saying it’s been disconnected), it re-requests the connection. The new request is received by CM and because it’s at the same priority but newer than the browser CR, it wins and CM tears down the browser CR to connect my application’s session. The browser is notified of the disconnected CR and reports an error to the user. ‘Darn!’ says the user and tries again, but the browser will never connect because of the aggressive retry logic built into my application.
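The ordering rule behind this race is easy to model. The following is a minimal sketch in plain C++, assuming every request needs its own PDP context and that a larger priority number means a more important request. Request and Allocate are hypothetical names for the illustration, not Connection Manager types or API’s.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of a Connection Request; not a Connection Manager type.
struct Request {
    int id;            // label used only by this sketch
    int priority;      // larger means more important (an assumption of this sketch)
    long requestTime;  // larger means more recent
    bool connected;    // filled in by Allocate
};

// Orders requests by priority, newest first within a priority, then connects
// as many as there are PDP contexts. Assumes each request targets a different
// APN and so needs a context of its own.
void Allocate(std::vector<Request>& requests, std::size_t pdpContexts) {
    std::sort(requests.begin(), requests.end(),
              [](const Request& a, const Request& b) {
                  if (a.priority != b.priority) return a.priority > b.priority;
                  return a.requestTime > b.requestTime;
              });
    for (std::size_t i = 0; i < requests.size(); ++i) {
        requests[i].connected = i < pdpContexts;
    }
}
```

With one context and two requests at the same priority, whichever request carries the newest timestamp ends up connected, so an application that re-requests the instant it is disconnected always jumps back to the top of the list. Give the model two contexts and both requests connect.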

So what’s the app developer doing wrong?

First of all it’s important to use the correct priority for the connection: USERINTERACTIVE is exactly that! If you want to use this priority, make sure the user really has interactively caused the connection to be established – like a browse request to a browser application, or a sign-in request. If the application changes from user interactive to the background – i.e. the user launches a new app or a dialog is displayed over your application – then change the priority of your connection to USERBACKGROUND using ConnMgrSetConnectionPriority. Doing this should make a connection race less likely.

The second thing is to consider implementing some form of connection back-off. What I mean is that if the connection is broken, don’t immediately re-issue the CR; instead wait for a few seconds before trying again. If the new request fails, increase (or double) the delay, and keep doing that up to a maximum delay value. When the connection is next connected successfully, clear the back-off delay value. This will also make race conditions much less likely to occur.
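Here is a minimal sketch of that back-off bookkeeping in plain C++. The class name, method names and the millisecond values in the usage note are illustrative choices, not anything taken from the SDK, and the actual waiting (a timer, a sleep on a worker thread) is left to the caller.

```cpp
#include <cassert>

// Tracks how long to wait before re-issuing a connection request.
// All names here are illustrative.
class ReconnectBackoff {
public:
    ReconnectBackoff(unsigned initialMs, unsigned maxMs)
        : initialMs_(initialMs), maxMs_(maxMs), nextMs_(initialMs) {
        assert(initialMs > 0 && initialMs <= maxMs);
    }

    // Call when the connection is lost or a request fails.
    // Returns how long to wait before trying again, then doubles the delay
    // for the following attempt, capped at the maximum.
    unsigned OnFailure() {
        unsigned wait = nextMs_;
        nextMs_ = (nextMs_ > maxMs_ / 2) ? maxMs_ : nextMs_ * 2;
        return wait;
    }

    // Call when the request reaches the connected state.
    void OnConnected() { nextMs_ = initialMs_; }

private:
    unsigned initialMs_;
    unsigned maxMs_;
    unsigned nextMs_;
};
```

Constructed with 2000 and 60000, successive failures yield waits of 2, 4, 8, 16 and 32 seconds, then stay at 60 seconds until OnConnected resets the delay.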

The resource allocation process can be triggered by a number of other API calls beyond just ConnMgrEstablishConnection. Changing a CR’s priority, releasing a CR and a scheduled connection event occurring will all cause the resource allocation process to take place.

It’s also worth noting that when the device makes a voice call, it starts by requesting a connection from Connection Manager. There is a reserved, ‘highest’ priority flag available for voice calls, CONNMGR_PRIORITY_VOICE, that overrides all other priorities. Only after CM has connected the voice CR can the call take place. This is a slightly different case, because CM will actually suspend connected CR’s for the duration of the voice call and then reconnect them when the voice priority CR has been released. In normal circumstances CR’s transition permanently away from the connected state to support higher priority CR’s.

4> Disconnect demoted CR’s, and connect new ones

If there are CR’s currently connected that require to be disconnected then CM first breaks these connections. New APN requests are then attempted.

Status notifications continue to be sent to the calling application to inform of the transitions the CR is going through.

For CR’s that are about to be connected, their state will change from WAITINGFORRESOURCE to WAITINGFORNETWORK as the PDP context is established, and then WAITINGCONNECTION just before CONNECTED status is achieved. At this point the application can make network requests.

Multiple PDP’s and IP Address issues

If a device supports multiple PDP’s and more than one is connected then the device will have multiple IP addresses available, one from each connected PDP context. In this situation, how does an application bind to the right IP address in order to send traffic on the right network?

Connection Manager comes to the rescue here. When an application’s CR becomes connected, CM associates the IP address from the connected PDP context with all subsequent automatic socket bindings from the process. This ensures all your IP traffic goes down the right pipe.

Note that this is done at the *process* boundary. Because IP addresses are associated with the process Connection Manager supports only one CR per process. That’s not to say it will stop you from making more than one CR from within the process but if you do, make very sure they will all resolve to the same PDP context and subsequent IP address.

If your application ignores this restriction and issues two CR’s that connect via different APN’s (and hence require different PDP contexts and subsequently different IP addresses) then the behavior of CM is undetermined. However with the version of WM 6.0 that we are using, CM will bind all new sockets to the last successful IP address, i.e. it routes all traffic to the APN identified in the last CR, which is probably not what you intended!

Let me use an example to clarify this a bit:

Consider an app that has two key features. The first feature issues web service requests to “www.myservice1.com” and CM connects via ‘APN1’ in order to connect to the meta-network that was requested. Once the PDP context is established my process is bound to the IP address from that PDP context, say it’s ‘1.1.1.1’. Requests and responses flow between my feature and the server just fine.

At a later point the user fires up the second feature of my app that displays a web portal page at “http://privatenetwork1/mysite” and requests a connection via ‘APN2’. Because the network and radio hardware support 2 PDP contexts, the second connection is also allowed to connect and is given the IP address ‘2.2.2.2’. So now I’m connected to both APN1 and APN2 and the web portal is displayed just fine. However the next time a request from the first feature is created for the web service at “www.myservice1.com”, the underlying socket will be bound to the last successful connection IP address for this process, i.e. ‘2.2.2.2’, and the TCP/IP traffic will be sent via ‘APN2’ not ‘APN1’ as was intended. So the web service request is likely to fail.

Handling Transient CR States

I’ve mentioned various states that CR’s can get into. One of the problems we came across time and time again is that code to process Connection Manager state change notifications just doesn’t deal with transient states very well. Transient states like WAITINGFORRESOURCE or WAITINGFORNETWORK are really just hints to the application so that it can update progress or status to the user. Developers commonly write a block of code that processes the CONNECTED state and maybe one or two more, but the default case of the switch statement is set to display a failure message and destroy the CR. The way a developer will typically identify the states that need to be handled is by running the code and tracing the state change messages, then adding the code to support that path and bombing out for anything else. But when the code is subsequently run on a different network configuration, or on a less isolated device setup with more applications all vying for connection resources, the list of state transitions could change significantly.
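As a rough illustration of the more tolerant shape, here is a sketch in plain C++ where only the states the code knows to be closed tear down the CR, and everything else is treated as a progress hint. The CrState and CrAction enums are stand-ins invented for this sketch: the real status values come from the Connection Manager headers, and the real list of closed states is longer than the two shown here, so check the documentation before relying on any such list.

```cpp
// Stand-ins for connection states named in this post. The real values come
// from the Connection Manager headers; these enumerators are illustrative.
enum class CrState {
    WaitingForResource,
    WaitingForNetwork,
    WaitingConnection,
    Connected,
    Disconnected,
    ConnectionFailed,
    SomeOtherState  // anything the code was not written to expect
};

enum class CrAction {
    StartTraffic,    // the connection is usable
    ShowProgress,    // transient: update status for the user and keep the CR
    ReleaseAndRetry  // closed state: release the CR and request again later
};

CrAction HandleStateChange(CrState state) {
    switch (state) {
        case CrState::Connected:
            return CrAction::StartTraffic;

        // Only states known to be closed end the life of the CR.
        case CrState::Disconnected:
        case CrState::ConnectionFailed:
            return CrAction::ReleaseAndRetry;

        // Everything else, including states never seen during testing,
        // is treated as a hint rather than a failure.
        default:
            return CrAction::ShowProgress;
    }
}
```

The design choice is simply to invert the usual default. An application built this way would still want an overall timeout so that an unexpected closed state cannot leave it waiting forever.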

Note: Watch out when looking at the SDK samples, because they show these same issues. I will see if I can persuade Adam or someone else from the CM team to blog a template state notification handling sample.

Connection Manager issues

I said at the beginning that we’ve been working on some tricky connection issues. As a result of our work we found and fixed ~10 bugs in various different apps, but only one came down to a Connection Manager code issue! Considering the complexity of the CM code and the thrashing we were giving it, I have to say that I consider this feature to be totally rock solid – every time we thought it was broken, investigation proved it was an issue in some other app. Great job Adam and team!

Marcus

Posted by marcpe | 6 Comments

MEDC 2007 slides

Well, it's all over for another year. So much prep and rushing about, then 2 days later it's all done - today seems such an anticlimax.

I want to say thanks to David Goon, John Wyre and the rest of the UK team that so competently put MEDC together year after year, and give us the opportunity (excuse?) to be totally immersed in everything mobile for 48 hours. Also thanks to the highly competent US team that supported each of the global MEDC events, ending with Berlin yesterday - it was also good to match email names with real faces. And that goes for the delegates as well, great to have so many enthusiastic conversations and share experiences - as well as putting faces to email addresses.

So I will try and find time to blog more details from my two sessions, but I've already been asked for the slides so here they are:

APP304  - Bluetooth Communication in Windows Embedded CE and Windows Mobile

APP407 - Inside Windows Mobile Device Provisioning

 Marcus

 

Posted by marcpe | 2 Comments

MEDC 2007 Europe - Bluetooth session APP304

June 26th and 27th will be MEDC Europe in Berlin, and one of the sessions I will be doing is all about programming Bluetooth.

For those of you registered to attend the event, one of the demos I will be using in this session is simple Bluetooth client/server comms. To make it a bit more interesting I wanted to share the server code and have as many people as possible download and install it onto their device before coming along to the session.

Here is the link to the CAB file (tested on WM6 Standard, but should in theory be fine on Standard or Pro. versions of WM6 and WM5):

BthTestServer.cab

I will share the source after the session.

Marcus

Posted by marcpe | 4 Comments

Vodafone and Microsoft

Check this out:

Vodafone signs new terminal platform agreement with Microsoft:

http://www.vodafone.com/article_with_thumbnail/0,3038,OPCO%253D40000%2526CATEGORY_ID%253D210%2526MT_ID%253Dpr%2526LANGUAGE_ID%253D0%2526CONTENT_ID%253D291534,00.html

 

So that means Microsoft is now one of the three Vodafone phone platforms along with Symbian and Linux.

That's about all I can say right now, but more on this later...

 

Marcus

Posted by marcpe | 1 Comments

Windows Mobile 5.0 Role security

I've just submitted a new post to my security column (http://www.microsoft.com/uk/msdn/security/default.mspx), but it takes a couple of weeks to hit the web through that route. So I might as well post it here too.

 

Here is the unedited version:

 

Introduction

In my last entry on this column I described the code signing security architecture of Windows Mobile and explained that this forms the first line of defense against running malicious code on the device. However, relying on just the digital signature of executable code is not enough to form a complete or particularly granular level of security for a device. In addition to code signing Windows Mobile also enforces role based security to protect certain assets on the device. In this post we will explore the different facets of the role model implemented on Windows Mobile 5.0 Smartphone and Pocket PC devices.

 

Certificate Security Recap

First we need to recap the code signing process. When any executable module (e.g. .EXE, .DLL, .CPL, etc.) is presented to the Windows Mobile OS by the program loader, the code signature is extracted and tested against a number of certificate stores and the device policy to determine which ‘mode’ or code group the module will execute in. For Smartphone there are two code groups: Normal – execute with access restrictions to certain APIs and other assets; and Trusted – execute with full access to all assets on the device. Windows Mobile 5.0 Pocket PC has one code group: Trusted.

 

Code groups are maintained at process granularity, which means that DLLs loaded into a process space must execute in the same code group as the exe. Therefore there are some restrictions that the loader enforces when loading DLLs. For example, an exe running in the Trusted code group cannot load a DLL that resolves to the Normal code group, otherwise the DLL would automatically receive an elevation of privileges. Conversely, a DLL that resolves to the Trusted code group but is loaded by an exe running in the Normal code group is forced into the Normal code group to match the containing process.
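The loader rule above can be written down as a small decision function. This is only a sketch of the rule as described here; the type and function names are made up for illustration and are not loader APIs.

```cpp
#include <cassert>

enum class CodeGroup { Normal, Trusted };

// Outcome of asking the (imaginary) loader to map a DLL into a process.
struct LoadResult {
    bool loaded;       // false when the loader refuses the DLL
    CodeGroup runsAs;  // code group the DLL code executes in when loaded
};

// processGroup: code group of the running exe.
// dllGroup:     code group the DLL's signature resolves to.
LoadResult LoadDllInto(CodeGroup processGroup, CodeGroup dllGroup)
{
    // A Normal DLL inside a Trusted process would be silently elevated,
    // so that combination is refused.
    if (processGroup == CodeGroup::Trusted && dllGroup == CodeGroup::Normal) {
        return { false, dllGroup };
    }
    // Otherwise the DLL always runs in the code group of its process,
    // which demotes a Trusted DLL loaded by a Normal exe.
    return { true, processGroup };
}
```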

 

Enough of the recap; let's think about roles on the device.

 

Do we need Roles?

Why do we need anything beyond the code signing security? Consider the situation where a system process is used to perform operations on behalf of some non-executable content. One very clear example of this is when installing a CAB file on the device. A CAB file is not executable content, so the program loader will not be invoked when this file is loaded; however, a CAB file can contain installation instructions that require access to assets that are considered restricted and only available in the Trusted code group. For example, a CAB file can include instructions to change registry keys such as those under HKLM\Security, or file operation instructions that affect the \Windows directory.

 

CAB files are processed by the WCELOAD.exe application, which is signed and runs in the Trusted code group, so all assets are available to the process on both Pocket PC and Smartphone. But the CAB file could arrive on the device from any source: through email, from a web download or from removable media. In this situation code signing security does not provide protection against script-based instructions that are interpreted by an application running in the Trusted code group, so we need something a little more granular.

 

On a desktop system the solution to this type of issue is often implemented in terms of user impersonation against object security or Access Control Lists (ACLs). However, Windows Mobile doesn’t understand the concept of object level security and we only ever have one user, so the solution implemented by Windows Mobile is role based access.

 

Role based security requires the OS to protect system assets (files, reg keys, code operations etc) by asserting that the process or operation requesting access has a role flag matching or exceeding the requirements of that asset. So we need a set of role flags, a way of associating role flags with all assets that need protection, and a way of associating role flags with a specific operation.

 

Role Flags

Let’s start with role flags. Role flags are implemented as a bit mask, so there are theoretically up to 32 possible roles recognized by Windows Mobile – that’s 32 bits of a single 4 byte value. However, today only eleven are publicly documented:

SECROLE_OEM

SECROLE_OPERATOR

SECROLE_MANAGER

SECROLE_USER_AUTH

SECROLE_ENTERPRISE (added with AKU2)

SECROLE_USER_UNAUTH

SECROLE_OPERATOR_TPS

SECROLE_KNOWN_PPG

SECROLE_TRUSTED_PPG

SECROLE_PPG_AUTH

SECROLE_PPG_TRUSTED

 

Full details of these flags are available in the SDK documentation or at MSDN online.

 

For most developers and users the important roles are SECROLE_USER_UNAUTH, SECROLE_USER_AUTH and SECROLE_MANAGER as these can be considered somewhat equivalent to the desktop Guest, standard user and Administrator accounts respectively.
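To make the bit mask idea concrete, here is a minimal sketch of a role check. The bit positions are invented for illustration (the real SECROLE_* values are defined in the SDK headers), and the rule "access is allowed when the caller holds at least one of the roles the asset accepts" is an assumption about how the masks are compared.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t RoleMask;  // one bit per role, up to 32 roles

// Illustrative bit assignments only; these are NOT the SDK's SECROLE_* values.
const RoleMask kRoleManager    = 1u << 0;
const RoleMask kRoleUserAuth   = 1u << 1;
const RoleMask kRoleUserUnauth = 1u << 2;

// True when the caller holds at least one role that the asset accepts.
bool RoleAllows(RoleMask callerRoles, RoleMask rolesAcceptedByAsset)
{
    return (callerRoles & rolesAcceptedByAsset) != 0;
}
```

For example, an asset that accepts `kRoleManager | kRoleUserAuth` would be reachable by a caller holding `kRoleUserAuth` but not by one holding only `kRoleUserUnauth`.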

 

Applying Roles to Assets

These flags need to be applied to both the assets and the applications that require access in order for the OS to manage access correctly. For assets such as registry keys, files and directories the roles can be stored in database form, using the ‘path’ to the reg key or file as a lookup for the role. This database is called the metabase and is initially configured by the OEM when building the device image. The contents are not limited to just the security role, but can also include other information such as registry key data type, min/max values, default values and locale-specific strings. The role enforcement is applied within the OS code that accesses these resources; for example, when calling a registry function or using a file IO function, the roles will be checked.
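Conceptually the metabase behaves like a lookup table keyed by asset path. The toy class below illustrates only that idea: the class name, the exact-match lookup and the "deny when there is no entry" default are all assumptions made for the sketch, not a description of the real metabase.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

typedef std::uint32_t RoleMask;

// Illustrative bit assignments only; not the SDK's SECROLE_* values.
const RoleMask kManager  = 1u << 0;
const RoleMask kUserAuth = 1u << 1;

// Toy stand-in for the metabase: asset path -> roles allowed to touch it.
class ToyMetabase {
public:
    void SetRequiredRoles(const std::string& path, RoleMask roles)
    {
        entries_[path] = roles;
    }

    // True when an entry exists and the caller holds one of its roles.
    // Paths without an entry are denied purely to keep the sketch conservative.
    bool CanAccess(const std::string& path, RoleMask callerRoles) const
    {
        std::map<std::string, RoleMask>::const_iterator it = entries_.find(path);
        if (it == entries_.end()) {
            return false;
        }
        return (it->second & callerRoles) != 0;
    }

private:
    std::map<std::string, RoleMask> entries_;
};
```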

 

The OEM’s configuration, and any changes requested by the operator, is applied during cold boot of the device, however it is possible to query and update many of these values through the Metabase Configuration Service Provider. This service provider is available via Client Provisioning XML and can be accessed via the RapiConfig SDK utility, DMProcessConfigXML API call or via WAP push provisioning.

 

If you are adding new assets to the device, role-based protection can be extended through this service provider to include your new assets by using simple XML.

 

For more details about provisioning and updating the metabase see the SDK documentation or MSDN online or take a look at my blog site.

 

Applying Roles

Now we have the role flags defined and we have the metabase that describes which roles are required for which assets, the last step is to assign a role to an operation or a ‘message’ that causes an operation to take place. This information obviously needs to be provided in a way that can be trusted, for example it’s not acceptable for a message to just declare ‘hey, I’m running in the manager role’!

 

There are two ways that a role is assigned to an operation or to a message: either through the certificate used to sign the message; or by examining the source of a message and deriving the role from the source.

 

First let’s consider how the role is derived from the source of the message.

 

When a message originates from a running process, the security role is derived from the source of the operation by determining which code group the process is running in. For example, consider a call to DMProcessConfigXML, used to apply XML configuration to the device, from an application that is running in the Normal code group. In this situation the XML received by DMProcessConfigXML is applied with the role mask of SECROLE_USER_UNAUTH, and the metabase is consulted as each change is applied to determine whether appropriate permissions exist before the values are read or written. Conversely, if the process that calls DMProcessConfigXML is running in the Trusted code group then the XML is applied with the role mask SECROLE_USER_AUTH.
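The mapping from code group to role described above is small enough to sketch directly. The enum, constants and function below are illustrative names with made-up bit values; only the Normal to USER_UNAUTH and Trusted to USER_AUTH pairing comes from the text.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t RoleMask;

// Illustrative bit assignments only; not the SDK's SECROLE_* values.
const RoleMask kRoleUserAuth   = 1u << 1;
const RoleMask kRoleUserUnauth = 1u << 2;

enum class CodeGroup { Normal, Trusted };

// Role mask applied to configuration XML submitted by a running process,
// derived purely from the code group of the calling process.
RoleMask RoleForCallingProcess(CodeGroup group)
{
    return (group == CodeGroup::Trusted) ? kRoleUserAuth : kRoleUserUnauth;
}
```

Note that nothing in this mapping ever yields the manager role, which is the limitation the later "Getting at the Manager Role" section runs into.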

 

In other situations the message can be received through some form of push protocol such as SMS, and in this situation the message can also be signed with a certificate to validate the origin of the message and to verify that the content of the message is unmodified since it was sent. Now we have a certificate available that can directly validate the origin of the message by looking in the cert stores of the device to find an equivalent public key that allows us to validate the message. In addition to the certificate information and public key, each entry in the certificate store can also have a corresponding role mask. For signed messages, when the signing certificate for the message is identified, the role mask is also retrieved and used to process the message.

Verifying Application Installation

This type of message signing is common when an operator is using Over the Air (OTA) provisioning, but it is also used for application installation. Going back to the example of application installation I used earlier, for this purpose there is another, special certificate store called the SPC (Software Publisher Certificates) store. Every certificate that is installed on the device has an associated role mask, but for most stores this is ignored. When the WCELOAD process opens a CAB file for installation, it first looks to see if the CAB is signed. The CAB signature is validated against the SPC store and the role mask of the matching certificate is retrieved. This role mask is then used for all file and registry operations resulting from the installation of that CAB file and the application of any settings required by the CAB.
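A simplified way to picture the SPC lookup: the signer of the CAB selects a store entry, and that entry carries the role mask used for the install. In the sketch below the store is just a map keyed by a signer identifier string. Real signature validation involves certificate chains and is not modelled, and every name here is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

typedef std::uint32_t RoleMask;

// Illustrative bit assignments only; not the SDK's SECROLE_* values.
const RoleMask kRoleManager  = 1u << 0;
const RoleMask kRoleUserAuth = 1u << 1;

// Toy SPC store: signer identifier -> role mask attached to that certificate.
typedef std::map<std::string, RoleMask> SpcStore;

// Returns true and fills *roleOut when the CAB's signer is found in the store.
// Returns false for an unknown signer; what happens next is a policy decision
// that this sketch deliberately leaves out.
bool RoleForSignedCab(const SpcStore& spc, const std::string& signerId, RoleMask* roleOut)
{
    SpcStore::const_iterator it = spc.find(signerId);
    if (it == spc.end()) {
        return false;
    }
    *roleOut = it->second;
    return true;
}
```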

 

Getting at the Manager Role

If you take a look at the default metabase you will see that many assets require the manager role to modify and in some cases even to read their values. For example, by default all of the certificate stores require the manager role to manipulate their content.

 

This leads application developers to a problem. Imagine your application needs to make SSL / HTTPS connections to a server as part of its normal operation. During installation of the application via CAB file you deploy the main application files signed with a certificate rooted in the unprivileged store (execute with minimum permissions! Good practice on any system) and a small configuration application signed with a certificate in the privileged store, used during installation and configuration to update privileged assets. Part of this configuration application will need to install a new certificate into the root certificate store for silent setup of an SSL session by the main application. But this configuration app, even though it’s running in the Trusted code group, will only have the SECROLE_USER_AUTH role and will fail to update the root store because the metabase requires SECROLE_MANAGER! And there is no way you can change the role mask for your process, as it’s determined system wide for all Trusted execution apps.

 

So how can you solve this issue? There are two ways. Another aspect of security on Windows Mobile is that of Policies. There are a number of registry keys on the device that define the security policy that is applied on the device. I will cover the details of these settings in my next post, but to give you an example, these policies configure information such as whether a 1 or 2 tier security policy is enforced on Smartphone, what permissions are enabled through RAPI (remote API), and how unsigned applications and CABs should be dealt with.

 

One of these policies is called the Grant Manager policy. This is a role mask used to elevate other roles to the manager role thus avoiding the need to change large sections of the metabase to enable another role to access them. This policy defines a role mask in the registry that indicates which other roles should be considered and applied as if they were Manager – it’s a bit like the Administrators group on a desktop Windows operating system, adding someone to that group elevates them to administrator privileges without changing the individual object permissions.
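The effect of the Grant Manager policy can be sketched as a single step applied to the caller's role mask before any asset check. The bit values and function name are illustrative assumptions; the real policy is a registry setting interpreted by the OS.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t RoleMask;

// Illustrative bit assignments only; not the SDK's SECROLE_* values.
const RoleMask kRoleManager    = 1u << 0;
const RoleMask kRoleUserAuth   = 1u << 1;
const RoleMask kRoleUserUnauth = 1u << 2;

// grantManagerPolicy lists the roles that should be treated as Manager.
// A caller holding any of those roles gains the Manager bit; the metabase
// entries themselves are left unchanged.
RoleMask ApplyGrantManager(RoleMask callerRoles, RoleMask grantManagerPolicy)
{
    if ((callerRoles & grantManagerPolicy) != 0) {
        return callerRoles | kRoleManager;
    }
    return callerRoles;
}
```

With a policy of `kRoleUserAuth`, every caller carrying that role is elevated, which is exactly why the following paragraph calls the technique overkill for a single certificate install.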

 

Using this technique for certificate installation is a bit of overkill and potentially unsafe as all applications running in the Trusted code group would then be elevated to manager role on the device. Additionally you still need to find a way of changing the Grant Manager policy, which itself is a Manager operation in the first place.

A better solution would be to talk to your operator and have them sign a CPF (special CAB file containing just XML provisioning) with an operator certificate that has the manager role flag assigned already. That way the security model remains intact and the operator remains in control of the privileged content of the device.

 

And Finally …

Pretty much everything to do with security on Windows Mobile is configurable: the metabase, security policies, roles assigned to certificates, etc.

 

What I have described here is generalized to the default settings that Microsoft recommends to its OEM and operator partners. However, the final security model found on your retail device is defined by your OEM and operator, and can vary significantly from the default. It’s always worth checking the actual device settings and joining your operator’s developer program to ensure you get the latest and most accurate information for your device.

 

There is still more to Windows Mobile security. I will cover device Policies in the next article to complete this part of the OS security.

 

Marcus

Posted by marcpe | 6 Comments

MSDN Nuggets site updated

I noticed this morning that the UK developer group has finished their long anticipated overhaul of the MSDN Nuggets page: http://www.microsoft.com/uk/msdn/events/nuggets.aspx

Nuggets are short (e.g. 10 to 15 mins) videos focusing on a specific feature or technology. No marketing, no fluff, just raw information. And now the site is categorized, searchable and much easier to use! There are a bunch of new nuggets waiting to be published and should hit the site over the coming weeks.

Marcus

Posted by marcpe | 0 Comments

Secure Clock

I've recently been working on an interesting problem for detecting clock changes in Windows Mobile 5.0.

The problem I’ve been trying to solve is this: let’s say you have a business operation that takes place at date/time x and the data is only valid for y minutes past that point, so that beyond x+y the content or data generated by the business process cannot be accessed any more. Sounds pretty straightforward, except that y is likely to be a number of days and the business process (or even the device) will be restarted in that time.

The initial solution might be to take the file time stamp x, add y minutes and ensure it's greater than the 'now' date/time. That will work so long as the system date/time remains unchanged, but in Windows Mobile there is no way of knowing if the system date changes... well, that’s not entirely true: there is a system notification that occurs when the date changes, but by the time you receive that notification the clock has already changed and there is no way of knowing by how much it changed!

I don’t really want to stop people from changing the time on the device, but I need to know how much the time has changed since the content was retrieved in order to accurately calculate when the content expires. So the validity test should really be 'now' < x + y + delta where delta is the cumulative time changes to the clock. So what I am really looking at is some way of implementing a "linear clock" rather than a "secure clock", and a linear clock is something I can use to determine elapsed time.

So how can it be done in Windows Mobile 5.0? The design I’m working to is pretty straightforward and relies on a couple of premises:

1>    The clock cannot be altered outside the OS (e.g. through the bootloader)

2>    Protection of the clock is only required from UnTrusted applications – e.g. trusted applications are trusted not to mess with the reg keys, and notification events.

The principles of the design are based on the GetTickCount API, which returns the number of ticks (milliseconds in WM 5.0) that the OS has been running. First, set up three triggers:

1>    Set an event to fire when the system starts

2>    Set an event to fire when the clock is changed

3>    Set an event to fire every 2^32 milliseconds (minus a small buffer for error)

Then handle each trigger as follows:

1>    When the system start event fires take the time and tick count and store them in a secure registry location.

2>    When the periodic timer fires take the time and tick count and overwrite the secure reg location

3>    When the time is changed:

a.       Read the stored time and tick count in the registry

b.      Add the tick count to the stored time and take the delta with ‘now’

c.       Read the existing delta from the registry and add the two together

d.      Store the delta back to the registry

An app can then calculate the linear time by adding the delta value held in the registry to the current clock value. Alternatively the application could register with the State and Notifications Broker for changes to the delta registry key; it will then receive an event when the clock changes, and again can calculate the linear time.

From a security perspective the SecureClock app needs to be signed and running in the Trusted code group, otherwise it can't update the registry.
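As a hedged sketch of how a consuming app might use the stored delta: the function names below are hypothetical, and the sign convention is an assumption taken from the test implementation in this post, where the delta is "time expected from the tick count minus the new clock reading". All values just need to share one unit (the sample code uses 100-nanosecond FILETIME counts).

```cpp
#include <cassert>
#include <cstdint>

// Linear time = current clock reading plus the accumulated delta.
std::int64_t LinearTime(std::int64_t clockNow, std::int64_t storedDelta)
{
    return clockNow + storedDelta;
}

// Content created at createdAt (x) stays valid for validFor (y) units of
// linear time, no matter how the wall clock has been adjusted since.
bool IsContentValid(std::int64_t createdAt, std::int64_t validFor,
                    std::int64_t clockNow, std::int64_t storedDelta)
{
    return LinearTime(clockNow, storedDelta) < createdAt + validFor;
}
```

For instance, if the user pushes the clock forward by 10000 units, the stored delta becomes -10000, so content created at 1000 with a validity of 500 is still accepted when the adjusted clock reads 11200 and rejected once it reads 11600.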

These are the important bits of my test implementation:

Registration step:

void Register()
{
    // Register for device wake up
    if (0 == CeRunAppAtEvent(buff, NOTIFICATION_EVENT_WAKEUP))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to set Wakeup event in SecureClock register\r\n")));
        return;
    }

    // Now register for time change
    if (0 == CeRunAppAtEvent(buff, NOTIFICATION_EVENT_TIME_CHANGE))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to set TimeChange event in SecureClock register\r\n")));
        return;
    }
}

Unregister step – if the whole thing needs to be pulled out:

void Unregister()
{
    // Clear all entries for this app.
    if (0 == CeRunAppAtEvent(buff, NOTIFICATION_EVENT_NONE))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to clear Wakeup event in SecureClock Unregister\r\n")));
    }
}

 

Device startup and periodic timer:

 

void ProcessRegReset()
{
    // Clear registry info and write time and system ticks.
    // Locals are declared up front so that no "goto error" jumps over an initialization.
    SYSTEMTIME sysTime;
    FILETIME TempNowFileTime;
    DWORD ticks = 0;
    HKEY hKey = 0;
    DWORD dispo;
    __int64 base = 0;
    __int64 t1 = 0;

    if (!GetTimeAndTicks(&sysTime, &ticks))
    {
        goto error;
    }

    if (ERROR_SUCCESS != RegCreateKeyEx(HKEY_LOCAL_MACHINE, TEXT("SYSTEM\\SecureClock"), 0, TEXT("SecureClock DEMO"), REG_OPTION_NON_VOLATILE, 0, 0, &hKey, &dispo))
    {
        DEBUGMSG(TRUE, (TEXT("Error creating the reg key\r\n")));
        goto error;
    }

    // Set up the next run time: convert NOW to a FILETIME
    if (0 == SystemTimeToFileTime(&sysTime, &TempNowFileTime))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to convert system time to file time\r\n")));
        goto error;
    }

    // Build the trigger offset: 2^32 milliseconds minus 0x10000 (about 65 seconds of slack;
    // timer granularity is OEM defined but defaults to 10 seconds)
    base = 0xFFFF0000;
    base = base * 10000; // convert milliseconds to 100-nanosecond units

    // Pack the FILETIME into an __int64
    t1 = TempNowFileTime.dwHighDateTime;
    t1 <<= 32;
    t1 |= TempNowFileTime.dwLowDateTime;

    // Write the tick count and the time (as a FILETIME value)
    if (ERROR_SUCCESS != RegSetValueEx(hKey, TEXT("TickSync"), 0, REG_DWORD, (BYTE*)&ticks, sizeof(ticks)))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to set TickSync\r\n")));
        goto error;
    }

    if (ERROR_SUCCESS != RegSetValueEx(hKey, TEXT("TimeSync"), 0, REG_BINARY, (BYTE*)&t1, sizeof(t1)))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to set TimeSync\r\n")));
        goto error;
    }

    if (ERROR_SUCCESS != RegFlushKey(hKey))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to flush registry\r\n")));
        goto error;
    }

    if (ERROR_SUCCESS != RegCloseKey(hKey))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to close reg key\r\n")));
        goto error;
    }
    hKey = 0;

    // Add the trigger offset to NOW
    t1 = t1 + base;

    // Unpack the result back into the FILETIME
    TempNowFileTime.dwHighDateTime = (DWORD)(t1 >> 32);
    TempNowFileTime.dwLowDateTime = (DWORD)(t1 & 0xFFFFFFFF);

    // Convert back to system time
    if (!FileTimeToSystemTime(&TempNowFileTime, &sysTime))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to convert to Systime\r\n")));
        goto error;
    }

    // Schedule the next periodic run
    if (!CeRunAppAtTime(buff, &sysTime))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to call CeRunAppAtTime\r\n")));
        goto error;
    }

    return;

error:
    if (hKey)
        RegCloseKey(hKey);

    return;
}

 

And last but not least the time change function:

void ProcessTimeChanged()
{
    // Time has changed. Calculate the delta and store it in the registry.
    // Locals are declared up front so that no "goto error" jumps over an initialization.
    SYSTEMTIME NewTime;
    FILETIME TempNewFileTime;
    __int64 OldTime = 0;
    __int64 base = 0;
    __int64 t2 = 0;
    __int64 timediff = 0;
    __int64 regDelta = 0;
    DWORD NowTicks = 0;
    DWORD StoredTicks = 0;
    DWORD TickDifference = 0;
    DWORD type;
    DWORD size = sizeof(DWORD);
    DWORD dispo;
    HKEY hKey = 0;

    // Take the new time and the current tick count
    if (!GetTimeAndTicks(&NewTime, &NowTicks))
    {
        goto error;
    }

    // Read the tick count and time stored at the last sync
    if (ERROR_SUCCESS != RegCreateKeyEx(HKEY_LOCAL_MACHINE, TEXT("SYSTEM\\SecureClock"), 0, TEXT("SecureClock DEMO"), REG_OPTION_NON_VOLATILE, 0, 0, &hKey, &dispo))
    {
        DEBUGMSG(TRUE, (TEXT("Error creating the reg key\r\n")));
        goto error;
    }

    if (ERROR_SUCCESS != RegQueryValueEx(hKey, TEXT("TickSync"), 0, &type, (BYTE*)&StoredTicks, &size))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to read TickSync\r\n")));
        goto error;
    }

    size = sizeof(OldTime);
    if (ERROR_SUCCESS != RegQueryValueEx(hKey, TEXT("TimeSync"), 0, &type, (BYTE*)&OldTime, &size))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to read TimeSync\r\n")));
        goto error;
    }

    // Unsigned DWORD subtraction wraps modulo 2^32, so this is correct even if
    // the tick counter rolled over once since the stored value was written.
    TickDifference = NowTicks - StoredTicks;

    // Add the elapsed ticks to the stored time
    base = TickDifference;
    base = base * 10000; // convert milliseconds to 100-nanosecond units
    OldTime = OldTime + base;

    // Convert NOW to a FILETIME and pack it into an __int64
    if (!SystemTimeToFileTime(&NewTime, &TempNewFileTime))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to convert SysTime to FileTime\r\n")));
        goto error;
    }

    t2 = TempNewFileTime.dwHighDateTime;
    t2 <<= 32;
    t2 |= TempNewFileTime.dwLowDateTime;

    // Difference between the expected time and the new clock, in 100-nanosecond units (SIGNED!)
    timediff = OldTime - t2;

    // Accumulate the difference into the stored delta
    size = sizeof(regDelta);
    if (ERROR_SUCCESS != RegQueryValueEx(hKey, TEXT("TimeDelta"), 0, &type, (BYTE*)&regDelta, &size))
    {
        DEBUGMSG(TRUE, (TEXT("No TimeDelta found - continuing \r\n")));
        // NO ERROR - just reset the delta to 0
        regDelta = 0;
    }

    regDelta += timediff;
    type = REG_BINARY;
    if (ERROR_SUCCESS != RegSetValueEx(hKey, TEXT("TimeDelta"), 0, type, (BYTE*)&regDelta, sizeof(regDelta)))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to set TimeDelta \r\n")));
        goto error;
    }

    if (ERROR_SUCCESS != RegCloseKey(hKey))
    {
        DEBUGMSG(TRUE, (TEXT("Failed to close reg key\r\n")));
        goto error;
    }
    hKey = 0;

    // Re-synch the reg keys
    ProcessRegReset();

    return;

error:
    if (hKey)
        RegCloseKey(hKey);

    return;
}
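The arithmetic inside ProcessTimeChanged can be pulled out into portable functions, which makes the wraparound behaviour easy to check off-device. This is a sketch rather than part of the downloadable sample; the function names are hypothetical, ticks are assumed to be milliseconds and times are 100-nanosecond counts.

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds elapsed between two 32-bit tick readings. Unsigned subtraction
// is modulo 2^32, so a single rollover of the counter is handled for free.
std::uint32_t ElapsedTicks(std::uint32_t storedTicks, std::uint32_t nowTicks)
{
    return static_cast<std::uint32_t>(nowTicks - storedTicks);
}

// New cumulative delta after a clock change.
// storedTime and newClock are 100-nanosecond counts; ticks are milliseconds.
std::int64_t UpdatedDelta(std::int64_t storedTime, std::uint32_t storedTicks,
                          std::uint32_t nowTicks, std::int64_t newClock,
                          std::int64_t oldDelta)
{
    const std::int64_t kHundredNsPerMs = 10000;
    // Where the clock would be now if nobody had changed it.
    const std::int64_t expectedNow = storedTime +
        static_cast<std::int64_t>(ElapsedTicks(storedTicks, nowTicks)) * kHundredNsPerMs;
    // Expected minus actual, accumulated with any earlier changes.
    return oldDelta + (expectedNow - newClock);
}
```

Because the periodic timer re-syncs the stored values just before 2^32 milliseconds have passed, at most one rollover can sit between the two tick readings, which is the case the unsigned subtraction covers.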

 

If you want to see the whole thing and try it out, grab the ZIP (with a test harness) here

Usual code disclaimer applies.

Marcus

Posted by marcpe | 3 Comments

MEDC Session demos

The European MEDC is over! All the build-up, all the prep and it's all over so fast (uh oh, starting to show my age :) ). It was great to meet up with so many people from the industry as well as old friends and to feel a real buzz about mobile devices.

If you attended then you will be receiving a nice shiny post conference DVD with all the slides in the next few weeks, but I promised that I would post my demo material here for those that asked.

I presented 3 sessions this year which all seemed to go reasonably well - apart from the emulator dropping out in the provisioning session (thanks for the help Keni). I've zipped up the demos and the slides from each session. Feel free to grab them and dig through, but remember it's all sample code and provided 'as is' with no support.

Advanced Windows Mobile

NOTE: for the camera streaming to work you need to modify the IP address hard coded in the CameraCapture project to point at your own desktop machine. Also the web site for "WimoBot" that Brian Cross demoed is here: http://www.wimobot.com/

Local Authentication Sub System

Provisioning

Marcus

Posted by marcpe | 1 Comments

Timed Camera Capture - Update

Regarding the Timed Camera Capture sample I posted last month, lots of you came across a bug in my code that slipped through my rigorous testing. I did a little code tidy-up before I posted the ZIP and inadvertently removed an important line from the graph setup steps. Of course I tested it thoroughly before posting, but it's never a good thing to test your own code!

Anyway I have updated the ZIP file and posted over the top of the old ZIP. Here is the link: http://marcusperryman.members.winisp.net/BlogStuff/TimedCamera.zip

The error was in CGraphManager::CreateCaptureGraphInternal() where the filter graph is initially built. To make sure you have the right ZIP, it's 46.7 KB (as reported by the Explorer download window).

Apologies for the inconvenience.

Marcus

 

Posted by marcpe | 10 Comments