Connection Manager

We’ve been working on a bunch of very tricky connection issues recently that came to light when creating a network configuration setup for a Mobile Operator (MO). The trickiness comes as a result of the complex network topology and the requirements and limitations from the MO:

1>     The MO network supports only one PDP context (explanation later)

2>     The device is required to connect to different APN destinations depending on the traffic type – for example certain streaming content must go through a proxy server, but MMS traffic is only available from a dedicated APN.

PDP Contexts

First of all let me explain a PDP context. PDP (Packet Data Protocol) context refers to the instance of shared session state between a handset radio and base station software. It contains important information such as APN and IP address. You can get a fuller explanation of PDP context on Wikipedia here.

For a GSM device to send and receive data it must first establish a PDP context. The context is established by the device making a request to the base station, passing the name of the desired APN. The base station will typically forward the PDP request along with the APN and handset IMSI (SIM number) into the MO’s billing network so that access can be verified – e.g. SIM is allowed (i.e. has a valid contract, and enough credit) to make a data connection to that APN. On success the base station will reply to the handset supplying extra information such as the handset IP address, at which point the context is ready to go.

Multiple PDP Contexts

You might wonder why a handset needs more than one PDP context. In some situations the device must use more than one APN in order to send and receive data to the right network endpoints – for example if the MO applies billing and controls data access through specific APN’s, or has secure information behind a dedicated, protected APN. Once a PDP context has been established the parameters of the context cannot be changed – this means the APN endpoint cannot be dynamically updated, and instead the context must be dropped and a new context created.

Earlier I mentioned that a request to set up a PDP context typically routes through to the MO’s server infrastructure for validation. This process takes time: depending on the MO’s infrastructure it can take up to 20 seconds to create a context (although the average time inside the MO’s network is more like 5 seconds). For performance alone it’s desirable to allow more than one PDP context, especially when the MO requires multiple APN’s for things like billing.

Single PDP Contexts

For some of the first generation 3G networks, the base station software doesn’t support multiple PDP contexts, and when the handset needs to change APN in this scenario the existing PDP context must be dropped before a new one can be requested.

At this time the vast majority of 2.5G networks support multiple PDP contexts.

The available number of contexts that Windows Mobile can use is defined by the base station capabilities combined with the radio hardware capabilities. Only when both handset and base station support multiple simultaneous PDP contexts is there a possibility to use this feature.

Many new handsets support ~3 contexts.

Connection Manager

Ok so in a single PDP context scenario there might be a small delay for the user – not great, but acceptable in the unusual situation where the APN needs to be switched. But what’s all this got to do with connection manager, I hear you ask.

For many (less capable? :-)) smartphone handsets not running Windows Mobile, when the setup requires multiple APN’s to be configured, the common solution is for each application to ask the user to select the network APN to use for that application. Additionally once the application is running it will typically expect to hold that connection exclusively until the application is closed by the user, at which time the connection is dropped and another application can be started and connect to a different APN.

Windows Mobile implements Connection Manager that takes the complexity of choosing a connection away from the user… after all, who should really know which connection is the right one, Mobile Operator or user? Hey if the MO / application doesn’t know which connection to use, the user hasn’t got a hope! The Windows Mobile approach allows more flexibility for applications and enables the concept of ‘background’ or ‘always-connected’ applications without writing a large amount of logic for each application.

For details of Connection Manager (CM) check out the Windows Mobile SDK documentation or online here.

Modeling the MO’s Network

Connection manager requires the MO’s network topology to be modeled using a number of settings.  The basic components of this model are as follows:

1>     There are a number of ‘Meta’ Networks defined as GUID’s in the registry. These identify connection destinations such as ‘the internet’ or ‘work’ or ‘Secure Wap’. There is a list of meta network guids published in the public documentation and used to identify common destinations such as ‘Internet’, however the list is fully extensible. (Use CM_Networks CSP for provisioning)

2>     There are a number of GPRS entries in the registry that define the information required to make a data connection through a specific APN. These settings contain information like user name, password and APN name. Additionally these settings also identify a destination meta network – this is the network destination that will be available by connecting to the APN. The simplest case would be something like a GPRS entry that connects to the internet network. (Use CM_GPRSEntries CSP for provisioning)

3>     There can also be a number of WiFi and dial-up entries to define other connections that could be made to reach a meta-network.

4>     Additionally there can be a number of VPN and Proxy entries. These are slightly different types of connection because a VPN or a proxy enables a connection to one network to be changed into a connection to another network. For example using a VPN might allow an internet connection to become a work network connection. So these entries have both a source meta-network and a destination meta-network.

One more configuration setting needs to be mentioned, although it is not strictly there to model the network topology, and that is the Mappings table (Use CM_Mappings CSP for provisioning).  This table allows applications to defer the choice of meta-network required for a particular resource or URL to the MO or OEM that configures the device. Without the Mappings table an application developer must either hard wire the destination GUID into code, or provide a configuration option for the user to see and select the meta-network – back to the old issue of relying on the user!

The Mappings table contains an ordered list of URL patterns, each matched with a meta network GUID, and can be interrogated by using the Connection Manager API: ConnMgrMapURL. The idea is that an application will pass the required URL to the ConnMgrMapURL call, and that API in turn will interrogate the table, starting at the first entry and moving through the table until a pattern matches the URL or the table ends. When a match has been located the associated meta network GUID is returned. The calling code can then use this meta network in a call to ConnMgrEstablishConnection when trying to connect.

This type of lookup is a requirement for a general purpose application like IE Mobile or Windows Media Player, where any number of URL’s could be supplied. However it’s also a great way for other applications to ensure they support the widest range of network topologies.

Here is an example of what the CM_Mappings table might look like:

<wap-provisioningdoc>
       <characteristic type="CM_Mappings">
              <characteristic type="501">
                     <parm name="Pattern" value="*://*/*.3gp"/>
                     <parm name="Network" value="{D3B2D798-9E69-4B65-A75B-6DDFBECEAAAA}"/>
              </characteristic>
              <characteristic type="610">
                     <parm name="Pattern" value="*://*.operator.*"/>
                     <parm name="Network" value="{7022E968-5A97-4051-BC1C-C578E2FBA5D9}"/>
              </characteristic>
              <characteristic type="536870912">
                     <parm name="Pattern" value="wsp://*/*"/>
                     <parm name="Network" value="{7022E968-5A97-4051-BC1C-C578E2FBA5D9}"/>
              </characteristic>
              <characteristic type="553648128">
                     <parm name="Pattern" value="wsps://*/*"/>
                     <parm name="Network" value="{F28D1F74-72BE-4394-A4A7-4E296219390C}"/>
              </characteristic>
              <characteristic type="570425344">
                     <parm name="Pattern" value="*://*.*/*"/>
                     <parm name="Network" value="{436EF144-B4FB-4863-A041-8F905A62C572}"/>
              </characteristic>
              <characteristic type="587202560">
                     <parm name="Pattern" value="*://*/*"/>
                     <parm name="Network" value="{A1182988-0D73-439E-87AD-2A5B369F808B}"/>
              </characteristic>
       </characteristic>
</wap-provisioningdoc>

 

The numeric type value defines the order in which the mappings will be examined, starting at 0 and going up, so entry “501” will be examined before “610” and so on.  The pattern parameter is not a full regular expression but allows quite a range of flexibility. ‘*’ is used as the wildcard character, so for example “*://*.operator.*/*” will map to any protocol and any address that contains the text pattern ‘.operator.’, followed by a ‘/’ and any trailing page name. For example the following URL strings would match this destination network:

·         “http://www.operator.com/”

·         “rtsp://rtsp.operator.media.com/20070707_ABHDD3227DD/today_news1.sd”

·         “https://my.long.name.operator.da/index.aspx”
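To make the ordered lookup a bit more concrete, here is a minimal Python sketch of how a table like the one above could be walked. It is only a model of the behaviour described in this post, not the real ConnMgrMapURL implementation: the map_url and to_regex names are made up for illustration, and treating ‘*’ as “any run of characters” with case-sensitive, whole-string matching is an assumption.

```python
import re

# (order, pattern, meta-network GUID), copied from the example CM_Mappings table.
MAPPINGS = [
    (501, "*://*/*.3gp", "{D3B2D798-9E69-4B65-A75B-6DDFBECEAAAA}"),
    (610, "*://*.operator.*", "{7022E968-5A97-4051-BC1C-C578E2FBA5D9}"),
    (536870912, "wsp://*/*", "{7022E968-5A97-4051-BC1C-C578E2FBA5D9}"),
    (553648128, "wsps://*/*", "{F28D1F74-72BE-4394-A4A7-4E296219390C}"),
    (570425344, "*://*.*/*", "{436EF144-B4FB-4863-A041-8F905A62C572}"),
    (587202560, "*://*/*", "{A1182988-0D73-439E-87AD-2A5B369F808B}"),
]


def to_regex(pattern):
    """Turn a '*' wildcard pattern into a regex (assumed semantics)."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


def map_url(url, mappings=MAPPINGS):
    """Return the GUID of the first matching entry, lowest order first."""
    for _order, pattern, network in sorted(mappings):
        if to_regex(pattern).fullmatch(url):
            return network
    return None  # no entry matched


if __name__ == "__main__":
    for url in ("http://www.operator.com/",
                "http://www.operator.com/video.3gp",
                "http://intranet/home"):
        print(url, "->", map_url(url))
```

In this model a .3gp URL on the operator host resolves to the network of entry 501 rather than 610, simply because 501 is examined first. That is why the most specific patterns want the lowest order numbers and the catch-all patterns the highest.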

 

If you want to examine the mappings for your device, run the following XML via RapiConfig.exe:

<wap-provisioningdoc>
    <characteristic-query type="CM_Mappings" />
</wap-provisioningdoc>

How Connection Manager makes connections

Ok, once we’ve got a model for the network, Connection Manager can now do its magic and take away much of the pain of choosing connections.

When my code needs to make a connection I call ConnMgrEstablishConnection and pass it a populated CONNMGR_CONNECTIONINFO structure. The important bits are as follows:

DWORD dwFlags;

Specify stuff like Proxy Aware. If at all possible, write proxy aware code (HTTP proxy is usually enough). It’s not hard! Just make use of the InternetOpen or InternetOpenUrl WinInet API’s and it’s all pretty much done for you!

 

DWORD dwPriority;

The priority is very important if you want your app to play well with other applications on the device.

 

BOOL bExclusive;

Stops connection manager from sharing the physical connection with other applications even if they request the same destination meta network. Use with care!

 

BOOL bDisabled;

Useful to determine if CM can even find a way of connecting to your meta network. But setting this flag means this request will never result in a connection happening.

 

GUID guidDestNet;

Your meta network destination returned from ConnMgrMapURL().

 

HWND hWnd; UINT uMsg; LPARAM lParam;

Where to send status update messages (e.g. when the state of a connection changes), which message to send, and what lParam value to include with it.

When Connection Manager is called it checks the parameters and uses the Connection Request (CR) data to track the life of the connection, releasing the data when ConnMgrReleaseConnection is called for the CR. The following is a summary of the main steps that Connection Manager takes when processing a CR:

1>  Verify that the destination meta network GUID can be reached and work out the required connection path.  

This is where Connection Manager looks at the network settings and meta-network destination, GPRS / WiFi entries and proxy settings to find the best set of connections that could be used to satisfy the CR. It’s worth noting that if your app does not tell Connection Manager that it supports proxies then no proxy server entries will be used when calculating the required path… so make sure you add proxy support to your code and to the CM request!  Many of the .NET CF managed classes (HttpWebRequest for example) already support proxy connections – I’ve got a little sample that I might post later to show this working.

Connection Manager can choose to use more than one connection entry in order to satisfy your request, for example in order to get to ‘Work’ network there might be a GPRS connection plus a VPN connection. Connection Manager can also choose a different path depending on the current connection state of the device, for example if the device is not connected to any network CM might choose a direct GPRS connection. However if the device is currently connected to a Work network then it might choose just to reply with proxy information if available. There is quite a bit of complexity in this process, so check the docs if you want more detail.

If Connection Manager cannot find an appropriate path to the destination, the application will be notified and no further steps are taken.
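The path calculation can be pictured as a search over the configured entries. The Python sketch below is a toy model under several assumptions: the Entry type, the entry names and the find_path function are invented for illustration, and a plain breadth-first search that prefers networks the device is already connected to stands in for whatever selection logic Connection Manager really uses.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    name: str
    kind: str                     # "gprs", "wifi", "vpn" or "proxy"
    dest: str                     # destination meta-network
    source: Optional[str] = None  # None: starts from the radio, not a network


# Hypothetical device configuration.
ENTRIES = [
    Entry("Operator GPRS", "gprs", dest="Internet"),
    Entry("Corporate VPN", "vpn", dest="Work", source="Internet"),
    Entry("Work proxy", "proxy", dest="Internet", source="Work"),
]


def find_path(entries, dest, proxy_aware, connected=()):
    """Return the entry names needed to reach dest, or None if unreachable."""
    start_points = [*connected, None]  # try already-connected networks first
    queue = deque((point, []) for point in start_points)
    seen = set(start_points)
    while queue:
        here, path = queue.popleft()
        if here == dest:
            return path
        for entry in entries:
            if entry.source != here or entry.dest in seen:
                continue
            if entry.kind == "proxy" and not proxy_aware:
                continue  # proxies are ignored for apps that are not proxy aware
            seen.add(entry.dest)
            queue.append((entry.dest, path + [entry.name]))
    return None


if __name__ == "__main__":
    print(find_path(ENTRIES, "Work", proxy_aware=True))
    print(find_path(ENTRIES, "Internet", proxy_aware=True, connected=("Work",)))
    print(find_path(ENTRIES, "Internet", proxy_aware=False, connected=("Work",)))
```

With nothing connected, reaching ‘Work’ needs the GPRS entry plus the VPN. With ‘Work’ already connected, a proxy aware request for ‘Internet’ is satisfied by the proxy alone, while the same request without proxy support falls back to a direct GPRS connection.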

2> State transition journey begins

Every connection request that is accepted by ConnMgrEstablishConnection will travel through a finite number of transient states before eventually ending up at one of the 5 or so non-transient, closed states (I consider the ‘connected’ state to also be transient).  Once a connection is in a non-transient state that CR will never again change status, so if the calling application needs another connection it needs to close the existing CR and request a new one.

These state transitions are reported back to the calling application via windows messages using the parameters supplied to the connection request. The first transition takes place right after the connection has been requested.

Adam Dyba is the expert on this stuff and although his blog posts are rare, there is a very useful description here of state transitions including a diagram, which is worth many thousands of words!

3> Consider all active CR’s and calculate the next steps

Once a path has been identified for the connection, the next action is to trigger a resource allocation process. This involves CM first sorting all CR’s by request time (newest first) within connection priority order. Resources are then allocated to the connections from the top of the list down. Therefore the most recent, highest priority CR gets connected first, and the oldest, lowest priority CR is last in the list.
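As a rough illustration of that ordering, here is a small Python model. It is a sketch and not CM’s actual code: the CR class, the allocate function and the numeric ranks are made up (the real CONNMGR_PRIORITY_* values live in connmgr.h), and it deliberately ignores extras such as CR’s that share the same meta-network.

```python
from dataclasses import dataclass

# Illustrative ranks only: a higher number means a higher priority.
RANK = {"USERBACKGROUND": 1, "USERINTERACTIVE": 2}


@dataclass
class CR:
    app: str
    priority: str
    requested_at: int  # larger means more recent


def allocate(requests, pdp_contexts):
    """One allocation pass: return (connected, not_connected) lists."""
    ordered = sorted(
        requests,
        key=lambda cr: (RANK[cr.priority], cr.requested_at),
        reverse=True,  # highest priority first, newest first within a priority
    )
    return ordered[:pdp_contexts], ordered[pdp_contexts:]


REQUESTS = [
    CR("sync client", "USERINTERACTIVE", requested_at=1),
    CR("browser", "USERINTERACTIVE", requested_at=2),
    CR("mail poll", "USERBACKGROUND", requested_at=3),
]

if __name__ == "__main__":
    for contexts in (1, 2):
        connected, others = allocate(REQUESTS, contexts)
        print(contexts, [cr.app for cr in connected], [cr.app for cr in others])
```

In this model the background request never beats an interactive one even though it is the newest, and between the two interactive requests the more recent one comes out on top.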

Let me pull out a couple of things from this:

·         Going back to PDP contexts, if only one PDP context is available then only the top CR in the list will get connected. If a new request is lower priority it will receive a WAITINGFORRESOURCE message to say that the network connection is busy. There are some additional complexities here, for example if a lower priority CR is requesting the same meta-network as the CR at the top of the list – take a look at the docs for a more complete explanation of all the different situations that are supported.

 

·         If more than one PDP context is available then Connection Manager can try to connect a number of CR’s, starting at the top of the list. Again, if there are more CR’s than PDP contexts the remaining CR’s will receive WAITINGFORRESOURCE notification messages.

 

·         New Connection Requests will always take priority over existing Connection Requests if they are at the same priority.

Think about this for a sec and you will realize that’s exactly what users want, but there’s a trap here for an unwary dev to fall into. Let’s say my application needs to be ‘always connected’ so that updates can flow back and forth to the server as required. Also my application requires connection to a private APN that offers no ‘internet’ access.

My application connects and starts to transfer data back and forth, but the user gets bored and fires up a web browser to see how well her stock portfolio is doing. To access general internet content a different, unprotected APN connection is configured so the browser issues a CR for the new destination meta-network.

If the connected base station and radio hardware support multiple PDP contexts then the second CR will also be connected and both apps can quite happily run side-by-side.

However if there is only one PDP context, then the newer browser CR will trump my application’s request and Connection Manager will force closed the existing CR to the private APN, then connect the newer browser connection. Aha! But I wrote my application to be resilient to connection failures, knowing that the user is likely to be traveling a lot. As soon as the application detects that the connection has been lost (it receives a state transition from CM saying it’s been disconnected), it re-requests the connection.  The new request is received by CM and because it’s at the same priority but newer than the browser CR, it wins and CM tears down the browser CR to connect my application’s session. The browser is notified of the disconnected CR and reports an error to the user. ‘Darn!’ says the user and tries again, but the browser will never connect because of the aggressive retry logic built into my application.

So what’s the app developer doing wrong?

First of all it’s important to use the correct priority for the connection: USERINTERACTIVE is exactly that! If you want to use this priority, make sure the user really has interactively caused the connection to be established – like a browse request to a browser application, or a sign-in request. If the application changes from user interactive to the background – i.e. the user launches a new app or a dialog is displayed over your application – then change the priority of your connection to USERBACKGROUND using ConnMgrSetConnectionPriority. Doing this should make a connection race less likely.

The second thing is to consider implementing some form of connection back-off. What I mean is that if the connection is broken, don’t immediately re-issue the CR; instead wait for a few seconds before trying again. If the new request fails, increase (or double) the delay, and keep doing that up to a maximum delay value. When the connection is next connected successfully, clear the back-off delay value. This will also make race conditions much less likely to occur.
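The back-off idea fits in a few lines. This Python sketch only shows the bookkeeping; the Backoff class and the 2 second / 60 second / doubling defaults are arbitrary choices for illustration, not values recommended by Connection Manager.

```python
class Backoff:
    """Track how long to wait before re-issuing a connection request."""

    def __init__(self, initial=2.0, maximum=60.0, factor=2.0):
        self.initial = initial
        self.maximum = maximum
        self.factor = factor
        self._next = initial

    def next_delay(self):
        """Return the delay to use now, and grow the one used after it."""
        delay = self._next
        self._next = min(self._next * self.factor, self.maximum)
        return delay

    def reset(self):
        """Call this once the connection reaches the connected state."""
        self._next = self.initial


if __name__ == "__main__":
    backoff = Backoff()
    print([backoff.next_delay() for _ in range(7)])  # delays after 7 failures
    backoff.reset()
    print(backoff.next_delay())                      # back to the initial delay
```

The application would wait next_delay() seconds after each lost or failed connection before issuing a fresh CR, and call reset() when a CR finally connects, which gives a competing same-priority application (the browser in the story above) a window in which to finish its work.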

The resource allocation process can be triggered by a number of other API calls beyond just ConnMgrEstablishConnection. Changing a CR’s priority, releasing a CR and a scheduled connection event occurring will all cause the resource allocation process to take place.

It’s also worth noting that when the device makes a voice call, it starts by requesting a connection from Connection Manager. There is a reserved, ‘highest’ priority flag available for voice calls, CONNMGR_PRIORITY_VOICE, that overrides all other priorities. Only after CM has connected the voice CR can the call take place. This is a slightly different case, because CM will actually suspend connected CR’s for the duration of the voice call and then reconnect them when the voice priority CR has been released. In normal circumstances CR’s transition permanently away from the connected state to support higher priority CR’s.

4> Disconnect demoted CR’s, and connect new ones

If there are CR’s currently connected that require to be disconnected then CM first breaks these connections. New APN requests are then attempted.

Status notifications continue to be sent to the calling application to inform of the transitions the CR is going through.

For CR’s that are about to be connected, their state will change from WAITINGFORRESOURCE to WAITINGFORNETWORK as the PDP context is established, and then WAITINGCONNECTION just before CONNECTED status is achieved. At this point the application can make network requests.

Multiple PDP’s and IP Address issues

If a device supports multiple PDP’s and more than one is connected then the device will have multiple IP addresses available, one from each connected PDP context. In this situation, how does an application bind to the right IP address in order to send traffic on the right network?

Connection Manager comes to the rescue here. When an application’s CR becomes connected, CM associates the IP address from the connected PDP context with all subsequent automatic socket bindings from the process. This ensures all your IP traffic goes down the right pipe.

Note that this is done at the *process* boundary. Because IP addresses are associated with the process Connection Manager supports only one CR per process. That’s not to say it will stop you from making more than one CR from within the process but if you do, make very sure they will all resolve to the same PDP context and subsequent IP address.

If your application ignores this restriction and issues two CR’s that connect via different APN’s (and hence require different PDP contexts and subsequently different IP addresses) then the behavior of CM is undefined. However with the version of WM 6.0 that we are using, CM will bind all new sockets to the last successful IP address, i.e. it routes all traffic to the APN identified in the last CR, which is probably not what you intended!

Let me use an example to clarify this a bit:

Consider an app that has two key features. The first feature issues a web service request to “www.myservice1.com” and CM connects via ‘APN1’ in order to connect to the meta-network that was requested. Once the PDP context is established my process is bound to the IP address from that PDP context, say it’s ‘1.1.1.1’. Requests and responses flow between my feature and the server just fine.

At a later point the user fires up the second feature of my app that displays a web portal page at “http://privatenetwork1/mysite” and requests a connection via ‘APN2’. Because the network and radio hardware support 2 PDP contexts, the second connection is also allowed to connect and is given the IP address ‘2.2.2.2’. So now I’m connected to both APN1 and APN2 and the web portal is displayed just fine. However the next time a request from the first feature is created for the web service at “www.myservice1.com”, the underlying socket will be bound to the last successful connection IP address for this process, i.e. ‘2.2.2.2’, and the TCP/IP traffic will be sent via ‘APN2’ not ‘APN1’ as was intended. So the web service request is likely to fail.
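The same example can be written down as a tiny Python model. It only mimics the “last successful connection wins” behaviour described above for the WM 6.0 build we were using; the ProcessModel class and its methods are invented for illustration and say nothing about how CM implements the binding.

```python
class ProcessModel:
    """Toy model of one process with a single, process-wide socket binding."""

    def __init__(self):
        self.contexts = {}    # APN name -> IP address of its PDP context
        self.bound_ip = None  # address used for automatic socket bindings

    def cr_connected(self, apn, ip):
        """A CR issued by this process has reached the connected state."""
        self.contexts[apn] = ip
        self.bound_ip = ip    # the most recent connection replaces the binding

    def apn_for_new_socket(self):
        """Which APN would carry traffic from a socket created right now?"""
        for apn, ip in self.contexts.items():
            if ip == self.bound_ip:
                return apn
        return None


if __name__ == "__main__":
    app = ProcessModel()
    app.cr_connected("APN1", "1.1.1.1")  # feature 1: web service
    print(app.apn_for_new_socket())
    app.cr_connected("APN2", "2.2.2.2")  # feature 2: web portal
    print(app.apn_for_new_socket())      # feature 1 now uses this APN too
```

If the two features really do need different APN’s, one way out that follows from the per-process rule is to host them in separate processes so that each one gets its own binding.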

Handling Transient CR States

I’ve mentioned various states that CR’s can get into. One of the problems we came across time and time again is that code written to process Connection Manager state change notifications just doesn’t deal with transient states very well. Transient states like WAITINGFORRESOURCE or WAITINGFORNETWORK are really just hints to the application so that it can update progress or status to the user. Developers commonly write a block of code that processes the CONNECTED state and maybe one or two more, but the default case of the switch statement is set to display a failure message and destroy the CR. The way a developer will typically identify the states that need to be handled is by running the code and tracing the state change messages, then adding the code to support that path and bombing out for anything else. But when the code is subsequently run on a different network configuration, or on a less isolated device setup with more applications all vying for connection resources, the list of state transitions could change significantly.
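A more forgiving shape for the handler is to list the closed states explicitly and treat everything else as “still in progress”. The Python sketch below shows only that decision logic. The on_status function and its return values are invented for illustration, and the closed-state names are a partial, assumed list following the status names used in this post, so check connmgr.h for the full set before relying on it.

```python
CONNECTED = "CONNECTED"

# Partial and illustrative: every non-transient state from connmgr.h belongs here.
CLOSED_STATES = {"DISCONNECTED", "CONNECTIONFAILED", "NOPATHTODESTINATION"}


def on_status(status):
    """Decide what to do with one state change notification for a CR."""
    if status == CONNECTED:
        return "use-connection"
    if status in CLOSED_STATES:
        return "release-and-back-off"  # this CR is finished; release it
    # WAITINGFORRESOURCE, WAITINGFORNETWORK, WAITINGCONNECTION and any state
    # never seen during testing: just update the progress UI and keep waiting.
    return "keep-waiting"


if __name__ == "__main__":
    for status in ("WAITINGFORRESOURCE", "WAITINGFORNETWORK",
                   "WAITINGCONNECTION", "CONNECTED", "DISCONNECTED"):
        print(status, "->", on_status(status))
```

The important design choice is the default branch: an unexpected state leads to patience rather than a failure dialog. Because the closed-state list here is incomplete, a real handler would want an overall timeout as well so it cannot wait forever.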

Note: Watch out when looking at the SDK samples, they show these same issues. I will see if I can persuade Adam or someone else from the CM team to blog a template state notification handling sample.

Connection Manager issues

I said at the beginning that we’ve been working on some tricky connection issues. As a result of our work we found and fixed ~10 bugs in various different apps, but only one came down to a Connection Manager code issue! Considering the complexity of the CM code and the thrashing we were giving it, I have to say that I consider this feature to be totally rock solid – every time we thought it was broken, investigation proved it was an issue in some other app. Great job Adam and team!

Marcus