
Big Iron

The ability to try out bug fixes is a valuable part of the Hopper process and re-building images quickly is one of the easiest ways to improve your Hopper numbers. We recently went about trying to find the combination of build server parts that makes short work of our builds and wanted to share with the OEM community.

 

Step 1: Start with a fast quad-core computer with dual hard-drives. Doesn’t have to be the fastest processor group out there, but use something capable. Don’t worry about the size of the hard-drives.

 

Step 2: Buy two (2) 1-Terabyte SATA drives. The more expensive ones are rated for RAID; we went “on the cheap” and opted for the non-rated drives. You will also need to buy bay adapters and SATA cables for final connections. Hint: get the 90-degree angle connectors on the SATA cables.

 

Step 3: Connect the new drives per your manufacturer’s documentation. This step is intentionally vague since each build machine will have a different configuration – please do your best. Enable SATA RAID in the BIOS, then reboot. Configure the striped (RAID 0) arrays in the BIOS, then reboot. We found it best to stripe the two smaller drives as the C: drive and then stripe the 1TB drives as D:.

 

Step 4: Reload the OS making sure the OS is running from C. Install your Windows Mobile AK to the D: drive and build from there. Verify that all indexing programs are turned OFF. We have also seen some gains by pointing the TEMP directory to a RAMDRIVE.

 

Happy Building!

Posted by shende | 1 Comments

Troubleshooting guide - part 2

Below is the second half of a well-executed document from guestRx: Bulent Elmaci. Bulent has worked with Windows Mobile debugging for a long time and backs up his writing with a lot of experience. It is the first of a series of articles he has written to help our OEMs become better debugging developers. Enjoy!

TOOLS

Using the correct tools to collect and analyze relevant and sufficient data not only makes life a little easier for engineers; it also helps ensure that the right information is gathered and the right conclusions are reached, which might otherwise be missed if everything were done manually. It also shortens the investigation, leaving more time to assess and implement resolution alternatives.

Some of the important tools available to Windows Mobile partners are discussed briefly below, with the goal of showing where they are applicable, how they can be located, and which references cover their detailed usage.

CELog

CELog is an event-tracking tool that can log both a set of predefined kernel events (related to processes, threads, synchronization, memory, etc.) and application-defined ones. It collects the data in a user-configurable location, where it can later be viewed using various tools. The captured data can be divided into “zones” so that you can filter for the events that are relevant to the problem. More details can be found in the following pages:

-          MSDN documentation

-          Windows CE Base Team Blog

o   http://blogs.msdn.com/ce_base/archive/2005/11/30/a-tour-of-windows-ce-performance-tools.aspx

o   http://blogs.msdn.com/ce_base/archive/2006/01/23/516586.aspx

Perfman

Perfman is an on-device tool that helps collect CELog data on a Windows Mobile powered device without needing any connection to the device. It provides a simple user interface that makes it easier to start and control logging on the device. More details can be found in the following pages:

-          OEM documentation topic “Tuning the Platform for Optimum Performance”

-          Windows CE Base Team Blog

o   http://blogs.msdn.com/ce_base/archive/2005/11/30/a-tour-of-windows-ce-performance-tools.aspx

Application Logs

Even though this is obvious, it can easily be missed (if it is not being overused). For issues that originate from applications or components for which you have access to the sources, application level logging/tracing can be very powerful to help narrow down the problem area.
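As a rough illustration of application-level tracing, here is a minimal sketch of a level-filtered logger in portable C. The names app_log, log_level and g_log_threshold are hypothetical and are not part of any Windows Mobile API; a real component would typically write to a file or debug output instead of a caller-supplied buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severity levels, for illustration only. */
typedef enum { LOG_ERROR = 0, LOG_INFO = 1, LOG_VERBOSE = 2 } log_level;

/* Messages above this level are dropped; raise it while chasing a bug. */
log_level g_log_threshold = LOG_INFO;

/* Formats "[LEVEL] component: message" into out (capacity out_len).
 * Returns 1 if the message passed the filter and was written, 0 if dropped. */
int app_log(char *out, size_t out_len, log_level level,
            const char *component, const char *fmt, ...)
{
    static const char *names[] = { "ERROR", "INFO", "VERBOSE" };
    va_list args;
    int prefix_len;

    if (level > g_log_threshold || out_len == 0)
        return 0;

    prefix_len = snprintf(out, out_len, "[%s] %s: ", names[level], component);
    if (prefix_len < 0 || (size_t)prefix_len >= out_len)
        return 1; /* the prefix alone filled the buffer (output truncated) */

    va_start(args, fmt);
    vsnprintf(out + prefix_len, out_len - (size_t)prefix_len, fmt, args);
    va_end(args);
    return 1;
}
```

Tagging each message with a component name and a level keeps the log usable when it grows: the threshold can stay low in normal builds and be raised only for the component under investigation.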

Platform Builder Debugging

If a consistent repro of the issue is available, you have a good understanding of the components the issue might be related to, and you have access to a KITL-enabled device, live debugging using Platform Builder would be the best way to go. This is true even if you don’t have the correct symbols. The following pages provide good tricks that can aid in debugging using Platform Builder:

-          Resolving symbols manually on Windows CE

-          Windows CE Virtual Memory Layout for Debugging

-          Platform Builder Debug Symbols

-          Virtual Memory and Thread Stacks

Netlog

Netlog is an on-device network packet sniffer (a.k.a. network monitor) that can be used to log network activity on the device. The output of Netlog can be viewed with popular desktop viewer tools like Microsoft Network Monitor, or Wireshark/Ethereal. More details can be found in the following pages:

-          MSDN Documentation

o   http://msdn.microsoft.com/en-us/library/ms883125.aspx

o   http://msdn.microsoft.com/en-us/library/aa926774.aspx

o   http://msdn.microsoft.com/en-us/library/aa922639.aspx

o   http://msdn.microsoft.com/en-us/library/ms886701.aspx

-          Windows CE Networking Team Blog

o   http://blogs.msdn.com/cenet/archive/2004/10/11/getting-network-captures-on-windows-ce.aspx

RIL Proxy Logs

RIL Proxy is a component of the Radio Interface Layer (RIL) architecture. Besides its main function, it can also log radio events that originate either from the radio or from the operating system. RIL Proxy logs can be very useful, especially for troubleshooting radio or connectivity related issues. More details can be found in the following pages:

-          MSDN Documentation

-          Windows CE Networking Team Blog

o   http://blogs.msdn.com/cenet/archive/2005/09/27/474650.aspx

-          Hopper Blog

o   http://blogs.msdn.com/hopperx/archive/2007/07/26/using-the-radio-interface-layer.aspx

RapiConfig

RapiConfig is a desktop configuration tool, part of the Windows Mobile 6 SDK, which allows applying a provisioning XML on a device/emulator connected using ActiveSync. It can be used both to make quick configuration changes (e.g. change Connection Manager settings, make registry changes, etc.), and to capture current configuration on the device (e.g. read Connection Manager settings, read a registry value, etc.).

Usage of RapiConfig is controlled by security settings on Windows Mobile, which means that any “RapiConfig request” could be rejected if RAPI API usage is disabled (look for policy 4097 in the “OEM Documentation”). In that case, the security configuration of the Windows Mobile device must be changed before using the RapiConfig tool.

More details can be found in the following pages:

-          MSDN Documentation

o   http://msdn.microsoft.com/en-us/library/aa454232.aspx

o   http://msdn.microsoft.com/en-us/library/ms889520.aspx

o   http://msdn.microsoft.com/en-us/library/aa924367.aspx

o   http://msdn.microsoft.com/en-us/library/ms889589.aspx

o   http://msdn.microsoft.com/en-us/library/30dtsstx.aspx

o   http://msdn.microsoft.com/en-us/library/bb384149.aspx

-          MSDN Blogs

o   http://blogs.msdn.com/marcpe/archive/2005/01/18/355158.aspx

-          OEM Documentation

o   “Configuration Service Providers”

MakeCab

This tool, which runs on Windows XP/Vista and is also part of the Windows Mobile SDK, can generate a CPF file (a special CAB file that contains only settings and no files) from a provisioning XML. The CPF file can be put on the device and run to perform the actions (reading or changing some configuration on the device) defined by the provisioning XML it contains.

Installation of a CAB/CPF file is controlled by security settings on Windows Mobile, which means that CAB/CPF file installation could be rejected (look for policy 4101 in the “OEM Documentation”). In this case the file(s) have to be signed with a trusted certificate that is already installed on the Windows Mobile device (provided by the OEM or Mobile Operator).

More details can be found in the following pages:

-          OEM Documentation

o   “How to Create a .cpf File”

o   “Cab Provisioning Format (CPF) File”

o   “Packaging the XML File for Delivery”

-          MSDN Documentation

o   http://msdn.microsoft.com/en-us/library/ms889557.aspx

o   http://msdn.microsoft.com/en-us/library/aa455993.aspx

Remote Tools

Platform Builder comes with various remote tools that target specific troubleshooting/debugging scenarios. OEM documentation includes comprehensive documentation about each of these tools. Some of these topics in the documentation are listed below:

-          “Remote Tool Connectivity”

-          “Remote Tools for Debugging”

-          “Tools for Performance Tuning”

-          “Remote Tools for Information Management”

CEDebugX

CEDebugX is an extension to Platform Builder that helps collect system information from a device, to better understand the state of the device at a given time. More details can be found in the following pages:

-          MSDN Documentation

o   http://msdn.microsoft.com/en-us/library/bb509784.aspx

-          MSDN Channel 9

o   http://channel9.msdn.com/posts/mikehall/Using-CEDebugX-with-Windows-Embedded-CE-60-SP1/

Watson/Kernel Dumps

This mechanism (a.k.a. “Windows Error Reporting” or “Watson”) takes a snapshot of the system memory and system state upon a system exception (e.g. memory access violation, division by 0, etc.). It is installed by default on all Windows Mobile devices and can be turned on or off using the “Error Reporting” settings UI or, alternatively, via registry keys.

When “Windows Error Reporting” is enabled, be aware that as soon as the Windows Mobile device is connected to a PC via ActiveSync, it will by default try to remove the dump file from the device and upload it to the “Windows Error Reporting” server. This behavior can be turned off using registry keys. If you want to get the dump file, you should either disable the upload mechanism or copy the file before connecting the device to ActiveSync. More details can be found in the following pages:

-          OEM Documentation

o   “Error Reporting Overview”

o   “Using CELog and Watson to Debug a Driver Exception in a Retail Device Without KITL”

o   “Generating A Dump From A Handled Exception”

o   “Exception Mode Error Reporting”

-          MSDN Documentation

o   http://msdn.microsoft.com/en-us/library/bb905581.aspx

o   http://msdn.microsoft.com/en-us/library/aa935972.aspx

o   http://msdn.microsoft.com/en-us/library/aa935583.aspx

o   http://msdn.microsoft.com/en-us/library/aa935778.aspx

-          MSDN Blogs

o   http://blogs.msdn.com/hopperx/archive/2005/10/07/getting-help-from-the-doctor-dr-watson-that-is.aspx#478380

o   http://blogs.msdn.com/hopperx/archive/2005/10/12/not-all-watson-dumps-are-created-equal-watson-part-ii.aspx

Post Mortem Debugging

In addition to Just in Time Debugging capabilities, Platform Builder also provides post-mortem debugging capabilities via an extension called “Post Mortem Debugger”. It is used to debug Watson dump files in Platform Builder as if the dump were a live device under the debugger. More details can be found in the following pages:

 

-          OEM Documentation

o    “Post-Mortem Debugging”

o   “Types of Crash Dump Files”

o    “Using CELog and Watson to Debug a Driver Exception in a Retail Device Without KITL”

o   “Generating A Dump From A Watched Process”

o   “Capturing a Dump File on a Standalone Device”

o   “Capturing a Dump File While Debugging”

 

 

 

Posted by shende | 2 Comments

Troubleshooting guide - part 1

Below is a well-executed document from guestRx: Bulent Elmaci. Bulent has worked with Windows Mobile debugging for a long time and backs up his writing with a lot of experience. It is the first of a series of articles he has written to help our OEMs become better debugging developers; we will follow up with the second part soon. Enjoy!

 

 Jumpstart Guide to Troubleshooting on Windows Mobile – Part I

Every Windows Mobile device goes through a full software project cycle before it can be commercialized, made available to mobile operators, and put in customers’ hands. In this cycle, the development, customization and testing phases play a crucial role in achieving partner and end-user satisfaction, and in making sure Windows Mobile phones are of high quality.

 

As in any other project, these phases often uncover technical problems related to various aspects of the device and the experience it offers. In terms of their effect on overall device quality and project schedule (i.e. time to market), these issues can range from insignificant ones to device blockers. Regardless of their size or effect, all of them need to be attacked by engineers and experts to identify and implement effective, timely and high-quality resolutions.

 

Troubleshooting (i.e. investigating an issue’s root cause in order to remove it) is the first step in reaching an acceptable resolution for any issue we face during device/application development. More accurately, it is a series of steps that make sure issues are correctly identified in all their aspects, so that effective, timely and high-quality resolutions can be reached. To achieve that, using sound strategies, techniques, and tools is crucial. Without the correct approaches and tools, troubleshooting could take more time than it requires or than is available, or even worse, could lead to incorrect resolutions.

 

This article will present general guidelines and strategies that should be employed while troubleshooting Windows Mobile device issues. It is the first part of a series of articles I’d like to do. In the second part of this series, I will provide an overview of the tools that are available to make life a little easier for engineers.

STRATEGIES

Using sound strategies while approaching and analyzing a problem directly affects how sound the final resolution will be, how long it will take to reach it, and how painful the process will be. The strategies briefly discussed below are general in nature and can be applied to any development problem, but they still act as the basis for our purposes.

Get a Clear Picture of the Problem

The issue in question might be found by the same engineers who  will be troubleshooting it, or by some other team/partner. Regardless of the source, getting a clear picture of the problem at hand is the most important point.

Asking and finding the answer to the following questions would be helpful to better understand the problem:

-          Which version of the device is the issue reported/found for? The version applies to WM version, BSP version, radio version, etc.

-          What is the expected behavior?

-          What is the actual behavior?

-          Did the issue exist in previous versions?

-          Does the issue occur only on one device (a particular device), one set of devices (with same hardware, WM version, BSP version, configuration and customizations), or all devices with the same hardware and WM/BSP versions?

-          Is the reported issue an isolated one? This means finding out whether a similar problem (e.g. connectivity, etc.) exists on the device in another area or use case that might not have been reported initially, but might be related.

-          What is the use case and the expected user experience? Although this is closely related to the “expected behavior” mentioned above, in some cases it might be completely different. An example is the case where the expected behavior is described in terms of a part of the user interface, but the actual expectation underlying it is something else entirely (which would cause us to look at the problem from a completely different angle).

Get a Consistent Repro, If Possible

Even if the same engineers who reported the issue are working on it, it is always a good exercise to write down the steps to reproduce the problem in detail (commonly referred to as “repro steps”). Having this is especially important if the issue is being reported by another team or partner.

Repro steps should at the minimum include the following data and characteristics:

-          The pre-conditions that existed before the repro is done. This can include the same information that we mentioned above for understanding the problem, or some other relevant data specific to the issue itself (e.g. is there radio connectivity, is  the SIM used, what is the meta network used, what are the applications that were running on the device, etc.)

-          What are the steps, in order, that were done until the issue occurred? Supporting these with screen captures from the device, and including the user interface (UI) elements’ names as they appear on the device, would be more than helpful.

-          What was the result? Supporting this with data related to the result, e.g. screen captures, error message texts/numbers, etc. would be very helpful.

-          Is the repro consistent? What is the failure rate?

Although not always possible, one thing that would ease the analysis of the issue is getting the same repro on a device you have access to. This can reveal a lot of new info, especially the ones that you might not have gotten when the issue was reported.

Identify Direct and Indirect Factors

Indirect and direct factors can include a lot of things. They are the data points that can give you important clues on the environment and pre-conditions on the device when the issue occurred.

Direct factors can simply include the configuration on the device relevant to the issue at hand (e.g. connection manager configuration for a network issue, the active theme for a UI issue, etc.), or they can be the pre-conditions for the repro that might not have been mentioned before (e.g. the initialization APIs used before the failed API, the radio state for a connection issue, available memory for an issue involving processing a large amount of data, etc.).

Indirect factors are similar to the direct ones, except that at first glance they don’t seem related to the issue. Collecting this data can prove useful for some issues, especially if there is a chance that a different view of the data could reveal some indirect relationship to the issue (e.g. another application running on the device holding a lock on the same file, another application keeping the radio busy and causing your connectivity issue, etc.).

Do Your Homework

Attacking a problem effectively always requires knowing the problem surface. Some issues require a deep level of knowledge about the problem domain, while for others a general familiarity with the workings is sufficient. If you feel there are things in the problem that you can’t explain, or you don’t know enough to collect relevant data or analyze it, take a look at the available documentation and, if possible, find an expert who can quickly demystify things for you.

Collect Sufficient and Relevant Data

To make sure the investigation has the chance to reveal useful results, all possible data that relate to the issue should be collected. Since the data should be relevant to the issue and should be sufficient (not too little or too much), a good understanding of the end-to-end scenario as well as the underlying modules/architecture/inner-workings is almost a must. For example, for a connection issue it is almost always useful to collect the Connection Manager configuration on the device as well as a capture of the network traffic. Another example is collecting Radio Interface Layer (RIL) logs for an issue that involves radio connectivity.

Collecting these logs will require knowing and having access to the right tools for the job. Some of these tools will be discussed in the second part of this series.

Analyze with Correct Tools

Once all data is collected, the analysis phase also requires using the right tools, which can convert your raw data into useful information and clues about your issue. There are a variety of tools available for Windows Mobile, which will be discussed later.

Use an Unlocked, KITL-Enabled Device, If Possible

Having access to an unlocked device can save you a lot of time, when you need to collect data from the device. Having a KITL-enabled image on the device, on the other hand, is the ideal situation that would allow you to connect Platform Builder to the device and run the repro under a debugger.

Isolate the Problem

The goal of the data collection and analysis is obviously to find the root cause of the issue as well as a resolution. But since in most cases it won’t be possible to reach this goal right away, intermediate goals are required, where the problem area is narrowed down gradually and isolated from all irrelevant factors one step at a time. At each step, new data might need to be collected. As the problem is isolated more and more, the problem area will become more manageable and have a better chance of leading to the root cause.

Trace-back, Review and Evaluate Assumptions

While trying to understand the problem, identify the factors in play and gradually isolate the problem, tracing back to a previous step and redoing some steps is sometimes unavoidable. A good strategy to limit this is to review and evaluate the assumptions made (consciously or not), so that unnecessary trace-backs are avoided.

Don’t Eliminate Alternatives Too Early or Too Late; Don’t Jump to Conclusions

Especially for tough problems, where it never seems possible to collect sufficient data or to gain a deep-enough understanding of the factors in play, it might be tempting to take shortcuts and eliminate alternatives by jumping to conclusions. This is not completely avoidable, especially where trial and error (based on sound and informed assumptions/decisions) is the only way left to go. For most issues, however, considering all available alternatives and having a high degree of confidence in every conclusion will help avoid rework. One principle that always helps is to separate facts or observations that are based on concrete data from possibilities and conclusions that are not yet proven.

Focus on Both Short and Long Term Resolutions

For some issues, troubleshooting might end up identifying bigger underlying problems. For those cases, it is important to weigh the cost of the proposed resolutions against the project timeline and time-to-market requirements. If the proposed resolutions are costly, the investigation might need to re-focus on identifying cheaper alternatives for the short term, as well as the longer-term resolutions.

Posted by shende | 2 Comments

A Windows CE 6.0 Book that I keep paging back to...

I am not much of a programming book person; I am much more likely to select code and press "F1". However, I find myself reaching for a CE 6.0 book and have been finding it quite useful: "Windows Embedded CE 6.0 Fundamentals" [Pavlov, Belevsky] [ISBN 0735626251].

It is targeted as an “OS” book, so if you are curious about VM, threads and other nuts & bolts it makes for a real page-turner.

 

Posted by shende | 0 Comments

Writeable code sections got you down? Fear no more!

Virtual Memory changes included in Windows Mobile 6.1 can relocate read-only code sections out of Slot 0 and into a higher address range. This change was made to relieve pressure on our coveted, read-write Slot 0. It will be transparent to most developers, since code sections are read-only by default and most code does not assume that code sections will be adjacent.

 

However, if you are intentionally including the “W” (write) attribute on a code section and your DLL is larger than 64 KB, then you may be affected. Luckily, dumpbin.exe already knows to display a warning:

 

C:\WM612\release>dumpbin myDll.dll

 

60000020 flags

         Code

         Execute Read

 

DUMPBIN : warning LNK4078: multiple '.text' sections found with different attributes (E0000020)

 

SECTION HEADER #2

E0000020 flags

         Code

         Execute Read Write

 

There is nothing programmatically wrong with the above and most modules will work without modification. However, if you are seeing some strange results in recent versions of Windows Mobile, you may want to try and restrict your module to Slot 0 by adding the following registry key:

 

[HKEY_LOCAL_MACHINE\System\Loader\LoadModuleLow]

    "MyDll.dll"=dword:1  << change DLL name to match

 

Please note that using the above key will force your entire module into Slot 0 and prevent other modules from loading in that slot. The above registry setting should be used with caution and only for modules with a writeable code section.
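To make the flag values in the dumpbin output above easier to read, here is a minimal sketch that decodes them. The bit values are the section characteristic flags from the PE/COFF format (named IMAGE_SCN_* in the Windows headers); the shortened macro names and both helper functions are hypothetical and only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Section characteristic bits from the PE/COFF format. */
#define SCN_CNT_CODE     0x00000020u  /* section contains code */
#define SCN_MEM_EXECUTE  0x20000000u  /* section can be executed */
#define SCN_MEM_READ     0x40000000u  /* section can be read */
#define SCN_MEM_WRITE    0x80000000u  /* section can be written */

/* Returns 1 when a section is marked as code and is also writeable. */
int is_writable_code_section(uint32_t flags)
{
    return (flags & SCN_CNT_CODE) != 0 && (flags & SCN_MEM_WRITE) != 0;
}

/* Writes a dumpbin-like description such as "Code Execute Read Write". */
void describe_section_flags(uint32_t flags, char *out, size_t out_len)
{
    snprintf(out, out_len, "%s%s%s%s",
             (flags & SCN_CNT_CODE)    ? "Code"     : "",
             (flags & SCN_MEM_EXECUTE) ? " Execute" : "",
             (flags & SCN_MEM_READ)    ? " Read"    : "",
             (flags & SCN_MEM_WRITE)   ? " Write"   : "");
}
```

With these definitions, 60000020 decodes to code, execute and read, while E0000020 adds the write bit, which is the case the warning is about.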

Posted by shende | 1 Comments

Hopper: Start Menu Dead!

Interesting post over at "Reed & Steve" regarding Hopper and full screen apps and the Start Menu Dead message. Check it out here. Kudos to them for making this available.
Posted by shende | 3 Comments

Use CallWindowProc when using WNDPROC pointers directly

 

A recurring theme I see while debugging application compatibility issues has to do with the direct use of the window proc pointer. If the intended WNDPROC exists in a DLL that is located in Slot 0, the pointer “looks right” and is often mistakenly used directly. In Windows Mobile, calling a window procedure requires that you go through CallWindowProc().

 

Instead of trying to use the pointer directly, use CallWindowProc() like this:

 

// Incorrect: calls the window procedure pointer directly.
// tp.result = recvWndClass.lpfnWndProc(NULL, GCI_WNDPROC_TEST_MSG, 0, 0);

// Correct: let CallWindowProc() handle the address mapping.
tp.result = CallWindowProc(recvWndClass.lpfnWndProc, NULL, GCI_WNDPROC_TEST_MSG, 0, 0);

 

The reason for this indirection has to do with the way proc addresses are tracked by GWE and “mapped” to different processes. Sometimes you can get “lucky” and these addresses will work – but if the DLL you are trying to access is loaded outside of Slot 0, this mapping will break and your application will crash. Using CallWindowProc() abstracts this mapping for you so you don’t have to worry about it. Please see the WM documentation for Get/SetWindowLong() as well as other WM WNDPROC APIs for more information.

Posted by shende | 1 Comments

Are you passionate about Windows Mobile Devices?

Do you want to work on next generation Windows Mobile devices long before they become commercially available? Want to be the first one to work with and influence next year’s Windows Mobile devices? Then you’re in luck! The Windows Mobile team is looking for a strong developer who is passionate about the next wave of our devices.

The Windows Device Core Joint Development Program (JDP) team is looking for an experienced developer to assist OEMs and Silicon Vendors in putting the latest Windows Mobile on both new and existing hardware. We are looking for passionate, smart, and motivated candidates, with great development, problem solving, and debugging skills, ready to meet the highest standard of engineering excellence.

 

Find out more about it here.

Posted by shende | 1 Comments

MemoRx incorrectly displaying VM overlap in pre-release Windows Mobile versions

Many OEMs have already noticed that Memory Doctor doesn’t correctly represent VM overlap in some pre-release versions of Windows Mobile. There have been some changes in the WM VM architecture that contradict an assumption made by MemoRx, which results in potentially incorrect reporting. MemoRx will continue to work well in officially released versions of Windows Mobile.

 

Instead of updating MemoRx and continuing with our 2-tool approach, we thought it would be more useful to merge the useful features of MemoRx with devHealth to create the devHealthViewer. We are currently working on the documentation as well as some additional features for this new tool, but I can tell you it is very impressive and useful. You can download a version of this tool today from Jetstream under the WM6 Tools directory.

Posted by shende | 11 Comments

Understanding Output From “meminfo kernel”

I was debugging a weird hang at device boot and I used the command “meminfo kernel” in CeDebugX to get more info, but I realized right away that I didn’t know what this command was showing me.  So, I did a bit of investigation into what the output meant and thought I’d write down what I learned so I’d remember it later.  And, of course, there’s no better place to write things down than on the HopperRx blog, so here’s what I learned.

Intro to Kernel Heap

To understand the output of "meminfo kernel", I need to explain a bit about the kernel's internal heap.  As with all heaps, the kernel's heap allocates a page of memory and then divides that memory into blocks of various sizes as needed.  The kernel uses the following 8 different block sizes (in bytes): 16, 36, 64, 168, 228, 524, 576, 1024.  Each block size directly corresponds to the actual size of at least one kernel data structure.  To demonstrate this, I used a handy trick in PB's Watch Window to show the structure of the kernel's EVENT data type (see image below).  If you add up all the bytes in the structure (pProxHash has an array size of 32) you'll see that sizeof(EVENT) is 168 bytes, which fits perfectly into the bucket that is 168 bytes.  It turns out that all of the kernel's data types exactly match one of the 8 buckets.

[Image: LPEVENT structure, from the Watch Window]

Unlike the heap used by LocalAlloc(), the kernel's private heap doesn't do compaction.  This means that when a chunk of memory is allocated as a certain block, it stays that size until the device reboots.  The block will be marked as free when the object it contains is deleted and could be re-used by another object later on, but the blocks will never be resized.  If the kernel needs to create a new object but no empty block exists, it will simply create a new block.  Consequently, the kernel heap will grow as necessary, but never shrink.  This has some important implications I'll touch on later.

Dissecting "Meminfo Kernel"

Okay, that's the basic stuff on the heap, now let's look at what meminfo shows.  Here's the output I got from meminfo on DeviceEmulator after a clean boot:
------------------------------------------------------------------

      Size     Used      Max   Extra   Entries(Max)   Name
==================================================================
 0:    576    98496    99648    1152    171(   173)   Thrd
 1:    228    54036    54948     912    237(   241)   Mod
 2:     36    91944    92412     468   2554(  2567)   API/CStk/ClnEvt/StbEvt/Prxy/HData/KMod
 3:    168   598920   604800    5880   3565(  3600)   Crit/Evt/Sem/Mut/ThrdDbg
 4:     64    33920    34048     128    530(   532)   FullRef/FSMap/ThrdTm
 5:     16    32864    33120     256   2054(  2070)   MemBlock
 6:    524        0     1572    1572      0(     3)   Name
 7:   1024   174080   175104    1024    170(   171)   HlprStk
 
Total Used  =   1084260 bytes
Total Extra =     11392 bytes
heapptr: 0x80076278
 
------------------------------------------------------------------
Here's my quick breakdown of what each column is showing:
First column:  The left most column is just a number for the table row.  The block sizes aren't listed in any special order so the numbering doesn't mean anything special, but the line numbers make the table easier to read and discuss.
Size:  The size, in bytes, of the block described on that particular row.
Used:  Number of bytes currently in use at this particular size (i.e. "Used" = "Size" * "Entries")
Max:  Number of bytes allocated at this size (i.e. "Max" = "Size" * "(Max)")
Extra:  Number of bytes that aren't being used (i.e. "Extra" = "Size" * ("(Max)" - "Entries")).
Entries:  Number of blocks of this size that are currently in use.
(Max):  Number of blocks of this size that have been created.
Name:  Friendly names of the data types that fit into this block
 
At the bottom, "Total Used" shows the sum of the "Used" column.
"Total Extra" shows the sum of the "Extra" column.
And the "heapptr" is just what it sounds like: the pointer to the start of the kernel's heap.
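The column relationships can be checked mechanically. Here is a minimal C sketch, assuming nothing about how meminfo is implemented: the type and function names are made up for illustration, and the rows are copied by hand from the snapshot above.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the "meminfo kernel" table. The type and field names are
   illustrative only; they are not taken from the kernel sources. */
typedef struct {
    unsigned long size;     /* "Size": bytes per block */
    unsigned long entries;  /* "Entries": blocks currently in use */
    unsigned long created;  /* "(Max)": blocks created since boot */
} MeminfoRow;

/* Rows 0 through 7, copied from the snapshot above. */
const MeminfoRow kSnapshot[] = {
    {  576,  171,  173 },   /* Thrd */
    {  228,  237,  241 },   /* Mod */
    {   36, 2554, 2567 },   /* API/CStk/ClnEvt/StbEvt/Prxy/HData/KMod */
    {  168, 3565, 3600 },   /* Crit/Evt/Sem/Mut/ThrdDbg */
    {   64,  530,  532 },   /* FullRef/FSMap/ThrdTm */
    {   16, 2054, 2070 },   /* MemBlock */
    {  524,    0,    3 },   /* Name */
    { 1024,  170,  171 }    /* HlprStk */
};
#define SNAPSHOT_ROWS (sizeof(kSnapshot) / sizeof(kSnapshot[0]))

/* "Used" = "Size" * "Entries" */
unsigned long RowUsed(const MeminfoRow *row)
{
    assert(row != NULL);
    return row->size * row->entries;
}

/* "Max" = "Size" * "(Max)" */
unsigned long RowMax(const MeminfoRow *row)
{
    assert(row != NULL);
    return row->size * row->created;
}

/* "Extra" = "Size" * ("(Max)" - "Entries") */
unsigned long RowExtra(const MeminfoRow *row)
{
    assert(row != NULL && row->entries <= row->created);
    return row->size * (row->created - row->entries);
}

/* "Total Used": sum of the "Used" column. */
unsigned long TotalUsed(const MeminfoRow *rows, size_t count)
{
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        total += RowUsed(&rows[i]);
    }
    return total;
}

/* "Total Extra": sum of the "Extra" column. */
unsigned long TotalExtra(const MeminfoRow *rows, size_t count)
{
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        total += RowExtra(&rows[i]);
    }
    return total;
}
```

Calling TotalUsed(kSnapshot, SNAPSHOT_ROWS) and TotalExtra(kSnapshot, SNAPSHOT_ROWS) should reproduce the two totals printed at the bottom of the table.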

What the output actually means

In my table above, row 0 shows the blocks of size  576 bytes.  There are currently 173 of these blocks created, and 171 of those blocks are in use.  That means, 99,648 bytes are allocated as this block size, and  98,496 of those bytes are in use.   This particular block is only used to hold THREAD objects, so this information also tells us there are currently 171 threads in the system.

Contrary to the implications of their names, the Max and (Max) columns don't reflect any kind of upper limit.  These columns actually report how much has been created at that size: Max in bytes, and (Max) in blocks.  Going back to row 0 as an example again, the (Max) column shows that 173 blocks have been created at size 576.  However, when I took this snapshot, only 171 threads were still around, so only 171 of those blocks were used.  If a new application was started and created 3 new threads, the kernel would re-use the 2 empty blocks for 2 of the new threads, and then create a new block for the other new thread.  After this happened, "Entries" and "(Max)" would both be 174, indicating there are 174 blocks of that size and all 174 of those blocks are in use.
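To make the "grows but never shrinks" behavior concrete, here is a small C model of the two counters for a single block size. It is only a sketch of the bookkeeping described above, not kernel code; BlockPool, PoolAlloc, and PoolFree are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Counters for one block size. Names are illustrative only. */
typedef struct {
    unsigned entries;  /* blocks currently in use ("Entries") */
    unsigned created;  /* blocks created since boot ("(Max)") */
} BlockPool;

/* Create one object: reuse a free block if there is one, otherwise
   create a new block. */
void PoolAlloc(BlockPool *pool)
{
    assert(pool != NULL);
    if (pool->entries == pool->created) {
        pool->created++;  /* no free block left, so the pool grows */
    }
    pool->entries++;
}

/* Delete one object: the block is marked free, but it is never released,
   so "created" stays where it is. */
void PoolFree(BlockPool *pool)
{
    assert(pool != NULL && pool->entries > 0);
    pool->entries--;
}
```

Starting from the row 0 snapshot (171 in use, 173 created), three calls to PoolAlloc leave both counters at 174. Calling PoolFree afterwards lowers only the in-use count.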

How To use the output

This table is a great way to get an idea of where to start a new investigation.  It doesn't provide much help with root causing a specific issue though, so don't try and read too much into the information. 

For example, if the table shows rows 2 and 3 have a huge number of entries, then some process is probably leaking handles to a critical section, event, semaphore, or mutex.  When one of these objects is created, the object stays around until all handles are closed.  If the handle is leaked, then the handle (block size 36) and the object (block size 168) are both going to stay around and never get deleted.  On the other hand, if row 2 has a huge number of entries but all the other blocks have a normal number of entries, then it's possible someone is leaking a thread or module handle.  Since threads exit on their own, and there's only one instance of a DLL, neither of these buckets will grow even though there are a bunch of handles pointing at them.  If all of the rows look fine, then the issue probably isn't a handle or object leak.
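The rule of thumb above can be written down as a tiny classifier. This is a sketch under loud assumptions: "huge" is modeled as "at least GROWTH_FACTOR times a known-good baseline", the factor of 4 is arbitrary, only rows 2 and 3 are considered, and all names are hypothetical. Treat the result as a hint about where to look, not as a diagnosis.

```c
#include <assert.h>

/* Assumed threshold: an entry count is "huge" when it is at least this
   many times the baseline. The value is arbitrary; tune it per device. */
#define GROWTH_FACTOR 4UL

typedef enum {
    HINT_NOTHING_OBVIOUS,          /* neither row stands out */
    HINT_SYNC_OBJECT_HANDLE_LEAK,  /* rows 2 and 3 both grew */
    HINT_THREAD_OR_MODULE_HANDLE   /* only row 2 grew */
} LeakHint;

/* Nonzero when "now" is huge compared to "baseline". */
int GrewALot(unsigned long baseline, unsigned long now)
{
    return now > baseline && now >= baseline * GROWTH_FACTOR;
}

/* handleBase/handleNow: "Entries" for row 2 (block size 36).
   objectBase/objectNow: "Entries" for row 3 (block size 168).
   A case the text does not cover (only row 3 grew) falls through to
   HINT_NOTHING_OBVIOUS. */
LeakHint ClassifyLeak(unsigned long handleBase, unsigned long handleNow,
                      unsigned long objectBase, unsigned long objectNow)
{
    int handlesGrew = GrewALot(handleBase, handleNow);
    int objectsGrew = GrewALot(objectBase, objectNow);

    assert(handleBase > 0 && objectBase > 0);

    if (handlesGrew && objectsGrew) {
        return HINT_SYNC_OBJECT_HANDLE_LEAK;
    }
    if (handlesGrew) {
        return HINT_THREAD_OR_MODULE_HANDLE;
    }
    return HINT_NOTHING_OBVIOUS;
}
```

The baseline would come from a clean-boot snapshot like the one above (2554 entries in row 2, 3565 in row 3).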

As you can see, this info can provide a helpful guide on where to start debugging, which is very valuable sometimes, but it doesn't tell you anything specific about the root cause.

Posted by JeCahill | 2 Comments
Attachment(s): WatchWindow_Snapshot.jpg

Passive KITL to the rescue

I'm sure many of you have been in a situation where your device hangs during field testing. Or sometimes you are trying to track down a problem which only repros at a certain location. The best thing you can have in these situations is, of course, a live KITL connection to the device. This can be challenging at times because you can't run around with a phone connected to your laptop with a short USB cable. It's just not practical.

This is where Passive KITL can be used. (Read the “Active and Passive KITL” article in the Windows Mobile 6.0 documentation for an introduction to Passive KITL.) Note that when you don't have a live KITL connection, you can't just press "go" when your device tries to break into KITL. This might happen, for example, when you mistakenly use the wrong debug DLL, one with a lot of asserts you don't care about. So a word of advice: use debug DLLs only for the components you are debugging at the moment. Better yet, try to stay with a retail image; this way, if it breaks, it will most likely be in the place you are debugging.

Another thing to remember is battery life. You need to act quickly when your device breaks. If you let it sit too long your battery might run out and you will lose your repro.

Yet another thing you can do before working with Passive KITL is to use non-optimized versions of the components you are trying to debug. Use the -Od switch in your SOURCES file to turn off compiler optimization. This preserves the assembly flow, which in turn makes the assembly match your source files. You can add -Od to the existing command line by doing this:

    CDEFINES=$(CDEFINES) -Od

Alternatively, you can set DISABLE_OPTIMIZER=1 in your build environment, which will disable optimization.

 

No really, how do I enable Passive KITL?

We are going to use the GSample platform that can be downloaded from JetStream or you can just use FSample. Your particular platform will be very similar.

In a nutshell enabling Passive KITL is very simple. All you need to do is to OR OAL_KITL_FLAGS_ENABLED and OAL_KITL_FLAGS_PASSIVE from \PLATFORM\COMMON\SRC\INC\oal_kitl.h into OAL_KITL_ARGS->flags in the OALKitlStart() function.

There are several ways to do this. If you have a boot-loader menu you can do it there (We will not be describing it here, because not everyone has a boot-loader menu).

Another way is to use FIXUPVARs, which can be configured so that all you need to do is set a DOS environment variable and run MAKEIMG to make an image with Passive KITL enabled.

Take a look at the CONFIG.BIB file located in your \PLATFORM\<YOUR PLATFORM>\FILES directory or look at the sample (\PLATFORM\GSAMPLE\FILES\CONFIG.BIB).

You will see a section that looks something like this:

 

 

    ARGS                                  87AFF800    00000800    RESERVED
    GSM                                   87C00000    00400000    RESERVED
    dwOEMFailPowerPaging    00000000    00000001    FIXUPVAR
    dwOEMDrWatsonSize        00000000    0004B000    FIXUPVAR
    dwOEMPagingPoolSize      00000000    00200000    FIXUPVAR

You can add a new value here that is conditional on a makeimg flag:

IF IMGPASSIVEKITL 
    dwOEMEnablePassiveKITL  00000000    00000001    FIXUPVAR
ENDIF IMGPASSIVEKITL

The value dwOEMEnablePassiveKITL should live in \PLATFORM\GSAMPLE\SRC\KERNEL\OAL\init.c.  If you take a look at that file you will see something similar to this:

 

    //---------------------------------------------------------
    // Global FixUp variables
    //
    // Note: This is workaround for makeimg limitation 
    //  no  fixup on variables
    // initialized to zero.
    //

    DWORD dwOEMFailPowerPaging = 1;
    DWORD dwOEMDrWatsonSize = 0x0004B000;
    DWORD dwOEMPagingPoolSize = 0x00200000;
    DWORD dwOEMPlatform = OAL_IOCTL_PLATFORM_WINCE;

    extern DWORD gdwFailPowerPaging;
    extern DWORD cbNKPagingPoolSize;
    extern LPCWSTR g_pPlatformManufacturer;
    extern LPCWSTR g_pPlatformName; 

 

You need to add the actual value that will be modified in the makeimg process.  In the example above, that would be:

DWORD dwOEMEnablePassiveKITL = 0xFFFFFFFF;
extern DWORD g_dwEnablePassiveKITL;

Why is this initialized to 0xFFFFFFFF and not 0x0?  Because you cannot have fixup variables initialized to 0x0 (See the comments in the init.c file above).

The extern in front of g_dwEnablePassiveKITL is important because this global variable is used in this file but is defined and initialized in kitl.c.

Next, you need to copy the FIXUPVAR dwOEMEnablePassiveKITL to g_dwEnablePassiveKITL in OEMInit(). Search for OEMInit() in init.c and add this line somewhere in the function:

g_dwEnablePassiveKITL = dwOEMEnablePassiveKITL;

In order to make use of the new FIXUPVAR, you need to modify \PLATFORM\GSAMPLE\SRC\kernel\oal\kitl.c.  First, you must define and initialize the global at the top of kitl.c:

DWORD g_dwEnablePassiveKITL = 0xFFFFFFFF;

NOTE: this global must be defined in both KITL and non-KITL builds or else you will not be able to build properly.  In the case of non-KITL, it will be ignored.   If you separate your KITL functions for non-KITL builds into a separate file with empty stubs for the KITL functions, make sure you also define this global there.  In our case this is done in \PLATFORM\GSAMPLE\SRC\kernel\oal\kitl.c.

Finally, in OALKitlStart(), you need to add the flag.  You probably call OALArgsQuery() to get the current value.  If you set this value in the bootloader, this is how it is passed in.  Once you have a pointer to OAL_KITL_ARGS (we will call it pArgs in this example) you need to OR in OAL_KITL_FLAGS_PASSIVE:

 

if (0xFFFFFFFF != g_dwEnablePassiveKITL)
{
      pArgs->flags |= OAL_KITL_FLAGS_PASSIVE;
}

 

That is it.  Now, after building, you can enable passive KITL by setting IMGPASSIVEKITL=1 in your build environment and running makeimg. 
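To see how the pieces fit together, here is a standalone C model of the decision made in OALKitlStart(). It is a sketch, not platform code: the SKETCH_ flag values are placeholders chosen for illustration (take the real ones from oal_kitl.h), and ApplyPassiveKitl is a hypothetical helper. The only behavior it mirrors is the one described above: the variable keeps its 0xFFFFFFFF initializer unless makeimg patches it, and any other value turns the passive flag on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values for this sketch only. The real definitions
   live in oal_kitl.h and may differ. */
#define SKETCH_KITL_FLAGS_ENABLED  0x0001u
#define SKETCH_KITL_FLAGS_PASSIVE  0x0002u

/* Initializer of the fixup variable. It cannot be zero, so "all ones"
   stands for "makeimg did not patch this variable". */
#define SKETCH_FIXUP_UNPATCHED     0xFFFFFFFFu

/* Models the check added to OALKitlStart(): OR in the passive flag only
   when makeimg replaced the initializer (IMGPASSIVEKITL was set). */
void ApplyPassiveKitl(uint32_t *pFlags, uint32_t enablePassiveKitl)
{
    assert(pFlags != NULL);
    if (enablePassiveKitl != SKETCH_FIXUP_UNPATCHED) {
        *pFlags |= SKETCH_KITL_FLAGS_PASSIVE;
    }
}
```

With IMGPASSIVEKITL unset, the value passed in is still 0xFFFFFFFF and the flags are left alone. With IMGPASSIVEKITL=1, makeimg patches the variable to 1 and the passive bit is set on top of whatever flags were already there.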

 

Let's recap

1. Make sure you only use debug dlls that are relevant
2. Plug the device into a PC as quickly as possible when the device hits an exception and KITL is activated.  Otherwise, the battery will drain and the crash will be lost.
3. Turn off compiler optimization for the components you are debugging if possible
4. Add a FIXUPVAR in config.bib
5. Copy value of FIXUPVAR to the global in init.c
6. OR OAL_KITL_FLAGS_PASSIVE to pArgs->flags in kitl.c if your global (g_dwEnablePassiveKITL) is set 

Posted by deanmel | 3 Comments

Running Platform Builder 6 on Vista

A lot of people are afraid to switch to Vista because they think their stuff will not work. Well, truth be told, I've been running Vista+VS2005_SP1+PB6 since March of this year and haven't had any major problems. The only two problems I had were:

1. If the device doesn't have Vista compatible driver then you will not be able to connect/debug the device.

2. Flashing the devices on Vista can be painful because some devices still have no support for Vista flashing.

In order to have this properly working you need to uninstall PB5 from your machine before installing PB6. Personally, I would use a fresh Vista install.

After you install Vista and Visual Studio 2005 on your development computer, you need to install VS2005_SP1, which can be found here:

http://www.microsoft.com/downloads/details.aspx?familyid=bb4a75ab-e2d4-4c96-b39d-37baf6b5b1dc&displaylang=en

Ignore all the warnings that Vista will give you about VS2005 being incompatible with Vista.

Install Vista update for VS2005_SP1 from here:

http://www.microsoft.com/downloads/details.aspx?familyid=90E2942D-3AD1-4873-A2EE-4ACC0AACE5B6&displaylang=en

Now you are finally ready to install PB6 and PB6_SP1. Follow this link to install PB6:

http://www.microsoft.com/downloads/details.aspx?familyid=5733C26C-168B-474D-8A27-59B30B769402&displaylang=en

and

PB6_SP1

http://www.microsoft.com/downloads/details.aspx?FamilyId=BF0DC0E3-8575-4860-A8E3-290ADF242678&displaylang=en

 

Conclusion

I know this might seem like a lot of work just to install PB6 but trust me the benefits that you will get from using PB6 will outweigh this cumbersome  install. Just the ability to use mismatched PDB files does it for me. And also full integration with Visual Studio is a very nice addition.

Posted by deanmel | 1 Comments

Improving the Cat Parade (Part 3)

I was recently running Hopper on a device that supported screen rotation and I realized that my test coverage was completely missing the rotation scenario.  The device would switch between portrait and landscape mode if the user took a specific action, but Hopper is automated and limited to standard key presses, so it would never take this action.

Once again, I extended FocusApp a little to improve my test coverage.  When FocusApp starts, I have it create a thread that calls ChangeDisplaySettingsEx() at random intervals to flip the orientation between different angles. 

The original sources for FocusApp are here: http://blogs.msdn.com/hopperx/archive/2005/11/30/the-cat-parade.aspx

To give you an idea of how I implemented the rotation, here's a sample of the code:

...
    DEVMODE devMode;
...
Inside a while loop:
...
        //Wait for a random number of minutes before rotating the screen.

        //Reinitialize the structure.
        memset(&devMode, 0x00, sizeof(devMode));
        devMode.dmSize = sizeof(devMode);
        devMode.dmFields = DM_DISPLAYORIENTATION;

        //Query the current orientation.  CDS_TEST fills in
        //dmDisplayOrientation without changing the display.  Without this
        //call the zeroed structure always reads as DMDO_0.
        ChangeDisplaySettingsEx(NULL, &devMode, NULL, CDS_TEST, NULL);

        //Figure out which orientation should be used this time.
        if (DMDO_0 == devMode.dmDisplayOrientation)
        {
            //Currently at 0 degrees, so flip to 90 degrees.
            devMode.dmDisplayOrientation = DMDO_90;
        }

        else
        {
            //Currently rotated, so flip back to 0 degrees.
            devMode.dmDisplayOrientation = DMDO_0;
        }

        //Now apply the new orientation to the screen.
        ChangeDisplaySettingsEx(NULL, &devMode, NULL, CDS_RESET, NULL);
...

Not all devices use the orientations DMDO_0 or DMDO_90, and some devices support more than two orientations, so modify the code to alternate between all of the orientations your test devices support.  Also, the interval between rotations is pretty much arbitrary, but it's important to make the interval random and variable to get good coverage.  In my testing, the rotation is done on a separate thread that randomly sleeps for 1 to 30 minutes each time through the loop.
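One way to cycle through every supported angle, and to pick the random delay, is sketched below in portable C. The ANGLE_ constants are stand-ins for the DMDO_ values and are assumed here to be bit flags with 0 degrees equal to zero; check the headers in your SDK before mapping them across. NextOrientation and RandomSleepMinutes are hypothetical helpers, and the supported-angle mask is assumed to come from whatever query your platform offers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the DMDO_ constants. The values are assumptions made
   for this sketch; use the definitions from your SDK headers. */
enum {
    ANGLE_0   = 0,
    ANGLE_90  = 1,
    ANGLE_180 = 2,
    ANGLE_270 = 4
};

/* Return the next supported angle after "current", wrapping around.
   "supportedMask" is a bitwise OR of the supported non-zero angles;
   0 degrees is treated as always supported. */
unsigned NextOrientation(unsigned current, unsigned supportedMask)
{
    static const unsigned order[] = { ANGLE_0, ANGLE_90, ANGLE_180, ANGLE_270 };
    const size_t count = sizeof(order) / sizeof(order[0]);
    size_t start = 0;
    size_t i;

    /* Find where the current angle sits in the rotation order. */
    for (i = 0; i < count; i++) {
        if (order[i] == current) {
            start = i;
        }
    }

    /* Walk forward until a supported angle is found. */
    for (i = 1; i <= count; i++) {
        unsigned candidate = order[(start + i) % count];
        if (candidate == ANGLE_0 || (supportedMask & candidate) != 0) {
            return candidate;
        }
    }
    return current;
}

/* Pick a random delay in whole minutes, inclusive on both ends.
   rand() is good enough for test jitter; seed it once with srand(). */
unsigned RandomSleepMinutes(unsigned minMinutes, unsigned maxMinutes)
{
    assert(minMinutes <= maxMinutes);
    return minMinutes + (unsigned)rand() % (maxMinutes - minMinutes + 1);
}
```

In the rotation thread, the value returned by NextOrientation would be stored in dmDisplayOrientation before the CDS_RESET call, and RandomSleepMinutes(1, 30) would give the delay before the next pass through the loop.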

Posted by JeCahill | 2 Comments

Where did Callstacks go from the Hopper logs?

If you upgraded from an older version of Hopper to a more recent one, you probably noticed that the callstacks are gone from the Hopper logs.

We made them optional for two reasons:

1. Printing out callstacks slowed down the run significantly

2. They did not provide a lot of value, because the callstacks were printed out at 5 minute intervals and not at the points of interest

If you feel that you can benefit from these callstacks you can turn them back on by passing /d to Hopper.exe. To pass a switch to Hopper you can use the debugger's target window. If you don't have a debugger attached you can create a link file that passes the switch to Hopper.exe. MSDN has instructions on how to create link files.

Posted by deanmel | 3 Comments

Why don't my private binaries show up in the image?

I've been asked this question too many times by now. Many partners testing their private changes have to figure it out the hard way.

The reason your updated binary doesn't show up in the image is that, by default, makeimg picks up package files from the %_FLATRELEASEDIR%\prebuilt directory. All you need to do is figure out which package your dll/exe goes into and delete that package.

There are several ways to find which package your dll goes into. Here are the steps I follow whenever I need to update a dll in an image:

  1. Copy your binary, symbols, map into %_FLATRELEASEDIR%
  2. Find out what package your file is in
    1. Cd %_FLATRELEASEDIR%
    2. "findstr /i [your filename] *.bsm.xml" Ex: findstr /i nk.exe *.bsm.xml
  3. This should return a package's xml file, for example oemxipkernel.bsm.xml. This tells you that it's in the OS package.
  4. Find the prebuilt package and delete it
    1. cd prebuilt
    2. del oemxipkernel.cab.pkg
  5. Makeimg

Makeimg will detect the missing package and rebuild it.

Posted by deanmel | 4 Comments