Posted by: Sue Loh

The Windows CE Monte Carlo profiler works with support from the BSP.  All of our sample BSPs implement the profiler support, but a lot of OEMs seem hesitant to implement it.  Perhaps it looks like too much work, or too complicated.  Well, I'm going to show how to do the easiest possible implementation.

In a nutshell, when you turn on the profiler, the kernel calls a routine in the OAL, OEMProfileTimerEnable().  This routine programs an interrupt to occur at the specified interval.  When the interrupt happens, the OAL reports it to the kernel by calling ProfilerHit().  When you disable the profiler, the kernel calls the OAL routine OEMProfileTimerDisable().  So to support profiling, an OAL requires:

  • OEMProfileTimerEnable
  • OEMProfileTimerDisable
  • Interrupt support to call ProfilerHit() at the right interval

Usually we run the profiler at a 200us interval.  But guess what?  Your BSP already has an interrupt at a 1ms interval.  Windows CE requires that already.  That's only going to give you one-fifth the number of profiler hits, but it's not so different as to be unusable.  So here is an easy profiler implementation.

// Keep track of whether the profiler is enabled
BOOL g_IsProfilerEnabled = FALSE;

void OEMProfileTimerEnable (DWORD dwUSec)  // dwUSec is ignored here
{
    g_IsProfilerEnabled = TRUE;
}

void OEMProfileTimerDisable (void)
{
    g_IsProfilerEnabled = FALSE;
}

UINT32 OEMInterruptHandler (UINT32 ra)
{
    // ... other code ...

    if ( <this is a timer IRQ> ) {
        if (g_IsProfilerEnabled) {
#ifdef ARM
            // This is the code you'd use on an ARM CPU
            ProfilerHit (ra);
#else
            // This is the code you'd use on non-ARM CPUs
            ProfilerHit (GetEPC ());
#endif
        }
        return SYSINTR_RESCHED;
    }

    // ... other code, which returns the SYSINTR value for non-timer IRQs ...
}

The implementations of OEMProfileTimerEnable and OEMProfileTimerDisable are very simple.  For the interrupt handler, you have to find the cases where you return SYSINTR_RESCHED, and call ProfilerHit there.

What's wrong with this implementation?  Well, if you implement variable-tick scheduling there are probably some cases where you'll miss data points due to idle, so you might want to skip the variable tick on profiling builds or while the profiler is running.  You might want to check OEMIdle too, to make sure things happen the way you need when the profiler is enabled.  Otherwise the main problem is that you only get 1/5 as much data as normal.  Make sure you run your test cases long enough to get a statistically significant number of samples.  I've seen people try to draw conclusions out of profiler runs that only gathered 500 or so samples.  That is far too few.  Also, the profiler can badly under- or over-represent OS activity that occurs at a multiple of 1ms, since that activity will be occurring right on or between every timer interrupt.
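To put numbers on the "1/5 as much data" point, here is a minimal sketch of the arithmetic.  The helper names expected_samples and run_ms_for_samples are made up for this illustration and are not part of the kernel or OAL; the target sample count is whatever you decide is enough for your test.

```c
#include <stdint.h>

// Hypothetical helper: samples collected during a run of run_ms
// milliseconds with one profiler interrupt every interval_us microseconds.
uint64_t expected_samples(uint64_t run_ms, uint32_t interval_us)
{
    if (interval_us == 0) {
        return 0;   // no interrupt interval, no samples
    }
    return (run_ms * 1000ULL) / interval_us;
}

// Hypothetical helper: run time in milliseconds needed to collect at
// least target_samples, rounded up to a whole millisecond.
uint64_t run_ms_for_samples(uint64_t target_samples, uint32_t interval_us)
{
    uint64_t total_us = target_samples * (uint64_t)interval_us;
    return (total_us + 999ULL) / 1000ULL;
}
```

For example, a 10-second run gives 50,000 samples at the usual 200us interval but only 10,000 at 1ms, so matching the normal sample count with the easy implementation takes a run five times as long.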

I still recommend that you try for the full-blown implementation.  Use a more frequent interrupt to get more information, and more accurate information.  But at least this description will give you a starting point for understanding how to implement the full version.
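To see how activity that lines up with the 1ms tick gets misrepresented, here is a small desktop simulation.  It is only a sketch under simplified assumptions: the activity is modeled as busy for the first busy_us microseconds of every period, the sample times are perfectly regular, and busy_hit_percent is a made-up name for this illustration, not a profiler API.

```c
#include <stdint.h>

// Hypothetical simulation: percentage (0-100) of profiler samples that
// land inside a periodic busy window.  The activity is busy during the
// first busy_us microseconds of every period_us microseconds.  Samples
// are taken every sample_interval_us, starting at sample_offset_us.
unsigned busy_hit_percent(uint32_t period_us, uint32_t busy_us,
                          uint32_t sample_interval_us,
                          uint32_t sample_offset_us,
                          uint32_t num_samples)
{
    uint64_t hits = 0;

    if (period_us == 0 || num_samples == 0) {
        return 0;
    }
    for (uint32_t i = 0; i < num_samples; i++) {
        // Absolute time of the i-th sample, in microseconds
        uint64_t t = (uint64_t)sample_offset_us
                   + (uint64_t)i * sample_interval_us;
        if (t % period_us < busy_us) {
            hits++;
        }
    }
    return (unsigned)((hits * 100ULL) / num_samples);
}
```

Take an activity that is busy for 200us out of every 1ms, so its true share is 20%.  Sampling every 1000us with offset 0 reports 100%, and with offset 500 it reports 0%.  Sampling every 200us with offset 0 reports 20%.  Real interrupts are not this perfectly aligned, but the effect is the same in kind.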