Posted by: Sue Loh

Windows CE APIs are implemented by a set of server processes.  Besides the kernel (nk.exe) we have other server processes: filesys.exe, gwes.exe, device.exe, services.exe.  When an application calls an API in one of these servers, the app thread actually jumps into the server process.  The API call is done on the application thread.  But how does all of that work?  Let's trace the code.  If you have Platform Builder you can look at some of this.  If you have our shared source you can look at all of it.

Most Windows CE APIs are exported by a single central DLL: coredll.dll.  All Windows CE applications link against coredll.  When an application calls an API, such as GetTickCount, it is calling the GetTickCount export from coredll.dll.  But the coredll export is just a small wrapper, also called a "thunk."  Here's an example showing what a coredll implementation looks like:

DWORD xxx_GetTickCount ()
{
    return GetTickCount ();
}

Reference: %_WINCEROOT%\private\winceos\coreos\core\thunks\*

You can find the coredll thunks in our shared source under %_WINCEROOT%\private\winceos\coreos\core\thunks.  The thunk function name has "xxx_" in it, but is exposed from coredll by a different name.  The rename happens inside coredll.def:
    GetTickCount=xxx_GetTickCount

Reference: You can look at %_WINCEROOT%\private\winceos\coreos\core\coredll.def, though I think you could also find a copy of the .def file somewhere in your public tree even without shared source.

But, if coredll is implementing GetTickCount, then what is the GetTickCount that it is calling?  You can find the answer in the public header files under %_WINCEROOT%\public\common\oak\inc.  The GetTickCount "function" that coredll calls actually resolves to a specially-cooked invalid address.

Reference: %_WINCEROOT%\public\common\oak\inc\*:
#define GetTickCount    COMPLICATED_MACRO(..., SH_WIN32, W32_GetTickCount, ...)
Where W32_GetTickCount is #defined to 13 in another OAK header, and SH_WIN32 is #defined to 0 in the SDK.

The real macros (abbreviated as COMPLICATED_MACRO above) bottom out in one called IMPLICIT_CALL.  You can trace into its definition to find out how it works, but it quickly descends into macro nastiness.  The important thing here is that the macros combine two numbers: SH_WIN32, which is the ID of the API set table that GetTickCount is part of, and W32_GetTickCount, which is the index of GetTickCount inside that API set table.  The combination produces a 32-bit number, an invalid address into which the API "identity" is encoded.  When the coredll thunk xxx_GetTickCount "calls" the GetTickCount macro, it jumps to that invalid address.  If you look at the disassembly for the coredll thunks, you'll see the jumps to these addresses.
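
To make the encoding idea concrete, here is a minimal sketch of how two small numbers can be packed into one address that is never mapped.  Every name and constant in it (the SKETCH_ prefix, the base address, the shift and the scale) is an assumption made up for illustration; only the values 0 and 13 come from the headers described above.  It is not the real macro.

```c
#include <assert.h>
#include <stdint.h>

/* All names and values here are hypothetical, chosen for illustration.
   The real definitions live in the OAK/SDK headers. */
#define SKETCH_TRAP_BASE   0xF0010000u  /* top of a range that is never mapped */
#define SKETCH_SET_SHIFT   8            /* bits reserved for the API index     */
#define SKETCH_SLOT_SCALE  4u           /* keeps encoded addresses 4-byte aligned */

#define SKETCH_SH_WIN32          0u     /* API set ID, as stated in the post     */
#define SKETCH_W32_GetTickCount  13u    /* index in that set, as stated in the post */

/* Pack (API set, index) into a single 32-bit "address". */
static uint32_t sketch_encode_api(uint32_t set_id, uint32_t index)
{
    /* The index must fit in the bits reserved for it. */
    assert(index < (1u << SKETCH_SET_SHIFT));
    return SKETCH_TRAP_BASE
         - ((set_id << SKETCH_SET_SHIFT) | index) * SKETCH_SLOT_SCALE;
}
```

With these made-up constants, sketch_encode_api(SKETCH_SH_WIN32, SKETCH_W32_GetTickCount) yields 0xF000FFCC.  A thunk that "calls" that value as if it were a function pointer would fault, which is exactly the point.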

The jump produces an exception.  All exceptions go to the kernel first, and the kernel says, "A-ha!  I know this invalid address.  It's the encoding for an API, index 13 of API table 0."  The kernel marshals (maps) arguments, adjusts permissions, flushes cache and TLB if necessary, and finally sets things up so that the thread continues execution at the desired API inside the desired server process.

Reference:
%_WINCEROOT%\private\winceos\coreos\nk\kernel\x86\fault.c, Int20SyscallHandler.
%_WINCEROOT%\private\winceos\coreos\nk\kernel\objdisp.c, ObjectCall()
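
The kernel's side of the trap can be sketched the same way: take the faulting address, recover the API set and index, and look up the function in that set's table.  This is a simplified model, not the code in ObjectCall().  The constants, types and function names are hypothetical, and the argument mapping, permission changes and process switch are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding constants, for illustration only. */
#define SKETCH_TRAP_BASE   0xF0010000u
#define SKETCH_SET_SHIFT   8
#define SKETCH_SLOT_SCALE  4u

typedef uint32_t (*sketch_api_fn)(void);

/* One API set: the table a server process registers with the kernel. */
typedef struct {
    const sketch_api_fn *methods;
    uint32_t             count;
} sketch_api_set;

/* Stand-in for the server-side implementation of GetTickCount. */
static uint32_t sketch_server_GetTickCount(void) { return 123456u; }

/* Table for API set 0; only slot 13 is populated in this sketch. */
static const sketch_api_fn sketch_win32_methods[14] = {
    [13] = sketch_server_GetTickCount
};
static const sketch_api_set sketch_api_sets[] = {
    { sketch_win32_methods, 14 }
};

/* Recover (set, index) from a faulting address.  Returns 0 when the
   address is not an encoded API call, i.e. it is an ordinary fault. */
static int sketch_decode_api(uint32_t fault_addr, uint32_t *set_id, uint32_t *index)
{
    assert(set_id != NULL && index != NULL);
    if (fault_addr > SKETCH_TRAP_BASE) return 0;
    uint32_t delta = SKETCH_TRAP_BASE - fault_addr;
    if (delta % SKETCH_SLOT_SCALE != 0) return 0;
    uint32_t code = delta / SKETCH_SLOT_SCALE;
    *set_id = code >> SKETCH_SET_SHIFT;
    *index  = code & ((1u << SKETCH_SET_SHIFT) - 1u);
    return 1;
}

/* What the exception handler does in this model: decode, look up, call.
   Returns 1 and stores the API result if the fault was an API call. */
static int sketch_handle_fault(uint32_t fault_addr, uint32_t *result)
{
    uint32_t set_id, index;
    if (!sketch_decode_api(fault_addr, &set_id, &index)) return 0;
    if (set_id >= sizeof sketch_api_sets / sizeof sketch_api_sets[0]) return 0;
    const sketch_api_set *set = &sketch_api_sets[set_id];
    if (index >= set->count || set->methods[index] == NULL) return 0;
    *result = set->methods[index]();
    return 1;
}
```

In this model a fault at 0xF000FFCC decodes to set 0, index 13 and ends up in sketch_server_GetTickCount, while a misaligned or unpopulated address is reported as an ordinary fault.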

The thread, now running inside the server process, executes the real API call.  When the call finally returns, it takes another exception, because during the API call setup the kernel set the return address to another specially-coded invalid address.  During the return the kernel again adjusts arguments, permissions, and other state as necessary.

Reference: %_WINCEROOT%\private\winceos\coreos\nk\kernel\x86\fault.c, ServerCallReturn.
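
The return path can be modeled with a small per-thread record: on the way in, the kernel remembers where the thunk should resume and which process it came from, then plants a sentinel as the return address; on the way out, the fault on that sentinel tells it to restore both.  The sketch below is only a model under those assumptions.  The struct layouts, the sentinel value and the function names are all hypothetical and do not mirror the real kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: an unmapped address that means "an API call
   is returning".  Not the value the real kernel uses. */
#define SKETCH_RETURN_TRAP 0xFFFFF000u

/* What the kernel must remember in order to undo one API call. */
typedef struct sketch_call_record {
    uint32_t real_return_addr;        /* where the coredll thunk resumes */
    int      caller_process;          /* process to switch back to       */
    struct sketch_call_record *prev;  /* outer call, if calls are nested */
} sketch_call_record;

typedef struct {
    int      current_process;
    uint32_t return_addr;             /* models the return address slot */
    sketch_call_record *calls;        /* per-thread stack of records    */
} sketch_thread;

/* API call setup: save state, enter the server, plant the sentinel. */
static void sketch_enter_server(sketch_thread *t, sketch_call_record *rec,
                                int server_process)
{
    assert(t != NULL && rec != NULL);
    rec->real_return_addr = t->return_addr;
    rec->caller_process   = t->current_process;
    rec->prev             = t->calls;
    t->calls              = rec;
    t->current_process    = server_process;
    t->return_addr        = SKETCH_RETURN_TRAP;
}

/* Return trap: restore the caller's process and real return address.
   Returns 1 if the fault was an API return, 0 for any other fault. */
static int sketch_handle_return(sketch_thread *t, uint32_t fault_addr)
{
    assert(t != NULL);
    if (fault_addr != SKETCH_RETURN_TRAP || t->calls == NULL) return 0;
    sketch_call_record *rec = t->calls;
    t->calls           = rec->prev;
    t->current_process = rec->caller_process;
    t->return_addr     = rec->real_return_addr;
    return 1;
}
```

Keeping the records in a linked stack is a design choice of this sketch; it lets one API call another server's API and still unwind in the right order.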

Finally execution returns to the coredll.dll thunk back inside the original process.

KMODE

As you can imagine, we pay a performance penalty to take these exceptions on the way into and out of every API call.  That is part of the reason that "KMODE" and "ALLKMODE" exist.  In Windows CE, "kernel mode" threads have permission to access memory addresses outside of their own process.  Normal threads cannot execute code outside their process slot, but a kernel-mode thread can access any memory it likes, so it can jump straight into another process and execute code there.  Windows CE takes advantage of this expanded access to speed up API calls made by kernel-mode threads.  If you look around the coredll code, you'll find thunks like this (contrived) example:

DWORD xxx_GetTickCount ()
{
    // Kernel mode takes a direct jump
    if (IsInKMode) {
        return g_pKmodeEntries->m_pGetTickCount();
    }
    // Non kernel mode takes a trap
    return GetTickCount ();
}

g_pKmodeEntries is a table that the kernel passes to each instance of coredll that's running inside a trusted process.  So, only kernel-mode threads running inside trusted processes gain the performance benefit of these KMode short-circuits.

Reference:
%_WINCEROOT%\private\winceos\coreos\core\dll\coredll.cpp, CoreDllInit().
%_WINCEROOT%\private\winceos\coreos\nk\kernel\resource.c, SC_GetRomFileInfo().
%_WINCEROOT%\private\winceos\coreos\nk\kernel\KmodeEntries.cpp, g_KmodeEntries.
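
Here is a small, self-contained model of that arrangement.  It assumes that coredll in an untrusted process simply ends up with a NULL table pointer, which is my guess at how "not passed" is represented; the type, the sketch_ names and the trap counter are all hypothetical and exist only so the two paths can be told apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shape of the short-circuit table: one pointer per API. */
typedef struct {
    uint32_t (*m_pGetTickCount)(void);
} sketch_kmode_entries;

static int sketch_trap_count;  /* how many calls took the slow trap path */

/* Direct kernel entry point (stand-in). */
static uint32_t sketch_kernel_GetTickCount(void) { return 1000u; }

/* Stand-in for the jump to the invalid address and the resulting trap. */
static uint32_t sketch_trap_GetTickCount(void)
{
    sketch_trap_count++;
    return 1000u;
}

static const sketch_kmode_entries sketch_kernel_table = {
    sketch_kernel_GetTickCount
};

/* Per-process coredll state.  Assumption: NULL in untrusted processes. */
static const sketch_kmode_entries *sketch_pKmodeEntries;
static int sketch_in_kmode;    /* models "is this thread in kernel mode?" */

/* Models coredll startup: only trusted processes receive the table. */
static void sketch_coredll_init(int process_is_trusted)
{
    /* The kernel is expected to hand over a fully populated table. */
    assert(sketch_kernel_table.m_pGetTickCount != NULL);
    sketch_pKmodeEntries = process_is_trusted ? &sketch_kernel_table : NULL;
}

static uint32_t sketch_xxx_GetTickCount(void)
{
    /* Short-circuit needs both a kernel-mode thread and a table. */
    if (sketch_in_kmode && sketch_pKmodeEntries != NULL) {
        return sketch_pKmodeEntries->m_pGetTickCount();
    }
    return sketch_trap_GetTickCount();
}
```

A kernel-mode thread in an untrusted process still takes the trap in this model, because the table pointer is missing even though the mode check passes.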

Each of the short-circuit functions in the kernel does the work that normally would happen inside the API call trap: it switches the process, maps arguments, and such.  This pseudocode should give you an idea of what the kernel short-circuit wrappers look like:

DWORD NKGetTickCount ()
{
    // This API takes no arguments, otherwise there'd be calls
    // to map each argument here.

    // Switch to the process that exports the "Win32" API table,
    // and get a pointer to the table
    pApiTable = SwitchProcess (..., SH_WIN32);

    // Call the GetTickCount entry in the table
    result = (*(DWORD (*) ()) (pApiTable[W32_GetTickCount])) ();

    // Return to the original process
    RestoreProcess ();

    return result;
}

For a real example, see:
%_WINCEROOT%\private\winceos\coreos\nk\kernel\kmisc.c, NKRegOpenKeyExW().

Most of the APIs in the system don't have kernel-mode short-circuits; only a few were chosen for the speed-up.  Windows CE was originally designed to NOT run in all-kernel-mode, for security reasons.  Non kernel-mode threads cannot read or write other processes' memory, so process data is better protected.  But for performance reasons, some Windows CE devices were built to run in all-kernel-mode.  The way these thunks are organized represents a balancing act between coding for all-kernel-mode devices and coding for those which make use of the improved security of user mode.