In Part 1, I discussed a bit of the history and function of SMIs. The question now is: how does this make them EEEEVIL?

Essentially, SMIs are the final word in what happens on a CPU, short of removing power. They cannot be masked or preempted by anything else, not even a Non-Maskable Interrupt (NMI). The operating system also gets no notification when one fires, so ordinary software has no direct way to intercept them or detect when they happen. Once the BIOS's SMI handler takes over, it has control over everything that happens. Since System Management Mode (SMM) is its own execution mode, the assumptions and mechanisms of the previous mode are ignored. Specifically, this means any hardware breakpoints you may have set in your debugger will not fire on anything that happens in SMM.
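Because software gets no direct signal that the CPU was taken away, about the best a user-mode program can do is look for unexplained holes in time. The sketch below is my own illustration, not a tool from this series: it reads a monotonic clock back-to-back and reports any pair of readings that are further apart than a threshold. The names `find_gaps`, `sample_clock`, and `THRESHOLD_NS` are made up for this example, and the 100 microsecond threshold is an arbitrary assumption. A gap only hints at an SMI; scheduler preemption, ordinary interrupts, and interpreter pauses produce gaps too.

```python
import time


def find_gaps(samples, threshold_ns):
    """Return (index, gap_ns) for each pair of consecutive timestamps
    that are more than threshold_ns apart."""
    gaps = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta > threshold_ns:
            gaps.append((i, delta))
    return gaps


def sample_clock(count):
    """Read the monotonic clock back-to-back `count` times."""
    read = time.perf_counter_ns
    return [read() for _ in range(count)]


if __name__ == "__main__":
    # Assumption: 100 microseconds is "suspiciously long" for this machine.
    THRESHOLD_NS = 100_000
    stamps = sample_clock(1_000_000)
    for index, gap in find_gaps(stamps, THRESHOLD_NS):
        print(f"sample {index}: {gap / 1000:.1f} us gap")
```

On a busy desktop this will mostly catch the scheduler, not the BIOS. To attribute a gap to SMM with any confidence, the same loop would have to run somewhere the OS cannot interrupt it, which is not something a user-mode script can arrange.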

Now, when SMM was originally used only to implement power savings via Advanced Power Management (APM), this wasn't a huge problem. It became a problem when BIOS makers and their OEMs started using this ability to implement other functionality via SMM trickery. The most common application is a USB keyboard handler for real-mode operation. This also happens to be one of the most frustrating issues we see, as it can cause a wide variety of problems with the system's normal operation.

To understand why, think of the implications of an undetectable, hypervisor-like mode that has full access to the system. To implement a keyboard handler like that, it necessarily needs to touch the hardware. This means meddling with device registers, and even physical memory. If you implement the perfect SMM handler for this kind of work, fine. If you have a bug, however, havoc can ensue. You can be running along in a critical, essentially non-preemptible code path, and from one instruction to the next have a section of memory or a hardware register changed out from under you. This can result in all kinds of strange issues, from a crashed application, to a bluescreen, to a hang.

I'll cover some of the more common problems and symptoms in another article.