If you look at the disassembly of functions inside Windows DLLs, you'll find that they begin with the seemingly pointless instruction MOV EDI, EDI. This instruction copies a register to itself and updates no flags; it is completely meaningless. So why is it there?

It's a hot-patch point.

The MOV EDI, EDI instruction is a two-byte NOP, which is just enough space to patch in a jump instruction so that the function can be updated on the fly. The intention is that the MOV EDI, EDI instruction will be replaced with a two-byte JMP $-5 instruction to redirect control to five bytes of patch space that comes immediately before the start of the function. Five bytes is enough for a full jump instruction, which can send control to the replacement function installed somewhere else in the address space.

Although the five bytes of patch space before the start of the function consists of five one-byte NOP instructions, the function entry point uses a single two-byte NOP.

Why not use Detours to hot-patch the function, then you don't need any patch space at all.

The problem with Detouring a function during live execution is that you can never be sure that at the moment you are patching in the Detour, another thread isn't in the middle of executing an instruction that overlaps the first five bytes of the function. (And you have to alter the code generation so that no instruction starting at offsets 1 through 4 of the function is ever the target of a jump.) You could work around this by suspending all the threads while you're patching, but that still won't stop somebody from doing a CreateRemoteThread after you thought you had successfully suspended all the threads.

Why not just use two NOP instructions at the entry point?

Well, because a NOP instruction consumes one clock cycle and one pipe, so two of them would consume two clock cycles and two pipes. (The instructions will likely be paired, one in each pipe, so the combined execution will take one clock cycle.) On the other hand, the MOV EDI, EDI instruction consumes one clock cycle and one pipe. (In practice, the instruction will occupy one pipe, leaving the other available to execute another instruction in parallel. You might say that the instruction executes in half a cycle.) However you calculate it, the MOV EDI, EDI instruction executes in half the time of two NOP instructions.

On the other hand, the five NOPs inserted before the start of the function are never executed, so it doesn't matter what you use to pad them. It could've been five garbage bytes for all anybody cares.

But much more important than cycle-counting is that the use of a two-byte NOP avoids the Detours problem: If the code had used two single-byte NOP instructions, then there is the risk that you will install your patch just as a thread has finished executing the first single-byte NOP and is about to begin executing the second single-byte NOP, resulting in the thread treating the second half of your JMP $-5 as the start of a new instruction.

There's a lot of patching machinery going on that most people don't even realize. Maybe at some point, I'll get around to writing about how the operating system manages patches for software that isn't installed yet, so that when you do install the software, the patch is already there, thereby closing the vulnerability window between installing the software and downloading the patches.