My name is Marion Cole, and I am a Sr. EE in the Microsoft Platforms Serviceability group. You may be wondering why Microsoft support would need to know ARM assembly. Doesn’t Windows only run on x86 and x64 machines? No. Windows has run on a variety of processors in the past, including i860, Alpha, MIPS, Fairchild Clipper, PowerPC, Itanium, SPARC, 286, 386, IA-32, x86, and x64; the newest one is ARM. Most of these processors are antiquated now, and the common ones today are IA-32, x86, and x64. However, Windows has started supporting ARM processors in order to jump into the portable devices arena. You will find them in the Microsoft Surface RT, Windows Phones, and, I am sure, other devices in the future. You may be saying that these devices are locked and cannot be debugged. That is true from a live debug perspective, but you can get memory dumps and application dumps from them, and those can be debugged.
There are limitations on the ARM processors that Windows supports. Three System on Chip (SoC) vendors are supported: nVidia, Texas Instruments, and Qualcomm. Windows only supports the ARMv7 (Cortex, Scorpion) architecture in its ARMv7-A (Application Profile) form. This implements a traditional ARM architecture with multiple modes and supports a Virtual Memory System Architecture (VMSA) based on an MMU. It supports the ARM and Thumb-2 instruction sets, which allows for a mixture of 16-bit (Thumb) and 32-bit (ARM) opcodes, so the assembly will look strange. Luckily the debuggers know this and handle it for you. The mixture also helps to shrink the size of the code in memory. The processor must also have the optional ISA extensions VFP (hardware floating point) and NEON (128-bit SIMD architecture).
In order to understand the assembly that you will see you need to understand the processor internals.
ARM is a Reduced Instruction Set Computer (RISC), much like some of the previous processors that Windows ran on. It is a 32-bit load/store style processor. It has a “weakly-ordered” memory model, similar to Alpha and IA64, and it requires specific memory barriers to enforce ordering. On ARM these are the ISB, DSB, and DMB instructions.
The processor has 16 available registers r0 – r15.
0: kd> r
r0=00000001 r1=00000000 r2=00000000 r3=00000000 r4=e1820044 r5=e17d0580
r6=00000001 r7=e17f89b9 r8=00000002 r9=00000000 r10=1afc38ec r11=e1263b78
r12=e127813c sp=e1263b20 lr=e16c12c3 pc=e178b6d0 psr=00000173 ----- Thumb
r0, r1, r2, r3, and r12 are volatile registers. Volatile registers are scratch registers presumed by the caller to be destroyed across a call. Nonvolatile registers are required to retain their values across a function call and must be saved by the callee if used.
On Windows four of these registers have a designated purpose. Those are:
r11 (fp): frame pointer
r13 (sp): stack pointer
r14 (lr): link register
r15 (pc): program counter
In Windbg all but r11 will be labeled appropriately for you. So you may be asking why r11 is not labeled “fp” in the debugger. That is because r11 is only used as a frame pointer when you are calling a non-leaf subroutine. The way it works is this: when a call to a non-leaf subroutine is made, the called subroutine pushes the value of the previous frame pointer (in r11) to the stack (right after the lr), and then r11 is set to point to this location in the stack. Eventually we end up with a linked list of frame pointers in the stack that makes it easy to construct the call stack. The frame pointer is not pushed to the stack in leaf functions. We will discuss leaf functions later.
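To make the linked list idea concrete, here is a minimal Python sketch of walking such a chain. It is only a model, not how the debugger does it: memory is a plain dictionary, every address and value is made up, and it assumes the saved lr sits in the word directly above the saved r11, which is how I read the description above.

```python
# Simplified model of the frame-pointer chain described above.
# Memory is a dict of 32-bit words keyed by address. All addresses and
# values below are hypothetical and exist only for illustration.

def walk_frame_chain(memory, fp, max_frames=64):
    """Follow the saved r11 values and collect the saved lr next to each."""
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_frames:
        saved_fp = memory[fp]        # previous frame pointer, stored at [r11]
        saved_lr = memory[fp + 4]    # return address (assumed at [r11 + 4])
        return_addresses.append(saved_lr)
        fp = saved_fp                # step to the caller's frame
    return return_addresses

# Three nested frames; a saved frame pointer of 0 ends the chain.
stack = {
    0x1000: 0x1020, 0x1004: 0x00401235,
    0x1020: 0x1040, 0x1024: 0x00402471,
    0x1040: 0x0000, 0x1044: 0x00403001,
}

for address in walk_frame_chain(stack, 0x1000):
    print(f"{address:08x}")
```

Each step reads two words and moves to the caller's frame, which is all a stack walk over frame pointers needs. A leaf function never appears in the chain because it never pushes r11.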
CPSR (Current Program Status Register)
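Now we need to understand a little about the CPSR register. Here is the bit breakdown:
Bit 31: N (negative)
Bit 30: Z (zero)
Bit 29: C (carry)
Bit 28: V (overflow)
Bit 27: Q (cumulative saturation)
Bits 26-25: IT[1:0]
Bit 24: J
Bits 23-20: reserved
Bits 19-16: GE[3:0]
Bits 15-10: IT[7:2]
Bit 9: E (endianness)
Bit 8: A, Bit 7: I, Bit 6: F (interrupt and abort masks)
Bit 5: T (Thumb state)
Bits 4-0: M[4:0] (processor mode)
So why do I need to know about the CPSR (Current Program Status Register)? You will need to know where some of these bits are because of how some of the assembly instructions affect these flags. For example: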
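As a quick illustration, the sketch below decodes the psr value from the register dump earlier (00000173) in Python. The bit positions come from the ARMv7-A architecture manual; the function name and the choice of fields to report are mine.

```python
# Processor mode encodings for M[4:0] (a subset of the ARMv7-A modes).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_cpsr(value):
    """Pick the condition flags, Thumb bit, and mode out of a CPSR value."""
    def bit(n):
        return (value >> n) & 1
    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28), "Q": bit(27),
        "T": bit(5),                                # 1 means Thumb state
        "mode": MODES.get(value & 0x1F, "other"),   # low five bits
    }

print(decode_cpsr(0x00000173))
```

For this value the condition flags are all clear and the T bit is set, which lines up with the "----- Thumb" text the debugger prints next to psr.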
ADD will add two registers together, or add an immediate value to a register. However it will not affect the flags.
ADDS will do the same as ADD, but it does affect the flags.
MOV will allow you to move a value into a register, and a value between registers. This is not like the x86/x64. MOV will not let you read or write to memory. This does not affect the flags.
MOVS does the same thing as MOV, but it does affect the flags.
I hope you are seeing a trend here. There are instructions that look the same, but if the mnemonic ends in “S” then the instruction also updates the flags. I am not going to list all of those assembly instructions here. They are already listed in the ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition at http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406b/index.html.
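Here is a small Python model of the difference between ADD and ADDS. It is a sketch under my own naming; real hardware obviously does this in one step, and the flags dictionary simply stands in for the N, Z, C, and V bits of the CPSR.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def add(a, b):
    """ADD: compute the sum and leave the flags alone."""
    return (a + b) & MASK

def adds(a, b):
    """ADDS: compute the same sum and also report N, Z, C, V."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK
    flags = {
        "N": result >> 31,                           # top bit of the result
        "Z": int(result == 0),                       # result is zero
        "C": int(total > MASK),                      # unsigned carry out
        "V": ((a ^ result) & (b ^ result)) >> 31,    # signed overflow
    }
    return result, flags

print(add(0xFFFFFFFF, 1))
print(adds(0xFFFFFFFF, 1))
print(adds(0x7FFFFFFF, 1))
```

The second call wraps around to zero and sets Z and C, while the third one overflows the signed range and sets N and V.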
So now we have an idea of what can set the flags. Now we need to understand what the flags are used for. They are mainly used for branching instructions. Here is an example:
003a11d2 429a cmp r2,r3
003a11d4 d104 bne |MyApp!FirstFunc+0x28 (003a11e0)|
The first instruction in this code (cmp) compares the value stored in register r2 to the value stored in register r3. This comparison instruction sets or resets the Z flag in the CPSR register. The second instruction is a branch instruction (b) with the condition code ne which means that if the result of the previous comparison was that the values are not equal (the CPSR flag Z is zero) then branch to the address MyApp!FirstFunc+0x28 (003a11e0). Otherwise the execution continues.
There are a few compare instructions. “cmp” subtracts one register value from another, sets the flags, and discards the result. “cmn” adds two register values, sets the flags, and discards the result. “tst” does a bitwise AND of two register values, sets the flags, and discards the result. There is even an If Then (it) instruction. I am not going to discuss that one here as I have never seen it in any of the Windows code.
So is “bne” the only branch instruction? No. There are a lot of them. Here is a table of the suffixes that can appear after “b”, and what each one checks in the CPSR register:
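Mnemonic extension   Meaning (integer)              Condition Flags (in CPSR)
EQ                   Equal                          Z==1
NE                   Not equal                      Z==0
CS (HS)              Carry set                      C==1
CC (LO)              Carry clear                    C==0
MI                   Negative (Minus)               N==1
PL                   Positive or Zero (Plus)        N==0
VS                   Overflow                       V==1
VC                   No overflow                    V==0
HI                   Unsigned higher                C==1 and Z==0
LS                   Unsigned lower or same         C==0 or Z==1
GE                   Signed greater than or equal   N==V
LT                   Signed less than               N!=V
GT                   Signed greater than            Z==0 and N==V
LE                   Signed less than or equal      Z==1 or N!=V
AL (or none)         Always                         Any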
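To tie the compare and the conditional branch together, here is a hedged Python sketch. cmp_flags models the flags a “cmp” would produce, and branch_taken looks up a condition suffix; both names are made up for this example, and only some of the conditions are included.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags produced by 'cmp a, b' (a - b, result discarded)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(a >= b),                        # carry set means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,     # signed overflow
    }

CONDITIONS = {
    "eq": lambda f: f["Z"] == 1,
    "ne": lambda f: f["Z"] == 0,
    "mi": lambda f: f["N"] == 1,
    "pl": lambda f: f["N"] == 0,
    "hi": lambda f: f["C"] == 1 and f["Z"] == 0,
    "ls": lambda f: f["C"] == 0 or f["Z"] == 1,
    "ge": lambda f: f["N"] == f["V"],
    "lt": lambda f: f["N"] != f["V"],
    "gt": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "le": lambda f: f["Z"] == 1 or f["N"] != f["V"],
}

def branch_taken(condition, flags):
    """True if a 'b<condition>' would branch with these flags."""
    return CONDITIONS[condition](flags)

# cmp r2,r3 followed by bne, with equal and unequal register values.
print(branch_taken("ne", cmp_flags(5, 5)))
print(branch_taken("ne", cmp_flags(5, 7)))
```

The same flags answer both signed and unsigned questions. Comparing 0xFFFFFFFF with 1 satisfies “hi” (unsigned higher) and “lt” (signed less than) at the same time, because the first value is -1 when read as signed.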
Floating Point Registers
As mentioned earlier the processor also has to have the ISA extensions of VFP (Hardware Floating Point) and NEON (128-bit SIMD Architecture). Here is what they are.
The VFP extension provides 16 64-bit registers (d0-d15) that are overlaid with 32 32-bit registers (s0-s31). There are variants of the ARM processor that have 32 64-bit registers (d0-d31); on those, the 32-bit view still only covers s0-s31, which overlay d0-d15. Windows 8 will support both the 16 and 32 register variants. You have to be careful when using these, because if you access unaligned floats you may cause an exception.
The SIMD (NEON) extension adds 16 128-bit registers (q0-q15) on top of the floating point registers. So if you reference Q0 it is the same as referencing D0-D1 or S0-S1-S2-S3.
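The overlay is easier to see with bytes. The Python sketch below models q0 as 16 raw bytes and reads the same bytes back as four 32-bit lanes or two 64-bit lanes. The helper names are mine, and the model assumes s0 occupies the low half of d0.

```python
import struct

def write_s_lanes(s0, s1, s2, s3):
    """Build a 16-byte q register image from four single-precision values."""
    q = bytearray(16)
    struct.pack_into("<4f", q, 0, s0, s1, s2, s3)
    return q

def s_lanes(q):
    """View the register as s0..s3 (four floats)."""
    return struct.unpack_from("<4f", q, 0)

def d_lanes(q):
    """View the same bytes as d0, d1 (raw 64-bit patterns)."""
    return struct.unpack_from("<2Q", q, 0)

q0 = write_s_lanes(1.0, 2.0, 3.0, 4.0)
print([f"{d:016x}" for d in d_lanes(q0)])

# Overwriting d0 changes s0 and s1 but leaves s2 and s3 alone.
struct.pack_into("<Q", q0, 0, 0)
print(s_lanes(q0))
```

Nothing is copied between the views. Writing through one name changes what you read through the others, and that is the thing to keep in mind when a q, d, and s register show up in the same stretch of assembly.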
In part 2 we will discuss how Windows utilizes this processor.
I was recently investigating a crash in an application. As I researched the issue I found a very old defect in the code that was only recently being exposed by the compiler.
The crash occurred at the below instruction because the ebx register does not hold a valid pointer.
eax=d9050cf7 ebx=003078c0 ecx=6e2e0000 edx=00000000 esi=00000001 edi=0c334468
eip=65637fbe esp=010eb408 ebp=010eb878 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
65637fbe 8b4b1c mov ecx,dword ptr [ebx+1Ch] ds:002b:003078dc=????????
0:001> dd 003078c0
003078c0 ???????? ???????? ???????? ????????
003078d0 ???????? ???????? ???????? ????????
003078e0 ???????? ???????? ???????? ????????
003078f0 ???????? ???????? ???????? ????????
00307900 ???????? ???????? ???????? ????????
00307910 ???????? ???????? ???????? ????????
00307920 ???????? ???????? ???????? ????????
00307930 ???????? ???????? ???????? ????????
Examining the assembly leading up to the crash, ebx came from [ebp-40c].
0:001> ub .
65637f9d 6a08 push 8
65637f9f ff156cf06465 call dword ptr [riched20!_imp__CreateBitmap (6564f06c)]
65637fa5 898784000000 mov dword ptr [edi+84h],eax
65637fab eb06 jmp riched20!CTxtSelection::CreateCaret+0x41e (65637fb3)
65637fad 8bb5e4fbffff mov esi,dword ptr [ebp-41Ch]
65637fb3 8b9df4fbffff mov ebx,dword ptr [ebp-40Ch]
65637fb9 ff775c push dword ptr [edi+5Ch]
65637fbc 6a01 push 1
0:001> dd @ebp-40c l1
Looking at the whole function, [ebp-40c] was populated at the beginning of the function as the contents of edi+1C. The contents of edi+1Ch were first moved into ecx and later the value of ecx was moved into [ebp-40Ch]. Further examination of the whole function showed the edi register is unchanged at the time of the crash, so I can use its current value to determine what [ebp-40c] should contain.
0:001> uf riched20!CTxtSelection::CreateCaret
65637b95 8bff mov edi,edi
65637b97 55 push ebp
65637b98 8bec mov ebp,esp
65637b9a 81ec5c040000 sub esp,45Ch
65637ba0 a100e06465 mov eax,dword ptr [riched20!__security_cookie (6564e000)]
65637ba5 33c5 xor eax,ebp
65637ba7 8945fc mov dword ptr [ebp-4],eax
65637baa 53 push ebx
65637bab 56 push esi
65637bac 57 push edi
65637bad 8bf9 mov edi,ecx
65637baf 8b4f1c mov ecx,dword ptr [edi+1Ch] <<< The value originates from [edi+1Ch]
65637bb2 0fbf4740 movsx eax,word ptr [edi+40h]
65637bb6 898df4fbffff mov dword ptr [ebp-40Ch],ecx <<< Store the value on the stack
65637fb3 8b9df4fbffff mov ebx,dword ptr [ebp-40Ch] <<< Read the value from the stack
65637fbe 8b4b1c mov ecx,dword ptr [ebx+1Ch] <<< Crash here because ebx is invalid
The expected value of [ebp-40C], and thus the expected value of the ebx register, is 091978c0 based on the value in [edi+1Ch] at the time of the crash. This would be a valid pointer, and it is not what is currently in [ebp-40C] or ebx. It is noteworthy that at the time of the crash ebx is similar to what should be there: the low word (78c0) matches and only the high word of the dword differs (0030 instead of 0919).
0:001> r ebx
0:001> dd @edi+1c l1
The expected value, 091978c0, is a valid pointer.
0:001> dd 091978c0
091978c0 091978c8 00000000 00000501 05000000
091978d0 00000015 076c1a27 2a372f35 0c2e3998
091978e0 000049aa 00000000 00000000 00000000
091978f0 00000000 00000000 00000000 00000000
09197900 00000000 00000000 00000000 00000000
09197910 00000000 00000000 00000000 00000000
09197920 1a3098a8 00000000 00000000 00000000
09197930 00000000 00000000 00000000 00000000
Somehow the value at ebp-40C was changed between instruction 65637bb6, where [ebp-40C] was set, and instruction 65637fb3 where [ebp-40C] was read. Fortunately I had a mechanism to reproduce this crash so I was able to set a breakpoint and trace through how this happened.
First I set a breakpoint on the instruction that populates [ebp-40C].
0:003> bp 65637bb6
Breakpoint 0 hit
eax=ffffffff ebx=0c334468 ecx=091978c0 edx=00000060 esi=091978c0 edi=0c334468
eip=65637bb6 esp=010eb410 ebp=010eb878 iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
65637bb6 898df4fbffff mov dword ptr [ebp-40Ch],ecx ss:002b:010eb46c=00000000
Next I calculated ebp-40C and set a break-on-write access breakpoint at that address.
Evaluate expression: 17740908 = 010eb46c
0:001> ba w4 010eb46c
Breakpoint 1 hit
eax=00000030 ebx=00000000 ecx=00000000 edx=00000020 esi=00000001 edi=0c334468
eip=65637f67 esp=010eb40c ebp=010eb878 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
65637f67 66898475f4fbffff mov word ptr [ebp+esi*2-40Ch],ax ss:002b:010eb46e=0919
The write breakpoint hit at a location I was not expecting. The instruction where the breakpoint hit is not modifying the variable that was stored at [ebp-40C].
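The address arithmetic is worth checking by hand. This short Python sketch recomputes the target of the faulting store from the register values in the breakpoint output above; the helper name is mine.

```python
def effective_address(base, index, scale, displacement):
    """x86 style [base + index*scale + displacement], wrapped to 32 bits."""
    return (base + index * scale + displacement) & 0xFFFFFFFF

ebp, esi = 0x010EB878, 1   # register values when breakpoint 1 hit

watch_start = effective_address(ebp, 0, 1, -0x40C)   # [ebp-40Ch], watched with ba w4
write_addr = effective_address(ebp, esi, 2, -0x40C)  # [ebp+esi*2-40Ch]

# A 2-byte store trips the 4-byte watch if it begins inside the watched dword.
overlaps = watch_start <= write_addr < watch_start + 4

print(f"{watch_start:08x} {write_addr:08x} {overlaps}")
```

With esi equal to 1 the store lands two bytes into the watched dword, which is the high word of the saved pointer on a little-endian machine.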
Although I cannot share the Windows source code on this blog, the code in question roughly resembles the example below. Note that a proficient assembly language reader could work out the code flow from the disassembly, so this example is not sharing any magic.
p1 = GetStruct1();
array[i-2] = 0x30;
p1->p = variable2; // Crash here because p1 is not a valid pointer
We are crashing because p1 is not a valid pointer. The high word of p1 is being overwritten as 0030 by the line “array[i-2] = 0x30;” because i is 1, leading to an underflow of the array. This underflow is corrupting the pointer in p1.
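The corruption can be reproduced in a few lines of Python by modeling the two locals as bytes. This is only a sketch of the stack layout described here (p1 immediately below a 16-bit array); the function name and the 8-byte frame are invented for the example.

```python
import struct

def corrupt_p1(p1_value, i, value=0x30):
    """Model 'array[i-2] = value' with p1 stored directly below array."""
    frame = bytearray(8)                        # bytes 0-3: p1, bytes 4-7: array[0..1]
    struct.pack_into("<I", frame, 0, p1_value)  # little-endian pointer
    offset = 4 + (i - 2) * 2                    # byte offset of array[i-2]
    struct.pack_into("<H", frame, offset, value)
    return struct.unpack_from("<I", frame, 0)[0]

print(f"{corrupt_p1(0x091978C0, 2):08x}")   # i == 2 writes array[0]
print(f"{corrupt_p1(0x091978C0, 1):08x}")   # i == 1 writes array[-1]
```

When i is 1 the 16-bit store lands on the upper two bytes of p1, which turns 091978c0 into 003078c0, the bad value seen in ebx at the time of the crash.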
Clearly there is a defect in the above code. If it is legitimate for i to be 1 (and it is), then a check must be made to prevent an underflow of the array. However, further research found that this code has been consistent for many years and many releases of the product. Why is this suddenly crashing now? As the bank robber in Dirty Harry said, “I gots to know.”
In the above assembly we calculate that “array” starts at ebp-408 (array[0] is written when i is 2, and 2*2-40c is -408). In the earlier assembly we see that p1 is placed at ebp-40c. In this configuration an underflow of “array” will always corrupt p1.
Examining the assembly on a system that does not crash, I found that the local variables are stored differently in a different version of this binary. In the beginning of the function we see that p1 is stored in ebx. In this version of the binary ebx is never stored on the stack, so it cannot be corrupted by an underflow.
0:000> uf riched20!CTxtSelection::CreateCaret
74e75c53 8bff mov edi,edi
74e75c55 55 push ebp
74e75c56 8bec mov ebp,esp
74e75c58 81ec58040000 sub esp,458h
74e75c5e a19010e974 mov eax,dword ptr [riched20!__security_cookie (74e91090)]
74e75c63 53 push ebx
74e75c64 56 push esi
74e75c65 8bf1 mov esi,ecx
74e75c67 8b5e1c mov ebx,dword ptr [esi+1Ch]
The code that populates array[i-2] with 0x30 is later in the function. In this version, array is stored at ebp-404. If there is an underflow it will corrupt ebp-408.
74e76034 66c7847df8fbffff3000 mov word ptr [ebp+edi*2-408h],30h
The value stored at ebp-408 is used in several places in this function, however it is never used after instruction 74e76034 executes. This means any underflow in the array only corrupts memory that is not used after the corruption, and as a result the corruption never results in a crash. Although this defect has existed for a long time, the compiler has protected us until now.
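To compare the two builds side by side, the sketch below computes which aligned dword an underflowing write hits for a given array base. The offsets are the ones read out of the two disassemblies; the helper is hypothetical.

```python
def corrupted_slot(array_base, i):
    """ebp-relative offset of the aligned dword hit by array[i-2] = ..."""
    target = array_base + (i - 2) * 2     # 16-bit elements
    return target - (target % 4)          # round down to a dword boundary

# Crashing build: array at ebp-408, p1 at ebp-40c.
print(hex(corrupted_slot(-0x408, 1)))

# Older build: array at ebp-404, a dead local at ebp-408.
print(hex(corrupted_slot(-0x404, 1)))
```

In both builds the stray write lands in the dword just below the array. The only difference is what the compiler happened to place there: a live pointer in one build and a value that is no longer read in the other.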
74e75d3f 0b85f8fbffff or eax,dword ptr [ebp-408h]
74e75e51 ffb5f8fbffff push dword ptr [ebp-408h]
74e75e8a 8b8df8fbffff mov ecx,dword ptr [ebp-408h]
74e75f20 398df8fbffff cmp dword ptr [ebp-408h],ecx
74e75fec 8b85f8fbffff mov eax,dword ptr [ebp-408h]
The issue discussed in this article was addressed as part of KB2883200.