I know the answer (it's 42)

A blog on coding, .NET, .NET Compact Framework and life in general....


    ESP8266 Wifi With Arduino Uno and Nano


    If you are trying to add Wifi connectivity to an existing Arduino project or have serious aspirations of developing an Internet of Things (IoT) solution, Arduino + ESP8266 wifi module is one of the top choices, especially the Nano, because it is super cheap (<$3) and very small. Running some sort of web-server directly on the ESP8266 (e.g. via Lua) doesn't cut it due to the lack of IO pins on the ESP8266. You can get a full IoT node out at under $12 with a few sensors, an Arduino Nano and an ESP8266 module (excluding the power supply).

    In spite of a plethora of posts online, it turned out to be very hard for me to get this combination to work. I spent at least 3-4 days until I actually got it right. The main problem is that a lot of the solutions online are downright incorrect, not recommended, or meant for other boards (e.g. the Arduino Mega). There are also a few gotchas that are not commonly called out. Before I start, let me get all of those out of the way.

    1. The Arduino Uno/Nano is very different from, say, the Mega, which can supply more current and has a different number of UARTs. The steps for the Uno and Nano differ from those for such boards.
    2. Power Supply
      1. The ESP8266 is powered by 3.3V and NOT 5V, so you cannot share a single power supply between the Arduino and the ESP8266
      2. The ESP8266 draws way more current (200mA) than can be supplied by the 3.3v pin on the Uno/Nano. Don't even try it; I don't buy it when anyone claims to have done this. Maybe they have some other high-power variant of the Arduino (Mega??) that can do this.
      3. So you either use a 3.3v 1A power supply for the ESP8266 sharing a common ground with the 5V supply powering the Arduino, or you use a 5v-to-3.3v step-down regulator (e.g. like here).
    3. Arduino <-> ESP8266
      1. All the ESP8266 modules I bought came with the UART serial IO speed (BAUD) set to 115200. The problem is that the Uno/Nano has only one HW serial port, which is used for communicating over USB with the PC on which you are debugging. You can use any other two IO pins to talk to the ESP8266 using SoftwareSerial, but it does not support that high a BAUD rate. If you try 115200 on the Arduino <-> ESP8266 link you will get garbage. A lot of articles online show a setup with an Arduino Mega, which does have two HW serial ports with which you can easily get 115200 and more. So you need to change the ESP8266 settings to move the communication to a more manageable BAUD of 9600
      2. Arduino IO pins run at 5V and the ESP8266 accepts 3.3v (max 3.6). I have seen people directly connect the pins, but that is over-driving the ESP8266. If it doesn't burn out immediately (the cheaper ones do), it will burn out soon. I suggest you use a voltage divider made of simple resistors to have the Arduino transmit (TX) pin drive the ESP8266 receive (RX) pin
      3. For some strange reason the D2/D3 pins on the Arduino Nano didn't work for me for communicating with the ESP8266. I have no explanation for this, and it happened on two separate Nanos. The Arduino would just read a whole bunch of garbage characters. So I had to move to pins 8/9.
      4. In spite of whatever I did, garbage characters would still come in sometimes, so I wrote a small filter in code to ignore them


    Things you need

    1. ESP8266
    2. Arduino Nano
    3. Power supply 5v and 3.3v
    4. Resistors 1K, 2.2K, 10K
    5. FTDI USB to serial TTL adapter (optional, see below)

    Setting up ESP8266

    As mentioned above, I first set the ESP8266 BAUD rate to 9600. If yours is already 9600 then there is nothing to be done; if not, you need to make the following connections

    PC (USB) <-> FTDI <-> ESP8266

    Then, using specific AT commands from the PC, set the 9600 BAUD rate on the ESP8266. I used the following circuit, where the connections are as follows

    FTDI TX –> via voltage divider (to drop 5v to ~3.3v) to ESP8266 RX (blue wire)
    FTDI RX –> directly to ESP8266 TX (green wire); a 3.3v signal on a 5v input pin is still read as logic 1
    FTDI GND to common ground (black)

    ESP8266 GND to common GND (black)
    ESP8266 VCC to 3.3v (red)
    ESP8266 CH_PD to 3.3v via a 10K  resistor (red)

    Power supply GND to common GND

    PC to FTDI USB.

    Once that is set, bring up the Arduino IDE and do the following using the menu

    1. Tools –> Port –>COM{n}. For me it was COM6
    2. Then Tools –> Serial monitor

    In the serial monitor ensure you have the following set correctly. The BAUD should match the preset BAUD of your ESP8266. If you are not sure, use 115200 and type the command AT. It should return OK; if not, keep changing the BAUD until you get that.


    Then change the BAUD rate by using the following command, and you should get OK back
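    Depending on the firmware version on your module, the baud-changing command is typically one of the following (treat these as assumptions and verify against your firmware's AT command set):

```
AT+CIOBAUD=9600              (older AI-Thinker style firmware)
AT+UART_DEF=9600,8,1,0,0     (newer Espressif AT firmware: baud, data bits, stop bits, parity, flow control)
```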


    After that, immediately change the BAUD rate in the serial monitor to 9600 as well and issue an AT command. You should see OK. You are all set on the ESP8266 side.

    Setting up Arduino Nano + ESP8266

    This step should work for the Uno as well. Essentially make the same circuit as above, but now use an Arduino instead of the FTDI. I used pins 8 and 9 on the Arduino for RX and TX respectively.



    Debugging and Setup WIFI

    Even though I could easily run AT commands in the PC <-> FTDI <-> ESP8266 setup, I ran into various issues doing the same programmatically in the PC <-> Arduino <-> ESP8266 setup. So I wrote the following very simple code to pass commands typed on the PC through the Arduino to the ESP8266, and the responses back.

    The code is at GitHub as https://github.com/bonggeek/Samples/blob/master/Arduino/SerialRepeater.ino

    #include <SoftwareSerial.h>

    SoftwareSerial softSerial(8, 9); // RX, TX

    void setup()
    {
      uint32_t baud = 9600;
      Serial.begin(baud);     // HW serial to the PC
      softSerial.begin(baud); // SW serial to the ESP8266
      Serial.print("SETUP!! @");
      Serial.println(baud);
    }

    void loop()
    {
      while (softSerial.available() > 0)
      {
        char a = softSerial.read();
        if (a == '\0')
          continue;
        if (a != '\r' && a != '\n' && (a < 32))
          continue; // filter out garbage control characters
        Serial.print(a);
      }
      while (Serial.available() > 0)
      {
        char a = Serial.read();
        softSerial.print(a);
      }
    }

    With this code built and uploaded to the Arduino, I launched the Serial monitor on my PC. After that I could type commands into the Serial Monitor and have the Arduino pass them on to the ESP8266 and read back the response. I could still see some junk characters coming back, but all commands worked; I could easily enumerate all Wifi networks in range using AT+CWLAP and even connect to my Wifi.



    Publishing an ASP.NET 5 Web Application to IIS Locally

    I ran into a few issues and discovered some kinks in publishing the new ASP.NET 5 Web Application to Internet Information Services (IIS) on the local box and then accessing it from other devices on the same network.

    While there may be a number of different ways of doing this, the following worked for me.

    Visual Studio

    After you have created a new project using File > New Project > ASP.NET Web Application


    Change the build to use x64 and not Any CPU


    Now right-click on the project and choose Publish. We will use File System publishing to push the output to a folder location and then get IIS to load it


    Publish target is inside default IIS web root folder. This might be different for your setup.


    Use 64 bit release in settings


    Finally publish it


    So with this step done your web application is now published to c:\inetpub\wwwroot\HomeServer


    Now launch the IIS Manager by hitting Win key and searching for IIS Manager

    Right-click on the default web-site and use Add Application.


    Create and point the application to the published app. Note that this is not the top level c:\inetpub\wwwroot\HomeServer, but rather the wwwroot folder inside it. This is required because the web.config is inside that folder. So we use c:\inetpub\wwwroot\HomeServer\wwwroot


    Hit OK to create the web-app and then restart the web-site
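    If you prefer a command line over the IIS Manager UI, the same web-app can, as far as I know, also be created with IIS's appcmd tool (a sketch assuming the publish path used above):

```
%windir%\system32\inetsrv\appcmd add app /site.name:"Default Web Site" /path:/HomeServer /physicalPath:"C:\inetpub\wwwroot\HomeServer\wwwroot"
```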


    Now browse to the web-site, which in my case is http://localhost/HomeServer


    Accessing from local network

    To access the same website from other devices on the same network you need to enable access through the firewall. Search and select (Win key and type) “Allow an App Through Windows Firewall” then in the Control panel window that opens (Control Panel\System and Security\Windows Firewall\Allowed apps), click the “Change Settings” button and then check “World Wide Web Services”


    Find the local server's IP by running the ipconfig command in a command shell. Then you can reach the site from other devices on the same network.

    Screen shot of accessing the web-site from my cell phone connected to the same network over wifi.



    Dual Booting Ubuntu and Windows 10


    Even though I went through a ton of online resources outlining the details, I struggled a lot getting this to work. So in this blog I am outlining what worked for me.

    My setup was an office laptop (Lenovo X1 Carbon) with Trusted Platform Module (TPM), secure boot and BitLocker enabled. My goal was to dual boot Ubuntu 15.10 with Windows 10.

    Setup the installer USB

    Get a USB thumb drive and format it as FAT32. NOTE: Do NOT use NTFS; otherwise the installation will completely fail with a kernel panic


    I downloaded my Ubuntu 15.10 from http://www.ubuntu.com/desktop. Then I downloaded the Universal USB Installer from http://www.pendrivelinux.com/universal-usb-installer-easy-as-1-2-3/. I used the following setting to create the installer pen drive


    Setup your PC for Dual Boot

    If you do not have secure boot (UEFI) or BitLocker, life is simple. But if you are reading this blog, you are definitely not in that boat.

    First of all, disable BitLocker before making any changes to startup settings. Just type Bitlocker in the search box, choose Manage BitLocker, and in there turn it off.


    Also ensure you have an empty partition to install Ubuntu onto. If you do not, you need to partition your disk. There are various freeware tools, and even Windows Disk Management can do that for you. I had the following D drive that I intended to install Ubuntu onto
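    One way to carve out such a partition is the built-in diskpart tool from an elevated command prompt (a sketch; the volume number and the 50000 MB size are just examples):

```
diskpart
list volume
select volume 2
shrink desired=50000
```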


    Disable Secure Boot

    From WinKey+R run the command

    shutdown.exe /r /o /f /t 00

    Windows will restart with the following screen, choose Troubleshoot


    In the next, advanced options screen choose UEFI Firmware settings


    Then use the Restart button


    This will bring up the BIOS window. It will differ between PCs/laptops, but in any case you need to disable the various secure boot options. On my Lenovo Thinkpad, I went through the following screens.

    Go to Security –> Secure Boot and Disable Secure Boot



    Then, under Startup, enable Legacy Boot.


    Save the changes, exit (F10) and boot into Windows

    Installing Ubuntu

    Now in Windows run the same command as before

    shutdown.exe /r /o /f /t 00

    This will bring up the following options, choose “Use a Device”


    In that use “Boot Menu”


    If you have done everything right the machine will restart and offer a boot menu with your USB thumb drive listed as an option.


    Selecting the bootable thumb drive should launch Ubuntu installer. Choose Install Ubuntu to start installing. Follow through the installer wizard.



    How to add a breakpoint in a managed generic method in windbg (sos)

    Milkyway over Mt. Rainier

    This is not really a blog post but a micro-post. Someone asked me, and since I couldn't find any post out there calling it out, I thought I'd add one.

    If you want to add a breakpoint to a managed method inside windbg using the sos extension, the obvious way is to use the extension command !bpmd. However, if the target method is generic or inside a generic type it is slightly tricky: you don't use <T> but rather a backtick followed by the count of generic type parameters (so PriorityThreadPool<T> becomes PriorityThreadPool`1).

    So I have the following inside my foo.exe managed app

    namespace Abhinaba
    {
        public class PriorityThreadPool<T> : IDisposable
        {
            public bool RunTask(T param, Action<T> action)
            {
                // cool stuff
                return true;
            }

            public void Dispose() { }
        }
    }

    To set a breakpoint in it I use the following (notice the `1 in place of <T>)

    !bpmd foo.exe Abhinaba.PriorityThreadPool`1.RunTask


    List of Modules loaded


    While working on the .NET Loader, and now in Bing where I am working on some features around module loading, I frequently need to know and filter the list of modules (dll/exe) loaded in a process or on the whole system. There are many ways to do that: use a GUI tool like Process Explorer (https://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) or even attach a debugger and get the list of loaded modules. But those seem to me either cumbersome (GUI) or intrusive (debugger). So I have written a small command line tool. It's native and less than 100kb in size. You can get the source on GitHub at https://github.com/bonggeek/Samples/tree/master/ListModule or the binary at http://1drv.ms/1NAzkvy.

    The usage is simple. To see the modules loaded in all processes with the name "note" in them, you just use the following

    F:\GitHub\Samples\ListModule>listmodule note
    Searching for note in 150 processes
    \Device\HarddiskVolume2\Program Files\Microsoft Office 15\root\office15\ONENOTEM.EXE (8896)
            (0x00DB0000)    C:\Program Files\Microsoft Office 15\root\office15\ONENOTEM.EXE
            (0xCBEF0000)    C:\windows\SYSTEM32\ntdll.dll
            (0x776D0000)    C:\windows\SYSTEM32\wow64.dll
    \Device\HarddiskVolume2\Program Files\Microsoft Office 15\root\office15\onenote.exe (12192)
            (0x01340000)    C:\Program Files\Microsoft Office 15\root\office15\ONENOTE.EXE
            (0xCBEF0000)    C:\windows\SYSTEM32\ntdll.dll
    \Device\HarddiskVolume2\Windows\System32\notepad.exe (19680)
            (0xF64A0000)    C:\windows\system32\notepad.exe
            (0xCBEF0000)    C:\windows\SYSTEM32\ntdll.dll
            (0xCB7D0000)    C:\windows\system32\KERNEL32.DLL

    The code uses Win32 APIs to get the info. This is a quick tool I wrote, so if you find any bugs, send them my way.


    .NET RyuJIT Rocks


    As the CLR team announced a few days back, CTP for the new fast JIT from .NET (code named RyuJIT) is out. Go check out the announcement at http://blogs.msdn.com/b/clrcodegeneration/archive/2014/10/31/ryujit-ctp5-getting-closer-to-shipping-and-with-better-simd-support.aspx. In the post they say

    Recently the Bing team has tried using RyuJIT on top of 4.5.1 in some of their processing, and they see a 25% reduction in startup time in their scenario.  This is the most significant real-world throughput win we have witnessed on RyuJIT thus far.

    Being from that "Bing team", let me just say that RyuJIT blew away our expectations. Some of our workloads run incredibly large managed loads (many thousands of assemblies running millions of methods). We saw a dramatic drop in startup time as well as in subsequent JIT time. And may I add that we are actually using the CTP bits to serve live traffic. It's been pretty stable for us.

    The graph shows the startup times averaged over all machines.



    Halloween Costume with Arduino


    This Halloween my daughter and I decided to add some dazzle to her fairy costume. Since we were learning to code on the Arduino anyway, we decided to dip our toes into wearables.

    The basic idea is to build a costume that glows when someone comes close. The project was intended to teach a 9 year old to code and is hence simple enough for her to grasp. We used the following


    1. Arduino UNO board
    2. TIP120 transistor
    3. Diode 1N4004
    4. 1K Resistor
    5. HC-SR04 Ultrasonic Range Finder


    It’s best to consider the circuit as two separate pieces. One to acquire the distance of someone approaching using the HC-SR04 ultrasound range finder. The second is to actually make the LED strip glow.

    The first part consists of connecting the 4 pins of the HC-SR04 as follows


    We cannot simply drive the LED strip from an output pin of the Arduino because the strip draws way more current than can be supplied by the Arduino chip. So we use a TIP120 or TIP121 transistor as shown below


    There is a nice explanation of this whole setup at http://www.instructables.com/id/Use-Arduino-with-TIP120-transistor-to-control-moto/. The same principles hold, but instead of a fan we use an LED strip in our case.


    The entire code is available on GitHub at https://github.com/bonggeek/GlowDress/ (I cleaned up the code a tiny bit after my daughter wrote it). This is how it looks

    #include <ultrasonicranging.h>

    #define ECHO_PIN 2 // ECHO pin of HC-SR04
    #define TRIG_PIN 3 // Trigger pin of HC-SR04
    #define LED_OUT  5 // Drive LED (base pin of TIP120)

    const int space = 125; // Distance in cm within which to trigger the LED

    void setup()
    {
      Serial.begin(9600);
      pinMode(TRIG_PIN, OUTPUT); // trigger pin of US range finder
      pinMode(ECHO_PIN, INPUT);  // echo pin of US range finder
      pinMode(LED_OUT, OUTPUT);  // base of TIP120 to drive LED
      analogWrite(LED_OUT, 0);
    }

    void GlowLed()
    {
      // Slowly take the LED strip from off to full bright (glow-in)
      for (int brightness = 0; brightness < 255; brightness++)
      {
        analogWrite(LED_OUT, brightness);
        delay(5); // small delay so the fade is visible
      }
      // Slowly take the LED strip from full bright back to off (glow-out)
      for (int brightness = 255; brightness >= 0; brightness--)
      {
        analogWrite(LED_OUT, brightness);
        delay(5);
      }
    }

    void loop()
    {
      int distance = GetDistanceInCm(TRIG_PIN, ECHO_PIN);
      if (distance <= 0 || distance > space)
        analogWrite(LED_OUT, 0);
      if (distance <= space)
        GlowLed();
    }

    Here, to abstract away the intricacies of how the distance is obtained from the ranger, I have used GetDistanceInCm. The source for this library is at https://github.com/bonggeek/GlowDress/tree/master/UltraSonicRanging.

    Once we tested out the circuit we went ahead and soldered it on a board. My daughter did receive a battle scar (a small burn from the iron) but we battled on.

    This is how it looked partially done


    With my wife’s help we sewed it underneath her fairy dress. It was pretty well concealed other than the sensor sticking out a bit.



    .NET Just in Time Compilation and Warming up Your System


    One of the primary obstacles we face while scaling our system is Just In Time (JIT) compilation of the .NET stack. We run a .NET managed application server at huge scale, with many thousands of managed assemblies on each server (and many, many thousands of those servers distributed globally). We deploy code daily, and since it is managed code it is JITed at each deployment. Our system is very sensitive to latency, and we do not want servers getting new code to cost us execution latency. So we use various mechanisms to warm the servers before they start serving queries. Common techniques we use are

    1. NGEN
    2. Force JITing (a specialized multi-core JIT technique, see the bottom of this post)
    3. Sending warmer queries that warm up the system

    In this effort I frequently handle questions about how the system JITs managed methods. Even after so many years of the CLR JIT existing, there seems to be confusion around when JIT happens and what the unit of compilation is. So I thought I'd make a quick post on this topic.

    Consider the following simple code I have in One.cs

    using System;

    namespace Foo
    {
        class MyClass
        {
            public void a()
            {
                int i = 0;
                // body reconstructed to match the disassembly shown later:
                // a loop that increments i and calls b(i)
                while (true) { i++; b(i); }
            }

            public void b(int i) { }
        }

        class Program
        {
            static void Main()
            {
                MyClass mc = new MyClass();
                mc.a();
            }
        }
    }

    Of interest are the functions Foo.MyClass.a and Foo.MyClass.b. We will debug to find out exactly when the latter is JITed.

    First I compile and then launch the debugger. I will use the windbg debugger and the sos extensions extensively in this post. Also read https://support2.microsoft.com/kb/319037?wa=wsignin1.0 to see how to setup symbol servers to debug into the CLR.

    csc /debug+ One.cs
    windbg one.exe

    After that in windbg I run the following command to setup

    .sympath+ d:\MySymbols          ;$$ Use the Microsoft symbol server (see link above)
    sxe ld:clr                      ;$$ break on CLR loaded
    g                               ;$$ continue the program until you break on CLR.dll being loaded
    .loadby sos clr                 ;$$ load the sos debugger extension

    !bpmd One.exe Foo.MyClass.a ;$$ Set a managed break point in a()
    g ;$$ continue until break point in a() is hit

    When this break point is hit, we have obviously already JITed MyClass.a() and are executing it. The question now is whether the functions a() calls, like MyClass.b(), are already JITed, and if not, when and how they will be. Let's debug it!!

    In what follows, I take the output of one command and use it as input to the next one.

    First let's find the this pointer for the MyClass instance. This can be obtained from the current managed call stack

    0:000> !clrstack -a
            this (0x0000000000d9eea0) = 0x0000000002922c58

    The details of the this object show the MethodTable for it. The MethodTable has a pointer to the EEClass (cold data).

    0:000> !do 0x0000000002922c58
    Name:        Foo.MyClass
    MethodTable: 00007ffab2f640d8
    EEClass:     00007ffab3072340
    Size:        24(0x18) bytes
    File:        d:\Skydrive\Code\C#\_JITPresentation\One.exe

    Now we can see more details of the MethodTable, which will show the individual methods descriptors.

    0:000> !dumpmt -md 00007ffab2f640d8
    EEClass:         00007ffab3072340
    Module:          00007ffab2f62fc8
    Name:            Foo.MyClass
    mdToken:         0000000002000002
    File:            d:\Skydrive\Code\C#\_JITPresentation\One.exe
    BaseSize:        0x18
    ComponentSize:   0x0
    Slots in VTable: 7
    Number of IFaces in IFaceMap: 0
    MethodDesc Table
    Entry       MethodDesc    JIT Name
    00007ffb07c16300 00007ffb077c80e8 PreJIT System.Object.ToString()
    00007ffb07c5e760 00007ffb077c80f0 PreJIT System.Object.Equals(System.Object)
    00007ffb07c61ad0 00007ffb077c8118 PreJIT System.Object.GetHashCode()
    00007ffb07c5eb50 00007ffb077c8130 PreJIT System.Object.Finalize()
    00007ffab3080120 00007ffab2f640d0    JIT Foo.MyClass..ctor()
    00007ffab3080170 00007ffab2f640b0    JIT Foo.MyClass.a()
    00007ffab2f6c050 00007ffab2f640c0   NONE Foo.MyClass.b(Int32)

    The type has 7 methods. The output also clearly indicates that Foo.MyClass.a() is JITed and Foo.MyClass.b() is NONE (not yet JITed). We can get more details about these methods

    0:000> !dumpmd 00007ffab2f640b0
    Method Name:  Foo.MyClass.a()
    Class:        00007ffab3072340
    MethodTable:  00007ffab2f640d8
    mdToken:      0000000006000001
    Module:       00007ffab2f62fc8
    IsJitted:     yes
    CodeAddr:     00007ffab3080170 <----- JITed
    Transparency: Critical
    0:000> !dumpmd 00007ffab2f640c0
    Method Name:  Foo.MyClass.b(Int32)
    Class:        00007ffab3072340
    MethodTable:  00007ffab2f640d8
    mdToken:      0000000006000002
    Module:       00007ffab2f62fc8
    IsJitted:     no
    CodeAddr:     ffffffffffffffff <----- Not yet JITed

    So at this point we know that a() is JITed but the method b() that it calls is not. The question then arises: what is the content of the native instructions for a(), and what does that code call for b()? The disassembly clearly shows that the entire method a() is JITed and that outward managed calls go to stubs

    0:000> u 00007ffab3080170 L24
    One!Foo.MyClass.a() [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 8]:
    00007ffa`b3080170 48894c2408      mov     qword ptr [rsp+8],rcx
    00007ffa`b3080175 4883ec38        sub     rsp,38h
    00007ffa`b3080179 c744242000000000 mov     dword ptr [rsp+20h],0
    00007ffa`b3080181 c644242400      mov     byte ptr [rsp+24h],0
    00007ffa`b3080186 48b83834f6b2fa7f0000 mov rax,7FFAB2F63438h
    00007ffa`b3080190 8b00            mov     eax,dword ptr [rax]
    00007ffa`b3080192 85c0            test    eax,eax
    00007ffa`b3080194 7405            je      One!Foo.MyClass.a()+0x2b (00007ffa`b308019b)
    00007ffa`b3080196 e82574b25f      call    clr!JIT_DbgIsJustMyCode (00007ffb`12ba75c0)
    00007ffa`b308019b 90              nop
    00007ffa`b308019c c744242000000000 mov     dword ptr [rsp+20h],0
    00007ffa`b30801a4 eb23            jmp     One!Foo.MyClass.a()+0x59 (00007ffa`b30801c9)
    00007ffa`b30801a6 90              nop
    00007ffa`b30801a7 8b4c2420        mov     ecx,dword ptr [rsp+20h]
    00007ffa`b30801ab ffc1            inc     ecx
    00007ffa`b30801ad 8b442420        mov     eax,dword ptr [rsp+20h]
    00007ffa`b30801b1 89442428        mov     dword ptr [rsp+28h],eax
    00007ffa`b30801b5 894c2420        mov     dword ptr [rsp+20h],ecx
    00007ffa`b30801b9 8b542428        mov     edx,dword ptr [rsp+28h]
    00007ffa`b30801bd 488b4c2440      mov     rcx,qword ptr [rsp+40h]
    00007ffa`b30801c2 e889beeeff      call    Foo.MyClass.b(Int32) (00007ffa`b2f6c050)
    00007ffa`b30801c7 90              nop
    00007ffa`b30801c8 90              nop
    00007ffa`b30801c9 c644242401      mov     byte ptr [rsp+24h],1
    00007ffa`b30801ce ebd6            jmp     One!Foo.MyClass.a()+0x36 (00007ffa`b30801a6)
    00007ffa`b30801d0 90              nop
    00007ffa`b30801d1 4883c438        add     rsp,38h
    00007ffa`b30801d5 c3              ret
    00007ffa`b30801d6 0000            add     byte ptr [rax],al
    00007ffa`b30801d8 1909            sbb     dword ptr [rcx],ecx
    00007ffa`b30801da 0100            add     dword ptr [rax],eax
    00007ffa`b30801dc 096200          or      dword ptr [rdx],esp

    So we see that for b a call is made to the memory location 00007ffa`b2f6c050. We can see what is there now by disassembling that address.

    0:000> u 00007ffa`b2f6c050
    00007ffa`b2f6c050 e87b5e755f      call    clr!PrecodeFixupThunk (00007ffb`126c1ed0)
    00007ffa`b2f6c055 5e              pop     rsi
    00007ffa`b2f6c056 0201            add     al,byte ptr [rcx]

    So basically, instead of real native JITed code for b(), there is actually a stub or thunk in its place. So we clearly establish that when a function is called its entire body is JITed, while the methods it calls are not yet JITed (with caveats like inlined methods etc.). Now we can go and set a breakpoint inside the JIT to break when it tries to JIT the b() method. This is what we do

    0:000> bp clr!UnsafeJitFunction ;$$ entry point for JITing a method
    0:000> g                        ;$$ continue executing until we hit the UnsafeJITFunction
    0:000> k                        ;$$ dump the stack for JITing
    clr!ThePreStub+0x5a [f:\dd\ndp\clr\src\vm\amd64\ThePreStubAMD64.asm @ 92]
    One!Foo.MyClass.a()+0x57 [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 12]

    As we can see, the JITing actually happened on the same thread that is executing a(), exactly when b was called. ThePreStub finally calls the JITer. The JITer will actually JIT the method b() and patch up the call, so that it will now be a call straight to the JITed copy of b(). We hit g a couple of times and now see what happens to the MethodDescriptor for b()

    0:000> !dumpmd 00007ffab2f640c0
    Method Name:  Foo.MyClass.b(Int32)
    Class:        00007ffab3072340
    MethodTable:  00007ffab2f640d8
    mdToken:      0000000006000002
    Module:       00007ffab2f62fc8
    IsJitted:     yes
    CodeAddr:     00007ffab30801f0 <-- Now it is JITed
    Transparency: Critical

    As we see, b() is now JITed and we can see its disassembly as well. More interestingly, let's go back and see what the disassembly of a() now contains

    0:000> u 00007ffab3080170 L24
    One!Foo.MyClass.a() [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 8]:
    00007ffa`b3080170 48894c2408      mov     qword ptr [rsp+8],rcx
    00007ffa`b3080175 4883ec38        sub     rsp,38h
    00007ffa`b3080179 c744242000000000 mov     dword ptr [rsp+20h],0
    00007ffa`b3080181 c644242400      mov     byte ptr [rsp+24h],0
    00007ffa`b3080186 48b83834f6b2fa7f0000 mov rax,7FFAB2F63438h
    00007ffa`b3080190 8b00            mov     eax,dword ptr [rax]
    00007ffa`b3080192 85c0            test    eax,eax
    00007ffa`b3080194 7405            je      One!Foo.MyClass.a()+0x2b (00007ffa`b308019b)
    00007ffa`b3080196 e82574b25f      call    clr!JIT_DbgIsJustMyCode (00007ffb`12ba75c0)
    00007ffa`b308019b 90              nop
    00007ffa`b308019c c744242000000000 mov     dword ptr [rsp+20h],0
    00007ffa`b30801a4 eb23            jmp     One!Foo.MyClass.a()+0x59 (00007ffa`b30801c9)
    00007ffa`b30801a6 90              nop
    00007ffa`b30801a7 8b4c2420        mov     ecx,dword ptr [rsp+20h]
    00007ffa`b30801ab ffc1            inc     ecx
    00007ffa`b30801ad 8b442420        mov     eax,dword ptr [rsp+20h]
    00007ffa`b30801b1 89442428        mov     dword ptr [rsp+28h],eax
    00007ffa`b30801b5 894c2420        mov     dword ptr [rsp+20h],ecx
    00007ffa`b30801b9 8b542428        mov     edx,dword ptr [rsp+28h]
    00007ffa`b30801bd 488b4c2440      mov     rcx,qword ptr [rsp+40h]
    00007ffa`b30801c2 e889beeeff      call    Foo.MyClass.b(Int32) (00007ffa`b2f6c050)
    00007ffa`b30801c7 90              nop
    00007ffa`b30801c8 90              nop
    00007ffa`b30801c9 c644242401      mov     byte ptr [rsp+24h],1
    00007ffa`b30801ce ebd6            jmp     One!Foo.MyClass.a()+0x36 (00007ffa`b30801a6)
    00007ffa`b30801d0 90              nop
    00007ffa`b30801d1 4883c438        add     rsp,38h
    00007ffa`b30801d5 c3              ret
    00007ffa`b30801d6 0000            add     byte ptr [rax],al
    00007ffa`b30801d8 1909            sbb     dword ptr [rcx],ecx
    00007ffa`b30801da 0100            add     dword ptr [rax],eax
    00007ffa`b30801dc 096200          or      dword ptr [rdx],esp
    00007ffa`b30801df 005600          add     byte ptr [rsi],dl

    Now if we re-disassemble the target of this call at 00007ffa`b2f6c050

    0:000> u 00007ffa`b2f6c050
    00007ffa`b2f6c050 e99b411100      jmp     One!Foo.MyClass.b(Int32) (00007ffa`b30801f0)
    00007ffa`b2f6c055 5f              pop     rdi
    00007ffa`b2f6c056 0201            add     al,byte ptr [rcx]

    As you can see the address has been patched up: the old stub location now just jumps straight to the JITed copy of b(), so a() reaches b() without any further fixup work.

    Obviously in this example I made a bunch of assumptions, but hopefully you now have the understanding to go debug your own scenarios and see what is at play. Some takeaways if you have JIT issues at startup:


    JIT happens at method granularity


    Use modular code. Especially if you have error handling and other code which is rarely or almost never used: instead of having it in the main function, move it out. This will ensure that it is never JITed, or at least not JITed at startup

    void Foo()
    {
        try { /* MainLineCode */ }
        catch (Exception ex) { HandleError(ex); } // handling moved to a (hypothetical) helper
    }

    is better than

    void Foo()
    {
        try { /* MainLineCode */ }
        catch (Exception ex)
        {
            // MoreCode;
            // EvenMoreCode;
        }
    }

    Running a given function once JITs the whole function. However, do note that if it has different code branches and each calls other functions, you will need to execute all branches. In the case below, Foo has to be called with both true and false to ensure the downstream methods are JITed

    void Foo(bool flag)
    {
        if (flag)
            Bar(); // JITed only after Foo(true) has run (Bar/Baz are illustrative)
        else
            Baz(); // JITed only after Foo(false) has run
    }

    JITing happens on the same thread as the call. The JIT engine does take a lock to ensure there are no races while JITing the same method from multiple threads.

    Consider using NGEN, MultiCoreJIT, or force-JITing all methods you care about, or even build your own mechanism based on the following code

    Here's some pseudo-code to force JIT, which accomplishes that using the RuntimeHelpers.PrepareMethod API (note this code does no error handling whatsoever). You can craft code around this to force-JIT only the assemblies and/or types in them that are causing JIT bottlenecks. This can also be parallelized across cores. The .NET Multicore JIT is based on a similar principle, but does it automatically by generating a profile of what executes at your application startup and then JITing that for you in the next run.

    using System;
    using System.Reflection;
    using System.Runtime.CompilerServices;

    namespace ConsoleApplication5
    {
        class Program
        {
            static private void ForceJit(Assembly assembly)
            {
                var types = assembly.GetTypes();
                foreach (Type type in types)
                {
                    var ctors = type.GetConstructors(BindingFlags.NonPublic
                                                | BindingFlags.Public
                                                | BindingFlags.Instance
                                                | BindingFlags.Static);
                    foreach (var ctor in ctors)
                        JitMethod(assembly, ctor);

                    var methods = type.GetMethods(BindingFlags.DeclaredOnly
                                           | BindingFlags.NonPublic
                                           | BindingFlags.Public
                                           | BindingFlags.Instance
                                           | BindingFlags.Static);
                    foreach (var method in methods)
                        JitMethod(assembly, method);
                }
            }

            static private void JitMethod(Assembly assembly, MethodBase method)
            {
                if (method.IsAbstract || method.ContainsGenericParameters)
                    return; // abstract and open generic methods cannot be pre-JITed

                RuntimeHelpers.PrepareMethod(method.MethodHandle);
            }

            static void Main(string[] args)
            {
                ForceJit(typeof(Program).Assembly); // or any assembly you want pre-JITed
            }
        }
    }

    Use of SuppressIldasmAttribute

    Meteors and sky Wish Poosh Campground, Cle Elum Lake, WA

    We use ildasm in our build/deployment pipeline. Recently an internal partner pinged me saying that it was failing with a weird message: ildasm could not disassemble one particular assembly. I instantly assumed it to be a race condition (locks taken on the file, some sort of anti-virus holding read locks, etc.). However, he reported back that it is a persistent problem. I asked for the assembly and tried to run

    ildasm foo.dll

    I was greeted with an error saying that the module is protected and cannot be disassembled.

    Dumbfounded, I dug around and found this weird attribute on the assembly

    [assembly: SuppressIldasmAttribute] 

    MSDN points out (http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.suppressildasmattribute(v=vs.110).aspx) that this attribute makes ildasm refuse to disassemble a given assembly. For the life of me I cannot fathom why someone invented this attribute. This is one of those things which is so surreal… Obviously you can simply point Reflector or any of the gazillion other disassemblers at the assembly and they will be happy to oblige. A false sense of security is worse than a lack of security, so I’d recommend not using this attribute.


    Fastest way to switch mouse to left handed

    Milky way


    I think I was born left-handed; unfortunately I was brought up to be right-handed. This was not uncommon in India 30 years back: left-hand usage was looked down upon.

    However, the good thing is I am still ambidextrous (equal-handed) in some things, like using the mouse. For ergonomic reasons I keep switching between left- and right-hand usage to ensure I do not wear out both my wrists with 12-hour daily computer usage.

    The main problem I face when I switch, and even otherwise when I am in left-hand mode, is that most PCs are configured for right-hand use. In the course of a day I use many machines (upwards of 10) as I remote into data-center machines and even 3-4 local machines. The fastest way I have found to switch between left- and right-hand mouse is to just run the following command, either from the console or from WindowsKey+R

    rundll32.exe user32.dll,SwapMouseButton

    Basically this command calls the SwapMouseButton win32 function in user32.dll.
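    For completeness, here is a minimal C# sketch that P/Invokes the same user32 API directly (the class name is mine; SwapMouseButton takes TRUE to swap to left-handed and returns whether the buttons were already swapped before the call):

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    class MouseSwapper // hypothetical name
    {
        // BOOL SwapMouseButton(BOOL fSwap) in user32.dll;
        // returns TRUE if the buttons were swapped before this call
        [DllImport("user32.dll")]
        static extern bool SwapMouseButton(bool fSwap);

        static void Main(string[] args)
        {
            // no argument, or anything other than "right", swaps to left-handed
            bool toLeft = args.Length == 0 || args[0] != "right";
            bool wasSwapped = SwapMouseButton(toLeft);
            Console.WriteLine(wasSwapped ? "Was left-handed" : "Was right-handed");
        }
    }
    ```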

    If you know of anything briefer, either using the command shell or PowerShell, do let me know.


    .NET: NGEN, explicit loads and load-context promotion

    Sunset over the Pacific

    If you want to know the conclusion and want to skip the details jump to the end for the climax :). If you care to see this feature in, please vote for this at http://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/5194915-ngen-should-support-explicit-path-based-assembly-l


    In my previous post on how NGEN loads native images I mentioned that NGEN images are supported only in the default load context. Essentially there are 3 load contexts (excluding the reflection-only context) and based on how you load an assembly it lands in one of those 3 contexts. You can read more about the load contexts at http://blogs.msdn.com/b/suzcook/archive/2003/05/29/57143.aspx. However, for our purposes:
    1. Default context: this is where assemblies loaded through implicit assembly references or Assembly.Load(…) calls land
    2. LoadFrom context: this is where assemblies loaded with the Assembly.LoadFrom call are placed
    3. Null context (or neither context): this is where assemblies loaded with Assembly.LoadFile or reflection-emit (among other APIs) are placed
    Even though a lot of people view the contexts only in the light of how they impact the searching of assembly dependencies, they have other critical impacts. E.g. native images of an assembly (generated via NGEN) are loaded only if that assembly is loaded in the default context.

    #1 and #3 are pretty simple to understand. If you use Assembly.Load, or if your assembly has other implicit assembly dependencies, then for those assemblies the NativeBinder will search for native images. If you load an assembly through Assembly.LoadFile(“c:\foo\some.dll”) then it will be loaded in the null context and will definitely not get native image support. Things get weird for #2 (LoadFrom).
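    The mapping above can be sketched as follows (a minimal illustration; I use mscorlib here only because it is guaranteed to exist on disk):

    ```csharp
    using System;
    using System.Reflection;

    class LoadContextDemo // hypothetical name
    {
        static void Main()
        {
            string path = typeof(object).Assembly.Location; // a real on-disk assembly

            // Default context: bound by name through the normal probing rules;
            // native (NGEN) images are eligible here
            Assembly a1 = Assembly.Load("mscorlib");

            // LoadFrom context: native images are used only if the load
            // gets promoted to the default context
            Assembly a2 = Assembly.LoadFrom(path);

            // Null ("neither") context: native images are never used
            Assembly a3 = Assembly.LoadFile(path);

            Console.WriteLine("{0}\n{1}\n{2}", a1.FullName, a2.FullName, a3.FullName);
        }
    }
    ```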


    Let’s see a simple example where I have an executable loadfrom.exe which has the following call
    Assembly assem = Assembly.LoadFrom(@"c:\temp\some.dll");
    some.dll has been NGEN’d as

    c:\temp>ngen install some.dll
    Microsoft (R) CLR Native Image Generator - Version 4.0.30319.17929
    Copyright (c) Microsoft Corporation. All rights reserved.
    1> Compiling assembly c:\temp\some.dll (CLR v4.0.30319) ...

    Now we run loadfrom.exe, which prints

    Got assembly

    In the fusion log I can see, among others, the following messages

    WRN: Native image will not be probed in LoadFrom context. Native image will only be probed in default load context, like with Assembly.Load().
    LOG: Start validating all the dependencies.
    LOG: [Level 1]Start validating native image dependency mscorlib, Version=, Culture=neutral, PublicKeyToken=b77a5c561934e089.
    Native image has correct version information.
    LOG: Validation of dependencies succeeded.
    LOG: Bind to native image succeeded.
    Attempting to use native image C:\Windows\assembly\NativeImages_v4.0.30319_32\some\804627b300f73759069f96bac51811a0\some.ni.dll.
    Native image successfully used.

    Interestingly the native image was loaded for some.dll even though it was loaded using Assembly.LoadFrom. This was done in spite of the loader clearly warning in the log that it would not attempt to load the native image.

    Now let’s try running the same program, just ensuring that the exe and the dll are not in the same folder

    c:\temp>copy loadfrom.exe ..
    1 file(s) copied.

    c:\temp>cd ..

    Running loadfrom.exe from its new location again prints

    Got assembly

    In this case the log says something different

    WRN: Native image will not be probed in LoadFrom context. Native image will only be probed in default load context, like with Assembly.Load().
    LOG: IL assembly loaded from c:\temp\some.dll.

    As you can see the NI image was not loaded.

    The reason is load-context promotion. When LoadFrom is used on a path from which a plain Load would have found the assembly anyway, the LoadFrom results in loading the assembly in the default context; that is, the load context is promoted to the default context. In our first example, since c:\temp\some.dll was on the application’s base path (APPBASE), the load landed in the default context and the NI was loaded. The same didn’t happen in the second example.


    1. NGEN images are only supported in the default context, e.g. for assemblies loaded through implicit references or through an Assembly.Load() API call
    2. NGEN images are not supported for explicit loads done via Assembly.LoadFile(path)
    3. NGEN images are not reliably supported for explicit loads done via Assembly.LoadFrom(path)
    Given the above there is no real way to load assemblies from arbitrary paths and get NGEN native image support. In the modern programming world a lot of large applications are moving away from the traditional GAC-based approach to a more plug-in based, loosely-coupled-components approach. These large applications locate their plug-ins via their own proprietary probing logic and load them using one of the explicit path-based load mechanisms. For these there is no way to get the performance boost from native images. I think this is a limitation which the CLR needs to address in the future.

    Dad loves Surface

    Mt. Rainier

    I have given a Surface to my daughter. A lot of my friends/family ask me how I like using the Surface and whether my daughter likes it as well. I can tell you that the killer feature from a dad’s point of view is Family Safety. I am not an iPad/Android user, so I do not know how they do in this area, but I love this capability in Windows. In my humble opinion it is pretty under-sold and a lot of parents are unaware of this gem.

    You just need to follow the steps at http://windows.microsoft.com/en-US/windows/set-up-family-safety to set it up, either for a new account or for your child’s existing Microsoft account. Once you do, just head over to https://familysafety.microsoft.com/


    Tap/click into your child’s account and you can set up various things. I use all of them, including time restrictions and app restrictions.


    I get weekly email with her activity report and can ding her on the time she spends on Netflix.

    Also I get an email when she tries to install weird games. And no, I’m not going to allow her to use Gangnam Guy.


    Setting daily time limits and curfew hours is fun :)




    When she goes out of this time range Surface locks up with a screen asking her to take the device to a parent to unlock. I can use my live ID and extend her hours if I so want. Now the main problem is to say no to such a cute face :)



    .NET: Loading Native (NGEN) images and its interaction with the GAC


    It’s common for people to think that NGEN works only with strong-named assemblies and that it places its output files in the GAC or is otherwise closely tied to it. This is mostly not true.

    If you are new to this, there’s a quick primer on NGEN that I wrote at http://blogs.msdn.com/b/abhinaba/archive/2013/12/10/net-ngen-gac-and-their-interactions.aspx.


    The Global Assembly Cache or GAC is a central repository where managed assemblies can be placed, either using the command-line gacutil tool or programmatically using the Fusion APIs. The main benefits of the GAC are

    1. Avoid dll hell
    2. Provide for a central place to discover dependencies (place your binary in the central place and other applications will find it)
    3. Allow side-by-side publication of multiple versions of the same assembly
    4. Way to apply critical patches, especially security patches, that automatically flow to all apps using that assembly
    5. Sharing of assemblies across processes. Particularly helpful for system assemblies that are used in most managed assemblies
    6. Provide custom versioning mechanisms (e.g. assembly re-directs / publisher policies)

    While the GAC has its uses, it has its problems as well. One of the primary problems is that an assembly has to be strongly named to be placed in the GAC, and it’s not always possible to do that. E.g. read here and here.


    The NIC or Native Image Cache is the location where NGEN places native images. When NGEN is run to create a native image as in

    c:\Projects>ngen.exe install MyMathLibrary.dll

    The corresponding MyMathLibrary.ni.dll is placed in the NIC. The NIC has a similar purpose as GAC but is not the same location. NIC is placed at <Windows Dir>\assembly\NativeImages_<CLRversion>_<arch>. E.g. a sample path is


    NGEN places the files it generates in NIC along with other metadata to ensure that it can reliably find the right native image corresponding to an IL image.

    How does the .NET Binder find valid native images

    The CLR module that finds assemblies for execution is called the Binder. There are various kinds of binders that CLR uses. The one used to find native images for a given assembly is called the NativeBinder.

    Finding the right native image involves two steps. First the IL image and the corresponding potential native image are located on the file system, and then verification is made to ensure that the native image is indeed a valid image for that IL. E.g. say the runtime gets a request to bind against an assembly MyMathLibrary.dll because another assembly program.exe has a dependency on it. This is what will happen

    1. First the standard fusion binder will kick in to find that assembly. It can find it either in
      1. The GAC, which means it is strongly named. The way files are placed in the GAC ensures that the binder can extract all the required information about the assembly without physically opening the file
      2. The APPBASE (e.g. the local folder of program.exe). In this case the binder proceeds to open the IL file and read the assembly’s metadata
    2. Native binding will proceed only in the default context (more about this in a later post)
    3. The NativeBinder finds the NI file from the NIC. It reads in the NI file details and metadata
    4. Verifies the NI is indeed for that very same IL assembly. For that it goes through a rigorous matching process which includes (but is not limited to) a full assembly name match (same name, version, public key tokens, culture), time-stamp matching (the NI has to be newer than the IL), and the MVID (see below)
    5. Also verifies that the NI has been generated for the same CLR under which it is going to be run (exact .NET version, processor type, etc.)
    6. Also ensures that the NI’s dependencies are also valid. E.g. when the NI was generated it bound against a particular version of mscorlib. If that mscorlib native image is not valid then this NI image is also rejected

    The question is: what happens if the assembly is not strongly named? The answer is that in that case the MVID is used to match, instead of relying on say the signing key tokens. The MVID is a GUID that is embedded in an IL file when a compiler compiles it. Each time you compile an assembly, the generated IL file gets a unique MVID. If you open any managed assembly using ildasm and double-click on its manifest you can see the MVID

    .module MyMathLibrary.dll 
    // MVID: {EEEBEA21-D58F-44C6-9FD2-22B57F4D0193}

    If you re-compile and re-open you should see a new id. This fact is used by the NativeBinder as well. NGEN stores the MVID of the IL file for which an NI is generated. Later the native binder ensures that the MVID of the IL file matches the MVID of the IL file for which the NI file was generated. This step ensures that even if you have multiple copies of common.dll on your PC, all with the same version and none of them signed, the NI for one common.dll will not get used for another common.dll.
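    The MVID can also be read programmatically; a minimal sketch (the class name is mine):

    ```csharp
    using System;
    using System.Reflection;

    class MvidDump // hypothetical name
    {
        static void Main()
        {
            // Module.ModuleVersionId exposes the same GUID ildasm shows in the manifest
            Assembly asm = typeof(MvidDump).Assembly; // or Assembly.LoadFrom(@"...\some.dll")
            Console.WriteLine(asm.ManifestModule.ModuleVersionId);
        }
    }
    ```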

    The Double Loading Problem

    In early versions of .NET, when an NI file was opened the corresponding IL file was also opened. I found a 2003 post from Jason Zander on this. However, currently this is only partially fixed. In the steps above look at step 1. To match an NI with its IL a bunch of information is required from the IL file. If that IL file comes from the GAC then it need not be opened to get that information, and hence no double loading happens. However, if the IL file comes from outside the GAC then it is indeed opened and kept open. This causes significant memory overhead in large applications. This is something which the CLR team needs to fix in the future.


    1. Unsigned (non-strong-named) assemblies can also be NGEN’d
    2. Assemblies need not be placed in the GAC to NGEN them or to consume the NGEN images
    3. However, GAC’d files provide better startup performance and memory utilization while using NI images, because double loading is avoided
    4. NGEN captures enough metadata about an IL image to ensure that if its native image has become stale (no longer valid) it will reject the NI and just use the IL

    NGEN Primer


    I am planning to write a couple of NGEN/GAC related posts. I thought I’d first share some introductory notes about NGEN. This is for the beginner managed developer.


    Consider I have a math-library which has this simple C# code.

    namespace Abhinaba
    {
        public class MathLibrary
        {
            public static int Adder(int a, int b)
            {
                return a + b;
            }
        }
    }

    The C# compiler compiles this code into processor-independent CIL (Common Intermediate Language) instead of machine-specific (e.g. x86 or ARM) code. The CIL code can be seen by opening the dll generated by the C# compiler in an IL disassembler like the default ildasm that comes with .NET. The CIL code looks as follows

    .method public hidebysig static int32  Adder(int32 a,
                                                 int32 b) cil managed
    {
      // Code size       9 (0x9)
      .maxstack  2
      .locals init ([0] int32 CS$1$0000)
      IL_0000:  nop
      IL_0001:  ldarg.0
      IL_0002:  ldarg.1
      IL_0003:  add
      IL_0004:  stloc.0
      IL_0005:  br.s       IL_0007
      IL_0007:  ldloc.0
      IL_0008:  ret
    } // end of method MathLibrary::Adder

    To abstract away the machine architecture the .NET runtime defines a generic stack-based processor and generates code for this make-believe processor. Stack-based means that this virtual processor works on a stack: it has instructions to push/pop values on the stack and instructions to operate on the values already on the stack. E.g. in this particular case, to add two values it pushes both arguments onto the stack using ldarg instructions and then issues an add instruction, which adds the two values on the top of the stack and pushes in the result. The stack-based architecture places no assumption on the number of registers (or even whether the processor is register-based) the final hardware will have.

    Now obviously there is no processor in the real world which executes these CIL instructions. So someone needs to convert them to object code (machine instructions). The real-world processor could be from the x86, x64 or ARM families (among other supported platforms). To do this .NET employs Just In Time (JIT) compilation. The JIT compiler’s responsibility is to generate native machine-specific instructions from the generic IL instructions on demand; that is, as a method is called for the first time, the JIT generates native instructions for it and hence enables the processor to execute that method. On my machine the JIT produces the following x86 code for the add

    02A826DF  mov         dword ptr [ebp-44h],edx  
    02A826E2  nop  
    02A826E3  mov         eax,dword ptr [ebp-3Ch]  
    02A826E6  add         eax,dword ptr [ebp-40h]  

    This process happens on demand. That is, if Main calls Adder, Adder will be JITed only when it is actually called by Main. If a function is never called, in most cases it is never JITed. The call stack clearly shows this on-demand flow.

    clr!UnsafeJitFunction <------------- This will JIT Abhinaba.MathLibrary.Adder 
    App!ConsoleApplication1.Program.Main()+0x3c <----- This managed code drove that JIT

    The benefits of this approach are

    1. It provides for a way to develop applications with a variety of different languages. Each of these languages can target the MSIL and hence interop seamlessly
    2. MSIL is processor architecture agnostic. So the MSIL based application could be made to run on any processor on which .NET runs (build once, run many places)
    3. Late binding. Binaries are bound to each other (say an exe to its dlls) late, which allows significant leeway in how loosely coupled they can be
    4. Possibility of very machine-specific optimization, as the compilation is happening on the exact same machine/device on which the application will run

    JIT Overhead

    The benefits mentioned above come with the overhead of having to convert the MSIL before execution. The CLR does this on demand: when a method is about to execute, it is converted to native code. This “just in time” dynamic compilation, or JITing, adds to both application startup cost (a lot of methods are executing for the first time) as well as execution-time performance. As a method is run many times, the initial cost of JITing fades away. The cost of executing a method n times can be expressed as

    CostJIT + n * CostExecution

    At startup most methods are executing for the first time and n is 1, so the cost of JIT predominates. This might result in slow startup. This affects scenarios like phones, where slow application startup results in poor user experience, or servers, where slow startup may result in timeouts and failure to meet system SLAs.
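    The first-call cost is easy to observe for yourself; a rough sketch (the class name is mine, and the timings are illustrative, not measurements from this post):

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Runtime.CompilerServices;

    class JitCostDemo // hypothetical name
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        static int Adder(int a, int b) { return a + b; }

        static void Main()
        {
            var sw = Stopwatch.StartNew();
            Adder(1, 2);                 // first call: pays the JIT cost
            long firstCall = sw.ElapsedTicks;

            sw.Restart();
            Adder(3, 4);                 // later calls: native code already exists
            long laterCall = sw.ElapsedTicks;

            // firstCall is typically much larger than laterCall
            Console.WriteLine("first={0} ticks, later={1} ticks", firstCall, laterCall);
        }
    }
    ```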

    Also, another problem with JITing is that it essentially generates instructions in read-write data pages and then executes them. This does not allow the operating system to share the generated code across processes. So even if two applications are using the exact same managed code, each contains its own copy of the JITed code.

    NGEN: Reducing or eliminating JIT overhead

    From the beginning .NET has supported pre-compilation via a process called NGEN (derived from Native image GENeration). NGEN consumes an MSIL file, runs the JIT in offline mode to generate native instructions for all the managed IL functions, and stores them in a native image, or NI, file. Later applications can directly consume this NI file. NGEN is run on the same machine where the application will be used, typically during installation of that application. This retains all the benefits of JIT and at the same time removes its overhead. Also, since the generated file is a standard executable file, its executable pages can be shared across processes.

    c:\Projects\ConsoleApplication1\ConsoleApplication1\bin\Debug>ngen install MyMathLibrary.dll
    Microsoft (R) CLR Native Image Generator - Version 4.0.30319.33440
    Copyright (c) Microsoft Corporation.  All rights reserved.
    1>    Compiling assembly c:\Projects\bin\Debug\MyMathLibrary.dll (CLR v4.0.30319) ...

    One of the problems with NGEN-generated executables is that the file contains both the IL and the NI code. The files can be quite large. E.g. for mscorlib.dll I have the following sizes

    Directory of C:\Windows\Microsoft.NET\Framework\v4.0.30319

    09/29/2013  08:13 PM         5,294,672 mscorlib.dll
                   1 File(s)      5,294,672 bytes

    Directory of C:\Windows\Microsoft.NET\Framework\v4.0.30319\NativeImages

    10/18/2013  12:34 AM        17,376,344 mscorlib.ni.dll
                   1 File(s)     17,376,344 bytes


    Read up on the MPGO tool to see how this can be optimized (http://msdn.microsoft.com/library/hh873180.aspx)

    NGEN Fragility

    Another problem NGEN faces is fragility. If something changes in the system, the NGEN images become invalid and cannot be used. This is especially true for hard-bound assemblies.

    Consider the following code

    class MyBase
    {
        public int a;
        public int b;
        public virtual void func() {}
    }

    static void Main()
    {
        MyBase mb = new MyBase();
        mb.a = 42;
        mb.b = 20;
    }

    Here we have a simple class whose fields are being modified. The MSIL code for those field accesses looks like

    L_0008: ldc.i4.s 0x2a
    L_000a: stfld int32 ConsoleApplication1.MyBase::a
    L_000f: ldloc.0 
    L_0010: ldc.i4.s 20
    L_0012: stfld int32 ConsoleApplication1.MyBase::b

    The native code for the variable access can be as follows

                mb.a = 42;
    0000004b  mov         eax,dword ptr [ebp-40h] 
    0000004e  mov         dword ptr [eax+4],2Ah 
                mb.b = 20;
    00000055  mov         eax,dword ptr [ebp-40h] 
    00000058  mov         dword ptr [eax+8],14h 

    The code-generation engine essentially took a dependency on the layout of the MyBase class while generating the code that updates it. The hard-coded layout dependency is that the compiler assumes MyBase looks like

    <base> MethodTable
    <base> + 4 a
    <base> + 8 b

    The base address is stored in the eax register and the updates are made at offsets of 4 and 8 bytes from that base. Now consider that MyBase is defined in assembly A and is accessed by some code in assembly B, and that both A and B are NGENed. If for some reason the MyBase class (and hence assembly A) is modified so that the new definition becomes

    class MyBase
    {
        public int foo;
        public int a;
        public int b;
        public virtual void func() {}
    }

    If we look at it from the perspective of the MSIL code, the references to these fields are through their symbolic names, e.g. ConsoleApplication1.MyBase::a. So if the layout changes, the JIT compiler at runtime will find their new locations from the metadata in the assembly and bind to the correct updated locations. However, with NGEN this all changes, and hence the NGEN image of the accessor is invalid and has to be updated to match the new layout

    <base> MethodTable
    <base> + 4 foo
    <base> + 8 a
    <base> + 12 b

    This means that when the CLR picks up an NGEN image it needs to be absolutely sure about its validity. More about that in a later post.


    .NET: Figuring out if your application is exception heavy

    Ocean beach

    In the past I worked on an application which used modules from different teams. Many of these modules raised and caught a ton of exceptions. So much so that performance data was showing that these exceptions were causing issues. So I had to figure out an easy way to programmatically find such code and inform its owners that exceptions are for exceptional scenarios and shouldn’t be used for normal code flow :)

    Thankfully the CLR provides an easy hook in the form of an AppDomain event. I just need to hook into the AppDomain’s FirstChanceException event and the CLR notifies me up front when an exception is raised. It does that even before any managed code gets a chance to handle it (and potentially suppress it).

    The following is a plugin which throws and immediately catches an exception.

    using System;

    namespace Plugins
    {
        public class FunkyPlugin
        {
            public static void ThrowingFunction()
            {
                try {
                    Console.WriteLine("Just going to throw");
                    throw new Exception("Cool exception");
                } catch (Exception ex) {
                    Console.WriteLine("Caught a {0}", ex.Message);
                }
            }
        }
    }

    In the main application I added code to subscribe to the FirstChanceException event before calling the plugins

    using System;
    using System.Runtime.ExceptionServices;
    using System.Reflection;

    namespace foo
    {
        public class Program
        {
            static void Main()
            {
                // Register handler
                AppDomain.CurrentDomain.FirstChanceException += FirstChanceHandler;
                Plugins.FunkyPlugin.ThrowingFunction();
            }

            static void FirstChanceHandler(object o,
                                           FirstChanceExceptionEventArgs e)
            {
                MethodBase site = e.Exception.TargetSite;
                Console.WriteLine("Thrown by : {0} {1}({2})", site.Module,
                                  site.DeclaringType, site);
                Console.WriteLine("Stack: {0}", e.Exception.StackTrace);
            }
        }
    }

    The FirstChanceException += line in Main is the event subscription, and FirstChanceHandler just dumps out the name of the assembly, type and method that raised the exception. The output of this program is as follows

    Just going to throw
    Thrown by : some.dll Plugins.FunkyPlugin(Void ThrowingFunction())
    Stack:    at Plugins.FunkyPlugin.ThrowingFunction()
    Caught a Cool exception

    As you can see, the handler runs even before the catch block executes, and I have the full information of the assembly, type and method that threw the exception.

    Behind the Scene

    For most it might suffice to know that the event handler gets called before anyone gets a chance to handle the exception. However, if you care about exactly when this is fired: it is in the first pass (first chance), just after the runtime notifies the debugger/profiler.

    The managed exception system piggybacks on the native OS exception-handling system. Though x86 exception handling (FS:0 based chaining) is significantly different from x64 (PDATA), it has the same basic idea

    1. From the outside a managed exception looks exactly like a native exception, and hence the OS’s normal exception-handling mechanism kicks in
    2. Exception handling requires some mechanism to walk the call stack of the thread on which the exception is thrown, so that it can find an up-level catch block as well as call the finally blocks of all functions in between the catch and the point where the exception was thrown. The mechanism varies between x86 and x64 but is not super relevant for our discussion (a series of data structures pushed onto the stack in the case of x86, or a series of data-structure tables registered with the OS in x64)
    3. On an exception the OS walks the stack and, for managed function frames, calls into the CLR’s registered personality routine (that's what it's called :)). This routine knows how to handle managed exceptions
    4. This routine notifies the profiler, then the debugger, of this first-chance exception, so that the debugger can potentially break on the exception and do other relevant operations. If the debugger did not handle the first-chance exception, processing of the exception continues
    5. If there is a registered handler for FirstChanceException that is called
    6. JIT is consulted to find appropriate catch block for the exception (none might be found)
    7. The CLR returns the right set of information to the OS indicating that indeed the exception will be processed
    8. The OS initiates the second-pass
    9. For every function in between the frame of exception and the found catch block the CLR’s handler routine is called and the CLR consults the JIT to find the appropriate finally blocks and proceeds to call them for cleanup. In this phase the stack actually starts unwinding
    10. This continues till the frame in which the catch was initially found is reached. CLR proceeds to execute the catch block.
    11. If all is well the exception has been caught and processed and peace is restored to the world.

    As should be evident from the basic flow above, the FirstChanceHandler gets called before any code gets the chance to catch the exception, and it is called even if the exception goes unhandled.
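    To turn this into the “find the exception-heavy modules” tool described at the top of the post, one could aggregate counts per throwing method. A rough sketch (the class and member names are mine):

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.Runtime.ExceptionServices;

    static class ExceptionCounter // hypothetical name
    {
        static readonly ConcurrentDictionary<string, int> counts =
            new ConcurrentDictionary<string, int>();

        public static void Install()
        {
            AppDomain.CurrentDomain.FirstChanceException += (o, e) =>
            {
                // TargetSite identifies the method that threw (guard against null)
                string key = e.Exception.TargetSite == null
                    ? "<unknown>"
                    : string.Format("{0}!{1}", e.Exception.TargetSite.Module,
                                               e.Exception.TargetSite);
                counts.AddOrUpdate(key, 1, (k, v) => v + 1);
            };
        }

        public static void Report()
        {
            foreach (var kv in counts)
                Console.WriteLine("{0} threw {1} time(s)", kv.Key, kv.Value);
        }
    }
    ```

    Calling Install() at startup and Report() at shutdown gives a per-method tally you can hand to the module owners.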

    PS: Please don’t throw an exception in the FirstChance handler :)


    Bing it on


    In early 2008 I joined the CLR team to clean garbage (or to write garbage collectors :)). It has been a wild ride writing generational garbage collectors and memory managers and tinkering with runtime performance and memory models. It’s been great to see people on the street use stuff that I helped build, and to see internal team reports as .NET updates/downloads went out to hundreds of millions of machines. In one case I was under a medical diagnostic device clearly running on .NET. I didn’t run out, indicating my faith in the stuff we built (or maybe I was sedated, who knows).

    I decided to change things up a bit and move from the world of devices, desktops and consoles to that of the cloud. This week I began working in the Bing team. From now on I will no longer be part of the team that builds the CLR, but part of a team that really pushes the usage of the CLR to the extreme: using .NET to serve billions of queries on thousands of machines.

    I hope to continue blogging about CLR/.NET and provide a user’s perspective of the best managed runtime in the world.

    BTW the photo above is the fortune cookie I got at my farewell lunch with the CLR team. Very appropriate.

  • I know the answer (it's 42)

    Quick/Dirty Function Coverage using Windbg


    To find code coverage at line and block granularity you need a full-fledged code coverage tool. However, sometimes you can use a quick and dirty trick in WinDbg to see which functions are being called. This works well when you need to do it for a small set of functions, which is what I recently needed. Let’s say it was for all the functions of a class called Graph.

    First I ran the application under the debugger, where it automatically broke in at application start. Then I added breakpoints to all these functions using the following

    0:000> bm test!Graph*::* 10000
      2: 003584a0          @!"test!Graph::Vertex::`scalar deleting destructor'"
      3: 00356f80          @!"test!Graph::Vertex::~Vertex"
      4: 00358910          @!"test!Graph::AddVertex"
      5: 00356b70          @!"test!Graph::~Graph"
      6: 003589d0          @!"test!Graph::RangeCheck"
      7: 003589b0          @!"test!Graph::Count"
      8: 00357ce0          @!"test!Graph::operator[]"
      9: 003561a0          @!"test!Graph::Vertex::Vertex"
     10: 00356170          @!"test!Graph::Vertex::Vertex"
     11: 00358130          @!"test!Graph::`scalar deleting destructor'"
     12: 003588a0          @!"test!Graph::AddEdge"
     13: 003551e0          @!"test!Graph::Graph"

    Here I am telling WinDbg to add breakpoints on all methods of the Graph class with a very large hit count of 0x10000. Then I just let the application proceed and played with the various controls. Finally I closed the application, at which point it again broke under the debugger, and I listed the breakpoints.

    0:000> bl
     1 e 00352c70     0001 (0001)  0:**** test!main
     2 e 003584a0     fd8f (10000)  0:**** test!Graph::Vertex::`scalar deleting destructor'
     3 e 00356f80     fcc7 (10000)  0:**** test!Graph::Vertex::~Vertex
     4 e 00358910     ff38 (10000)  0:**** test!Graph::AddVertex
     5 e 00356b70     ffff (10000)  0:**** test!Graph::~Graph
     6 e 003589d0     d653 (10000)  0:**** test!Graph::RangeCheck
     7 e 003589b0     c1de (10000)  0:**** test!Graph::Count
     8 e 00357ce0     fda7 (10000)  0:**** test!Graph::operator[]
     9 e 003561a0     fd8f (10000)  0:**** test!Graph::Vertex::Vertex
    10 e 00356170     ff38 (10000)  0:**** test!Graph::Vertex::Vertex
    11 e 00358130     ffff (10000)  0:**** test!Graph::`scalar deleting destructor'
    12 e 003588a0     ec56 (10000)  0:**** test!Graph::AddEdge
    13 e 003551e0     ffff (10000)  0:**** test!Graph::Graph

    The key fields are marked below

    13 e 003551e0 ffff (10000) 0:**** test!Graph::Graph

    The (10000) indicates that the debugger will actually break only after the breakpoint has been hit 0x10000 times. The ffff indicates how many hits are left before that break would happen. So a simple subtraction (0x10000 – 0xFFFF) tells us that this function has been called once. It’s easy to see that one Graph object was created and destroyed (1 call to the ctor and 1 to the dtor) and that Graph::Count has been called 15906 times (0x10000 – 0xC1DE). So I didn’t really miss any of the functions in that test. If I had, it would say 10000 (10000) for the function that I missed.

  • I know the answer (it's 42)

    Custom Resolution of Assembly References in .NET


    Right now I am helping out a team with an assembly resolution conflict bug. I thought I’d share the details, because I am sure others have landed in this situation.

    In complex managed applications, especially those that use dynamically loaded plugins/addons, it’s common that not all the assemblies required by these addons are present in the assembly probing path. In a perfect world the closure set of all assembly references of an application is strong-named, the CLR binder uses the standard probing order to find all assemblies, and everything works perfectly. However, the world is not ideal.

    Requirements to resolve assemblies from different locations do arise, and .NET has support for that. E.g. this stackoverflow question http://stackoverflow.com/questions/1373100/how-to-add-folder-to-assembly-search-path-at-runtime-in-net has rightly been answered by pointing to the AssemblyResolve event. When .NET fails to find an assembly after probing (looking through) the various folders it uses for assemblies, it raises an AssemblyResolve event. User code can subscribe to this event and supply assemblies from whatever path it wants to.

    This simple mechanism can be abused, resulting in major system issues. The main problem arises from over-eager resolution code. Consider an application A that uses two modules (say plugins) P1 and P2. P1 and P2 are somehow registered with A, and A uses Assembly.Load to load them. However, P1 and P2 ship with various dependencies which they place in various sub-directories that A is unaware of, and the CLR obviously doesn’t look into those folders to resolve the assemblies. To handle this situation both P1 and P2 have independently decided to subscribe to the AssemblyResolve event.

    The problem is that whenever the CLR fails to locate an assembly, it calls these resolve-event handlers sequentially. So based on the order in which the handlers were registered, it is possible that for a missing dependency of P2 the resolution handler of P1 gets called. Coincidentally it is possible that the assembly the CLR is failing to resolve is known to both P1 and P2, either because the name is generic or because it’s a widely used 3rd party assembly which a ton of plugins use. So P1 loads the missing assembly P2 is referring to and returns it. The CLR goes ahead and binds P2 to the assembly P1 has returned. This is when bad things start to happen, because maybe P2 needs a different version of it. Crash follows.

    The MSDN documentation has already called out how to handle these issues in http://msdn.microsoft.com/en-us/library/ff527268.aspx. Essentially follow these simple rules

    1. Follow the best practices for assembly loading http://msdn.microsoft.com/en-us/library/dd153782.aspx
    2. Return null if you do not recognize the referring assembly
    3. Do not try to resolve assemblies you do not recognize
    4. Do not use Assembly.Load or AppDomain.Load to resolve the assemblies because that can result in recursive calls to the same ResolveEvent finally leading to stack-overflow.

    The skeleton code for the resolve event handler can be something like

    static Assembly currentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
    {
        // Do not try to resolve assemblies which your module doesn't explicitly handle.
        // There will be other resolve handlers that are called in sequence, let them
        // do their job
        if (args.RequestingAssembly == null || !IsKnownAssembly(args.RequestingAssembly))
            return null;

        // Parse and create the name of the assembly being requested and then use
        // your own logic to locate the assembly
        AssemblyName aname = new AssemblyName(args.Name);
        string path = FindAssemblyByCustomLogic(aname);
        if (!File.Exists(path))
            return null;

        Assembly assembly = Assembly.LoadFile(path);
        return assembly;
    }

    Here you need to fill in the implementation of IsKnownAssembly which only returns true for assemblies that belong to your module.

  • I know the answer (it's 42)

    Arduino Fun – Door Entry Alarm

    Arduino UNO based door entry alarm

    Physical computing and “internet of things” is a super exciting area that is unfolding right now. Even decades back one could hook up sensors, remotely collect the data and process it. What is special now is that powerful micro-controllers are dirt cheap and most of us have in our pockets a really powerful computing device. Connecting everything wirelessly is also very easy now and almost every home has a wireless network.

    All of these put together can create some really compelling and cool stuff where data travels from a sensor over wireless networks into the cloud and finally into the cell phone we carry everywhere. I ultimately want to create a smart door so that I can get a notification at work when someone knocks at our home door. Maybe I can even remotely open the door. The possibilities are endless, but time is not, so let’s see how far I get in some reasonable amount of time.

    Arduino UNO

    I unboxed the Arduino Uno Ultimate Starter Kit that I had got last week and spent some time with my daughter setting it up. The kit comes with a fantastic manual that helped me recap the basics of electronics. It contains an Arduino UNO prototyping board based on the ATmega328, a low-power 8-bit microcontroller running at a max of 20 MHz. To most people that seems paltry, but it’s amazing what you can pull off with one of these chips.

    The final assembled setup of the kit looks as follows. It comes with a nice handy plastic surface on which the Arduino (in blue) and a breadboard is stuck. It’s connected over USB to my PC.


    Getting everything up and running was easy with the instruction booklet that came with the kit. The only glitch was that Windows 8 wouldn’t let me install the drivers because they are not properly signed, so I had to follow the steps given here to disable driver verification.

    Post that the Arduino IDE connected to the board and I could easily write and deploy code (C like syntax).


    The tool bar icons are a bit weird though (side arrow for upload and up arrow for open????).

    There was no way to debug through the IDE (or at least I couldn’t find one). So I set up some easy printf-style debugging: you write to the serial port and the IDE displays it.


    It was after this that I got to know that there’s a Visual Studio plugin with full debugging support. However, I haven’t yet used that.

    The Project

    I decided to start out by making a simple entry alarm and see how much time it takes to get everything done. In college I built something similar, but without a microcontroller (based on a 555 IC and IR photo-transistors), and it took a decent amount of time to hook up all the components. Basically the idea is that across the door there will be some source of light and a sensor on the other side. When someone passes in between, the light on the sensor is obstructed and this sounds an alarm.

    When I last did it in college I made it robust by using a pulsating (fixed-frequency) IR LED as the source and IR sensors. For this project I relied on visible light and the photo-resistor that came with the kit.

    I built the following circuit.


    I connected a photo-resistor in series with another 10K resistor and connected the junction to the analog input pin A0 of the Arduino. Essentially this acts like a voltage divider. In bright light the junction, and hence the A0 input, reads around 1.1 V. When the light is obstructed the resistance of the photo-resistor changes and the junction reads 2.6 V. The analog pins read in a range of 0 (for 0 V) to 1023 (for 5 V), so this roughly comes to around 225 in light and 530 in the shade. Obviously these values are relative to the strength of the light and how dark it becomes when someone obstructs it. To avoid taking a dependency on absolute values I created another voltage divider using a potentiometer and connected that to another analog input pin, A1. Now I can turn the potentiometer to control a threshold value. If the voltage of A0 is above this threshold, it means it’s dark enough that someone obstructed the light falling on the resistor and it’s time to sound the alarm.

    The alarm consists of flashing blue and red LEDs (obviously to match police lights) and a standard siren sound played using a piezo crystal that also came with the kit.

    This full assembled and deployed setup looks as follows.


    Update: the picture above says photo-transistor; it should be photo-resistor.


    The key functions are setup(), which is automatically called at startup, and loop(), which as the name suggests is called in a loop. setup() configures the digital pins as outputs to drive the flashing LEDs. In loop() I read in the values of the photo-resistor and the potentiometer. Based on the comparison I sound the alarm.

    // Define the constants
    const int sensorPin = 0;   // Photo-resistor pin
    const int controlPin = 1;  // Potentiometer pin
    const int buzzerPin = 9;   // Buzzer pin
    const int rLedPin = 10;    // Red LED pin
    const int bLedPin = 11;    // Blue LED pin

    // Always called at startup
    void setup()
    {
        // Set the two LED pins and the buzzer pin as output
        pinMode(rLedPin, OUTPUT);
        pinMode(bLedPin, OUTPUT);
        pinMode(buzzerPin, OUTPUT);
    }

    // This loops forever
    void loop()
    {
        int sensorVal = analogRead(sensorPin);
        int controlVal = analogRead(controlPin);

        if (sensorVal > controlVal)
        {
            // Sensor voltage is above the threshold, so the light is
            // obstructed -- sound the buzzer
            playBuzzer(buzzerPin);
        }
    }

    void playBuzzer(const int buzzerPin)
    {
        for (int i = 0; i < 3; ++i)
        {
            // alternate between two tones, one high and one low;
            // at the same time alternate the blue and red LED flashing
            digitalWrite(rLedPin, HIGH); // Red LED on
            tone(buzzerPin, 400, 500);   // play 400 Hz tone for 500 ms
            delay(500);
            digitalWrite(rLedPin, LOW);  // Red LED off

            digitalWrite(bLedPin, HIGH); // Blue LED on
            tone(buzzerPin, 800, 500);   // play 800 Hz tone for 500 ms
            delay(500);
            digitalWrite(bLedPin, LOW);  // Blue LED off
        }

        noTone(buzzerPin);               // Stop the buzzer
    }
    Next Steps

    This system has some obvious flaws. Someone can duck below or jump over the light-path, or even shine a flashlight on the sensor while passing through. To make this robust, consider using strips of mirrors on the two sides and a laser (preferably IR) bouncing between them, so that it’s virtually impossible to get through without breaking the beam.

    You can also use a pulsating source of light and detect its frequency on the sensor. This makes it even harder to defeat.

  • I know the answer (it's 42)

    C# code for Creating Shortcuts with Admin Privilege

    Seattle skyline


    If you just care about the code, then jump to the end :)

    In the CLR team, and across other engineering teams in Microsoft, we use build environments which are essentially command shells with a custom environment set up using bat and cmd scripts. These scripts set up various paths and environment variables to pick up the right set of build tools and output paths matching the architecture and build flavor. E.g. one example of launching such a shell could be…

    cmd.exe /k %BRANCH_PATH%\buildenv.bat <architecture> <build-flavor> <build-types> <etc...>

    The architectures vary between things like x86, amd64 and ARM; build flavors vary between debug, check, retail, release, etc. The build-types indicate the target, like desktop, CoreCLR, Metro. Even though not all combinations are allowed, the number of allowed combinations approaches 30. In the case of .NET Compact Framework, which supports way more architectures (e.g. MIPS, PPC) and targets (e.g. Xbox 360, S60), the number of combinations is even larger.

    For day to day development I either need to enter this command each time I move to a different shell, or I have to create desktop shortcuts for all the combinations. This becomes repetitive each time I move to a different code branch. I had created a small app that I ran each time I moved to a new branch, and it would generate all the combinations of shortcuts given the branch details. However, our build requires elevation (admin privilege). So even though I created the shortcuts, I’d have to either right click and use “Run as administrator” or set that in each shortcut’s properties.



    This was a nagging pain for me. I couldn’t find any easy programmatic way to create a shortcut with administrator privilege (I’m sure there is some shell API to do that). So finally I binary-compared two shortcuts, one with “Run as administrator” set and one without, and saw that only one byte was different. So I hacked up code to generate the shortcut and then modify that byte. I am sure there is a better/safer way to do this, but for now this “works for me”.

    The Code

    Since I didn’t find any online source for this code, I thought I’d share. Do note that this is a major hack and uses un-documented stuff. I’d never do this for shipping code or for that matter anything someone other than me would rely on. So use at your own risk… Also if you have a better solution let me know and I will use that…

    // file-path of the shortcut (*.lnk file)
    string shortcutPath = Path.Combine(shortCutFolder, string.Format("{0} {1}{2}.lnk", arch, flavor, extra));
    Console.WriteLine("Creating {0}", shortcutPath);

    // the contents of the shortcut
    string arguments = string.Format("{0} {1} {2} {3}{4} {5}", "/k", clrEnvPath, arch, flavor, extra, precmd);

    // shell API to create the shortcut (shell is a WshShell instance from
    // the IWshRuntimeLibrary COM reference)
    IWshShortcut shortcut = (IWshShortcut)shell.CreateShortcut(shortcutPath);
    shortcut.TargetPath = cmdPath;
    shortcut.Arguments = arguments;
    shortcut.IconLocation = "cmd.exe, 0";
    shortcut.Description = string.Format("Launches clrenv for {0} {1} {2}", arch, flavor, extra);
    shortcut.Save();

    // HACKHACK: update the link's byte to indicate that this is an admin shortcut
    using (FileStream fs = new FileStream(shortcutPath, FileMode.Open, FileAccess.ReadWrite))
    {
        fs.Seek(21, SeekOrigin.Begin);
        fs.WriteByte(0x22);
    }
  • I know the answer (it's 42)

    Moving to Outlook from Google Reader


    I am sure everyone by now knows that Google Reader is being shut down. I am a heavy user of Google Reader (or Greeder, as I call it) and I immediately started looking for an alternative, when it suddenly occurred to me that all the PCs I use have Outlook installed on them. So if you work for an organization that runs on Exchange server, this could work out really well. You can use Office Outlook and Exchange as a great RSS feed reader. Consider this

    1. It will provide full sync across multiple Outlook clients running on different PCs
    2. It will provide on the go access via Outlook Web-access
    3. Your phone’s outlook client should also work with it
    4. You can pretend to work while wasting time on boingboing

    First things first: Export the opml file from Google Reader

    Login to www.google.com and then go to https://www.google.com/takeout/#custom:reader


    This will take some time and create an archive.


    Click on the download button and save the zip. Then extract the zip as follows


    Inside the extracted folder you will have the opml file. For me it’s in C:\Users\admin\Desktop\XXXXXXXX@gmail.com-takeout\XXXXXXXX@gmail.com-takeout\Reader\subscriptions.xml

    Import to Outlook

    This opml file needs to be imported into outlook. Use the File tab and bring up the following UI in Outlook.


    Then use Import to bring up the following


    Choose OPML file and tap on Next. Now point it to the file you extracted. For me it was C:\Users\admin\Desktop\XXXXXXXX@gmail.com-takeout\XXXXXXXX@gmail.com-takeout\Reader\subscriptions.xml

    Hit next and finally choose the feeds you want to import (Select All).


    Then tap on Next and here you have Outlook as an RSS feed reader…


    Read Everywhere

    It totally syncs via the cloud. Here I have it open in the browser. As you read a post it tracks what you are reading, and across the browsers and all your Outlook clients at work and home it will update and keep everything in sync.


    Works great on the Windows Phone as well. I assume any Exchange client should work.


    Pain Points

    While reading in Outlook is seamless, there are some usability issues in both the browser and the phone. Surface RT is broken. Someone should really fix the mail client on Surface :(

    The major pain point I see is that in Outlook Web Access the pictures are not shown. Tapping on the picture placeholders works. I think some security feature is blocking the embedded images.

    Also, on the Windows Phone you have to go to each feed and set it up so that it syncs that folder. This is a pain, but I guess it is there to protect against downloads over carrier networks.

  • I know the answer (it's 42)

    C++/CLI and mixed mode programming


    I had a very limited idea about how mixed mode programming on .NET works. In mixed mode a binary can have both native and managed code. Such binaries are generally programmed in a special variant of the C++ language called C++/CLI, and the sources need to be compiled with the /clr switch.

    For some recent work I am doing I had to ramp up on Managed C++ usage and how the .NET runtime supports the mixed mode assemblies generated by it. I wrote up some notes for myself and later thought that it might be helpful for others trying to understand the inner workings.


    The initial foray of C++ into the managed world was via the Managed Extensions for C++, or MC++. This was originally released with VS 2003 and is deprecated now. The MC++ syntax turned out to be too confusing and wasn’t adopted well. It was soon replaced with C++/CLI, which adds limited extensions over C++ and is better designed, so that the language feels more in sync with the general C++ language specification.


    The code looks like below.

    ref class CFoo
    {
    public:
        CFoo()
        {
            pI = new int;
            *pI = 42;
            str = L"Hello";
        }

        void ShowFoo()
        {
            printf("%d\n", *pI);     // native call
            Console::WriteLine(str); // managed call
        }

        int *pI;
        String^ str;
    };

    In this code we are defining a reference type class CFoo. This class uses both managed (str) and native (pI) data types and seamlessly calls into managed and native code. There is no special code required to be written by the developer for the interop.

    The managed type uses special handles denoted by ^ as in String^ and native pointers continue to use * as in int*. A nice comparison between C++/CLI and C# syntax is available at the end of http://msdn.microsoft.com/en-US/library/ms379617(v=VS.80).aspx. Junfeng also has a good post at http://blogs.msdn.com/b/junfeng/archive/2006/05/20/599434.aspx

    The benefits of using mixed mode

    1. Easy to port over C++ code and take the benefit of integrating with other managed code
    2. Access to the extensive managed API surface area
    3. Seamless managed to native and native to managed calls
    4. Static-type checking is available (so no mismatched P/Invoke signatures)
    5. Performance of native code where required
    6. Predictable finalization of native code (e.g. stack based deterministic cleanup)


    Implicit Managed and Native Interop

    Seamless, static type-checked, implicit, interop between managed and native code is the biggest draw to C++/CLI.

    Calls from managed to native and vice versa are transparently handled and can be intermixed. E.g. managed --> unmanaged --> managed call chains just work, without the developer having to do anything special. This technology is called IJW (it just works). We will use the following code to understand the flow.

    #pragma managed
    void ManagedAgain(int n)
    {
        Console::WriteLine(L"Managed again {0}", n);
    }

    #pragma unmanaged
    void NativePrint(int n)
    {
        wprintf(L"Native Hello World %u\n\n", n);
        ManagedAgain(n);
    }

    #pragma managed
    void ManagedPrint(int n)
    {
        Console::WriteLine(L"Managed {0}", n);
        NativePrint(n);
    }

    The call flow goes from ManagedPrint --> NativePrint –> ManagedAgain

    Native to Managed

    For every managed method, a managed and an unmanaged entry point is created by the C++ compiler. The unmanaged entry point is a thunk/call-forwarder; it sets up the right managed context and calls into the managed entry point. This is called the IJW thunk.

    When a native function calls into a managed function, the compiler actually binds the call to the native forwarding entry point of the managed function. If we inspect the disassembly of NativePrint, we see the following code generated to call into the ManagedAgain function

    00D41084  mov         ecx,dword ptr [n]         // Store NativePrint argument n to ECX
    00D41087  push        ecx                       // Push n onto stack
    00D41088  call        ManagedAgain (0D4105Dh)   // Call IJW Thunk

    Now 0x0D4105D is the address of the native entry point. It forwards the call to the actual managed implementation

    00D4105D  jmp         dword ptr [__mep@?ManagedAgain@@$$FYAXH@Z (0D4D000h)]  

    Managed to Native

    In the case where a managed function calls into a native function standard P/Invoke is used. The compiler just defines a P/Invoke signature for the native function in MSIL

    .method assembly static pinvokeimpl(/* No map */) 
            void modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) 
            NativePrint(int32 A_0) native unmanaged preservesig
    {
      .custom instance void [mscorlib]System.Security.SuppressUnmanagedCodeSecurityAttribute::.ctor() = ( 01 00 00 00 ) 
      // Embedded native code
      // Disassembly of native methods is not supported.
      //  Managed TargetRVA = 0x00001070
    } // end of method 'Global Functions'::NativePrint

    The managed to native call in IL looks as

    Managed IL:
      IL_0010:  ldarg.0
      IL_0011:  call void modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) NativePrint(int32)

    The virtual machine (CLR) at runtime generates the correct thunk to get the managed code to P/Invoke into native code. It also takes care of other things like marshaling the managed arguments to native and vice versa.

    Managed to Managed

    While it would seem this should be easy, it is a bit more convoluted. Essentially the compiler always bound to the native entry point for a given managed method. So a managed to managed call degenerated into managed -> native -> managed and hence resulted in a suboptimal double P/Invoke. See http://msdn.microsoft.com/en-us/library/ms235292(v=VS.80).aspx

    This was fixed in later versions by using dynamic checks and ensuring managed calls always call into managed targets directly. However, in some cases managed to managed calls still degenerate to double P/Invoke. So an additional knob was provided: the __clrcall calling convention keyword, which stops the native entry point from being generated completely. The pitfall is that such methods are not callable from native code. So if I stick a __clrcall in front of ManagedAgain, I get the following build error while compiling NativePrint.

    Error	2	error C3642: 'void ManagedAgain(int)' : cannot call a function with
    __clrcall calling convention from native code <filename>


    If a C++ file is compiled with the /clr:pure flag, instead of a mixed mode assembly (one that has both native code and MSIL) a pure MSIL assembly is generated. So all methods are __clrcall and the C++ code is compiled into MSIL code and NOT to native code.

    This comes with some benefits: the assembly becomes a standard MSIL based assembly, no different from any other managed-only assembly. It also comes with some limitations. Native code cannot call into the managed code in this assembly, because there is no native entry point to call into. However, native data is supported, and the managed code can transparently call into other native code. Let’s see a sample.

    I moved all the unmanaged code to a separate native DLL:

    void NativePrint(int n)
    {
        wprintf(L"Native Hello World %u\n\n", n);
    }

    Then I moved my managed C++ code to a new project and compiled it with /clr:pure

    #include "stdafx.h"
    #include "..\Unmanaged\Unmanaged.h"

    using namespace System;

    void ManagedPrint(int n)
    {
        char str[30] = "some cool number";     // native data
        str[5] = 'f';                          // modifying native data
        Console::WriteLine(L"Managed {0}", n); // call to BCL
        NativePrint(n);                        // call to my own native methods
        printf("%s %d\n\n", str, n);           // CRT
    }

    int main(array<String^>^ args)
    {
        ManagedPrint(42);
        return 0;
    }

    The above builds and works fine. So even with /clr:pure I was able to

    1. Use native data like a char array and modify it
    2. Call into BCL (Console::WriteLine)
    3. Call transparently into other native code without having to hand generate P/Invoke signatures
    4. Use native CRT (printf)

    However, no native code can call into ManagedPrint. Also note that even though pure MSIL is generated, the code is unverifiable (think C# unsafe). So it doesn’t get the added safety that the managed runtime provides (e.g. I can just do str[200] = 0 and not get any bounds-check error).


    The /clr:safe compiler switch generates MSIL-only assemblies whose IL is fully verifiable. The output is no different from anything generated by, say, the C# or VB.NET compilers. This provides more security for the code, but at the same time loses several capabilities over and above the pure variant:

    1. No support for CRT
    2. Only explicit P/Invokes

    So for /CLR:Safe we need to do the following

    void NativePrint(int i);                   // hand-coded P/Invoke declaration,
                                               // marked with [DllImport] on the native DLL

    void ManagedPrint(int n)
    {
        //char str[3000] = "some cool number"; // will fail to compile with
        //str[5] = 'f';                        // "this type is not verifiable"
        Console::WriteLine(L"Managed {0}", n);
        NativePrint(n);                        // Hand coded P/Invoke
    }


    MSDN has some nice articles on migrating from /clr:

    1. To /clr:pure http://msdn.microsoft.com/en-US/library/ms173253(v=vs.80).aspx
    2. To /clr:safe http://msdn.microsoft.com/en-US/library/ykbbt679(v=vs.80).aspx
  • I know the answer (it's 42)

    Windows Phone 8: Evolution of the Runtime and Application Compatibility


    A long time back, in the wake of the release of Windows Phone 7 (WP7), I posted about the Windows Phone 7 series programming model. I also published how .NET Compact Framework powered the applications on WP7.

    Further simplifying the block diagram, we can think of the entire WP7 application system as follows

    As with most block diagrams, this is a gross simplification. However, I hope it helps to easily picture the entire system.

    Essentially the application can be purely managed (written in, say, C# or VB.NET). The application can only utilize services exposed by the developer platform and core services provided by .NET Compact Framework. The application can in no way directly use native code or talk to the OS (say, call a Win32 API). It always has to go through the runtime infrastructure and runs in a security sandbox.

    The application manager is the loose term I am using to encompass everything that is used to manage the application, including the host.

    Windows Phone 8 (WP8) is a huge huge change from Windows Phone 7.x (WP7). From the perspective of a WP7 application running on a WP8 device the system looks as follows


    Everything in green in this diagram got outright replaced with an entirely new codebase, and the rest of the system, other than the application itself, was heavily modified to work with the new OS and the new managed runtime.

    Shared Windows Core

    The OS moved away from the Windows Compact Embedded (WinCE) OS core that was used in WP7 to a new OS which shares its core with desktop Windows 8. This means that a bunch of things in the WP8 OS are shared with the desktop implementation, including the kernel, networking, driver framework and others. The shared core obviously brings great value, as innovations and features will more easily flow across the two form factors (device and desktop), and it also reduces engineering redundancy on Microsoft’s side. Some of the benefits are readily visible today, like great multi-core support and WinRT interop; others are more subtle.


    The .NET Compact Framework (NETCF) that was used in WP7 has a very different design philosophy, and hence a completely different implementation, from the desktop .NET. I will have a follow-up post on this, but for now it suffices to note that NETCF is a very portable runtime that is designed to be versatile and cross-platform. The desktop CLR, on the other hand, is more closely tied to Windows and the processor architecture. It works closely with the OS and the underlying hardware to give the maximum performance benefit to managed code running on it.

    With the new Windows RT, which runs on ARM, the desktop CLR was in any case updated to work on the ARM processor. So when the phone moved to the shared core, it was an obvious choice to move to this CLR as well. This gave the same benefits of shared innovation and reduced engineering redundancy.

    The full desktop CLR is heavier and provides functionality that is not really required by phone scenarios, so a lighter variant of it (built from the same source), called CoreCLR, was chosen for WP8. CoreCLR is the evolution of the lightweight runtime that powered Silverlight. With the move to CoreCLR, developers get a much faster runtime with an extended feature set that includes interop via WinRT.

    Backward Compat

    One of the simple statements made during all of the WP8 launch presentations was that applications in the store built for WP7 will work as-is on WP8. This is a small statement but a huge achievement and effort from the runtime implementation perspective. Making apps work as-is when the entire runtime, OS and chipset have changed is non-trivial, to say the least. Our team worked very hard to make this possible.

    One of the biggest things that played out to our benefit was that WP7 apps were fully sandboxed and couldn't use any native code. This means they had no way of taking behavioral dependencies on the OS APIs or the underlying hardware. The OS APIs were accessed via the CLR, which could always add quirks to expose consistent behavior to the applications.

    API Compatibility
    This required us to ensure that CoreCLR exposes the same API set as NETCF. We ran various automated tools to manage the API surface area changes and retain meaningful API compat between WP7 and WP8. With a closed application store it was possible for us to gather complete metrics on API usage and correctly prioritize engineering resources to ensure that the majority of applications continued to see the same API set in signature and semantics.

    We also needed to ensure that the same APIs behave as closely as possible to those provided by NETCF. We tested a lot of applications in the app store to get as close as we could, and believe that we are at a place that should allow most WP7 applications' API usage to transparently fall over to the new runtime.

    Runtime behavior changes
    When a runtime changes, behavioral changes can expose pre-existing issues in applications. This includes things like timing differences. Even though these runtime behaviors are not documented, or in some cases it is explicitly called out that user code should not take dependencies on them, some apps still did.

    A couple of the examples we saw:

    1. Taking dependencies on finalization order
      Even though the CLI specification clearly calls out that finalization order in .NET is not deterministic, code still took subtle dependencies on it. In one particular case, object F1 used a file and its finalizer released it. Later, another object F2 opened the same file. With the change in runtime, GC timing changed so that when F2 tried to open the file, F1 had already been collected but had not yet had its finalizer run, so the file was still held. Hence the application crashed. We got in touch with the app developer and had the code moved to the proper dispose pattern
    2. GC timing and changes in the number of generations
      Application finalizers, contrary to .NET guidelines, modified other managed objects. The changed GC timing and generations meant those objects had already been collected, which resulted in finalizer crashes
    3. Threading issues
      This was one of the painful ones. With the change in OS (and hence the thread scheduler) and the addition of multiple cores in the ARM CPU, a lot of subtle races and deadlocks in applications got exposed. These were pre-existing issues where synchronization primitives were not used correctly, or where code relied on the quanta-based thread scheduling WinCE did. With the move to WP8, threads run in parallel on different cores and are scheduled in a different order; this led to various deadlocks and race-driven crashes. Again, we had to get in touch with the app developers to address these issues
    4. There were other cases where the exact precision of floating point math was relied on. This resulted in a board game where pucks flew off its surface
    5. Whether or not functions got inlined
    6. Order of static constructor initialization (especially in conjunction with function in-lining)
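    The finalization-order pitfall in the first example can be sketched as follows (a minimal Python analogy with invented names, not the original C# app; finalizer ordering is unspecified in both .NET and Python, so only deterministic disposal is guaranteed to be correct):

```python
class FileUser:
    """Holds a named file; simulates exclusive open-file ownership."""
    open_files = set()          # stands in for the OS's open-file table

    def __init__(self, name):
        if name in FileUser.open_files:
            raise OSError(f"{name} is already open")
        self.name = name
        FileUser.open_files.add(name)

    def close(self):            # the "dispose pattern": deterministic release
        FileUser.open_files.discard(self.name)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

# Correct: the first user releases the file before the second opens it,
# regardless of when (or whether) any finalizer runs.
with FileUser("data.txt"):
    pass
f2 = FileUser("data.txt")       # succeeds: prior user closed deterministically
f2.close()
```

    The fix we asked the developer to make corresponds to the `with`/`close` path above: release the resource deterministically instead of relying on when the finalizer happens to run.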

    We addressed some of these issues where it was realistic to fix them in the runtime. For some of the others we got in touch with the application developers. Obviously, not all applications in the store, nor all of their features, were tested. So you should test your applications when you get access to WP8.

    At one point we were all playing games on our phones, telling our managers that we were compat testing Need For Speed and needed to test till level 10 :)


    Test Driven Development of a Generational Garbage Collector

    [Photo: Crater Lake, OR panorama]

    These days everyone is talking about being agile and test driven development (TDD). I wanted to share a success story of TDD that we employed for developing Generational Garbage Collector (GC) for Windows Phone Mango.

    The .NET runtime on Windows Phone 7 shipped with a mark-sweep-compact, stop-the-world, global, non-generational GC. Once a GC was triggered, it stopped all managed execution and scanned the entire managed heap to look up all managed references and clean up objects that were not in use. Due to performance bottlenecks we decided to enhance the GC by adding a generational GC (referred to as GenGC). However, after the General Availability (GA) of WP7 we had a very short coding window. Replacing such a fundamental piece of the runtime in that short window was very risky. So we decided to build various kinds of stress infrastructure first, and then develop the GC. So essentially:

    1. Write the tests
    2. See those tests failing
    3. Write code for the generational GC
    4. Get tests to pass
    5. Use the tests for regression tracking as we refactor the code and make it run faster


    Now, building tests for a GC is not the equivalent of traditional testing of features or APIs, where you write tests that call into a mocked-up API and see them fail until you add the right functionality. Rather, these tests were verification modes and combinations of runtime stresses that we wrote.

    To appreciate the testing steps we took, do read the Back To Basics: Generational Garbage Collection and WP7 Mango: Mark-Sweep collection and how does a Generational GC help posts

    Essentially, in a generational GC run, all of the following references should be discovered by the GC:

    1. Gen0 Objects reachable via roots (root –> object OR recursively root –> object –> object )
    2. Objects accessible from runtime native code (e.g. pinned for PInvoke, COM interop, internal runtime references)
    3. Objects referenced via Gen1 –> Gen0 pointers

    The first two were in any case heavily covered by our traditional GC tests; #3 was the new area being added.

    To implement a correct generational GC we needed to ensure that all places in the runtime where managed object references are updated get reflected in the CardTable (#3 above). This is a daunting task and prone to bugs of omission, as we need to ensure that:

    1. For all forms of assignment in the MSIL that we JIT, calls to update the CardTable are emitted.
    2. At all places in the native runtime code where such references are directly or indirectly updated, the same is ensured. This includes all JIT worker routines, COM interop and the marshallers.
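    Conceptually, every reference store has to funnel through a write barrier that records the updated location in the card table. A minimal sketch of the idea (Python, over a simulated heap with invented names and an illustrative card size; the real barrier is native code emitted by the JIT):

```python
CARD_SIZE = 128                  # bytes covered by one card (illustrative value)

card_table = bytearray(1024)     # one byte per card; whole-byte writes only
sim_heap = {}                    # simulated heap: address -> referenced address

def write_barrier(obj_addr, field_offset, new_ref):
    """Store new_ref into the object's field and mark the covering card."""
    addr = obj_addr + field_offset
    sim_heap[addr] = new_ref
    # Whole-byte update (not a single bit) for lock-free write atomicity.
    card_table[addr // CARD_SIZE] = 1

# Every reference assignment -- in JITed code and in the native runtime --
# must go through this; one missed call site can let a live Gen0 object die.
write_barrier(0x200, 8, 0x40)
```

    A GenGC run then only has to scan objects under marked cards, rather than the whole older generation, to find the Gen1 –> Gen0 references.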


    If a single instance is missed, valid/reachable Gen0 objects can get collected (deleted), which in the longer run results in memory corruption and crashes that are hard, if not impossible, to debug. This was assessed to be the biggest risk to shipping the generational GC.

    The other problem is that these potential omissions can be exposed only by certain orderings of allocation and collection. E.g. a missing tracked reference A –> B results in a GC issue only if a GC happened between the allocations of A and B (A being in a higher generation than B). Also, for performance reasons (write atomicity for lock-less updates), for every assignment A = B we do not update just the card-table bit that covers the memory area of A; rather, we update the whole byte in the card table. This means an update to A also covers other objects allocated adjacent to A. Hence, if an update to an object just beside A in memory is missed, the omission will not be discovered until some other run where that object happens to be allocated farther away from A.
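    The masking effect falls out of simple arithmetic (illustrative card size, not the shipping configuration):

```python
CARD_BYTES = 128                       # bytes per card-table byte (illustrative)

def card_index(addr):
    """Which card-table byte covers this heap address."""
    return addr // CARD_BYTES

# Two adjacently allocated objects fall under the same card...
assert card_index(0x1000) == card_index(0x1010)

# ...so a correct barrier on a store to the first also "covers" a missed
# barrier on a store to the second. The omission only surfaces in a run
# where the second object lands on a different card:
assert card_index(0x1000) != card_index(0x2000)
```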

    GC Verification mode

    Our solution to all of these problems was to first create the GC verification mode. What this mode does is run the traditional full mark-sweep GC. While running that GC, it goes through all objects in memory and, as it traverses them, for every reference A (Gen1) –> B (Gen0) it verifies that the card-table bit for A is indeed set. This ensures that if a GenGC were to run, it would not miss that reference.
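    A sketch of that check (Python, over a toy heap model with invented structures; the real mode piggybacks on the mark phase of the native collector):

```python
def verify_card_table(objects, card_table, card_size):
    """Full-heap walk, as the mark phase would do.

    objects: list of (addr, generation, list_of_referenced_addrs).
    Returns every higher-gen -> lower-gen reference whose covering card
    is unset, i.e. a reference a generational collection would miss.
    """
    gen_of = {addr: gen for addr, gen, _ in objects}
    missed = []
    for addr, gen, refs in objects:
        for ref in refs:
            if gen > gen_of[ref] and not card_table[addr // card_size]:
                missed.append((addr, ref))
    return missed

# Toy heap: a Gen1 object at 0x100 referencing a Gen0 object at 0x300.
heap = [(0x100, 1, [0x300]), (0x300, 0, [])]
cards = bytearray(16)                  # all cards unset: barrier was "missed"
assert verify_card_table(heap, cards, 128) == [(0x100, 0x300)]

cards[0x100 // 128] = 1                # barrier ran: covering card is marked
assert verify_card_table(heap, cards, 128) == []
```

    Any entry returned is exactly a reference a generational collection would have silently missed, caught here while the full collector still has ground truth.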

    Granular CardTable

    We used a very granular card-table resolution for test runs. For these special runs each bit of the card table corresponded to almost one object (1 bit to 2 bytes of resolution). Even though the card-table size exploded, it was fine because this wasn't a shipping configuration. This spaced out the objects covered by the card table and exposed adjacent objects not being updated.

    GC Stress

    In addition we ran the GC stress mode, where we made the GC run extremely frequently (we could push it up to a GC on every allocation). The allocator was also updated to randomize allocations so that objects moved around everywhere in memory.

    Hole Finder

    The hole finder moves all objects around in memory after a GC. This exposes stale-pointer issues: if a reference didn't get updated properly by the GC, it now points to invalid memory, because all previous memory locations are invalid. A subsequent write will fail fast with an access violation (AV), so we can easily detect the point of failure.

    With all of these changes we ran the entire test suites. Also, by throttling down the GC stress mode, we could still use the runtime to run real apps on the phone. Let me tell you, playing NFS on a device with the verification mode on wasn't fun :)

    With these precautions we ensured that not a single GenGC bug came in from the phone. It shipped rock solid, and we were more confident with code churn because regressions would always be caught. I actually never blogged about this because I felt that if I did, it'd jinx something :)


    Core Parking


    For some time now, my main box had gotten a bit slow and was glitching all the time. After some investigation I found that a power profile imposed by our IT department enabled CPU parking on my machine. This effectively parks CPUs under low-load conditions to save power. However,

    1. This affects high-load conditions as well. There is a perceptible delay between load hitting the computer and the CPUs being unparked. Also, some CPUs remain parked for spike loads (large but short-duration usage spikes)
    2. This is a desktop, not a laptop, so I really do not care that much about power consumption

    Windows Task Manager (Ctrl + Shift + Esc, then the Performance tab) clearly shows this parking feature; the 3 parked cores show flatlines


    You can also find out if your machine is behaving the same way from Task Manager -> Performance Tab -> Resource Monitor -> CPU Tab. The CPU graphs on the right will show which cores, if any, are parked.


    To disable this you need to

    1. Find all occurrences of 0cc5b647-c1df-4637-891a-dec35c318583 in the Windows registry (there will be multiple of those)
    2. For all these occurrences set the ValueMin and ValueMax keys to 0
    3. Reboot

    For detailed steps see the video http://www.youtube.com/watch?feature=player_detailpage&v=mL-w3X0PQnk#t=85s

    Everything is back to normal once this is taken care of :)

