I know the answer (it's 42)

A blog on coding, .NET, .NET Compact Framework and life in general....

Posts
  • I know the answer (it's 42)

    Arduino Fun – Door Entry Alarm

    • 4 Comments
     
    Arduino UNO based door entry alarm

    Physical computing and “internet of things” is a super exciting area that is unfolding right now. Even decades back one could hook up sensors and remotely get those data and process it. What is special now is that powerful micro-controllers are dirt cheap and most of us have in our pockets a really powerful computing device. Connecting everything wirelessly is also very easy now and almost every home has a wireless network.

    All of these put together can create some really compelling and cool stuff where data travels from sensor over wireless networks into the cloud and finally into the cell phone we carry everywhere. I finally want to create a smart door so that I can get an notification while at work when someone knocks at our home door. Maybe I can remotely open the door. The possibilities are endless, but time is not, so lets see how far I get in some reasonable amount of time.

    Arduino UNO

    I unboxed the Arduino Uno Ultimate Starter Kit that I had got last week and spend some time with my daughter setting it up. The kit comes with a fantastic manual that helped me recap the basics of electronics. It contains an Arduino UNO prototyping board based on ATmega328 chip. It is a low-power 8-bit microprocessor running at max 20 MHz. To most people that seems paltry but it’s amazing what you can pull off with one of these chips.

    The final assembled setup of the kit looks as follows. It comes with a nice handy plastic surface on which the Arduino (in blue) and a breadboard is stuck. It’s connected over USB to my PC.

    IMG_9181

    Getting everything up was easy with the instruction booklet that came. Only glitch was that Windows 8 wouldn’t let me install the drivers because they are not properly signed. So I had to follow the steps given here to disable driver verification.

    Post that the Arduino IDE connected to the board and I could easily write and deploy code (C like syntax).

    image

    The tool bar icons are a bit weird though (side arrow for upload and up arrow for open????).

    There was no way to debug through the IDE (or at least couldn’t find one). So I setup some easy printf style debugging. Basically you write to the serial port and the IDE displays it.

    image

    It was after this that I got to know that there’s a Visual Studio plugin with full debugging support. However, I haven’t yet used that.

    The Project

    imageI decided to start out with making a simple entry alarm and see how much time it takes to get everything done. In college I built something similar, but without a microcontroller (based on 555 IC and IR photo-transistors) and it took decent amount of time to hook up all the components. Basically the idea is that across the door there will be some source of light and a sensor will be on the other side. When someone passes in between the light on the sensor will be obstructed and this will sound an alarm.

    When I last did it in college I really made it robust by using pulsating (at fixed frequency) IR LED as source and IR sensors. Now for this project I relied on visible light and the photo-resistor that came with the kit.

    I built the following circuit.

    image

    Connected a photo-resistor in series with another 10K resistor and connected the junction to the analog input pin A0 of Arduino. Essentially this acts like a voltage divider. In bright light the junction and hence A0 input reads around 1.1 V. When light is obstructed the resistance of photo-resistor changes and the junction reads 2.6 V. The analog pins read in a range of 0 (means 0 volt) and 1023 (for 5V). So this roughly comes to around 225 in light and 530 in the shade. Obviously these are relative based on the strength of the light and how dark it becomes when someone obstructs the light. To avoid taking absolute dependency on the value I created another voltage divider using a potentiometer and connected that to another analog input pin A1. So now I can change the potentiometer to control a threshold value. If the voltage of A0 is above this threshold it would mean that it’s dark enough that someone obstructed the light falling on the resistor and it’s time to sound the alarm.

    The alarm consists of flashing blue and red LEDs (obviously to match police lights) and a standard siren sound played using a piezo crystal that also came with the kit.

    This full assembled and deployed setup looks as follows.

    IMG_9167

    Update*** The picture above says photo-transistor, it should be photo-resistor

    Code

    The key functions are setup() which is automatically called at startup and loop() which as the name suggests is called in a loop.setup() sets up the digital pins for output to drive the flashing LEDS. In loop() I read in the values of  photo-resistor and that from the potentiometer. Based on comparison I sound the alarm

    // Define the constants
    const int sensorPin = 0; // Photo-resistor pin
    const int controlPin = 1; // Potentiometer pin
    const int buzzerPin = 9; // Buzzer pin
    const int rLedPin = 10; // Red LED pin
    const int bLedPin = 11; // Blue LED pin

    // Always called at startup
    void setup()
    {
    // Set the two LED pins as output
    pinMode(rLedPin, OUTPUT);
    pinMode(buzzerPin, OUTPUT);
    }

    // This loops forever
    void loop()
    {
    int sensorVal = analogRead(sensorPin);
    int controlVal = analogRead(controlPin);

    if(sensorVal < controlVal)
    {
    // Light is below threshold so sound buzzer
    playBuzzer(buzzerPin);
    }

    delay(100);
    }

    void playBuzzer(const int buzzerPin)
    {
    for(int i = 0; i < 3; ++i)
    {
    // alternate between two tones, one high and one low
    // at the same time alternate the blue and red LED flashing

    digitalWrite(rLedPin, HIGH); // Red LED on
    tone(buzzerPin, 400); // play 400 Hz tone for 500 ms
    delay(500);
    digitalWrite(rLedPin, LOW); // RED LED off

    digitalWrite(bLedPin, HIGH); // Blue LED on
    tone(buzzerPin, 800); // play 800Hz tone for 500ms
    delay(500);
    digitalWrite(bLedPin, LOW); // Blue LED off
    }

    // Stop the buzzer
    noTone(buzzerPin);
    }

    imageNext Steps

    This system has some obvious flaws. Someone can duck below or over the light-path or even shine a flashlight on the sensor while passing through. To make this robust consider using strips of mirrors on the two side and then use a laser (preferably IR) bouncing off them so that it’s virtually impossible to get through without breaking the light

    Also you can also use a pulsating source of light and detect the frequency on the sensor. This will just make it more harder to break.

  • I know the answer (it's 42)

    C# code for Creating Shortcuts with Admin Privilege

    • 1 Comments
    Seattle skyline

    Context

    If you just care about the code, then jump to the end Smile

    In the CLR team and across other engineering teams in Microsoft we use build environments which are essentially command shells with a custom environment setup using bat and cmd scripts. These scripts setup various paths and environment variables to pick up the right set of build tools and output paths matching the architecture and build flavor. E.g. one example of launching such a shell could be…

    cmd.exe /k %BRANCH_PATH%\buildenv.bat <architecture> <build-flavor> <build-types> <etc...>

    The architectures can vary between things like x86, amd64, ARM, build flavors vary between debug, check, retail, release, etc… The build-types indicate the target like desktop, CoreCLR, metro. Even though all combination is not allowed, the allowed combination approaches around 30. In case of .NET Compact Framework which supports way more architectures (e.g. MIPS, PPC) and targets (e.g. Xbox360, S60) the combination is even larger.

    For day to day development I either need to enter this command each time to move to a different shell, or I have to create desktop shortcuts for all the combination. This becomes repetitive each time I move to a different code branch. I had created a small app that I ran each time I moved to a new branch and it would generate all the combination of shortcut given the branch details. However, our build requires elevation (admin-privilege). So even though I created the shortcuts, I’d have to either right click and use “Run as administrator” OR set that in the shortcuts property.

    image

    image

    This was a nagging pain for me. I couldn’t find any easy programmatic way to create a shortcut with Administrator privilege (I’m sure there is some shell API to do that). So finally I binary compared two shortcuts, one with the “Run as administrator” and one without. I saw that only one byte was different. So I hacked up a code to generate the shortcut and then modify the byte. I am sure there is better/safer way to do this, but for now this “Works for me”.

    The Code

    Since I didn’t find any online source for this code, I thought I’d share. Do note that this is a major hack and uses un-documented stuff. I’d never do this for shipping code or for that matter anything someone other than me would rely on. So use at your own risk… Also if you have a better solution let me know and I will use that…

       1:  // file-path of the shortcut (*.lnk file)
       2:  string shortcutPath = Path.Combine(shortCutFolder, string.Format("{0} {1}{2}.lnk", arch, flavor, extra));
       3:  Console.WriteLine("Creating {0}", shortcutPath);
       4:  // the contents of the shortcut
       5:  string arguments = string.Format("{0} {1} {2} {3}{4} {5}", "/k", clrEnvPath, arch, flavor, extra, precmd);
       6:   
       7:  // shell API to create the shortcut
       8:  IWshShortcut shortcut = (IWshShortcut)shell.CreateShortcut(shortcutPath);
       9:  shortcut.TargetPath = cmdPath;
      10:  shortcut.Arguments = arguments;
      11:  shortcut.IconLocation = "cmd.exe, 0";
      12:  shortcut.Description = string.Format("Launches clrenv for {0} {1} {2}", arch, flavor, extra);
      13:  shortcut.Save();
      14:   
      15:  // HACKHACK: update the link's byte to indicate that this is a admin shortcut
      16:  using (FileStream fs = new FileStream(shortcutPath, FileMode.Open, FileAccess.ReadWrite))
      17:  {
      18:      fs.Seek(21, SeekOrigin.Begin);
      19:      fs.WriteByte(0x22);
      20:  }
  • I know the answer (it's 42)

    Moving to Outlook from Google Reader

    • 3 Comments

    I am sure everyone by now knows that Google Reader is being shutdown. I am a heavy user of Google Reader or Greeder as I call it and I immediately started looking for an alternative, when this suddenly occurred to me, that all PC’s I use have Outlook installed on them. So if you work for an organization that runs on Exchange server, this could really work out well. You can use Office Outlook and Exchange as a great RSS feed reader. Consider this

    1. It will provide full sync across multiple Outlook clients running on different PCs
    2. It will provide on the go access via Outlook Web-access
    3. Your phone’s outlook client should also work with it
    4. You can pretend to work while wasting time on boingboing

    First things first: Export the opml file from Google Reader

    Login to www.google.com and then go to https://www.google.com/takeout/#custom:reader

    image

    This will take some time and create an archive.

    image

    Click on the download button and save the zip. Then extract the zip as follows

    image

    Inside the extracted folder you will have the opml file. For me it’s in C:\Users\admin\Desktop\XXXXXXXX@gmail.com-takeout\XXXXXXXX@gmail.com-takeout\Reader\subscriptions.xml

    Import to Outlook

    This opml file needs to be imported into outlook. Use the File tab and bring up the following UI in Outlook.

    image

    Then use Import. To bring up the following

    image

    Choose OPML file and tap on Next. Now point it to the file you extracted. For me it was C:\Users\admin\Desktop\XXXXXXXX@gmail.com-takeout\XXXXXXXX@gmail.com-takeout\Reader\subscriptions.xml

    Hit next and finally choose the feeds you want to import (Select All).

    image

    The tap on Next and here you have Outlook as an Rss feed reader…

    image

    Read Everywhere

    It totally sync’s on the cloud. Here I have it open on the browser. As you read a post it tracks what you are reading and across the browsers and all your outlook clients at work and home it will update and keep everything in sync.

    image

    Works great on the Windows Phone as well. I assume any Exchange client should work.

    wp_ss_20130314_0002wp_ss_20130314_0001

    Pain Points

    While reading on Outlook was seamless, there are some usability issues in both the browser and the phone. Surface RT is broken. Someone should really fix the mail client on Surface Sad smile

    The major paint point I see is that in the Outlook Web Access the pictures are not shown. Tapping on the picture placeholders work. I think some security feature is blocking the embedded images.

    Also on the Windows Phone you have to go to each feed and set it up so that it syncs that folder. This is a pain but I guess this is to protect against download over the carrier networks.

  • I know the answer (it's 42)

    C++/CLI and mixed mode programming

    • 2 Comments
    beach

    I had very limited idea about how mixed mode programming on .NET works. In mixed mode the binary can have both native and managed code. They are generally programmed in a special variant of the C++ language called C++/CLI and the sources needs to be compiled with /CLR switch.

    For some recent work I am doing I had to ramp up on Managed C++ usage and how the .NET runtime supports the mixed mode assemblies generated by it. I wrote up some notes for myself and later thought that it might be helpful for others trying to understand the inner workings.

    History

    The initial foray of C++ into the managed world was via the managed extension for C++ or MC++. This is deprecated now and was originally released on VS 2003.  This MC++ syntax turned out to be too confusing and wasn’t adopted well. The MC++ was soon replaced with C++/CLI. C++/CLI added limited extension over C++ and was more well designed so that the language feels more in sync with the general C++ language specification.

    C++/CLI

    The code looks like below.

    ref class CFoo
    {
    public:
        CFoo()
        {
            pI = new int;
            *pI = 42;
            str = L"Hello";
        }
    
        void ShowFoo()
        {
            printf("%d\n", *pI);
            Console::WriteLine(str);
        }
    
        int *pI;
        String^ str;
    };

    In this code we are defining a reference type class CFoo. This class uses both managed (str) and native (pI) data types and seamlessly calls into managed and native code. There is no special code required to be written by the developer for the interop.

    The managed type uses special handles denoted by ^ as in String^ and native pointers continue to use * as in int*. A nice comparison between C++/CLI and C# syntax is available at the end of http://msdn.microsoft.com/en-US/library/ms379617(v=VS.80).aspx. Junfeng also has a good post at http://blogs.msdn.com/b/junfeng/archive/2006/05/20/599434.aspx

    The benefits of using mixed mode

    1. Easy to port over C++ code and take the benefit of integrating with other managed code
    2. Access to the extensive managed API surface area
    3. Seamless managed to native and native to managed calls
    4. Static-type checking is available (so no mismatched P/Invoke signatures)
    5. Performance of native code where required
    6. Predictable finalization of native code (e.g. stack based deterministic cleanup)

     

    Implicit Managed and Native Interop

    Seamless, static type-checked, implicit, interop between managed and native code is the biggest draw to C++/CLI.

    Calls from managed to native and vice versa are transparently handled and can be intermixed. E.g. managed --> unmanaged --> managed calls are transparently handled without the developer having to do anything special. This technology is called IJW (it just works). We will use the following code to understand the flow.

    #pragma managed
    void ManagedAgain(int n)
    {
        Console::WriteLine(L"Managed again {0}", n);
    }
    
    #pragma unmanaged
    void NativePrint(int n)
    {
        wprintf(L"Native Hello World %u\n\n", n);
        ManagedAgain(n);
    }
    
    #pragma managed
    
    void ManagedPrint(int n)
    {
        Console::WriteLine(L"Managed {0}", n);
        NativePrint(n);
    }
    

    The call flow goes from ManagedPrint --> NativePrint –> ManagedAgain

    Native to Managed

    For every managed method a managed and an unmanaged entry point is created by the C++ compiler. The unmanaged entry point is a thunk/call-forwarder, it sets up the right managed context and calls into the managed entry point. It is called the IJW thunk.

    When a native function calls into a managed function the compiler actually binds the call to the native forwarding entry point for the managed function. If we inspect the disassembly of the NativePrint we see the following code is generated to call into the ManagedAgain function

    00D41084  mov         ecx,dword ptr [n]         // Store NativePrint argument n to ECX
    00D41087  push        ecx                       // Push n onto stack
    00D41088  call        ManagedAgain (0D4105Dh)   // Call IJW Thunk
    
    

    Now at 0x0D4105D is the address for the native entry point. If forwards the call to the actual managed implementation

    ManagedAgain:
    00D4105D  jmp         dword ptr [__mep@?ManagedAgain@@$$FYAXH@Z (0D4D000h)]  
    

    Managed to Native

    In the case where a managed function calls into a native function standard P/Invoke is used. The compiler just defines a P/Invoke signature for the native function in MSIL

    .method assembly static pinvokeimpl(/* No map */) 
            void modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) 
            NativePrint(int32 A_0) native unmanaged preservesig
    {
      .custom instance void [mscorlib]System.Security.SuppressUnmanagedCodeSecurityAttribute::.ctor() = ( 01 00 00 00 ) 
      // Embedded native code
      // Disassembly of native methods is not supported.
      //  Managed TargetRVA = 0x00001070
    } // end of method 'Global Functions'::NativePrint
    
    

    The managed to native call in IL looks as

    Manged IL:
      IL_0010:  ldarg.0
      IL_0011:  call void modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) NativePrint(int32)
    

    The virtual machine (CLR) at runtime generates the correct thunk to get the managed code to P/Invoke into native code. It also takes care of other things like marshaling the managed argument to native and vice-versa.

    Managed to Managed

    While it would seem this should be easy, it was a bit more convoluted. Essentially the compiler always bound to native entry point for a given managed method. So a managed to managed call degenerated to managed -> native -> managed and hence resulted in suboptimal double P/Invoke. See http://msdn.microsoft.com/en-us/library/ms235292(v=VS.80).aspx

    This was fixed in later versions by using dynamic checks and ensuring managed calls always call into managed targets directly. However, in some cases managed to managed calls still degenerate to double P/Invoke. So an additional knob provided was the __clrcall calling convention keyword. This will stop the native entry point from being generated completely. The pitfall is that these methods are not callable from native code. So if I stick in a __clrcall infront of ManagedAgain I get the following build error while compiling NativePrint.

    Error	2	error C3642: 'void ManagedAgain(int)' : cannot call a function with
    __clrcall calling convention from native code <filename>

    /CLR:PURE

    If a C++ file is compiled with this flag, instead of mixed mode assembly (one that has both native and MSIL) a pure MSIL assembly is generated. So all methods are __clrcall and the Cpp code is compiled into MSIL code and NOT to native code.

    This comes with some benefits as in the assembly becomes a standard MSIL based assembly which is no different from another managed only assembly. Also it comes with some limitation. Native code cannot call into the managed codes in this assembly because there is no native entry point to call into. However, native data is supported and also the managed code can transparently call into other native code. Let's see a sample

    I moved all the unmanaged code to a separate /C++:CLI dll as

    void NativePrint(int n)
    {
        wprintf(L"Native Hello World %u\n\n", n);
    }
    

    Then I moved my managed C++ code to a new project and compiled it with /C++:PURE

    #include "stdafx.h"
    #include 
    
    #include "..\Unmanaged\Unmanaged.h"
    using namespace System;
    
    void ManagedPrint(int n)
    {
        char str[30] = "some cool number";     // native data  
        str[5] = 'f';                          // modifying native data
        Console::WriteLine(L"Managed {0}", n); // call to BCL
        NativePrint(n);                        // call to my own native methods
        printf("%s %d\n\n", str, n);           // CRT
    }
    
    int main(array ^args)
    {
        ManagedPrint(42);
        return 0;
    }
    

    The above builds and works fine. So even with C/++:PURE I was able to

    1. Use native data like a char array and modify it
    2. Call into BCL (Console::WriteLine)
    3. Call transparently into other native code without having to hand generate P/Invoke signatures
    4. Use native CRT (printf)

    However, no native code can call into ManagedPrint. Also do note that even though Pure MSIL is generated, the code is unverifiable (think C# unsafe). So it doesn't get the added safety that the managed runtime provides (e.g. I can just do str[200]  = 0 and not get any bounds check error)

    /CLR:Safe

    /CLR:safe compiler switch generates MSIL only assemblies whose IL is fully verifiable. The output is not different from anything generated from say C# or VB.NET compilers. This provides more security to the code but at the same time losses on several capabilities over and above the PURE variant

    1. No support for CRT
    1. Only explicit P/Invokes

    So for /CLR:Safe we need to do the following

    [DllImport("Unmanaged.dll")]
    void NativePrint(int i);
    
    void ManagedPrint(int n)
    {
        //char str[3000] = "some cool number"; // will fail to compile with  
        //str[5] = 'f';                        // "this type is not verifiable"
    
        Console::WriteLine(L"Managed {0}", n);
    
        NativePrint(n);                        // Hand coded P/Invoke
    

    Migration

    MSDN has some nice articles on people trying to migrate from /CLR to

    1. To /CLR:Pure http://msdn.microsoft.com/en-US/library/ms173253(v=vs.80).aspx
    1. To /CLR:Safe http://msdn.microsoft.com/en-US/library/ykbbt679(v=vs.80).aspx
  • I know the answer (it's 42)

    Windows Phone 8: Evolution of the Runtime and Application Compatibility

    • 3 Comments

    Long time back at the wake of the release of Windows Phone 7 (WP7) I posted about the Windows Phone 7 series programming model. I also published how .NET Compact framework powered the applications on WP7.

    Further simplifying the block diagram, we can think of the entire WP7 application system as followsimage

    As with most block diagrams, this is gross simplifications. However, I hope it helps to easily picture the entire system.

    Essentially the application can be purely managed (written in say C# or VB.net). The application can only utilize services exposed by the developer platform and core services provided by .NET Compact Framework. The application can in no way directly use native code or talk to the OS (say call an Win32 API). It has to always go through the runtime infrastructure and is in a security sandbox.

    The application manager is the loose term I am using to encompass everything that is used to managed the application including the host.

    Windows Phone 8 (WP8) is a huge huge change from Windows Phone 7.x (WP7). From the perspective of a WP7 application running on a WP8 device the system looks as follows

    image

    Everything in Green in this diagram got outright replaced with entire new codebase and the rest of the system other than the application was heavily modified to work with the new OS and the new managed runtime.

    Shared Windows Core

    The OS moved away from Windows Compact Embedded (WinCE) OS core that was used in WP7 to a new OS which shares it’s core with the desktop Windows 8. This means that a bunch of things in the WP8 OS is shared with the desktop implementation, this includes things like kernel, networking, driver framework and others. The shared core obviously brings great value as innovations and features will more easily flow across the two form factors (device and desktop) and also reduce engineering redundancy on Microsoft side. Some of the benefits are readily visible today like great multi-core support, WinRT interop and others are more subtle.

    CoreCLR

    .NET Compact Framework (NETCF) that was used in WP7 has a very different design philosophy and hence a completely different implementation from the desktop .NET. I will have a follow up post on this but for now it suffices to note that .NETCF is a very portable runtime that is designed to be very versatile and cross platform. Desktop CLR on the other hand is more closely tied with Windows and the processor architecture. It closely works with the OS and the underlying HW to give the maximum performance benefit to managed code running on it.

    With the new Windows RT which works on ARM, desktop CLR was anyway updated to work on the ARM processor. So when the phone chose to move to shared core it was an obvious choice to move the CLR as well. This gave the same benefits of shared innovation and reduced engineering redundancy.

    The full desktop CLR is more heavy and provides functionality that is not really required by the phone scenarios and hence a lighter variant of it (which is built from the same source) called CoreCLR was chosen for WP8. CoreCLR is the evolution of the lightweight runtime that powered Silverlight. With the move to CoreCLR developers get a much faster runtime with extended feature set that includes interop via WinRT.

    Backward Compat

    One of the simple statements made during all of these WP8 launch presentations was that applications in the store built for WP7 will work as is for WP8. This is a small statement but is a huge achievement and effort from the runtime implementation perspective. Making apps work as is when the entire runtime, OS and chipset has changed is non-trivial to say the least. Our team worked very hard to make this possible.

    One of the biggest things that played out to our benefit was that the WP7 apps were fully sandboxed and couldn’t use any native code. This means that they didn’t have any means of taking behavioral dependence on the OS APIs or underlying HW. The OS APIs were used via the CLR and it could always add quirks to expose consistent behavior to the applications.

    API Compatibility
    This required us to ensure that CoreCLR exposes the same API set as NETCF. We ran various automated tools to manage the API surface area changes and retain meaningful API compat between WP7 and WP8. With a closed application store it was possible for us to gather complete metrics on API usage and correctly prioritize engineering resources to ensure that majority of applications continued to see the same API set in signature and semantics.

    We also needed to ensure that the same APIs behave as closely as possible with that provided by NETCF. We tested a lot of applications in the app store to get as close as we can and believe that we are at a place that should allow most WP7 application’s API usage to transparently fall over to the new runtime.

    Runtime behavior changes
    When a runtime changes there are behavioral changes that can expose pre-existing issues with applications. This includes things like timing differences. Even though these runtime behaviors are not documented, or in some cases especially called out that user code should not take dependence on them, some apps still did do that.

    Couple of the examples we saw

    1. Taking dependence on finalization order
      Even though CLI specification clearly calls out that finalization order in .NET is not deterministic, code still took subtle dependency on them. In one particular case object F1 used a file and it’s finalizer released it. Later another object F2 opens the same file. With change in the runtime, the timing of GC changed so that at the time F2 tried to open the file, F1 which has been already collected hasn’t yet had its finalizer run. Hence the application crashed. We got in touch with the app developer and got the code moved to use the right dispose pattern
    2. GC Timing and changes in number of generations
      Application finalizers contrary to .NET guidelines modified other managed objects. Now changed GC timings and generation resulted in those objects to have already been collected and hence resulted in finalizer crashes
    3. Threading issues
      This was one of the painful ones. With the change in OS and hence thread scheduler and addition of multiple cores in the ARM CPU, a lot of subtle races and deadlocks in the applications got exposed. These were pre-existing issues where synchronization primitives were not used correctly or some code relied on the quanta based thread scheduling WinCE did. With move to WP8, threads run in parallel on different cores and scheduled in different order, this lead to various deadlocks and race driven crashes. Again we had to get in touch with app developer to address these issues
    4. There were other cases where exact precision of floating point math was relied on. This resulted in a board game where pucks flew off from its surface
    5. Whether or not functions got inlined
    6. Order of static constructor initialization (especially in conjunction with function in-lining)

    We addressed some of these issues where it was realistic to fix them in the runtime. For some of the others we got in touch with the application developers. Obviously all applications in the store and all of their features were not tested. So you should try to test your applications when you have access to WP8.

    At one point we were all playing games on our phones telling our managers that I am compat testing Need For Speed and I need to test till level 10 :)

  • I know the answer (it's 42)

    Test Driven Development of a Generational Garbage Collection

    • 2 Comments
    Crater Lake, OR panorama

    These days everyone is talking about being agile and test driven development (TDD). I wanted to share a success story of TDD that we employed for developing Generational Garbage Collector (GC) for Windows Phone Mango.

    The .NET runtime on Windows Phone 7 shipped with a mark-sweep-compact; stop the world global non-generational GC. Once a GC was triggered, it stopped all managed execution and scanned the entire managed heap to look up all managed references and cleaned up objects that were not in use. Due to performance bottleneck we decided to enhance the GC by adding a generational GC (referred to as GenGC). However, post the General Availability or GA of WP7 we had a very short coding window. Replacing such a fundamental piece of the runtime in that short window was very risky. So we decided to build various kinds of stress infrastructure first, and then develop the GC. So essentially

    1. Write the tests
    2. See those tests failing
    3. Write code for the generational GC
    4. Get tests to pass
    5. Use the tests for regression tracking as we refactor the code and make it run faster

     

    Now building tests for a GC is not equivalent of traditional testing of features or APIs where you write tests to call into mocked up API, see it fail until you add the right functionality. Rather these tests where verifications modes and combination of runtime stresses that we wrote.

    To appreciate the testing steps we took do read the Back To Basics: Generational Garbage Collection and WP7 Mango: Mark-Sweep collection and how does a Generational GC help posts

    Essentially in a generational GC run all of the following references should be discovered by the GC

    1. Gen0 Objects reachable via roots (root –> object OR recursively root –> object –> object )
    2. Objects accessible from runtime native code (e.g. pinned for PInvoke, COM interop, internal runtime references)
    3. Objects referenced via Gen1 –> Gen0 pointers

    The first two were anyway heavily covered by our traditional GC tests. #3 being the new area being added.

    To implement a correct generational GC we needed to ensure that at all places in the runtime where managed object references are updated they need to get reflected in the CardTable (#3 above). This is a daunting task and prone to bugs via omission as we need to ensure that

    1. All forms of assignments in MSIL that we JIT there are calls to update the CardTable.
    2. All places in the native runtime code where such references are directly and or indirectly updated the same thing is ensured. This includes all JIT worker-routines, COM, Marshallers.

     

    If a single instance is missed then it would result in valid/reachable Gen0 objects being collected (or deleted) and hence in the longer run result in memory corruption, crashes that will be hard if not impossible to debug. This was assessed to be the biggest risk to shipping generational GC.

    The other problem is that these potential omissions can be only exposed by certain ordering of allocation and collection. E.g. only a missing tracked reference of A –> B can result in a GC issue only if a GC happened in between allocations of A and B (A is in higher generation than B). Also due to performance reasons (write atomicity for lock-less updates) for every assignment of A = B we do not update the card-table bit that covers the memory area of A. Rather we update the whole byte in the card-table. This means an update to A will cover other objects allocated adjacent to A. Hence if an update to an object just beside A in the memory is missed it will not be discovered until some other run where that object lands up being allocated farther away from A.

    GC Verification mode

    Our solution to all of these problems was to first create the GC verification mode. What this mode does is runs the traditional full mark-sweep GC. While running that GC it goes through all objects in the memory and as it traverses them for every reference A (Gen1) –> B(Gen0), it verifies that the card table bit for A is indeed set. This ensures that if a GenGC was to run, it would not miss that references

    Granular CardTable

    We used very high granular card-table resolution for test runs. For these special runs each bit of the card-table corresponded to almost one object (1 bit to 2 byte resolution). Even though the card-table size exploded it was fine because this wasn’t a shipping configuration. This spaced out objects covered by the card-table and exposed adjacent objects not being updated.

    GC Stress

    In addition we ran the GC stress mode, where we made the GC run extremely frequently (we could push it up to a GC in every allocation). The allocator was also updated to ensure that allocations were randomized so that objects moved around everywhere in the memory.

    Hole Finder

    Hole finder moves all objects around in memory after a GC. This exposes stale pointer issues. If an object didn’t get updated properly due to the GC it would now point to invalid memory because all previous memory locations are now invalid memory. So a subsequent write will fail-fast with AV and we can easily detect that point of failure.

    With all of these changes we ran the entire test suites. Also by throttling down the GC Stress mode we could still use the runtime to run real apps on the phone. Let me tell you playing NFS on a device with the verification mode, wasn’t fun :)

    With this precaution we ensured that not a single GenGC bug has come in from the phone. It shipped rock solid and we were more confident with code churn because regressions would always be caught. I actually never blogged about this because I felt that if I do, it’ll jinx something :)

  • I know the answer (it's 42)

    Core Parking

    • 5 Comments
    Rainier

    For some time now, my main box got a bit slow and was glitching all the time. After some investigation I found that some power profile imposed by our I T department enabled CPU parking on my machine. This effectively parks CPU on low load condition to save power. However,

    1. This effects high load conditions as well. There is a perceptible delay between the load hitting the computer and the CPU’s being unparked. Also some CPU’s remain parked for spike loads (large but smaller duration usage spikes)
    2. This is a desktop and not a laptop and hence I really do not care about power consumption that much

    Windows Task Manager (Ctrl + Shift + Esc and then Performance tab) clearly shows this parking feature. 3 parked cores show flatlines

    clip_image002

    You can also find out if your machine is behaving the same from Task Manager -> Performance Tab -> Resource Monitor -> CPU Tab. The CPU graphs on the right will show which cores if any are parked.

    clip_image004

    To disable this you need to

    1. Find all occurrences of 0cc5b647-c1df-4637-891a-dec35c318583 in the Windows registry (there will be multiple of those)
    2. For all these occurrences set the ValueMin and ValueMax keys to 0
    3. Reboot

    For detailed steps see the video http://www.youtube.com/watch?feature=player_detailpage&v=mL-w3X0PQnk#t=85s

    Everything is back to normal once this is taken care off Smile

    clip_image006

  • I know the answer (it's 42)

    Managing your Bugs

    • 0 Comments
    Rainier

    I have worked on bunch of large scale software product development that spanned multiple teams, thousands of engineers, many iterations. Some of those products plug into even larger products. So obviously we deal with a lot of bugs. Bugs filed on us, bugs we file on other partner teams, integration bugs, critical bugs, good to have fixes and plan ol crazy bugs. However, I have noticed one approach to handling those bugs which kind of bugs me.

    Some people just cannot make hard calls on the gray area bugs and keeps hanging them around. They bump the priority down and take those bugs from iteration to iteration. These bugs always hang around as Priority 2 or 3 (in a scale of Priority 0 through 3).

    I personally believe that bugs should be dealt with the same way emails should be dealt with. I always follow the 4 Ds for real bugs.

    Do it

    If the bug meets the fix bar and is critical enough, just go fix it. This is the high priority critical bug which rarely gives rise to any confusion. Something like the “GC crashes on GC_Sweep for application X”

    Delete it (or rather resolve it)

    If you feel a bug shouldn’t be fixed then simply resolve the bug. Give it back to the person who created the bug with clear reasons why you feel this shouldn’t be fixed. I know it’s hard to say “won’t fix”, but the problem is if one was to fix every bug in the system then the product will never ship. So just don’t defer the thought of making this hard call. There is no benefit in just keeping the bug hanging around for 6 months only to make the same call 2 weeks prior to shipping, citing time constraint as a reason. Make the hard call upfront. You can always tag the bug appropriately so that for later releases you can relook.

    Defer it

    Once you have made a call to fix the bug it should be correctly prioritized. With all bugs you know you won’t be fixing out of the way, its easy to make the right call of fixing it in the current development cycle or move it to the next iteration.

    Delegate it

    If you feel someone else needs to take a look at the bug because it’s not your area of expertise, QA needs to give better repro steps, someone else needs to make a call on it (see #1 and #2 above) then just assign it to them. No shame in accepting that you aren’t the best person to make a call.

  • I know the answer (it's 42)

    WP7: CLR Managed Object overhead

    • 2 Comments

    Winter day

    A reader contacted me over this blog to inquire about the overhead of managed objects on the Windows Phone. which means that when you use an object of size X the runtime actually uses (X + dx) where dx is the overhead of CLR’s book keeping. dx is generally fixed and hence if x is small and there are a lot of those objects it starts showing up as significant %.

    There seems to be a great deal of information about the desktop CLR in this area. I hope this post will fill the gap for NETCF (the CLR for the Windows Phone 7)

    All the details below are implementation detail and is provided just as guidance. Since these are not mandated by any specification (e.g. ECMA) they may and most probably will change.

    General layout

    The overhead varies on the type of objects. However, in general the object layout looks something like

    imageSo internally just before the actual object there is a small object-header. If the object is a finalizable object (object with a finalizer) then there’s another 4-byte pointer at the end of the object. 
    The exact size of the header and what's inside that header depends on the type of the object.

    Small objects (size < 16KB)

    All objects smaller than 16KB has a 8 byte object header.image

    The 32-bits in the flag are used to store the following information

    1. Locking
    2. GC information. E.g.
      1. Bits to mark objects
      2. Generation of the object
      3. Whether the object is pinned
    3. Hashcode (for GetHashCode support)
    4. Size of the object

    Since all objects are 4 byte aligned, the size is stored as an unit of 4 bytes and uses 11 bits. Which means that the maximum size that can be stored in that header is 16KB.

    The second flag contains a 32-bit pointer to the Class-Descriptor or the type information of that object. This field is overloaded and is also used by the GC during compaction phase.

    Since the header cost is fixed, the smaller the object the larger the overhead becomes in terms of % overhead.

    Large Objects

    As we saw in the previous section that the standard object header supports describing 16KB objects. However, we do need to support larger objects in the runtime.

    This is done using a special large-object header which is 20 bytes.image

    The overhead of large object comes to be at max of around 0.1%. A dedicated 32-bit size field allows the runtime to support very large objects (close to 2GB).

    It’s weird to see that the normal object header is repeated twice. The reason is that very large objects are rare on the phone. Hence the runtime tries to optimize it’s various operations so that the large object doesn’t bring in additional code paths and checks. Given an object reference having the normal header just before it helps us in speeding various operations. At the same time having the header as the first data-structure of all objects helps us to be faster in other cases. Bottom line is that the additional 8 bytes or 0.04% extra space usage helps us in enough performance gains at other places so that we pay that extra overhead cost.

    Pool overhead

    NETCF uses 64KB object pools and then sub-allocates objects from it. So in addition to the per-object overhead it uses an additional 44-byte of object-pool-header per 64kb. This means another 0.06% overhead per 64KB. For every object, the contribution of this can be ignored.

    Arrays

    Arrays pan out slightly differently.

    image

    In case of value type array every member doesn’t really need to have the type information and hence the object header associated with it. The Array header is a standard object header of 8 bytes or big object header of 20bytes (based on the size of the array). Then there is a 32-bit count of the number of elements in the array. For single dimension array there is no more overhead. However, for higher dimension arrays the length in each dimension is also stored to support runtime bounds check.

    The reference type arrays are similar but instead of the array contents being inlined into the array the elements are references to the objects on the managed heap and each of those objects have their standard headers.

    Putting it all together

    Standard object 8 bytes ( + 4 bytes for finalizable object)
    Large objects (>16KB) 20 bytes ( + 4 bytes for finalizable object)
    Arrays object overhead + 4 bytes array header +
     ( (nDimension > 1) ?  nDimension x 4 bytes : 0)
  • I know the answer (it's 42)

    My first Windows Phone App

    • 1 Comments
    lighthouse

    imageI have been involved with Windows Phone 7 from the very beginning and worked on delivering the CLR for the GA version and all the way through Nodo and Mango.

    All this while I was busy working on one of the lower layers and seeing bits and bytes flying by. For some time I wanted to try out writing an app for the phone to experience how it feels to be our developer customer. Unfortunately my skills fall more on system programming than designing good looking UI. I had a gazillion ideas floating in my head but most were beyond my graphic and UI design skills. I was also waiting for some sort of inspiration.

    Over the last school break our local King Country Library System which earned the best library system of the year award run an interesting effort. Students received a sheet which they filled up as they read books. On hitting 500 minutes and 1000 minutes of book reading they got prizes. One of the prize was a Kaleidoscope. Not the one with colored glass pieces like I had as a child, but one through which you could see everything around. This was the inspiration I was waiting for and I wrote an WP7 Mango app that does the same. Interestingly the app got a lot of good reviews and is rated at 5 stars (something I didn’t expect). It’s free, go check it out.

    Get the app here, documentation is at http://bonggeek.com/Kaleidoscope/

  • I know the answer (it's 42)

    Delay Sending an Email

    • 0 Comments
    Cascade Mountains, WA

    Like many engineer I am partially fueled by the engineers angst. I am not always able to vent it in code and sometimes resort to sending really dumb emails :).

    I figured out that I can tolerate delaying sending my emails over having to smack my face every other day. Basically all emails I send is delayed by about 5 minutes and that makes a huge difference. Somehow you always know you’ve sent something you shouldn’t have in the first 2 minutes of sending that email.

    This is how you set this up on Outlook 2010… Bring up the new rules UI from the ribbon (Home > Rules > Managed Rules and Alerts)

    image

    Next you’d get another UI just hit next and yes on the following warning.

    image

    After that choose the delay duration. I use 5 minutes

    image

    Bonus!!!

    In case you want to pretend that you are hard at work when you aren’t then you can also use the timed email feature in Outlook. In the new email window use the following.

    image

    image

    Unfortunately if you are an software engineer your work is measured by the amount and quality of code you check in and not emails so sadly this doesn’t work Smile

  • I know the answer (it's 42)

    Windows Phone Mango: Under the hood of Fast Application Switch

    • 3 Comments
    spin

    Fast Application Switch of FAS is kind of tricky for application developers to handle. There are a ton of documentation around how the developers need to handle the various FAS related events. I really liked the video http://channel9.msdn.com/Events/DevDays/DevDays-2011-Netherlands/Devdays059 which walks through the entire FAS experience (jump to around 8:30).

    In this post I want to talk about how the CLR (Common Language Runtime or .NET runtime) handles FAS and what that means to your application. Especially the Active –> Dormant –> Active flow. Most of the documentation/presentation quickly skips over this with the vague “The application is made dormant”. This is equivalent to the “witches use brooms to fly”. What is the navigation mechanism or how the broom is propelled are the more important questions which no one seems to answer (given the time of year, I just couldn’t resist :P) . Do note that most developers can just follow the coding guidelines for FAS and never need to care about this. However, a few developers, especially the ones developing multi-threaded apps and using threading primitives may need to care about this. And hence this post

    Design Principle

    The entire Multi-threading design was made to ensure the following

    Principle 1: Pre-existing WP7 apps shouldn’t break on Mango.
    Principle 2: When an application is sent to the background it shouldn’t consume any resources
    Principle 3: Application should be resumed fast (hence the name FAS)

    As you’d see that these played a vital role in the design being discussed below.

    States

    The states an application goes through is documented in http://msdn.microsoft.com/en-us/library/ff817008(VS.92).aspx 

    Execution Model Diagram for Windows Phone 7.5

    CLR Design

    The diagram below captures the various phase that are used to rundown the application to make it dormant and later re-activated.  It gives the flow of an application as it goes through the Active –> Dormant –> Active state (e.g. the application was running and the user launches another application and then uses the back button to go back to the first application).

    image

    Deactivated

    The Deactivated event is sent to the application to notify it that the user is navigating away from the application. After this there is 3 possible outcomes. It will either remain dormant, gets tombstoned or finally gets killed as well. Since there is no way to know which would happen, the application should store its transient state into the PhoneApplicationPage.State and it’s persistent state into some persistent store like the IsolatedStorage or even in the cloud. However, do note that the application has 10 seconds to handle the Deactivated event. In the 3 possible situations this is how the stored data will be used back

    1. Deactivate –> Dormant –> Active
      In this situation the entire process was intact in memory and just it’s execution was stopped (more about it below). In the Activated event the application can just check the IsApplicationInstancePreserved property. If this is true then the application is coming back from Dormant state and can just use the in-memory state. Nothing needs to be re-read in.
    2. Deactivate –> Dormant –> Tombstoned –> Active
      In this case the application’s in-memory state is gone. However, the PhoneApplicationPage.State is serialized back. So the application should read persistent user data from IsolatedStorage or other permanent sources in the Activated event. At the same time it case use the PhoneApplicationPage.State in the OnNavigatedTo event.
    3. Deactivate –> Dormant –> Terminated
      This case is no different from the application being re-launched. So in the Launching event the user data needs to be re-created from the permanent store. PhoneApplicationPage.State  is empty in this case

    The above supports the #1 principle of not breaking pre-existing WP7 apps. A WP7 app would’ve been designed without considering the dormant stage. Hence it would’ve just skipped the #1 option. So the only issue will be that a WP7 app will result in re-creating the application state each time and not get the benefit of the Dormant stage (it will get the performance of Tombstoning but not break in Mango).

    Post this event the main thread never transitions to user code (e.g. no events are triggered). The requirement on the application for deactivate is that

    1. It shouldn’t run any user code post this point. This means it should voluntarily stop background threads, cancel timers it started and so on
    2. This event needs to be handled in 10 seconds

    If app continues to run code, e.g. in another thread and modifies any application state then that state cannot be persisted (as there will be no subsequent Deactivated type event)

    Paused

    This event is an internal event that is not visible to the application. If the application adhered to the above guideline it shouldn’t care about it anyway.

    The CLR does some interesting stuff on this event. Adhering to the “no resource consumption” principle is very important. Consider that the application had used ManualResetEvent.WaitOne(timeout). Now this timeout can expire in the time when the application was dormant. If that happened it would result in some code running when the application is dormant. This is not acceptable because the phone maybe behind locked screen and this context switch can get the phone out of a low-power state. To handle this the runtime detaches Waits, Thread.Sleep at Paused. Also it cancels all Timers so that no Timer callbacks happen post this Pause event.

    Since Pause event is not visible to the application, it should consider that some time post Deactivated this detach will happen. This is completely transparent to user code. As far as the user code is considered, it just that these handles do not timeout or sleeps do not return during the time the application is dormant. The same WaitHandle objects or Thread.Sleeps start working as is after the application is activated (more about timeout adjustment below).

    This is also the place where other parts of the tear-down happens. E.g. things like asynchronous network calls cancelled, media is stopped.

    Note that the background user threads can continue to execute. Obviously that is a problem because the user code is supposed to voluntarily stop them at Deactivated.

    Freeze

    Besides user code there are a lot of other managed code running in the system. These include but not limited to Silverlight managed code, XNA managed code. Sometime after Paused all managed code is required to stop. This is called the CLRFreeze. At this point the CLR freezes or blocks all managed execution including user background threads. To do that it uses the same mechanism as used for foreground GC. In a later post I’d cover the different mechanics NETCF and desktop CLR uses to stop managed execution.

    Around freeze the application enters the Dormant stage where it’s in 0 CPU utilization mode.

    Thaw

    Managed threads stopped at Freeze are re-started at this point.

    Resuming

    At Resuming the WaitHandle, Thread.Sleep detached in Paused is re-attached. Also timeout adjustments are made during this time. Consider that the user had two handles on which the user code started Waits with 5 seconds and 10 seconds timeouts. After 3 seconds of starting the Waits the application is made dormant. When the application is re-activated, the Waits are restarted with the amount of timeout remaining at the point of the application getting deactivated. So essentially in the case below the first Wait is restarted with 2 seconds and the later with 7. This ensures that relative gap between Sleeps, Waits are maintained.

    image

    Note timers are still not restarted.

    Activated

    This is the event that the application gets and it is required to re-build it’s state when the activation is from Tombstone or just re-use the state in memory when the activation is from Dormant stage.

    Resumed

    This is the final stage or FAS. This is where the CLR restarts the Timers. The idea behind the late start of timers is that they are essentially asynchronous callbacks. So the callbacks are not sent until the application is activated (built its state) and ready to consume those callbacks.

    Conclusion

    1. Ideally application developer needs to take care of the FAS by properly supporting the various events like Deactivated, Activated
    2. Background threads continue to run post the Deactivated event. This might lead to issues by corrupting application state and losing state changes. Handle this by terminating the background threads at Deactivated
    3. While making application dormant Waits, Sleeps and Timers are deactivated. They are later activated with timeout adjustments. This happens transparently to user code
    4. Not all waiting primitives are time adjusted. E.g. Thread.Join(timeout) is not adjusted.
  • I know the answer (it's 42)

    WP7 Mango: The new Generational GC

    • 6 Comments

    In my previous post “Mark-Sweep collection and how does a Generational GC help” I discussed how a generational Garbage Collector (GC) works and how it helps in reducing collection latencies which show up as long load times (startup as well as other load situations like game level load) and gameplay or animation jitter/glitches. In this post I want to discuss how those general principles apply to the WP7 Generational GC (GenGC) specifically.

    Generations and Collection Types

    We use 2 generations on the WP7 referred to as Gen0 and Gen1. A collection could be any of the following 4 types

    1. An ephemeral or Gen0 collection that runs frequently and only collects Gen0 objects. Object surviving the Gen0 collection is promoted to Gen1
    2. Full mark-sweep collection that collects all managed objects (both Gen1 and Gen0)
    3. Full mark-sweep-compact collection that collects all managed objects (both Gen1 and Gen0)
    4. Full-GC with code-pitch. This is run under severe low memory and can even throw away JITed code (something that desktop CLR doesn’t support)

    The list above is in the order of increasing latency (or time they take to run)

    Collection triggers

    GC triggers are the same and as outlined in my previous post WP7: When does the GC run. The distinction between #2 and #3 above is that at the end of all full-GC the collector considers the memory fragmentation and can potentially run the memory compactor as well.

    1. After significant allocation
      After significant amount of managed allocation the GC is started. The amount today is 1MB (called GC quanta) but is open to change. This GC can be ephemeral or full-GC. In general it’s an ephemeral collection. However, it might be a full collection under the following cases
      1. After significant promotion of objects from Gen0 to Gen1 the collections become full collections. Today 5MB of promotion triggers a full GC (again this number is subject to change).
      2. Application’s total memory usage is close to the maximum memory cap that apps have (very little free memory left). This indicates that the application will get terminated if the memory utilization is not cut-back.
      3. Piling up of native resources. We use different heuristics like native to managed memory ratio and finalizer queue heuristics to detect if GC needs to turn to full collection to release native resources being held-up due to Gen0 only collections
    2. Resource allocation failure
      All resource allocation failure means that the system is under memory pressure and hence such collections are always full collection. This can lead to code pitch as well
    3. User code triggered GC
      User code can start collections via the System.GC.Collect() managed API. This results in a full collection as documented by that API. We have not added the method overload System.GC.Collect(generation). Hence there is no way for the developer to start a ephemeral or Gen0 only collection
    4. Sharing server initiated
      Sharing server can detect phone wide memory issue and start GC in all managed processes running. These are full-GC and can potentially pitch code as well.

     

    So from all of the above, the 3 key takeaways are

    1. Low memory or memory cap related collections are always full-collections. These could also turn out to be the more costly compacting collection and/or pitch JITed code
    2. Collections are in general ephemeral and become full-collection after significant object promotion
    3. No fundamental changes to the GC trigger policies. So an app written for WP7 will not see any major changes to the number of GC’s that happen. Some GC will be ephemeral and others will be full-GCs.

     

    Write Barriers/Card-table

    As explained in my previous post, to keep track of Gen1 to Gen0 reference we use write-barrier/card-table.

    Card-table can be visualized as a memory bitmap. Each bit in the card-table covers n bytes of the net address space. Each such bit is called a Card. For managed reference updates like  A.b = C in addition to JITing the real assignment, calls are added to Write-barrier functions. This  write barrier locates the Card corresponding to the address of write and sets it. Later during collection the collector checks all Gen-1 objects covered by a set card-bit and marks Gen-0 references in those objects.

    This essentially brings in two additional cost to the system.

    1. Memory cost of adding those calls to the WB in the JITed code
    2. Cost of executing the write barrier while modifying reference

    Both of the above are optimized to ensure they have minimum execution impact. We only JIT calls to WB when absolutely required and even then we have an overhead of a single instruction to make the call. The WB are hand-tuned assembly code to ensure they take minimum cycles. In effect the net hit on process memory due to write barriers is way less than 0.1%. The execution hit in real-world applications scenarios is also not in general measureable (other than real targeted testing).

    Differences from desktop

    In principle both the desktop GC and the WP7 GC are similar in that they use mark-sweep generational GC. However, there are differences based on the fact that the WP7 GC targets a more constrained device.

    1. 2 generations as opposed to 3 on the desktop
    2. No background or incremental collection supported on the phone
    3. WP7 GC has additional logic to track and handle application policies like application memory caps and total memory utilization
    4. The phone CLR uses a very different memory layout which is pooled and not linear. So no concept of Large Object Heap. So lifetime of large objects is no different
    5. No support for particular generation collection from user code
  • I know the answer (it's 42)

    WP7 Mango: Mark-Sweep collection and how does a Generational GC help

    • 7 Comments

    About a month back we announced that in the next release of Windows Phone 7 (codenamed Mango) we will ship a new garbage collector in the CLR. This garbage collector (GC) is a generational garbage collector.

    This post is a “back to basics” post where I’ll try to examine how a mark-sweep GC works and how adding generational collection helps in boosting it’s performance. We will take a simplified look at how mark-sweep-compact GC works and how generational GC can enhance it’s performance. In later posts I’ll try to elaborate on the specifics of the WP7 generational GC and how to ensure you get the best performance out of it.

    Object Reference Graph

    Objects in the memory can be considered to be a graph. Where every object is a node and the references from one object to another are edges. Something like

    image

    To use an object the code should be able to “reach” it via some reference. These are called reachable objects (in blue). Objects like a method’s local variable, method parameters, global variables, objects held onto by the runtime (e.g. GCHandles), etc. are directly reachable. They are the starting points of reference chains and are called the roots (in black).

    Other objects are reachable if there are some references to them from roots or from other objects that can be reached from the roots. So Object4 is reachable due to the Object2->Object4 reference. Object5 is reachable because of Object1->Object3->Object5 reference chain. All reachable objects are valid objects and needs to be retained in the system.

    On the other hand Object6 is not reachable and is hence garbage, something that the GC should remove from the system.

    Mark-Sweep-Compact GC

    A garbage collector can locate garbage like Object6 in various ways. Some common ways are reference-counting, copying-collection and Mark-Sweep. In this section lets take a more pictorial view of how mark-sweep works.

    Consider the following object graph

    1

    At first the GC pauses the entire application so that the object graph is not being mutated (as in no new objects or references are being created). Then it goes into the mark phase. In mark phase the GC traverses the graph starting at the roots and following the references from object to object. Each time it reaches an object through a reference it flips a bit in the object header indicating that this object is marked or in other words reachable (and hence not garbage). At the end everything looks as follows

    2

    So the 2 roots and the objects A, C, D are reachable.

    Next it goes into the sweep phase. In this phase it starts from the very first object and examines the header. If the header’s mark bit is set it means that it’s a reachable object and the sweep resets that bit. If the header’s bit is not set, it’s not reachable and is flagged as garbage.

    3

    So B and E gets flagged as garbage. Hence these areas are free to be used for other objects or can be released back to the system

    4

    This is where the GC is done and it may resume the execution of the application. However, if there are too many of those holes (released objects) created in the system, then the memory gets fragmented. To reduce memory fragmentation. The GC may compact the memory by moving objects around. Do note that compaction doesn’t happen for every GC, it is run based on some fragmentation heuristics.

    5

    Both C and D is moved here to squeeze out the hole for B. At the same time all references to these objects in the system is also update to point to the correct location.

    One important thing to note here is that unlike native objects, managed objects can move around in memory due to compaction and hence taking direct references to it (a.k.a memory pointers) is not possible. In case this is ever required, e.g. a managed buffer is passed to say the microphone driver native code to copy recorded audio into, the GC has to be notified that the corresponding managed object cannot be moved during compaction. If the GC runs a compaction and object moves during that microphone PCM data copy, then memory corruption will happen because the object being copied into would’ve moved. To stop that, GCHandle has to be created to that object with GCHandleType.Pinned to notify the GC that the corresponding object should never move.

    On the WP7 the interfaces to these peripherals and sensors are wrapped by managed interfaces and hence the WP7 developer doesn’t really have to do these things, they are taken care offm under the hood by those managed interfaces.

    The performance issue

    As mentioned before during the entire GC the execution of the application is stopped. So as long as the GC is running the application is frozen. This isn’t a problem in general because the GC runs pretty fast and infrequently. So small latencies of the order of 10-20ms is not really noticeable.

    However, with WP7 the capability of the device in terms of CPU and memory drastically increased. Games and large Silverlight applications started coming up which used close to 100mb of memory. As memory increases the number of references those many objects can have also increases exponentially. In the scheme explained above the GC has to traverse each and every object and their reference to mark them and later remove them via sweep. So the GC time also increases drastically and becomes a function of the net workingset of the application. This results in very large pauses in case of large XNA games and SL applications which finally manifests as long startup times (as GC runs during startup) or glitches during the game play/animation.

    Generational Approach

    If we take a look at a simplified allocation pattern of a typical game (actually other apps are also similar), it looks somewhat like below

    image

    The game has a large steady state memory which contains much of it’s steady state data (which are not released) and then per-frame it goes on allocating/de-allocating some more data, e.g. for physics, projectiles, frame-data. To collect this memory layout the traditional GC has to walk or traverse the entire 50+ mb of data to locate the garbage in it. However, most of the data it traverses will almost never be garbage and will remain in use.

    This application behavior is used for the Generational GC premise

    1. Most objects die young
    2. If an object survives collection (that is doesn’t die young) it will survive for a very long time

    Using these premise the generational GC tries to segregate the managed heap into older and younger generations objects. The younger generation called Gen-0 is collected in each GC (premise 1), this is called the Ephemeral or Gen0 Collection. The older generation is called Gen-1. The GC rarely collects the Gen-1 as the probability of finding garbage in it is low (premise 2).

    image

    So essentially the GC becomes free of the burden of the net working set of the application.

    Most GC will be ephemeral GC and it will only traverse the recently allocated objects, hence the GC latency remains very low. Post collection the surviving objects are promoted to the higher generation. Once a lot of objects are promoted, the higher generation starts becoming full and then a full collection is run (which collects both gen-1 and gen-0). However, due to premise 1, the ephemeral collection finds a lot of garbage in their runs and hence promotes very few objects. This means the growth rate of the higher generation is low and hence full-collection will run very infrequently.

    Ephemeral/Gen-0 collection

    Even in ephemeral collection the GC needs to deterministically find all objects in Gen-0 which are not reachable. This means the following objects needs to survive a Gen-0 collection

    1. Objects directly reachable from roots
    2. Root –> Gen0 –> Gen-0 objects (indirectly reachable from roots)
    3. Objects referenced from Gen1 to Gen0

    Now #1 and #2 pose no issues as in the Ephemeral GC, we will anyway scan all roots and Gen-0 objects. However to find objects from Gen1 which are referencing objects in Gen-0, we would have to traverse and look into all Gen1 objects. This will break the very purpose of having segregating the memory into generation. To handle this write-barrier+card-table technique is used.

    The runtime maintains a special table called card-table. Each time any references are taken in the managed code e.g. a.b = c; the code JITed for this assignment also updates an internal data-structure called CardTable to capture that write. Later during the ephemeral collection, the GC looks into that table to find all the ‘a’ which took new references. If that ‘a’ is a gen-1 object and ‘c’ a gen-0 object then it marks ‘c’ recursively (which means all objects reachable from ‘c’ is also marked). This technique ensures that without examining all the gen-1 objects we can still find all live objects in Gen-0. However, the cost paid is

    1. Updating object reference is a little bit more costly now
    2. Making old object taking references to new objects increases GC latency (more about these in later posts)

    Putting it all together, the traditional GC would traverse all references shown in the diagram below. However, an ephemeral GC can work without traversing the huge number of Red references.

    image

    It scans all the Roots to Gen-0 references (green) directly. It traverses all the Gen1->Gen0 references (orange) via the CardTable data structure.

    Conclusion

    1. Generational GC reduces the GC latency by avoiding looking up all objects in the system
    2. Most collections are gen-0 or ephemeral collection and are of low latency this ensures fast startup and low latency in game play
    3. However, based on how many objects are being promoted full GC’s are run sometimes. When they do, they exhibit the same latency as a full GC on the previous WP7 GC

    Given the above most applications will see startup and in-execution perf boost without any modification. E.g today if an application allocates 5 MB of data during startup and GC runs after every MB of allocation then it traverses 15mb (1 + 2 + 3 + 4 + 5). However, with GenGC it might get away with traversing as low as only 5mb.

    In addition, especially game developers can optimize their in-gameplay allocations such that during the entire game play there is no full collection and hence only low-latency ephemeral collections happens.

    How well the generational scheme works depend on a lot of parameters and has some nuances. In the subsequent posts I will dive into the details of our implementation and some of the design choices we made and what the developers needs to do to get the most value out of it.

  • I know the answer (it's 42)

    Emulator updates for the new Windows Phone Mango tools

    • 0 Comments

     

    GPS

    One of the most requested feature for the emulator was support for sensor. Developers were apparently asking their managers for travel budget to go to Hawaii/Europe so that they could test out their location based apps :). Today at MIX11 we announced that GPS sensor support for the emulator will be shipped in the next version of the tools. Some of the features that will be present are

    1. Developers would be able to launch a new tool window from the emulator and in that click on locations on maps to feed that location to apps running in the emulator. This covers building apps like Yelp, Weather app on the device that needs static location information
    2. Have pre-recorded location and able to play that out so that new locations are sent to the phone at a preset rate. This covers building apps like sports tracker, navigation apps.
    3. The app running in the emulator will not need to change in anyway or even know that the data is simulated. It will come over the usual sensor channels and even apps like the built in bing maps app will pick up the fed in data.

    Here’s a short video which I quickly took from the laptop of our PM. Being away from family in Las Vegas was getting me down so he took me to Seattle in a second (or rather simulated it).

    Direct link http://www.youtube.com/watch?v=wt0DdiYqdkk 

     

    Accelerometer

    Another new sensor support is the accelerometer. So no more picking up the PC monitor running the emulator and rotating it to see if the emulator gets the movements. There’s going to be a new tool window using which the developer can feed in motion to the apps running in the emulator. Also some pre-canned motions like shake will be present. Here’s another video of that

    Direct link http://www.youtube.com/watch?v=Gc1kuXj7eCE 

  • I know the answer (it's 42)

    Generational GC in Windows Phone Mango

    • 2 Comments

     

    This is an announcement only post, do subscribe to this blog feed or on to http://twitter.com/abhinaba as I’d be making more detailed posts on how we are building the Generational GC and what developers need to know.

    Today in the MIX11 keynote ScottGu just announced something that I’ve been working on for some time. The next version of the Windows Phone will have a Generational Garbage Collector (GenGC for short). A bunch of folks has worked real hard to get this piece into WP7 in a short time.

    Today on Windows Phone 7 we have a stop the world, mark-sweep-compact, non-generational GC. When it runs it pauses the entire execution, looks through each object in the application to find and eliminate all unused data. This manifests as longer app startup time and stutters during time critical execution.

    In Mango we are adding Generational GC to reduce collection latency to address both of these problems . Existing apps and games even without any changes can expect faster startup, faster level loads and reduction in gameplay stutters due to collection. Developers can specifically optimize for the new generational GC to completely remove stutters during animations and game play that came due to these GC pauses.

    As an example see how one of the existing games butterfly benefits from the GenGC. One of the phone below is running the GenGC and the other is not and it should be obvious which one is based on which starts up first (do note that both in Keynote and here we are showing startup gains because it’s easier to show that. In gameplay stutters is hard to show on lower resolution videos). Also note that not just at the core startup at every level it gets a bit faster.

    Direct link http://www.youtube.com/watch?v=FtusaSuFIpc

    Please refer to my previous blog http://blogs.msdn.com/b/abhinaba/archive/2009/03/02/back-to-basics-generational-garbage-collection.aspx on what is a generational GC and how it helps.

    The new GenGC uses 2 generations and write barriers to track Gen1 to Gen0 references. This post is just to announce the feature. I will be making a series of posts to get into the gory details as we get closer to handing over the bits to our developers. I am sure the developers would want to know the sizes of the various generations, when full vs generational collections happen and so much more. Do register to my blog feed or my twitter account http://twitter.com/abhinaba  for the announcements as I publish these posts.

  • I know the answer (it's 42)

    SIMD/ARM-NEON support in Windows Phone Mango

    • 9 Comments

    This is an announcement only post, do subscribe to this blog feed or on to http://twitter.com/abhinaba as I’d be making more detailed posts on these topics as we get close to handing over these bits to our developer customers.

    ARM processors support SIMD (Single Instructions Multiple Data) instructions through the ARM® NEON™technology that is available on ARMV7 ISA. SIMD allows parallelization/HW-acceleration of some operations and hence performance gains. Since the Windows Phone 7 chassis specification requires ARMV7-A; NEON is available by default on all WP7 devices. However, the CLR on Windows Phone 7 (NETCF) did not utilize this hardware functionality and hence it was not available to the managed application developers. We just announced in MIX11 that in the next version of Windows Phone release the NETCF runtime JIT will utilize SIMD capabilities on the phones.

    What it means to the developers

    Certain operations on some XNA types will be accelerated using the NEON/SIMD extensions available on the phone. Examples include operations on Vector2, Vector3, Vector4, Matrix from the Microsoft.Xna.Framework namespace will get this acceleration. NOTE: At the point the exact types and the exact operations on them are not closed yet and subject to change. Do note that user types will not get this acceleration. E.g. if you have rolled out your own vector type and use say dot operations on it, the CLR will not accelerate them. This is a targeted acceleration for some XNA types and not a vectorizing JIT compiler feature.

    Apps and types heavily using these XNA types (our research shows a lot of games do) will see good performance gain. For example we took this Fluid simulation sample from a team (note this was not written specifically for us to demo) and saw huge gains because it heavily uses Matrix and Vector operations to simulate fluid particles and the forces that work in between them. Frame rates shot up from 18fps to 29fps on the very same device.

    Based on the usage of these types and operations in your app you’d see varying amounts of gains. However, this feature should be a good motivation to move to these XNA types.

    How does SIMD work

    SIMD as the name suggests can process the same operation on multiple data in parallel.

    Consider the following Vector addition

    public static Vector2 Add(Vector2 value1, Vector2 value2)
    {
        Vector2 vector;
        vector.X = value1.X + value2.X;
        vector.Y = value1.Y + value2.Y;
        return vector;
    }

    If you see the two lines in blue it’s essentially doing the same addition operation on two data sets and putting the result in two other locations. This today will be performed sequentially. Where the JITer will emit processor instructions to load the X values in registers, add them, store them, and then do the same thing again for the Y values. However, this is inherently parallelizable and the ARM NEON provides an easy way to load such values (vpop, vldr) in single instructions and use a single VADD NEON instruction to add the values in parallel.

    A way to visualize that is as follows

    image

    A single instruction both X1 and Y1 is loaded, another instruction loads both X2, Y2 and the 3rd in parallel adds them together.

    For an easy sample on how that works head onto http://www.arm.com/files/pdf/NEON_Support_in_the_ARM_Compiler.pdf

  • I know the answer (it's 42)

    WP7: When does GC Consider a Local Variable as Garbage

    • 1 Comments
     

    Consider the following code that I received:

    static void Foo()
    {
        TestClass t = new TestClass();
        List<object>l = new List<object>();
        l.Add(t); // Last use of l and t
    
        WeakReference w = new WeakReference(t);
    
        GC.Collect();
        GC.WaitForPendingFinalizers();
    
        Console.WriteLine("Is Alive {0}", w.IsAlive);
    }

    If this code is is run on the desktop it prints Is Alive False. Very similar to the observation made by a developer in http://stackoverflow.com/questions/3161119/does-the-net-garbage-collector-perform-predictive-analysis-of-code. This happens because beyond the last usage of l and t, the desktop GC considers l and t to be garbage and collects it.

    Even though this seems trivial, it isn’t. During JITing of this method the JITer performs code analysis and ensures that runtime tables are updated to reflect whether at a particular time a given local variable is alive or not (it is not used beyond that point). This needs a bunch of data structures to be maintained (lifetime tables) and also dynamic analysis of code.

    In case of Windows Phone 7 (or other platforms using NETCF) for the same code you’d get Is Alive True. Which means the NETCF GC considers the object to be alive even beyond it’s last usage in the function.

    Due to the limited resources available on devices, NETCF does not do these dynamic analysis. So we trade off some runtime performance to ensure faster JITing (and hence application faster startup). All local variables of all functions in the call stacks of the all managed threads in execution during garbage collection is considered live (hence also object roots) and not collect. In NETCF the moment the function Foo returns and the stack is unwound the local objects on it including t and l becomes garbage.

    Do note that both of these implementations follow the ECMA specification (ECMA 334 Section 10.9) which states

    For instance, if a local variable that is in scope is the only existing reference to an object, but that local variable is never referred to in any possible continuation of execution from the current execution point in the procedure, an implementation might (but is not required to) treat the object as no longer in use.”

    Hence even though on the desktop CLR it is not important to set variables to null, it is sometimes required to do so on WP7. In case t and l was set to null GC would’ve collected them. So the guidance for WP7 is

    1. In most cases setting local variables to null is not required. Especially if the function returns fairly soon
    2. If you have locals that hold onto very large data structures and that function will remain executing for a long time then set those variables to null when you are done using them (e.g. in between the last usage of a variable in a large function and calls to a bunch of web-services in the same function which will take a long time to return).
    3. Use dispose patterns where required. It’s important on devices to free up limited resources as soon as possible.
  • I know the answer (it's 42)

    Windows Phone 7, my story

    • 0 Comments
    Olympic Peninsula - Hurricane Ridge

    Now that Windows Phone 7 is released all the gadget/tech blogs like TechCrunch, Engadget , Gixmodo are humming with reviews. Given that it’s a Version 1 product (ignore that 7 in the phone name) the reviews are great.

    Seeing all the buzz around gets me thinking about how I got involved in the project and my experiences. As the Windows Phone 7 head said “These are the good old days”

    Not so long ago, sometime after I joined the .NET Compact Framework team (and drew this cartoon) we started hearing about Windows Mobile reset and the awesome new phone being built and how our .NET Compact framework will be it’s runtime. Then we started getting access to CEPC images (these are phone OS that works on desktop virtual machines) to start developing and debugging our code. It felt strange at the beginning to see the ugly blue tiles and the truncated or cut-off text. These CEPC didn’t have hardware acceleration and hence there was no animation. The standard PC displays clipped off most of the phone display area and made the UI look really weird. I thought it was getting the wrong font as well as the text on top seemed to be cut off. All in all I wasn’t that impressed and just didn’t get the Metro UI.

    Working on the runtime is not all that glamorous and my view of Windows Phone 7 remained pretty much confined into the debugger. I’d interact with it over command shell and see bits and bytes flying by. So this is the view at 10 feel elevation (you can ignore the source file open in the IDE Smile, I wasn’t going to give a faux pas and reveal our source code, would I?).

    image

    As time went by and I had a chance to see the first prototype device, I suddenly “got” the Metro UI. It felt wonderful, a no pretense UI which doesn’t try to fake 3D buttons and kind of gets out of they way. The at a glance UI totally made sense (just look on the locked screen or the start window and know your email count, schedule and other bunch of things).  I started playing with the initial apps (Jelly Splat Smile) and then the realization dawned, each time someone picks up the device and launches any app, our code would start executing. Exactly the reason I joined Microsoft.

    As more and more applications started getting developed by 3rd parties the power of our developer tools really shows up. IMO with the Windows Phone Developer tools (Visual Studio 2010 based), Expression, XNA Game Studio, Emulator, Silverlight and host of other 1st and 3rd party tools, WP7 easily has the best development story out there and it’s just V1.

    Now when I see my daughter play Flowerz or I watch a Netflix movie and get a 10,000 feet view, it feels strangely wonderful and at the same time disconcerting (did you see http://abstrusegoose.com/307 ?). I strongly believe that Windows Phone 7 is one of the best Phones I have ever used and it passes the significant other test with flying colors. 

     

    image

  • I know the answer (it's 42)

    Windows Phone 7 App Development: When does the GC run

    • 11 Comments

    Cub

    If you are looking for information on the new Generational GC on Windows Phone Mango please visit http://blogs.msdn.com/b/abhinaba/archive/2011/06/14/wp7-mango-the-new-generational-gc.aspx

    Many moons ago I made a post on When does the .NET Compact Framework Garbage Collector run. Given that a lot of things have changed since then, it’s time to make another post about the same thing.

    For the developers coming to Windows Phone 7 (WP7) from the Windows desktop let me first clarify that the runtime (CLR) that is running on the WP7 is not the same as the one running on the desktop. The WP7 runtime is known as .NET Compact Framework (NETCF) and it works differently than the “desktop CLR”. For 90% of cases this is irrelevant as the WP7 developer targets the XNA or Silverlight programming model and hence what is running underneath is really not important. E.g when you drive you really do not care about the engine. This post is for the other 5% where folks do run into issues (smoke coming out of the car).

    Moreover do note that when the GC is run is really an implementation detail that is subject to change.

    Now that we have all the disclaimers behind us lets get down to the list.

    The Garbage Collector is run in the following situations

    1. After some significant allocation:
      When an application tries to allocate managed memory the allocator first checks a counter that indicates the number of bytes of managed data allocated since the last GC. If this counter crosses a threshold (which is changeable and set to 1MB currently) then GC is fired (and the counter is obviously reset).
      The basic idea is that there has been significant allocation since the last GC and hence do it again.
    2. Resource allocation failure
      If some internal native allocation fails, like loadlibrary fails or JIT buffer allocation fails due to out-of-memory condition then GC is started to free up some memory and the allocation is re-attempted (only 1 re-attempt)
    3. User code can trigger GC
      Using the managed API System.GC.Collect(), user code can force a GC
    4. Sharing server initiated
      One of the new features in the WP7 CLR is the sharing server (see my post http://blogs.msdn.com/b/abhinaba/archive/2010/04/28/we-believe-in-sharing.aspx for details). In WP7 there is a central server coordinating all the managed processes. If this sharing server is notified of low memory condition, it starts GC in all the managed processes in the system.

    The GC is NOT run in the following cases (I am explicitly calling these out because in various conferences and interactions I’ve heard folks thinking it might be)

    1. GC is not run on some timer. So if a process is not allocating any memory and there is no low-memory situation then GC will never be fired irrespective of how long the application is running
    2. The phone is never woken up by the CLR to run GC. GC is always in response to an active request OR allocation failure OR low memory notification.
    3. In the same lines there is no GC thread or background GC on WP7

     

    For folks migrating from NETCF 3.5 the list below gives you the changes

    1. WinForm Application going to background used to fire GC. On WP7 this is no longer true
    2. Sharing server based changes are obviously new
    3. The GC quantum cannot be changed by user
  • I know the answer (it's 42)

    Date format

    • 7 Comments

    Let me start by saying that using mm-dd-yyyy is just plain wrong. No really it just doesn’t make any sense to me. Neither does it make any sense to most people world-over if you go my the date-format map up at http://en.wikipedia.org/wiki/File:Date_upd1.PNG

    If one uses dd-mm-yyyy it makes sense because it’s in decreasing order of granularity (kind of LSB first). yyyy-mm-dd makes ever more sense because

    1. It’s in decreasing order of granularity
    2. Natural ordering in many ways like telephone numbers (country-code, area-code, local-code, number) or IP address
    3. String sorting automatically sorts by that date. E.g. I can easily have folders/files named in this order and DIR lists them nicely sorted out (yea I know I can sort by date/time as well).

    d:\MyStuff\Personal\Pictures>dir 2010*
    Volume in drive D is Data
    Volume Serial Number is 3657-F386

    Directory of d:\MyStuff\Personal\Pictures

    06/17/2010  01:06 PM    <DIR>          2010_0501
    06/17/2010  01:07 PM    <DIR>          2010_0504
    06/17/2010  01:16 PM    <DIR>          2010_0508
    06/17/2010  01:20 PM    <DIR>          2010_0509
    06/17/2010  01:24 PM    <DIR>          2010_0515
    06/17/2010  01:29 PM    <DIR>          2010_0517
    06/17/2010  01:30 PM    <DIR>          2010_0523
    06/17/2010  01:33 PM    <DIR>          2010_0528
    06/17/2010  01:37 PM    <DIR>          2010_0529
    06/17/2010  01:43 PM    <DIR>          2010_0605
    06/17/2010  01:47 PM    <DIR>          2010_0606
    06/21/2010  08:40 PM    <DIR>          2010_0616
    06/28/2010  10:33 PM    <DIR>          2010_0619
                   0 File(s)              0 bytes
                  13 Dir(s)  55,925,829,632 bytes free

    But I just cannot fathom why would anyone use mm/dd/yyyy. In what way is that intuitive?

  • I know the answer (it's 42)

    Inside the Windows Phone Emulator

    • 0 Comments
    US

    Learn from the dev lead and PM of Windows Phone 7 Emulator on how it works and delivers the awesome performance.

    Some key points

    1. The emulator is essentially a x86 based Virtual Machine running full image of Windows Phone 7 OS
    2. It emulates various peripherals (e.g. audio, networking), the list hopefully will grow in the future
    3. Supports multi-touch if your host PC has a touch screen
    4. It uses the GPU on the host PC to accelerate the graphics
    5. Shameless plug: One of the reason it’s possible to emulate x86 and still have managed apps running as-is, is because the runtime (.NET Compact Framework) supports both x86 and ARM seamlessly (provides JITer for both architectures and runs the same managed code on both seamlessly)

    Get Microsoft Silverlight

  • I know the answer (it's 42)

    Working in the United States

    • 5 Comments
    Eucalyptus trees-  Trek in the hills

    I am sure everyone has read about the “you cannot possibly pronounce or spell” volcano in Iceland throwing up in the air and screwing up the entire flight system of this planet. While most people were reading about it from the comfort of their home I was doing the same while waiting in airports with a 5 year old tagged along. Anyways, 48 hours and a switch in the direction of flights got me to some countries I didn’t really plan to visit and over the Pacific to land in Seattle.

    Going forward I will be working out of the Microsoft Redmond offices, doing pretty much the same stuff I was doing before (working on the .NET Compact Framework runtime).

    Right now I am settling down with my family and my daughter is excited to see snow for the first time (she got lucky as there was a small snowfall on Saturday in Snoqualmie pass). The changes we need to adjust to is humongous, we left Hyderabad when it was 43° Celsius and now it’s 6° Celsius in Redmond. Oh sorry I should’ve said it was 110°F and now it’s  43°F in Redmond.

    Wish me luck and hopefully finally I will be able to attend some nerddinners.

  • I know the answer (it's 42)

    We Believe in Sharing

    • 0 Comments
    Good Morning

    In Building NETCF for Windows Phone 7 series we put in couple of features to enhance startup performance and ensure that the working set remains small. One of these features added is code/data sharing.

    Native applications have inherent sharing where multiple processes can share the same executable code. However, in case of managed code running on NETCF 3.5 even when multiple applications use the same assembly, the managed code in them are JITed in context of each process separately. This results in suboptimal resource utilization because of the following

    1. Repeated JITing of the same code
    2. Memory overhead of having identical copies of the same JITed code for every process running them. The execution engine also maintains records of the various types loaded and even though two (or more) processes may be loading the same types from the same assemblies; per process copies are maintained for them.

    Together the overhead can be significant.

    In the latest version of the runtime shipping with Windows Phone 7 (.NET Compact Framework 3.7) we added a feature to share both the JITed code and these type information across multiple managed processes.

    image

    The runtime essentially uses a client server architecture. We added a new kernel mode server which is responsible of maintaining a system wide shared heap. The server on receiving requests from the various client processes (each managed app is a client) JITs code and type information into this shared heap. The shared heap is accessed Read-only from all the client apps and Read/write from the server.

    When a process requests for the same type to be loaded or code to be JITed it just re-uses the already loaded type info or JITed code from the shared heap.
    What is shared

    1. Various Execution engine data-structures like loader type-information
    2. JITed code from a predefined list of platform assemblies (user code is not shared).

    Do note that user code is not shared and neither is all platform code. Sharing arbitrarily would actually increase cost if they ultimately didn’t get reused in a lot of applications. The list of assemblies/data shared was carefully chosen and tuned to ensure only those are shared that provided the maximum performance benefit.

    Sharing provides the following advantages

    1. Warm startup time is significantly reduced. When a new application is coming up a large
      percentage of the initial code and type-info is already found on the shared heap and hence
      there is significant perf boost. The exact number depends on the applications but it’s common
      to get around 30% gains
    2. Significant reduction in net working set as a lot of commonly used platform assembly JITed code is re-used
    3. Obviously there are other goodness like less JITing means less processing on the device

    Like user code the system is also capable of pitching the shared code when there is a low memory situation on the device.

  • I know the answer (it's 42)

    What’s this .NET Compact Framework thingy?

    • 13 Comments
    Bidar - On the way back to hyderabad

    If you are following the latest Windows Phone stories than you have surely heard about the .NET Compact Framework (NETCF) which is the core managed runtime used by the Windows Phone 7 programming model. This is an implementation of the .NET specification that works on devices and a disparate combination of HW and SW.

    image

    In case you are hearing about NETCF for the first time, then hopefully this post will help you to understand it’s architecture and how it is different from the desktop .NET.

    The Architecture

    At the heart of NETCF is the execution engine. This contains the usual suspects including the JIT, GC, Loader and other service providers (e.g. Marshalling that marshals native objects to managed and vice versa, type system).

    The entire Virtual machine is coded against a platform agnostics Platform Abstraction Layer or the PAL. In order to target a platform like say Nokia S60 we implemented the PAL for that platform. Stringent policy of not taking any platform dependency outside PAL and ensuring in PAL we only use features that are commonly available on most modern embedded OS allows NETCF to be highly portable. Currently the PAL supports a variety of OS and processor architecture. The ones in Red are those that are used for Silverlight, the others are available for the legacy .NETCF 3.5.

    The other system dependent part is the JITter which compiles the MSIL into the platform dependent instructions. For every supported processor architecture there’s a separate implementation of JIT.

    This entire runtime is driven by a host. E.g. for Silverlight on S60 it’s the Silverlight host running as a plugin inside the Nokia browser, on Windows Phone it’s the Windows Phone task host. These hosts uses the runtime services to run managed code. The host interacts with the runtime over the hosting interfaces.

    Managed code can either be the framework managed code like BCL and other Silverlight/XNA framework managed code or the user code that is downloaded from the web/application-store. Even though the framework managed code can interact with the underlying system including the execution-engine and the OS (via Platform-Invoke) the user managed code is carefully sandboxed. If you see the arrows moving out of the user managed code it is evident that user managed code can call only into the framework managed code (which in turn can call into the system post security verification). User code is not allowed to access any native resources (including P/Invoke).

    The host knows about the UI and it’s rendering and uses reverse Invoke to call into the managed code. E.g. if a user clicks on a XAML button the Silverlight host using the XAML object tree and rendering logic (hit-testing) figures out which object got clicked on and it uses reverse-pinvoke to call into the corresponding managed objects (which provides the event handler). That managed code is then verified, security checked, jitted and run by NETCF.

    Portability

    NETCF is highly portable. As far as I know it’s one of the very few (or is it the only one :)) runtime shipping out of Microsoft that supports Big-Endian processor architecture (Xbox360 uses Big-Endian processor). It is designed for resource constrained devices and is battery friendly. For many time-space trade-offs it tilts towards conserving space (e.g. Code-pitching which I’ll cover later) and at the same time works well on Xbox360 running high-end games. The list of processor/OS I have given above is just illustrative and it actually works or there are POC (proof-of-concept) of it working on really esoteric SW+HW combinations.

    Some CF Facts

    1. Design Goals
      1. Keep the runtime highly portable, that is the main NETCF USP
      2. Designed for resource constrained devices
      3. Battery friendly (use less power)
    2. Nothing comes for free. The above design results in some unique NETCF features (more about that later) and some limitations as a consequence.
    3. Some of the well known platforms powered by NETCF runtime is Windows Mobile 6.5 and below, Windows Phone 7, XNA (on Xbox 360), Zune, Media Room (pdf), Silverlight on Nokia S60.
    4. Currently shipping versions are 3.7 on Windows Phone and 3.5 elsewhere.
    5. Windows Phone runs on both ARM (the real phone devices) and x86 when running on the emulator (which is a x86 VM). NETCF becomes a natural choice because it runs on both these processors.
Page 1 of 15 (365 items) 12345»