I know the answer (it's 42)

A blog on coding, .NET, .NET Compact Framework and life in general....

  • I know the answer (it's 42)

    Back To Basics: Copying Garbage Collection

    Banana leaf

    This post is Part 5 in the series of posts on Garbage Collection (GC). Please see the index here.

    In my previous post I have discussed reference counting and mark-sweep garbage collector in some detail. This post is a quick intro into the copying GC. I’ll not be going into the gory details because this is not much used in the .NET world.

    The copying GC conceptually breaks the memory into two regions. Let’s call the ping and pong as shown below.


    The buffer Ping is initially used for allocation of memory. It maintains a pointer at the end of the last allocated memory and for each new request simply returns that pointer and then increments it to point to the new end. So at the beginning when no memory is allocated it points to the bottom. After a bunch of allocation this is what the it looks like


    Allocation is very fast as it involves only incrementing a pointer. However, after a bunch of allocations the situation will arrive where current-Pointer + requested > Ping-top. This means no more memory can be allocated from Ping. It is at this time the GC is run.

    Garbage collection involves enumerating the roots and then copying every live object from Ping to Pong. Since objects are moved in memory all their references have to be updated as well. After the copy the buffers are switched so that subsequent allocations can now happen from Pong. Ping remains clean for the next copy cycle.

    For the example set above lets assume when the GC is run Object 1 and 3 are live but 2 isn’t. After GC runs the memory layout becomes


    Advantages and disadvantages

    The primary advantage is that allocation is extremely fast (almost equal to stack allocation) because out of memory checks is just a pointer check and actual allocation is just incrementing the free pointer. Since the GC compacts memory it maintains good locality of reference resulting in better cache hit ratios and hence faster execution.

    The main disadvantage with copying GC is that almost double the memory space is required. Even though it is argued that in virtual memory based systems it shouldn’t be a problem because the un-used memory pages will simply be paged out to the disk, the argument is not essentially true. Specially during GC phase both the memory half's are touched (due to ping <-> pong copy) and the number of page-faults generated are much larger than other GC schemes.

  • I know the answer (it's 42)

    Back To Basics: Mark and Sweep Garbage Collection


    This post is Part 4 in the series of posts on Garbage Collection (GC). Please see the index here.

    Recap of definition

    I’ll quickly repeat couple of definitions which are used in this post

    1. Object: This is a unit of storage on the heap. It generally means an object in the object oriented sense but is equally applicable to a procedural language (e.g. a struct/native-type in C) or functional language.
    2. Object/Reference graph: This is the directional graph of objects in memory. A typical sample is below. The nodes are objects in memory and the edges (arrows) are references one object holds to another (e.g. a pointer or reference in C/C++). There can be circular references between two nodes (Object 3 and Object 5) or nodes referencing themselves (Object 6).
    3. Roots: These are the set of nodes in the object graph from which the references start. These are typically references held in registers, local variable on the stack or global variables. The green nodes in the diagram above are roots.
    4. Unreachable object: These are nodes in the graph which have no edge referencing them. The Orange node in the diagram above is an unreachable object. This is the node that the GC needs to clean/free because it is not reachable from any node and is hence garbage memory.

    Mark-sweep GC

    In the last post I discussed reference counting GC which continually tracks memory and frees them the moment they go out of reference. Mark and sweep GC takes a very different approach.

    In this scheme memory is not freed the moment they become garbage. Actually no subsystem is used to keep track of memory as they are being used.

    When the system starts running out of memory (or some other such trigger) the GC is fired. It first enumerates all the roots and then starts visiting the objects referenced by them recursively (essentially travelling the nodes in the memory graph). When it reaches an object it marks it with a special flag indicating that the object is reachable and hence not garbage. At the end of this mark phase it gets into the sweep phase. Any object in memory that is not marked by this time is garbage and the system disposes it.

    The algorithm

    At the top level the algorithm in C like Pseudocode looks as follows

    void GC()
    ObjectCollection roots = GetRoots();
    for(int i = 0; i < roots.Count(); ++i)

    It has three phases, enumeration of roots, marking all object references starting from the root and finally sweeping them.

    Root Enumeration

    As I mentioned above the root enumeration lists all references held in registers, global or static fields, local variables on the stack, function arguments on stack, etc… The runtime system needs to provide ways for the GC to collect the list of roots. E.g. in the case of .NET the JIT maintains the information on the active roots and provides APIs to the GC to enumerate them. The system can also do some form of dynamic analysis to further optimize it. Say a function accepts a pointer argument. Now while JITing the jitter can figure out that the argument is not being used through out the method and can drop it from the list of roots the moment it is last accessed in the method.


    Typically every object is given some sort of header as I mentioned in my last post. This header contains a bit flag which is used to “mark” that object. The Mark procedure is recursive and acts as follows

    void Mark(Object* pObj)
    if (!Marked(pObj)) // Marked returns the marked flag from object header
    MarkBit(pObj); // Marks the flag in obj header

    // Get list of references that the current object has
    // and recursively mark them as well

    ObjectCollection children = pObj->GetChildren();
    for(int i = 0; i < children.Count(); ++i)

    Mark(children[i]); // recursively call mark

    This is a recursive algorithm with heavy dose of simplification used and we will re-visit this in later posts on with actual implementation implications.

    The recursion termination criteria is

    1. Either a node doesn’t have any children (leaf node)
    2. If it is already marked which ensures that we do not land in infinite recursion for circular references

    At the end of this marking phase every object that can be reached using a reference from the roots would have been visited and it’s marked bit set.


    Sweep starts by iterating through the entire memory and frees memory blocks that are not marked. It also clears the Mark bit so that subsequent GC passes can correctly mark/unmark them (resets the memory to pre-mark state).

    void Sweep()
    Object *pHeap = pHeapStart;
    while(pHeap < pHeapEnd)
    if (!Marked(pHeap))
    Free(pHeap); // put it to the free object list

    pHeap = GetNext(pHeap);

    There are a bunch of assumptions here which has to be met. Some of them are that the

    1. Entire heap can be iterated over
    2. Both free memory blocks and allocated objects have the same sort of header that can be queried with Marked
    3. You can find the next memory block when given one block (GetNext implemented)
    4. There is some sort of free memory pool which tracks the list of free memory and you can add a block to it by calling Free(Object *)

    Special memory layout is needed to handle this assumptions and I’ll try to cover how they are handled atleast in .NETCF in later post.


    Repeated allocation can actually fragment memory and slow allocation speed significantly (head over here for a nice story on this). So many systems actually combine sweep with a compaction. This ensures fragmentation reduces and hence subsequent allocations are faster. However, when memory is compacted objects move and hence all references to them has to be updated to point to the new location.

    When is the GC run

    At the face of it, it seems that GC should be run when memory allocation fails. However, when memory allocation fails there is a high chance that GC will fail as well because it would need some working memory to function which won’t be available. So in real world systems GC is fired under a variety of conditions like

    1. System is low on memory (enough for a GC to run but soon allocations are going to fail)
    2. Memory it too much fragmented
    3. After allocating a bunch of memory (e.g. .NET CF fires GC on allocating 1MB after the last GC)

    This obviously varies from system to system and is carefully tuned to match the scenarios being supported.

    Advantage and disadvantages

    The primary advantage of mark-sweep is that it handles cyclic references naturally. Moreover, no additional overheads are added while normal execution (e.g. allocation, pointer manipulations). Combined with compaction it ensures good locality of reference and reduced fragmentation and hence optimal subsequent allocations.

    However, mark-sweep pauses useful execution and walks entire memory marking and sweeping it. Hence it adds large freezes which is un-acceptable in most interactive systems. However, incremental variations of Mark-sweep are available which works around this limitation. Mark-sweep also starts thrashing when memory fills up as it fails to clean up memory and it gets triggered again and again as memory approaches exhaustion. Self tuning GC helps in this scenario.

  • I know the answer (it's 42)

    Back To Basics: Reference Counting Garbage Collection

    Blowing the candle

    This is Part 3 in a series of post on GC, visit the list here.

    Reference counting (refcounting) GC is one of the two primary GC mechanisms widely used. The basic workings of this GC is pretty simple and based on counting the number of reference to a memory block (or object) from other blocks. Each time memory is created or a reference to the object is assigned it’s refcount is bumped up. Each time a pointer is assigned away from it the refcount is reduced. When refcount of the block goes to 0, it means that no pointer references it and hence it is un-reachable and is garbage and can be reclaimed.

    Let’s assume RefCount(VOID *pMem) gives us the reference count of the object at pMem. Then the following code shows how the reference count of a specific object in memory varies with the code flow.

    Object * obj1 = new Object(); // RefCount(obj1) starts at 1
    Object * obj2 = obj1;         // RefCount(obj1) incremented to 2 as new reference is added
    Object * obj3 = new Object(); 
    obj2 = NULL;                  // RefCount(obj1) decremented to 1 as ref goes away
    obj1 = obj3;                  // RefCount(obj1) decremented to 0 and can be collected

    As the execution flow mutates the state of the program, the refcount of each object is updated transparently and the block is freed when refcount hits 0. As you might guess this record keeping and collection can be implemented in various ways. Here I discuss one of them…

    Storage structure and allocation

    For each object the refcount has to be stored in some place. Since this field will be accessed very often it’s best to keep it inside or close to the object to preserve spatial locality of reference which has a lot of impact on performance (by ensuring good cache hit ratio). One of the approach we can take (which is most common) is to add a small header to the start of every object allocated. So each time allocation is asked for a object we allocate sizeof(header) + sizeof(type to be allocate). The memory layout looks like


    As apparent from the diagram the pointer returned is at the start of the actual data and not the start of the data allocated. This means that the caller isn’t even aware of the header and can use the data as is. However the garbage collector knows that every object has this header and can use any object pointer to access this header.

    The header structure we choose is

    struct MemHeader
        UINT32 refCount;

    The allocation routine (in pseudo C) looks like

    // cb is the number of bytes to be allocated
    PVOID GC_Alloc(size_t cb)
        // allocate MemHeader + cb but cast it to MemHeader
        MemHeader* pHdr = (MemHeader*)PlatformAlloc(MEMHEADERSIZE + cb);
        if (pHdr == NULL)
            return NULL;
        // set the initial refCount
        pHdr->refCount = 1;
        // increment the pointer by the size of MemHeader 
        // and make it point to the start of the actual data
        return (PVOID)pHdr;

    When the GC wants to access the header it can simply use

    inline MemHeader * GetHeader(PVOID pMem)
        return ((MemHeader*)pMem) - 1;

    Counting references

    For every pointer updation the affected objects’ reference count has to be updated. E.g. for a pointer assignment of sort l = r the following needs to be done

    void GC_UpdateRef(PVOID *l, PVOID r)
        // used for assignment of l = r
        // increment ref count for r-value as an additional reference is being taken
        if (r != NULL)
        // decrease ref count of l-value as it’s releasing it’s previous reference
        // do the final assignment
        *l = r;

    However, this function has to be called for all pointer assignments and even when pointers go out of scope. This is the tricky part and is achieved by various ways

    1. Compilers can insert call to GC_UpdateRef for all pointer assignment and also when references go out of scope (like it inserts call to destructors/finalizers)
    2. For languages that support operator overriding this behavior can be added from outside. However, that might need some changes to the type system like making all garbage collectable types inherit from some common type (which then has the overriding added).

    Freeing objects

    The final piece is actually freeing objects when their reference count goes down to 0.

    VOID GC_ReleaseRef(PVOID pMem)
        if (pMem == NULL)
        // get the memory header and decrement the refcount
        MemHeader *pHdr = GetHeader(pMem);
        // if ref count has gone down to zero, free it
        if (pHdr->refCount == 0)

    Even though this seems straightforward it is actually lacking a crucial block. Consider the following reference graph


    Here if object 1 goes out of reference and gets reclaimed by GC_ReleaseRef then recursively we need to check (and free if required) all other references held by it. So in this particular case object 4 shouldn’t be freed (as object 2 has a ref to it) but both Object 3 and Object 5 needs to be freed. So the full method would be something like

    VOID GC_ReleaseRef(PVOID pMem)
        if (pMem == NULL) return;
        MemHeader *pHdr = GetHeader(pMem);
        if (pHdr->refCount == 0)
            foreach(PVOID pChild in Get_Child(pHdr)) 

    The additional part are the lines in red. The assumption is that Get_Child returns all other references held by the object and we recursively free them. However, actually implementing Get_Child is hard. One of the mechanism used it to maintain compiler/type-system generated reference mask which I will cover that as a part of mark-sweep garbage collection.

    Advantage/disadvantage of reference counting GC

    Reference counting GC has a primary disadvantage. Which is used on it’s own it completely fails to handle circular references. Consider the following reference graph


    Here Object2 refers to Object3 and vice-versa. Now if Object 1 releases the reference marked in red then both Object2 and Object3 becomes un-reachable and garbage. However, since they refer to each other both will have a refCount of 1 and will not get freed. To handle this situation ref counting GC is never used on it’s own. It is used in conjunction with some other GC like mark-sweep which is run periodically to clean these circular reference leaks.

    Refcounting GC adds significant code/perf bloat as for every assignment calls to update refCount has to be added. Moreover, for multi-threaded system it can prove to be a major issue as locks have to be taken when updating refCount. Even the most performant atomic operations (e.g. Win32 InterlockedIncrement) are costly when used repeatedly. The header space cost which seems petty also eats into the system because for small objects they cause significant overhead. E.g. for an allocated int32 it adds 100% overhead (as the header is also 32 bit value).

    Advantage besides being easy to implement is that objects gets re-claimed the moment they become garbage. This means if objects support destructors/finalizers then these are run pretty soon and system resources held by them return to the OS faster.

    Reference counting GC ensures that the collection is distributed over the whole period of execution and hence for interactive system doesn’t result in system freezes like say mark-sweep GC does.

  • I know the answer (it's 42)

    Back to basic: Series on dynamic memory management

    Beach by the side of Suratkal Engineering college

    After becoming the .NET Compact Framework (.NETCF) dynamic memory management module owner I am continually learning a lot about the subject. Based on hallway discussion I figured out that a lot of developers are not very clear about the subject and would like to learn more. Most online material I encountered gets into too much of details and implementation for most casual readers.

    So here is my attempt to write up a series on GC. I plan to cover the basic stuff and then move into details of .NET CF GC including performance, profiling. I plan to cover bits of desktop .NET GC as well (but for that Maoni’s blog is a better resource) The first two in this series is already published. I will keep this post updated to act as an index into the series.

    1. Memory allocation, a walk down the history
    2. Why use garbage collection
    3. Reference Counting Garbage Collection
    4. Mark-sweep garbage collection
    5. Copying garbage collection
    6. Optimizing reference counting garbage collection
    7. Handling overflow in mark stage
    8. Generational Garbage Collection
    9. How does the GC find object references
    10. More to come … :)

    In case you have some suggestion send it my way and I will ty to include that as well…

  • I know the answer (it's 42)

    Windows 7 rocks


    Ok I know this blog is not about Windows but I thought I’d share the love.

    To me it looks like as is someone has combed through the entire user experience and fixed most of the stuff that bothered me.

    The taskbar change alone are good enough for the upgrade. E.g. the taskbar feels very natural when set in vertical position (it didn’t work well in this configuration before). The icons in tray and the status bar can be dragged around and it actually shows download progress in the IE taskbar icon itself (See the green progress in the screenshot below). I can even close an app by hovering over it’s icon on the taskbar (no need to restore it and then travel across the entire screen and hit close button). The taskbar is very accessible through keyboard, hit Win+T and taskbar is in focus and then shift through the icons using the arrow keys.


    The greatest shocker was when I connected my laptop to my TV and it instantly popped up the following, it then correctly picked up my TV’s resolution (1080p) and matched screen resolution to that. Almost felt magical…


    I could go on and on (like the awesome window management features) but you can head over to Tim’s blog and read more details there.

  • I know the answer (it's 42)

    Back to basics: Why use garbage collection


    This post is Part 2 in the series of posts on Garbage Collection (GC). Please see the index here.

    Garbage Collection does exactly what it’s more fancier name “Automatic dynamic memory management” suggests. As I discussed in my last post on Memory Allocation dynamic memory is hard to manage and GC attempts to do that automatically and relieves the coder from the hard task.

    GC basically attempts to take care of two basic scenarios remove garbage and avoid dangling pointers. They are very inter-related but are different scenarios

    Garbage Collect

    Consider a typical object reference graph. In the graph every rectangle is an object and the arrows denote object references. So A is an object which references B. For a program to be able to use an object it should be in one of these reference chains. The reference chain start from what is called Root and are typically references held in registers, on stack as local variable or global variables.


    Let’s assume that due to some operation, A relinquishes the reference to B and the graph becomes something like this…


    Now B and hence C is not reachable from any valid root in the program and hence have become Garbage (un-reachable). The programmer must ensure to follow all reference and free (de-allocate them). One of the duty of a GC system is to automate this process by tracking down (using various algorithms) such objects and reclaim the memory used by them automatically. So in a GC system when the reference is broken it will figure out that B and hence C is not reachable and will de-allocate them.

    Hanging/dangling reference

    Let’s consider another object graph which is similar to the one above but in addition to A, another object A’ also has a reference to B (or in other words B is shared between them)


    Even here after some operation object A doesn’t need reference to B. The programmer does what he thinks is right and de-allocates B and C


    However, A’ still has a references to B and hence that reference is now hanging or in more specific term pointing to invalid memory and would typically return unpredictable result when accessed. The key here is the unpredictable behavior. It is not necessary that program will crash. Unless the memory location in B is re-used it will seem to have valid data and de-references from A’ will work fine. So the failure will come up in un-expected ways and totally un-related places in the program and will make locating the root cause extremely hard.

    GC helps by automatically taking care of both of the above scenarios and ensuring that the system doesn’t land up in either of them.

    How important is GC

    If GC is so cool then why doesn’t all systems use GC. There are multiple reasons but with newer technology most of them are going away. One of the primary reasons of not using GC is the performance overhead. This makes GC a less lucrative deal for real-time systems, device drivers and even gaming platforms. However, there are examples where GC is used even for these systems, e.g. .NET Compact Framework (the team I work for) is used with full GC support very successfully in XNA games on Xbox.

    However, on some systems like functional languages which relies a lot on closures and deferred execution of those, it makes execution flow very un-predictable and hence GC becomes almost mandatory.

  • I know the answer (it's 42)

    Back To Basics: Memory allocation, a walk down the history

    Star fish on the Suratkal beach

    This post is Part 1 in the series of posts on Garbage Collection (GC). Please see the index here.

    Most languages we deal with today support a whole bunch of different memory allocation options. We have an option to use static allocation, stack-allocation and heap allocation. However, same was not true in the pre-historic era, life was simple then and the earlier languages just supported static allocation, then slowly stack and then heap allocation came by and today we have come to expect automatic memory management from most new languages.

    However, that didn’t really mean that there was no advantage is the simplistic approach taken by these early languages.

    Static allocation

    This is the simplest allocation option and was supported by initial languages like Fortran (early 50’s). All memory was allocated (reserved) during compile time and variable names were bound to the target memory and frozen. This had some serious limitation like variable memory couldn’t be supported. In case memory usage might vary the developer simply allocated the largest size he anticipated and used from it.

    This also meant that size of all Data Structures were known at runtime (e.g. fixed length arrays, no link-lists). Since name binding happened statically, simple things we have come to expect over time like recursion was not possible because the second instantiation of a method would mean that same variables are accessed and since names are bound to addresses statically they would over-write data.

    Even though this seems to be pretty restrictive this had some advantages. The most obvious is robustness and predictable behavior. Since there are no runtime memory allocation, there is no possibility of running out of memory (so no OutOfMemoryException :) ). The execution was also very fast because there is no memory to manage and no stack/heap to tear down. Moreover the compiler could emit direct memory accesses instruction without going through any indirection (as symbol->address mapping doesn’t change).

    Stack Allocation

    The first languages to bring in stack allocation was Algol in late 50’s. This worked past some limitations of static allocation by supporting memory frames that got pushed onto the system stack as procedures were called. What this meant was that based on the input parameters of a method it could push a different sized frame and hence had variable memory allocation (atleast for method local variables). However, return value had to be of fixed size (fixed by compile time) because the caller would need to know how much was left behind by callee on the stack.

    Recursion was possible because every invocation would have it’s exclusive frame, and hence when a method returned the frame would unwind (pop) and memory allocated by it would be no longer available to the callee.

    Obviously with this came reduced robustness because based on control flow the program could get out of stack and the additional stack management overhead reduced speed (though not significantly).

    Even though it wasn’t much but it still was a huge improvement…

    Heap Allocation

    Funny to call it so but this is the state of the art in memory allocation :) and has remained so for a long time (with some improvements).

    With heap data could be allocated in chunks of variable size and later de-allocated. There is no ordering requirement in-between the allocation and de-allocation unlike the stack frames where callee memory gets deallocated before the caller. So now it was possible to allocate and pass variable sized memory around (e.g. from Callee to Caller). It was easier to manage memory better, e.g. by using growable non-contiguous lists over arrays, model data closer to their real life existence like a tree.

    With the heap allocation life seemed simpler at first but actually became harder. Robustness nose-dived. Memory allocated dynamically had to be managed carefully because if allocated memory is not de-allocated after it’s use is over, it becomes garbage and un-available (called memory leak) and slowly runs out of space. At the end the dreaded out of memory scenario comes up where there is no more memory left to be allocated to the requestor and the program fails to continue. Heap allocation is also a costly operation and can have severe perf implications.

    Even with all the tooling support it still remains hard to locate memory leaks and constitutes one of the harder classes of bugs to be tackled.

    Garbage Collection

    LISP in 1959 started a new technique called Automatic Memory Management known by the more popular term Garbage Collection (GC). This tries to address the memory leak issue by needing explicit memory allocation but removing the need to de-allocate memory. Instead it provides language facilities that actually hunt down the un-used memory and automatically free them without user-code intervention.

    As you might guess that even though GC increases robustness and reduces coding overhead significantly it also has performance implications. This makes using GC in hard in a whole bunch of scenarios. E.g. you typically do not expect to see a GC being used for a real-time system or a system driver (but there are exceptions to this as well); but you do expect to see it in a web-development platform that manages huge amount of memory and can sometimes live with a bit of response delay when the GC is actually running.

    Putting it all together

    Most modern programming languages support all three forms of storage. E.g. C++ or C# supports all 3.

    Even though all of them might be available at the same time, the choice of usage is key to ensure robustness and meet performance goals of a program. Unfortunately a lot of developers take a casual approach to it and later land up into trouble when the memory foot-print become larger and harder to handle.

    Consider the following pseudo-code

    void foo()
        MyType m1;                                   // Option-1 stack allocation
        MyType* m2 = AllocateOnHeap(sizeof(MyType)); // Option-2 heap allocation
        if (SomeCond)
           foo();     // recursive call    


    Here foo is called in recursion. If MyType is of significant size and we use option-1 of using stack allocation then the depth of recursion supported by the method will be significantly reduced because each function call will need additional stack space to accommodate MyType. However, if we use heap (Option-2) then the stack needed will be significantly lower (increase in depth of recursion) but since heap allocation is much slower the speed will suffer. Based on the size of MyType, the need of the program the allocation would have to be carefully chosen.

    Based on target domain different languages have chosen

  • I know the answer (it's 42)

    When the spell checkers attack


    Today Boing Boing had a post about the Cupertino effect in which spell-checkers erroneously change valid words. I had worked on Adobe Acrobat and Adobe FrameMaker's spell checking feature and do happen to appreciate the headache with getting them right.

    I face Cupertino effect most often while typing in Indian names because most word-processors don't recognize them. For my wife's name Somtapa the offered spelling is Stomata followed by Sumatra. So I started calling her Stomata until it got onto her nerves and something very similar to the photo above happened to me

  • I know the answer (it's 42)

    Is interlocked increment followed by comparison thread safe?

    Party canceled after heavy downpour

    Sorry about the blog title, my imagination failed me :(.

    In our internal alias someone asked the question "Is the following thread safe"

    if(Interlocked.Increment(ref someInt) == CONSTANT_VAL)

    My instant reaction was no because even though the increment is done in a thread safe way using System.Threading.Interlocked class, the comparison that follows is not safe.

    My reasoning was that the "if" expression can be broken down to the following operations

    1. Fetch of someInt
    2. Increment operation
    3. Write back of someInt
    4. Comparison

    The first 3 are done inside the Increment method and it provides concurrency protection and hence cannot be interleaved by another instruction.

    So if two threads are running in parallel (one marked in red and the other in green) I assumed that the following interleaving is possible

    1. someInt is 4 and CONSTANT_VAL is 5
    2. Fetch of someint      -> someInt ==4
    3. Increment operation   -> someInt ==5
    4. Write back of someint -> someInt ==5
    5. Fetch of someint      -> someInt == 5
    6. Increment operation   -> someInt == 6
    7. Write back of someint -> someInt == 6
    8. comparison            -> compare 6 & CONSTANT_VAL
    9. comparison            -> compare 6 & CONSTANT_VAL

    This means that the comparison of both thread will fail.

    However, someone responded back that I was wrong as the return value is being used and not the written back value. This made me do some more investigation.

    If I see the JITted code then it looks like

    if (Interlocked.Increment(ref someInt) == CONSTANT_VAL)
    00000024  lea         ecx,ds:[002B9314h] 
    0000002a  call        796F1221 
    0000002f  mov         esi,eax 
    00000031  cmp         esi,5

    The call (at 0x000002a) is to the native code inside CLR which in turn calls Win32 api (InterlockedIncrement).

    However, the last 2 lines are the interesting ones. Register EAX contains the return value and comparison is happening against that and CONSTANT_VAL. So even if the second thread had already changed the value of someInt it doesn’t have any effect as the return of the first increment is being used and not the safeInt value in memory. So first comparison (step 8 above) will actually compare CONSTANT_VAL against 5 and succeed.

  • I know the answer (it's 42)

    C/C++ Compile Time Asserts


    The Problem

    Run time asserts are fairly commonly used in C++. As the MSDN documentation for assert states

    "(assert) Evaluates an expression and, when the result is false, prints a diagnostic message and aborts the program."

    There is another type of asserts which can be used to catch code issues right at the time of compilation. These are called static or compile-time asserts. These asserts can be used to do compile time validations and are very effectively used in the .NET Compact Framework code base.

    E.g. you have two types Foo and Bar and your code assumes (may be for a reinterpret_cast) that they are of the same size. Now being in separate places there is always a possibility that someone modifies one without changing the other and that results in some weird bugs. How do you express this assumption in code? Obviously you can do a run-time check like

    assert(sizeof(foo) == sizeof(bar));

    If that code is not hit during running this assert will not get fired. This might be caught in testing later. However, if you notice carefully all of the information is available during compilation (both the type and the sizeof is resolved while compilation). So we should be able to do compile time validation, with-something to the effect

    COMPILE_ASSERT(sizeof(int) == sizeof(char));

    This should be tested during compilation and hence whether the code is run or not the assert should fail.

    The Solution

    There are many ways to get this done. I will discuss two quick ways

    Array creation

    You can create a MACRO expression as follows

    #define COMPILE_ASSERT(x) extern int __dummy[(int)x]

    This macro works as follows

    // compiles fine 
    COMPILE_ASSERT(sizeof(int) == sizeof(unsigned)); 
    // error C2466: cannot allocate an array of constant size 0
    COMPILE_ASSERT(sizeof(int) == sizeof(char));

    The first expression gets expanded to int __dummy[1] and compiles fine, but the later expands to int __dummy[0] and fails to compile.

    The advantage of this approach is that it works for both C and C++, however, the failure message is very confusing and doesn't indicate what the failure is for. It is left to the developer to visit the line of compilation failure to see that it's a COMPILE_ASSERT.

    sizeof on incomplete type

    This approach works using explicit-specialization of template types and the fact that sizeof of incomplete types fail to compile.

    Consider the following

    namespace static_assert
        template <bool> struct STATIC_ASSERT_FAILURE;
        template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };

    Here we defined the generic type STATIC_ASSERT_FAILURE. As you see the type is incomplete (no member definition). However we do provide a explicit-specialization of that type for the value true. However, the same for false is not provided. This means that sizeof(STATIC_ASSERT_FAILURE<true>) is valid but sizeof(STATIC_ASSERT_FAILURE<false>) is not. This can be used to create a compile time assert as follows

    namespace static_assert
        template <bool> struct STATIC_ASSERT_FAILURE;
        template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };
        template<int x> struct static_assert_test{};
    #define COMPILE_ASSERT(x) \
        typedef ::static_assert::static_assert_test<\
            sizeof(::static_assert::STATIC_ASSERT_FAILURE< (bool)( x ) >)>\

    Here the error we get is as follows

    // compiles fine
    COMPILE_ASSERT(sizeof(int) == sizeof(unsigned)); 
    // error C2027: use of undefined type 'static_assert::STATIC_ASSERT_FAILURE<__formal>
    COMPILE_ASSERT(sizeof(int) == sizeof(char)); 

    So the advantage is that the STATIC_ASSERT_FAILURE is called out right at the point of failure and is more obvious to figure out

    The macro expansion is as follows

    1. typedef static_assert_test< sizeof(STATIC_ASSERT_FAILURE<false>) >  _static_assert_typedef_
    2. typedef static_assert_test< sizeof(incomplete type) >  _static_assert_typedef_

    Similarly for true the type is not incomplete and the expansion is

    1. typedef static_assert_test< sizeof(STATIC_ASSERT_FAILURE<true>) >  _static_assert_typedef_
    2. typedef static_assert_test< sizeof(valid type with one enum member) >  _static_assert_typedef_
    3. typedef static_assert_test< 1 > _static_assert_typedef_

    Put it all together

    All together the following source gives a good working point to create static or compile time assert that works for both C and C++

    #ifdef __cplusplus
    #define JOIN( X, Y ) JOIN2(X,Y)
    #define JOIN2( X, Y ) X##Y
    namespace static_assert
        template <bool> struct STATIC_ASSERT_FAILURE;
        template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };
        template<int x> struct static_assert_test{};
    #define COMPILE_ASSERT(x) \
        typedef ::static_assert::static_assert_test<\
            sizeof(::static_assert::STATIC_ASSERT_FAILURE< (bool)( x ) >)>\
                JOIN(_static_assert_typedef, __LINE__)
    #else // __cplusplus
    #define COMPILE_ASSERT(x) extern int __dummy[(int)x]
    #endif // __cplusplus
    #define VERIFY_EXPLICIT_CAST(from, to) COMPILE_ASSERT(sizeof(from) == sizeof(to)) 
    #endif // _COMPILE_ASSERT_H_

    The only extra part is the JOIN macros. They just ensure that the typedef is using new names each time and doesn't give type already exists errors.

    More Reading

    1. Boost static_assert has an even better implementation that takes cares of much more scenarios and compiler quirks. Head over to the source at http://www.boost.org/doc/libs/1_36_0/boost/static_assert.hpp
    2. Modern C++ Design by the famed Andrei Alexandrescu
  • I know the answer (it's 42)

    Taking my job more seriously

    Durga puja

    I generally take my work very seriously. However, the following from SICP was still my favorite quote

    "I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don't think we are. I think we're responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all, I hope we don't become missionaries. Don't feel as if you're Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don't feel as if the key to successful computing is only in your hands. What's in your hands, I think and hope, is intelligence: the ability to see the machine as more than when you were first led up to it, that you can make it more.''

    However, couple of days back I was lying in a cold room with a doctor probing my Thyroid gland with ultrasound beams. She went on explaining the details of the tumor that I had, pointing to the screen. In my mind I was thinking about folks who have to code for these things and how careful they need to be.

    The first thing I did when I came out of the room is search for whether .NET Compact Framework gets used for these kinds of equipment. Of the hundreds of pages I hit the one titled "Control System for Lung Ventilation Equipment with Windows CE , Microsoft .Net Compact Framework and Visual Studio Team System" struck me. Mostly because it had two products I personally coded for (NETCF and VSTS). There were even more life saving equipment listed on the search page.

    It was a very humbling experience. I made a little vow, I'll be more careful when I code from tomorrow.

  • I know the answer (it's 42)

    Silverlight on Nokia S60 devices


    In many of my blog posts (e.g.here and here) I refer to .NET Compact Framework and Symbian OS (S60) and obviously folks keep asking me via comments (or assume) that we are porting .NETCF on S60 devices. So I thought it's time to clarify :)

    The short answer is that we are not porting .NETCF to S60 devices, but we are porting the Silverlight on S60 devices. This was jointly announced by Microsoft and Nokia, read Nokia's press release here.

    We are in very early stage of development and it is very hard to tell how much will be in it (e.g. will it be SL v1.0 or v2.0). However, we are working hard and as and when more details emerge I will share it out from this blog. So keep reading.

  • I know the answer (it's 42)

    Who halted the code

    Fisher men

    Today a bed time story involving un-initialized variable access and a weird coincidence.

    Couple of weeks back I was kind of baffled with a weird issue I was facing with the .NET Compact Framework Jitter. We were making the Jitter work for Symbian OS (S60) emulator. The S60 emulator is a Windows process and hence we needed to bring up the i386 Jitter in our code base. After running some JITted code the application would stop executing with the message "Execution halted" in the stack trace.

    Now obviously the user didn't halt anything and the program was a simple hello world.

    After some debugging I found an interesting thing. For the debug build the Symbian C compiler memsets un-initialized variables to 0xcccc. This is fairly standard and is used to catch un-initialized variable access.

    However, our Jit code had a bug such that in some scenario it was not emitting function call/returns correctly. So instead of returning we were executing arbitrary memory. But instead of some sort of access violation (or some thing equally bad) the execution was actually halting.

    The reason was we started executing un-initialized data. Now this data is 0xcccc. The S60 debugger uses interrupt 3 (INT 3) for inserting breakpoints and the following line reveals the rest of the story


    For i386 the instruction used for inserting breakpoint and the pattern used for un-initialized data matched and the debugger thought that it hit a breakpoint.

  • I know the answer (it's 42)

    Hex signatures

    Tooth paste advertisement

    Frequently in code we need to add bit patterns or magic numbers. Since hex numbers have the alphabets A-F folks become creative and create all kinds of interesting words. Typical being 0xDEAD, 0xBEEF, 0xFEED or various combination of these like 0xFEEDBEEF. However, someone actually used the following in a sample


    No naming the guilty :)

  • I know the answer (it's 42)

    Back To Basics: Finding your stack usage


    Handling stack overflow is a critical requirement for most Virtual Machines. For .NET Compact framework (NETCF) it is more important because it runs on embedded devices with very little memory and hence little stack space.

    The .NETCF Execution Engine (EE) needs to detect stack overflows and report it via StackOverflowException. To do this it uses some logic which we internally refer to as StackCheck (that will be covered in a later post). The algorithm needs fair prediction of stack usage for system APIs (e.g. Win32 APIs or for Symbian OS, S60 APIs).

    Each time we target a new platform we do some measurements in addition to referring to specs :) to find it's stack characteristics. As we are currently making the NETCF work on Symbian OS we are doing these measurements again. So I thought of sharing how we are going about measuring stack usage using simple watermarking.

    The technique


    On method Entry store the two current and max stack values. This typically available via system APIs which in case of Symbian is available over the TThreadStackInfo class (iBase, iLimit and other members).

                           |               |
                           |               |
                           |               |
    current stack ------>  +---------------+
                           |               |
                           |   Available   |
                           |    Stack      |
                           |               |
                           |               |
                           |               |
                           |               |
    Stack limit ---------> +---------------+
                           |               |
                           |               |
                           .               .
                           .               .
                           .               . 

    Step :2

    Get a pointer on to the current stack pointer. How to get the pointer will vary based on the target platform. Options include system APIs, de-referencing stack pointer register (e.g. ESP register on x86) or simply creating a local variable on the stack and getting it's pointer.

    Then Memset the whole region from current stack to the total available with some known pattern, e.g. 0xDEAD (a larger signature is a better approach to ensure there is no accidental match)

                           |               |
                           |               |
                           |               |
    current stack ------>  +---------------+
                           |     DEAD      |
                           |     DEAD      |
                           |     DEAD      |
                           |     DEAD      |
                           |     DEAD      |
                           |     DEAD      |
                           |     DEAD      |
                           |     DEAD      |
    Stack limit ---------> +---------------+
                           |               |
                           |               |
                           .               .
                           .               .
                           .               . 

    Step: 3

    Make an OS or whatever call you want to measure.

                           |               |
                           |               |
                           |               |
    current stack ------>  +---------------+
                           |     1231      | --+
                           |     1231      |   |
                           |     D433      |   +--> Stack usage
                           |     D324      |   |
                           |     3453      | --+
                           |     DEAD      |
                           |     DEAD      |
                           |     DEAD      |
    Stack limit ---------> +---------------+
                           |               |
                           |               |
                           .               .
                           .               .
                           .               . 

    Step 4:

    When the call returns the stack will get modified. Iterate through the memory starting from the current stack pointer looking for the first occurrence of the pattern you’ve set in Step:2. This is the water mark and is the point till which the stack got used by the OS call. Subtract the water mark from the original entry point saved in Step 1 and you have the stack usage.

  • I know the answer (it's 42)

    Tail call optimization


    I had posted about tail call and how .NET handles it before. However there was some email exchanges and confusion on internal DLs around it and so I thought I'd try to elaborate more on how .NET goes about handling tail calls

    Let’s try to break this down a bit.

    a) A piece of C# code contains tail recursion .

    static void CountDown(int num)
    Console.WriteLine("{0} ", num);
    if (num > 0)

    b) The C# compiler does not optimize this and generates a normal recursive call in IL

     IL_000c:  ...
    IL_000d:  ...
    IL_000e:  sub
    IL_000f:  dup
    IL_0010:  starg.s num
    IL_0012:  call void TailRecursion.Program::CountDown(int32)

    c) The Jitter on seeing this call doesn’t optimize it in any way and just inserts a normal call instruction for the platform it targets (call for x86)

    However, if any compiler is smart enough to add the tail. recursive IL statement then things change.

    a) Scheme code

    (define (CountDown n)
        (if (= n 0)
          (CountDown (- n 1))))

    b) Hypothetical IronScheme compiler will generate (note for scheme it has to do the optimization)

     IL_000c:  ...
    IL_000d:  ...
    IL_000e:  sub
    IL_0023:  call void TailRecursion.Program::CountDown(int32)
    IL_0029:  ret

    c) Based on which JIT you are using and various other scenarios the JIT now may honour the tail. IL instruction and optimize this with a jmp when generating machine instruction

    Please note the may in the very last point and refer to here for some instances where the optimization still might not take place…

  • I know the answer (it's 42)

    A* Pathfinding algorithm animation screen saver


    I'm trying to learn WPF and IMO it is not a simple task. Previously whenever I upgraded my UI technology knowledge it had been easy as it build on a lot of pre-existing concepts. However, WPF is more of a disruptive change.

    I decided to write some simple fun application in the effort to learn WPF. Since I cannot write a BabySmash application as Scott Hanselman has already done a awesome job out of it, I decided to code up a simulation of A* path finding algorithm and push it out as a screen saver. The final product looks as follows.


    What it does is that it shows a source, a destination with blockages (wall, mountain?) in between and then uses A* algorithm to find a path in between them. This is the same kind of algorithm that is used in say games like Age of the Empires as workers move around collecting wood and other resources. The algorithm is documented here.


    1. It supports plugging in your own scenes with help of the awesome screen/board designer I blogged about
    2. Comes with a WPF and a console client to show the animation
    3. Algorithm can be played with easily to change or tweak it.
    4. Shows full animation including start, end, obstacle, closed cells, current path being probed and the final path
    5. Multi-screen support. Each screen shows a different board being solved.


    Obviously this was more of a quick dirty hobby project and there remains a ton to work on. E.g.

    1. Screen saver preview doesn't work.
    2. Setting dialog is a sham.
    3. The boards do not flip for vertically aligned screens
    4. The XAML data binding is primitive and needs some work
    5. The path can choose diagonally across cell even if it seems to cross over diagonal wall of obstacle.
    6. Mouse move alone doesn't shut down the screen saver. You need to hit a
    7. Many more :)


    Download the final binaries from here. Unzip to a folder, right click on the *.scr and choose Install

    Download sources (including VS 2008 sln) from here.


  • I know the answer (it's 42)

    How Many Types are loaded for Hello World

    Fairy princess


    Consider the following super simple C# code

    namespace SmartDeviceProject1
        class Program
            static void Main(string[] args)

    Can you guess how many managed Type gets loaded to run this? I was doing some profiling the .NET Compact Framework loader (for entirely unrelated reason) and was surprised by the list that got dumped. 87 177 types**, never could've guessed that...

    1. System.Object
    2. System.ValueType
    3. System.Enum
    4. System.Void
    5. System.Boolean
    6. System.Char
    7. System.SByte
    8. System.Byte
    9. System.Int16
    10. System.UInt16
    11. System.Int32
    12. System.UInt32
    13. System.Int64
    14. System.UInt64
    15. System.Single
    16. System.Double
    17. System.String
    18. System.Type
    19. System.Reflection.MemberInfo
    20. System.RuntimeType
    21. System.Array
    22. System.IntPtr
    23. System.UIntPtr
    24. System.Text.StringBuilder
    25. System.Delegate
    26. System.MulticastDelegate
    27. System.DateTime
    28. System.Exception
    29. System.MarshalByRefObject
    30. System.AppDomain
    31. System.__ComObject
    32. System.Decimal
    33. System.SZArrayHelper
    34. System.Collections.IEnumerable
    35. System.Collections.IEnumerator
    36. System.Nullable`1
    37. System.SystemException
    38. System.Security.VerificationException
    39. System.Runtime.InteropServices.CurrencyWrapper
    40. System.Runtime.InteropServices.UnknownWrapper
    41. System.Runtime.InteropServices.DispatchWrapper
    42. System.Runtime.InteropServices.ErrorWrapper
    43. System.Runtime.InteropServices.CustomMarshalerHelper
    44. System.Attribute
    45. System.Runtime.InteropServices.InterfaceTypeAttribute
    46. System.Runtime.InteropServices.GuidAttribute
    47. System.Runtime.InteropServices.ComVisibleAttribute
    48. System.Runtime.InteropServices.ComEventInterfaceAttribute
    49. System.Runtime.InteropServices.ComSourceInterfacesAttribute
    50. System.Runtime.InteropServices.LCIDConversionAttribute
    51. System.Runtime.InteropServices.ComDefaultInterfaceAttribute
    52. System.Runtime.InteropServices.DispIdAttribute
    53. System.CorPubObject
    54. System.Char[]
    55. System.Collections.Hashtable
    56. System.Reflection.BindingFlags
    57. System.Reflection.MemberFilter
    58. System.Reflection.Binder
    59. System.Type[]
    60. System.Reflection.TypeAttributes
    61. System.Void*
    62. System.Int32[]
    63. System.IntPtr[]
    64. System.Collections.IDictionary
    65. System.UnhandledExceptionEventHandler
    66. System.AppDomainManager
    67. System.Version
    68. System.Runtime.InteropServices.ComInterfaceType
    69. System.Collections.ICollection
    70. System.Collections.IEqualityComparer
    71. System.AppDomainManagerInitializationOptions
    72. System.ArithmeticException
    73. System.ArgumentException
    74. System.MissingMemberException
    75. System.MemberAccessException
    76. System.AppDomainSetup
    77. System.Runtime.InteropServices.Marshal
    78. System.PInvoke.EE
    79. System.Reflection.AssemblyName
    80. System.Byte[]
    81. System.Globalization.CultureInfo
    82. System.Reflection.Assembly
    83. System.Configuration.Assemblies.AssemblyHashAlgorithm
    84. System.Configuration.Assemblies.AssemblyVersionCompatibility
    85. System.Reflection.AssemblyNameFlags
    86. System.Globalization.CultureTableRecord
    87. System.Globalization.CompareInfo
    88. System.Globalization.TextInfo
    89. System.Globalization.NumberFormatInfo
    90. System.Globalization.DateTimeFormatInfo
    91. System.Globalization.Calendar
    92. System.Globalization.BaseInfoTable
    93. System.Globalization.CultureTable
    94. System.Globalization.CultureTableData
    95. System.Globalization.CultureTableData*
    96. System.UInt16*
    97. System.Globalization.NumberStyles
    98. System.Globalization.DateTimeStyles
    99. System.String[]
    100. System.Globalization.DateTimeFormatFlags
    101. System.Globalization.TokenHashValue
    102. System.Globalization.TokenHashValue[]
    103. System.Byte*
    104. System.Globalization.CultureTableHeader
    105. System.Globalization.CultureTableHeader*
    106. System.Globalization.CultureNameOffsetItem
    107. System.Globalization.CultureNameOffsetItem*
    108. System.Globalization.RegionNameOffsetItem
    109. System.Globalization.RegionNameOffsetItem*
    110. System.Globalization.IDOffsetItem
    111. System.Globalization.IDOffsetItem*
    112. System.TokenType
    113. System.Int32&
    114. System.IO.TextReader
    115. System.IO.TextWriter
    116. System.IFormatProvider
    117. System.Console
    118. System.Char&
    119. System.Char*
    120. System.ArgumentNullException
    121. System.PInvoke.NSLIntl
    122. System.ArgumentOutOfRangeException
    123. System.RuntimeTypeHandle
    124. System.NotSupportedException
    125. System.PlatformNotSupportedException
    126. System.String&
    127. System.TypeLoadException
    128. System.Globalization.EndianessHeader
    129. System.Globalization.EndianessHeader*
    130. System.Globalization.GlobalizationAssembly
    131. System.BCLDebug
    132. System.PInvoke.TableData
    133. System.InvalidProgramException
    134. System.Collections.HashHelpers
    135. System.Globalization.CultureTableItem
    136. System.UInt32&
    137. System.Security.CodeAccessSecurityEngine
    138. System.LocalDataStoreMgr
    139. System.Threading.ExecutionContext
    140. System.LocalDataStore
    141. System.Collections.ArrayList
    142. System.Threading.SynchronizationContext
    143. System.Runtime.Remoting.Messaging.LogicalCallContext
    144. System.Runtime.Remoting.Messaging.IllogicalCallContext
    145. System.Threading.Thread
    146. System.Object[]
    147. System.Collections.Generic.Dictionary`2
    148. System.Runtime.Remoting.Messaging.CallContextRemotingData
    149. System.Runtime.Remoting.Messaging.CallContextSecurityData
    150. System.Collections.Generic.IEqualityComparer`1
    151. System.InvalidOperationException
    152. System.Globalization.CultureTableRecord[]
    153. System.Threading.Monitor
    154. System.Globalization.CultureTableRecord&
    155. System.Object&
    156. System.Threading.Interlocked
    157. System.Runtime.CompilerServices.RuntimeHelpers
    158. System.RuntimeFieldHandle
    159. System.PInvoke.PAL
    160. System.IndexOutOfRangeException
    161. System.IntPtr&
    162. System.Buffer
    163. System.NullReferenceException
    164. System.OutOfMemoryException
    165. System.InvalidCastException
    166. System.OverflowException
    167. System.DivideByZeroException
    168. System.ArrayTypeMismatchException
    169. System.MissingMethodException
    170. System.FormatException
    171. System.RankException
    172. System.Security.SecurityException
    173. System.StackOverflowException
    174. System.Threading.ThreadAbortException
    175. System.Threading.ThreadTerminateException
    176. System.MethodAccessException
    177. SmartDeviceProject1.Program

    **This is for the compact framework CLR. Your mileage will vary if you run the same on the desktop CLR.

  • I know the answer (it's 42)

    Designer for my path finding boards


    I'm writing a small application (or rather a screen saver) that animates and demonstrates A* search algorithm. The idea is simple. On screen you see a start and end point and some random obstacles in between them. Then you see animation on how A* algorithm is used to navigate around the board to find a path (close to the shortest possible) between the two.

    All of this is fairly standard. However, I got hit by a simple issue. I wanted to have the ability to design this board visually and also let my potential million users do the same. Obviously I don't have time to code up a designer and hence choose the all time manager's favorite technique. Re-use :)

    So the final solution I took was to use Microsoft Office Excel as the WYSIWYG editor. I created a xlsx file with the following conditional formatting which colors cells based on the value of the cells.

    Excel Conditional Formatting screen shot for blog

    So in this case

    1. w = Wall marking the end of the table
    2. b = blocks/bricks/obstacle
    3. s = start
    4. e = end

    Using this I can easily design the board. The designer in action looks as follows

    Excel Designer For A* blog screenshot

    Since the excel sheet has conditional formatting the user just types in s, e, b, w in the cells and they all light up visually. At the end he/she just saves the file using File => Save As and uses CSV format. This saves the file shown above as


    As you see the format is simply a row per line and every column separated by a comma. My simulator reads this file and renders using whatever renderer is configured (console or WPF).

    More about the A* simulator to come soon...

  • I know the answer (it's 42)

    It's is easy to claim that the world won't get destroyed today

    Taken inside the Microsoft Campus Hyderabad

    The clock is ticking and the Large Hadron Collider in Cern is going to get switched on today (9/10/2008). Even though there are speculations, CERN is claiming it's perfectly safe and the world won't end. But it's easy to claim that, who'll be around to prove them wrong in case they are :)

    It's one of the coolest device at an operating temperature of < -270°C. But I'd get really angry if there's any disturbance to my Birthday celebrations!!

  • I know the answer (it's 42)

    Team Foundation Server tool dump workspace details

    Bangalore airport

    I juggle around with a lot of workspaces. The reason is .NET Compact Framework is consumed in a whole bunch of things like Windows Mobile, Xbox, Zune, Nokia and most of them are on different source branches. On top of this active feature work happens in feature branches and there are other service branches to frequently peek into.

    So the net effect is I need to sometime go to different folders and see what it's source control workspace mapping is or I need to dump details of workspaces on other computers and see if I can use that workspace as a template to create a new workspace.

    The thing I hate here is to run cd to that folder and do a tf workspace to bring up the GUI and then dismiss the UI. I don't like GUI when things can be done equally well on the console. So I quickly hacked up a tool that dumps workspace details onto the console, something as below

    d:\>workspace.exe NETCF\F01
    Name:      ABHINAB2_S60
    Server:    TfsServer01
    Owner:     Abhinaba
    Computer:  ABHINAB2
    Working folders:
    Active   $/NetCF/F01/tools     d:\NETCF\F01\tools
    Active   $/NetCF/F01/source    d:\NETCF\F01\source


    This tool can be used either to see the mapping of a folder or given a workspace name and owner dump it details. It uses the exact same format as show in the tf workspace UI.


    The source and a build binary is linked below. It is a single source file (Program.cs) and you should be able to pull it into any version of Visual Studio and even build it from the command line using CSC

    1. Source
    2. Binary

    Using the tool

    Build it or use the pre-built binary and edit the .config file to point it to your TFS server and you are all set to go.
  • I know the answer (it's 42)

    Obsession with desktop continues


    This is my new office setup. I have been pushing around a lot of code lately and felt I needed more real-estate to effectively do what I'm doing. So I hooked up another monitor. All  3 are standard HP LP1965 19" monitors.

    However, since none of my video cards support 3 monitors and I have a whole bunch of computers to hook up I had to do some complex wiring around :). If I number the screens 1,2,3 from left to right this is how it works out

    • 1 and 2 is connected to my main dev box (machine1)
    • 2 and 3 is connected to the machine I use for emails and browsing (machine 2)
    • 2 actually goes through a 4-port KVM switch using which it can circle through machine1, machine2 and two other less used machines
    • 1 also goes through a 2 port KVM switch and connects a xbox and machine 1
    • I don't use the laptop at work directly, I connect to it using remote desktop.

    Sweet :)

    It's not as complex as it sounds. For my normal flow I just use the KVM to switch between machine-1 and machine-2. I rarely need to switch between the xbox and machine1.

    PS: Don't try to make much sense of the Vintage books on the shelve, I plan to keep them for some more time before handing them off to some historian.

  • I know the answer (it's 42)

    Back to Basic: Using a System.Threading.Interlocked is a great idea

    Bound to give you diabetes :)

    I just saw some code which actually takes a lock to do a simple set operation. This is bad because taking locks are expensive and there is an easy alternative. The System.Threading.Interlocked class and its members can get the same job done and much faster.

    I wrote the following two methods which increments a variable a million times. The first method does that by taking a lock and the other uses the Interlocked.Increment API.

    static int IncrementorLock()
        int val = 0;
        object lockObject = new object();
        for (int i = 0; i < 1000000; ++i)
            lock (lockObject)
        return val;
    static int IncrementorInterlocked()
        int val = 0;
        for (int i = 0; i < 1000000; ++i)
            System.Threading.Interlocked.Increment(ref val);
        return val;


    I then used the Visual Studio Team System Profiler (instrumented profiling) and got the following performance data.

    Function Name Elapsed Inclusive Time Application Inclusive Time
    LockedIncrement.Program.IncrementorInterlocked() 1,363.45 134.43
    LockedIncrement.Program.IncrementorLock() 4,374.23 388.69

    Even though this is a micro benchmark and uses a completely skewed scenario, it still drives the simple point that using the interlocked class is way faster.

    The reason the interlocked version is faster is because instead of using .NET locks it directly calls Win32 apis like InterlockedIncrement which ensures atomic updation to variables at the OS scheduler level.

    These Interlocked APIs complete depend on the support of the underlying OS. This is the reason you'd see that all the APIs are not available uniformly across all versions of .NET (e.g. there is no uniform coverage over WinCE and Xbox). While implementing these for the Nokia platform (S60) we are hitting some scenarios where there is no corresponding OS API.

  • I know the answer (it's 42)

    String equality

    Flowers At the Botanical park

    akutz has one of the most detailed post on string interning and equality comparison performance metrics I have ever seen. Head over to the post here

    I loved his conclusion which is the crux of the whole story.

    "In conclusion, the String class’s static Equals method is the most efficient way to compare two string literals and the String class’s instance Equals method is the most efficient way to compare two runtime strings. The kicker is that there must be 10^5 (100,000) string comparisons taking place before one method starts becoming more efficient than the next. Until then, use whichever method floats your boat."

  • I know the answer (it's 42)

    Writing exception handlers as separate methods may prove to be a good idea

    Flowers At the Botanical park

    Let us consider a scenario where you catch some exception and in the exception handler do some costly operation. You can write that code in either of the following ways

    Method-1 : Separate method call

    public class Program
        public static void Main(string[] args)
                using (DataStore ds = new DataStore())
                    // ...
            catch (Exception ex)
        private static void ErrorReporter(Exception ex)
            string path = System.IO.Path.GetTempFileName();
            ErrorDumper ed = new ErrorDumper(path, ex);
            XmlDocument xmlDoc = new XmlDocument();
            RemoteErrorReporter er = new RemoteErrorReporter(xmlDoc);


    Method-2 : Inline

    public static void Main(string[] args)
            using (DataStore ds = new DataStore())
                // ...
        catch (Exception ex)
            string path = System.IO.Path.GetTempFileName();
            ErrorDumper ed = new ErrorDumper(path, ex);
            XmlDocument xmlDoc = new XmlDocument();
            RemoteErrorReporter er = new RemoteErrorReporter(xmlDoc);


    The simple difference is that in the first case the exception handler is written as a separate method and in the second case it is placed directly inline inside the handler itself.

    The question is which is better in terms of performance?

    In case you do have significant code and type reference in the handler and you expect the exception to be thrown rarely in an application execution then the Method-1 is going to be more performant.

    The reason is that just before executing a method the whole method gets jitted. The jitted code contains stubs to the other method's it will call but it doesn't do a recursive jitting. This means when Main gets called it gets jitted but the method ErrorReporter is still not jitted. So in case the exception is never fired all the code inside ErrorReporter never gets Jitted. This might prove to be significant saving in terms of time and space if the handling code is complex and refers to type not already referenced.

    However, if the code is inline then the moment Main gets jitted all the code inside the catch block gets jitted. This is expensive not only because it leads to Jitting of code that is never executed but also because all types referenced in the catch block is also resolved resulting in loading a bunch of dlls after searching though the disk. In our example above System.Xml.dll and the other dll containing remote error reporting gets loaded even though they will never be used. Since disk access, loading assemblies and type resolution are slow, the simple change can prove to give some saving.

Page 4 of 16 (378 items) «23456»