I am a developer at Microsoft and work in the .NET Common Language Runtime (CLR) team. For the last 4 years I have been working on virtual machine technologies on a variety of form factors including desktops (Windows, Linux), tablets (Win8), gaming-consoles (Xbox 360), mobile devices (Windows Phone 7, Windows CE, Symbian).I have worked on various core pieces of the runtime including Garbage Collector, memory manager, platform abstraction layer, runtime-performance, etc.Before working on .NET I worked on Visual Studio Team Foundation Server, Visual Studio Team System, Adobe Framemaker, Adobe Acrobat, Texas Instrument's Code Composer Studio.
A reader contacted me over this blog to inquire about the overhead of managed objects on the Windows Phone. which means that when you use an object of size X the runtime actually uses (X + dx) where dx is the overhead of CLR’s book keeping. dx is generally fixed and hence if x is small and there are a lot of those objects it starts showing up as significant %.
There seems to be a great deal of information about the desktop CLR in this area. I hope this post will fill the gap for NETCF (the CLR for the Windows Phone 7)
All the details below are implementation detail and is provided just as guidance. Since these are not mandated by any specification (e.g. ECMA) they may and most probably will change.
The overhead varies on the type of objects. However, in general the object layout looks something like
So internally just before the actual object there is a small object-header. If the object is a finalizable object (object with a finalizer) then there’s another 4-byte pointer at the end of the object. The exact size of the header and what's inside that header depends on the type of the object.
All objects smaller than 16KB has a 8 byte object header.
The 32-bits in the flag are used to store the following information
Since all objects are 4 byte aligned, the size is stored as an unit of 4 bytes and uses 11 bits. Which means that the maximum size that can be stored in that header is 16KB.
The second flag contains a 32-bit pointer to the Class-Descriptor or the type information of that object. This field is overloaded and is also used by the GC during compaction phase.
Since the header cost is fixed, the smaller the object the larger the overhead becomes in terms of % overhead.
As we saw in the previous section that the standard object header supports describing 16KB objects. However, we do need to support larger objects in the runtime.
This is done using a special large-object header which is 20 bytes.
The overhead of large object comes to be at max of around 0.1%. A dedicated 32-bit size field allows the runtime to support very large objects (close to 2GB).
It’s weird to see that the normal object header is repeated twice. The reason is that very large objects are rare on the phone. Hence the runtime tries to optimize it’s various operations so that the large object doesn’t bring in additional code paths and checks. Given an object reference having the normal header just before it helps us in speeding various operations. At the same time having the header as the first data-structure of all objects helps us to be faster in other cases. Bottom line is that the additional 8 bytes or 0.04% extra space usage helps us in enough performance gains at other places so that we pay that extra overhead cost.
NETCF uses 64KB object pools and then sub-allocates objects from it. So in addition to the per-object overhead it uses an additional 44-byte of object-pool-header per 64kb. This means another 0.06% overhead per 64KB. For every object, the contribution of this can be ignored.
Arrays pan out slightly differently.
In case of value type array every member doesn’t really need to have the type information and hence the object header associated with it. The Array header is a standard object header of 8 bytes or big object header of 20bytes (based on the size of the array). Then there is a 32-bit count of the number of elements in the array. For single dimension array there is no more overhead. However, for higher dimension arrays the length in each dimension is also stored to support runtime bounds check.
The reference type arrays are similar but instead of the array contents being inlined into the array the elements are references to the objects on the managed heap and each of those objects have their standard headers.
I have been involved with Windows Phone 7 from the very beginning and worked on delivering the CLR for the GA version and all the way through Nodo and Mango.
All this while I was busy working on one of the lower layers and seeing bits and bytes flying by. For some time I wanted to try out writing an app for the phone to experience how it feels to be our developer customer. Unfortunately my skills fall more on system programming than designing good looking UI. I had a gazillion ideas floating in my head but most were beyond my graphic and UI design skills. I was also waiting for some sort of inspiration.
Over the last school break our local King Country Library System which earned the best library system of the year award run an interesting effort. Students received a sheet which they filled up as they read books. On hitting 500 minutes and 1000 minutes of book reading they got prizes. One of the prize was a Kaleidoscope. Not the one with colored glass pieces like I had as a child, but one through which you could see everything around. This was the inspiration I was waiting for and I wrote an WP7 Mango app that does the same. Interestingly the app got a lot of good reviews and is rated at 5 stars (something I didn’t expect). It’s free, go check it out.
Get the app here, documentation is at http://bonggeek.com/Kaleidoscope/
Like many engineer I am partially fueled by the engineers angst. I am not always able to vent it in code and sometimes resort to sending really dumb emails :).
I figured out that I can tolerate delaying sending my emails over having to smack my face every other day. Basically all emails I send is delayed by about 5 minutes and that makes a huge difference. Somehow you always know you’ve sent something you shouldn’t have in the first 2 minutes of sending that email.
This is how you set this up on Outlook 2010… Bring up the new rules UI from the ribbon (Home > Rules > Managed Rules and Alerts)
Next you’d get another UI just hit next and yes on the following warning.
After that choose the delay duration. I use 5 minutes
In case you want to pretend that you are hard at work when you aren’t then you can also use the timed email feature in Outlook. In the new email window use the following.
Unfortunately if you are an software engineer your work is measured by the amount and quality of code you check in and not emails so sadly this doesn’t work
Fast Application Switch of FAS is kind of tricky for application developers to handle. There are a ton of documentation around how the developers need to handle the various FAS related events. I really liked the video http://channel9.msdn.com/Events/DevDays/DevDays-2011-Netherlands/Devdays059 which walks through the entire FAS experience (jump to around 8:30).
In this post I want to talk about how the CLR (Common Language Runtime or .NET runtime) handles FAS and what that means to your application. Especially the Active –> Dormant –> Active flow. Most of the documentation/presentation quickly skips over this with the vague “The application is made dormant”. This is equivalent to the “witches use brooms to fly”. What is the navigation mechanism or how the broom is propelled are the more important questions which no one seems to answer (given the time of year, I just couldn’t resist :P) . Do note that most developers can just follow the coding guidelines for FAS and never need to care about this. However, a few developers, especially the ones developing multi-threaded apps and using threading primitives may need to care about this. And hence this post
The entire Multi-threading design was made to ensure the following
Principle 1: Pre-existing WP7 apps shouldn’t break on Mango. Principle 2: When an application is sent to the background it shouldn’t consume any resources Principle 3: Application should be resumed fast (hence the name FAS)
As you’d see that these played a vital role in the design being discussed below.
The states an application goes through is documented in http://msdn.microsoft.com/en-us/library/ff817008(VS.92).aspx
The diagram below captures the various phase that are used to rundown the application to make it dormant and later re-activated. It gives the flow of an application as it goes through the Active –> Dormant –> Active state (e.g. the application was running and the user launches another application and then uses the back button to go back to the first application).
The Deactivated event is sent to the application to notify it that the user is navigating away from the application. After this there is 3 possible outcomes. It will either remain dormant, gets tombstoned or finally gets killed as well. Since there is no way to know which would happen, the application should store its transient state into the PhoneApplicationPage.State and it’s persistent state into some persistent store like the IsolatedStorage or even in the cloud. However, do note that the application has 10 seconds to handle the Deactivated event. In the 3 possible situations this is how the stored data will be used back
The above supports the #1 principle of not breaking pre-existing WP7 apps. A WP7 app would’ve been designed without considering the dormant stage. Hence it would’ve just skipped the #1 option. So the only issue will be that a WP7 app will result in re-creating the application state each time and not get the benefit of the Dormant stage (it will get the performance of Tombstoning but not break in Mango).
Post this event the main thread never transitions to user code (e.g. no events are triggered). The requirement on the application for deactivate is that
If app continues to run code, e.g. in another thread and modifies any application state then that state cannot be persisted (as there will be no subsequent Deactivated type event)
This event is an internal event that is not visible to the application. If the application adhered to the above guideline it shouldn’t care about it anyway.
The CLR does some interesting stuff on this event. Adhering to the “no resource consumption” principle is very important. Consider that the application had used ManualResetEvent.WaitOne(timeout). Now this timeout can expire in the time when the application was dormant. If that happened it would result in some code running when the application is dormant. This is not acceptable because the phone maybe behind locked screen and this context switch can get the phone out of a low-power state. To handle this the runtime detaches Waits, Thread.Sleep at Paused. Also it cancels all Timers so that no Timer callbacks happen post this Pause event.
Since Pause event is not visible to the application, it should consider that some time post Deactivated this detach will happen. This is completely transparent to user code. As far as the user code is considered, it just that these handles do not timeout or sleeps do not return during the time the application is dormant. The same WaitHandle objects or Thread.Sleeps start working as is after the application is activated (more about timeout adjustment below).
This is also the place where other parts of the tear-down happens. E.g. things like asynchronous network calls cancelled, media is stopped.
Note that the background user threads can continue to execute. Obviously that is a problem because the user code is supposed to voluntarily stop them at Deactivated.
Besides user code there are a lot of other managed code running in the system. These include but not limited to Silverlight managed code, XNA managed code. Sometime after Paused all managed code is required to stop. This is called the CLRFreeze. At this point the CLR freezes or blocks all managed execution including user background threads. To do that it uses the same mechanism as used for foreground GC. In a later post I’d cover the different mechanics NETCF and desktop CLR uses to stop managed execution.
Around freeze the application enters the Dormant stage where it’s in 0 CPU utilization mode.
Managed threads stopped at Freeze are re-started at this point.
At Resuming the WaitHandle, Thread.Sleep detached in Paused is re-attached. Also timeout adjustments are made during this time. Consider that the user had two handles on which the user code started Waits with 5 seconds and 10 seconds timeouts. After 3 seconds of starting the Waits the application is made dormant. When the application is re-activated, the Waits are restarted with the amount of timeout remaining at the point of the application getting deactivated. So essentially in the case below the first Wait is restarted with 2 seconds and the later with 7. This ensures that relative gap between Sleeps, Waits are maintained.
Note timers are still not restarted.
This is the event that the application gets and it is required to re-build it’s state when the activation is from Tombstone or just re-use the state in memory when the activation is from Dormant stage.
This is the final stage or FAS. This is where the CLR restarts the Timers. The idea behind the late start of timers is that they are essentially asynchronous callbacks. So the callbacks are not sent until the application is activated (built its state) and ready to consume those callbacks.
In my previous post “Mark-Sweep collection and how does a Generational GC help” I discussed how a generational Garbage Collector (GC) works and how it helps in reducing collection latencies which show up as long load times (startup as well as other load situations like game level load) and gameplay or animation jitter/glitches. In this post I want to discuss how those general principles apply to the WP7 Generational GC (GenGC) specifically.
We use 2 generations on the WP7 referred to as Gen0 and Gen1. A collection could be any of the following 4 types
The list above is in the order of increasing latency (or time they take to run)
GC triggers are the same and as outlined in my previous post WP7: When does the GC run. The distinction between #2 and #3 above is that at the end of all full-GC the collector considers the memory fragmentation and can potentially run the memory compactor as well.
So from all of the above, the 3 key takeaways are
As explained in my previous post, to keep track of Gen1 to Gen0 reference we use write-barrier/card-table.
Card-table can be visualized as a memory bitmap. Each bit in the card-table covers n bytes of the net address space. Each such bit is called a Card. For managed reference updates like A.b = C in addition to JITing the real assignment, calls are added to Write-barrier functions. This write barrier locates the Card corresponding to the address of write and sets it. Later during collection the collector checks all Gen-1 objects covered by a set card-bit and marks Gen-0 references in those objects.
This essentially brings in two additional cost to the system.
Both of the above are optimized to ensure they have minimum execution impact. We only JIT calls to WB when absolutely required and even then we have an overhead of a single instruction to make the call. The WB are hand-tuned assembly code to ensure they take minimum cycles. In effect the net hit on process memory due to write barriers is way less than 0.1%. The execution hit in real-world applications scenarios is also not in general measureable (other than real targeted testing).
In principle both the desktop GC and the WP7 GC are similar in that they use mark-sweep generational GC. However, there are differences based on the fact that the WP7 GC targets a more constrained device.
About a month back we announced that in the next release of Windows Phone 7 (codenamed Mango) we will ship a new garbage collector in the CLR. This garbage collector (GC) is a generational garbage collector.
This post is a “back to basics” post where I’ll try to examine how a mark-sweep GC works and how adding generational collection helps in boosting it’s performance. We will take a simplified look at how mark-sweep-compact GC works and how generational GC can enhance it’s performance. In later posts I’ll try to elaborate on the specifics of the WP7 generational GC and how to ensure you get the best performance out of it.
Objects in the memory can be considered to be a graph. Where every object is a node and the references from one object to another are edges. Something like
To use an object the code should be able to “reach” it via some reference. These are called reachable objects (in blue). Objects like a method’s local variable, method parameters, global variables, objects held onto by the runtime (e.g. GCHandles), etc. are directly reachable. They are the starting points of reference chains and are called the roots (in black).
Other objects are reachable if there are some references to them from roots or from other objects that can be reached from the roots. So Object4 is reachable due to the Object2->Object4 reference. Object5 is reachable because of Object1->Object3->Object5 reference chain. All reachable objects are valid objects and needs to be retained in the system.
On the other hand Object6 is not reachable and is hence garbage, something that the GC should remove from the system.
A garbage collector can locate garbage like Object6 in various ways. Some common ways are reference-counting, copying-collection and Mark-Sweep. In this section lets take a more pictorial view of how mark-sweep works.
Consider the following object graph
At first the GC pauses the entire application so that the object graph is not being mutated (as in no new objects or references are being created). Then it goes into the mark phase. In mark phase the GC traverses the graph starting at the roots and following the references from object to object. Each time it reaches an object through a reference it flips a bit in the object header indicating that this object is marked or in other words reachable (and hence not garbage). At the end everything looks as follows
So the 2 roots and the objects A, C, D are reachable.
Next it goes into the sweep phase. In this phase it starts from the very first object and examines the header. If the header’s mark bit is set it means that it’s a reachable object and the sweep resets that bit. If the header’s bit is not set, it’s not reachable and is flagged as garbage.
So B and E gets flagged as garbage. Hence these areas are free to be used for other objects or can be released back to the system
This is where the GC is done and it may resume the execution of the application. However, if there are too many of those holes (released objects) created in the system, then the memory gets fragmented. To reduce memory fragmentation. The GC may compact the memory by moving objects around. Do note that compaction doesn’t happen for every GC, it is run based on some fragmentation heuristics.
Both C and D is moved here to squeeze out the hole for B. At the same time all references to these objects in the system is also update to point to the correct location.
One important thing to note here is that unlike native objects, managed objects can move around in memory due to compaction and hence taking direct references to it (a.k.a memory pointers) is not possible. In case this is ever required, e.g. a managed buffer is passed to say the microphone driver native code to copy recorded audio into, the GC has to be notified that the corresponding managed object cannot be moved during compaction. If the GC runs a compaction and object moves during that microphone PCM data copy, then memory corruption will happen because the object being copied into would’ve moved. To stop that, GCHandle has to be created to that object with GCHandleType.Pinned to notify the GC that the corresponding object should never move.
On the WP7 the interfaces to these peripherals and sensors are wrapped by managed interfaces and hence the WP7 developer doesn’t really have to do these things, they are taken care offm under the hood by those managed interfaces.
As mentioned before during the entire GC the execution of the application is stopped. So as long as the GC is running the application is frozen. This isn’t a problem in general because the GC runs pretty fast and infrequently. So small latencies of the order of 10-20ms is not really noticeable.
However, with WP7 the capability of the device in terms of CPU and memory drastically increased. Games and large Silverlight applications started coming up which used close to 100mb of memory. As memory increases the number of references those many objects can have also increases exponentially. In the scheme explained above the GC has to traverse each and every object and their reference to mark them and later remove them via sweep. So the GC time also increases drastically and becomes a function of the net workingset of the application. This results in very large pauses in case of large XNA games and SL applications which finally manifests as long startup times (as GC runs during startup) or glitches during the game play/animation.
If we take a look at a simplified allocation pattern of a typical game (actually other apps are also similar), it looks somewhat like below
The game has a large steady state memory which contains much of it’s steady state data (which are not released) and then per-frame it goes on allocating/de-allocating some more data, e.g. for physics, projectiles, frame-data. To collect this memory layout the traditional GC has to walk or traverse the entire 50+ mb of data to locate the garbage in it. However, most of the data it traverses will almost never be garbage and will remain in use.
This application behavior is used for the Generational GC premise
Using these premise the generational GC tries to segregate the managed heap into older and younger generations objects. The younger generation called Gen-0 is collected in each GC (premise 1), this is called the Ephemeral or Gen0 Collection. The older generation is called Gen-1. The GC rarely collects the Gen-1 as the probability of finding garbage in it is low (premise 2).
So essentially the GC becomes free of the burden of the net working set of the application.
Most GC will be ephemeral GC and it will only traverse the recently allocated objects, hence the GC latency remains very low. Post collection the surviving objects are promoted to the higher generation. Once a lot of objects are promoted, the higher generation starts becoming full and then a full collection is run (which collects both gen-1 and gen-0). However, due to premise 1, the ephemeral collection finds a lot of garbage in their runs and hence promotes very few objects. This means the growth rate of the higher generation is low and hence full-collection will run very infrequently.
Even in ephemeral collection the GC needs to deterministically find all objects in Gen-0 which are not reachable. This means the following objects needs to survive a Gen-0 collection
Now #1 and #2 pose no issues as in the Ephemeral GC, we will anyway scan all roots and Gen-0 objects. However to find objects from Gen1 which are referencing objects in Gen-0, we would have to traverse and look into all Gen1 objects. This will break the very purpose of having segregating the memory into generation. To handle this write-barrier+card-table technique is used.
The runtime maintains a special table called card-table. Each time any references are taken in the managed code e.g. a.b = c; the code JITed for this assignment also updates an internal data-structure called CardTable to capture that write. Later during the ephemeral collection, the GC looks into that table to find all the ‘a’ which took new references. If that ‘a’ is a gen-1 object and ‘c’ a gen-0 object then it marks ‘c’ recursively (which means all objects reachable from ‘c’ is also marked). This technique ensures that without examining all the gen-1 objects we can still find all live objects in Gen-0. However, the cost paid is
Putting it all together, the traditional GC would traverse all references shown in the diagram below. However, an ephemeral GC can work without traversing the huge number of Red references.
It scans all the Roots to Gen-0 references (green) directly. It traverses all the Gen1->Gen0 references (orange) via the CardTable data structure.
Given the above most applications will see startup and in-execution perf boost without any modification. E.g today if an application allocates 5 MB of data during startup and GC runs after every MB of allocation then it traverses 15mb (1 + 2 + 3 + 4 + 5). However, with GenGC it might get away with traversing as low as only 5mb.
In addition, especially game developers can optimize their in-gameplay allocations such that during the entire game play there is no full collection and hence only low-latency ephemeral collections happens.
How well the generational scheme works depend on a lot of parameters and has some nuances. In the subsequent posts I will dive into the details of our implementation and some of the design choices we made and what the developers needs to do to get the most value out of it.
GPS
One of the most requested feature for the emulator was support for sensor. Developers were apparently asking their managers for travel budget to go to Hawaii/Europe so that they could test out their location based apps :). Today at MIX11 we announced that GPS sensor support for the emulator will be shipped in the next version of the tools. Some of the features that will be present are
Here’s a short video which I quickly took from the laptop of our PM. Being away from family in Las Vegas was getting me down so he took me to Seattle in a second (or rather simulated it).
Direct link http://www.youtube.com/watch?v=wt0DdiYqdkk
Accelerometer
Another new sensor support is the accelerometer. So no more picking up the PC monitor running the emulator and rotating it to see if the emulator gets the movements. There’s going to be a new tool window using which the developer can feed in motion to the apps running in the emulator. Also some pre-canned motions like shake will be present. Here’s another video of that
Direct link http://www.youtube.com/watch?v=Gc1kuXj7eCE
This is an announcement only post, do subscribe to this blog feed or on to http://twitter.com/abhinaba as I’d be making more detailed posts on how we are building the Generational GC and what developers need to know.
Today in the MIX11 keynote ScottGu just announced something that I’ve been working on for some time. The next version of the Windows Phone will have a Generational Garbage Collector (GenGC for short). A bunch of folks has worked real hard to get this piece into WP7 in a short time.
Today on Windows Phone 7 we have a stop the world, mark-sweep-compact, non-generational GC. When it runs it pauses the entire execution, looks through each object in the application to find and eliminate all unused data. This manifests as longer app startup time and stutters during time critical execution.
In Mango we are adding Generational GC to reduce collection latency to address both of these problems . Existing apps and games even without any changes can expect faster startup, faster level loads and reduction in gameplay stutters due to collection. Developers can specifically optimize for the new generational GC to completely remove stutters during animations and game play that came due to these GC pauses.
As an example see how one of the existing games butterfly benefits from the GenGC. One of the phone below is running the GenGC and the other is not and it should be obvious which one is based on which starts up first (do note that both in Keynote and here we are showing startup gains because it’s easier to show that. In gameplay stutters is hard to show on lower resolution videos). Also note that not just at the core startup at every level it gets a bit faster.
Direct link http://www.youtube.com/watch?v=FtusaSuFIpc
Please refer to my previous blog http://blogs.msdn.com/b/abhinaba/archive/2009/03/02/back-to-basics-generational-garbage-collection.aspx on what is a generational GC and how it helps.
The new GenGC uses 2 generations and write barriers to track Gen1 to Gen0 references. This post is just to announce the feature. I will be making a series of posts to get into the gory details as we get closer to handing over the bits to our developers. I am sure the developers would want to know the sizes of the various generations, when full vs generational collections happen and so much more. Do register to my blog feed or my twitter account http://twitter.com/abhinaba for the announcements as I publish these posts.
This is an announcement only post, do subscribe to this blog feed or on to http://twitter.com/abhinaba as I’d be making more detailed posts on these topics as we get close to handing over these bits to our developer customers.
ARM processors support SIMD (Single Instructions Multiple Data) instructions through the ARM® NEON™technology that is available on ARMV7 ISA. SIMD allows parallelization/HW-acceleration of some operations and hence performance gains. Since the Windows Phone 7 chassis specification requires ARMV7-A; NEON is available by default on all WP7 devices. However, the CLR on Windows Phone 7 (NETCF) did not utilize this hardware functionality and hence it was not available to the managed application developers. We just announced in MIX11 that in the next version of Windows Phone release the NETCF runtime JIT will utilize SIMD capabilities on the phones.
What it means to the developers
Certain operations on some XNA types will be accelerated using the NEON/SIMD extensions available on the phone. Examples include operations on Vector2, Vector3, Vector4, Matrix from the Microsoft.Xna.Framework namespace will get this acceleration. NOTE: At the point the exact types and the exact operations on them are not closed yet and subject to change. Do note that user types will not get this acceleration. E.g. if you have rolled out your own vector type and use say dot operations on it, the CLR will not accelerate them. This is a targeted acceleration for some XNA types and not a vectorizing JIT compiler feature.
Apps and types heavily using these XNA types (our research shows a lot of games do) will see good performance gain. For example we took this Fluid simulation sample from a team (note this was not written specifically for us to demo) and saw huge gains because it heavily uses Matrix and Vector operations to simulate fluid particles and the forces that work in between them. Frame rates shot up from 18fps to 29fps on the very same device.
Based on the usage of these types and operations in your app you’d see varying amounts of gains. However, this feature should be a good motivation to move to these XNA types.
How does SIMD work
SIMD as the name suggests can process the same operation on multiple data in parallel.
Consider the following Vector addition
public static Vector2 Add(Vector2 value1, Vector2 value2) { Vector2 vector; vector.X = value1.X + value2.X; vector.Y = value1.Y + value2.Y; return vector; }
If you see the two lines in blue it’s essentially doing the same addition operation on two data sets and putting the result in two other locations. This today will be performed sequentially. Where the JITer will emit processor instructions to load the X values in registers, add them, store them, and then do the same thing again for the Y values. However, this is inherently parallelizable and the ARM NEON provides an easy way to load such values (vpop, vldr) in single instructions and use a single VADD NEON instruction to add the values in parallel.
A way to visualize that is as follows
A single instruction both X1 and Y1 is loaded, another instruction loads both X2, Y2 and the 3rd in parallel adds them together.
For an easy sample on how that works head onto http://www.arm.com/files/pdf/NEON_Support_in_the_ARM_Compiler.pdf
Consider the following code that I received:
static void Foo() { TestClass t = new TestClass(); List<object>l = new List<object>(); l.Add(t); // Last use of l and t WeakReference w = new WeakReference(t); GC.Collect(); GC.WaitForPendingFinalizers(); Console.WriteLine("Is Alive {0}", w.IsAlive); }
If this code is is run on the desktop it prints Is Alive False. Very similar to the observation made by a developer in http://stackoverflow.com/questions/3161119/does-the-net-garbage-collector-perform-predictive-analysis-of-code. This happens because beyond the last usage of l and t, the desktop GC considers l and t to be garbage and collects it.
Even though this seems trivial, it isn’t. During JITing of this method the JITer performs code analysis and ensures that runtime tables are updated to reflect whether at a particular time a given local variable is alive or not (it is not used beyond that point). This needs a bunch of data structures to be maintained (lifetime tables) and also dynamic analysis of code.
In case of Windows Phone 7 (or other platforms using NETCF) for the same code you’d get Is Alive True. Which means the NETCF GC considers the object to be alive even beyond it’s last usage in the function.
Due to the limited resources available on devices, NETCF does not do these dynamic analysis. So we trade off some runtime performance to ensure faster JITing (and hence application faster startup). All local variables of all functions in the call stacks of the all managed threads in execution during garbage collection is considered live (hence also object roots) and not collect. In NETCF the moment the function Foo returns and the stack is unwound the local objects on it including t and l becomes garbage.
Do note that both of these implementations follow the ECMA specification (ECMA 334 Section 10.9) which states
“For instance, if a local variable that is in scope is the only existing reference to an object, but that local variable is never referred to in any possible continuation of execution from the current execution point in the procedure, an implementation might (but is not required to) treat the object as no longer in use.”
Hence even though on the desktop CLR it is not important to set variables to null, it is sometimes required to do so on WP7. In case t and l was set to null GC would’ve collected them. So the guidance for WP7 is
Now that Windows Phone 7 is released all the gadget/tech blogs like TechCrunch, Engadget , Gixmodo are humming with reviews. Given that it’s a Version 1 product (ignore that 7 in the phone name) the reviews are great.
Seeing all the buzz around gets me thinking about how I got involved in the project and my experiences. As the Windows Phone 7 head said “These are the good old days”
Not so long ago, sometime after I joined the .NET Compact Framework team (and drew this cartoon) we started hearing about Windows Mobile reset and the awesome new phone being built and how our .NET Compact framework will be it’s runtime. Then we started getting access to CEPC images (these are phone OS that works on desktop virtual machines) to start developing and debugging our code. It felt strange at the beginning to see the ugly blue tiles and the truncated or cut-off text. These CEPC didn’t have hardware acceleration and hence there was no animation. The standard PC displays clipped off most of the phone display area and made the UI look really weird. I thought it was getting the wrong font as well as the text on top seemed to be cut off. All in all I wasn’t that impressed and just didn’t get the Metro UI.
Working on the runtime is not all that glamorous and my view of Windows Phone 7 remained pretty much confined into the debugger. I’d interact with it over command shell and see bits and bytes flying by. So this is the view at 10 feel elevation (you can ignore the source file open in the IDE , I wasn’t going to give a faux pas and reveal our source code, would I?).
As time went by and I had a chance to see the first prototype device, I suddenly “got” the Metro UI. It felt wonderful, a no pretense UI which doesn’t try to fake 3D buttons and kind of gets out of they way. The at a glance UI totally made sense (just look on the locked screen or the start window and know your email count, schedule and other bunch of things). I started playing with the initial apps (Jelly Splat ) and then the realization dawned, each time someone picks up the device and launches any app, our code would start executing. Exactly the reason I joined Microsoft.
As more and more applications started getting developed by 3rd parties the power of our developer tools really shows up. IMO with the Windows Phone Developer tools (Visual Studio 2010 based), Expression, XNA Game Studio, Emulator, Silverlight and host of other 1st and 3rd party tools, WP7 easily has the best development story out there and it’s just V1.
Now when I see my daughter play Flowerz or I watch a Netflix movie and get a 10,000 feet view, it feels strangely wonderful and at the same time disconcerting (did you see http://abstrusegoose.com/307 ?). I strongly believe that Windows Phone 7 is one of the best Phones I have ever used and it passes the significant other test with flying colors.
If you are looking for information on the new Generational GC on Windows Phone Mango please visit http://blogs.msdn.com/b/abhinaba/archive/2011/06/14/wp7-mango-the-new-generational-gc.aspx
Many moons ago I made a post on When does the .NET Compact Framework Garbage Collector run. Given that a lot of things have changed since then, it’s time to make another post about the same thing.
For the developers coming to Windows Phone 7 (WP7) from the Windows desktop let me first clarify that the runtime (CLR) that is running on the WP7 is not the same as the one running on the desktop. The WP7 runtime is known as .NET Compact Framework (NETCF) and it works differently than the “desktop CLR”. For 90% of cases this is irrelevant as the WP7 developer targets the XNA or Silverlight programming model and hence what is running underneath is really not important. E.g when you drive you really do not care about the engine. This post is for the other 5% where folks do run into issues (smoke coming out of the car).
Moreover do note that when the GC is run is really an implementation detail that is subject to change.
Now that we have all the disclaimers behind us lets get down to the list.
The Garbage Collector is run in the following situations
The GC is NOT run in the following cases (I am explicitly calling these out because in various conferences and interactions I’ve heard folks thinking it might be)
For folks migrating from NETCF 3.5 the list below gives you the changes
Let me start by saying that using mm-dd-yyyy is just plain wrong. No really it just doesn’t make any sense to me. Neither does it make any sense to most people world-over if you go my the date-format map up at http://en.wikipedia.org/wiki/File:Date_upd1.PNG
If one uses dd-mm-yyyy it makes sense because it’s in decreasing order of granularity (kind of LSB first). yyyy-mm-dd makes ever more sense because
d:\MyStuff\Personal\Pictures>dir 2010* Volume in drive D is Data Volume Serial Number is 3657-F386
Directory of d:\MyStuff\Personal\Pictures
06/17/2010 01:06 PM <DIR> 2010_0501 06/17/2010 01:07 PM <DIR> 2010_0504 06/17/2010 01:16 PM <DIR> 2010_0508 06/17/2010 01:20 PM <DIR> 2010_0509 06/17/2010 01:24 PM <DIR> 2010_0515 06/17/2010 01:29 PM <DIR> 2010_0517 06/17/2010 01:30 PM <DIR> 2010_0523 06/17/2010 01:33 PM <DIR> 2010_0528 06/17/2010 01:37 PM <DIR> 2010_0529 06/17/2010 01:43 PM <DIR> 2010_0605 06/17/2010 01:47 PM <DIR> 2010_0606 06/21/2010 08:40 PM <DIR> 2010_0616 06/28/2010 10:33 PM <DIR> 2010_0619 0 File(s) 0 bytes 13 Dir(s) 55,925,829,632 bytes free
But I just cannot fathom why would anyone use mm/dd/yyyy. In what way is that intuitive?
Learn from the dev lead and PM of Windows Phone 7 Emulator on how it works and delivers the awesome performance.
Some key points
I am sure everyone has read about the “you cannot possibly pronounce or spell” volcano in Iceland throwing up in the air and screwing up the entire flight system of this planet. While most people were reading about it from the comfort of their home I was doing the same while waiting in airports with a 5 year old tagged along. Anyways, 48 hours and a switch in the direction of flights got me to some countries I didn’t really plan to visit and over the Pacific to land in Seattle.
Going forward I will be working out of the Microsoft Redmond offices, doing pretty much the same stuff I was doing before (working on the .NET Compact Framework runtime).
Right now I am settling down with my family and my daughter is excited to see snow for the first time (she got lucky as there was a small snowfall on Saturday in Snoqualmie pass). The changes we need to adjust to is humongous, we left Hyderabad when it was 43° Celsius and now it’s 6° Celsius in Redmond. Oh sorry I should’ve said it was 110°F and now it’s 43°F in Redmond.
Wish me luck and hopefully finally I will be able to attend some nerddinners.
In Building NETCF for Windows Phone 7 series we put in couple of features to enhance startup performance and ensure that the working set remains small. One of these features added is code/data sharing.
Native applications have inherent sharing where multiple processes can share the same executable code. However, in case of managed code running on NETCF 3.5 even when multiple applications use the same assembly, the managed code in them are JITed in context of each process separately. This results in suboptimal resource utilization because of the following
Together the overhead can be significant.
In the latest version of the runtime shipping with Windows Phone 7 (.NET Compact Framework 3.7) we added a feature to share both the JITed code and these type information across multiple managed processes.
The runtime essentially uses a client server architecture. We added a new kernel mode server which is responsible of maintaining a system wide shared heap. The server on receiving requests from the various client processes (each managed app is a client) JITs code and type information into this shared heap. The shared heap is accessed Read-only from all the client apps and Read/write from the server.
When a process requests for the same type to be loaded or code to be JITed it just re-uses the already loaded type info or JITed code from the shared heap. What is shared
Do note that user code is not shared and neither is all platform code. Sharing arbitrarily would actually increase cost if they ultimately didn’t get reused in a lot of applications. The list of assemblies/data shared was carefully chosen and tuned to ensure only those are shared that provided the maximum performance benefit.
Sharing provides the following advantages
Like user code the system is also capable of pitching the shared code when there is a low memory situation on the device.
If you are following the latest Windows Phone stories than you have surely heard about the .NET Compact Framework (NETCF) which is the core managed runtime used by the Windows Phone 7 programming model. This is an implementation of the .NET specification that works on devices and a disparate combination of HW and SW.
In case you are hearing about NETCF for the first time, then hopefully this post will help you to understand it’s architecture and how it is different from the desktop .NET.
The Architecture
At the heart of NETCF is the execution engine. This contains the usual suspects including the JIT, GC, Loader and other service providers (e.g. Marshalling that marshals native objects to managed and vice versa, type system).
The entire Virtual machine is coded against a platform agnostics Platform Abstraction Layer or the PAL. In order to target a platform like say Nokia S60 we implemented the PAL for that platform. Stringent policy of not taking any platform dependency outside PAL and ensuring in PAL we only use features that are commonly available on most modern embedded OS allows NETCF to be highly portable. Currently the PAL supports a variety of OS and processor architecture. The ones in Red are those that are used for Silverlight, the others are available for the legacy .NETCF 3.5.
The other system dependent part is the JITter which compiles the MSIL into the platform dependent instructions. For every supported processor architecture there’s a separate implementation of JIT.
This entire runtime is driven by a host. E.g. for Silverlight on S60 it’s the Silverlight host running as a plugin inside the Nokia browser, on Windows Phone it’s the Windows Phone task host. These hosts uses the runtime services to run managed code. The host interacts with the runtime over the hosting interfaces.
Managed code can either be the framework managed code like BCL and other Silverlight/XNA framework managed code or the user code that is downloaded from the web/application-store. Even though the framework managed code can interact with the underlying system including the execution-engine and the OS (via Platform-Invoke) the user managed code is carefully sandboxed. If you see the arrows moving out of the user managed code it is evident that user managed code can call only into the framework managed code (which in turn can call into the system post security verification). User code is not allowed to access any native resources (including P/Invoke).
The host knows about the UI and it’s rendering and uses reverse Invoke to call into the managed code. E.g. if a user clicks on a XAML button the Silverlight host using the XAML object tree and rendering logic (hit-testing) figures out which object got clicked on and it uses reverse-pinvoke to call into the corresponding managed objects (which provides the event handler). That managed code is then verified, security checked, jitted and run by NETCF.
Portability
NETCF is highly portable. As far as I know it’s one of the very few (or is it the only one :)) runtime shipping out of Microsoft that supports Big-Endian processor architecture (Xbox360 uses Big-Endian processor). It is designed for resource constrained devices and is battery friendly. For many time-space trade-offs it tilts towards conserving space (e.g. Code-pitching which I’ll cover later) and at the same time works well on Xbox360 running high-end games. The list of processor/OS I have given above is just illustrative and it actually works or there are POC (proof-of-concept) of it working on really esoteric SW+HW combinations.
Some CF Facts
In the comments of my previous post Windows Phone 7 Series Programming Model and elsewhere (e.g. twitter, WP7 forum) there seems to be a lot of confusion around how NETCF 3.5 compares against the new WP7 programming model powered by NETCF 3.7. People are asking why some API got removed from 3.5, or is 3.7 a subset of NETCF 3.5 as they are not seeing some familiar 3.5 APIs.
First I’d urge you to go visit the Windows Phone 7 Series Programming Model to see what it offers.
Essentially NETCF 3.5 can be broken down to
Now each of this has been updated (in order) as follows
So the runtime is hugely updated (and hopefully you’d feel the improvements, especially in performance).
For #2 and #3 the level of difference is same in magnitude as you compare WinForm with Silverlight on desktop. They are simply two different programming models.
In case you hit a 3.5 API that’s missing then it’s either because it’s (a) WinForm based API that got totally replaced, or (b) a BCL API that the Silverlight profile doesn’t support. The workaround is to search for how to do the same thing in Silverlight (e.g. use XDocument instead of XmlDocument). In some cases you’d get into situations where there is no direct alternative. E.g. there is no P/Invoke and if your code say P/Invoked to get data from a bar-code scanner you cannot do so now.
Also keep in mind that the bits available right now is just early preview bits and subject to a lot of change. The whole purpose of having CTPs is to hear our customers and ensure we use the remaining time before release in fixing issues that helps our customers the most. Hopefully some of the gaps will get closed before the final release.
Today at MIX 2010 we announced Silverlight beta for Nokia S60 phones. Go download the beta from http://silverlight.net/getstarted/devices/symbian/.
Currently the supported platform is S60 5th Edition phones and in particular the 5800 XpressMusic, N97 and N97 mini. It works in-browser and only in the Nokia default web-kit based browser.
Silverlight on S60 uses .NET Compact Framework (NETCF) underneath it. I work in the dev team for the Execution Engine of NETCF and I was involved at the very beginning with getting NETCF to work on the S60 platform.
NETCF is extremely portable (more about that in later posts) and hence we didn’t have a lot of trouble getting it onto S60. We abstract away the platform by coding against a generic platform abstraction layer (PAL), but even then we had some challenges. Getting .NET platform on a non-Windows OS is challenging because so much of our code relies on Windows like functionality (semantics) which might not be available directly on other platforms.
As I was digging through emails I found the following screenshot that I had sent out to the team when I just got NETCF to work on the S60 Emulator…
As you can see the first code to run was a console application and it was not Hello World :). We have come a long way since then as evident from the touch based casual game running on the N97 shown below
Just sometime back I posted on the MIX 2010 announcements. One of the major aspects of the announcement was
“The application model supported on Windows Phone 7 series will be managed only and will leverage Silverlight, XNA and the .NET Framework”
That’s a mouthful and includes 3 framework names in once sentence :). This was already disclosed and has resulted in some flutter over the web and twitter. Let me try to explain how these 3 interplays in the application model with the following diagram
Managed only
First of all the application model allows users to write only managed code (to repeat native code is not allowed). That means they do not have direct access to any native APIs and cannot use platform-invoke to call into native user or system code. So all resources including phone capabilities have to be accessed using the various managed APIs provided.
Two Flavors of applications (XNA and Silverlight)
There are 2 variants or flavors of applications users can write, XNA Games and Silverlight applications. Obviously the former is for writing games and the later is for your typical phone applications (nothing stops you from writing a Silverlight animation based game though). So you cannot mix-match SL and XNA UI in the same application.
However, do note that the traditional NETCF development using WinForm based UI is not supported.
Common Services available to both
That said, beyond the UI rendering and controls there is the common services which both SL and XNA applications can use. This includes all the phone capabilities like microphone, location, accelerometer, sound, media, networking, etc... Some of these features come from SL and others from XNA but land up in the common pool usable by either types of applications.
Core Runtime is .NET Compact Framework 3.7
The core runtime for both SL and XNA applications is .NET Compact Framework (henceforth called NETCF). This is the team I work for. .NETCF is a highly portable and compact (as in smaller size and footprint) implementation of the .NET specification. It is designed to run on resource constrained devices and on disparate HW and SW configurations.I will have a post later on the detailed architecture or .NETCF.
Over the last two years a lot of work has gone into .NETCF to scale it in terms of features, performance and stress to meet the requirements of Windows Phone and other newer platforms on which it is being targeted. We also added a lot of features which I hope you will enjoy. Some of the features is just to reach parity with the desktop CLR and others are device specific and not available on the desktop.
Over this blog in the course of the coming months I’ll try to share with you what all we did to reach here.
Today is a very big day for me and my team. The stuff we have been working on is finally went live on stage MIX 2010.
Windows Phone 7 is easily the most dramatic re-entry/reset Microsoft has undertaken in any field it’s in. We announced the phone in MWC and the reception it got was phenomenal. However, in the new connected devices world, the application platform+market story is as or more important than the device itself. Today at MIX we just disclosed details of that.
Our team was involved with building the IDE, the emulator and the core .NET runtime that powers the applications. So watch out and in the coming posts I’ll deep dive into these areas.
Someone contacted me over my blog about his managed application where the working set goes on increasing and ultimately leads to out of memory. In the email at one point he states that he is using .NET and hence there should surely be no leaks. I have also talked with other folks in the past where they think likewise.
However, this is not really true. To get to the bottom of this first we need to understand what the the GC does. Do read up http://blogs.msdn.com/abhinaba/archive/2009/01/20/back-to-basics-why-use-garbage-collection.aspx.
In summary GC keeps track of object usage and collects/removes those that are no longer referenced/used by any other objects. It ensures that it doesn’t leave dangling pointers. You can find how it does this at http://blogs.msdn.com/abhinaba/archive/2009/01/25/back-to-basic-series-on-dynamic-memory-management.aspx
However, there is some catch to the statement above. The GC can only remove objects that are not in use. Unfortunately it’s easy to get into a situation where your code can result in objects never being completely de-referenced.
Example 1: Event Handling (discussed in more detail here).
Consider the following code
EventSink sink = new EventSink(); EventSource src = new EventSource(); src.SomeEvent += sink.EventHandler; src.DoWork(); sink = null; // Force collection GC.Collect(); GC.WaitForPendingFinalizers();
In this example at the point where we are forcing a GC there is no reference to sink (explicitly via sink = null ), however, even then sink will not be collected. The reason is that sink is being used as an event handler and hence src is holding an reference to sink (so that it can callback into sink.EventHandler once the src.SomeEvent is fired) and stopping it from getting collected
Example 2: Mutating objects in collection (discussed here)
There can be even more involved cases. Around 2 years back I saw an issue where objects were being placed inside a Dictionary and later retrieved, used and discarded. Now retrieval was done using the object key. The flow was something like
Now the object was not immutable and in using the object in step 3 some fields of that object got modified and the same field was used for calculating the objects hash code (used in overloaded GetHashCode). This meant the Remove call in step 4 didn’t find the object and it remained inside the dictionary. Can you guess why changing a field of an object that is used in GetHashCode fails it from being retrieved from the dictionary? Check out http://blogs.msdn.com/abhinaba/archive/2007/10/18/mutable-objects-and-hashcode.aspx to know why this happens.
There are many more examples where this can happen.
So we can conclude that memory leaks is common in managed memory as well but it typically happens a bit differently where some references are not cleared as they should’ve been and the GC finds these objects referenced by others and does not collect them.
I am sure most people haven’t yet forgotten the Y2K problem. This year our team faced a mini Y2K, but don’t worry we anticipated it and all is well.
If you head over to NETCF Wikipedia page you’d notice the NETCF versions look as follows
<MajorVersion>.<MinorVersion>.<Year><DayofYear>.<Rev>
A sample version number is
3.5.7283.0
This represents .NETCF version 3.5 build on the 283rd day of 2007. I guess by now you can guess the problem. We use a single digit to represent the year. By that nomenclature, the version of the build churned out today would be 3.5.0001.0 which is lower than the one generated the year before and would fail to install.
These numbers are automatically generated by scripts on the server that churns out daily builds. The numbering system was invented a long time ago in early 2000 and no one bothered to fix it. We anticipated that it’s going to fail as we move into the new decade and have updated it to now have 2 digits for the year (and yes we know it will break again in the future, but hopefully that’s too far out to care right now).
Happy new Year.
Working in the developer division is very exciting because I can relate so well with the customers. However, that is also an issue. You relate so well that you tend to evolve some strong opinions that can cloud your view. While working in the Visual Studio Team System team and now in the .NET Compact Framework team I have learnt some lessons. I thought I’d share some of them
I am not *the* customer (or rather the only customer) Even though I represent the customer in different avatars (sometimes as a developer, sometimes as an office worker, sometimes just as a geek) I am not THE only customer that the product targets. When your product is going to be used by 100,000 developers/testers the average or even the predominant usage is going to be very different from what you think it will be. Sitting though an usability study is a very humbling experience. Like I believed that everyone should just be using incremental search.
Consistency is important Geeks and bloggers tend to over-tout coolness. While cool user experience (UX) seems awesome, it is frequently overdone and what seems cool on casual usage tends to tire soon. Consistency on the other hand lets your users get on-board faster and lets them spend time doing stuff they care about and not learning things that should work automatically.
Consistency doesn’t mean that every application needs to look exactly like notepad. Expression Blend is a great example which looks refreshingly cool (appeals to the designers) and at the same time provides an experience that is consistent to other windows apps.
Learn to let go ”If there’s not something you can actively do on a project, if it’s something you can only influence and observe, then you have to learn to provide your feedback, and then let go and allow other people to fail… People don’t learn if they never make any mistakes” ~ Raymond Chen on Microspotting
Corporate software development is very different from Indie development. Large software development projects have a bunch of people/teams involved. It is not necessary that the collective opinion matches yours. At some point you need to learn to let go and do what is required. As an example I can debate on what a Debug Assert dialog should look like or do. However, there are other folks to design and think about the UX, as a developer I need to give inputs and once the call has been made it’s my job to provide the best engineering solution that implements that UX**.
** Do note the debug assert dialog is a fictitious example, I never worked on the IDE side.
If you have tried inputting Indian languages in Windows you know it’s a major pain. That is particularly sad because Windows comes with very good support of Indian languages. I had almost given up using my native language Bengali on a computer due to this. Even when I was creating the About Page for this blog and wanted to have a version in Bengali, I had to cut it short a lot because typing it out was so painful.
There are web-based tools like the Google Transliteration tool that works well for entering text into web-pages where they are integrated (e.g. Orkut). However, I wanted a solution that pans the desktop, so that I can use it for say writing a post using Windows Live Writer.
Enter the Microsoft Indic Language Input tool. Head over to the link and install the desktop version. You can install the various languages individually (currently Bengali, Hindi, Kannada, Malayalam, Tamil and Telugu is supported). I personally installed the Bengali and Hindi versions.
Since I am on Windows 7 which comes pre-installed with Complex language support I needn’t do anything special. However, on older OS like XP you need to do some extra steps which are available through the Getting Started link on that page.
Once you are setup you can keep the Windows Language Bar floating on the desktop. The tool extends the language bar to allow you to enter Indic languages using an English keyboard via transliteration.
Go to the application where you want to enter Indic language and then switch to Bengali (or any of the other 6 supported Indic language) using this language bar. Start typing বেঙ্গলি using English keyboard and the tool will transliterates. The moment you’d hit a word terminator like space it inserts the Bengali word.
I tried some difficult words like কিংকর্তববিমূঢ় and it worked amazingly well
I had a very good experience with the tool. The only issue I faced was that the tool was extremely slow with some WPF apps like Seesmic twitter client. However, I got to know from the dev team that they are aware of the issue (it’s for some specific WPF apps and not WPF in general). I hope they fix it before they RTM (the tool is in Beta).
Tip: You can hit alt+shift to cycle the various languages in the toolbar without having to use your mouse (which is handy if you typing using a mix of languages).