Vance Morrison's Weblog

Vance Morrison is currently an Architect on the .NET Runtime Team, specializing in performance issues with the runtime or managed code in general.

  • Here is your chance to get your Performance requests in for the next version of the .NET Runtime

    If you use managed code, and you care about performance, then we want to hear from you.

    We have not yet shipped the next version (what we call Version 4) of the .NET Runtime, however we are 'locked down' enough that at least a few of us on the runtime team are preparing for the version after that (as yet unnamed).   As part of that planning exercise we want YOUR input.   The .NET Performance team wants to know what your 'performance pain points' are (is it startup, some throughput scenario, ASP.NET client etc.), so we can address your most important issues.

    To submit your feedback, simply go to the site http://dotnetfxperf2009.questionpro.com/ and fill out the form.   We tried to keep it short and 'low overhead' (we are the performance team, after all :-).

    Thanks

    Vance Morrison

    .NET Runtime Performance Architect. 

  • Musings on the .NET Runtime on Channel 9

    In case anyone is interested, I did a video interview on the runtime in general.   This is more about philosophy than technology, but if you are in the mood for some philosophy about the runtime, here is some from someone who has been around from the CLR's beginning.

    http://channel9.msdn.com/shows/Going+Deep/Vance-Morrison-CLR-Through-the-Years/

     

  • MeasureIt Update: Tool for doing MicroBenchmarks for .NET

    Almost a year ago now I wrote part 1 and part 2 of an MSDN article entitled 'Measure Early and Measure Often for Good Performance'.  In this article I argued that if you want to design high performance applications you need to be measuring performance early and often in the design process.   To help with this I posted a tool called 'MeasureIt'.  It basically makes it easy to write benchmarks for .NET code.   In particular, it comes with a set of built-in benchmarks that measure the most important fundamental operations in .NET, so you can know what is expensive and what is not.

    Well, it has been almost a year, and I have made a few bug fixes / improvements to MeasureIt since then, so this post brings MeasureIt up to date.   It also allows me to correct a mistake I made when I posted the unoptimized version of MeasureIt for that article, and to improve the user experience during the download.

    MeasureIt is super easy to use in the simple case (just run it), and has good documentation (MeasureIt /usersGuide).   You are in fact only 7 clicks away from running it right now.   To do so:

    1. Click on the 'MeasureIt.zip' link at the bottom of this article.  This brings up a download dialog box.
    2. Click on the 'Open' button.  This brings up a security dialog.
    3. You should click 'Allow' on the security dialog.  (Yes you have to trust me).  This opens the ZIP archive.
    4. Click on the 'MeasureIt' directory (alternatively you can copy it to your hard drive). 
    5. Click on the 'MeasureIt.exe' file
    6. Click on the 'Run' dialog box that comes up.
    7. You may get another security dialog.  You have to allow that. 

    At this point it runs MeasureIt.  It takes a few seconds for it to run and then it displays the data on the costs of 50 or so .NET operations.   As mentioned in the article, this is really useful stuff.   MeasureIt comes with its own source code, and typing 'MeasureIt /Edit' allows you to add more benchmarks to it if you want to measure something of your own.
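
    To give a feel for what a micro-benchmark involves, here is a minimal sketch built only on System.Diagnostics.Stopwatch.  It is NOT MeasureIt's own API; the class and method names (MicroBench, MeasureNanoseconds) are made up for illustration.  It warms up the code once so JIT compilation is excluded, then reports the best of several samples to reduce noise.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; this is not part of MeasureIt.
public static class MicroBench
{
    // Runs 'body' 'iterations' times per sample and returns the smallest
    // per-iteration time (in nanoseconds) seen across 'samples' samples.
    public static double MeasureNanoseconds(Action body, int iterations, int samples)
    {
        body(); // warm-up call so JIT compilation is not included in the timing

        double best = double.MaxValue;
        for (int s = 0; s < samples; s++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                body();
            sw.Stop();

            double nsPerIteration = sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations;
            if (nsPerIteration < best)
                best = nsPerIteration;
        }
        return best;
    }

    public static void Main()
    {
        int counter = 0;
        // Note: the measured time includes the cost of the delegate call itself.
        double ns = MeasureNanoseconds(() => { counter++; }, 1000000, 5);
        Console.WriteLine("increment via delegate: {0:F2} ns per iteration", ns);
    }
}
```

    Taking the minimum rather than the average is a design choice: interference from other processes can only make a sample slower, so the fastest sample is usually the closest to the true cost.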

    Happy Performance Investigating!

     

  • Slides for our All Day PDC 2008 talks on: Performance By Design

    Every year or so, when Microsoft believes it has something useful to say to developers, it hosts a Professional Developers Conference (PDC).  It is doing so this year from 10/27 through 10/30.    Between Rico Mariani, Mark Friedman and myself, we gave an all-day talk on writing high performance software for the .NET Runtime.

    I am posting the slides here for reference.   I had to break the slides up by session to fit within the size restrictions of the blog.  I also had to post them as separate postings (since the blog software seems to only support one attachment per blog entry).   However I have put the quick links to them here.

    These slides are of course more useful to those who actually attended the talks, however I think there is a lot that can be learned from them even without having heard us talk.  I am happy to answer any questions that anyone might have after viewing the slides.

    You can view the other talks and slides from the PDC by going to their web site http://microsoftpdc.com/Agenda/.  This will certainly work for those attending the PDC, but might work for everyone (I don't really know).

  • Slides for PDC 2008 Talk: Performance By Design: ASP.NET Performance

    These are the slides that Mark Friedman used to talk about ASP.NET Performance at the Professional Developers Conference.

  • Slides for PDC 2008 Talk: Performance By Design: Rico Mariani's Introduction

    These are the slides Rico Mariani used to introduce the all-day session we gave on .NET performance at the 2008 Professional Developers Conference.

  • Slides for PDC 2008 Talk: Performance By Design: Parallel Programming

    These are the slides for the third (of 3) talks I gave on 10/26/2008 at the Professional Developers Conference (PDC), on parallel programming with the .NET runtime.

  • Slides for PDC 2008 Talk: Performance By Design: Measuring Memory

    These are the slides for the second (of 3) talks I gave on 10/26/2008 at the Professional Developers Conference (PDC), on memory investigation.

  • Slides for PDC 2008 Talk: Performance By Design: Measuring CPU Time

    These are the slides for the talk I gave on 10/26/2008 at the Professional Developers Conference (PDC) on the basics of performance investigation.

  • Links to MSDN articles I have written on designing for performance

    I just happened to notice that I don't have any links from my blog to some recent MSDN articles I wrote on performance.   I want to quickly correct this with this posting.

    There is actually a very nice summary page that MSDN created that gathers together all the articles I have written over time.  The link is here (a web search of MSDN Vance Morrison will also turn it up).

    The two articles I want to call your attention to, however, are these:

    CLR Inside Out: Measure Early and Often for Performance, Part 1

    CLR Inside Out: Measure Early and Often for Performance, Part 2

    These articles are about how most perf problems start very early in the design, and that there is no substitute for knowing how much things cost when doing your design.

    There is a nice companion tool that I wrote called 'MeasureIt', which you can download (free), that allows you to

    1. Get a bunch of useful benchmarks on how much various .NET operations cost
    2. Measure other stuff that you happen to be interested in easily (eg how fast is that collection class you are using?)

    In fact, I just recently used MeasureIt to measure how expensive the various levels of the CPU cache hierarchy are (that is, how much does a fetch from L1 cost?  How much if you miss the L1 cache but hit the L2 cache?  How much more expensive is it when you miss both L1 and L2 and have to go to main memory?).   If there is interest I can write a quick blog about that.
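
    As a rough illustration of how such a cache measurement can be set up (this is a sketch of the general pointer-chasing technique, not the benchmark I ran with MeasureIt), the code below follows a randomly ordered chain of indices through arrays of different sizes.  Each load depends on the previous one, so the time per load approximates the latency of whichever level of the hierarchy the working set fits in.  The class name, working-set sizes and load count are all assumptions chosen for illustration, and because several ints share one cache line the numbers are only approximate.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch; names and sizes are illustrative assumptions.
public static class CacheLatencySketch
{
    public static int Sink; // keeps the final index "used" so the loop is not optimized away

    // Builds a random single-cycle permutation (Sattolo's algorithm), so that
    // repeatedly following next[i] visits every element in an unpredictable order.
    public static int[] BuildChain(int length, int seed)
    {
        int[] next = new int[length];
        for (int i = 0; i < length; i++)
            next[i] = i;

        Random rng = new Random(seed);
        for (int i = length - 1; i > 0; i--)
        {
            int j = rng.Next(i); // 0 <= j < i, which is what guarantees a single cycle
            int tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }
        return next;
    }

    // Returns the average nanoseconds per dependent load for a working set of 'bytes'.
    public static double NanosecondsPerLoad(int bytes, int loads)
    {
        int[] next = BuildChain(bytes / sizeof(int), 12345);

        int index = 0;
        for (int i = 0; i < next.Length; i++) // warm-up pass over the whole chain
            index = next[index];

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < loads; i++)
            index = next[index];
        sw.Stop();

        Sink = index;
        return sw.Elapsed.TotalMilliseconds * 1000000.0 / loads;
    }

    public static void Main()
    {
        // Sizes picked to land (on many machines) in L1, L2 and main memory respectively.
        int[] sizes = { 16 * 1024, 256 * 1024, 64 * 1024 * 1024 };
        foreach (int size in sizes)
        {
            double ns = NanosecondsPerLoad(size, 10000000);
            Console.WriteLine("{0,10} bytes: {1:F2} ns per load", size, ns);
        }
    }
}
```

    The random ordering matters: a simple sequential walk would let the hardware prefetcher hide most of the latency being measured.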

     

  • Giving Performance Talk at Professional Developers Conference (PDC) 10/26

    This is a quick plug for a pre-conference session I am giving on 10/26 at Microsoft’s Professional Developers Conference (PDC), held this year in Los Angeles.   My talks are part of an all-day session that I am giving along with Rico Mariani and Mark Friedman entitled

     

    Performance by design using the .NET Framework

     

    In the morning Rico will be starting us off with some basics: having a performance budget, designing for performance and the basics of measuring.   Mark will speak on the performance characteristics of important parts of the framework.   In the afternoon, I will be doing two sessions. The first is on memory performance in .NET:  when you should care, how to measure it, behavior of the GC, and best practices.   I will follow up with multi-threading performance in .NET.  This session will overlap a bit with the talk on multi-core programming, but I will focus on the .NET aspects only and try to keep a very practical perspective.   Finally Mark will finish off with two talks, one on ASP.NET performance, and another on Silverlight performance.

     

    I am looking forward to it.   If you are interested, you can learn more about registration here.   I am also interested in talking to customers and getting feedback, so if you want to talk to someone who has been on the .NET Runtime team 'forever', come to the PDC and we can talk. 

  • To Inline or not to Inline: That is the question

    In a previous posting, I mentioned that .NET V3.5 Service Pack 1 had significant improvements in the Just in time (JIT) compiler for the X86 platform, and in particular its ability to inline methods was improved (especially for methods with value type arguments).   Well, now that this release is publicly available my claim can be put to the test.   In fact an industrious blogger named Steven did just that and blogged about it here.   What he did was to create a series of methods, each a bit bigger than the previous one, and determine whether they got inlined or not.   Steven did this by throwing an exception and then programmatically inspecting the stack trace associated with the exception.    This makes sense when you are trying to automate the analysis, but for simple one-off cases, it is simpler and more powerful to simply look at the native instructions.  See this blog for details on how to do this using Visual Studio.

    What Steven found was that when the JIT tried to inline the following method
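
    The stack-trace technique described above can be sketched roughly as follows.  This is not Steven's actual test code; the names InlineProbe, Candidate, Caller, Capture and HasFrame are hypothetical.  The idea is simply that if a method has been inlined into its caller, it has no stack frame of its own, so its name is missing from the exception's stack trace.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical sketch of the technique; not the original test harness.
public static class InlineProbe
{
    // Always throws. NoInlining guarantees this method keeps its own frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Throw(int value)
    {
        throw new InvalidOperationException("probe " + value);
    }

    // The method whose inlining we want to observe.
    static void Candidate(int a)
    {
        if (a < 0 || a == 100)
            Throw(a * 2);
    }

    // The call site under test, kept out of line so the trace has a stable caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Caller()
    {
        Candidate(100);
    }

    // Runs the call site and returns the exception it throws.
    public static Exception Capture()
    {
        try { Caller(); }
        catch (InvalidOperationException e) { return e; }
        throw new InvalidOperationException("expected Caller to throw");
    }

    // Returns true if a frame for 'methodName' appears in the exception's stack trace.
    public static bool HasFrame(Exception e, string methodName)
    {
        StackFrame[] frames = new StackTrace(e).GetFrames();
        if (frames == null)
            return false;
        foreach (StackFrame frame in frames)
        {
            var method = frame.GetMethod();
            if (method != null && method.Name == methodName)
                return true;
        }
        return false;
    }

    public static void Main()
    {
        Exception e = Capture();
        Console.WriteLine(HasFrame(e, "Candidate")
            ? "Candidate has its own frame (not inlined)"
            : "Candidate has no frame (appears to have been inlined)");
    }
}
```

    The answer depends on the build and on the JIT in use: the check is only meaningful for an optimized build run without a debugger attached, since unoptimized code is generally not inlined at all.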

            public void X18(int a)
            {
                if (a < 0 || a == 100)
                {
                    Throw(a * 2);
                }
            }

     

    It did not inline.  This was not what Steven expected, because this method was only 18 bytes of IL.  The previous version of the runtime would inline methods up to 32 bytes.   It seems like the JIT’s ability to inline is getting worse, not better.   What is going on?

    Well, at the heart of this anomaly is a very simple fact: It is not always better to inline.  Inlining always reduces the number of instructions executed (since at a minimum the call and return instructions are not executed), but it can (and often does) make the resulting code bigger.  Most of us would intuitively know that it does not make sense to inline large methods (say 1Kbytes), and that inlining very small methods that make the call site smaller (because a call instruction is 5 bytes) is always a win, but what about the methods in between?  

    Interestingly, as you make code bigger, you make it slower, because inherently, memory is slow, and the bigger your code, the more likely it is not in the fastest CPU cache (called L1), in which case the processor stalls 3-10 cycles until it can be fetched from another cache (called L2), or, if it is not there, from main memory (taking 10+ cycles).  For code that executes in tight loops, this effect is not problematic because all the code will ‘fit’ in the fastest cache (typically 64K), however for ‘typical’ code, which executes a lot of code from a lot of methods, the ‘bigger is slower’ effect is very pronounced.  Bigger code also means more disk I/O to get the code off the disk at startup time, which means that your application starts slower. 

    In fact, the first phase of the JIT inlining improvement was simply to remove the restrictions on JIT inlining.   After that phase was complete we could inline A LOT, and in fact the performance of many of our ‘real world’ benchmarks DECREASED.   Thus we had irrefutable evidence that inlining could be BAD for performance.  We had to be careful; too much inlining was a bad thing. 

    Ideally you could calculate the effect of code size on caching and make a principled decision on when inlining was good and bad.   Unfortunately, the JIT compiler does not have enough information to take such a principled approach.   However some things were clear:

    1.     If inlining makes code smaller than the call it replaces, it is ALWAYS good.  Note that we are talking about the NATIVE code size, not the IL code size (which can be quite different). 

    2.     The more a particular call site is executed, the more it will benefit from inlining.  Thus code in loops deserves to be inlined more than code that is not in loops.

    3.     If inlining exposes important optimizations, then inlining is more desirable.  In particular methods with value type arguments benefit more than normal because of optimizations like this and thus having a bias to inline these methods is good.

    Thus the heuristic the X86 JIT compiler uses, given an inline candidate, is:

    1.     Estimate the size of the call site if the method were not inlined.

    2.     Estimate the size of the call site if it were inlined.  This is an estimate based on the IL; the estimator is a simple state machine (Markov Model) created using lots of real data.

    3.     Compute a multiplier.   By default it is 1.

    4.     Increase the multiplier if the code is in a loop (the current heuristic bumps it to 5 in a loop).

    5.     Increase the multiplier if it looks like struct optimizations will kick in.

    6.     If InlineSize  <= NonInlineSize * Multiplier do the inlining. 

    What this means is that by default, only methods that do not grow the call site will be inlined, however if the code is in a loop, the call site can grow as much as 5x.
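
    The six steps above can be written down as a small decision function.  This is only a sketch of the rule as described in this post, not the JIT's actual code: the size estimates are taken as inputs rather than computed, the struct multiplier value is a made-up placeholder (no number is given above), and combining the two bumps by taking the larger one is an assumption.

```csharp
using System;

// Hypothetical sketch of the decision rule; not the JIT's implementation.
public static class InlineHeuristicSketch
{
    const int LoopMultiplier = 5;   // from the text: bumped to 5 in a loop
    const int StructMultiplier = 2; // placeholder: the text gives no value for this bump

    // Steps 3-5: start at 1 and raise the multiplier for loops and struct optimizations.
    // Assumption: when both apply, the larger of the two bumps is used.
    public static int ComputeMultiplier(bool inLoop, bool structOptimizationsLikely)
    {
        int multiplier = 1;
        if (inLoop)
            multiplier = Math.Max(multiplier, LoopMultiplier);
        if (structOptimizationsLikely)
            multiplier = Math.Max(multiplier, StructMultiplier);
        return multiplier;
    }

    // Step 6: inline when InlineSize <= NonInlineSize * Multiplier.
    // Both sizes are estimated NATIVE code sizes of the call site, in bytes.
    public static bool ShouldInline(int inlineSize, int nonInlineSize,
                                    bool inLoop, bool structOptimizationsLikely)
    {
        int multiplier = ComputeMultiplier(inLoop, structOptimizationsLikely);
        return inlineSize <= nonInlineSize * multiplier;
    }

    public static void Main()
    {
        // A call site that would grow from 5 to 12 bytes if inlined.
        Console.WriteLine(ShouldInline(12, 5, false, false)); // False: 12 > 5 * 1
        Console.WriteLine(ShouldInline(12, 5, true, false));  // True:  12 <= 5 * 5
        // A call site that shrinks is inlined regardless of context.
        Console.WriteLine(ShouldInline(4, 5, false, false));  // True:  4 <= 5 * 1
    }
}
```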

    What does this mean for Steven’s test?

    It means that simple tests based solely on IL size are not accurate.   First what is important is the Native size, not the IL size, and more importantly, it is much more likely to be inlined if the method is in a loop.  In particular, if you modify Steven’s test so that the methods are in a loop when they are called, in fact all of his test methods get inlined.   

    To be sure, the heuristics are not perfect.  The worst case is a method that is too big to be inlined, but is still called A LOT (it is in a loop) and calls other small methods that COULD be inlined but are not because they are not in a loop.   The problem is that the JIT does not know if the method is called a lot or not, and by default does not inline in that case.   We are considering adding an attribute to a method which gives a strong hint that the method is called a lot and thus would bump the multiplier much like if there was a loop, but this does not exist now. 

    We definitely are interested in feedback on our inlining heuristic.  If Steven or anyone else finds real examples where we are missing important inlining opportunities we want to know about them, so we can figure out whether we can adjust our heuristics (but please keep in mind that they are heuristics.  They will never be perfect). 

     

  • .NET Framework 3.5 SP1 Allows managed code to be launched from a network share!

    Hurray, it's finally fixed!  Managed code 'just works' from a network file share!

    Now I know that some of you are probably just saying 'who cares' or 'huh?' but for those of us who have hit this problem, this has been a major deployment headache, and I am happy to say that the end of this particular problem is in sight.

    The problem scenario is this.  If you have a managed application like 'MyApp.exe' it works great if you run it locally (eg C:\bin\MyApp.exe), but fails when you try to run it from a network location (eg \\Myhost\bin\MyApp.exe).   The reason is that the security system for the runtime treats network locations as less trustworthy than local locations, and thus throws a security exception.     The problem is that failing to run managed code WHILE STILL ALLOWING UNMANAGED EXEs to run does not provide any security (because hackers will simply use unmanaged code) but does cause nontrivial deployment headaches (managed apps can't be run from network locations). 

    Well, the better part of a year ago I asked Brad Abrams to do a poll on this issue and we found that there was quite a bit of customer deployment pain associated with this issue, and after much deliberation decided to fix it.    The exact details were covered in Shawn Farkas's blog, however the high level take-away is that for the vast majority of scenarios 'it just works', meaning that managed code acts just like unmanaged code as far as launching EXEs from network shares is concerned. 

    So I do encourage you to download the .NET 3.5 SP1 service pack.  It is a very low risk, drop-in update for the runtime.  Once you do this, you get network launch for free.   Because it is a service pack, you can also simply wait, and get the update automatically in the next several weeks via Windows Update.    Thus if you are a software deployer, pretty soon now, with high probability your customers will have this newer runtime.

    Have fun! 

    P.S.: for those of you who are concerned that we have opened security holes by doing this, we have tried to be VERY careful not to.  The basic rationale is that we are not opening any holes that were not already there, because Windows allows non-managed exes to run from a network share.    By the way, if you WANT to lock down your network access (prevent exes from a network share from running, or even just exes that are in special locations), you can do this with Software Policies.  That is the proper way to lock down your computer. 

  • What's Coming in .NET Runtime Performance in Version V3.5 SP1

    It certainly has been a while since I last blogged.   Most of this is laziness on my part, but I can truthfully say that it is partly because I have been busy trying to get the next servicing release of the .NET framework (called Version 3.5 Service Pack 1) out the door.    Part of the framework (runtime + libraries) servicing is an updated version of the runtime DLL (which we call the CLR).    

    How do you know if you have an updated CLR?   If the file

    • c:\windows\microsoft.net\Framework\V2.0.50727\mscorwks.dll

    has a version number greater than 50727.3000 then you have the changes I will be talking about.   Because this is just a servicing of the runtime, we had to confine ourselves to changes that we felt had a very low chance of breaking existing applications.   Nevertheless we were able to add substantial performance value.   There are two changes I would like to talk about in particular.
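
    If you would rather check this programmatically, a sketch along the following lines should work.  It reads the file version of mscorwks.dll with System.Diagnostics.FileVersionInfo and compares the last two parts against 50727.3000.  The class and method names are made up for illustration, and the path is simply the one given above (adjust it if Windows is installed elsewhere).

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper for illustration.
public static class ClrVersionCheck
{
    // True when the (build, private) parts of the file version are greater than 50727.3000.
    public static bool IsUpdatedClr(int buildPart, int privatePart)
    {
        if (buildPart != 50727)
            return buildPart > 50727;
        return privatePart > 3000;
    }

    public static void Main()
    {
        // Path as given in the text; assumes a default Windows install location.
        string path = @"c:\windows\microsoft.net\Framework\V2.0.50727\mscorwks.dll";
        if (!File.Exists(path))
        {
            Console.WriteLine("mscorwks.dll not found at " + path);
            return;
        }

        FileVersionInfo info = FileVersionInfo.GetVersionInfo(path);
        Console.WriteLine("File version: " + info.FileVersion);
        Console.WriteLine(IsUpdatedClr(info.FileBuildPart, info.FilePrivatePart)
            ? "This CLR includes the changes described here."
            : "This CLR predates the changes described here.");
    }
}
```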

    • Improved Cold Startup performance

    Cold startup time is the time it takes an application to start when all the data needed to run has to be fetched from disk (rather than from the operating system disk cache).  The second time an application starts (Warm startup) is substantially faster because very few (slow) disk reads are needed to get the application running.   It is not unusual for the warm startup time to be good (< 1 sec), but for the cold startup time to be quite bad (5-10 sec).  In the new runtime we go to a lot of effort to pack all the code and runtime structures (for now only in DLLs associated with the framework itself), to reduce the amount of disk I/O needed.   While the improvements benefit all code that uses the framework, they help the most on code that has been precompiled using NGEN.exe, and the effect is more pronounced on larger applications. 

    • Inlining of value types (C# structs).  (X86) 

    C# structs are what the .NET runtime calls value types.  They are called this because when you have fields or arrays of such data structures, the value is embedded directly in the field (not referenced through a pointer).    All the primitive types (int, char, bool) are value types, as well as a few types defined in the base class library like DateTime, Decimal, Point and Rectangle.    The previous version of the code generator (for X86) did almost no optimization on value types.   This was unfortunate, because although value types are not common (most types are classes, not structs), when they are used, they can be used heavily, and so optimization is important.  In fact the biggest single piece of feedback we got on our feedback site related to performance was concerning more aggressive inlining of value type methods.    The runtime’s 64 bit code generators could already optimize value types, but the code generator for X86 could not.   Since X86 is still the dominant platform there was an unmet need.    In this servicing we included the work we did to enable this for X86.    Your own code often may not benefit from this improvement (because it does not use value types much), but if you do use them, you tend to use them heavily, and the value could be substantial. 
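
    To make this concrete, here is the kind of code that stands to benefit: a small struct with tiny methods, used heavily in a loop.  The Vec2 type and the method names are hypothetical, chosen only for illustration.  When Add and Dot are inlined the struct's fields can be kept in registers, whereas without inlining each iteration pays for two calls plus copying the struct arguments.

```csharp
using System;

// Hypothetical value type for illustration.
public struct Vec2
{
    public readonly double X;
    public readonly double Y;

    public Vec2(double x, double y)
    {
        X = x;
        Y = y;
    }

    // Small methods taking and returning value types: typical inlining candidates.
    public Vec2 Add(Vec2 other)
    {
        return new Vec2(X + other.X, Y + other.Y);
    }

    public double Dot(Vec2 other)
    {
        return X * other.X + Y * other.Y;
    }
}

public static class ValueTypeInliningSketch
{
    // Calls two struct methods per iteration inside a loop.
    public static double SumOfDots(int count)
    {
        Vec2 acc = new Vec2(0, 0);
        Vec2 step = new Vec2(1, 2);
        double total = 0;
        for (int i = 0; i < count; i++)
        {
            acc = acc.Add(step);
            total += acc.Dot(step);
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfDots(3)); // prints 30 (5 + 10 + 15)
    }
}
```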

    64 Bit machines: X64

    Much of the cold startup work was rearranging the data in precompiled framework executables to pack heavily used items together, and as such it applies equally well to 64 bit platforms like X64.   However there are places where 64 bit machines are quite different (how exceptions are delivered by the operating system) and we did not invest heavily in tuning (packing) these places.  The result is that cold startup wins will not be as dramatic on 64 bit machines.   There were improvements to the 64 bit code generator, but since this code generator could already inline value types, we did not need to do work in that specific area.

    64 Bit machines: Intel Itanium

    Most users of the .NET runtime run it on the X86 processor, however, beginning with Version 2.0, the runtime also supports the Intel Itanium architecture.   In fact, internally the Itanium version of the runtime came on line well before the X64 version did.   The Itanium processor tends to be used in high end server applications, where scalability is a great concern.   In the next major release of the Framework, we are continuing to investigate and improve our performance on the Itanium architecture.   Microsoft is a charter member of the Itanium Solutions Alliance (ISA) and we are excited to work with the other members of ISA to further enhance the performance of the .NET Framework on Intel Itanium Architecture.  Check out the CLR Performance Team’s blog on investigating Itanium perf for more on this work. 

     

  • Writing approachable code: Introducing the hyperaddin for Visual Studio!

    A few years ago now, several of us on the .NET Runtime team were lamenting how unapproachable the code base was for new developers to the team.    We agreed that more commenting would certainly help the situation, but that alone was not enough, because you also need to FIND the comments WHEN they are relevant for the comments to actually be useful.   Typically the most useful comments are not about a particular line of code, but information about the purpose of a whole method, class, or subsystem, and what the global invariants are.   Finding these very useful comments is often not a trivial exercise. 

    It occurred to us at the time that this problem is not so different from finding relevant data on the internet, and that having hyperlinks would help a lot to 'channel' people to important relevant comments.  This was the genesis of the HyperAddin.   Visual Studio already supported having hyperlinks in comments (any string starting with http:// becomes a link that you can Ctrl-left click on).    The idea is to add a new kind of string, code:identifier, which turns into a hyperlink that looks up the definition of identifier.  Now it is possible for one part of the code base to refer to another by symbolic name.   

    Taking our cue from HTML, we added the capability of sub-file anchors.  Any comment of the form

    • // #SomeIdentifier

    Defines an anchor in that file called SomeIdentifier.  You can then refer to it by the hyperlink code:identifier#tag.  Like HTML, this is a hyperlink to code:identifier (which gets you to the correct file) followed by a search on that 'page' (file) for 'tag'.  The result is that it is now trivial for one comment to refer to another.  
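
    Here is a small sketch of what this convention looks like in a source file.  The WorkQueue class and the anchor names are hypothetical; only the two comment forms (// #Anchor to define an anchor, and code:identifier#tag to link to it) come from the description above.

```csharp
using System;
using System.Collections.Generic;

// #Overview
// WorkQueue (a hypothetical class) hands out items in the order they were added.
// The ordering guarantee is described at code:WorkQueue#OrderingInvariant.
public class WorkQueue
{
    private readonly Queue<string> items = new Queue<string>();

    // #OrderingInvariant
    // Items are always taken in insertion order; callers rely on this.
    public void Add(string item)
    {
        items.Enqueue(item);
    }

    // See code:WorkQueue#OrderingInvariant for why this never reorders items,
    // and code:WorkQueue#Overview for the purpose of the class as a whole.
    public string Take()
    {
        return items.Dequeue();
    }
}

public static class WorkQueueDemo
{
    public static void Main()
    {
        WorkQueue queue = new WorkQueue();
        queue.Add("first");
        queue.Add("second");
        Console.WriteLine(queue.Take()); // prints "first"
    }
}
```

    The invariant is written down once, next to the code that maintains it, and every other place that depends on it just links there instead of repeating (or omitting) the explanation.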

    With this simple change, it is now possible to write a very nice overview comment and refer to it in dozens of places.  Having used it for several months now, I find it very constraining on those occasions that I am not writing code in Visual Studio, and thus do not have my code hyperlinks.    In retrospect I am amazed that someone did not think of this earlier (which is a hallmark of a good idea).

    There is more to say about what the HyperAddin can do for you (stay tuned), but this blog entry is mostly about getting you to try it.   Just today I published the addin on CodePlex as HyperAddin.  Please click on the link and follow the installation instructions (it is an XCOPY deployment).    If you use Visual Studio, give it a try right now.

    If you like the capabilities of HyperAddin and wish that Visual Studio had this feature, you can vote on this here.  To vote, click on the 'Sign in to Rate' (and register if necessary), and then click on the number of stars in the rating box that corresponds to the value of this feature.   The more people who vote, the more likely this feature is to get into Visual Studio.

© 2009 Microsoft Corporation. All rights reserved.