Vance Morrison's Weblog

Vance Morrison is currently an Architect on the .NET Runtime Team, specializing in performance issues with the runtime or managed code in general.

  • Slides for Patterns & Practices Talk: Designing for Performance

    Tomorrow (10/14/2009) I am giving a talk on performance at the Microsoft-Campus Patterns and Practices Summit.  Below is the abstract for my talk.

    To many developers, high performance development is ‘magic’.   It involves multiple metrics (time, memory allocations, footprint, startup), many special tools (profilers, performance counters, ETW, NGEN), special knowledge (cache lines, page faults, seek times), conflicting advice (are generics good or bad for performance?), lots of confusing numbers, and a bunch of extra work.    While it is true that performance is a broad topic with lots of details, it is also true that it is relatively easy to sort all of this detail into categories that make the topic of performance much more approachable.    This talk simplifies and prioritizes what you need to know in order to build high performance applications on .NET.    

    I have posted the slides here for reference.   For those of you who may not have PowerPoint, I have also cut and pasted the bibliography of web links, which are generally useful for those who care about performance. For those of you with PowerPoint, simply click on the hyperlink here or at the end of the article.

    Articles
    Blogs
    Windows Performance Analysis Developer Center (not really a blog, but has FAQ and links to other blogs …)
    CLR and Framework Perf Blog (.NET Runtime’s Performance Team notes on Performance)
    Hazim Shafi's Blog (details on VS 2010 new Performance tools)
    Tools
    MeasureIt (Benchmarking tool for design time)
    Visual Studio 2008 Profiler (Part of Visual Studio Team System) General CPU profiling
    Visual Studio  2010 Parallel Performance Analyzer (Part of Visual Studio Team System) Good all-round profiling (CPU, Disk, Blocked)
    Windows Performance Analyzer (WPA) (XPERF), General Sub-process performance analysis.
    VMMap (Measuring the coarse memory usage within a process)
    CLR Profiler for the .NET Framework 2.0 (Measuring detailed memory usage within the GC heap)
    Process Explorer (A more feature-rich Task Manager)
    Process Monitor (A tool for monitoring 
    Event Tracing for Windows (ETW) Articles
  • Slides for a talk on Performance

     I am giving a .NET Performance talk for the users group at the Royal Bank of Canada, and I am posting the slides for the talk here for future reference.  (See the PowerPoint document attached to this entry below.) 

     Anyone who has heard me speak will recognise the themes:  Good perf does not just happen, you have to plan, and to plan you have to measure.  I touch very briefly on some ways of measuring (Stopwatch, MeasureIt, XPerf).

    As with all talks, the goal is not so much to teach something, but rather to MOTIVATE you to learn more.   Thus I want it to be very easy to follow up on the topics I so briefly touched upon.   Thus I have pulled the bibliography out of the deck and placed it here so you can quickly research more.

    I did not have time to do more than just mention it, but I want to emphasize here that Event Tracing for Windows (ETW), together with its viewer (XPERF), is the most useful general purpose tool for doing performance analysis on Windows, period.    It is well worth investing time in learning more about it.    Currently it has some limitations associated with diagnosing managed code, however rest assured that we are addressing these issues.   

     An important takeaway here is that learning about XPERF is a great investment.   It is THE tool that the entire Windows team uses for all of its performance diagnosis, and we are investing in it for the future too.    

    Posted Tuesday, July 21, 2009 11:05 AM by vancem | 1 Comments
    Attachment(s): RBC_Talk.pptx
  • Here is your chance to get your Performance requests in for the next version of the .NET Runtime

    If you use managed code, and you care about performance, then we want to hear from you.

     We have not yet shipped the next version (what we call Version 4) of the .NET Runtime, however we are 'locked down' enough that at least a few of us on the runtime team are preparing for the version after that (as yet unnamed).   As part of that planning exercise we want YOUR input.    The .NET Performance team wants to know what your 'performance pain points' are (is it startup, some throughput scenario, ASP.NET, client, etc.), so we can address your most important issues.

     To submit your feedback simply go to the site http://dotnetfxperf2009.questionpro.com/ and fill out the form.   We tried to keep it short and 'low overhead' (we are the performance team after all (:-).     

    Thanks

    Vance Morrison

    .NET Runtime Performance Architect. 

  • Musings on the .NET Runtime on Channel 9

    In case anyone is interested, I did a video interview on the runtime in general.   This is more about philosophy than technology, but if you are in the mood for some philosophy about the runtime, here is some from someone who has been around from the CLR's beginning. 

    http://channel9.msdn.com/shows/Going+Deep/Vance-Morrison-CLR-Through-the-Years/

     

  • MeasureIt Update: Tool for doing MicroBenchmarks for .NET

    Almost a year ago now I wrote part 1 and part 2 of an MSDN article entitled 'Measure Early and Measure Often for Good Performance'.  In this article I argued that if you want to design high performance applications you need to be measuring performance early and often in the design process.   To help with this I posted a tool called 'MeasureIt'.  It basically makes it easy to write benchmarks for .NET code.   In particular it also comes with a set of built-in benchmarks that measure the most important fundamental operations in .NET, so you can know what is expensive and what is not. 

    Well, it has been almost a year, and I have made a few bug fixes / improvements to MeasureIt since then, so this post brings MeasureIt up to date.   It also allows me to correct a mistake I made when I posted the unoptimized version of MeasureIt for that article, and to improve the user experience during the download. 

    MeasureIt is super easy to use in the simple case (just run it), and has good documentation (MeasureIt /usersGuide).   You are in fact only 7 clicks away from running it right now.   To do so:

    1. Click on the 'MeasureIt.zip' link at the bottom of this article.  This brings up a download dialog box.
    2. Click on the 'Open' button.  This brings up a security dialog.
    3. You should click 'Allow' on the security dialog.  (Yes you have to trust me).  This opens the ZIP archive.
    4. Click on the 'MeasureIt' directory (alternatively you can copy it to your hard drive). 
    5. Click on the 'MeasureIt.exe' file
    6. Click 'Run' in the dialog box that comes up.
    7. You may get another security dialog.  You have to allow that. 

    At this point it runs MeasureIt.  It takes a few seconds for it to run and then it displays the data on the costs of 50 or so .NET operations.   As mentioned in the article, this is really useful stuff.   MeasureIt comes with its own source code, and typing 'MeasureIt /Edit' allows you to add more benchmarks to it if you want to measure something of your own.   
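
    If you just want a feel for what a micro-benchmark does before opening MeasureIt's source, here is a minimal hand-rolled sketch built on System.Diagnostics.Stopwatch.  This is NOT MeasureIt's own code or API; the names MicroBench and MeasureNanoseconds are made up for illustration, and the number it reports includes the cost of the delegate call itself.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for illustration only; not part of MeasureIt.
static class MicroBench
{
    // Runs 'action' 'iterations' times and returns the average cost in nanoseconds.
    public static double MeasureNanoseconds(Action action, int iterations)
    {
        action();                               // warm-up call so JIT compilation is not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        // milliseconds * 1e6 = nanoseconds
        return sw.Elapsed.TotalMilliseconds * 1e6 / iterations;
    }

    static void Main()
    {
        List<int> list = new List<int>();
        double ns = MeasureNanoseconds(() => { list.Add(1); list.Clear(); }, 1000000);
        Console.WriteLine("List Add+Clear: {0:F1} ns per iteration (includes delegate overhead)", ns);
    }
}
```

    Run it from an optimized (Release) build outside the debugger, and run it several times; a single run of a micro-benchmark is rarely trustworthy.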

    Happy Performance Investigating!

     

  • Slides for our All Day PDC 2008 talks on: Performance By Design

    Every year or so, when Microsoft believes it has something useful to say to developers, it hosts a Professional Developers Conference.  It is doing so this year from 10/27 through 10/30.    Between Rico Mariani, Mark Friedman and myself, we gave an all-day talk on writing high performance software for the .NET Runtime.   

     I am posting the slides here for reference.   I had to break the slides up into the sessions we gave them in, to fit within the size restrictions of the blog.  I also had to post them as separate postings (since the blog software seems to only support one attachment per blog entry).   However I have put the quick links to them here.

    These slides are of course more useful to those who actually attended the talks, however I think there is a lot that can be learned from them even without having heard us talk.  I am happy to answer any questions that anyone might have after viewing the slides.

    You can view the other talks and slides from the PDC by going to their web site http://microsoftpdc.com/Agenda/.  This will certainly work for those attending the PDC, but might work for everyone (I don't really know).

  • Slides for PDC 2008 Talk: Performance By Design: ASP.NET Performance

    These are the slides that Mark Friedman used to talk about ASP.NET Performance at the Professional Developers Conference.

  • Slides for PDC 2008 Talk: Performance By Design: Rico Mariani's Introduction

    These are the slides Rico Mariani used to introduce the all-day session we gave on .NET performance at the 2008 Professional Developers Conference. 
  • Slides for PDC 2008 Talk: Performance By Design: Parallel Programming

    These are the slides for the third (of 3) talks I gave on 10/26/2008 at the Professional Developers Conference (PDC), on parallel programming with the .NET runtime.
  • Slides for PDC 2008 Talk: Performance By Design: Measuring Memory

    These are the slides for the second (of 3) talks I gave on 10/26/2008 at the Professional Developers Conference (PDC), on memory investigation.

  • Slides for PDC 2008 Talk: Performance By Design: Measuring CPU Time

    These are the slides for the talk I gave on 10/26/2008 at the Professional Developers Conference (PDC) on the basics of performance investigation.

  • Links to MSDN articles I have written on designing for performance

    I just happened to notice that I don't have any links from my blog to some recent MSDN articles I wrote on performance.   I want to quickly correct this with this posting.

    There is actually a very nice summary page that MSDN created that gathers together all the articles I have written over time.  The link is here (a web search of MSDN Vance Morrison will also turn it up).

     The two articles I want to call your attention to however are these

    CLR Inside Out: Measure Early and Often for Performance, Part 1

    CLR Inside Out: Measure Early and Often for Performance, Part 2

    These articles are about how most perf problems start very early in the design, and that there is no substitute for knowing how much things cost when doing your design.

    There is a nice companion tool that I wrote called 'MeasureIt', which you can download (free), that allows you to

    1. Get a bunch of useful benchmarks on how much various .NET operations cost
    2. Measure other stuff that you happen to be interested in easily (eg how fast is that collection class you are using?)

    In fact, I just recently used MeasureIt to measure how expensive the various levels of the CPU cache hierarchy are (that is, how much does a fetch from L1 cost?  How much if you miss the L1 cache but hit the L2 cache?  How much more expensive is it when you miss both L1 and L2 and have to go to main memory?).   If there is interest I can write a quick blog about that. 
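
    To give an idea of what such a cache measurement can look like, here is a rough sketch (not the MeasureIt benchmark itself, which is not shown in this post).  It chases indices through a randomly ordered cycle stored in an int array, so each load depends on the previous one and is hard for the hardware to prefetch.  The class name CacheLatency and the array sizes are assumptions chosen for illustration; pick sizes that straddle the cache sizes of your own machine.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch for illustration only.
static class CacheLatency
{
    static int sink;    // written at the end so the JIT cannot discard the loop

    // Builds a random single-cycle permutation (Sattolo's algorithm):
    // starting anywhere and following next[i] visits every slot exactly once per lap.
    public static int[] BuildCycle(int length, Random rng)
    {
        int[] next = new int[length];
        for (int i = 0; i < length; i++) next[i] = i;
        for (int i = length - 1; i > 0; i--)
        {
            int j = rng.Next(i);                // 0 <= j < i
            int tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }
        return next;
    }

    // Average nanoseconds per dependent load over 'steps' loads.
    public static double NanosecondsPerLoad(int[] next, int steps)
    {
        int p = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < steps; i++) p = next[p];
        sw.Stop();
        sink = p;
        return sw.Elapsed.TotalMilliseconds * 1e6 / steps;
    }

    static void Main()
    {
        Random rng = new Random(12345);
        int[] sizesInKB = { 16, 256, 32768 };   // assumed sizes: small, medium, larger than most caches
        foreach (int kb in sizesInKB)
        {
            int[] next = BuildCycle(kb * 1024 / sizeof(int), rng);
            NanosecondsPerLoad(next, next.Length);          // warm-up lap
            double ns = NanosecondsPerLoad(next, 10000000);
            Console.WriteLine("{0,6} KB working set: {1:F2} ns per load", kb, ns);
        }
    }
}
```

    Since several ints share a cache line, the numbers are only approximate, but the jump as the working set outgrows each cache level should still be visible.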

     

  • Giving Performance Talk at Professional Developers Conference (PDC) 10/26

    This is a quick plug for a pre-conference session I am giving on 10/26 at Microsoft’s Professional Developers Conference (PDC), held this year in Los Angeles.   My talks are part of an all-day session that I am giving along with Rico Mariani and Mark Friedman entitled

     

    Performance by design using the .NET Framework

     

    In the morning Rico will be starting us off with some basics: having a performance budget, designing for performance and the basics of measuring.   Mark will speak on the performance characteristics of important parts of the framework.   In the afternoon, I will be doing two sessions. The first is on memory performance in .NET:  when you should care, how to measure it, behavior of the GC, and best practices.   I will follow up with multi-threading performance in .NET.  This session will overlap a bit with the talk on multi-core programming, but I will focus on the .NET aspects only and try to have a very practical perspective.   Finally Mark will finish off with two talks, one on ASP.NET performance, and another on Silverlight performance.

     

    I am looking forward to it.   If you are interested, you can learn more about registration here.   I am also interested in talking to customers and getting feedback, so if you want to talk to someone who has been on the .NET Runtime team 'forever', come to the PDC and we can talk. 

  • To Inline or not to Inline: That is the question

    In a previous posting, I mentioned that .NET V3.5 Service Pack 1 had significant improvements in the Just in time (JIT) compiler for the X86 platform, and in particular its ability to inline methods was improved (especially for methods with value type arguments).   Well, now that this release is publicly available my claim can be put to the test.   In fact an industrious blogger named Steven did just that and blogged about it here.   What he did was to create a series of methods, each a bit bigger than the previous one, and determine whether they got inlined or not.   Steven did this by throwing an exception and then programmatically inspecting the stack trace associated with the exception.    This makes sense when you are trying to automate the analysis, but for simple one-off cases, it is simpler and more powerful to simply look at the native instructions.  See this blog for details on how to do this using Visual Studio.

    What Steven found was that when the JIT tried to inline the following method
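
    For readers who have not seen the stack-trace trick, here is a small sketch of the general idea.  It is not Steven's actual test code; the names InlineProbe, Candidate, Thrower and TraceContains are invented for illustration.  A helper that is never inlined throws, and the caller checks whether a frame for the candidate method shows up in the exception's stack trace.  A missing frame suggests (but does not prove) that the candidate was inlined.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical probe for illustration only.
static class InlineProbe
{
    // Never inlined, so its frame is always present in the trace.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Thrower(int a)
    {
        throw new InvalidOperationException("probe " + a);
    }

    // The method whose inlining we are curious about.
    // The "+ 1" keeps the call out of tail position, so a tail call cannot remove the frame.
    static int Candidate(int a)
    {
        if (a < 0 || a == 100)
            return Thrower(a * 2) + 1;
        return a;
    }

    // Triggers the exception and reports whether a frame named 'methodName' is in its trace.
    public static bool TraceContains(string methodName)
    {
        try
        {
            Candidate(-1);
        }
        catch (InvalidOperationException ex)
        {
            StackFrame[] frames = new StackTrace(ex).GetFrames();
            if (frames == null) return false;
            foreach (StackFrame frame in frames)
            {
                MethodBase method = frame.GetMethod();
                if (method != null && method.Name == methodName) return true;
            }
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine("Thrower frame present:   " + TraceContains("Thrower"));
        Console.WriteLine("Candidate frame present: " + TraceContains("Candidate"));
    }
}
```

    Keep in mind that the JIT does not inline when a debugger is attached at launch or in unoptimized builds, so run an optimized build from the command line.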

            public void X18(int a)
            {
                if (a < 0 || a == 100)
                {
                    Throw(a * 2);
                }
            }

     

    It did not inline.  This was not what Steven expected, because this method was only 18 bytes of IL.  The previous version of the runtime would inline methods up to 32 bytes.   It seems like the JIT’s ability to inline is getting worse, not better.   What is going on?

    Well, at the heart of this anomaly is a very simple fact: It is not always better to inline.  Inlining always reduces the number of instructions executed (since at a minimum the call and return instructions are not executed), but it can (and often does) make the resulting code bigger.  Most of us would intuitively know that it does not make sense to inline large methods (say 1Kbytes), and that inlining very small methods that make the call site smaller (because a call instruction is 5 bytes) is always a win, but what about the methods in between?  

    Interestingly, as you make code bigger, you make it slower, because inherently, memory is slow, and the bigger your code, the more likely it is not in the fastest CPU cache (called L1), in which case the processor stalls 3-10 cycles until it can be fetched from another cache (called L2), and if not there, in main memory (taking 10+ cycles).  For code that executes in tight loops, this effect is not problematic because all the code will ‘fit’ in the fastest cache (typically 64K), however for ‘typical’ code, which executes a lot of code from a lot of methods, the ‘bigger is slower’ effect is very pronounced.  Bigger code also means bigger disk I/O to get the code off the disk at startup time, which means that your application starts slower. 

    In fact, the first phase of the JIT inlining improvement was simply to remove the restrictions on JIT inlining.   After that phase was complete we could inline A LOT, and in fact the performance of many of our ‘real world’ benchmarks DECREASED.   Thus we had irrefutable evidence that inlining could be BAD for performance.  We had to be careful; too much inlining was a bad thing. 

    Ideally you could calculate the effect of code size on caching and make a principled decision on when inlining was good and bad.   Unfortunately, the JIT compiler does not have enough information to take such a principled approach.   However some things were clear:

    1.     If inlining makes code smaller than the call it replaces, it is ALWAYS good.  Note that we are talking about the NATIVE code size, not the IL code size (which can be quite different). 

    2.     The more a particular call site is executed, the more it will benefit from inlining.  Thus code in loops deserves to be inlined more than code that is not in loops.

    3.     If inlining exposes important optimizations, then inlining is more desirable.  In particular methods with value type arguments benefit more than normal because of optimizations like this and thus having a bias to inline these methods is good.

    Thus the heuristic the X86 JIT compiler uses, given an inline candidate, is:

    1.     Estimate the size of the call site if the method were not inlined.

    2.     Estimate the size of the call site if it were inlined (this is an estimate based on the IL; we employ a simple state machine (Markov Model), created using lots of real data, to form this estimator logic).

    3.     Compute a multiplier.   By default it is 1

    4.     Increase the multiplier if the code is in a loop (the current heuristic bumps it to 5 in a loop)

    5.     Increase the multiplier if it looks like struct optimizations will kick in.

    6.     If InlineSize  <= NonInlineSize * Multiplier do the inlining. 

     What this means is that by default, only methods that do not grow the call site will be inlined, however if the code is in a loop, it can grow as much as 5x.
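
    The decision in steps 3 through 6 can be written down in a few lines.  The sketch below is only an illustration of the rule as described above, not the JIT's actual source.  The loop multiplier of 5 comes from the text; the struct multiplier value and the way the two multipliers combine are assumptions made up for the example, and the size estimators from steps 1 and 2 are taken as inputs.

```csharp
using System;

// Hypothetical illustration of the rule described above; not the JIT's code.
static class InlineHeuristic
{
    public const double LoopMultiplier = 5.0;     // from the text: bumped to 5 in a loop
    public const double StructMultiplier = 2.0;   // ASSUMPTION: the real value is not given

    // nonInlineSize / inlineSize are the estimated native sizes of the call site
    // without and with inlining (steps 1 and 2).
    public static bool ShouldInline(int nonInlineSize, int inlineSize,
                                    bool inLoop, bool structOptimizationsLikely)
    {
        double multiplier = 1.0;                                        // step 3
        if (inLoop) multiplier = LoopMultiplier;                        // step 4
        if (structOptimizationsLikely) multiplier *= StructMultiplier;  // step 5 (combination is assumed)
        return inlineSize <= nonInlineSize * multiplier;                // step 6
    }

    static void Main()
    {
        // A 5-byte call site that would grow to 12 bytes if inlined.
        Console.WriteLine(ShouldInline(5, 12, false, false));   // False: 12 > 5 * 1
        Console.WriteLine(ShouldInline(5, 12, true, false));    // True:  12 <= 5 * 5
    }
}
```

    The second call shows why the same method can be rejected at one call site and accepted at another: nothing about the method changed, only whether the call sits in a loop.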

    What does this mean for Steven’s test?

    It means that simple tests based solely on IL size are not accurate.   First what is important is the Native size, not the IL size, and more importantly, it is much more likely to be inlined if the method is in a loop.  In particular, if you modify Steven’s test so that the methods are in a loop when they are called, in fact all of his test methods get inlined.   

    To be sure, the heuristics are not perfect.  The worst case is a method that is too big to be inlined, but is still called A LOT (it is in a loop) and calls other small methods that COULD be inlined but are not because they are not in a loop.   The problem is that the JIT does not know if the method is called a lot or not, and by default does not inline in that case.   We are considering adding an attribute to a method which gives a strong hint that the method is called a lot and thus would bump the multiplier much like if there was a loop, but this does not exist now. 

    We definitely are interested in feedback on our inlining heuristic.  If Steven or anyone else finds real examples where we are missing important inlining opportunities we want to know about them, so we can figure out whether we can adjust our heuristics (but please keep in mind that they are heuristics.  They will never be perfect). 

     

  • .NET Framework 3.5 SP1 Allows managed code to be launched from a network share!

    Hurray, it's finally fixed!  Managed code 'just works' from a network file share!

    Now I know that some of you are probably just saying 'who cares' or 'huh?' but for those of us who have hit this problem, this has been a major deployment headache, and I am happy to say that the end of this particular problem is in sight.

    The problem scenario is this.  If you have a managed application like 'MyApp.exe' it works great if you run it locally (eg C:\bin\MyApp.exe), but fails when you try to run it from a network location (eg \\Myhost\bin\MyApp.exe).   The reason is that the security system for the runtime treats network locations as less trustworthy than local locations, and thus throws a security exception.     The problem is that failing to run managed code WHILE STILL ALLOWING UNMANAGED EXE's to run does not provide any security (because hackers will simply use unmanaged code) but does cause nontrivial deployment headaches (managed apps can't be run from network locations). 

    Well, the better part of a year ago I asked Brad Abrams to do a poll on this issue and we found that there was quite a bit of customer deployment pain associated with this issue, and after much deliberation decided to fix it.    The exact details were covered in Shawn Farkas's blog, however the high level take-away is that for the vast majority of scenarios 'it just works', meaning that managed code acts just like unmanaged code as far as launching an EXE from a network share is concerned. 

    So I do encourage you to download the .NET 3.5 SP1 service pack.  It is a very low risk, drop-in update for the runtime.  Once you do this, you get network launch for free.   Because it is a service pack, you can also simply wait, and get the update automatically in the next several weeks via Windows Update.    Thus if you are a software deployer, pretty soon now, with high probability your customers will have this newer runtime.

    Have fun! 

    P.S: for those of you who are concerned that we have opened security holes by doing this, we have tried to be VERY careful not to do this.  The basic rationale is that we are not opening any holes that were not already there because Windows allows non-managed exes to run from a network share.    By the way, if you WANT to lock down your network access (prevent exes from a network share from running, or even just exes that are in special locations), you can do this with Software Policies.  That is the proper way to lock down your computer. 

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement