Roslyn performance (Matt Gertz)

(For the next few posts, I’m going to introduce readers to the different feature teams in the Managed Languages org.  Today, I’m starting this with a focus on the performance team.)

Back in 2000, I found myself assigned to be the performance lead of the Visual Basic team, and my first goal was to bring the performance marks of the (then) forthcoming Visual Basic .NET in line with the numbers for Visual Basic 6.0.  The primary focus was on the VB runtime APIs at first.  That was a relatively simple task; APIs are nicely discrete bits of code, easy to measure and easy to evaluate, and so, within a couple of months, I’d worked with the team to get the APIs to parity (or better).  “Ah,” said I, “this performance work is so simple that I wonder why everyone is challenged by it.  I must have a special gift for performance or something!”

This peculiar form of self-delusion lasted about a week, until the next challenge arose: improving the shutdown speed of Visual Basic .NET itself, which was taking over a minute for large solutions.  That functionality was not in code that was nicely constrained like the APIs, and so it took me a long time (probably longer than it should have, in retrospect) to realize that the process was blocking on background compilation even when shutting down, instead of just abandoning the compilation altogether.  And, having fixed that, I then moved on to tuning F5 build times, which involved several threads all needing to get their tasks done quickly and yet in a semi-serial fashion.  None of them was operating incorrectly in isolation; it was the combination of them that was causing slowdowns.  That took days and days of investigation (and a lot of collaboration among several teams) to lock down.  In that investigation, I encountered the blunt truths about performance: there is no perfect solution to a general performance problem, and you are never truly done tuning your product, because other dependent code can and will change around you.
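(For the curious, that shutdown fix boils down to making background work cooperatively cancellable.  The sketch below is not the actual Visual Basic .NET code -- the class and member names are hypothetical, and it uses modern C# constructs that didn’t exist then -- but it shows the general pattern: signal cancellation at shutdown and allow only a short grace period, instead of blocking on the remaining compilation.)

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: a background "compiler" that abandons its remaining work
// on shutdown rather than forcing the host to wait for it to finish.
class BackgroundCompiler : IDisposable
{
    private readonly CancellationTokenSource _shutdown = new CancellationTokenSource();
    private readonly Task _worker;

    public BackgroundCompiler(string[] dirtyFiles)
    {
        _worker = Task.Run(() => CompileLoop(dirtyFiles, _shutdown.Token));
    }

    private static void CompileLoop(string[] files, CancellationToken cancel)
    {
        foreach (var file in files)
        {
            // Check for shutdown between units of work so we can bail out quickly.
            if (cancel.IsCancellationRequested)
                return;

            Console.WriteLine($"Compiling {file}...");   // stand-in for the expensive work
            Thread.Sleep(100);
        }
    }

    public void Dispose()
    {
        // On shutdown: signal cancellation and allow only a brief grace period,
        // instead of blocking for the entire remaining compilation.
        _shutdown.Cancel();
        _worker.Wait(TimeSpan.FromMilliseconds(250));
    }
}

class Program
{
    static void Main()
    {
        using (new BackgroundCompiler(new[] { "A.vb", "B.vb", "C.vb" }))
        {
            Thread.Sleep(150);   // simulate the user working briefly, then closing the IDE
        }
        Console.WriteLine("Shut down without waiting for the full compile.");
    }
}
```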

Which brings us to 2014…

Now, in the intervening 14 years, the tools to evaluate performance have of course become more powerful, and we can find and address issues far faster than in days of yore, when code inspection & stopwatch timing was roughly 75% of the job.  At the same time, however, the applications themselves have become so complex (either internally or with respect to the environment in which they run) that solving problems after the fact still creates a big challenge.  In fact, it’s become even more imperative to design for performance up front, because there are more ways than ever to get whammied.  During my recent stint in XBOX, for example, my team over there worked hard to generate performant code for the back end of SmartGlass, only to discover near the end that we hadn’t accounted for the inherent latency of using SSL between us and the data store – it was not a code issue per se, but a limitation of the environment that we hadn’t properly designed for.  (Fortunately, our design was modular enough that we were able to put in some caching at relatively low cost to the project and still meet our performance goals on schedule!)

As we all know and as I allude to above, you’ll save a lot of time and effort if you design for performance in the first place.  That’s always been a Microsoft tenet (indeed, our interview questions often touch upon generating performant code), and we take it very seriously on the Managed Languages team.  But, since some performance issues will slip through just due to human nature, and since designs which seemed good at first may prove to be problematic afterwards, ongoing vigilance is paramount – constant monitoring is the key to success. 

Performance and Roslyn

With Roslyn, therefore, we treat performance exactly as if it were a feature area: one that plans for specific work and presents its progress to the team at each end-of-sprint showcase.  It was designed for performance up front, and during development we’ve constantly re-assessed & re-tuned the architecture to make it adhere to the goals that we’ve set for it.  We have a performance lead (Paul) who runs a performance “v-team” (virtual team) drawn from the ranks of Managed Languages engineers as needed, and who works with a “performance champ” (Murad), telemetry champ (Kevin), and perf PM (Alex) to oversee the state of our performance on a daily basis.

This performance v-team has goals that it needs to meet and/or maintain, and these goals are drawn from the metrics of the most recently shipped product.  The v-team is directly accountable to me, Manish, and Devindra (the latter two are our test manager and group program manager, respectively), and the three of us meet with the v-team every week to assess the previous week’s performance efforts and to create goals for the upcoming week.  (We, in turn, are accountable to our upper management for meeting those goals – and believe me, they are very serious about it!)  The v-team also works with other teams in Visual Studio to find “wins” that improve both sides, and has been very successful at this.

As with any other product, performance is assessed with respect to two main categories: speed of operation and usage of memory.  Trading off between the two is sometimes a tough challenge (I have to admit that more than once we’ve all thought “Hmm, can’t we just ship some RAM with our product?” :-)), and so we track a number of key scenarios to help us fine-tune our deliverables.  These include (but are not limited to):

  • Build timing of small, medium, and (very) large solutions
  • Typing speed when working in the above solutions, including “goldilocks” tests where we slow the typing entry to the speed of a human being
  • IDE feature speed (navigation, rename, formatting, pasting, find all references, etc…)
  • Peak memory usage for the above solutions
  • All of the above for multiple configurations of CPU cores and available memory

These are all assessed & reported daily, so that we can identify & repair any check-in that introduced a regression as soon as possible, before it becomes entrenched.  Additionally, we don’t just check for the average time elapsed on a given metric; we also assess the 98th & 99.9th percentiles, because we want good performance all of the time, not just some of the time.
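(To make the percentile point concrete, here’s a minimal, hypothetical sketch -- not our actual harness -- of how tail latency can be reported alongside the average for one scenario’s daily timing samples, using the simple nearest-rank percentile method.)

```csharp
using System;
using System.Linq;

// Hypothetical sketch: report the mean plus the 98th and 99.9th percentiles for a
// scenario's timing samples, since a healthy average can hide occasional long stalls.
class TailLatencyReport
{
    // Nearest-rank percentile: assumes the samples are already sorted ascending.
    static double Percentile(double[] sortedMs, double p)
    {
        int rank = (int)Math.Ceiling(p / 100.0 * sortedMs.Length);
        return sortedMs[Math.Max(rank, 1) - 1];
    }

    static void Main()
    {
        // Fake per-keystroke latencies (ms) for one typing run; one bad stall at 240 ms.
        double[] samples = { 12, 14, 11, 13, 240, 12, 15, 11, 13, 12 };
        var sorted = samples.OrderBy(x => x).ToArray();

        Console.WriteLine($"mean  : {samples.Average():F1} ms");       // looks fine on average
        Console.WriteLine($"p98   : {Percentile(sorted, 98):F1} ms");  // exposes the stall
        Console.WriteLine($"p99.9 : {Percentile(sorted, 99.9):F1} ms");
    }
}
```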

We also use real-world telemetry to check our performance, both from internal Microsoft users as well as from customers.  While automated metrics are all well and good, and very necessary for getting a day-to-day check on the performance of the project, “real world use” data is very useful for understanding how the product is actually running for folks.  When the metric values conflict (for example, on typing), this leads us to improve the automated tests, which in turn makes it easier for us to reliably reproduce any problems and fix them. So, whenever you check that box that allows Visual Studio to send data to us, you are directly helping us to improve the product!

So, hopefully this gives you a bit of an insight into the performance aspects of our work.  In the next post in a couple of weeks, I’ll talk a bit about how language planning works.

‘Til next time,

  --Matt--*

Comments

  • Any chance you can expand on the work you did in XBOX on the SmartGlass backend?! Was it using Roslyn, or were you doing pure perf tuning/debugging for them due to timeline constraints? :)

  • Nope, it had nothing to do with Roslyn at all.  I was actually *in* the XBOX org at the time, and I was the dev lead of the team in charge of writing the backend to SmartGlass.  Although I've been on Visual Studio or its antecedents for most of my 19-year career at Microsoft, I took some time off starting in 2011 to go work in XBOX for 20 months -- I wanted to learn more about cloud programming and also refresh my coding skills which, as a dev manager who was spending most of his time in meetings, were in danger of getting out of date.  It was a really excellent experience, and I took a lot of what I learned back to this job.  (I even have some small amount of code in Roslyn now -- I have more pride in that than it possibly warrants given the size of what I personally did, but coding features is certainly fun whenever I get the chance.)

    Since you're curious:  my team over there -- really smart folks! -- created the system by which existing video manifests from 3rd parties (Netflix, Hulu, etc.) could be augmented with synched scene metadata, as well as other related video metadata, which in turn come from another internal Microsoft team (who get to watch an awful lot of movies... :-)).  The work is performed by leveraging Azure storage (both containers and tables) for the interim stages of processing before it ultimately gets cached away on a CDN for your viewing pleasure whenever you select and view the movie through the XBOX Live front end.  (We also wrote the code for acquiring the video manifests themselves from the 3rd parties, and that was a fascinating experience for me as well -- I learned *tons* about data security in cloud scenarios.)

    --Matt--*

  • Performance is cool, but can you make a post about actual Roslyn features? I have tried the CTP and it felt like a build-yourself-a-ReSharper toolkit - in a good sense. However I have seen some mentions that it will simplify things like meta-programming and DSLs. Can you clarify that? Thank you!

  • That all sounds nice, but I was hoping for some more substantial info: concrete perf comparisons with the current native compiler, both first compile vs. incremental compilation, parallel linking, etc.; perf improvements in the "infoless time" pre-2013, as hinted at by Dustin Campbell; the internal switch to Immutable Collections and what's going on there with ImmutableArray; what the target goal is with mem usage; what the actual perf "hitches" are now and what the goal is for the shipping Roslyn; whether there is a "good enough" barrier for GC pressure. How is the story going with regards to collaboration with the tooling vendors? We will temporarily pay a double price as long as R# et al haven't switched, even though in the long run it will all be a much more efficient (shared) tooling infrastructure.

  • Please discuss

     - usage statistics for different parts of VS 2012/13.  

     - compile + link times for very large solutions including how much time is spent waiting on disk I/O.  This includes full compile and link as well as incremental compile and link.

     - raw execution times of C# binaries compiled with VS 2010 versus VS 2012/13

     - what new code quality rules are to be included by default in VS

  • So what are the actual performance characteristics of Roslyn?

  • All of the comments seem to be asking more or less the same thing, so I'll tackle them all at once (admittedly, in the annoyingly vague way that I must do for anything that is still under development).

    First off, I can't/won't go into precise timings for benchmarks, because frankly they would be meaningless, given that whatever we show off in any future previews (which I alluded to in my first post late last year) will have different numbers.  (Similarly, I won't discuss feature characteristics for the same reasons.)  Our numbers are good; we want them to be better.  We will *always* want them to be better, even if they are already "good," and we'll continually work towards making them better right up until the end when they pry the bits out of our hands.  As a result, performance work is going on all the time, numbers change daily, and I therefore have no wish to enshrine today's numbers (as opposed to yesterday's, or tomorrow's) as "Internet Truth." In my first Roslyn post, I mentioned that VS QA had signed off on the performance before we did the Big Switch, and therefore you can safely conclude that our metrics were at least as good as what the division was experiencing in their day-to-day coding using the native compilers.  And looking at today's perf scorecard, we seem to be pretty much where we were then (good!).  Beyond that, "deponent saith not."  

    The biggest "hitch" that we (or other software makers) have is how to add value (i.e. new features) without adding time, and it's an issue that plays a heavy role in the decisions that we make around perf.  For example (and purely hypothetically):  let's consider a set of perf metrics at a point-in-time, and call it "A."  Now, assume that we conceive of a groovy new feature that leverages some of the AST information that Roslyn caches, and that we furthermore believe that most users will want it on by default.  By itself, the new feature of course involves some additional coding, and the perf characteristics of it are going to be non-zero (perhaps very small, but still non-zero) -- we'll call that "n".  If implemented, what are the perf characteristics now? Are they A + n?  That seems logical, but perhaps we can find some economies of scale when the two pieces come together, so maybe it's A + n - Intersection(A, n).  Or, if we're smart, it will be even less than A -- because we made improvements elsewhere (in "B," perhaps, or in "A" itself) at the same time.  The latter approach is the best of all worlds and also just the right thing to do -- "never assume old code is sacred" is our motto.  But this sort of thing happens all the time, for each new feature under consideration, which also means that perf numbers are constantly in motion (ideally in a good way) for both old and new code.  (All hypothetically, of course.)

    So, apologies for not being more specific (or, in fact, for being maddeningly vague).  I'll leave you with the reminder that we are working on a preview plan so you can experience it all yourselves firsthand...

    --Matt--*

  • Forget performance, tell us more about features! What are the language extension capabilities?

  • I don't worry about compilation performance for my solution builds (waiting 5 or 10 secs. more is almost the same to me).

    However, performance will be a key piece when using future meta-programming features depending on Roslyn, so users will not have to wait too much for execution of code generated on the fly (formulas, DSLs).

    It would be interesting to know the best (and worst) practices for different Roslyn application scenarios.

  • Just wanted to add that my last post was of course a critique at a very high level. The whole Roslyn thing will change C# as we know it, and allow for awesome new tooling experiences.

    Just wanted to say a big THANKS to all the Roslyn guys for all the landslide stuff that will come out of it (perf improvements on the average VS experience, better continuous testing tooling and Testing as a service along the way, etc.). And for one of the few really "game changing" projects coming out of MS in recent years. If only it were open-sourced (never give up hope, I guess).

  • @Den:  Yeah, I know, the torture of not knowing... :-)  The wait will not last forever.  (Hey, I get it:  being a video gamer, I'm always trying to see what info I can get out of the devs of any forthcoming games, and they give me similarly vague answers.  Roslyn isn't quite the same thing as video games, of course, but at any rate -- I love to write blogs, and so everyone should rest assured that I will blog about anything that I can within nanoseconds of it being possible.)

    @Néstor:  Really good point.  One good purpose of blogs and forums is to educate in the fashion you suggest, and that is in fact something I will be asking members of my team to do as time marches on (i.e., when we can get more specific info out there).  In particular, I will make a note to bring in my perf guru Paul at some point to blog about some best practices w/Roslyn.  (He's quite clever about it; I can always count on him to find really esoteric improvements in any code reviews, perf-wise.)

    @okh397:  No worries; I didn't take it in any bad way!  (And thanks for the kind words!) But these comments were a good thing for me to respond to anyway in order to set some ground rules early on in this blog revival for what I can and can't talk about yet (and why).  

  • Interesting post thanks. How do you classify small, medium, and large solutions? Is it the number of projects, and if so what are the boundaries?

  • In other words, it runs like a slug.

  • A test case to add if you don’t have it.

    Open a very large solution, and then open lots of files in it.  Then minimise VS and run something else that uses lots of RAM so VS gets paged out.  Then exit VS and see how many of its pages have to be reloaded as part of the exit.
