Everything you want to know about Visual Studio ALM and Farming
Brian Harry is a Microsoft Technical Fellow working as the Product Unit Manager for Team Foundation Server. Learn more about Brian.
You've seen me write a fair amount recently about the VS 2010 Beta 2 performance problems - well, you're going to see me keep writing about them :) We've had enough time, at this point, to understand the feedback and characterize the problem - I'd like to share it with you in hopes of you understanding it and perhaps learning something that can help you avoid the same issue in the future.
The irony of the whole thing is that in this product cycle, we had a bigger concerted focus across the division on performance than we had ever had before. From the very beginning we focused on defining scenarios, goals, regression tests, etc. We went into this product cycle knowing that performance was going to be a challenge. The adoption of WPF for some of our UI elements and a new editor certainly were among the highlights of features likely to lead to issues. We wanted to head off performance problems from the start.
As I look back, there were many things that led to the situation that we currently find ourselves in. However, if I were to pick one that was the most impactful, I'd say it was the way we measured and set goals for performance.
Prior to this release, our performance efforts were hindered by regressions and unreliability of measurements. We set out this product cycle to fix that problem by having every team create a clear list of scenarios and automated tests to measure them (TFS, for instance, has about 150). We focused on making sure the tests were repeatable and had very small standard deviations so that if a test showed a significantly different time than expected, we had a strong reason to believe there was an actual regression to investigate. We set up automated suites that would run some of the tests on virtually every checkin, some every day and some every week. We call the effort RPS (or Regression Prevention System). It was to be our canary in the coal mine.
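As a rough illustration of that idea (a minimal sketch in Python, not the actual RPS implementation; the function name, the sample timings and the three-sigma threshold are all assumptions made for the example), a test with a small standard deviation lets you flag a regression with a simple comparison:

```python
import statistics


def is_regression(baseline_times, new_time, threshold_sigmas=3.0):
    """Flag new_time as a likely regression if it is more than
    threshold_sigmas standard deviations slower than the baseline mean."""
    mean = statistics.mean(baseline_times)
    stdev = statistics.stdev(baseline_times)  # sample standard deviation
    return new_time > mean + threshold_sigmas * stdev


# Hypothetical timings (seconds) from repeated runs of one scenario.
baseline = [1.02, 0.99, 1.01, 1.00, 0.98]

print(is_regression(baseline, 1.03))  # False: within normal run-to-run noise
print(is_regression(baseline, 1.10))  # True: well outside the noise band
```

The tighter the baseline, the smaller the slowdown that can be detected, which is why so much effort goes into making each test repeatable.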
Clearly it didn't work as well as we had hoped. Why? My analysis is this... In order to get a very reliable, very repeatable set of regression tests, we had to spend a lot of time refining the tests (almost the whole product cycle). We iteratively removed randomization and focused the tests so that we got consistent runs every time. The result is a set of very precise tests that test individual features very well. The problem? No one actually uses the product one very isolated feature at a time. The reality is when you load up a real world application, you have 5 editor windows open, 3 forms designers and an architectural diagram, you've just finished debugging and you now hit "Go to definition" on a symbol - it's totally different than what we were testing. I learned 2 things from this exercise about how to ensure your performance testing is measuring the "real world":
So you might ask, don't you use the product? Why are you just relying on these "microbenchmarks"? Yes, we do use the product. After we started getting the Beta 2 feedback about performance, we did a survey of internal users (just within DevDiv) on satisfaction with the Beta and ship readiness. 70% of respondents said they were dissatisfied with VS performance (that's more than twice the 30% of external respondents who said the same thing). So we had the data that it wasn't good enough but we weren't listening to it. Why? I think we made up all kinds of stories. We said it was "long memory" - people's impressions were tainted by the performance of Beta 1. We said it was taint from their development environment (most devs actually use components that they build themselves - not just ones from the official build lab; after all, we do build the IDE :)). We said it was lag - all the great performance improvements we were working on just hadn't made it into all of the branches yet, etc. etc.
Clearly a learning to be had there too.
There are many other issues too, including:
1) Performance hardware - We didn't do a great job ensuring that we were validating on an appropriately wide range of hardware. For example, we weren't looking at netbooks at all and if you look at my intellisense videos from yesterday, you can see that Beta 2 netbook performance was abysmal. There were other hardware related issues - for example, we found that WPF hardware accelerated performance varies dramatically depending on the quality of the video card and is sometimes slower than software rendering.
2) Carving out room for new technologies - Anytime you add something it's likely to take more resources. You have to figure out how to make room for it. We did not do a good job of this in this release. In Windows 7, the Windows team had some really good practices around this. For instance, there was a rule that if you are going to add anything to boot (CPU, Disk, Memory, etc) you must find enough savings elsewhere to pay for it before you are allowed to check in. There was a strict "no-growth" rule in key scenarios. Definitely a practice that we'll be looking at.
3) Coordination across the division - In all aspects of our work we struggle with the tension between letting individual product units run their own business and coordinating efforts centrally across the division. It's clear to me that in this release coordination was not good enough but it's a very tough balance. We'll be spending some real time thinking about this in the next release.
4) I'm sure there are more we'll learn as we continue to reflect on this product cycle. In the meantime, we will continue working to ensure a top quality product when we ship.
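The "no-growth" rule mentioned in item 2 can be pictured as a check-in gate that compares resource measurements for a key scenario before and after a change. The following is only a sketch of that idea in Python; the metric names, the numbers and the function are hypothetical and do not describe the Windows team's actual tooling.

```python
def check_no_growth(baseline, candidate, tolerance=0.0):
    """Return the metrics where candidate exceeds baseline by more than
    the allowed tolerance (0.0 means strictly no growth)."""
    violations = {}
    for metric, base in baseline.items():
        new = candidate.get(metric, base)
        if new > base * (1 + tolerance):
            violations[metric] = (base, new)
    return violations


# Hypothetical measurements for one key scenario.
baseline = {"cpu_ms": 1200, "disk_reads": 340, "memory_mb": 85}
# The change saved disk reads but added CPU time, so it is rejected
# until the CPU cost is paid for somewhere else.
candidate = {"cpu_ms": 1250, "disk_reads": 300, "memory_mb": 85}

violations = check_no_growth(baseline, candidate)
if violations:
    print("Check-in blocked:", violations)
```

This version checks each resource independently; whether savings in one resource may pay for growth in another is a policy decision the sketch does not attempt to model.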
So what do we know about the performance issues we are having? We've categorized them as follows:
Virtual Memory Exhaustion - Large solutions and lots of feature usage are causing virtual memory (you've got 2GB on a 32-bit OS) to fill up and for VS to become unstable and crash. It's not exactly a performance problem because it doesn't generally cause VS to slow down, however we treat it as one because it's a resource exhaustion problem and many of the things you do to address it also improve actual memory usage and therefore improve performance too.
Leaks - Allocation and retention of memory that is never used again and accumulates over time. It ultimately leads to Virtual Memory Exhaustion but can also cause working set and other issues due to heap fragmentation that keeps the unwanted memory in the working set. I also like to separate this out because the way I think about it is different. With Virtual Memory, you are balancing trade-offs and it's an optimization problem. With Leaks, it's a zero tolerance policy. No leaks are acceptable in any circumstance. The way you test for them is different, etc. Beta 2 had a LOT of leaks.
Performance - Specific scenarios where the application doesn't perform according to user expectations. This can be due to any bottlenecked resource (CPU, memory bandwidth, network bandwidth, disk I/O, etc). The two most common underlying causes are inefficient algorithms (too much CPU usage) and too large a working set (more memory used than the physical RAM can accommodate, resulting in thrashing pages in and out to the disk). Based on the feedback, we've identified a number of areas of common performance complaints, including Editing/Intellisense, WPF Designer, Debugger, and Project Load.
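One common way to test for leaks, as opposed to tuning overall memory use, is to repeat the same action many times and check that retained memory does not keep growing. Here is a minimal sketch in Python using the standard tracemalloc module; it is an illustration of the technique, not how the VS team tests, and the two "open document" functions are hypothetical stand-ins for a real scenario.

```python
import gc
import tracemalloc


def retained_growth(action, warmup=3, iterations=50):
    """Return the bytes still allocated after repeating `action`,
    relative to the level measured after a short warm-up."""
    tracemalloc.start()
    try:
        for _ in range(warmup):
            action()  # let caches and one-time allocations settle
        gc.collect()
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            action()
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_forgotten = []  # hypothetical cache that is never cleared


def leaky_open_document():
    _forgotten.append(bytearray(10_000))  # retained forever: a leak


def clean_open_document():
    buffer = bytearray(10_000)  # released when the function returns
    return len(buffer)


print("leaky growth:", retained_growth(leaky_open_document))
print("clean growth:", retained_growth(clean_open_document))
```

With a zero tolerance policy, any scenario whose growth scales with the number of iterations is treated as a bug rather than as something to trade off.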
Yesterday, I blogged about performance gains in editing/intellisense (including before and after videos). Over the next few weeks, I hope to do a post every couple of days on our efforts and the results of our performance work. Stay tuned and hopefully it will be both entertaining and informative. Since Beta 2, we have had some tremendous improvements and I'm confident we're going to ship with good performance and high customer satisfaction, but it has certainly been a bit of a call to action for us.
Brian
Brian you rock !!
That's the kind of attitude that we love!! Dev leaders who acknowledge an error, explain the problem, learn from it and ensure the satisfaction of hundreds of developers :)
Working with Beta 2 on motherboards with an integrated video card was a pain for us, even just to change the current document tab!! We opened 2 documents (small plain classes) and switching between them took nearly a second, which is crazy, and it isn't the first time. If you are working on two documents and switch quickly between them, Beta 2 doesn't seem to cache anything at all to speed up document switching.
Thanks again for hearing us
Marcos
Brian,
Very insightful article, thank you for writing it. It shows a very human side to the creation of software from Microsoft.
We all go through the maniacal complexity that is software development and face issues from time to time; owning up to it and explaining the pitfalls is extremely helpful to developers like me.
Looking forward to more stuff like this from you.
Speaking of the Win7 team, they mentioned their PerfTrack tool (at the bottom of the post @ http://blogs.msdn.com/e7/archive/2009/02/26/some-changes-since-beta.aspx) so they are driving their performance work based on real telemetry data from real systems instead of (as you mentioned) microbenchmarks and isolated scenarios.
Asking people if they're happy with perf is a good start, but is VS gathering perf data on specific actions from real-world installs (for instance, of 2010 Beta 2?). If not, is this something that's being worked on, even if for Vnext?
I would also like to add that you have a problem obtaining stats from customers who would like to help.
After your posts requesting feedback on Beta2 performance, I reported on a GUI performance scenario. A QA engineer followed up with me and via several emails I was able to give more thorough details of the problem. Then the engineer requested I run a statistics gathering tool and send the results back - I was happy to assist, but the 100K employee corp I work for runs XP and this stats tool doesn't work on XP. #fail.
I've heard nothing since.
It is interesting to learn about your limited test hardware. Our large corp, like many corps, use Full Disk Encryption - a software based disk encryption to help prevent data loss through theft of laptops. This has many implications for application performance, especially when paging. My experience is that Microsoft don't run Full Disk Encryption on a standard loadset. Thus, I have always suspected this is a performance scenario you don't test for.
You describe the performance tests used as 'Microbenchmarks' but they are really more like Unit Tests. These sorts of tests are very focused on what they test versus 'Scenario Tests' which look at end-to-end scenarios. I believe you instrument VS builds to send mouse/keystroke level data back home. I'm sure it would be possible to use the same data as a driver, running actual customer scenarios through a VS nightly. This is akin to web browser perf testing, which obtains corpora of URLs of various sizes and gets the browser builds to visit these URLs, all the while checking metrics for perf regressions.
We need to work on our telemetry. We currently have 2 forms - Watson (for crash diagnostics) and SQM (for feature usage statistics). We don't have real performance telemetry and yes, that's definitely something we'll be looking at for the next version.
Rich, I'll look into where we dropped the ball. We have followed up with a lot of people. It's not a very efficient way to do things but it's what we've got for now.
Thank you so much for the honesty and clarity Brian. There is no doubt that the v4 CLR and BCL are wonderful things - and naturally we all want the IDE to surface those in the manner they deserve. I think perhaps Windows 7's stellar performance has set new expectations when it comes to the performance of software from Microsoft. This can only be a good thing for all parties involved in the long run - even if in the short run it only serves to highlight the painful truth that VS2010's IDE *is* distractingly slow on occasion.
I'm sure with folks like Rico on the team you can turn this situation around, and I would imagine VS2010 is turning into a huge learning lesson for the teams building it. My slight worry is that VS2010 turns out to be your "Vista" release - i.e. one where a very few customers understand the depth and importance of the changes - but most just have a negative experience. Hopefully by RTM you can make it your "Windows 7" version...
I love writing software, you folks make the best tools, I can appreciate and totally support where you want to take Visual Studio (and am in this for the long term) - I do just hope you can deliver an experience that will be appreciated by the majority before Visual Studio 2012.
This is a fantastic post. I was worried that VS2010 was going to become a Vista - just like Tom :-( . BTW Trying to add items to the toolbox makes VS go into an absolute spin. At the same time Toolbox search is really nice
We are all developers.
As Developers, we all make mistakes all the time.
THANK YOU FOR YOUR HONESTY !
What I don't understand is the following: with every piece of code you write these days, you've to realize that modern hardware is capable of running that method without any delay, our hardware these days is that fast.
If you write a method which takes 2 seconds, you've to go back and look whether you made a mistake somewhere: check the algorithm (did you use a proven algorithm with a low O?) and profile the implementation. If it's the absolute bottom line, nothing can be done, but in general, one will find it's easy to find bottlenecks this way.
The idea behind this is actually pretty simple: even if the user triggers a call chain of many methods, it still runs fast, because the separate elements are fast.
However, today, most developers have some kind of acceptance tick: they accept that some piece of code requires 2+ seconds to run while it should be done instantly. This leads to sloppy code which might even run 'quick' (read: 0.5/1s) in isolation but which will obviously bog down everything if such a method is called multiple times in a chain.
What I also don't understand is that you can claim you'll ship a highly performing VS.NET at RTM in March, yet after B2 you learned there are many performance problems. MS doesn't fix many bugs after the 2nd beta before RTM, as every bug might delay the product and MS has almost always favored releasing on time vs. fixing a bug and delaying vs.net, at least in the past (e.g. 2008 sp1 is a good example: it shipped really shortly after 2008 RTM)
I do hope you'll make vs.net 2010 a better performer than 2008 is, but I also hope that MS realizes that we need honesty, not marketing. If vs.net 2010 needs 6 more months, use 6 more months.
Excellent post ... :D
Great post. Thank you for your honesty, Brian. Good luck.
Thanks for doing this. I would like to add that the performance issues are not just on underpowered systems. VS 2010 is noticeably laggy on my quad-core system compared to VS 2008. The lag is not a deal breaker (sometimes on the order of milliseconds) but it greatly affects the overall experience, especially if you are used to the more snappy VS 2008. Even small things like the top menu dropdowns are laggy. I hope you are able to improve this before release.
Marcos - I'd like to learn more about the issues you're having with integrated video cards. Can you e-mail us at DEVPERF@Microsoft.com?
RichB - We've been trying to find who worked with you and dropped the ball. Can you e-mail me personally at David.Berg at Microsoft.com so we can follow up?
The performance profiling tool we've been sending out is very limited on XP because XP doesn't support getting call stacks for events, Vista and Win 7 do. However, there are other ways of getting performance information on XP, and I agree we shouldn't just stop because of OS limitations.
I've noted your concerns about testing with encrypted hard disks for consideration when we start planning the next update to our performance lab.
Abe - Can you send more information about the toolbox performance issues you've been seeing to DevPERF@Microsoft.com?
Frans - You're dead on about how performance problems can seep into a product, especially one as large and complex as VS. For what it's worth we're taking a LOT more complex, risky fixes post Beta 2 than we normally would, because we really want to improve performance as much as we reasonably can.
Dan - VS is usually pretty snappy on my quad core boxes (but admittedly not always). I'd like to learn more about the problems you're seeing. Can you e-mail us at DevPERF@Microsoft.com?
All - Testing on a wider variety of configurations is, as Brian said, something we're going to have to fix. We thought we could measure disk IO, memory, and other factors and as long as we held the line on those, we'd be okay. But this ignores the human factor - which is that it's really hard to get a developer to care about an extra 100 disk IOs unless we can tie it back to real world customer performance issues. Otherwise it's just treated as noise. The way we have to solve that is by making sure we do have the real world performance numbers to show that those extra disk IOs do constitute a real customer problem.
Regards,
David Berg
Performance Engineering
Please include virus scanner in your “real world” testing.
-----------------------------------------------------------------
I have in the past worked at a company that had the virus checker set up to do a complete scan of all file types (regardless of extension) each time a file is accessed (including just asking for the last modified date).
Trying to get IT to let you change the virus checker setup is like hitting your head on a brick wall.
(In the above company the system was being ported from Unix to Windows and the IT department did not touch the Unix machines, so Windows was always blamed for the slow speed, not the IT department.)
So the first block read from a file can cost a LOT more in real life than in the lab!!
Please include storing all FILES on a network file server in your tests.
I have worked for more than one company in the past that would not let you store ANY files on your local disk. Therefore all code files, the project files and even the “bin” directories had to be stored on the file server.
Therefore the latency of file access is a lot more and file watchers don’t work very well to detect when code files have changed.
I know that using a file server is not a recommended setup, but unless Developer Studio shows an error dialog when you try to load a project from a file server, IT departments will not change their policies. After all it was Microsoft that told them to stop users saving local data to reduce the cost of managing PCs and all users MUST be treated the same.