Visual Studio 11 Beta Performance Part #3

Welcome back to the third and final part of the Visual Studio 11 Beta Performance series. This week’s topic is debugging. As I mentioned in the first post, debugging is a key component of your continuous interaction with Visual Studio, and we heard from you that the compile, edit, and debug cycle had hiccups and felt sluggish. I would like to introduce Tim Wagner, from the Visual Studio Ultimate team, who describes below the work done to improve the debugging experience for you.

Visual Studio provides an array of features across the entire application lifecycle, but it’s still the core “compile-edit-debug” loop that tends to dominate a developer’s workday. Our goal is to ensure that these central experiences work well so that a developer can stay in the zone, focused on coding and debugging in a tight cycle. In this post I’ll describe some investments we’ve made in the build and debug part of that cycle (collectively referred to as “F5”).

First, the punch line: we’ve made significant performance improvements in the Beta release! Take a look at this video comparing the //BUILD and Beta C++ F5 experiences side by side:

[Video: side-by-side comparison of the //BUILD and Beta C++ F5 experience]


When we shipped the Visual Studio toolset at //BUILD, we focused initially on getting a functionally complete F5 experience in Visual Studio for Metro style apps, knowing that we still had work to do on this scenario to improve its performance. Between //BUILD and Beta we pulled together people from across many teams at Microsoft – Windows, C++, C#/VB, debugger, Project & Build, XAML, and others – to create a virtual team focused on making this scenario perform as quickly as possible. In this post I’ll take you inside the work we did, illustrate our progress between the //BUILD and Beta releases, and shed light on the kinds of changes and tradeoffs we made to improve your experience with the tools. Although we’ve made performance, memory, and responsiveness improvements in other areas (and other application types) with respect to building and debugging, this post is specifically focused on the F5 experience for Metro style apps on Windows 8.

Let’s start with some background on exactly what F5 entails. We usually think of the F5 key informally as a shorthand for “start debugging”, but depending on the state of your application, it can involve an entire range of activities:

  • Build (compiling for managed and native code and linking for native code, plus pre- and post-build steps for all languages)
  • Deploy (get the application into a state where the operating system can launch it; this can include remote transfer when doing remote deploys or simulator start up if simulation has been selected)
  • Register (the first time a given application is launched it has to be registered with Windows 8)
  • Activate (actually launching the process inside the application container)
  • Change UI mode (switch Visual Studio from the editing window layout to the debugging window layout)
  • Start and attach the debugger (including loading symbols for the executable)
  • Execute the process (until a breakpoint, exception, or other stopping condition is encountered)
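
To make that breakdown concrete, here’s a minimal sketch of timing an F5-style pipeline stage by stage. The stage list mirrors the one above, but the work and the timings are simulated placeholders for illustration, not Visual Studio’s actual instrumentation:

```cpp
// Hypothetical sketch: timing each stage of an F5-style pipeline.
// Stage names mirror the list above; the work inside each stage is
// simulated and the numbers are illustrative.
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

struct Stage {
    const char* name;
    std::function<void()> run;
};

int main() {
    using clock = std::chrono::steady_clock;
    auto simulate = [](int ms) {
        return [ms] { std::this_thread::sleep_for(std::chrono::milliseconds(ms)); };
    };

    std::vector<Stage> pipeline = {
        {"build",    simulate(900)},
        {"deploy",   simulate(250)},
        {"register", simulate(100)},  // first launch only
        {"activate", simulate(150)},
        {"ui mode",  simulate(80)},
        {"attach",   simulate(200)},
    };

    auto total_start = clock::now();
    for (const auto& stage : pipeline) {
        auto start = clock::now();
        stage.run();
        std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
        std::printf("%-10s %8.1f ms\n", stage.name, elapsed.count());
    }
    std::chrono::duration<double, std::milli> total = clock::now() - total_start;
    std::printf("%-10s %8.1f ms\n", "total", total.count());
}
```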

The breakdown of time among these activities depends on several factors; registration, for example, only needs to be done once unless the application’s manifest changes. The language (JavaScript, C#, VB, or C++) has an obvious impact as well, as does the “temperature” of the scenario – whether a build has been performed, whether code has changed, and so forth. We defined four different scenarios across the four languages to come up with the following matrix for performance analysis:

 

|            | Full | 1 Line | No Change | F4 (Refresh) |
|------------|------|--------|-----------|--------------|
| C++        | X    | X      | X         |              |
| VB         | X    | X      | X         |              |
| C#         | X    | X      | X         |              |
| JavaScript | X    | X      | X         | X            |

Note that the F4 scenario only applies to the JavaScript case; more on this below.
We defined the three temperatures as follows:

Cold – The first F5 of an app, including its registration. This will include a full build and deploy.

Warm – A one-line change after the cold F5 has completed. Warm is faster than cold because builds are incremental rather than full, the app is already registered, various caches are primed, and so forth. Warm is the “sweet spot” where we expect most F5s to occur as people work in tight compile/edit/debug loops, so it’s the most important of the three scenarios.

Hot – Hitting F5 again with no intervening change. Hot is typically faster than warm because a fast up-to-date check is performed that bypasses build entirely for native and managed code (and in a future release, JavaScript).
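
As a rough illustration of what such an up-to-date check involves, here’s a minimal sketch that compares input and output timestamps. The file names are hypothetical, and the real check in Visual Studio consults the project system rather than raw timestamps alone:

```cpp
// Minimal sketch of a fast "up-to-date" check, assuming the build can be
// skipped when every output is newer than every input.
#include <algorithm>
#include <filesystem>
#include <iostream>
#include <vector>

namespace fs = std::filesystem;

bool up_to_date(const std::vector<fs::path>& inputs,
                const std::vector<fs::path>& outputs) {
    fs::file_time_type newest_input = fs::file_time_type::min();
    for (const auto& in : inputs) {
        if (!fs::exists(in)) return false;           // missing input: rebuild
        newest_input = std::max(newest_input, fs::last_write_time(in));
    }
    for (const auto& out : outputs) {
        if (!fs::exists(out)) return false;          // missing output: rebuild
        if (fs::last_write_time(out) < newest_input) return false;  // stale
    }
    return true;  // hot path: skip the build entirely
}

int main() {
    std::vector<fs::path> inputs  = {"app.cpp", "app.h"};   // hypothetical files
    std::vector<fs::path> outputs = {"app.obj", "app.exe"};
    std::cout << (up_to_date(inputs, outputs) ? "skip build\n" : "build\n");
}
```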

To test the languages, we used the Grid template that ships with Visual Studio to find the first level of performance issues. Using a small example like this has two primary advantages in the initial testing phases: it’s small enough to capture and view the sampling traces easily, and fixed overhead tends to dominate (the rendering of the UI mode shift in Visual Studio is a good example; animated transitions, which were present at //BUILD and then removed for Beta, are another). On the downside, small examples like this fail to uncover breadth, end-to-end, and scalability issues. To catch those, we also look at larger test cases with both synthetic and real-world apps, including the Zune and Bing apps you can download from the Beta app store. Real apps are the best proxies for actual developer experience, and synthetic load testing helps us find non-linear problems of scale – for example, we found an issue where we were accidentally turning warm cases cold by deploying more than what had actually changed, behavior that stands out with larger apps and especially when plotting deployment times against app sizes.
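
The “plot deployment times against app sizes” trick can be sketched in a few lines: compute seconds per megabyte across app sizes and flag points where the rate jumps. The data points below are invented for illustration:

```cpp
// Hypothetical sketch of detecting superlinear deployment scaling:
// if time per megabyte grows with size, deployment is scaling worse
// than linearly (e.g., re-copying files that didn't change).
#include <cstdio>
#include <vector>

struct Sample { double size_mb; double deploy_s; };

int main() {
    std::vector<Sample> samples = {
        {10, 0.8}, {50, 4.1}, {200, 33.0}, {500, 190.0}  // invented data
    };
    double prev_rate = 0;
    for (const auto& s : samples) {
        double rate = s.deploy_s / s.size_mb;  // seconds per MB
        std::printf("%6.0f MB -> %7.1f s (%.3f s/MB)%s\n",
                    s.size_mb, s.deploy_s, rate,
                    (prev_rate > 0 && rate > 1.5 * prev_rate)
                        ? "  <-- superlinear growth" : "");
        prev_rate = rate;
    }
}
```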

As you’ve heard in previous posts, we use several methods to do the actual performance measurements, including an automated bank of performance analysis machines that we call “RPS” (Regression Prevention System) which can track detailed performance results over time for scenarios like the table shown above. We also use this lab for manual performance testing to “compare apples with apples” when we look at changes or want to compare detailed perf traces for different approaches. Here’s a graph of trends over time between //BUILD and Beta for the matrix above, minus the “F4” case, using our lab machines:

[Chart: F5 times for C#, VB, and JavaScript trending from //BUILD to Beta]

C++ has the longest F5 times of the four languages, due in part to its nature (include files, link step), so I’ve broken it out to keep the graph above more readable. The yellow bars represent performance targets for the native case.

[Chart: C++ F5 times trending from //BUILD to Beta; yellow bars mark the native performance targets]

One thing you can notice from both graphs is that the trend over time is toward improved performance, with occasional regressions. Some of the regressions are intentional – for instance, we fixed several functionality (correctness) bugs where handling all the cases correctly costs more time. Temporary regressions are usually the result of integration issues, when two or more pieces of code reach a common branch but have unintended interactions with one another.

Deep Dive into C++

In this section I’d like to provide more insight into the C++ case, how we approached improving its //BUILD-era performance to arrive at Beta, and what we’re continuing to do to improve.

When we released //BUILD, we knew that we had some low-hanging fruit in terms of performance improvements. We used Windows ETW sampling (which will be the basis for all profiler sampling in Visual Studio 11) to collect traces and formulate a coarse-grained breakdown of the time. Here’s an example of the cold and warm breakdowns for C++ as they stood at Beta:

[Chart: breakdown of cold C++ F5 time at Beta, dominated by the full build]

[Chart: breakdown of warm C++ F5 time at Beta]

Using diagrams like this, you can quickly apply some Amdahl’s Law analysis – cold C++ F5 is dominated by the full build time. In the //BUILD version of this graph that was even more true. As a result, improving other elements won’t have much of an effect until/unless building gets faster, so this is where we’ve focused our improvements.
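
For reference, Amdahl’s Law quantifies why: speeding up a stage that is a fraction p of the total by a factor s bounds the overall win. A quick back-of-the-envelope example, with an assumed (not measured) build fraction:

```latex
% Amdahl's Law: overall speedup S when a fraction p of the time is
% sped up by a factor s.
% Illustrative numbers (not the measured breakdown): if the full build
% is p = 0.8 of cold F5 and we make it s = 2x faster, then
%   S = 1 / (0.2 + 0.8/2) ~= 1.67x,
% while a 2x win on a 5% stage yields only
%   S = 1 / (0.95 + 0.05/2) ~= 1.03x,
% which is why build was the place to focus.
\[
  S \;=\; \frac{1}{(1 - p) + \dfrac{p}{s}}
\]
```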

Between //BUILD and Beta we improved the C++ build by pre-compiling headers for C++ templates and – most significantly – by switching to “lazy” wrapper generation. At //BUILD, wrapper functions were generated for all methods of a WinMD type in every file in which that type was referenced, leading to redundant code generation, larger object files, and longer link times. For Beta, we changed the generation policy to only do work if a method is actually referenced, greatly reducing the number of generated wrappers and resulting in faster builds and smaller PDBs. We’re continuing to look at compiler improvements here that could make this even faster, such as generating wrappers only for the first reference.
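
Here’s a conceptual sketch of that policy change, assuming a codegen pass that knows which methods were actually referenced; the type and method names are invented for illustration:

```cpp
// Conceptual sketch of eager vs. lazy wrapper generation for a WinMD type.
#include <iostream>
#include <set>
#include <string>
#include <vector>

void emit_wrapper(const std::string& method) {
    std::cout << "generated wrapper for " << method << "\n";
}

int main() {
    // All methods of a WinMD type visible in this translation unit.
    std::vector<std::string> all_methods = {
        "Grid.Arrange", "Grid.Measure", "Grid.get_Children", "Grid.SetRow"};

    // Methods the compiler actually saw a reference to.
    std::set<std::string> referenced = {"Grid.Arrange", "Grid.get_Children"};

    // //BUILD-era (eager): wrappers for every method in every file:
    // for (const auto& m : all_methods) emit_wrapper(m);

    // Beta (lazy): wrappers only for methods that are referenced,
    // shrinking object files, PDBs, and link time.
    for (const auto& m : all_methods)
        if (referenced.count(m)) emit_wrapper(m);
}
```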

XAML Compiler Changes

Compiling the XAML UI language and its interaction with the host language’s type system (either managed or native) is a significant part of the overall build pipeline. For both managed and native, one of the changes that improved performance was reducing the amount of up-front type information loaded by the XAML compiler. In fact, of all the system assemblies, our traces showed that only four contained types that are typically referenced by Metro style apps. Loading the remaining assemblies only when required (a type lookup fails) reduced startup time for the compiler without changing the semantics of type referencing. In addition, XAML files refer repeatedly to the same types; we took advantage of this fact by adding some additional caching to make subsequent references faster and to reuse type information between compiler passes.
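
A minimal sketch of both ideas together, assuming a resolver that checks a memoization cache first, then a small core assembly set, and only loads the remaining assemblies on a miss (the assembly and type names are hypothetical):

```cpp
// Sketch of lazy assembly loading plus type-lookup caching.
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct TypeInfo { std::string assembly; };

class TypeResolver {
    std::unordered_map<std::string, TypeInfo> cache_;  // reused across passes
    std::vector<std::string> lazy_assemblies_ = {"Extra1.winmd", "Extra2.winmd"};
    bool lazy_loaded_ = false;

    std::optional<TypeInfo> lookup_core(const std::string& name) {
        // Only the handful of assemblies that typically contain
        // referenced types are loaded up front.
        if (name == "Button" || name == "Grid")
            return TypeInfo{"Windows.UI.Xaml.winmd"};
        return std::nullopt;
    }

public:
    TypeInfo resolve(const std::string& name) {
        if (auto it = cache_.find(name); it != cache_.end())
            return it->second;                      // repeated reference: cached
        auto t = lookup_core(name);
        if (!t && !lazy_loaded_) {
            std::cout << "core lookup missed; loading remaining assemblies\n";
            lazy_loaded_ = true;                    // pay the cost only on a miss
            t = TypeInfo{lazy_assemblies_.front()}; // pretend it was found there
        }
        cache_[name] = t.value_or(TypeInfo{"<unresolved>"});
        return cache_[name];
    }
};

int main() {
    TypeResolver r;
    for (const std::string n : {"Grid", "Grid", "MyRareType"})
        std::cout << n << " -> " << r.resolve(n).assembly << "\n";
}
```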

Managed Metro style projects benefited from an additional change to the XAML compiler: change detection. The output of the first pass of the compiler (*.g.i.cs or *.g.i.vb files) is used by VS for IntelliSense and does not need to be generated unless the XAML files are modified. To exploit this, a quick “up to date” check was implemented that skips this step entirely whenever possible. This change also speeds up other scenarios, such as repeatedly tabbing to/from an unchanged XAML file.

Even with these improvements, XAML compilation in the Beta release still dominates small and UI-intensive Metro style native and managed app builds. To make further improvements (especially for native, where the compilation times are slower), we’re currently investigating ways to checksum user types, so that we can skip the XAML codegen step entirely when the surface area of the user’s code is unchanged (i.e., the user’s changes affect the bodies of methods, not the shape of types). We’re also continuously looking at the quality of the T4-generated code for XAML, and welcome feedback on its layout and implementation.
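
Here’s a sketch of that checksum idea: hash only declarations (names and signatures), never method bodies, so a body-only edit leaves the hash unchanged. The hash function and the declaration format are assumptions for illustration:

```cpp
// Sketch of checksumming the "surface area" of user types so that
// body-only edits can skip XAML codegen.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// FNV-1a: a tiny, stable hash; a real implementation might use something stronger.
uint64_t fnv1a(const std::string& s) {
    uint64_t h = 14695981039346656037ull;
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }
    return h;
}

uint64_t type_surface_hash(const std::vector<std::string>& declarations) {
    uint64_t h = 0;
    for (const auto& d : declarations) h ^= fnv1a(d);  // order-insensitive combine
    return h;
}

int main() {
    // Shape of the user's types: names and signatures only, no bodies.
    std::vector<std::string> before = {"class MainPage : Page",
                                       "void MainPage::OnLoaded(RoutedEventArgs^)"};
    std::vector<std::string> after = before;  // user edited a method *body* only

    if (type_surface_hash(before) == type_surface_hash(after))
        std::cout << "surface unchanged: skip XAML codegen\n";
    else
        std::cout << "surface changed: rerun XAML codegen\n";
}
```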

F4

One of the performance issues we needed to tackle for Metro style app development was making targeted changes to JavaScript apps and seeing those changes reflected immediately: Web developers rely on the browser’s refresh functionality when making changes to their apps, and we wanted a similar experience while working on JavaScript-based Metro style apps for Windows 8. In particular, we received feedback from //BUILD (and from our internal users) that the stop debugging / edit / build / launch / attach cycle was too heavyweight, especially by comparison to the typical Web experience. We wanted to focus on making sure that cycle was as fast as possible, but we also wanted a model that mapped more closely to the browser – thus the Refresh Windows app feature (“F4”) was born. The following video shows F4 in action:


[Video: the Refresh Windows app feature (F4) in action]


The Refresh feature is available on the Debug menu in Beta builds and is bound to the keyboard shortcut Ctrl+Shift+R. (We expect to change this to F4 in future releases depending on settings selection, hence its nickname.) F4 speeds up the edit/debug cycle by removing the need to shut down the application whenever changes are made to HTML, CSS, or JavaScript. The easiest way to think about the feature is as a ‘fast restart’ where the app is reloaded and then reactivated without Visual Studio exiting the debugger; this prevents the window layout changes that generally happen when the debugger is restarted, and removes the need to re-launch the host process and reattach the debugger. In most cases F4 will only take a few hundred milliseconds to complete vs. around a second for a full restart of the debugger (in other words, about half the time of our current warm F5).

You may be wondering why you would ever not use F4. The answer is that you should use it whenever possible since it’s generally faster, but there are certain situations in which F4 is not available. You can change HTML, CSS, or JavaScript content, but adding new files, changing the manifest, modifying resources, or adding new references (along with some other, less common, cases) are not supported. When these situations occur, attempting F4 will bring up a dialog indicating that the pending changes prevent F4 and a conventional F5 is needed.
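
A minimal sketch of that rule, with an invented change taxonomy (not the actual Visual Studio implementation): content edits permit a refresh, while anything structural falls back to F5:

```cpp
// Sketch of the F4 applicability rule as described above: content edits
// to HTML, CSS, or JavaScript allow a refresh; structural changes force
// a full F5. The enum and rule set are illustrative.
#include <iostream>
#include <vector>

enum class Change {
    HtmlEdit, CssEdit, JsEdit,       // refresh-friendly
    FileAdded, ManifestChanged,      // require full F5
    ResourceChanged, ReferenceAdded
};

bool refresh_allowed(const std::vector<Change>& pending) {
    for (Change c : pending)
        switch (c) {
            case Change::HtmlEdit:
            case Change::CssEdit:
            case Change::JsEdit:
                break;               // fine, keep checking
            default:
                return false;        // any structural change blocks F4
        }
    return true;
}

int main() {
    std::vector<Change> pending = {Change::CssEdit, Change::JsEdit};
    std::cout << (refresh_allowed(pending)
                      ? "F4: reload and reactivate\n"
                      : "dialog: pending changes require F5\n");
}
```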

We’re very excited about introducing F4 with the Beta release and look forward to hearing your feedback! Note that there are a few cases in Beta, which will be fixed before we release, where F4 may not activate the app – the workaround is to use F5, which is always available.

Deployment Types – Remote and Simulator

The default deployment in Visual Studio is to launch the application on the same machine that Visual Studio itself is running on. (We refer to this as “local” deployment.) There are two additional debugger targets: remote, which is useful if you want to deploy to specialized hardware for testing (such as ARM devices, where you cannot run Visual Studio), and simulator, which can be used to test rotation/resolution options, location, and other settings that are difficult or impossible to vary on the host machine when deploying locally.

Remote and simulator deployments share the bulk of the F5 steps with local deployment, so whenever we make build (or register, or activate) faster, we improve remote and simulator deployments as well. These deployments also have unique performance issues: remote can be dominated by file copying “over the wire,” while simulator involves a large amount of startup activity, including logging in the current user on a domain-joined host machine. This can lead to a long overall F5 the first time the simulator is used, even if the rest of Visual Studio is “warm” with respect to the F5. While we were unable to eliminate all of the simulator overhead outright (its implementation is based on remote login), we were able to parallelize simulator start with the build step; on most hardware, this has the effect of hiding much of its startup time. Ironically, languages with longer build times (e.g., C++) therefore seem to have faster simulator startup than languages with shorter build times (e.g., JavaScript), because a greater percentage of the simulator’s cold start can happen while the build is running. The following table illustrates the differences between local and simulator-based deployment for the test matrix; you can see that warm and hot scenarios (where the simulator is already running) have sub-second overhead for the simulator deployment, while cold scenarios (where the simulator is being started) vary by language depending on the length of the full build step.

| Scenario      | Local (s) | Simulator (s) | Difference (s) |
|---------------|-----------|---------------|----------------|
| C++ Full      | 23.9      | 25.8          | 1.8            |
| C++ 1 line    | 4.6       | 5.3           | 0.7            |
| C++ no change | 0.5       | 0.8           | 0.3            |
| C# Full       | 4.4       | 6.0           | 1.6            |
| C# 1 line     | 2.7       | 3.2           | 0.5            |
| C# no change  | 0.5       | 0.8           | 0.3            |
| JS Full       | 2.9       | 6.4           | 3.5            |
| JS 1 line     | 0.9       | 1.4           | 0.5            |
| JS no change  | 1.0       | 1.4           | 0.4            |
| VB Full       | 4.6       | 6.3           | 1.8            |
| VB 1 line     | 2.8       | 3.3           | 0.5            |
| VB no change  | 0.5       | 1.0           | 0.5            |
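
The overlap itself can be sketched with a simple std::async pattern: the perceived simulator cost becomes max(build, simulator start) minus the build time, which is why longer builds hide more of the cold start. The timings below are invented:

```cpp
// Sketch of parallelizing simulator start-up with the build step.
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>

void run_build()       { std::this_thread::sleep_for(std::chrono::seconds(4)); }
void start_simulator() { std::this_thread::sleep_for(std::chrono::seconds(3)); }

int main() {
    auto t0 = std::chrono::steady_clock::now();

    // Kick off simulator start-up in the background, then build in parallel.
    std::future<void> sim = std::async(std::launch::async, start_simulator);
    run_build();
    sim.wait();  // deploy can proceed once both are done

    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - t0;
    std::printf("build + simulator overlapped: %.1f s (vs 7.0 s sequential)\n",
                elapsed.count());
}
```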

Telemetry

Internal performance testing, even with real apps, is ultimately a proxy for the experience of real developers. Telemetry helps us understand what happens after we ship. While the Windows 8 Consumer Preview / Visual Studio 11 Beta is still in its early days, we already have some data flowing back. For F5, one of the best indicators is the data that tracks overall deployment times; this doesn’t capture all of the F5 user experience, but it includes many of the expensive portions (build, deploy) and it helps us discover if we’ve missed any cases. The chart below shows the data we’ve received to date, showing the distribution and cumulative population for bucketed deployment time. Around 89% of deployments take 2 seconds or less. The data also confirms that the warm and hot paths are working as intended, since the majority of deployments are sub-second (and hence not cold). Of course this is still early data, from a population likely dominated by smaller apps and examples, so we will continue tracking this along with your feedback.

[Chart: distribution and cumulative percentage of bucketed deployment times]
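
As an illustration of how such a chart is assembled, here’s a sketch that buckets raw deployment times and computes the cumulative share; the sample times and bucket edges are invented:

```cpp
// Sketch of the bucketed-telemetry rollup: group raw deployment times
// into buckets and compute the cumulative percentage, which is how a
// claim like "~89% of deployments take 2 seconds or less" is read off.
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> deploy_seconds = {0.4, 0.6, 0.7, 0.9, 1.2, 1.8, 2.5, 6.0};
    const double bucket_edges[] = {1.0, 2.0, 5.0, 10.0};  // upper bounds (s)
    int counts[5] = {};

    for (double t : deploy_seconds) {
        int b = 0;
        while (b < 4 && t > bucket_edges[b]) ++b;
        ++counts[b];
    }

    double cumulative = 0;
    const char* labels[] = {"<=1s", "<=2s", "<=5s", "<=10s", ">10s"};
    for (int b = 0; b < 5; ++b) {
        cumulative += 100.0 * counts[b] / deploy_seconds.size();
        std::printf("%-5s %2d deployments, cumulative %5.1f%%\n",
                    labels[b], counts[b], cumulative);
    }
}
```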

We hope these and other improvements we’ve made in Beta are providing you additional productivity gains and making the experience of developing Windows 8 Metro style applications a great one. Please help us continue to improve through your feedback and suggestions!

Tim Wagner – Director of Development, Visual Studio Ultimate Team

Short Bio: Tim Wagner has been on the Visual Studio team since 2007, working on a variety of areas, including project, build, shell, editor, debugger, profiler, and – most recently – ALM and productivity areas like code review, diagramming, and IntelliTrace. Tim has been involved in performance improvement crews for the past two releases and continues to look for ways to make Visual Studio a better user experience. Prior to joining Microsoft he ran the Web Tools Project in Eclipse while at BEA Systems, and has been involved in several startups and research organizations over the years.

The Last Word

As mentioned above, this brings the Performance Series on the Visual Studio 11 Beta to a conclusion. I greatly appreciate you taking the time to read and comment on these posts. Even though this is the last post of the series, let me reiterate that we are always open to feedback, as hearing from you is critical to knowing where you feel we still need improvement. As always, I appreciate your continued support of Visual Studio.

Thanks,
Larry Sullivan
Director of Engineering

Comments
  • You write "You may be wondering why you would ever not use F4. The answer is that you should use it whenever possible since it’s generally faster, but there are certain situations in which F4 is not available." and "When these situations occur, attempting F4 will bring up a dialog indicating that the pending changes prevent F4 and a conventional F5 is needed."

    I guess I'm missing something, but why not just put the new F4 experience as default on F5 then (in case of JavaScript)?

  • In response to comment from Erik Olofsson

    Thanks for your feedback. We are investigating the performance issue you reported with inspection in the expression evaluation windows.

    Also, it would be great if you could report this issue through a Connect bug with details on the specific types you are having trouble inspecting; this would help us in our investigation.

    Thanks again,

    Visual Studio Debugger team

  • Hello Erik Olofsson,

    It would be great to log the C++ build performance issue through Connect. Also, feel free to contact me directly at aymans at microsoft dot com

    Thanks in advance for reporting the issue!

    Thanks,

    Ayman Shoukry

    Lead Program Manager

    Visual C++ Team

  • Reposting my comment, it just never appeared last time:

    Does C++ edit-and-continue work with the /FI compiler switch now? I haven't been able to test this yet. In previous versions of VS you just get an error about an invalid character. This would certainly help my debugging productivity!

  • Adam/jalf: There was a lot of material to convey on Metro style apps alone, so we concentrated on just that aspect in this blog; as you point out, there is a wealth of information on any one language (more than I could include in a single post). Ayman mentioned some of the other improvements for native compilation and their intent to follow up with more details there, and we’re always interested in hearing what the community would benefit from reading more about. There are also further Metro F5 improvements coming post-Beta; I wanted to focus primarily in this post on what you can get your hands on today, but we just completed some XAML refactoring that improves native Metro app compilation further (dropping our standard build test from 25 to 18 seconds) and have a fast up-to-date check already implemented for JavaScript that improves that language’s ‘hot’ scenario. Reporting on our progress here doesn’t mean we’re satisfied with performance in Beta – we intend to keep pushing for as many perf wins as we can get in Dev 11.

    Jos: Totally understand your point – given the importance of a brand new API and experience for Windows 8, we wanted to help those who had used the developer tools at //build understand what had changed in Beta and why. For non-Metro scenarios we also have a number of F5 and build improvements arriving in Beta, including VM footprint reductions (described in the first post of this performance series), improved responsiveness, out-of-proc build for managed code, and the native compiler improvements Ayman mentioned.

    Sandy: Yes, this is as close to an apples-to-apples comparison as we can get – same machine, same project. Of course there are differences in the IDE, CRT, OS, etc. since one is //build and the other is Beta.

    Tim Wagner

    Microsoft Visual Studio Ultimate Development Director

  • My biggest gripe with debug startup time is the amount of time spent downloading debug symbols. I love having symbols enabled for debugging, but jeez, can't the symbol downloads be delayed until they're actually needed, e.g. in a stack trace when I actually break into the debugger and the call stack is visible? Even though I have both a local cache and our company has a source server proxy set up that caches, it still feels like I'm not benefiting from these caches.

  • Jeff2,

    You ask a good question - we intentionally did *not* make F5 perform an F4 (refresh) automatically, for a few reasons.

    Most importantly, F4 (refresh) is destructive -- it will destroy all of the state you may have already entered in the application, unlike Edit and Continue, which does its best to maintain state. This would mean that the user experience of adding a comment to your source, adding a space, or even touching a file w/o actually making any changes would lead to the equivalent of a restart.

    There was also the concern about overloading the meaning of F5 - It could mean 'start', if you’re not debugging yet, 'continue', if no changes have been made to the app and you’re at a breakpoint, or 'refresh', if you’ve made changes (at a breakpoint or not).  We were concerned that multiplicity could be confusing, and that users would want to make explicit decisions regarding refresh; however, we're definitely open to hearing opinions on this as people start to use the functionality.

    Finally, there was a point-in-time concern with Beta: While we’re confident that we’ve caught the majority of cases where refresh should be disallowed, it’s possible that there are cases we haven’t considered. At the moment, the fallback for the developer is simply to restart their app.

    Tim Wagner

    Visual Studio

  • @Jeff: The code bases are different (different runtimes and designers, different process architecture) so "porting" performance improvements from Metro style designers to WPF or vice versa isn't straightforward, although there are exceptions (XAML parsing) and conceptual equivalents in some cases. One change we made recently to improve the WPF experience was to only compile dirty XAML files for what we call "IntelliSense builds" (compilations that essentially fund the interactive experience within the VS editor). For a sample project with 25 large XAML files, tab switching/save time reduced from ~10 seconds to under 2. I expect this optimization to be visible in the next public release.

    @Keith: We've got some improvements to the symbol loading experience teed up that didn't make Beta; very much looking forward to feedback on whether they address your concerns, so please check in again after the next Visual Studio update after Beta!

  • We recently switched our production from Visual Studio 2005 to Visual Studio 2010. Link time is faster, but it is still too slow. I think large native C++ developers suffer a lot from slow link times. Please improve it, or at least fix incremental linking for large projects.

    I recall my post on connect: connect.microsoft.com/.../optimize-link-time-in-real-developers-environment

    "... Yes, we are using incremental linking to build most of our projects. For the biggest projects, it's useless. In fact, it takes more time to link those projects with incremental linking (2min50 compared to 2min44). We observed that it doesn't work when the size of the ILK files is big (our biggest project generates an ILK of 262144 KB in Win32).

    Below, I list other things we tried to reduce link time:

    * Explicit template instantiation to reduce code bloat. Small gain.

    * Partial template specialization to decrease the number of symbols.

    * IncrediLink (IncrediBuild gives an interesting gain for compilation but almost no gain for link).

    * Removing debug information for libraries that are rarely debugged (good gain).

    * Deleting the PDB file in a « Pre-Build Event » (strangely, it gives an interesting gain, e.g. 2min44 instead of 3min34).

    * Converting many static libraries to DLLs. Important gain.

    * Working with computers equipped with lots of RAM in order to maximize disk cache. The biggest gain.

    * Big objs versus small objs. No difference.

    * Using the secret linker switch /expectedoutputsize:120000000. Small gain.

    * Maximizing internal linkage vs. external linkage. It's good programming practice.

    * Changing project options (/Ob1, /INCREMENTAL, enable COMDAT folding, embedding the manifest, etc.). Some give interesting gains, others not. We try to continuously maximize our settings.

    * Separating software components as much as we can afford. You can then work in unit tests that link fast. But we still have to integrate things together; we have legacy code, and we work with third-party components.

    Note that for all our experiments, we meticulously measured link time. Slow link times seriously cost productivity. When you implement a complex algorithm or track a difficult bug, you want to iterate rapidly through this sequence: modify some code, link, trace/debug, modify some code, link, etc.

    Another point about link time is the impact it has on our continuous integration cycle. We have many applications that share common code, and we run continuous integration on them. Link time for all our applications takes half the cycle time (15 minutes)...

    In the thread blogs.msdn.com/.../linker-throughput.aspx, some interesting suggestions were made to improve link time. On a 64-bit computer, why not offer an option to work with files completely in RAM?

    Again, any suggestions that may help us reduce link time are welcome. I can give more details on our experiments if needed."
