
This blog is no longer being updated or maintained, but it is archived for future use.  Please direct all WPF performance inquiries to the WPF forum:

 http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=119&SiteID=1

Thanks,

Tim

Windows Vista is going to be available to the world on January 30th -- only 4 days away.  Check out this short launch video we did a couple weeks back.  We had a blast doing it -- it's 50% ridiculous, 50% funny, and 100% excited about the hands-down best release of Windows to date.

 Part 1 (50 seconds): http://soapbox.msn.com/video.aspx?vid=ffdde496-ed01-4c91-af8c-6f857081ba59

 Part 2 (1m 28s): http://soapbox.msn.com/video.aspx?vid=bc193ff7-9587-4216-9b0b-6899baae8fec

Part 3 (1m 46s): http://soapbox.msn.com/video.aspx?vid=109c5c05-2fac-4597-9130-fdf448bcd1c5

If you’re like the many folks who have asked us for a 'list of performance suggestions', your prayers have been answered.  My colleague & friend Kiran Kumar has partnered with the brilliant folks on our SDK team (Lorin in particular) to document as much WPF perf knowledge as they could come up with.  Due to the high demand, Kiran has pre-released this tome of knowledge to his blog ahead of the final RTM SDK release.

This document is extremely impressive -- it's the most authoritative summary of WPF performance best-practices to date.  Read this, internalize it, and apply it to your applications and they will run faster. 

Read the full document here.

Cheers!

Since I posted my initial article on command-line profiling, I've found a number of folks copy/paste off of it fairly often, with moderate success.  If you don't want to know all of the 'why's' and just want to get your profile, run the following steps and copy/paste the commands.

1.  Install VSTS Team Suite or VSTS Team Edition for Developers, and open a command prompt.

2.  Set up your environment.  Copy/paste the following 4 commands into the command window.

set PATH=%PATH%;"C:\Program Files\Microsoft Visual Studio 8\Team Tools\Performance Tools"

VSPerfClrEnv.cmd /sampleon

vsperfcmd /start:SAMPLE /output:MyOutput.vsp

vsperfcmd /globaloff

3.  Launch the profiler.  You have 2 options -- attaching or launching.  Use 'b' to measure startup costs; otherwise, you should normally use 'a'.

   a.  Attaching (to measure anything after startup). 

VERY IMPORTANT - Launch your application from the -command window- before running the following 2 commands.  If you don't, managed code won't be profiled.

vsperfcmd /attach:MyTest.exe

vsperfcmd /globalon

   b. Launching (to measure startup time)

vsperfcmd /globalon

vsperfcmd /launch:MyTest.exe

4.  Now that profiling is enabled, perform the action you want to measure. 

5.  Stop profiling, shut down the profiler, and generate your VSP.

vsperfcmd /globaloff

vsperfcmd /shutdown

6.  Close your application.

7.  Pack symbols.  (Make sure you're in the directory where your VSP lives, which is the same directory 'vsperfcmd /start' was issued from.)

xcopy /E  "C:\Documents and Settings\timothyc\My Documents\Visual Studio 2005\Projects\MyTest\MyTest\bin\Release\*.PDB" .

vsperfreport /summary:all /packsymbols MyOutput.VSP

 

 

 

Background

 

Over the past few months we’ve been getting feedback that manipulating 3D collections can be slow.  It turns out there is more than one way to skin a … collection (horse references omitted).  In this article I profile many of the collection manipulation options for you, so you can make better choices when manipulating these things.  In addition, you should’ve noticed significant speedups after the May CTP: accessing the property system is 50-100% faster since the Feb. CTP.  That really matters when you’re, for example, trying to change Point3D values every 16.667 milliseconds (ms) (1000 ms / 60 frames per second).

 

Why 3D collections you ask?  Why not 2D?  After all, WPF is an integrated platform whose collections, 3D, 2D, or otherwise, are all pretty much the same.  The answer is two-fold.  First, 3D collections tend to be much larger.  We’re talking about tens of thousands of points, while 2D geometries often contain … tens.  So 3D collections stress the system a bit further.  Second, cool, spiffy 3D effects such as mesh deformations are often calculated at framerate in response to CompositionTarget.Rendering.  So not only are we changing tens of thousands of points, we’re trying to do it in a 16 ms budget.  That’s when the cost of manipulating collections becomes critical.
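To make that budget concrete, here is a rough sketch of a helper that times a per-frame update and compares it against the 60 frames-per-second budget.  The FrameBudget class and its member names are hypothetical (they are not part of WPF), and Stopwatch wall-clock timings are much noisier than the cycle counts a profiler gives you, so treat this as a quick sanity check rather than a measurement tool.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of WPF: checks whether a per-frame
// update fits in the 60 fps budget discussed above.
public static class FrameBudget
{
    public const double TargetFramesPerSecond = 60.0;

    // 1000 ms / 60 frames per second = ~16.667 ms per frame.
    public static double BudgetMilliseconds
    {
        get { return 1000.0 / TargetFramesPerSecond; }
    }

    // Runs the work once and returns the elapsed wall-clock time in ms.
    public static double Measure(Action work)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    // True when the measured time fits inside a single frame.
    public static bool FitsInFrame(double elapsedMilliseconds)
    {
        return elapsedMilliseconds <= BudgetMilliseconds;
    }
}
```

You could call FrameBudget.Measure around your mesh update a few times and see how close you are to the limit before reaching for a real profiler.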

 

Finally, let me point out that WPF isn’t designed to be a game engine, or support that associated level of coolness (read: detail) we’ve all grown to expect in games.  What does that really mean to you? If this sounds uncomfortably close to your app, you need to profile & make sure that WPF can respond acceptably to the frequency of change requests you’re pushing into it.  If it can’t, it’s time to scale back either frame-rate or mesh complexity.  With that said, in addition to these optimizations and the work we’ve done making this scenario faster since the Feb. CTP, I can see more speedups happening in future versions.

 

Populating Collections

 

It’s Turtle vs. the Hare.  Meet the Turtle.

 

   const int ARRAY_SIZE = 100000;

 

   MeshGeometry3D mesh1 = new MeshGeometry3D();            

 

   for (int i = 0; i < ARRAY_SIZE; i++)

      mesh1.Positions.Add(new Point3D(1.0, 1.0, 1.0));

 

Profilers are our friends.  An instrumentation-based profile shows this code takes 196M cycles to run.  Compare that to adding the same number of points to a dynamically growing List<Point3D> (12.6M cycles), or a pre-sized List<Point3D> (1.4M cycles) and you’ll start to understand why people have been telling us about this.  At first glance, it appears to take 15.5X longer to populate a WPF collection than the equivalent dynamically grown List<Point3D>.

 

So, what the heck is WPF doing that’s taking all of this time?  Here’s the breakdown:

  • MeshGeometry3D::ctor: 80M cycles
  • Point3DCollection.Add: 68M cycles
  • get_Positions: 46.9M cycles

MeshGeometry3D::ctor

Out of the 80M cycles spent here, 77M cycles are spent creating static objects like Point3DCollection.Empty.  This cost is incurred only the first time the class is instantiated in this process, so it can be ignored (or easily worked around, by creating a 'warm-up' object at startup).  There is also a one-time cost associated with setting a certain property type for the first time, so that cost should be factored out as well when measuring results.
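A 'warm-up' object might look something like the following sketch.  The MeshWarmUp class name is made up for illustration, and exactly which properties need to be touched to absorb the one-time costs is an assumption on my part based on the description above; profile your own app to confirm.

```csharp
using System.Windows.Media;
using System.Windows.Media.Media3D;

// Hypothetical helper: call once at startup so the one-time static
// initialization cost is paid before the scenario you measure.
public static class MeshWarmUp
{
    public static void Run()
    {
        // The first instantiation in the process pays for the statics.
        MeshGeometry3D warmUp = new MeshGeometry3D();

        // Setting each collection property once pays the first-set cost.
        Point3DCollection positions = new Point3DCollection(1);
        positions.Add(new Point3D(0.0, 0.0, 0.0));
        warmUp.Positions = positions;

        Int32Collection indices = new Int32Collection(1);
        indices.Add(0);
        warmUp.TriangleIndices = indices;
    }
}
```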

 

get_Positions

This is an easy cost to optimize.  Move get_Positions outside of the loop.
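As a minimal sketch, hoisting the accessor looks like this.  The MeshPopulation class and method name are hypothetical; the only point is that the Positions getter runs once instead of once per point.

```csharp
using System.Windows.Media.Media3D;

// Hypothetical example: same population loop as the Turtle, but the
// Positions accessor is read once, outside of the loop.
public static class MeshPopulation
{
    public static MeshGeometry3D PopulateWithHoistedAccessor(int count)
    {
        MeshGeometry3D mesh = new MeshGeometry3D();

        // One call to get_Positions instead of 'count' calls.
        Point3DCollection positions = mesh.Positions;

        for (int i = 0; i < count; i++)
            positions.Add(new Point3D(1.0, 1.0, 1.0));

        return mesh;
    }
}
```

This only removes the accessor cost.  The collection still grows dynamically here, which is the next thing to fix.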

 

Point3DCollection.Add

You can't see it from the profile, but most of the time in this function is spent growing the collection.  By pre-sizing the collection we can avoid most of this cost.

 

Optimized Collection Population.  Hare, enter stage right.

 

   // Instantiate a ‘warm-up’ MeshGeometry3D, set its

   // Positions and TriangleIndices properties.  Then

   // run the following code.

   MeshGeometry3D mesh1 = new MeshGeometry3D(); 

   // Populate a collection not yet connected to mesh1,

   // pre-sizing it ahead of time

   

   Point3DCollection collection = new Point3DCollection(ARRAY_SIZE);

 

   for (int i = 0; i < ARRAY_SIZE; i++)

      collection.Add(new Point3D(1.0, 1.0, 1.0));

 

   // To reduce working set, call collection.Freeze here before adding

   // the collection if you're not planning on changing it

 

   mesh1.Positions = collection;

 

Total cost of this implementation is 3.9M cycles, down from 196M cycles.  This is only 2.7X slower than a pre-sized List<Point3D>, it's faster than a Point3D[] array, and has more functionality than either (e.g., changed notifications).  Here’s the breakdown:

  • Point3DCollection.Add: 3.6M cycles.  A 13.8X speedup due to not resizing the array.
  • Point3DCollection c’tor: 222.0K cycles.  Most of this time is spent pre-allocating the array.
  • MeshGeometry3D.set_Positions: 103K cycles
  • MeshGeometry3D c’tor: 3.9K cycles

Finally, another option is to populate a List<> and then pass it to the IEnumerable collection constructor (i.e., public Point3DCollection(IEnumerable<Point3D> collection)).  Today this takes about 50% longer than calling .Add because of the IEnumerable cost. 
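For completeness, here is a short sketch of the IEnumerable constructor route.  The MeshFromList class is a hypothetical wrapper; the constructor itself is the one quoted above.

```csharp
using System.Collections.Generic;
using System.Windows.Media.Media3D;

// Hypothetical example: build the points in a pre-sized List<Point3D>,
// then hand the whole list to the Point3DCollection constructor.
public static class MeshFromList
{
    public static MeshGeometry3D Build(int count)
    {
        List<Point3D> points = new List<Point3D>(count);

        for (int i = 0; i < count; i++)
            points.Add(new Point3D(1.0, 1.0, 1.0));

        MeshGeometry3D mesh = new MeshGeometry3D();

        // Point3DCollection(IEnumerable<Point3D>) copies the points over.
        mesh.Positions = new Point3DCollection(points);
        return mesh;
    }
}
```

This is handy when the points already live in a List<> for other reasons, as long as you can live with the extra enumeration cost.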

 

Modifying Collections

 

Slow Collection Modification

 

Just like populating a collection, there is a slow and fast way to modify collections.  The difference between modifying a WPF collection and modifying a CLR collection is realizing that changed notifications are involved.  You can get significant wins by managing these notifications optimally. 

 

Consider this sample of modifying this MeshGeometry3D, which exists within the simplest Visual tree: a single GeometryModel3D set on a ModelVisual3D within a Viewport3DVisual.

 

   Point3D point;

   Point3DCollection positions = mesh2.Positions;

 

   for (int i = 0; i < positions.Count; i++)

   {

      point = positions[i];

      point.X += 10;

      point.Y += 10;

      point.Z += 10;

      positions[i] = point;

   }

 

The author was being ‘smart’, in that the .Positions accessor was pulled outside of the loop.  Running this code on 100K points takes 6.6M cycles.  Here’s the breakdown:

  • Point3DCollection.set_Item: 5.5M
  • Point3DCollection.get_Item: 984K
  • Point3DCollection.get_Count: 154K
  • MeshGeometry3D.get_Positions: 1.1K

Why is set_Item so much more expensive than get_Item?  Because of changed notifiers of course – 4.43M cycles are spent firing change notifications in this simple Visual tree.  And this tree is extremely simple -- the more complex the tree, the more elements that will be involved in Changed notifications.

 

Optimized Collection Modification

 

In the following code sample, we ‘unhook’ the collection from the tree before modifying it:

 

   Point3D point;
   Point3DCollection positions = mesh2.Positions;

   mesh2.Positions = null; // Unhook the collection

 

   int count = positions.Count;

  

   for (int i = 0; i < count; i++)
   {
      point = positions[i];
      point.X += 10;
      point.Y += 10;
      point.Z += 10;
      positions[i] = point;
   }

   mesh2.Positions = positions; // Hookup the collection

 

This code runs in 836K cycles, a 7.9X improvement over the original code.  Here’s the breakdown:

  • Point3DCollection.set_Item: 391K
  • Point3DCollection.get_Item: 269K
  • MeshGeometry3D.set_Positions: 114K
  • Point3DCollection.get_Count: 57K
  • MeshGeometry3D.get_Positions: 3.3K
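If you do this in more than one place, the unhook/modify/hookup dance can be wrapped in a small helper.  The following is only a sketch: the MeshEditing class and OffsetDetached method are hypothetical names, and the try/finally is my addition so the collection gets re-attached even if the loop throws.

```csharp
using System;
using System.Windows.Media.Media3D;

// Hypothetical helper: offsets every position while the collection is
// disconnected from the mesh, then hooks it back up.
public static class MeshEditing
{
    public static void OffsetDetached(MeshGeometry3D mesh, Vector3D offset)
    {
        Point3DCollection positions = mesh.Positions;

        if (positions == null)
            return;
        if (positions.IsFrozen)
            throw new InvalidOperationException("Positions is frozen.");

        mesh.Positions = null; // Unhook the collection

        try
        {
            int count = positions.Count;

            for (int i = 0; i < count; i++)
                positions[i] = positions[i] + offset; // Point3D + Vector3D
        }
        finally
        {
            mesh.Positions = positions; // Hookup the collection
        }
    }
}
```

Calling MeshEditing.OffsetDetached(mesh2, new Vector3D(10, 10, 10)) would then do the same work as the loop above.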

Summary

 

There is only a small handful of things you need to do to optimize WPF collection performance.  When populating WPF collections, pre-size the collection, move accessors outside of the loop whenever possible, and factor out startup costs when measuring.  When modifying collections, disconnect them from the live tree before changing them.  And as always, freeze collections you don't need to modify later on.
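Here is a small sketch of the freezing advice, for a mesh whose points never change after creation.  The FrozenMesh class name is hypothetical; Freeze and CanFreeze come from Freezable.

```csharp
using System.Windows.Media.Media3D;

// Hypothetical example: populate a pre-sized collection, freeze it,
// and only then attach it to the mesh.
public static class FrozenMesh
{
    public static MeshGeometry3D Build(int count)
    {
        Point3DCollection positions = new Point3DCollection(count);

        for (int i = 0; i < count; i++)
            positions.Add(new Point3D(1.0, 1.0, 1.0));

        // Once frozen, the collection is read-only and no longer
        // needs to track Changed notifications.
        if (positions.CanFreeze)
            positions.Freeze();

        MeshGeometry3D mesh = new MeshGeometry3D();
        mesh.Positions = positions;
        return mesh;
    }
}
```

Remember that a frozen collection throws if you try to modify it later, so only do this for static geometry.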

 

Here is the raw data for various collection operations:

 

Type                Method                   Cycles (Millions)

Populating 100K Items

Point3DCollection   Unoptimized              196.00
Point3DCollection   Using IEnumerable ctor   7.00
Point3DCollection   Optimized                3.90
Point3D[]           Initially sized          5.40
List<Point3D>       Not initially sized      12.60
List<Point3D>       Initially sized          1.40

Int32Collection     Unoptimized              144.49
Int32Collection     Using IEnumerable ctor   1.80
Int32Collection     Optimized                1.23
Int32[]             Initially sized          1.50
List<Int32>         Not initially sized      0.69
List<Int32>         Initially sized          0.08

Changing 100K Items

Point3DCollection   Unoptimized              6.60
Point3DCollection   Optimized                0.84
Point3D[]           N/A                      4.20

 

 

My colleague Henry Hahn & I have recorded a primer on WPF performance for all to see.  Available on MSDN TV as of yesterday, we discuss the approaches that we take when optimizing WPF applications. 

Make sure and check us out!

A couple days ago, an article was posted on the DevX portal discussing WPF graphics & performance (my favorite two subjects).  You can find it here.  Definitely worth a read, especially since Mr. Woodruff mentions this blog!  

It’s always really interesting to see how the external press views our platform, both positive and otherwise.  One point made in the article is how powerful a seamless marriage between designers (via Expression ID) & developers (via WPF) will be when it comes to making great-looking applications.  This is so true.  After spending a few years looking at apps that take advantage of this platform, it almost hurts to look at the lifeless applications of yesteryear.  

For example, just last night I went over to CNet downloads and grabbed a copy of IronTrainer 2.1.8.  I’ve been getting back into strength training and wanted to track my lifts.  From a getting-stuff-done point of view, this app rocks.  I can create completely customized routines, track my progress with bar graphs, scatter graphs, 3D bar graphs, etc.  But the UI was obviously built by a software guy, in some VB6-like, primordial development environment.  And compared to even simple WPF apps, boy is it ugly.  All the controls are grey, static, without any flow, transitions, animations, and definitely lacking whiz-bang effects.  Like Morgan Spurlock on the 10th day of his McDonald’s diet, using the concrete-toned UI mess of stock buttons started to depress me.

That's all going to start changing with the introduction of WPF.  If this were built using our platform, a few hours in Expression and this app could be amazing.  You can’t blame the app developers for the lack of aesthetics – with the tools that existed, these were the types of apps that got built.  But with WPF and Expression ID in-hand, this class of comatose UI design will start fading away.  Get your computers ready for an extreme software makeover, folks.  The tides of UI design are ripe for a change!

 

 

 

I’ve been getting a lot of the same performance questions over the last few months regarding the WPF graphics model, so I thought I’d post some responses for everyone to see.  This should shed some light on what WPF does and doesn’t offer in terms of fast vector graphics. 

 

It's likely that I'm going to be porting this to an MSDN article soon, but wanted to let you guys see it first, and get your feedback before it goes live.  Have any more questions you'd like to see answered about the graphics model?  Let me know, and they may make it into an MSDN whitepaper!

 

WPF is a retained scene, with a separate rendering thread, which has its own set of data.  My scenario can’t afford all of this overhead.  Why not expose an immediate-mode API?

 

That’s a great point.  The way I like to think about this is to consider what existed before WPF.  There was GDI & GDI+, which are software-based immediate-mode APIs, and Direct3D, the immediate-mode API for hardware rendering.  Today, nothing’s changed there.  All of these APIs still exist, and are still tools in your development toolbox.  For example, if you’re going to write the newest 3D first-person fast-action shooter, you’re going to want to pull out that Direct3D sledge hammer.  Using GDI+ or WPF to write a 3D game might start to feel like breaking concrete with a screwdriver.

 

What we’ve done with WPF is provide an entirely new set of possibilities around retained-mode graphics programming, which simply haven’t existed before.  This is a brand new class of tool, one that frees developers from thinking about pixels, painting, and vector processing.  You describe your scene using high-level constructs (or heck, even a designer tool like Microsoft Expression Interactive Designer), and we’ll worry about the rest.  On top of that, just having a retained scene allows us to expose cutting-edge features that persist beyond an immediate-mode call, like animation & DVD-like ‘just-hit-play’ video controls.  Do I hear any fans of MediaElement out there? 

 

Finally, one thing our team has learned about scene processing is that whether or not it’s ‘affordable’ tends to be scenario dependent.  For scenarios which aren’t affordable, my team’s entire job is to continually drive those costs down.  Understanding how WPF performs in your scenario is key.  I heartily recommend prototyping the operations you’re concerned about using WPF, GDI+, and/or Direct3D, and then profiling, to see where the cards lie.  Rico Mariani, the CLR performance architect, calls this process budgeting and it’s a great practice to get into.

 

So why are we shipping a retained-mode API in V1 instead of an immediate-mode API? 

 

WPF is a platform for developing Windows applications, and if you look at what 99% of Windows apps need, it’s a retained scene. In fact, all applications which use immediate-mode APIs (even games) still have some sort of retained scene for generating those immediate-mode calls.  It’s just more specific to their application.  For most folks, you don’t want to spend your time worrying about scene processing, rendering loops, and bilinear interpolation.  There are higher-level fish to fry.

 

What do the remaining 1% of apps look like, who needs an immediate-mode API anyway, and how do I know if I’m in that bucket? 

 

The short answer is, applications that spend nearly all of their time rendering, such that it makes sense for them to invest in a super-performant graphics pipeline.  Graphics-intensive apps like games, CAD, or complex game-like 3D visualizations come to mind.  For these folks, Direct3D is the way to go. 

 

But I’m a GDI/GDI+/WPF programmer working on SuperAmazingCAD 4000, and Direct3D doesn’t provide me with the 2D painting model I’m used to.

 

For this class of folks, software-based GDI+ and hardware-based Direct3D were all you used to be able to choose from.  If GDI+ software rendering wasn’t fast enough, then your only other choice was Direct3D.  In reality, to be successful, you’re probably going to have to break out a couple geometry processing books and use Direct3D.  But that’s the way it’s always been.

 

With that said, WPF is a brand new option, and with its introduction, there is a whole new set of scenarios that WPF supports without going directly to Direct3D.  For the remaining scenarios that can’t afford GDI+ or WPF, what you really need is something Microsoft has never done before – a hardware accelerated 2D immediate-mode API.  This is a great feature request, and if this is your scenario, please give me the details!  As usual, we can’t make any promises, so don’t bet the bank on it happening soon.  But when it comes to new features, there are few things more motivating than scenarios you provide us with, especially if there’s performance data clearly backing them up.

 

WPF allows objects to be used multiple times, creating graphs which are notoriously problematic when it comes to performance (e.g., many other graphics platforms disallow them).   What’s the deal?

 

Simply put, we found that people like to re-use objects.  This is why the Freezable pattern exists.  We tried other patterns that copied objects more prevalently instead of reusing them, but quickly realized that they are extremely difficult to use (anyone remember the Changeable pattern from early builds?).

 

One simple & prevalent example of the need for object re-use is SolidColorBrush.  If a scene is using 200 black brushes, it just doesn’t make sense for 200 instances of the brush to exist, especially since reusing a SolidColorBrush has no performance impact other than using less memory. 
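As a rough sketch of what that re-use looks like in code, here is one brush shared across many drawings.  The SharedBrushExample class is hypothetical, and I'm using GeometryDrawing simply because it is a convenient object with a Brush property; the same idea applies to shapes and other elements.

```csharp
using System.Collections.Generic;
using System.Windows;
using System.Windows.Media;

// Hypothetical example: many drawings share a single frozen black brush
// instead of each one allocating its own SolidColorBrush.
public static class SharedBrushExample
{
    public static List<GeometryDrawing> CreateDrawings(int count)
    {
        SolidColorBrush black = new SolidColorBrush(Colors.Black);
        black.Freeze(); // The brush never changes, so freeze it.

        List<GeometryDrawing> drawings = new List<GeometryDrawing>(count);

        for (int i = 0; i < count; i++)
        {
            RectangleGeometry rect =
                new RectangleGeometry(new Rect(i * 12, 0, 10, 10));

            // Every drawing references the same brush instance.
            drawings.Add(new GeometryDrawing(black, null, rect));
        }

        return drawings;
    }
}
```

The built-in Brushes.Black gives you the same thing for stock colors; the sketch just shows the pattern for a brush you create yourself.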

 

To give you more context as to why re-use can be expensive, consider that all rendering WPF does eventually goes through Direct3D.  If you look at the Direct3D API, you’ll quickly realize that text, paths, rectangles, and gradients don’t exist.  All you really have are points (x,y,z) specified in vertex buffers, and either solid colors or bitmaps (aka textures) that are ‘stretched’ across those points.  Higher-level primitives such as geometry and brushes are a huge value-add that GDI+ and WPF bring to the table.

 

To get from the rich API WPF exposes to the lingua franca of Direct3D, intermediate representations are often created.  For example, it’s no mystery that WPF renders text into bitmaps, which are then handed off to Direct3D.  If these representations are specific to a certain scale factor, or pixel position on the screen, then re-using an object in different places means different intermediate representations are created.  Very often only a single representation is cached (for memory & scalability purposes), so re-using can mean these intermediate representations are re-created more often than they would be in single-use scenarios. 

 

Summarized, the primary cost involved with multi-use is the possibility of invalidating cached intermediate representations, if the particular object happens to use them (logically, APIs which look more like Direct3D APIs will use intermediate representations less heavily).  I used text as an example because this is a multi-use performance issue we recently found in a profile.  The good news is we’ve been working on text improvements since the last CTP, so look for better performance there in future releases.  Optimizing for multi-use scenarios is a pattern that I see continuing over time.

 

With the Freezable pattern, we’ve given you the option of whether or not to re-use an object.  You can even opt out of the overhead involved in tracking Changed notifications by freezing it.  You control the scene, so, if using an object multiple times becomes a problem (as evidenced by profiles, or multi-use vs. single-use comparisons), create multiple copies.
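Both choices are available straight off the Freezable API.  The sketch below uses a hypothetical FreezableChoices class; Clone, CanFreeze, Freeze, and IsFrozen are the Freezable members involved.

```csharp
using System.Windows;
using System.Windows.Media;

// Hypothetical helpers showing the two options: make an independent,
// modifiable copy, or freeze an object to drop change tracking.
public static class FreezableChoices
{
    // Clone returns a modifiable copy, even when the source is frozen.
    public static SolidColorBrush CreateIndependentCopy(SolidColorBrush source)
    {
        return source.Clone();
    }

    // Freezes the object if possible; returns whether it ended up frozen.
    public static bool TryFreeze(Freezable freezable)
    {
        if (!freezable.IsFrozen && freezable.CanFreeze)
            freezable.Freeze();

        return freezable.IsFrozen;
    }
}
```

Which one to reach for depends on what the profile says: freeze things that never change, and clone when sharing a single instance across the scene turns out to be the expensive part.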

 

Adding seemingly simple tweaks (e.g., clipping, bitmap effects) to our scene causes us to fall back to software, and software rendering in WPF is slower than GDI+ software rendering.

 

First, the WPF software rendering code is derived from the GDI+ codebase. There are certain limits to what can be accomplished in hardware, and we have to work around what the hardware vendors give us.  As graphics hardware evolves, those limits are likely to loosen over time.  If at least some portion of your scene is rendered in hardware, rendering is already going to be faster than it was in GDI+.  Finally, we shipped a tool at the PDC called ‘Perforator’ to help identify where software rendering occurs.

 

Some WPF features, such as gradients, must be rendered in software and then copied into texture memory on the video card.  With many Windows applications running, doesn’t this create a new class of bottlenecks around copying textures to the video card?

 

Ever since our first public preview, fewer and fewer features are being rendered in software.  That’s a trend I expect to continue.  For example, only a very small portion of a linear gradient is rendered by the CPU.  Most of the pixels contained within a linear gradient are calculated by the GPU (for tier-2 cards, radial gradients are also done on the GPU).  In addition, compact WPF vectors are a new option for content which has been traditionally rendered using bitmaps.  The result will be applications which use fewer bitmaps, and thus less video memory, in favor of scalable vector graphics.
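If you want to know which tier your app is running on, RenderCapability.Tier reports it in the high-order word of its value.  The RenderTierCheck wrapper below is a hypothetical convenience; the shift by 16 bits is the documented way to extract the tier.

```csharp
using System.Windows.Media;

// Hypothetical wrapper around RenderCapability.Tier.
public static class RenderTierCheck
{
    // Returns 0 (no hardware acceleration), 1 (partial), or 2 (full).
    public static int GetRenderingTier()
    {
        // The tier lives in the high-order word of the Tier property.
        return RenderCapability.Tier >> 16;
    }
}
```

An app could use this to scale back effects on tier-0 or tier-1 machines instead of letting them fall back to software rendering.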

 

Finally, with Vista, the WDDM gives us virtualized video memory.  This helps make sure the most important textures are kept in video memory via an efficient memory eviction mechanism -- something we’ve never had before. 

 

For more on when to use Direct3D vs. GDI+ vs. WPF, see Pablo Fernicola’s March 28 blog article.

For more on the WDDM, see Greg Schecter’s April 2 blog article.

 

 

Just in case you haven’t noticed, Beta2 of the Windows Presentation Foundation went live on 5/23.  And since the February CTP, we’ve been doing tons of great performance work. Many scenarios, especially 3D applications & apps using 1000's of Shapes, will see significant improvements.

 

With that said, if you’re having a performance problem, don’t wait for it to ‘get better’.  Performance optimization tends to be fairly scenario dependent.  We have made many, many scenarios orders of magnitude faster.  How much faster your scenario will be depends on how much your scenario overlaps the scenarios we’ve been looking at.  If your scenario is slow, profile it and find out why.  Then let us know.  With a profile in hand, we can tell you whether or not we’ve been looking at your problem already, or if there’s something else you can do to get better perf.  Typically, both are true.

 

Without further ado, here are a few highlights of the performance work done since the February CTP:

 

-         TextBlock is 10-20% smaller, and text selection is faster

-         The Path mini-language (e.g., Data=”M10,10….”) now uses StreamGeometry internally.  See my previous posting on the startup & working-set benefits of StreamGeometry.

-         Ink rendering is significantly faster, with even more improvements being looked at post-Beta2.

-         Shape working-set has been improved by at least 56 bytes/instance.

-         Software rendering of bitmaps & gradients has improved.

-         A number of scalability improvements in shape-heavy scenarios have been checked in.  E.g., loading 50K Path’s is 40%+ faster.

-         3D lighting has been moved to the GPU, for cards supporting Shader Model 2.0.  This is huge news for many 3D apps, especially ones that use lots of specular components. I’ve seen render-bound 3D apps improve 35%+. 

-     Plus, many more incremental fixes.
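On the StreamGeometry item above: if you build paths in code rather than through the mini-language, you can use StreamGeometry directly.  This is just a sketch with a hypothetical StreamGeometryExample class and made-up coordinates.

```csharp
using System.Windows;
using System.Windows.Media;

// Hypothetical example: a filled, closed triangle described with
// StreamGeometry instead of a PathGeometry.
public static class StreamGeometryExample
{
    public static StreamGeometry CreateTriangle()
    {
        StreamGeometry geometry = new StreamGeometry();

        using (StreamGeometryContext context = geometry.Open())
        {
            // BeginFigure(startPoint, isFilled, isClosed)
            context.BeginFigure(new Point(10, 10), true, true);

            // LineTo(point, isStroked, isSmoothJoin)
            context.LineTo(new Point(100, 10), true, false);
            context.LineTo(new Point(55, 90), true, false);
        }

        // Freeze it, since the figure is never modified afterwards.
        geometry.Freeze();
        return geometry;
    }
}
```

The result can be assigned to a Path's Data property just like any other Geometry.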

 

Beta2 is only the tip of the iceberg.  There is still a lot more great work coming your way.  Want to make sure it's going to impact your scenario?  Profile it and let us know what's slowing you down!

 

I've 3 new tricks to add to a few of my earlier postings:

1.  In my batch file for VSTS profiling I use sleep.exe.  This isn't on a few of my machines, so you may not have it either.  A quick web search shows this is part of the Windows Server 2003 resource kit.  You can download it here (11.8 MB).

2.  Also included in that download is kernrate, a free command-line tool that can profile both kernel & user-mode time.  This is a sample-based profiler.  Take a look at some example output on this blog entry I found.

3.  When moving VSP files between machines symbols stop resolving, even when the symbol files are copied directly to the 2nd machine.  This is annoying, so to avoid that hassle, make sure to always run this command before copying a VSP file.  In fact, this would make a nice batch file:

REM CopyVSP.CMD

vsperfreport /summary:all /packsymbols %1

copy %1 %2

I've had trouble getting symbols to resolve using _NT_SYMBOL_PATH, so you can save yourself some headache by just copying all the symbols to the same directory as the VSP file before running vsperfreport.  By the way, vsperfreport is in the same directory as the other VSTS profiling tools (e.g., vsperfcmd, vsperfclrenv):

REM CopyVSPAndSymbols.CMD

REM %1 is the symbol path, %2 is the VSP file name, and %3 is the destination

xcopy /E %1 .

vsperfreport /summary:all /packsymbols %2

copy %2 %3

 

I just found out something neat: VSTS tells you how many samples were taken in kernel mode. Unfortunately, it doesn’t tell you where in kernel mode that time was spent.  But, it does tell you if you need to be thinking about kernel-mode time, which makes it an even better tool to use on the front line of your perf investigation.

 

For example, when profiling a WPF app, this kind of info is super-helpful if you suspect a foul video driver is causing your app to behave badly.  I've been investigating a couple of those issues recently, and found Intel VTune & AMD's CodeAnalyst to be helpful for narrowing down kernel time, or at least, which module is taking time.

 

 

 

Recently I’ve been reading up on Rico Mariani’s blog.  What a gem!   One especially profound article he’s posted talks about the challenges in promoting a good performance culture.  First-rate performance doesn’t happen by accident.  Like having a solid feature set, it must be deliberate.  This applies whether you’re writing a new platform as large as WPF, or you’re a single developer working on a smaller project.  

 

What sets performance apart from feature development is how that happens.  Features are tangible pieces of functionality that are designed and implemented.  In some ways, good performance is less tangible, and is almost always the by-product of a good performance culture.   If you want a blazing fast app, you have to start by cultivating the performance way of thinking.  It isn’t until you start setting performance goals, writing your applications with a performance mindset, and habitually (or obsessively) profiling your performance scenarios that you start to make progress. 

 

You can read Rico’s full article here: Buckle up: it’s going to be a bumpy ride

I've gotten a few questions about other profiling tools to use when profiling WPF applications.  I'm putting this on my 'things-to-do' list, but if anyone has experience with using other profilers against WPF, I'd love to hear about it.  In the meantime you can checkout this Wikipedia article on Performance Analysis, which has a list of 10+ profilers that I plan to try out at some point.

Pablo posted a great article over on his blog describing what a hardware-accelerated WPF means to your app.  There's also an interesting overview of the various places you can hook into the WPF stack.

Over in the forums a handful of folks have pointed out that profiling WPF is less than seamless, and can be super-frustrating.  Even with profiles in-hand, sometimes there isn't a bunch non-WPFers can do without some guidance.  One of my personal goals is to change that, and we'd like your feedback on what you'd like to see in the future.

First, let's take an inventory of what tools you have today:

- Sample/Instrumentation-Based profiles.  These are great if you work at Microsoft, or it's fairly clear where the time is going.  But if it isn't, you need our help to sort through the problem.

- Perforator.  Except for a handful of specific problems, this tool tells you more about what your bottleneck isn't, than what it is. It's great for diagnosing over-invalidation & software rendering, but if that's not your problem, you're left with the profile.

Here are some new ideas we've been tossing around:

Adding / exposing more ETW events, & providing more documentation about what they measure. We use ETW a lot internally because it's very automatable.  ETW is great because it can 'roll-up' CPU usage into understandable chunks (e.g., by reporting the time spent in specific layout passes, rendering a rectangle, resolving bindings, etc.).  Couple that data with documentation about what the event is measuring, and you almost have an insider's view into the system.

Expanding Perforator to detect 'known' bottlenecks & causes.  My biggest beef with Perforator is that it doesn't tell you where time is going, it tells you where it isn't going.  Still, when it can identify a bottleneck, it's worth its weight (or byte-count) in gold.

Profile walk-throughs where different system costs are explained, with a focus on how to optimize them.  This would provide more of a contextual understanding than a direct diagnosis, but having context can really help.

I'd love to hear any other ideas you might have.  Feel free to send me a mail with what you're thinking.
