WPF performance and .NET Framework Client Profile related blogs provided by Jossef Goldberg.
WPF applications are known to have slow cold start times. Below are some suggestions and ideas that could help you improve your WPF application's startup time in general, and its cold start time in particular.
1. Understand cold start vs. warm start
Cold startup is when your application starts for the first time after a reboot, or when you start an application, close it, and then launch it again after a long period of time. When an application starts up, if the pages it requires (code, static data, registry, etc.) are not present in the OS memory manager's standby list, disk access is needed to bring those pages into memory, causing page faults known as hard faults.
In the warm startup scenario, most of the pages for the main common language runtime (CLR) components are already loaded in memory from where the OS can reuse them, saving expensive disk access time. This is why a managed application is much faster to start up the second time you run it.
2. Improve perceived performance
· Make the application's main window appear as soon as the user double-clicks the application's icon; when possible, perform all other initialization afterwards.
· Consider adding an unmanaged splash screen that shows before the WPF app starts. This presents the user with some UI and improves perceived responsiveness. See the Splash Screen To Improve WPF Application Perceived Cold Startup Performance blog for an example that demonstrates how this can be done.
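One way to follow the first suggestion is to keep the window constructor cheap and queue the remaining work once the window has drawn for the first time. The sketch below assumes a code-only window (no XAML) so that it is self-contained; DeferredInitialization is a hypothetical placeholder, and DispatcherPriority.ApplicationIdle is one reasonable choice of priority rather than a requirement.

```csharp
using System;
using System.Windows;
using System.Windows.Threading;

public class MainWindow : Window
{
    public MainWindow()
    {
        Title = "My App";
        // Keep the constructor cheap: only what is needed for the first frame.
        ContentRendered += OnFirstRender;
    }

    private void OnFirstRender(object sender, EventArgs e)
    {
        // ContentRendered fires after the window's content is first drawn;
        // unsubscribe so the deferred work is queued only once.
        ContentRendered -= OnFirstRender;

        // Queue the remaining work at a low priority so input and rendering
        // are serviced first.
        Dispatcher.BeginInvoke(DispatcherPriority.ApplicationIdle,
            new Action(DeferredInitialization));
    }

    // Hypothetical placeholder for work that does not need to happen before
    // the first paint (loading recent files, warming caches, and so on).
    private void DeferredInitialization()
    {
    }

    [STAThread]
    public static void Main()
    {
        new Application().Run(new MainWindow());
    }
}
```

The same pattern applies to item 5a below: anything moved into DeferredInitialization no longer delays the first visible frame, although it still competes for disk I/O during a cold start.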
3. Analyze startup code
· Determine the actual reason for your slow cold start; I/O may not even be the main reason.
One simple test is to launch your WPF application right after a reboot and see how long it takes to display. If all subsequent launches of your app (warm starts) are much faster, your cold start issue is indeed caused by I/O. In some of my WPF test apps, I saw a cold start time of ~8 to 10 seconds drop to a warm start time of under 1 second, a significant difference. On Vista, you can expect the SuperFetch service to further improve cold start time, depending on how often the user launches your app.
Note that before you do these tests, you need to verify that no other apps or services that run at Windows startup use managed or WPF code; otherwise the CLR and WPF pages are already in memory and your "cold" start is really a warm one. Some of the available tools are discussed below.
If your application's cold start issue is not I/O related, it is very likely that your application is doing some sort of lengthy initialization or computation, or waiting for some event to complete; or, if the app is not NGEN'd, a lot of the application code must be JIT-compiled at startup.
· Minimize the use of external resources, such as network, web services, or disk.
· Analyze how many modules are loaded and how your app accesses configuration data (files on disk, the registry, and so on). Consider refactoring your code to remove some dependencies or to delay-load modules.
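The cold vs. warm start test described above needs a consistent way to measure time to first display. A minimal sketch, assuming you call it from the main window's ContentRendered handler (StartupTimer is a hypothetical helper name): it reports the wall-clock time since the OS created the process, which therefore includes loading the CLR and WPF.

```csharp
using System;
using System.Diagnostics;

public static class StartupTimer
{
    // Wall-clock time since this process was created. Process.StartTime is
    // reported in local time, so it is compared against DateTime.Now.
    public static TimeSpan ElapsedSinceProcessStart()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return DateTime.Now - current.StartTime;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Startup took {0:F0} ms",
            ElapsedSinceProcessStart().TotalMilliseconds);
    }
}
```

Record the value once right after a reboot (cold) and again on a second launch (warm); a large gap between the two points at I/O rather than CPU work.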
4. Load fewer modules at startup and remove unnecessary references
· Use tools such as Process Explorer to determine which modules your application loads. Another useful tool is Tlist: tlist <pid> shows all the modules loaded by a process.
For example, if you are not connecting to the Web and you see System.Web.ni.dll loaded, it means there is a module in your application referencing it. Check whether that reference is actually necessary.
· If your app has multiple modules, merge them into one (this applies only if you have control over them). In terms of CPU, each assembly load incurs Fusion binding and CLR assembly-loading overhead in addition to the LoadLibrary call, so fewer modules mean less CPU time. In terms of memory usage, fewer assemblies also mean that the CLR has less state to maintain.
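If you want the same information from inside the app (for example, in a diagnostic build), the sketch below lists the managed assemblies loaded into the current AppDomain and checks whether a given native module such as System.Web.ni.dll is mapped into the process. ModuleReport is a hypothetical helper name; it complements Process Explorer and Tlist rather than replacing them.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ModuleReport
{
    // Simple names of the managed assemblies loaded into this AppDomain.
    public static string[] LoadedAssemblyNames()
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Select(a => a.GetName().Name)
            .OrderBy(n => n, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }

    // True if a module with this file name (e.g. "System.Web.ni.dll")
    // is currently loaded in the process.
    public static bool IsModuleLoaded(string fileName)
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.Modules.Cast<ProcessModule>()
                .Any(m => string.Equals(m.ModuleName, fileName,
                                        StringComparison.OrdinalIgnoreCase));
        }
    }

    public static void Main()
    {
        foreach (string name in LoadedAssemblyNames())
            Console.WriteLine(name);
        Console.WriteLine("System.Web.ni.dll loaded: " + IsModuleLoaded("System.Web.ni.dll"));
    }
}
```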
5. Avoid unnecessary initializations
a. Consider postponing other initialization code until after the main application window renders.
b. Remember that initialization can be performed in a static (class) constructor, and if that code references other classes, it can cause a cascading effect where a large number of class constructors are executed.
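The cascading effect in item b, and one way to defer it, can be sketched as follows. All type names are hypothetical. EagerServices pulls in ExpensiveCatalog as soon as any of its static members is touched, even an unrelated Title field, while LazyServices moves that reference into a nested holder class whose static constructor runs only on first use. Explicit static constructors are used so the CLR initializes each type exactly at first access rather than at a time of its choosing.

```csharp
using System;

// Hypothetical counter, kept in its own class so reading it triggers nothing.
public static class InitLog
{
    public static int CatalogLoads;
}

// Hypothetical stand-in for expensive startup state (imagine disk or network I/O).
public static class ExpensiveCatalog
{
    public static readonly string[] Items;

    static ExpensiveCatalog()
    {
        InitLog.CatalogLoads++;
        Items = new[] { "a", "b", "c" };
    }
}

// Eager: the first use of ANY static member (even Title) runs this static
// constructor, which in turn runs ExpensiveCatalog's static constructor.
public static class EagerServices
{
    public static readonly string Title;
    public static readonly int CatalogSize;

    static EagerServices()
    {
        Title = "My App";
        CatalogSize = ExpensiveCatalog.Items.Length;
    }
}

// Deferred: the expensive reference lives in a nested holder class whose
// static constructor runs only when Catalog is first read.
public static class LazyServices
{
    public static readonly string Title;

    static LazyServices() { Title = "My App"; }

    private static class Holder
    {
        internal static readonly string[] Catalog;
        static Holder() { Catalog = ExpensiveCatalog.Items; }
    }

    public static string[] Catalog { get { return Holder.Catalog; } }
}
```

On .NET Framework 4 and later, System.Lazy<T> is another way to express the same deferral.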
6. Consider avoiding Application Configuration
· For example, if your app has simple configuration requirements and strict startup time goals, registry entries or a simple INI file might be a faster startup alternative.
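As an illustration of the "simple INI file" alternative, here is a minimal key=value reader that uses only System.IO and a dictionary. The file format (one key=value pair per line, with '#' or ';' comment lines) is an assumption chosen for the sketch; this is not a full INI parser (no sections) and SimpleSettings is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SimpleSettings
{
    // Reads "key=value" lines into a case-insensitive dictionary.
    public static Dictionary<string, string> Load(string path)
    {
        var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (!File.Exists(path))
            return settings;                      // missing file: caller uses defaults

        foreach (string rawLine in File.ReadAllLines(path))
        {
            string line = rawLine.Trim();
            if (line.Length == 0 || line[0] == '#' || line[0] == ';')
                continue;                         // skip blank lines and comments

            int eq = line.IndexOf('=');
            if (eq <= 0)
                continue;                         // ignore lines without "key="

            settings[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
        }
        return settings;
    }
}
```

Whether this is actually faster than application configuration in your app is something to measure, as with the other suggestions here.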
7. Place strong-named assemblies in the GAC
If a strong-named assembly is not installed in the Global Assembly Cache (GAC), you pay the cost of verifying its strong-name hash, along with native code generation (NGEN) image validation if a native image for that assembly is available on the machine. Strong-name verification is skipped for all assemblies installed in the GAC.
8. Measure, and consider NGEN-ing your app
There is a subtle tradeoff here. Using NGEN means trading CPU consumption for more disk access, since the native image generated by NGEN is likely to be larger than the MSIL image (~3x for 32-bit and ~6x for 64-bit).
To improve warm startup, it is always recommended to NGEN your app; it simply avoids the CPU cost of JIT-compiling the app code.
In certain cold start scenarios NGEN can also help, because the JIT compiler (mscorjit.dll) does not need to be loaded. However, as mentioned above, these gains may be offset by loading larger NGEN'd assemblies, so you need to measure the effect on cold start in your own app.
Having a mix of NGEN'd and JIT-compiled modules can have the worst effect: mscorjit.dll needs to be loaded, and when code is JIT-compiled, many pages in the NGEN images need to be accessed as the JIT compiler reads the assemblies' metadata.
It is also important to understand the effect of rebasing, which is discussed in more detail below.
The way you plan to deploy your application can also make a difference. ClickOnce deployment does not support NGEN, so if you decide to NGEN your application you will need to use another deployment mechanism, such as MSI.
9. Avoid rebasing and DLL address collisions
If you use NGEN, you need to be aware that rebasing can occur when the native images are loaded in memory. If a DLL cannot load at its preferred base address (because that address range is already allocated to another module or allocation), the OS loader loads it at another address and has to apply relocations to its pages, which can be a very expensive operation.
You can use VAdump to check whether there are modules where all the pages are private. If so, the module might have been rebased to a different address, and thus its pages cannot be shared.
More details on how to set the base address are available in the NGEN article.
10. Understand Authenticode
Authenticode verification will hurt startup time.
Authenticode-signed assemblies need to be verified with the certificate authority (CA). This verification can be time intensive, as it can require hitting the network several times to download up-to-date certificate revocation lists, and also to ensure that there is a full chain of valid certificates leading to a trusted root. This can translate to a delay of up to several seconds while that assembly is being loaded.
Consider installing the CA certificate on the client machine, or avoid using Authenticode when possible.
If you know that your application doesn't need the Publisher evidence, you don't want to pay the cost of the signature verification.
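In .NET Framework 3.5 there is a configuration option that allows bypassing the Authenticode verification. This can be done by adding the following lines to the application's .exe.config file:
<configuration>
  <runtime>
    <generatePublisherEvidence enabled="false"/>
  </runtime>
</configuration>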
More information is available here as well as on this blog.
KB936707 discusses how you can also enable this in .NET Framework 2.0.
11. Windows Vista vs. Windows XP
SuperFetch is a memory management technology in Windows Vista; it is not built into Windows XP. SuperFetch analyzes memory usage patterns over time to determine the optimal memory content for a given user, and works continuously to maintain that content at any given date or time of day. This differs from the prefetch technique used in Microsoft Windows XP, which preloads data into memory without analyzing usage patterns. Over time, if the end user runs a WPF application frequently, its cold start time can improve.
SuperFetch also automatically recognizes and uses any additional capacity afforded by nonvolatile flash storage devices enhanced for ReadyBoost and ReadyDrive. In effect, SuperFetch improves the performance of the storage layer of the computer's memory. See the Vista accelerators link below for more information on SuperFetch and prefetch.
12. Understand the impact of AppDomains
· Load assemblies as domain neutral if possible, to ensure that the native image, if one exists, gets used in all AppDomains created in the application.
· Ensure efficient cross-AppDomain communication.
Reduce cross-AppDomain calls; when calls are needed, those with no arguments or with simple primitive-type arguments offer the best performance.
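Domain-neutral loading is requested with LoaderOptimizationAttribute on the application's entry point. A minimal sketch (EntryPoint is a hypothetical class name): on .NET Framework, MultiDomainHost shares only strong-named assemblies installed in the GAC across AppDomains, while MultiDomain shares all assemblies. The attribute only has an effect on .NET Framework; newer runtimes with a single AppDomain ignore it.

```csharp
using System;

public static class EntryPoint
{
    // Ask the .NET Framework loader to load GAC assemblies domain-neutral,
    // so their code (and native images) can be shared by every AppDomain.
    [LoaderOptimization(LoaderOptimization.MultiDomainHost)]
    public static void Main()
    {
        Console.WriteLine("Started with MultiDomainHost loader optimization.");
    }
}
```

For a WPF application the attribute would go on the Main method you supply yourself, since it must decorate the process entry point.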
13. Use NeutralResourcesLanguageAttribute
Use NeutralResourcesLanguageAttribute to tell the ResourceManager what the neutral culture is, so it can avoid unsuccessful satellite assembly lookups.
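The attribute is declared once per assembly. In this sketch "en-US" is an example value: use the culture your default (main-assembly) resources are actually written in. When the current UI culture matches it, the ResourceManager uses the main assembly's resources without probing for a satellite assembly. ResourceInfo is a hypothetical helper that only reads the attribute back.

```csharp
using System;
using System.Reflection;
using System.Resources;

// The resources embedded in this assembly are the en-US resources.
[assembly: NeutralResourcesLanguage("en-US")]

public static class ResourceInfo
{
    // Returns the neutral culture declared for this assembly, or null if none.
    public static string NeutralCulture()
    {
        NeutralResourcesLanguageAttribute attr = typeof(ResourceInfo).Assembly
            .GetCustomAttribute<NeutralResourcesLanguageAttribute>();
        return attr == null ? null : attr.CultureName;
    }
}
```

In project systems that can generate this attribute from a project setting, declare it in only one place to avoid a duplicate-attribute build error.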
· If you must use serialization, it is better if you use the BinaryFormatter class instead of the XmlSerializer class. BinaryFormatter is implemented in the Base Class Library (BCL), or mscorlib.dll. XmlSerializer is implemented in System.Xml.dll, which might represent an additional DLL to load in some scenarios.
If you must use the XmlSerializer, you can achieve better performance if you pre-generate the serialization assembly.
14. Understand the implications of ClickOnce
- If your application uses ClickOnce, to avoid network calls on startup, consider configuring ClickOnce to check the deployment site for updates after the application starts rather than before.
- Depending on your scenario, consider using the ClickOnce deployment APIs to download only the app modules you need for startup first, and postpone downloading the larger modules of your app to a later stage.
- If you use XBAP, understand that ClickOnce will check the deployment site for updates even if the XBAP is already in the ClickOnce cache.
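The on-demand download in the second point can be sketched with the System.Deployment.Application API, which is available only on .NET Framework (reference System.Deployment.dll). The file group name is a hypothetical example: it has to match an optional file group defined in your application manifest. OptionalDownloads is likewise a hypothetical helper name.

```csharp
using System.Deployment.Application;

public static class OptionalDownloads
{
    // Starts a background download of an optional ClickOnce file group.
    // Returns false when the app was not launched through ClickOnce or the
    // group is already on disk, so there is nothing to do.
    public static bool StartDownload(string groupName)
    {
        if (!ApplicationDeployment.IsNetworkDeployed)
            return false;

        ApplicationDeployment deployment = ApplicationDeployment.CurrentDeployment;
        if (deployment.IsFileGroupDownloaded(groupName))
            return false;

        // Asynchronous: subscribe to DownloadFileGroupCompleted to learn when it finishes.
        deployment.DownloadFileGroupAsync(groupName);
        return true;
    }
}
```

A natural place to call StartDownload("Reports") is after the main window has rendered, so the download does not compete with startup.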
15. Understand the PresentationFontCache service
The first WPF application to run starts this service. Although the service, which caches system fonts, improves font access and overall performance, there is general overhead in starting a service. In certain controlled environments, you may want to consider configuring the service to start automatically when the machine boots.
16. Data Binding. Instead of using XAML to declaratively set the DataContext for the main window, consider setting it programmatically in an Application.OnActivated override.
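A minimal sketch of the suggestion above, assuming a code-defined App class and a hypothetical MainViewModel type. Because Activated fires every time the application regains focus, the sketch guards the assignment so it happens only once.

```csharp
using System;
using System.Windows;

// Hypothetical view model used only for illustration.
public class MainViewModel
{
    public string Greeting { get { return "Hello"; } }
}

public class App : Application
{
    private bool _dataContextSet;

    protected override void OnActivated(EventArgs e)
    {
        base.OnActivated(e);

        // OnActivated runs on every activation; assign the DataContext only once.
        if (!_dataContextSet && MainWindow != null)
        {
            MainWindow.DataContext = new MainViewModel();
            _dataContextSet = true;
        }
    }
}
```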
17. Use BAML instead of loose XAML
By default, XAML in Visual Studio/Blend projects is compiled into a combination of IL and BAML (a binary version of the XAML). This makes it easier to integrate XAML with code (a .xaml.cs/.xaml.vb file). The generated IL wires up events and fields so that you can refer to named elements from your code-behind file.
Sometimes, using XAML in this precompiled way doesn’t work for people.
When not to use loose XAML:
· Sometimes, people attempt to dynamically generate UI on a server with an .aspx/.php page, as they are used to doing with HTML. This causes a few problems, such as:
1. Parsing XAML is slower than loading BAML: because BAML is pre-tokenized, it loads faster.
2. Code integration with loose XAML isn't as easy. You can still use FindName to find named elements and wire up your own events. Ideally, the server would send data to a rich app instead of the complete UI described in XAML, and the app would dynamically generate the right UI (via code or data binding).
When it is OK to use loose XAML:
· Sometimes, people want to load/save some data to a XAML file (they effectively use it as a configuration file).
· Sometimes, people want to load/save data in a XAML file that serves as the file format for their application, where the user's documents are represented in XAML. Sometimes that XAML contains WPF tags. Often, it is a custom XAML vocabulary, created by defining a set of CLR types that work well with XamlReader/XamlWriter.
· Visual inheritance: today you can't use markup compilation on a XAML file if its root element type was itself defined by another XAML file; this is currently blocked in the BAML code paths. You can achieve something similar by using loose XAML to define the UI for both the first and the second UserControl, and using XamlReader.Load instead of Application.LoadComponent (which the generated .g.cs file uses to load the BAML).
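For the cases where loose XAML is appropriate, this sketch shows what replaces the compiled code-behind: the markup is parsed at run time with XamlReader.Parse, named elements are located with FindName, and events are wired by hand. The markup string and the OkButton name are illustrative; like all WPF element creation, Build must run on an STA thread.

```csharp
using System;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Markup;

public static class LooseXamlDemo
{
    // Loose XAML must declare the WPF namespaces itself and has no code-behind.
    private const string Markup =
        "<StackPanel xmlns='http://schemas.microsoft.com/winfx/2006/xaml/presentation'" +
        "            xmlns:x='http://schemas.microsoft.com/winfx/2006/xaml'>" +
        "  <Button x:Name='OkButton' Content='OK' />" +
        "</StackPanel>";

    public static StackPanel Build()
    {
        // Parsed at run time: slower than precompiled BAML.
        var panel = (StackPanel)XamlReader.Parse(Markup);

        // No generated field for OkButton, so look it up in the root's name scope.
        var ok = (Button)panel.FindName("OkButton");
        ok.Click += (sender, e) => MessageBox.Show("Clicked");
        return panel;
    }
}
```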
In general, BAML is better because:
· It is pre-tokenized
· It is about the same speed as JIT-compiled code when building UI
· It doesn’t need to load System.Xml.dll – loading less code for your process is good.
See more here: http://www.windows-now.com/blogs/rrelyea/archive/2004/01/31/2306.aspx
18. Avoid serialization during startup, and use the SGEN tool for XML serialization
When you use an XmlSerializer, a helper assembly with a custom type for serializing your type is created. The C# CodeDom is used to generate this code, which then is compiled by the C# compiler. The runtime must then load this generated assembly, and JIT the code. Additionally, a separate assembly is generated for each type you create an XmlSerializer for, each of which requires an invocation of the C# compiler. This causes a large performance hit that should be avoided (not only at startup, but in general).
With .NET Framework 2.0 there is a new SDK tool (Sgen.exe) that creates serialization helper assemblies ahead of time, which greatly improves performance.
Bill Wert's blog discusses this in more detail.
Note that the Visual Studio UI for this setting ("Generate serialization assembly" on the project's Build properties page) can be confusing: it generates serialization code only for the XML Web service proxy types, not for all of your app's custom types.
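When serialization cannot be avoided, it also helps to pay the serializer construction cost once, on first use rather than during startup. A minimal sketch with a hypothetical WindowSettings type: the XmlSerializer is created lazily and reused. The XmlSerializer(Type) constructor used here is the form that can pick up a pre-generated serialization assembly when one is deployed next to the app; the lazy field is not thread-safe, so the sketch assumes it is used from the UI thread only.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical settings type used only for illustration.
public class WindowSettings
{
    public int Width { get; set; }
    public int Height { get; set; }
}

public static class SettingsSerializer
{
    // Constructing an XmlSerializer is the expensive step (it may generate,
    // compile, and load a helper assembly), so do it once, on first use.
    private static XmlSerializer _serializer;

    private static XmlSerializer Serializer
    {
        get
        {
            if (_serializer == null)
                _serializer = new XmlSerializer(typeof(WindowSettings));
            return _serializer;
        }
    }

    public static string ToXml(WindowSettings settings)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, settings);
            return writer.ToString();
        }
    }

    public static WindowSettings FromXml(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (WindowSettings)Serializer.Deserialize(reader);
        }
    }
}
```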
Related documents and more details are provided in the following links:
· How to improve startup: http://msdn.microsoft.com/msdnmag/issues/06/02/CLRInsideOut
· Improving .Net application performance: http://msdn2.microsoft.com/en-us/library/ms998530.aspx
· Optimizing WPF application performance: http://msdn2.microsoft.com/en-gb/library/aa970683.aspx
· All about NGEN performance: http://msdn.microsoft.com/msdnmag/issues/06/05/CLRInsideOut
· How to investigate memory issues: http://msdn.microsoft.com/msdnmag/issues/06/11/CLRInsideOut
· Vista accelerators and superfetch: http://www.microsoft.com/whdc/system/sysperf/perfaccel.mspx
· Measure perf in Vista: http://www.microsoft.com/whdc/system/sysperf/Vista_perf.mspx
Below is a list of the performance improvements that you can expect to see in WPF in .NET 3.5 and .NET 3.0 SP1:
Layered window performance was one of the top customer complaints. It is now addressed with a QFE/hotfix.
These hotfixes are not available for direct download. End users should call Microsoft PSS to obtain them.
Developers and OEMs can contact Microsoft PSS to discuss redistribution rights.
The hotfixes will be included in the next service pack of Microsoft's operating systems.
For Windows Vista:
§ See: http://support.microsoft.com/kb/938660
§ Hotfix is forthcoming in Vista SP1 (fixes are in milcore.dll)
• The hotfix improves read-back from video memory to system memory, so how noticeable the improvement is depends on machine characteristics such as the video bus speed and bus type (e.g. expect better performance on PCIe than on AGP).
For Windows XP/Server 2003:
§ Layered windows were rendered in software, which had performance issues. The hotfix addresses this: on XP, layered windows render using hardware acceleration (if the hardware supports it). If so, you should see a significant performance improvement with this hotfix.
§ See http://support.microsoft.com/kb/937106/en-us. (fixes are in d3d9.dll)
§ Hotfix is slated in the upcoming Windows XP SP3
Other Layered Window improvements:
.Net 3.0 SP1 fixes a performance issue that is related to user switching.
A Layered window app running under another user profile caused higher CPU usage. This is now fixed for both XP & Vista.
Improvements should be noticeable in applications that bind to an XML source.
This improvement mainly relates to how WPF reacts to changes in the XML source tree.
We made small incremental improvements to cold and warm startup time.
We made incremental improvements to working set and hit testing under certain scenarios.
6. New Software rendering API
As reported, depending on the machine configuration and the application, software-based rendering is sometimes faster than hardware rendering.
A new API is now available that allows developers to force software rendering in their application (per window) instead of using the GPU.
This should provide developers a much better alternative than setting the global 'Disable HW Acceleration' registry key (See: http://msdn2.microsoft.com/en-us/library/aa970912.aspx)
Here is an example of how to use this API, as a Loaded event handler in a Window subclass:
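// Requires: using System.Windows; using System.Windows.Interop;
private void OnLoaded(object sender, RoutedEventArgs e)
{
    HwndSource hwndSource = PresentationSource.FromVisual(this) as HwndSource;
    if (hwndSource == null) return;   // window is not yet backed by an HWND
    HwndTarget hwndTarget = hwndSource.CompositionTarget;
    // This is the new WPF API to force the render mode for this window.
    hwndTarget.RenderMode = RenderMode.SoftwareOnly;
}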
This could improve rendering performance for certain scenarios and machine configurations; in most cases, however, hardware rendering should perform better. Please use it carefully and verify the effect with your app and machine configuration.
7. Other notable Graphics perf improvements
8. Battery life improvement
This is addressed in WPF 3.0 SP1.
It is important to note that the app developer can work around this issue by:
· Not using DecelerationRatio / AccelerationRatio in the app, or
· Explicitly stopping the animation clock.
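The second workaround can be sketched by driving the animation through an explicit AnimationClock and stopping that clock once it completes. FadeIn is a hypothetical helper, and the 300 ms duration and 0.5 DecelerationRatio are arbitrary example values. Stopping a clock removes its effect on the property, so the sketch first sets the final opacity as the element's local value to avoid a visual jump.

```csharp
using System;
using System.Windows;
using System.Windows.Media.Animation;

public static class FadeIn
{
    public static void Start(UIElement element)
    {
        var animation = new DoubleAnimation(0.0, 1.0, TimeSpan.FromMilliseconds(300))
        {
            DecelerationRatio = 0.5
        };

        // CreateClock() returns a controllable root clock.
        AnimationClock clock = animation.CreateClock();
        clock.Completed += (sender, e) =>
        {
            element.Opacity = 1.0;      // keep the final value once the clock stops
            clock.Controller.Stop();    // explicitly stop the animation clock
        };

        element.ApplyAnimationClock(UIElement.OpacityProperty, clock);
    }
}
```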