This document contains answers to the most frequently asked questions about building well performing applications with the .NET Compact Framework.
This is a live document, so please send us your feedback, new questions and answers.
Virtual calls are ~40% slower than static or instance method calls. .NET Compact Framework interprets virtual calls instead of using fixed vtables because of the working set cost that comes with maintaining space in a table for methods that may never be called. When interpreting a virtual call, .NET Compact Framework walks the class and interface hierarchy looking for the requested method by name and call signature. Looking up calls in this way is an expensive operation compared to indexing directly into a vtable. However, a cache of resolved virtual calls is maintained so the lookup only happens once in most cases. In .NET Compact Framework version 1.0 the cache of resolved virtual methods had a fixed size, which yielded acceptable performance in most cases, but the fixed nature of the cache was not 100% efficient because of cache misses. The efficiency of the cache is further improved in version 2.0. The cache is now a variable size and has nearly a 100% hit rate.
The overhead of a virtual call is relative to the amount of work performed in a single call and is more noticeable for small methods. Note that JIT compiler optimizations are typically not applicable to virtual calls. Specifically, virtual calls are never inlined. So, for example, a virtual call to a simple property getter would be significantly more expensive than a non-virtual call. The general guidance is to avoid using virtual calls where they aren’t necessary. If you do need to use a virtual call, do as much work as possible in a single virtual call to minimize the relative overhead of doing multiple calls into simple methods.
It always helps to analyze the IL code of performance-critical functions. Normally compilers are pretty good about optimizing virtual calls. Even if a method is declared virtual, the compiler may generate a non-virtual call IL instruction when the call target can be resolved at compile time.
It’s also important to understand the difference between the callvirt IL instruction and an actual virtual call at run time. The callvirt instruction itself isn’t necessarily bad and sometimes can’t be avoided (C# likes to use it). If the JIT compiler can figure out the ultimate destination of such a call at JIT-compile time (for example, this can happen if a method is final (sealed) or a class is sealed), a callvirt is no more expensive than a regular instance call and can even be inlined.
That said, avoiding virtual calls should not be a motivation for poorly architected applications. Virtual calls are typically only a major performance issue for small, very frequently called methods.
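As a minimal sketch of the sealed-class case described above, the following example uses hypothetical types (Shape, Square and ShapeDemo are names invented for this illustration). Whether the JIT compiler actually resolves or inlines the call depends on the runtime version, so treat this as a pattern to measure rather than a guarantee.

```csharp
using System;

// Hypothetical types, named only for this illustration.
abstract class Shape
{
    public abstract int Area();
}

// Sealing the class means Area() cannot be overridden further, so a call
// through a Square reference has exactly one possible target.
sealed class Square : Shape
{
    private readonly int side;

    public Square(int side) { this.side = side; }

    public override int Area() { return side * side; }
}

static class ShapeDemo
{
    // The array is typed as the sealed class rather than the base class,
    // which gives the JIT compiler a chance to resolve the call target.
    public static int TotalArea(Square[] squares)
    {
        int total = 0;
        for (int i = 0; i < squares.Length; i++)
        {
            total += squares[i].Area();
        }
        return total;
    }
}
```

The design choice here is to keep the virtual method in the base class for flexibility while letting hot loops work against the sealed type.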
Property getters and setters are methods. That’s why using properties is normally more expensive than accessing fields directly. Simple property access can be inlined by the JIT compiler, but no assumptions should be made about this. For example, in .NET Compact Framework version 1.0 property setters were never inlined. Virtual properties are particularly expensive because virtual call overhead is added and virtual methods are never inlined. Accessing fields directly normally results in better performance.
Having a mental model of the relative costs of various types of method calls can help you make design tradeoffs that will result in better performing applications. The following bullet points summarize how the costs of various types of calls relate:
If Equals() and GetHashCode() are not overridden, the implementations of these methods as defined in the parent class ValueType are used. Because these implementations must work for any valuetype, they perform boxing and use reflection to get information about your type. This approach is not only less precise, but also much slower than a dedicated override. Providing a more precise and efficient implementation of Equals() and GetHashCode() for your concrete type will perform much better than the general implementations supplied by ValueType.
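A minimal sketch of such overrides is shown below. Point2D is a hypothetical value type invented for this example, and the hash combination is just one simple choice, not a recommendation from the original text.

```csharp
using System;

// Hypothetical value type used only for illustration.
struct Point2D
{
    public int X;
    public int Y;

    public Point2D(int x, int y) { X = x; Y = y; }

    // Strongly typed overload: comparing two Point2D values does not box.
    public bool Equals(Point2D other)
    {
        return X == other.X && Y == other.Y;
    }

    // Override of the general-purpose method, replacing the
    // reflection-based ValueType implementation.
    public override bool Equals(object obj)
    {
        return obj is Point2D && Equals((Point2D)obj);
    }

    // Combines the fields directly instead of relying on
    // ValueType.GetHashCode().
    public override int GetHashCode()
    {
        return (X * 397) ^ Y;
    }
}
```

Equal values must produce equal hash codes, which holds here because both methods use exactly the same fields.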
P/Invoke (Platform Invoke) and COM interop calls in .NET Compact Framework are significantly (~5-6 times) slower than regular managed calls. Although the overall performance penalty largely depends on the types marshaled between managed and native code (marshaling overhead), there is also a common overhead, primarily due to some internal work preceding and following every platform call. This work is needed to notify the runtime that the call must be GC (Garbage Collector) preemptable, so that the GC is not blocked until the interop call completes. This is why it's important to maximize the amount of work performed inside each interop call and avoid multiple frequent invocations.
As described above, the overall cost of an interop call is largely dependent upon the number and type of the parameters that must be marshaled between managed and native code. The following points provide some guidance on how to make your interop calls more efficient.
General Guidelines
Guidelines for P/Invoke (calling from managed code to native code)
Guidelines for COM interop (calling from managed code to native code)
Yes. The .NET Compact Framework interns static strings at JIT time. You can also explicitly force interning of an arbitrary string using the String.Intern() method. The C# compiler uses this mechanism to force interning of the strings used in a switch statement. If you compare some string against one or more static strings many times, or compare some set of strings against each other repeatedly, you may benefit from string interning. There is a shortcut in the string equality check, which attempts to compare object references first, before doing a character-by-character comparison. So, for matching interned strings the object references will match immediately. However, you should be aware that string interning incurs some additional cost. Specifically, the memory used to store interned strings is not freed until the AppDomain is shut down, and extra time is required to intern the string (even if the string is already interned). So, don’t use explicit string interning by default; rather, use it only when your own performance measurements show that it helps.
Yes you can, although it may not be obvious and it’s hard to do in many cases. Please keep in mind that any attempt to take advantage of JIT optimizations should not be a motivation or excuse for code that is poorly architected and hard to maintain.
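The reference-equality shortcut for interned strings can be illustrated with a small sketch. InternDemo and its method names are hypothetical; the example only shows that two equal strings built at run time become the same object once interned, and says nothing about whether interning pays off in your application.

```csharp
using System;
using System.Text;

static class InternDemo
{
    // Builds a string at run time so that it is a distinct object,
    // not a literal interned by the JIT compiler.
    public static string BuildAtRunTime(string prefix, string suffix)
    {
        return new StringBuilder().Append(prefix).Append(suffix).ToString();
    }

    // Returns true when both strings map to the same interned object.
    public static bool SameReferenceAfterIntern(string a, string b)
    {
        string internedA = String.Intern(a);
        string internedB = String.Intern(b);
        return Object.ReferenceEquals(internedA, internedB);
    }
}
```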
Method Inlining
The .NET Compact Framework JIT compiler will inline simple methods to eliminate the cost associated with a method call. Inlining involves replacing the method’s arguments with the values passed at call time, and eliminating the call completely.
The inlining rules differ from version to version of the Compact Framework. In version 1.0, only very simple functions that returned a field from their “this” argument or a constant value could be inlined. In version 2.0, the rules are more generous but are still severely limited. In order to be inlined, a method must have:
- 16 bytes of IL or less
- No branching (typically an “if”)
- No local variables
- No exception handlers
- No 32-bit floating point arguments or return value
- If the method has more than one argument, the arguments must be accessed in order from lowest to highest (as seen in the IL)
Typically, this limits inlining to property getter/setters and methods that simply call another method, perhaps adding another argument (as often used for method overloads).
Also remember that inlining never occurs if you are running under a debugger.
In general, it is not possible to predict with 100% accuracy whether a method will be inlined or to confirm that one has been. However, there are some factors that make inlining impossible (virtual calls, exception handlers, etc.), so it might be useful to keep performance-critical methods as simple as possible to give them a better chance of being inlined.
Enregistration
Enregistration is new to .NET Compact Framework version 2.0. The JIT compiler will try to use CPU registers when possible to store 32-bit variables such as locals and method arguments (32-bit integers, object references, etc.). 8- and 16-bit integers can also be enregistered and are almost as efficient as 32-bit ints, but sometimes additional conversions need to be added which make them less optimal. An enum is treated the same as its underlying type (by default a 32-bit int) for code generation purposes. Note that variables which are more than 32 bits in size are never enregistered. That’s one of the reasons why 64-bit math is significantly slower than 32-bit math in .NET Compact Framework. So try to stick with 32-bit values where it makes sense. As there is only a small number of registers, the fewer variables you have, the better the chance they will get enregistered. Try to re-use a variable when possible instead of adding a new one.
The cost of performing a garbage collection is a function of the number of live reference types your application has allocated. Each time a collection occurs, the GC traverses the graph of objects looking for those that aren’t referenced anymore. Objects that are no longer referenced are marked and then later freed. Keep in mind that those objects that have finalizers are not immediately freed. Instead, they are placed on a finalization queue where their finalizers get run by a background thread. These objects are then freed the next time the GC runs.
You can determine how much time the GC spends doing collections by looking at the “GC Latency Time” performance counter in mscoree.stat (see Developing Well Performing .NET Compact Framework Applications for details on how to use the counters provided by the .NET Compact Framework).
When looking at how many managed objects your application is creating, remember that operations like boxing and some string manipulations will cause managed objects to be created where it might not be immediately obvious. These implicitly created objects are often greater in number than those you explicitly create yourself.
The following example demonstrates how managed objects can be created in places you might not expect. Consider the following class which uses a Hashtable to map integer thread identifiers to instances of a ThreadInfo object:
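using System.Collections;

class ThreadViewer
{
    Hashtable hashTable;

    public ThreadViewer()
    {
        hashTable = new Hashtable();
    }

    public ThreadInfo FindThread(int ThreadId)
    {
        // ThreadId is boxed here because the indexer takes an object key
        return (ThreadInfo) hashTable[ThreadId];
    }
}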
The FindThread method in this example returns the ThreadInfo object at the index indicated by ThreadId. Because the Hashtable class must serve as a general purpose hash table, its [] operator is defined as accepting a parameter of type Object:
public class Hashtable
{
    public object this[object key] { get; set; }
}
As a result, each time the integer ThreadId is used to access an entry in the Hashtable, that integer is boxed, thereby creating a managed object. If this operation is performed frequently in your application you may end up creating thousands of these short-lived objects. In addition to the memory they consume, these objects will also increase the time it takes to perform a garbage collection.
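One way to avoid the boxing in this lookup, sketched here under the assumption that you are targeting version 2.0 and can use generic collections, is to key the table on int directly. TypedThreadViewer is a hypothetical name, and the ThreadInfo class below is only a placeholder for the type mentioned in the text.

```csharp
using System.Collections.Generic;

// Placeholder for the ThreadInfo type used in the text; its members are assumed.
class ThreadInfo
{
    public string Name;

    public ThreadInfo(string name) { Name = name; }
}

class TypedThreadViewer
{
    // The key type is int, so lookups do not box the thread identifier.
    private readonly Dictionary<int, ThreadInfo> threads =
        new Dictionary<int, ThreadInfo>();

    public void AddThread(int threadId, ThreadInfo info)
    {
        threads[threadId] = info;
    }

    // Returns null when the identifier is unknown.
    public ThreadInfo FindThread(int threadId)
    {
        ThreadInfo info;
        return threads.TryGetValue(threadId, out info) ? info : null;
    }
}
```

Each distinct generic instantiation adds JITed code, so this trades a little code size for fewer short-lived objects.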
Various string manipulations can create additional objects as well. Instances of the string class are immutable, so a new string object is created every time you attempt to modify the string through operations like concatenation.
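The difference between repeated concatenation and a single buffer can be sketched as follows. JoinDemo is a hypothetical helper; StringBuilder is not mentioned in the text above and is shown here only as one common way to reduce the number of intermediate string objects.

```csharp
using System.Text;

static class JoinDemo
{
    // Concatenation in a loop: each += allocates a new string object.
    public static string JoinWithConcat(string[] parts)
    {
        string result = "";
        for (int i = 0; i < parts.Length; i++)
        {
            result += parts[i];
        }
        return result;
    }

    // StringBuilder appends into an internal buffer and creates the
    // final string once, in ToString().
    public static string JoinWithBuilder(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            sb.Append(parts[i]);
        }
        return sb.ToString();
    }
}
```

Both methods return the same text; only the number of temporary objects differs, which is what matters for GC cost.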
The .NET Compact Framework can be configured to log a variety of performance-related statistics as an application is running. In version 1.0 of the .NET Compact Framework, these statistics are written to a file called mscoree.stat when the application terminates. See Developing Well Performing .NET Compact Framework Applications for more details on how to enable mscoree.stat.
Several improvements have been made to the mscoree.stat logs in version 2.0 of the .NET Compact Framework. In addition to several new counters, the logs can now be emitted at intervals as the application is running, instead of only when the application shuts down. The addition of dynamic logging makes it possible to build graphical tools that can be used to monitor an application’s performance in real time.
There are many powerful usability features in the .NET Compact Framework Base Class Library (BCL) which make writing code much easier. However, these features are very general and may not be optimized for a particular user scenario, so it may not be appropriate to use some of these features in every context. There is often a performance and working set penalty for abstraction and flexibility. This penalty is much more severe in the constrained environment of devices, as they just don’t have the computing power of desktop machines. It’s very important to use these powerful features in an optimal manner, and sometimes it’s better to switch to an optimized custom implementation instead of using a general-purpose one from the BCL.
The ThreadPool generally results in better performance if your work items have a relatively short lifetime. So if you typically create threads just to run small asynchronous tasks, you’ll get better performance by performing those tasks using the ThreadPool. In these scenarios, the ThreadPool is more efficient primarily because it avoids the overhead of creating and destroying individual threads. Also, because your work items are short in duration, you shouldn’t have to wait for a thread in the pool to become available.
On the other hand, it’s advisable to create dedicated Thread objects if your threads have a long lifetime, if a thread might be blocked for a long time (to avoid prolonged occupation of one of the ThreadPool threads), or if a thread needs to run at a different priority. Adjusting the priority of ThreadPool threads can be dangerous if the priority isn’t properly restored when you’re done.
Also, if you need to run a large number of work items and you don’t need concurrency, you may choose to create a dedicated Thread and re-use it to do the work. The .NET Compact Framework will try to create a new worker thread in the ThreadPool (up to a certain limit – 25 by default in version 2.0, 256 in version 1.0), if no worker thread is available to process your work immediately. By re-using your own thread you avoid a potential spike in the number of ThreadPool threads.
If you know the exact format used for DateTime serialization, always specify it for parsing. Use DateTime.ParseExact(). Otherwise, the DateTime parser will sequentially try to apply a variety of culture-specific formats trying to make sense of your string, as it doesn’t have any hints about which format was used. The same practice can be applied to numeric parsing, which is not as slow as DateTime parsing, but can still benefit from specifying a particular numeric format if you’re not using the default format.
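For the short-lived work item case, a minimal sketch of queuing work to the ThreadPool might look like this. PoolDemo is a hypothetical helper, and the wait at the end exists only to make the example deterministic; a real application would normally continue doing other work instead of blocking.

```csharp
using System.Threading;

static class PoolDemo
{
    // Queues a short work item on a ThreadPool thread and waits for it.
    // Returns the value computed on the pool thread.
    public static int SquareOnPool(int value)
    {
        int result = 0;
        ManualResetEvent done = new ManualResetEvent(false);

        ThreadPool.QueueUserWorkItem(delegate(object state)
        {
            result = value * value;   // the short task itself
            done.Set();               // signal completion to the caller
        });

        done.WaitOne();
        return result;
    }
}
```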
Storing DateTime in binary form using ticks is usually the simplest and fastest way to store a DateTime, although this is not the recommended practice for local times.
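Both recommendations can be sketched in a few lines. DateDemo and the format string below are assumptions chosen for this example; substitute whatever format your data actually uses.

```csharp
using System;
using System.Globalization;

static class DateDemo
{
    // The exact format is an assumption made for this example.
    private const string Format = "yyyy-MM-dd HH:mm:ss";

    // ParseExact applies only the given format instead of probing
    // a list of culture-specific formats.
    public static DateTime ParseKnownFormat(string text)
    {
        return DateTime.ParseExact(text, Format, CultureInfo.InvariantCulture);
    }

    // Stores a DateTime as its tick count.
    public static long ToTicks(DateTime value)
    {
        return value.Ticks;
    }

    // Restores a DateTime from a stored tick count.
    public static DateTime FromTicks(long ticks)
    {
        return new DateTime(ticks);
    }
}
```

The tick round trip does not carry any time zone information, which is one reason the practice is not recommended for local times.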
The most common performance problems with using BCL collections include:
ArrayList al = new ArrayList(str_array);
foreach (String s in al)
{
    // do something
}
is compiled into:
IL_002b: callvirt instance class [mscorlib]System.Collections.IEnumerator [mscorlib]System.Collections.ArrayList::GetEnumerator()
IL_0030: stloc.s CS$5$0001
.try
{
IL_0032: br.s IL_004a
IL_0034: ldloc.s CS$5$0001
IL_0036: callvirt instance object [mscorlib]System.Collections.IEnumerator::get_Current()
IL_003b: castclass [mscorlib]System.String
...
IL_004a: ldloc.s CS$5$0001
IL_004c: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
IL_0051: stloc.s CS$4$0002
IL_0053: ldloc.s CS$4$0002
IL_0055: brtrue.s IL_0034
IL_0057: leave.s IL_0076
} // end .try
Use indexers, if the collection is based on an array as storage.
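The indexer recommendation can be sketched as a plain for loop over an ArrayList. IterationDemo is a hypothetical helper; the loop avoids the GetEnumerator, MoveNext and get_Current calls and the try block shown in the IL above, but the actual gain should be measured in your own scenario.

```csharp
using System.Collections;

static class IterationDemo
{
    // Sums string lengths using the ArrayList indexer instead of an enumerator.
    public static int TotalLength(ArrayList list)
    {
        int total = 0;
        int count = list.Count;            // read Count once, outside the loop
        for (int i = 0; i < count; i++)
        {
            string s = (string)list[i];    // indexer access; no IEnumerator calls
            total += s.Length;
        }
        return total;
    }
}
```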
BCL collections may not be optimized for the performance-critical operation your application performs most often, such as search or insert. We encourage developers to build optimized and strongly typed collections for a particular task if collection performance proves to be an issue.
Generic collections provide a way to avoid the boxing and unboxing overhead that comes with using valuetypes in collections. So there are some performance benefits. However, keep in mind that .NET Compact Framework version 2.0 implements generics with representation and JITed code specialization. This means that each distinct instantiation of generic type results in a separate execution engine data representation and JITed code specific to that instantiation. Thus, when using generics you need to be aware of the potential JITed code size impact. If there is a very large number of closed constructed types/methods per generic type/method definition, your application may start experiencing JITed code size pressure. As a result, keep in mind the extra performance hit of re-JITing the code. On the positive side, specialized JITed code is typically more efficient because exact type parameter information is always easily accessible. As with non-generic collections, generic collection types in the BCL may not be optimized for the type of operation your application will do most often (such as sorting, insert, etc.). So, the usual recommendation applies: for best performance write your own optimized collection classes.
If you are developing for the .NET Compact Framework version 1.0, use XmlTextReader and XmlTextWriter for parsing large XML documents or for serializing XML.
If you are using .NET Compact Framework version 2.0, use the factory methods of XmlReader and XmlWriter to create a properly optimized reader or writer. Concrete implementations such as XmlTextReader, XmlTextWriter, or XmlNodeReader should not be directly instantiated.
· The XmlReader.Create() and XmlWriter.Create() methods return an optimized XmlReader or XmlWriter, depending on the specified settings.
· The XmlReaderSettings and XmlWriterSettings classes are used to specify the features of the reader or writer. Use settings to improve performance.
· This reduces the need to understand how and when to use a specific reader or writer.
· Examples
Creating XmlReader:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Document;
settings.IgnoreWhitespace = true;
settings.IgnoreComments = true;
XmlReader reader = XmlReader.Create("foo.xml", settings);
Creating XmlWriter:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = " ";
XmlWriter writer = XmlWriter.Create("foo.xml", settings);
Yes. You can substantially improve the performance of XmlReader by constructing an optimized reader and applying proper XmlReaderSettings. The optimal set of options obviously depends on the structure of the XML you deal with, but in most cases setting XmlReaderSettings.IgnoreWhitespace to true results in a measurable performance gain (~30% on average), as there is typically a fair amount of whitespace in formatted XML documents. XmlReaderSettings.IgnoreComments can also be beneficial for comment-rich documents. Note that these options are not enabled by default. You must specify them explicitly by providing a configured instance of XmlReaderSettings.
The .NET Compact Framework implements decoders for UTF8, ASCII and UTF16 (big- and little-endian) encodings in managed code. All other encodings, such as all ANSI codepage encodings, involve a P/Invoke down to the operating system. So using UTF8, ASCII and UTF16 is usually faster. If you don’t use international characters (outside of the ASCII character set), use UTF8 or ASCII (these have approximately equal performance). Try to avoid using Windows codepage encodings.
If your content includes some non-ASCII characters, but most of it is ASCII, use UTF8 if the size of the resulting data is an important factor, for example when serializing or sending content across the network. Otherwise experiment and measure on a case-by-case basis and choose the optimal encoding for the task. For example, UTF-16 may take more space, but there is virtually no decoding work involved.
Not in general. In fact, the opposite is true. Using a schema would require using XmlValidatingReader, which performs additional validation work. Use a schema only if you aren’t sure about the structure of a document you are parsing. Also, having a schema is recommended if you intend to populate a DataSet from the XML data source.
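The size tradeoff between UTF-8 and UTF-16 for mostly-ASCII content can be checked with a small sketch. EncodingDemo is a hypothetical helper; it measures encoded size only and says nothing about decoding speed, which still has to be measured on the device.

```csharp
using System.Text;

static class EncodingDemo
{
    // Number of bytes the text occupies when encoded as UTF-8.
    public static int Utf8Size(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Number of bytes the text occupies when encoded as UTF-16.
    // Encoding.Unicode is little-endian UTF-16.
    public static int Utf16Size(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }
}
```

For the five-character ASCII string "hello", UTF-8 needs 5 bytes and UTF-16 needs 10; replacing one letter with a non-ASCII character such as é raises the UTF-8 size to 6 bytes while UTF-16 stays at 10.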
Serialization metadata for a given type is built when a corresponding XmlSerializer is created:
new XmlSerializer(typeof(MyType)); // metadata is built for type MyType
Building this metadata is expensive so the metadata is cached by the XmlSerializer. It is recommended that applications only create one XmlSerializer instance per type to reduce the amount of time spent searching for metadata. Use the “Singleton pattern”.
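One possible shape for the "one serializer per type" advice is a small cache keyed by type, sketched below. SerializerCache is a hypothetical helper, not part of the framework, and the lock is an assumption added so the cache can be shared between threads.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

static class SerializerCache
{
    private static readonly Dictionary<Type, XmlSerializer> cache =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object gate = new object();

    // Returns the single shared serializer for the given type,
    // creating it (and paying the metadata cost) only on first use.
    public static XmlSerializer For(Type type)
    {
        lock (gate)
        {
            XmlSerializer serializer;
            if (!cache.TryGetValue(type, out serializer))
            {
                serializer = new XmlSerializer(type);
                cache[type] = serializer;
            }
            return serializer;
        }
    }
}
```

Callers would write SerializerCache.For(typeof(MyType)) wherever they previously constructed a new XmlSerializer.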
If serializing several types, use FromTypes():
// Create the list of serializers
Type[] types = new Type[]{typeof(MyType1), typeof(MyType2)};
XmlSerializer[] serializers = XmlSerializer.FromTypes(types);
// Serialize an instance of MyType1
MyType1 mt1 = new MyType1();
serializers[0].Serialize(writer, mt1); // writer is a previously created stream or writer
Typically, you will not get optimal performance by transferring and parsing XML because using XML is memory, CPU, and network intensive. This is particularly noticeable on small devices, which are CPU and memory constrained. Consider building a custom binary serialization mechanism, using BinaryReader and BinaryWriter functionality to get better overall performance.
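A custom binary format along these lines can be as simple as writing fields in a fixed order. The sketch below uses a hypothetical Contact type and ContactCodec helper; the field layout is an assumption made for the example, and a real format would also need versioning and null handling.

```csharp
using System.IO;

// Hypothetical record type used only for illustration.
class Contact
{
    public string Name;
    public int Age;
}

static class ContactCodec
{
    // Writes the fields in a fixed order; the reader must use the same order.
    public static byte[] Write(Contact c)
    {
        MemoryStream stream = new MemoryStream();
        BinaryWriter writer = new BinaryWriter(stream);
        writer.Write(c.Name);   // length-prefixed string
        writer.Write(c.Age);    // 4-byte integer
        writer.Flush();
        return stream.ToArray();
    }

    // Reads the fields back in the order they were written.
    public static Contact Read(byte[] data)
    {
        BinaryReader reader = new BinaryReader(new MemoryStream(data));
        Contact c = new Contact();
        c.Name = reader.ReadString();
        c.Age = reader.ReadInt32();
        return c;
    }
}
```

A contact named "Ann" with age 30 encodes to 8 bytes (1 length byte, 3 name bytes, 4 integer bytes), considerably less than an equivalent XML element.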
More often than not, the problem boils down to what happens when NetCF sees the first web method call on an instance of the service object.
When the first web method on a service object is called, .NET Compact Framework uses Reflection to examine the service's proxy (to identify methods, headers, properties, etc). Unlike the full .NET Framework, .NET Compact Framework (for working set size reasons) does not cache the results of this examination. Because of this, applications incur a performance penalty if they use multiple instances of the same service. The following code example illustrates an application demonstrating this performance hit.
class SlowerWebServicePerformance
{
    public static void Main()
    {
        // application setup

        foreach (String name in Friends)
        {
            String phoneNumber = CallWebService(name);

            // process / display the received data
        }

        // application cleanup
    }

    public static string CallWebService(String name)
    {
        // create new instance of the web service proxy object
        PhoneBookService service = new PhoneBookService();

        // call the desired web method
        // proxy reflection occurs here
        return service.LookupPhoneNumber(name);
    }
}
In the above example, each call to CallWebService creates a new instance of the fictitious PhoneBookService object, and each call to the LookupPhoneNumber method causes NetCF to reflect over the service proxy code. In this example, users with fewer friends are better off than those with more -- at least as far as application performance goes.
To minimize the effects of this issue, applications can create a class global instance of their web service object and make a simple call to it (check version, etc) during the startup code. The code below is a re-write of the previous example, this time using a class global service object.
class FasterWebServicePerformance
{
    private static PhoneBookService service = new PhoneBookService();

    public static void Main()
    {
        // call a simple web service method to "prime the pump"
        // proxy reflection occurs here
        service.GetVersion();

        // application setup
    }

    public static string CallWebService(String name)
    {
        // call the desired web method
        // proxy reflection does not occur
        return service.LookupPhoneNumber(name);
    }
}
As you can see from the second implementation, the Main method makes a call to the service's GetVersion method so that the reflection occurs exactly once during the course of the application. The data received from this call is not relevant here, since we merely wish to “prime the pump“. With this change, the penalty for having more friends is gone.
Keep in mind that when writing your applications in this manner, any headers required by the Web service are applied to all method calls, so do not modify them while a Web method call is in progress (or your calls may fail based on bad header data). While this applies mainly to asynchronous and multi-threaded applications, it's still a good idea to keep it in mind whenever working with Web services. The .NET Compact Framework web service client classes are thread safe, so you can feel free to pass your class global service instance to child threads -- provided that you remember the previous statement.
Simple tips for increasing web services client performance:
Even if you don't instantiate the types you reflect on, using reflection functionality may result in a significant permanent working set hit. For example, GetTypes() causes all the types defined in an assembly to be loaded, which means the .NET Compact Framework common language runtime loader will create an internal in-memory representation for each one of these types. These internal runtime structures remain alive until AppDomain shutdown (which essentially means they never get unloaded in version 1.0). Those individual structures are not very large: the total memory associated with each loaded type is approximately 70 bytes + (number of fields * 8 bytes) + (number of methods * 4 bytes), so you can easily be spending over 100 bytes per loaded type. However, if the number of loaded types is high enough, unnecessary loading can result in a substantial permanent memory hit. This should also be a consideration when enumerating the methods and properties of a type.
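To get a feel for the numbers, the approximate per-type formula quoted above can be written as a small helper. TypeCostEstimate is a hypothetical name, and the result is only as accurate as the rough figures in the text.

```csharp
static class TypeCostEstimate
{
    // Approximate bytes per loaded type, using the figures quoted in the text:
    // ~70 bytes + 8 bytes per field + 4 bytes per method.
    public static int ApproximateBytes(int fieldCount, int methodCount)
    {
        return 70 + (fieldCount * 8) + (methodCount * 4);
    }
}
```

For a type with 5 fields and 10 methods this gives 70 + 40 + 40 = 150 bytes, so an assembly with 500 such types would cost roughly 75,000 bytes if GetTypes() forced all of them to load.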
Here are a couple of simple tips that can significantly improve resource loading performance:
· Make sure that fully qualified names of types inside your RESX file(s) are correct (e.g. have the proper “Version” and more importantly the proper “PublicKeyToken” fields). The effort to find the most appropriate substitute for an improperly specified type comes at a price.
· Have a satellite assembly (for the particular culture) properly named and located (enabling Loader Logging can show you lookup mechanism that .NET Compact Framework is using in order to locate the requested resource).
Using resources might not be appropriate for all scenarios. Resources bring additional functionality (which is not free). Ask yourself
- is this the best way to manage my data?
- do I plan to localize my application into multiple languages?
In some cases reading application data directly from a file may be sufficient and more efficient than using ResourceManager. ResourceManager may probe multiple different locations in the file system to find the best matching satellite assembly before it actually locates your resource binary. Use appropriate tools for the job.
Following these recommendations is critical for form load speed in version 1.0. In .NET Compact Framework version 2.0, we made some substantial performance improvements in this area, so following this guidance may not be absolutely critical, but it may still result in measurable performance gains (because of the reduced number of managed calls).
Process bigger operations asynchronously
Blocking in event handlers will affect UI responsiveness
The following article describes ways to significantly improve the form load performance:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetcomp/html/netcfimproveformloadperf.asp
This is a great article on optimizing Pocket PC development with the .NET Compact Framework:
http://msdn.microsoft.com/msdnmag/issues/04/12/NETCompactFramework/
Instrumentation for the .NET Compact Framework applications:
http://msdn.microsoft.com/smartclient/default.aspx?pull=/library/en-us/dnnetcomp/html/instnetcfapp.asp
Developing Well Performing .NET Compact Framework Applications
http://msdn.microsoft.com/library/en-us/dnnetcomp/html/netcfperf.asp
This posting is provided "AS IS" with no warranties, and confers no rights.
[Author: Roman Batoukov]