This document answers the most frequently asked questions about building well-performing applications with the .NET Compact Framework.

This is a living document, so please send us your feedback, new questions, and answers.

.NET Compact Framework CLR performance

What is the relative performance cost of virtual method calls compared to non-virtual method calls in .NET CF?

Virtual calls are ~40% slower than static or instance method calls. The .NET Compact Framework interprets virtual calls instead of using fixed vtables because of the working set cost of maintaining table space for methods that may never be called. When interpreting a virtual call, the .NET Compact Framework walks the class and interface hierarchy looking for the requested method by name and call signature. Looking up calls in this way is expensive compared to indexing directly into a vtable. However, a cache of resolved virtual calls is maintained, so the lookup happens only once in most cases. In .NET Compact Framework version 1.0, the cache of resolved virtual methods had a fixed size. This yielded acceptable performance in most cases, but cache misses kept it from being fully efficient. Version 2.0 improves on this: the cache is now variable in size and has a hit rate of nearly 100%.

 

The overhead of a virtual call is relative to the amount of work performed in a single call and is more noticeable for small methods. Note that JIT compiler optimizations are typically not applicable to virtual calls. Specifically, virtual calls are never inlined. So, for example, a virtual call to a simple property getter would be significantly more expensive than a non-virtual call. The general guidance is to avoid using virtual calls where they aren’t necessary. If you do need to use a virtual call, do as much work as possible in a single virtual call to minimize the relative overhead of doing multiple calls into simple methods.

 

It always helps to analyze the IL code of performance-critical functions. Compilers are normally good at optimizing virtual calls. Even when a method is declared virtual, the compiler may generate a non-virtual call IL instruction if the call target can be resolved at compile time.

It’s also important to understand the difference between the callvirt IL instruction and an actual virtual call at run time. The callvirt instruction itself isn’t necessarily bad and sometimes can’t be avoided (C# likes to use it). If the JIT compiler can figure out the ultimate destination of such a call at JIT-compile time (for example, when a method is final (sealed) or a class is sealed), a callvirt is no more expensive than a regular instance call and can even be inlined.

That said, avoiding virtual calls should not be a motivation for poorly architected applications. Virtual calls are typically a major performance issue only for small, very frequently called methods.
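As a minimal sketch of the sealed-class case described above, the following example calls an overridden method through a reference whose static type is a sealed class. The type names (Shape, Square, SealedCallDemo) are hypothetical and exist only for illustration. Whether a particular runtime version actually binds the call directly is something to confirm by measurement.

```csharp
using System;

// Hypothetical types, used only for illustration.
abstract class Shape
{
    // Calls made through a Shape reference need virtual dispatch.
    public abstract int Area();
}

// Sealed: no further overrides of Area() can exist below this class.
sealed class Square : Shape
{
    private readonly int side;

    public Square(int side) { this.side = side; }

    public override int Area() { return side * side; }
}

static class SealedCallDemo
{
    // The receiver is statically typed as the sealed class, so the target
    // of each Area() call is known without walking the class hierarchy.
    public static int SumAreas(Square[] squares)
    {
        int total = 0;
        for (int i = 0; i < squares.Length; i++)
        {
            total += squares[i].Area();
        }
        return total;
    }
}
```

Had SumAreas taken a Shape[] instead, every call would have to be resolved at run time.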

 

How does the cost of a property access compare to a direct field access?

Property getters and setters are methods, so using properties is normally more expensive than accessing fields directly. Simple property accessors can be inlined by the JIT compiler, but no assumptions should be made about this. For example, in .NET Compact Framework version 1.0, property setters were never inlined. Virtual properties are particularly expensive because virtual call overhead is added and virtual methods are never inlined. Accessing fields directly normally results in better performance.

 

What are the relative costs for the different types of method calls in .NET Compact Framework applications?

Having a mental model of the relative costs of various types of method calls can help you make design tradeoffs that will result in better performing applications. The following bullet points summarize how the costs of various types of calls relate:

 

  • Instance calls are ~2-3X the cost of a native function call
  • Virtual calls are ~1.5X the cost of a managed instance call.  See the entry above about virtual method costs for more details.
  • Platform invoke and COM calls are ~5-6X the cost of a managed instance call (this figure is based on calling a method that requires the marshaling of a single int parameter).  See the entry below about platform invoke costs for more details.

 

 

Why should I override Equals() and GetHashCode() methods for my value types?

If Equals() and GetHashCode() are not overridden, the implementations defined in the parent class ValueType are used. Because these implementations must work for any value type, they perform boxing and use reflection to get information about your type. This approach is not only less precise, but also much slower than a dedicated override. Providing implementations of Equals() and GetHashCode() that are specific to your concrete type gives better results on both counts.
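A minimal sketch of such overrides follows. Point2D is a hypothetical value type, and the hash formula is just one reasonable choice, not a recommendation from the framework.

```csharp
using System;

// Hypothetical value type, used only for illustration.
struct Point2D
{
    public readonly int X;
    public readonly int Y;

    public Point2D(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Strongly typed overload: comparing two Point2D values does not box.
    public bool Equals(Point2D other)
    {
        return X == other.X && Y == other.Y;
    }

    // Replaces the reflection-based ValueType.Equals.
    public override bool Equals(object obj)
    {
        return obj is Point2D && Equals((Point2D)obj);
    }

    // Replaces ValueType.GetHashCode; the multiplier is an arbitrary
    // odd prime chosen for this sketch.
    public override int GetHashCode()
    {
        unchecked
        {
            return (X * 397) ^ Y;
        }
    }
}
```

Equal values must produce equal hash codes, so both methods should be derived from the same fields.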

 

 

Why are interop calls (P/Invoke and COM interop) slower compared to a managed instance call? 

P/Invoke (Platform Invoke) and COM interop calls in .NET Compact Framework are significantly (~5-6 times) slower than regular managed calls. Although the overall performance penalty largely depends on the types marshaled between managed and native code (marshaling overhead), there is also a common overhead, primarily due to some internal work preceding and following every platform call. This work is needed to notify the runtime that the call must be GC (Garbage Collector) preemptable, which prevents the GC from being blocked until the interop call is completed. This is why it's important to maximize the amount of work performed inside each interop call and avoid multiple frequent invocations.

 

What can I do to improve the performance of interop calls (P/Invoke and COM interop)?

As described above, the overall cost of an interop call is largely dependent upon the number and type of the parameters that must be marshaled between managed and native code.  The following points provide some guidance on how to make your interop calls more efficient.

General Guidelines

  • Use blittable types where possible.  Blittable types have a common representation in both managed and unmanaged memory so no copying or other transformations are necessary.  Examples of blittable types include ints, bytes, booleans, characters, and strings.  Arrays of blittable types are also blittable.
  • Call granularity is important. Because there is a fixed cost associated with each interop call, it’s much more efficient to make a few “chunky” calls rather than several “chatty” calls.

Guidelines for PInvoke (calling from managed code to native code)

  • There is a ‘fast track’ for PInvoke calls that have only simple parameters, such as Int or IntPtr arguments. 
    • In addition to basic blittable types, there are many other common items that are marshaled quickly in .NET Compact Framework version 2.0. The default marshaling (no use of MarshalAs attribute) of Strings, Arrays, any managed class that contains only blittable types, “ref” or “out” or “*” to structures with only blittable types are all about as fast as marshaling an int.
    • Blittable value types larger than 32 bits are more quickly passed by reference than by value.
    • Use the [In] and [Out] attributes on the arguments in your function signature to reduce unnecessary marshaling.

 

  • In .NET Compact Framework version 2.0, you can use the methods in the Marshal class to manually convert between IntPtrs and managed objects.  These methods include Marshal.PtrToStructure, Marshal.PtrToStringBSTR, Marshal.GetObjectForNativeVariant, and Marshal.GetObjectForIUnknown.  Using these methods isn’t faster than letting the runtime do the conversion as part of the function call, but they can be useful if out parameters do not always need to be fully resolved for the given application, or if you will re-use the data for more than one interop call.  This also allows you to control when the marshaling work is done.  These methods are also useful for debugging issues with marshaling parameters where the runtime is not able to convert a particular argument.  WARNING: you need to be aware of the cleanup consequences of using the various Marshal class functions, as documented in MSDN.
  • In .NET Compact Framework version 2.0, you can use the Marshal.Prelink and Marshal.PrelinkAll methods to force the stub required to support the interop call to be JIT compiled.  This approach is useful if you’d rather pay the cost of setting up the interop call when your application is first started instead of when the first call to the native function is made.  Using Marshal.Prelink and Marshal.PrelinkAll can also help you detect missing .dll issues and handle them gracefully.

 

Guidelines for COM Interop (calling from managed code to native code)

  • If you expect your native COM object to return S_FALSE, or other non-S_OK HResults, as a common case, set PreserveSig=true and make the managed signature match the native signature.  Using PreserveSig enables you to avoid the overhead of the try / catch that is necessary when the runtime performs the HResult to Exception translation on your COM calls.

 

 

Does the .NET Compact Framework perform string interning?

Yes. The .NET Compact Framework interns static strings at JIT time. You can also explicitly force interning of an arbitrary string using the String.Intern() method. The C# compiler uses this mechanism to force interning of the strings used in a switch statement. If you compare some string against one or more static strings many times, or compare some set of strings against each other repeatedly, you may benefit from string interning. There is a shortcut in the string equality check, which attempts to compare object references first, before doing a character-by-character comparison. So, for matching interned strings, the object references will match immediately. However, you should be aware that string interning incurs some additional cost. Specifically, the memory used to store interned strings is not freed until the AppDomain is shut down, and extra time is required to intern the string (even if the string is already interned). So, don’t use explicit string interning by default; use it only when your own performance measurements show that it helps.
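The reference-equality shortcut can be observed with a small sketch like the one below. InternDemo is a hypothetical name. The strings are built at run time so that they start out as distinct objects.

```csharp
using System;
using System.Text;

static class InternDemo
{
    // Returns true when two equal strings are distinct objects before
    // interning and share one object after interning.
    public static bool InterningUnifiesReferences()
    {
        // Built at run time, so these are two separate string objects.
        string first = new StringBuilder("sta").Append("tus").ToString();
        string second = new StringBuilder("sta").Append("tus").ToString();
        bool distinctBefore = !object.ReferenceEquals(first, second);

        // String.Intern returns the single pooled instance for this value.
        string internedFirst = string.Intern(first);
        string internedSecond = string.Intern(second);
        bool sameAfter = object.ReferenceEquals(internedFirst, internedSecond);

        return distinctBefore && sameAfter;
    }
}
```

Each call to String.Intern still has to look the value up in the intern pool, which is the extra time cost mentioned above.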

 

Can I take advantage of JIT compiler optimizations such as method inlining and enregistration?

Yes, you can, although it may not be obvious and it’s hard to do in many cases. Please keep in mind that any attempt to take advantage of JIT optimizations should not be a motivation or excuse for code that is poorly architected and hard to maintain.

 

Method Inlining

The .NET Compact Framework JIT compiler will inline simple methods to eliminate the cost associated with a method call.  Inlining involves replacing the method’s arguments with the values passed at call time, and eliminating the call completely.

The inlining rules differ from version to version of the Compact Framework. In version 1.0, only very simple functions that returned a field from their “this” argument or a constant value could be inlined. In version 2.0, the rules are more generous but are still severely limited. In order to be inlined, a method must have:

  • 16 bytes of IL or less
  • No branching (typically an “if”)
  • No local variables
  • No exception handlers
  • No 32-bit floating point arguments or return value
  • If the method has more than one argument, the arguments must be accessed in order from lowest to highest (as seen in the IL)

Typically, this limits inlining to property getter/setters and methods that simply call another method, perhaps adding another argument (as often used for method overloads).

 

Also remember that inlining never occurs if you are running under a debugger.

 

In general, it is not possible to predict with 100% accuracy whether a method will be inlined or to confirm that one has been. However, there are some factors that make inlining impossible (virtual calls, exception handlers, etc.), so it may be useful to keep performance-critical methods as simple as possible to give them a better chance of being inlined.

 

Enregistration

Enregistration is new to .NET Compact Framework version 2.0. The JIT compiler will try to use CPU registers when possible to store 32-bit variables such as locals and method arguments (32-bit integers, object references, etc.). 8-bit and 16-bit integers can also be enregistered and are almost as efficient as 32-bit ints, but sometimes additional conversions need to be added, which makes them less optimal. An enum is treated the same as its underlying type (by default a 32-bit int) for code generation purposes. Note that variables larger than 32 bits are never enregistered. That’s one of the reasons why 64-bit math is significantly slower than 32-bit math in .NET Compact Framework. So try to stick with 32-bit values where it makes sense. As there is only a small number of registers, the fewer variables you have, the better the chance they will get enregistered. Try to re-use a variable when possible instead of adding a new one.

 

How expensive is a garbage collection?

The cost of performing a garbage collection is a function of the number of live reference types your application has allocated.  Each time a collection occurs, the GC traverses the graph of objects looking for those that aren’t referenced anymore.  Objects that are no longer referenced are marked and then later freed.  Keep in mind that those objects that have finalizers are not immediately freed.  Instead, they are placed on a finalization queue where their finalizers get run by a background thread.  These objects are then freed the next time the GC runs.

 

You can determine how much time the GC spends doing collections by looking at the “GC Latency Time” performance counter in mscoree.stat (see Developing Well Performing .NET Compact Framework Applications for details on how to use the counters provided by the .NET Compact Framework).

 

What are the most common sources of “garbage” in managed applications?

When looking at how many managed objects your application is creating, remember that operations like boxing and some string manipulations will cause managed objects to be created where it might not be immediately obvious.  These implicitly created objects are often greater in number than those you explicitly create yourself.

 

The following example demonstrates how managed objects can be created in places you might not expect.  Consider the following class, which uses a Hashtable to map integer thread identifiers to instances of a ThreadInfo object:

 

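using System.Collections;

class ThreadViewer
{
  // Maps integer thread identifiers to ThreadInfo instances.
  Hashtable hashTable;

  public ThreadViewer()
  {
    hashTable = new Hashtable();
  }

  public ThreadInfo FindThread(int ThreadId)
  {
    // The int key is passed to an indexer that takes an object.
    return (ThreadInfo) hashTable[ThreadId];
  }
}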

The FindThread method in this example returns the ThreadInfo object at the index indicated by ThreadId.  Because the Hashtable class must serve as a general-purpose hash table, its indexer (the [] operator) is defined as accepting a parameter of type Object:

 

public class Hashtable
{
  public object this[object key] { get; set; }
}

 

As a result, each time the integer ThreadId is used to access an entry in the Hashtable, that integer is boxed, thereby creating a managed object.  If this operation is performed frequently in your application, you may end up creating thousands of these short-lived objects.  In addition to the memory they consume, these objects will also increase the time it takes to perform a garbage collection.
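One way to avoid the boxed key, assuming version 2.0 where generics are available, is a collection that is strongly typed on the key. The sketch below is not part of the original example: TypedThreadViewer is a hypothetical name, and ThreadInfo is a minimal stand-in for the type referenced above.

```csharp
using System.Collections.Generic;

// Minimal stand-in for the ThreadInfo type referenced in the text.
class ThreadInfo
{
    public readonly string Name;

    public ThreadInfo(string name) { Name = name; }
}

// Hypothetical variant of ThreadViewer that uses an int-keyed dictionary.
class TypedThreadViewer
{
    private readonly Dictionary<int, ThreadInfo> threads =
        new Dictionary<int, ThreadInfo>();

    public void Add(int threadId, ThreadInfo info)
    {
        threads[threadId] = info;
    }

    // The key stays an int all the way through, so no box is allocated.
    public ThreadInfo FindThread(int threadId)
    {
        ThreadInfo info;
        return threads.TryGetValue(threadId, out info) ? info : null;
    }
}
```

Generic instantiations carry their own JITed code size cost, so this is a tradeoff to measure rather than a guaranteed win.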

 

Various string manipulations can create additional objects as well.  Instances of the string class are immutable, so a new string object is created every time you attempt to modify the string through operations like concatenation.
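For repeated concatenation, a StringBuilder appends into a single growing buffer instead of allocating a new string at every step. The following sketch uses a hypothetical helper name (JoinDemo.JoinWithCommas), and its initial capacity is only a rough guess.

```csharp
using System.Text;

static class JoinDemo
{
    // Joins the items with commas using one StringBuilder rather than
    // creating an intermediate string for every "+" operation.
    public static string JoinWithCommas(string[] items)
    {
        // The capacity is an arbitrary estimate for this sketch.
        StringBuilder builder = new StringBuilder(items.Length * 8);
        for (int i = 0; i < items.Length; i++)
        {
            if (i > 0)
            {
                builder.Append(',');
            }
            builder.Append(items[i]);
        }
        return builder.ToString();
    }
}
```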

 

What tools are available to help diagnose performance issues with .NET Compact Framework applications?

The .NET Compact Framework can be configured to log a variety of performance-related statistics as an application is running.  In version 1.0 of the .NET Compact Framework, these statistics are written to a file called mscoree.stat when the application terminates.  See Developing Well Performing .NET Compact Framework Applications for more details on how to enable mscoree.stat.

 

Several improvements have been made to the mscoree.stat logs in version 2.0 of the .NET Compact Framework.  In addition to several new counters, the logs can now be emitted at intervals as the application is running, instead of only when the application shuts down.  The addition of dynamic logging makes it possible to build graphical tools that can be used to monitor an application’s performance in real time.

 

Framework performance

There are many powerful usability features in the .NET Compact Framework Base Class Library (BCL) that make writing code much easier. However, these features are very general and may not be optimized for a particular user scenario, so it may not be appropriate to use some of them in every context.  There is often a performance and working set penalty for abstraction and flexibility. This penalty is much more severe in the constrained environment of devices, as they just don’t have the computing power of desktop machines. It’s very important to use these powerful features in an optimal manner, and sometimes it’s better to fall back on an optimized custom implementation instead of using a general-purpose one from the BCL.

 

 

Threading

When should I create threads and when should I use the thread pool to run work items asynchronously?

The ThreadPool generally results in better performance if your work items have a relatively short lifetime.  So if you typically create threads just to run small asynchronous tasks, you’ll get better performance by performing those tasks using the ThreadPool.  In these scenarios, the ThreadPool is more efficient primarily because it avoids the overhead of creating and destroying individual threads.  Also, because your work items are short in duration, you shouldn’t have to wait for a thread in the pool to become available.

 

On the other hand, create dedicated Thread objects if your threads have a long lifetime, if a thread might be blocked for a long time (to avoid prolonged occupation of one of the ThreadPool threads), or if a thread needs to run at a different priority.  Adjusting the priority of ThreadPool threads can be dangerous if the priority isn’t properly restored when you’re done.

 

Also, if you need to run a large number of work items and you don’t need concurrency, you may choose to create a dedicated Thread and re-use it to do the work.  The .NET Compact Framework will try to create a new worker thread in the ThreadPool (up to a certain limit: 25 by default in version 2.0, 256 in version 1.0) if no worker thread is available to process your work immediately. By re-using your own thread, you avoid a potential spike in the number of ThreadPool threads.
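The two options can be sketched side by side as follows. The class and method names are hypothetical, and the "work" is a trivial multiplication standing in for a real task.

```csharp
using System;
using System.Threading;

static class ThreadingDemo
{
    // Short-lived task: queue it on the ThreadPool and wait for completion.
    public static int RunShortWorkItem()
    {
        int result = 0;
        ManualResetEvent done = new ManualResetEvent(false);

        ThreadPool.QueueUserWorkItem(delegate(object state)
        {
            result = 6 * 7;   // placeholder for a small unit of work
            done.Set();       // signal the waiting caller
        });

        done.WaitOne();
        return result;
    }

    // Long-running or blocking task: give it a dedicated thread so that
    // no ThreadPool worker is tied up for the duration.
    public static int RunLongWorkItem()
    {
        int result = 0;

        Thread worker = new Thread(delegate()
        {
            result = 6 * 7;   // placeholder for long-running work
        });
        worker.Start();
        worker.Join();
        return result;
    }
}
```

In a real application the caller would normally continue with other work instead of blocking immediately; the waits here only make the sketch deterministic.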

 

 

DateTime and numeric parsing

Is there any way to improve the performance of parsing/de-serializing DateTime strings?

If you know the exact format used for DateTime serialization, always specify it for parsing by using DateTime.ParseExact(). Otherwise, the DateTime parser will sequentially try to apply a variety of culture-specific formats to make sense of your string, as it doesn’t have any hints about which format was used. The same practice can be applied to numeric parsing, which is not as slow as DateTime parsing but can still benefit from specifying a particular numeric format if you’re not using the default format.
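A minimal sketch of both cases follows. The format string "yyyy-MM-dd HH:mm:ss" and the helper names are assumptions made for this example; substitute the format your data actually uses.

```csharp
using System;
using System.Globalization;

static class ParsingDemo
{
    // Assumed serialization format for this sketch.
    private const string DateFormat = "yyyy-MM-dd HH:mm:ss";

    // The parser is told the exact format, so it does not have to try
    // a series of culture-specific patterns.
    public static DateTime ParseDate(string text)
    {
        return DateTime.ParseExact(text, DateFormat, CultureInfo.InvariantCulture);
    }

    // NumberStyles.None accepts digits only: no sign, whitespace,
    // thousands separators, or currency symbols.
    public static int ParseDigits(string text)
    {
        return int.Parse(text, NumberStyles.None, CultureInfo.InvariantCulture);
    }
}
```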

 

What’s the best way to store DateTime, so it can be consumed faster?

Storing DateTime in binary form using ticks is usually the simplest and fastest way to store a DateTime, although this is not the recommended practice for local times.
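As a rough sketch, a tick-based round trip through BinaryWriter and BinaryReader could look like the following. DateTicksDemo is a hypothetical name, and the example assumes the values are already in UTC, in line with the caution about local times.

```csharp
using System;
using System.IO;

static class DateTicksDemo
{
    // Writes the value as a single 64-bit tick count; no text formatting.
    public static void Write(BinaryWriter writer, DateTime utcValue)
    {
        writer.Write(utcValue.Ticks);
    }

    // Reads the tick count back; no parsing is involved.
    public static DateTime Read(BinaryReader reader)
    {
        return new DateTime(reader.ReadInt64());
    }
}
```

Ticks carry no time zone information, which is why this approach suits UTC values better than local ones.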

 

 

Collections

 

What are potential performance issues when using BCL collections?

The most common performance problems with using BCL collections include:

  • Boxing and unboxing overhead for value types. All standard BCL collections box a value type when it’s added to the collection, and unbox it when the value is retrieved from the collection.

 

  • Using iterators (the foreach statement) may be expensive, as it unrolls into a sequence of virtual calls through the IEnumerator interface (GetEnumerator(), get_Current(), and MoveNext()). For example, the foreach statement in the following benign piece of code:

        ArrayList al = new ArrayList(str_array);
        foreach (String s in al)
        {
            // do something
        }

       is compiled into:

  IL_002b:  callvirt   instance class [mscorlib]System.Collections.IEnumerator [mscorlib]System.Collections.ArrayList::GetEnumerator()
  IL_0030:  stloc.s    CS$5$0001
  .try
  {
    IL_0032:  br.s       IL_004a
    IL_0034:  ldloc.s    CS$5$0001
    IL_0036:  callvirt   instance object [mscorlib]System.Collections.IEnumerator::get_Current()
    IL_003b:  castclass  [mscorlib]System.String
    ...
    IL_004a:  ldloc.s    CS$5$0001
    IL_004c:  callvirt   instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
    IL_0051:  stloc.s    CS$4$0002
    IL_0053:  ldloc.s    CS$4$0002
    IL_0055:  brtrue.s   IL_0034
    IL_0057:  leave.s    IL_0076
  }  // end .try

Use indexers instead if the collection uses an array as its underlying storage.

  • Don’t forget to pre-size the collection. If you don’t specify how many entries your collection is expected to contain, the original default capacity is small (4 entries for ArrayList, for example). Once your collection grows beyond the default capacity, it gets resized. Depending on the type of the storage, a new storage object may be allocated and the contents of the original collection copied to the newly allocated storage. For ArrayList, the capacity doubles every time you exceed it. This is extremely expensive and should be avoided.
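The indexer and pre-sizing advice can be sketched together as follows. ArrayListDemo.TotalLength is a hypothetical helper that copies strings into a pre-sized ArrayList and then walks it by index.

```csharp
using System.Collections;

static class ArrayListDemo
{
    public static int TotalLength(string[] source)
    {
        // Pre-size to the known element count so the backing array
        // never has to be reallocated and copied while filling.
        ArrayList list = new ArrayList(source.Length);
        for (int i = 0; i < source.Length; i++)
        {
            list.Add(source[i]);
        }

        // Indexed loop: no enumerator object is allocated and there is
        // no MoveNext()/get_Current() call pair per element.
        int total = 0;
        int count = list.Count;
        for (int i = 0; i < count; i++)
        {
            string s = (string)list[i];
            total += s.Length;
        }
        return total;
    }
}
```

The ArrayList indexer is still a method call, so the saving over foreach should be confirmed with measurements on the target device.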

 

BCL collections may not be optimized for the performance-critical type of operation your application performs most often, such as search or insert. We encourage developers to build optimized and strongly typed collections for a particular task if collection performance turns out to be an issue.

 

.NET Compact Framework version 2.0 supports generics. Are generic collections any faster than non-generic collections?

Generic collections provide a way to avoid the boxing and unboxing overhead that comes with using valuetypes in collections. So there are some performance benefits. However, keep in mind that .NET Compact Framework version 2.0 implements generics with representation and JITed code specialization. This means that each distinct instantiation of generic type results in a separate execution engine data representation and JITed code specific to that instantiation. Thus, when using generics you need to be aware of the potential JITed code size impact. If there is a very large number of closed constructed types/methods per generic type/method definition, your application may start experiencing JITed code size pressure. As a result, keep in mind the extra performance hit of re-JITing the code. On the positive side, specialized JITed code is typically more efficient because exact type parameter information is always easily accessible. As with non-generic collections, generic collection types in the BCL may not be optimized for the type of operation your application will do most often (such as sorting, insert, etc.). So, the usual recommendation applies: for best performance write your own optimized collection classes.

 

XML and Data

 

What are the best practices for parsing larger XML documents? 

 

  • Use XmlTextReader and XmlTextWriter
    • These classes use less memory than XmlDocument
    • XmlTextReader is a pull model parser which only reads a “window” of the data
    • XmlDocument builds a generic, untyped object model using a tree
      • Type stored as string
  • Design XML schema first, then code
    • Understand the structure of your XML document
  • Don’t use schema for parsing unless you have to.  Consider using schema if:
    • You don’t know the structure of a document
    • You are populating a DataSet from XML data source

 

How do I structure an XML document so it can be parsed faster? 

  • Keep attribute and element names as short as possible (every character in an attribute or element name gets validated to see if it’s legal). Reduce character count; smaller documents are parsed quicker. Avoid gratuitous use of whitespace because the parser must scan past it.
  • Use attributes to reduce size. Processing an attribute-centric document is usually faster than parsing an element-centric one. Replace elements with attributes where it makes sense. 
  • Use elements to group. If XML is structured in a way that you can skip over certain portions of it in some scenarios, use the Skip() method. This is faster than Read().
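As an illustration of skipping a grouped subtree, the sketch below returns the text of the first name element while jumping over any history element and everything inside it. The element names and the SkipDemo helper are hypothetical. XmlTextReader is used here because it is available in both versions 1.0 and 2.0.

```csharp
using System.IO;
using System.Xml;

static class SkipDemo
{
    // Returns the text of the first <name> element that is not nested
    // inside a <history> element, or null if there is none.
    public static string ReadName(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "history")
                {
                    // Skip() moves past the whole subtree and leaves the
                    // reader on the node that follows it.
                    reader.Skip();
                    continue;
                }
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "name")
                {
                    return reader.ReadString();
                }
                reader.Read();
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Because Skip() already advances to the next node, the loop continues without calling Read() again after it.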

 

Which reader/writer classes should I use? 

If you are developing for the .NET Compact Framework version 1.0, use XmlTextReader and XmlTextWriter for parsing large XML documents or for XML serialization.

If you are using .NET Compact Framework version 2.0, use the factory methods on XmlReader and XmlWriter to create a properly optimized reader or writer. Do not directly instantiate the concrete implementations XmlTextReader, XmlTextWriter, or XmlNodeReader.

  • The XmlReader.Create() and XmlWriter.Create() methods return an optimized XmlReader or XmlWriter, depending on the specified settings.
  • The XmlReaderSettings and XmlWriterSettings classes are used to specify the features of the reader or writer. Use settings to improve performance.
  • This reduces the need to understand how and when to use a specific reader or writer.

Examples

Creating an XmlReader:

XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Document;
settings.IgnoreWhitespace = true;
settings.IgnoreComments = true;
XmlReader reader = XmlReader.Create("foo.xml", settings);

Creating an XmlWriter:

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = "     ";
XmlWriter writer = XmlWriter.Create("foo.xml", settings);

 

           

Do the various XmlReader settings available in .NET Compact Framework version 2.0 really affect parsing performance?

Yes. You can substantially improve the performance of XmlReader by constructing an optimized reader and applying the proper XmlReaderSettings. The optimal set of options obviously depends on the structure of the XML you deal with, but in most cases setting XmlReaderSettings.IgnoreWhitespace to true results in a measurable performance gain (~30% on average), as there is typically a fair amount of whitespace in formatted XML documents. XmlReaderSettings.IgnoreComments can also be beneficial for comment-rich documents. Note that these options are not enabled by default.  You must specify them explicitly by providing a configured instance of XmlReaderSettings.

 

How does the character encoding of the XML document affect the parsing performance?

The .NET Compact Framework implements decoders for UTF8, ASCII and UTF16 (big- and little-endian) encodings in managed code. All other encodings, such as all ANSI codepage encodings, involve a PInvoke down to the operating system. So using UTF8, ASCII and UTF16 is usually faster. If you don’t use international characters (outside of the ASCII character set), use UTF8 or ASCII (these have approximately equal performance). Try to avoid using Windows codepage encodings.

If your content includes some non-ASCII characters, but most of it is ASCII, use UTF8 if the size of the resulting data is an important factor, for example, for serialization or for sending content across the network. Otherwise, experiment and measure on a case-by-case basis and choose the optimal encoding for the task.  For example, UTF16 may take more space, but there is virtually no decoding work involved.

 

Does having a schema help the performance of XML parsing? 

Not in general. In fact, the opposite is true.  Utilizing a schema requires using XmlValidatingReader, which performs additional validation work. Use a schema only if you aren’t sure about the structure of the document you are parsing. Also, having a schema is recommended if you intend to populate a DataSet from the XML data source.

 

How can I improve the performance of populating a DataSet from the XML source?

  • Use schema. It might be created programmatically (fastest), loaded from separate file or present in the data file.
  • Avoid schema inference.
  • Avoid nested tables; use non-nested relations with surrogate keys.
  • Map columns as attributes.
  • Avoid using many DateTime columns. Consider storing DateTime.Ticks instead.
  • Use typed DataSets 

 

What are the recommended practices for working with local data?

  • Try to avoid DataSets
  • Leverage SQL Server CE’s native in-proc database.
  • Query data using DataReader
  • Limit the number of open SqlCeCommand and DataReader objects, and dispose of them when you are done with them.

 

What are the recommended practices for working with remote data?

  • Use SQL Server CE replication
  • When using Web Services:
    • Use DiffGrams to read and write DataSets to limit data transfer
      • Save data locally in SQL Server CE for faster access and storage
      • Don’t save remote DataSets as XML on the device. If you do, include the schema as well.

 

 

.NET Compact Framework version 2.0 supports XmlSerializer. Are there performance tips to make the XmlSerializer faster?

Serialization metadata for a given type is built when a corresponding XmlSerializer is created:

new XmlSerializer(typeof(MyType));  (Metadata is built for type MyType)

Building this metadata is expensive so the metadata is cached by the XmlSerializer. It is recommended that applications only create one XmlSerializer instance per type to reduce the amount of time spent searching for metadata. Use the “Singleton pattern”.
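One possible way to enforce a single serializer per type is a small cache keyed on the type, sketched below. SerializerCache is a hypothetical helper, not a framework class, and it assumes version 2.0 for the generic dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical helper that hands out one shared XmlSerializer per type.
static class SerializerCache
{
    private static readonly Dictionary<Type, XmlSerializer> cache =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object sync = new object();

    public static XmlSerializer For(Type type)
    {
        lock (sync)
        {
            XmlSerializer serializer;
            if (!cache.TryGetValue(type, out serializer))
            {
                // The expensive metadata build happens only on first use.
                serializer = new XmlSerializer(type);
                cache.Add(type, serializer);
            }
            return serializer;
        }
    }
}
```

Callers would then write SerializerCache.For(typeof(MyType)) wherever they would otherwise construct a new XmlSerializer.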

If serializing several types use FromTypes():

// Create the list of serializers
Type[] types = new Type[] { typeof(MyType1), typeof(MyType2) };
XmlSerializer[] serializers = XmlSerializer.FromTypes(types);

// Serialize an instance of MyType1
// (writer is an XmlWriter or TextWriter created elsewhere)
MyType1 mt1 = new MyType1();
serializers[0].Serialize(writer, mt1);

 

What are the performance implications of XML as a serialization format?

Typically, you will not get optimal performance by transferring and parsing XML because using XML is memory, CPU, and network intensive. This is particularly noticeable on small devices, which are CPU and memory constrained.  Consider building a custom binary serialization mechanism, using BinaryReader and BinaryWriter functionality to get better overall performance.
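A custom binary format can be as simple as writing each field in a fixed order and reading it back the same way. In the sketch below, ContactRecord and its fields are hypothetical, and there is no versioning or error handling.

```csharp
using System.IO;

// Hypothetical record type, used only for illustration.
class ContactRecord
{
    public int Id;
    public string Name;   // must be non-null when written

    // Fields are written in a fixed order with no markup or field names.
    public void WriteTo(BinaryWriter writer)
    {
        writer.Write(Id);
        writer.Write(Name);   // length-prefixed string
    }

    // Fields must be read back in exactly the order they were written.
    public static ContactRecord ReadFrom(BinaryReader reader)
    {
        ContactRecord record = new ContactRecord();
        record.Id = reader.ReadInt32();
        record.Name = reader.ReadString();
        return record;
    }
}
```

The price of the compact format is that reader and writer must agree on the layout, which XML would otherwise describe for you.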

 

 

Web Services

 

How can I increase web service client performance?

More often than not, the problem boils down to what happens when the .NET Compact Framework sees the first web method call on an instance of the service object.

When the first web method on a service object is called, the .NET Compact Framework uses Reflection to examine the service's proxy (to identify methods, headers, properties, etc.).  Unlike the full .NET Framework, the .NET Compact Framework (for working set size reasons) does not cache the results of this examination.  Because of this, applications incur a performance penalty if they use multiple instances of the same service.  The following code example illustrates an application that incurs this performance hit.

class SlowerWebServicePerformance
{
    public static void Main()
    {
        // application setup

        foreach(String name in Friends)
        {
            String phoneNumber = CallWebService(name);

            // process / display the received data
        }

        // application cleanup
    }

    public static string CallWebService(String name)
    {
        // create new instance of the web service proxy object
        PhoneBookService service = new PhoneBookService();

        // call the desired web method
        //  proxy reflection occurs here
        return service.LookupPhoneNumber(name);
    }
}

In the above example, each call to CallWebService creates a new instance of the fictitious PhoneBookService object, so each call to the LookupPhoneNumber method causes .NET Compact Framework to reflect over the service proxy code again.  In this example, users with fewer friends are better off than those with more, at least as far as application performance goes.

To minimize the effects of this issue, applications can create a class global instance of their web service object and make a simple call to it (check version, etc.) during the startup code.  The code below is a rewrite of the previous example, this time using a class global service object.

class FasterWebServicePerformance
{
    private static PhoneBookService service = new PhoneBookService();

    public static void Main()
    {
        // call a simple web service method to "prime the pump"
        //  proxy reflection occurs here
        service.GetVersion();

        // application setup

        foreach(String name in Friends)
        {
            String phoneNumber = CallWebService(name);

            // process / display the received data
        }

        // application cleanup
    }

    public static string CallWebService(String name)
    {
        // call the desired web method
        //  proxy reflection does not occur
        return service.LookupPhoneNumber(name);
    }
}

As you can see from the second implementation, the Main method makes a call to the service's GetVersion method so that the reflection occurs exactly once during the course of the application.  The data received from this call is not relevant here, since we merely wish to “prime the pump”.  With this change, the penalty for having more friends is gone.

Keep in mind that when writing your applications in this manner, any headers required by the Web service are applied to all method calls, so do not modify them while a Web method call is in progress (or your calls may fail based on bad header data).  While this applies mainly to asynchronous and multi-threaded applications, it's still a good idea to keep it in mind whenever working with Web services.  Because .NET Compact Framework web service client classes are thread safe, you can pass your class global service instance to child threads, provided that you remember the previous caution.

 

 

Simple tips for increasing web services client performance:

  • Use a class global service object
    • Make a simple web service method call during your splash screen (the first call is slower than subsequent ones)
    • Be careful to handle network and data errors
  • Avoid sending DataSets over a Web service, because doing so pulls in a large amount of XML and uses a complex algorithm for loading data
    • If you are sending or receiving a DataSet, ensure the schema is serialized as well
  • In some cases, manually serializing a DataSet as an XML string before making the Web service call results in better performance.
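
The last two DataSet tips can be combined in a small helper, sketched here. DataSetXml is a hypothetical name, and the web method that would accept the resulting string is not shown because it depends on your service. Whether this is faster than passing the DataSet directly should be measured for your data.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper: converts a DataSet to and from an XML string
// that carries its schema along with the data.
public sealed class DataSetXml
{
    private DataSetXml() { }

    public static string ToXmlString(DataSet data)
    {
        StringWriter writer = new StringWriter();
        // WriteSchema embeds the schema so the receiver does not have to infer it.
        data.WriteXml(writer, XmlWriteMode.WriteSchema);
        return writer.ToString();
    }

    public static DataSet FromXmlString(string xml)
    {
        DataSet data = new DataSet();
        // ReadSchema uses the embedded schema instead of inferring one from the data.
        data.ReadXml(new StringReader(xml), XmlReadMode.ReadSchema);
        return data;
    }
}
```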

 

 

Reflection

 

I’m using reflection for inspection purposes only and don’t create many objects. Does this affect the working set?

Even if you don't instantiate the types you reflect on, using reflection may result in a significant permanent working set hit. For example, GetTypes() causes all the types defined in an assembly to be loaded, which means the .NET Compact Framework common language runtime loader creates an internal in-memory representation for each of these types. These internal runtime structures remain alive until AppDomain shutdown (which essentially means they never get unloaded in version 1.0). The individual structures are not very large: the total memory associated with each loaded type is approximately 70 bytes + (number of fields * 8 bytes) + (number of methods * 4 bytes), so you can easily spend over 100 bytes per loaded type. If the number of loaded types is high enough, however, unnecessary loading can result in a substantial permanent memory hit. The same consideration applies when enumerating the methods and properties of a type.
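
To get a feel for the numbers, the per-type estimate above can be turned into a small calculation. TypeFootprint is a hypothetical helper, and the result is only as accurate as the approximate constants quoted in the text.

```csharp
using System;

// Hypothetical helper that applies the approximate per-type cost quoted above:
// ~70 bytes + 8 bytes per field + 4 bytes per method.
public sealed class TypeFootprint
{
    private TypeFootprint() { }

    public static int EstimateBytes(int fieldCount, int methodCount)
    {
        return 70 + fieldCount * 8 + methodCount * 4;
    }
}
```

For a type with 5 fields and 10 methods this gives 70 + 40 + 40 = 150 bytes, so loading 1,000 such types through a single GetTypes() call would hold on to roughly 150,000 bytes.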

 

 

Managed Resources

 

What can I do to improve the performance of resource loading using the ResourceManager?

Here are a couple of simple tips that can significantly improve resource loading performance:

  • Make sure that the fully qualified names of types inside your RESX file(s) are correct (e.g. have the proper “Version” and, more importantly, the proper “PublicKeyToken” fields). The effort to find the most appropriate substitute for an improperly specified type comes at a price.
  • Have a satellite assembly (for the particular culture) properly named and located (enabling Loader Logging can show you the lookup mechanism that .NET Compact Framework uses to locate the requested resource).

Using resources might not be appropriate for all scenarios. Resources bring additional functionality (which is not free). Ask yourself:

  • Is this the best way to manage my data?
  • Do I plan to localize my application into multiple languages?

In some cases, reading application data directly from a file may be sufficient and more efficient than using ResourceManager.  ResourceManager may probe several locations in the file system to find the best matching satellite assembly before it actually locates your resource binary. Use appropriate tools for the job.

 

 

GUI

 

How can I improve form load performance?

  • Create your control tree top down
  • Try to reduce the number of method calls during initialization
  • Be ready to modify designer-generated code for top performance

Following these recommendations is critical for form load speed in version 1.0. In .NET Compact Framework version 2.0, we made some substantial performance improvements in this area, so following this guidance may not be absolutely critical, but it may still result in measurable performance gains (because of the reduced number of managed calls).
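
One way to read the first two tips is sketched below. The form, control names, and sizes are hypothetical, and using a single Bounds assignment in place of separate Location and Size assignments is an assumption about what "fewer method calls" looks like in practice, not something this FAQ spells out.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

// Hypothetical form used only to illustrate the initialization order.
public class ContactForm : Form
{
    private Panel panel;
    private Button okButton;

    public ContactForm()
    {
        panel = new Panel();
        okButton = new Button();

        // Top down: attach the container to the form first,
        // then attach the child to the container.
        panel.Parent = this;
        okButton.Parent = panel;

        // One Bounds assignment per control instead of
        // separate Location and Size assignments.
        panel.Bounds = new Rectangle(0, 0, 240, 268);
        okButton.Bounds = new Rectangle(8, 8, 72, 24);
        okButton.Text = "OK";
    }
}
```

Designer-generated code often builds the tree in a different order, which is why the third tip suggests being ready to edit it by hand.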

 

 

How can I improve the perceived performance of form-based UIs?

  • Load and cache Forms in the background
  • Populate data separately from Form.Show(): pre-populate data, or load data asynchronously from Form.Show()
  • Use BeginUpdate/EndUpdate when it is available (e.g. ListView, TreeView)
  • Use SuspendLayout/ResumeLayout when repositioning controls
  • Keep event handling code tight
    • Process bigger operations asynchronously
    • Blocking in event handlers will affect UI responsiveness
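
The BeginUpdate/EndUpdate tip might look like this in practice. ListFiller is a hypothetical helper; the try/finally is there so that the control resumes painting even if adding an item throws.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: repopulates a ListView with a single repaint
// instead of one repaint per added item.
public sealed class ListFiller
{
    private ListFiller() { }

    public static void Fill(ListView list, string[] names)
    {
        list.BeginUpdate();   // suspend painting while items change
        try
        {
            list.Items.Clear();
            foreach (string name in names)
            {
                list.Items.Add(new ListViewItem(name));
            }
        }
        finally
        {
            list.EndUpdate(); // resume painting and redraw once
        }
    }
}
```

The same bracket pattern applies to SuspendLayout/ResumeLayout when several controls are repositioned at once.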

 

 

What are the best practices for graphics-intensive applications, such as games?

  • Compose to off-screen buffers to minimize direct-to-screen blitting (composing off-screen is approximately 50% faster).
  • Consider pre-rendering and using DrawImage instead of using the draw primitive APIs each time (e.g. DrawRectangle, DrawEllipse). Verify on a case-by-case basis because blitting isn’t always faster than drawing the primitive.
  • Avoid transparent blitting in areas that require performance (transparent blitting is approximately 1/3 the speed of normal blitting)
  • Override OnKey methods on controls instead of adding Key event handlers (OnKeyDown, OnKeyUp, OnKeyPress)
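
A minimal sketch that combines off-screen composition with an OnKeyDown override follows. GameForm, the player rectangle, and the movement step are hypothetical, and cleanup of the brush and bitmap on dispose is omitted for brevity.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

// Hypothetical game form used only for this illustration.
public class GameForm : Form
{
    private Bitmap backBuffer;
    private readonly SolidBrush playerBrush = new SolidBrush(Color.Yellow);
    private int playerX = 10;

    // Draws one complete frame into the given off-screen bitmap.
    public void Compose(Bitmap buffer)
    {
        using (Graphics g = Graphics.FromImage(buffer))
        {
            g.Clear(Color.Black);
            g.FillRectangle(playerBrush, playerX, 10, 32, 32);
        }
    }

    protected override void OnPaintBackground(PaintEventArgs e)
    {
        // Left empty on purpose: the back buffer repaints the whole client area.
    }

    protected override void OnPaint(PaintEventArgs e)
    {
        int width = ClientSize.Width;
        int height = ClientSize.Height;
        if (width <= 0 || height <= 0) return;

        // (Re)create the buffer only when the client area changes size.
        if (backBuffer == null || backBuffer.Width != width || backBuffer.Height != height)
        {
            if (backBuffer != null) backBuffer.Dispose();
            backBuffer = new Bitmap(width, height);
        }

        Compose(backBuffer);
        e.Graphics.DrawImage(backBuffer, 0, 0); // single blit to the screen
    }

    // Overriding OnKeyDown instead of subscribing to the KeyDown event.
    protected override void OnKeyDown(KeyEventArgs e)
    {
        if (e.KeyCode == Keys.Right) playerX += 4;
        else if (e.KeyCode == Keys.Left) playerX -= 4;
        Invalidate();
        base.OnKeyDown(e);
    }
}
```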

 

 

 

References

 

The following article describes ways to significantly improve the form load performance:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetcomp/html/netcfimproveformloadperf.asp

 

This is a great article on optimizing Pocket PC development with .NET Compact Framework:

http://msdn.microsoft.com/msdnmag/issues/04/12/NETCompactFramework/

 

Instrumentation for .NET Compact Framework applications:

http://msdn.microsoft.com/smartclient/default.aspx?pull=/library/en-us/dnnetcomp/html/instnetcfapp.asp

 

Developing Well Performing .NET Compact Framework Applications

http://msdn.microsoft.com/library/en-us/dnnetcomp/html/netcfperf.asp

 

This posting is provided "AS IS" with no warranties, and confers no rights. 

 

[Author: Roman Batoukov]