Firstly, why is implementing efficient data access important for Azure RIAs in particular? Cloud databases in general are not the most performant databases in the world: they scale well, but any given response by itself is probably going to be slower (often much slower) than in traditional stand-alone or even client/server architectures. On the other hand, RIAs tend to be very interactive and are therefore expected to provide a very responsive user experience. To resolve this conflict of architectures, one has to implement a really efficient data delivery mechanism from the cloud to the client and back. There are a few steps to follow in this process.
Step 1. Implement an efficient partitioning schema.
Azure uses the partition key to partition the data: entries with the same partition key can be expected to live on the same virtual machine, while entries with different partition keys can be spread across multiple virtual and physical machines in the cloud. Therefore, it's best to group entries together under the same partition key when creating them and to filter by that same partition key when querying.
The grouping should primarily be based on business logic: if certain groups of objects represented by entries are more likely to be queried together than separately, it's best to assign them the same partition key value. However, if no grouping is possible, it's best to assign each entry's default PartitionKey property a unique value and query on it.
Another default property, RowKey, can be used to further uniquely identify each entry. Using PartitionKey and RowKey in filters seems to be the fastest way to query data from Azure Table Storage in terms of request speed.
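For illustration, here's a minimal sketch of the creation side, assuming a hypothetical Message entity where all messages of one conversation are grouped under one partition, built on the TableServiceEntity base class from the StorageClient library (the base class name and location may vary between SDK CTPs):

using System;
using Microsoft.WindowsAzure.StorageClient; // base class location may vary between CTPs

// Hypothetical entity: all messages of one conversation share a
// PartitionKey, so they are stored together and can be queried together.
public class Message : TableServiceEntity
{
    public Message() { } // parameterless constructor required by the serializer

    public Message(string conversationId)
        : base(conversationId,                // PartitionKey: business-logic grouping
               Guid.NewGuid().ToString("N"))  // RowKey: unique within the partition
    {
    }

    public string Text { get; set; }
}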
Step 2. Optimize data queries.
Based on the partitioning implemented in Step 1, all queries that filter on entry properties should filter first on PartitionKey, then on RowKey, and only then on other properties if needed. Following this order yields the fastest results. Queries that ignore PartitionKey should be infrequent and should artificially limit the number of entries they request (see the sketch after the example below).
For example, a query from the last post could be modified like this:
var query = (from c in Context.Users
             where c.PartitionKey == LiveUserId
             select c) as DataServiceQuery<User>;
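When a single entry is wanted, filtering on both keys turns the query into a point lookup; and a query that cannot use PartitionKey should at least cap its result size. Here is a sketch of both, reusing Context and User from above (userRowKey and the Take count are illustrative):

// Point lookup: PartitionKey + RowKey identify exactly one entity,
// so the request never scans across partitions.
var single = (from c in Context.Users
              where c.PartitionKey == LiveUserId && c.RowKey == userRowKey
              select c) as DataServiceQuery<User>;

// No PartitionKey filter: cap the result size explicitly (translates
// to $top) instead of pulling the whole table.
var capped = (from c in Context.Users
              select c).Take(100) as DataServiceQuery<User>;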
Step 3. Manipulate the merge option.
Each DataServiceContext-derived class (called AzureModel in the previous part) has a property called MergeOption which significantly affects performance, especially on large entry sets. Obviously, MergeOption.NoTracking is the best choice for sets that don't need modification, and MergeOption.PreserveChanges (used in the previous part's example) is the safest when sets need to be modified (for example, when the user adds an entry to a list previously retrieved from storage). Many developers resort to using one or the other exclusively, or even create multiple contexts, one per option, when it's entirely feasible to just switch the DataServiceContext.MergeOption property based on the current operation: NoTracking for reads, PreserveChanges for modifications.
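A minimal sketch of that switching, reusing Context, User, LiveUserId and userRowKey from the earlier examples (the DisplayName property is hypothetical):

// Read pass: NoTracking skips identity resolution and change tracking,
// which pays off on large result sets.
Context.MergeOption = MergeOption.NoTracking;
var users = (from c in Context.Users
             where c.PartitionKey == LiveUserId
             select c).ToList();

// Modification pass: switch back before querying the entry you intend
// to update, so the context tracks it and preserves local changes.
Context.MergeOption = MergeOption.PreserveChanges;
var user = (from c in Context.Users
            where c.PartitionKey == LiveUserId && c.RowKey == userRowKey
            select c).First();
user.DisplayName = "New name"; // hypothetical property
Context.UpdateObject(user);
Context.SaveChanges();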
Step 4. Apply general ADO.NET DS performance optimizations.
A good set of ADO.NET DS-specific optimizations is described in this MSDN forum post. A word of caution here: all of these will need to be retested with every new ADO.NET DS and Azure SDK CTP release, as they are subject to change.
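To give a flavor of the kind of context-level tweaks discussed there (a hedged illustration, not a verbatim list from the post):

// Don't fail materialization when client and server property sets
// differ; avoids failed round-trips as the schema evolves.
Context.IgnoreMissingProperties = true;

// Send all pending operations as one batch request. Against Azure Table
// Storage, batching only works when every changed entity shares the same
// PartitionKey (an entity group transaction), and only on service
// versions that support it - another payoff of the schema from Step 1.
Context.SaveChanges(SaveChangesOptions.Batch);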