
Lightweight DataTable Serialization

We all know that untyped data structures like DataTable and DataSet should not be passed around, but sometimes - just sometimes - you’ve got to do it because it makes sense and because it’s the most cost-effective way to meet your goals. However, passing things like a DataTable over WCF can kill performance because of the huge serialization overhead in both space and time.

So if you really have to go ahead with this crazy idea of sending a DataTable over WCF, here’s a somewhat more efficient serialization technique you can use. The basic idea is to binary-serialize the DataTable’s row values and pass them as a byte array along with the schema information so the client can reconstruct the table on the other end. Needless to say, doing this restricts your WCF clients to .Net, so you might also want to expose another web method for other clients.

 

//Requires: using System.Data; using System.IO; using System.Runtime.Serialization.Formatters.Binary;
public static void LightWeightSerialize(DataTable myDataTable, out byte[] serializedTableData, out string tableSchema)
{
    //Get all row values as jagged object array
    object[][] tableItems = new object[myDataTable.Rows.Count][];
    for (int rowIndex = 0; rowIndex < myDataTable.Rows.Count; rowIndex++)
        tableItems[rowIndex] = myDataTable.Rows[rowIndex].ItemArray;

    //Binary serialize the jagged object array
    BinaryFormatter serializationFormatter = new BinaryFormatter();
    using (MemoryStream buffer = new MemoryStream())
    {
        serializationFormatter.Serialize(buffer, tableItems);
        serializedTableData = buffer.ToArray();
    }

    //Get table schema as XSD text (the table's TableName must be set for this to work)
    using (StringWriter schemaWriter = new StringWriter())
    {
        myDataTable.WriteXmlSchema(schemaWriter);
        tableSchema = schemaWriter.ToString();
    }
}

 

And here’s the deserializer to go with it:

 

public static DataTable LightWeightDeserialize(byte[] serializedTableData, string tableSchema)
{
    //Recreate the empty table from its schema
    DataTable table = new DataTable();
    using (StringReader schemaReader = new StringReader(tableSchema))
        table.ReadXmlSchema(schemaReader);

    //Restore the jagged object array holding the row values
    BinaryFormatter serializationFormatter = new BinaryFormatter();
    object[][] itemArrayForRows;
    using (MemoryStream buffer = new MemoryStream(serializedTableData))
        itemArrayForRows = (object[][]) serializationFormatter.Deserialize(buffer);

    //Bulk-load the rows; BeginLoadData turns off notifications, index maintenance and constraints while loading
    table.MinimumCapacity = itemArrayForRows.Length;
    table.BeginLoadData();
    for (int rowIndex = 0; rowIndex < itemArrayForRows.Length; rowIndex++)
        table.Rows.Add(itemArrayForRows[rowIndex]);
    table.EndLoadData();

    return table;
}

 

How efficient is this? It really depends on your data. For instance, with some of my test data with 10K rows I got about a 6X smaller payload and 30% faster serialization. But as the number of rows increases, the speed advantage diminishes compared to the built-in XML serializer that you can access via ReadXml/WriteXml. For example, for a million rows, the above method still gives me a 4X smaller payload, but serialization is actually 3X slower than with the built-in XML serializer. So experiment before you go either way!
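
For that kind of experiment you need the XML baseline to compare against. The following is a minimal sketch of one way to produce it with WriteXml/ReadXml; the class and method names (XmlBaseline, ToXml, FromXml) are made up for illustration, and it assumes the table has its TableName set.

```csharp
using System.Data;
using System.IO;

public static class XmlBaseline
{
    // Serializes the table, inline schema included, with the built-in XML writer.
    public static string ToXml(DataTable table)
    {
        using (var writer = new StringWriter())
        {
            table.WriteXml(writer, XmlWriteMode.WriteSchema);
            return writer.ToString();
        }
    }

    // Rebuilds a table from XML that carries its own inline schema.
    public static DataTable FromXml(string xml)
    {
        var table = new DataTable();
        using (var reader = new StringReader(xml))
        {
            table.ReadXml(reader);
        }
        return table;
    }
}
```

Comparing the length of the string returned by ToXml with the length of the byte array from LightWeightSerialize (plus the schema string) gives a rough payload comparison for your own data.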

Posted by shitals | 1 Comments

The Best Culture Invariant Format for DateTime

If you are looking for a way to display a DateTime as text without confusing users in different countries, then good choices are the "o" and "r" formats. The "o" format is generally preferable because it also includes the time zone offset.

 

long t = DateTime.Now.Ticks;
Console.WriteLine((new DateTime(t)).ToString("o"));
Console.WriteLine((new DateTime(t, DateTimeKind.Local)).ToString("o"));
Console.WriteLine((new DateTime(t, DateTimeKind.Unspecified)).ToString("o"));
Console.WriteLine((new DateTime(t, DateTimeKind.Utc)).ToString("o"));

 

This prints the following when the actual date and time is 2009-11-08T17:16:13.7791953 PST:
2009-11-08T17:16:13.7791953
2009-11-08T17:16:13.7791953-08:00
2009-11-08T17:16:13.7791953
2009-11-08T17:16:13.7791953Z


If you use "r" instead, it prints the following (note that "r" appends GMT regardless of Kind and does not convert the value to UTC):
Sun, 08 Nov 2009 17:26:02 GMT
Sun, 08 Nov 2009 17:26:02 GMT
Sun, 08 Nov 2009 17:26:02 GMT
Sun, 08 Nov 2009 17:26:02 GMT
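
Another reason to prefer "o" is that the text can be parsed back without losing the Kind. Here is a small sketch of that round trip; the class and method names (RoundTripDemo, Format, Parse) are only illustrative.

```csharp
using System;
using System.Globalization;

public static class RoundTripDemo
{
    // Formats with the round-trip ("o") pattern.
    public static string Format(DateTime value)
    {
        return value.ToString("o", CultureInfo.InvariantCulture);
    }

    // Parses "o" text back; RoundtripKind keeps Utc/Local/Unspecified intact.
    public static DateTime Parse(string text)
    {
        return DateTime.ParseExact(text, "o", CultureInfo.InvariantCulture,
                                   DateTimeStyles.RoundtripKind);
    }
}
```

For example, Format(new DateTime(2009, 11, 8, 17, 16, 13, DateTimeKind.Utc)) gives 2009-11-08T17:16:13.0000000Z, and passing that string to Parse returns a DateTime whose Kind is again Utc.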

Posted by shitals | 0 Comments

Find Path of a Command Line Tool

You often work on different machines, execute a command line tool and wonder where that tool is actually installed. One way to figure this out is to look at all the directories in the PATH environment variable and search them manually in the same order as Windows does. But you don’t have to, because luckily there is a little-known built-in command called WHERE that does this for you:

image

This is similar to Unix commands like which and whereis.
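
If you ever need the same lookup from code, the manual search described above can be sketched roughly like this. It is only an approximation of what WHERE does: it walks the PATH directories and tries the PATHEXT extensions, and it ignores the current directory. The names PathSearch and FindOnPath are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PathSearch
{
    // Returns every existing file named toolName (plus one of the extensions)
    // found in the given directories, in the order the directories are listed.
    public static List<string> FindOnPath(string toolName,
        IEnumerable<string> directories, IEnumerable<string> extensions)
    {
        var matches = new List<string>();
        foreach (string directory in directories)
        {
            if (string.IsNullOrWhiteSpace(directory)) continue;
            foreach (string extension in extensions)
            {
                string candidate = Path.Combine(directory.Trim(), toolName + extension);
                if (File.Exists(candidate)) matches.Add(candidate);
            }
        }
        return matches;
    }

    // Convenience overload that reads PATH and PATHEXT from the environment.
    public static List<string> FindOnPath(string toolName)
    {
        string path = Environment.GetEnvironmentVariable("PATH") ?? "";
        string pathExt = Environment.GetEnvironmentVariable("PATHEXT") ?? "";
        var extensions = new List<string> { "" }; // try the name as typed first
        extensions.AddRange(pathExt.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries));
        return FindOnPath(toolName, path.Split(Path.PathSeparator), extensions);
    }
}
```

Calling PathSearch.FindOnPath("notepad") on a Windows machine should list the same kind of full paths that WHERE prints, first match first.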

Posted by shitals | 1 Comments

How to Right Align Address in Word Document

It may be the silliest thing, but how do you align an address on the right side of a letter in Microsoft Word 2010? Select the text you need and then click on that little square in the Ribbon bar:

image

Type the amount of indentation you need. It’s typically 5” for letter size:

image

And you have a right-aligned address!

image

Posted by shitals | 0 Comments

Selecting Random Row From SQL Server Table

It is important to make sure your automated tests cover various real-world data combinations (for instance, some columns could be null or some rows could be duplicates). For perf testing you also want to reduce the effects of caching by not firing the same SQL over and over. In these cases, the ability to select a random row for your test can come in handy, and here’s a neat little trick to do it:

-- replace YourTable with the name of your table
select top 1 *
from YourTable
order by newid()
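
If your test data is already in memory, the same trick carries over to Linq to Objects. The sketch below is an in-memory analogue, not something SQL Server runs: it orders by a fresh GUID per element and takes the first one. RandomRow and PickRandom are hypothetical names, and ordering by GUIDs is only assumed to be random enough for picking test data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomRow
{
    // Picks one element by sorting on a new GUID per element, mirroring
    // "order by newid()". Throws InvalidOperationException if rows is empty.
    public static T PickRandom<T>(IEnumerable<T> rows)
    {
        return rows.OrderBy(row => Guid.NewGuid()).First();
    }
}
```

Like the SQL version, this sorts the whole input to pick a single row, so it is best kept for test code and modest data sizes.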

Posted by shitals | 0 Comments

What’s in a name?

When you want to store the name of a person, a typical design starts out by creating two fields (in a database or a class):

Person
First Name
Last Name

Soon you realize that a lot of people have a middle name, especially when a name change occurs after marriage. So you go and add one more field:

Person
First Name
Middle Name
Last Name

This is all good… until you encounter people in countries such as Spain and Cuba, where it is customary to have two last names. Both are equally important and both are required in any official document (including ones your website or app may print out). So you go in and add one more field while thinking this ought to do it once and for all:

Person
First Name
Middle Name
Last Name
2nd Last Name

Not so fast… A lot of people from Hong Kong and a few other places in Asia actually carry two first names. One of these first names is traditional while the other is typically a Western/Roman name. Both first names are important, and often many people will know only the Western/Roman first name of a person although official documents refer only to the traditional name.

For example, consider the name of Hong Kong’s Chief Secretary, Anson Chan Fang On Sang. Here Anson is the English given name, On Sang is the Chinese given name, Chan is her husband’s surname and Fang is her own surname.

So it’s time to add a few more fields so we can store the names of everybody on the planet without loss of semantics:

Person
Traditional Given Name
English Given Name
Middle Name
Last Name
2nd Last Name

Ok… so are we done now? Well, almost! We are still missing at least two critical pieces of information: Salutation and Suffix.

Examples of common salutations are Dr, Mr, Mrs and Ms. While salutations are quickly falling out of fashion, they might still be required, for example, if you are printing out an official letter to your customer and don’t want it to look too casual.

Examples of common suffixes are Jr, Sr, III, IV etc. These are required in official/legal communication to avoid confusion with other family members of a person.

Person
Salutation
Traditional Given Name
English Given Name
Middle Name
Last Name
2nd Last Name
Suffix

Now we have covered most of the globe. There are still two more nice-to-have fields if you want to make your customers happy: Phonetic Given Name and Phonetic Last Name. Remember the times when you called customer support and each time the person on the line struggled to say your name? These two fields would avoid those moments:

Person
Salutation
Traditional Given Name
English Given Name
Middle Name
Last Name
2nd Last Name
Phonetic Given Name
Phonetic Last Name
Suffix

So there you have it: a structure that can store almost anybody’s name on the planet while maintaining the semantics of each component of a name.

Most applications won’t need to go to this extreme, because it’s OK to have just one first name and one last name that correctly identify a person for the application’s purpose, even if that is culturally incorrect and technically incomplete. However, if you are in a business where the legal implications are high or where any loss of information about your customer is not tolerable, then it’s good to think about these possibilities.

There are probably better solutions than a giant structure like the one above just to store the name of a person. Instead of having all these different fields you can simply have one free-form field, say, Full Name, and another field called Full Name Style which takes values indicating how the different components of the name are arranged:

Person
Full Name
Full Name Style

This structure will make searches for specific components of a name a little difficult, but it would extend well as your application grows around the planet.
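
As a rough illustration, the larger field list above could be mapped to a class like the one below. The class name, property names and the ToDisplayString helper are made up for this sketch, and the order in which the helper joins the parts is an assumption chosen only to reproduce the Anson Chan example; real display order depends on culture.

```csharp
using System.Linq;

public class PersonName
{
    public string Salutation { get; set; }
    public string TraditionalGivenName { get; set; }
    public string EnglishGivenName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }
    public string SecondLastName { get; set; }
    public string PhoneticGivenName { get; set; }
    public string PhoneticLastName { get; set; }
    public string Suffix { get; set; }

    // Joins the non-empty components with spaces. The order used here is an
    // assumption for illustration and is not a universal naming rule.
    public string ToDisplayString()
    {
        var parts = new[]
        {
            Salutation, EnglishGivenName, MiddleName,
            LastName, SecondLastName, TraditionalGivenName, Suffix
        };
        return string.Join(" ", parts.Where(part => !string.IsNullOrWhiteSpace(part)));
    }
}
```

With EnglishGivenName = "Anson", LastName = "Chan", SecondLastName = "Fang" and TraditionalGivenName = "On Sang", ToDisplayString returns "Anson Chan Fang On Sang". Putting Chan and Fang into those two slots is itself a modeling choice made for this example.
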
Posted by shitals | 7 Comments

Solving Shared Notebook Sync Issue With OneNote 2010

For about 3 years we used Groove to share calendar, notes and files within the family, until I discovered a feature in OneNote called “Shared Notebooks”. Shared Notebooks are just like any other OneNote notebooks, with the difference that they get synced with other people! If someone adds a new note or modifies a note, you get it the next time you sync, and vice versa. On conflicts it creates new pages, and you can also take automated backups. This feature requires either a file share or SharePoint. So I’ve now got my personal SharePoint website on the Internet (which costs $10 per year) to host our shared OneNote notebooks as well as our shared calendar that gets synced in Outlook.

Unfortunately, in the OneNote 2010 Technical Preview the sync stopped working because OneNote, for some reason, no longer pops up a dialog asking for the password to connect to a SharePoint website on the Internet. Very troublesome. But here’s the workaround I’ve found:

  1. Right click on the Notebook and select Properties.
  2. Click on the Change Location button.
  3. Type the URL of your SharePoint website. This will pop up the password dialog.
  4. Cancel all dialogs and sync! It should work now.
Posted by shitals | 0 Comments

Twitter Dishing Out 417 - Expectation Failed to .Net Clients

My little Twitter app has been broken for the past few days with error 417 - Expectation Failed. In fact, most .Net apps calling Twitter APIs would be broken right now, so I thought I'd write this up.

This error seems to occur because Twitter servers have started rejecting the Expect HTTP header with the value "100-Continue". I'm not sure if this was a planned event or whether enough warning was given to developers, but it is guaranteed to drive you nuts.

The error comes from the default behavior of the HttpWebRequest object, which adds an HTTP header called Expect with the value "100-Continue" to almost every outgoing POST request. This header basically tells the server that the client will send only the headers first and will wait for the server's go-ahead (a 100 Continue response) before sending the form data, so that if the server answers with a redirect or an auth challenge the data doesn't have to be sent all over again. This is a good thing if your web form has lots of data, if you are on a slow network, or if most servers in the world respond with redirects or auth when forms are submitted, but it is a bad thing for server performance because the server now gets hit twice for each request. I think performance might be the reason Twitter has turned off support for such two-part POST requests, which unfortunately happen to be the default for HttpWebRequest.

In any case, it turns out that HttpWebRequest does all of this under the hood, so to get rid of this error you will need to set a static flag in the ServicePointManager class like this:

System.Net.ServicePointManager.Expect100Continue = false;

The above statement eliminates the HTTP Expect header from your calls to Twitter, and it will be happy again.

I'm using Yedda's C# wrapper for Twitter APIs for QckTwit, so the above line goes at the start of the ExecutePostCommand method.

PS: If you are new to Twitter, try out the free, simple, lightweight app QckTwit. It just sits in your system tray, asks you what you are doing at the reminder interval you set, updates Twitter and gets out of your way!
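
If you would rather not flip the process-wide flag, a narrower variant is to turn the header off on the ServicePoint of the request you are building. The sketch below assumes that approach; TwitterRequestFactory and CreatePost are made-up names, and this is not how Yedda's wrapper is written. Keep in mind that a ServicePoint is shared by requests to the same host, so this still affects other requests to that host, just not requests to other servers.

```csharp
using System.Net;

public static class TwitterRequestFactory
{
    // Builds a POST request that will not send "Expect: 100-Continue".
    public static HttpWebRequest CreatePost(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        // Scoped to this request's host instead of the whole process.
        request.ServicePoint.Expect100Continue = false;
        return request;
    }
}
```

Creating the request does not open a connection; the setting takes effect when the request body is actually sent.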

Posted by shitals | 22 Comments

Space Elevator Conference

They do have a Space Elevator Conference, complete with a blog! Looks like MSR is participating too (Microsoft employees get a discount - just $225 for a ride of, uhm..., armchair presentations, for now).

Posted by shitals | 1 Comments

Why would you still get "Strong name validation failed"?

There are not many web pages mentioning this, so I'm posting it so that it comes up in search. Having personally spent 4 hours tracking this little thing down, I wouldn't want anyone else to go through the same :).

So... if you are using delay signing, you will need to run the following command so you can still debug from Visual Studio.Net:

sn -Vr *,[public key token]

Apparently, if you are using 64-bit Vista it just won't work! You will keep getting an error like this:

Could not load file or assembly '[Your file], Version=2.0.0.0, Culture=neutral, PublicKeyToken=[public key token]' or one of its dependencies. Strong name validation failed. (Exception from HRESULT: 0x8013141A)


You can try viewing the Fusion log, cleaning the solution, rebooting the machine, watching FileMon, running Process Explorer, rebuilding everything 10 times... but it just won't work. In fact, if you try removing signing and your app is WPF 3.5, you might get even weirder errors like

Could not create an instance of type 'StaticExtension'


The solution is hidden in a one liner in Dan Wahlin's blog:

If you're running a 64-bit installation of Vista you'll need to use the sn.exe located at C:\Program Files\Microsoft SDKs\Windows\v6.0A\Bin\x64\sn.exe


I'm pretty sure tons of developers adopting a shiny 64-bit OS are running, or will run, into this. The root cause here is that the sn.exe designed for 32-bit doesn't error out; instead it happily lets you know that "Verification entry added for assembly '*,*'" successfully! It's not! So I also filed a bug on our Connect web site. Please vote to make the 64-bit world a better place!

Posted by shitals | 11 Comments

A quicker way to Twitter

This past weekend I finally thought about giving Twitter a try and started looking for a client app that would let me update my status very quickly with a global keyboard shortcut. I'm not into following anyone or replying to anyone; I just wanted a very simple app with one text box. Apparently no such app existed in the Twitter Fan Wiki, which actually turned out to be a good thing because I immediately started looking at Twitter's API and C# wrappers for it. About 90 minutes later I had my app ready. On the way I also added functionality to break big updates into multiple tweets. This little (literally) app is now open sourced on CodePlex and ready for you to try out!

image
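
Breaking a long update into several smaller ones can be done in many ways. The following is a minimal sketch of one of them and is not the code used in the app: it splits on spaces and assumes Twitter's 140-character limit as the default. UpdateSplitter and SplitIntoUpdates are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class UpdateSplitter
{
    // Splits text into chunks of at most maxLength characters, breaking at
    // spaces where possible. Words longer than maxLength are cut hard.
    public static List<string> SplitIntoUpdates(string text, int maxLength = 140)
    {
        var updates = new List<string>();
        var current = new StringBuilder();
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string remaining = word;
            while (remaining.Length > maxLength)
            {
                // Flush what we have, then emit a full-length slice of the long word.
                if (current.Length > 0) { updates.Add(current.ToString()); current.Clear(); }
                updates.Add(remaining.Substring(0, maxLength));
                remaining = remaining.Substring(maxLength);
            }
            if (remaining.Length == 0) continue;

            // Start a new update if this word (plus a space) would not fit.
            if (current.Length > 0 && current.Length + 1 + remaining.Length > maxLength)
            {
                updates.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0) current.Append(' ');
            current.Append(remaining);
        }
        if (current.Length > 0) updates.Add(current.ToString());
        return updates;
    }
}
```

For instance, SplitIntoUpdates("aaa bbb ccc", 7) returns two updates, "aaa bbb" and "ccc".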

Posted by shitals | 1 Comments

Run As... is back in Vista!

Yes, this dearly missed shell context menu item is available again in Windows Vista, thanks to Mark Russinovich. This little utility is a new addition to the Sysinternals toolset and can be downloaded here. Just run ShellRunAs /reg to register the shell context menu item "Run as different user". This menu item will then be available for Start menu items as well as in Windows Explorer right-click menus :).

image

Posted by shitals | 2 Comments

Phun With Physics Simulations

This addictive program can easily keep you busy for the rest of the weekend, so be careful :). Phun is a physics simulator that even kids can use, and it’s an absolute delight. I watched the video and had to download it immediately to give it a try. At first the interface might not seem that easy, but after reading the tutorial on the main page and the forums, you should be able to accomplish everything shown in the video with less than 15 minutes of learning! Simply the easiest, most powerful and most fun physics program I’ve come across.

image

Enjoy!

Posted by shitals | 0 Comments

Hello Word

Hello, I'm Shital Shah. I currently work with the NGIM team at Microsoft as an SDE. I intend to use my MSDN blog mostly for technology/programming related stuff with occasional detours :).

Let me put out the standard big disclaimer:

All views expressed in this blog are mine and not those of my employer or any team at Microsoft. The information, data and opinions presented here are neither validated nor endorsed by Microsoft and should not be considered an official statement of the company.

So let's get started. One of the classic confusions that new Linq users come across is how to write composite key (or multi-key) joins in Linq to Objects (especially in C#). Unfortunately the join keyword in C# Linq is pretty limited compared to SQL; however, there are things that you can do in Linq that you can’t do with SQL joins. Hopefully this short entry will demonstrate some possibilities.

There are at least 4 ways to do composite key joins in Linq to Objects. Let's say you have two sequences defined like this:

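class MainRow
{
    public int PK1; public int PK2;
}

class RelatedRow
{
    public int PK1; public int PK2;
    public string RelatedText;
}

//Shape of each joined ("de-normalized") row produced by the queries in this post
class JoinReturn
{
    public int MainRowPK1; public int MainRowPK2;
    public int RelatedRowPK1; public int RelatedRowPK2;
    public string RelatedText;
}

//Somewhere else
MainRow[] mainTable;
RelatedRow[] relatedTable;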

And now let's say you want to produce a "de-normalized" sequence that combines rows from mainTable and relatedTable by joining on PK1 and PK2.

Option 1

Simply join in where clause:

var join1 = from mainRow in mainTable
            from relatedRow in relatedTable
            where mainRow.PK1 == relatedRow.PK1 && mainRow.PK2 == relatedRow.PK2
            select new JoinReturn
            {
                MainRowPK1 = mainRow.PK1,
                MainRowPK2 = mainRow.PK2,
                RelatedRowPK1 = relatedRow.PK1,
                RelatedRowPK2 = relatedRow.PK2,
                RelatedText = relatedRow.RelatedText
            };

Here we are essentially doing a cross join and filtering the resulting sequence on the two keys. Yes, it's very inefficient if you have large sequences to join, but it's probably the most efficient for small sequences!

Option 2

Join on temp object that has composite keys:

var join2 = from mainRow in mainTable
            join relatedRow in relatedTable
            on new { KeyPart1 = mainRow.PK1, KeyPart2 = mainRow.PK2 } equals new { KeyPart1 = relatedRow.PK1, KeyPart2 = relatedRow.PK2 }
            select new JoinReturn
            {
                MainRowPK1 = mainRow.PK1,
                MainRowPK2 = mainRow.PK2,
                RelatedRowPK1 = relatedRow.PK1,
                RelatedRowPK2 = relatedRow.PK2,
                RelatedText = relatedRow.RelatedText
            };

This is the form you will see a lot, and it is the most efficient of all the options described here for large data sets. Most SQL enthusiasts, however, will wonder about the time it takes to create all those temporary objects and whether the GC will kill itself looking at code like this. Believe it or not, in all my experiments this seems to perform well for sequences as large as 10,000 elements. Also notice that you can go beyond simple comparison of key values. For instance, the Join extension method accepts a custom IEqualityComparer for the key, so your definition of "equal keys" can use custom logic, even logic that accesses external data stores or the Internet in real time. Joins that need operators other than equals are a better fit for the where-based forms in Option 1 and Option 3. You can go far beyond SQL here.
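
To try Option 2 end to end, here is a small self-contained sketch with a few sample rows. The wrapper class CompositeJoinDemo, the JoinTexts method and the shortened key names K1/K2 are made up for this example; the row classes mirror the ones defined earlier.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CompositeJoinDemo
{
    public class MainRow { public int PK1; public int PK2; }
    public class RelatedRow { public int PK1; public int PK2; public string RelatedText; }

    // Joins on (PK1, PK2) using an anonymous type as the composite key and
    // returns the related text of every matching pair.
    public static List<string> JoinTexts(IEnumerable<MainRow> mainTable,
                                         IEnumerable<RelatedRow> relatedTable)
    {
        var joined = from mainRow in mainTable
                     join relatedRow in relatedTable
                     on new { K1 = mainRow.PK1, K2 = mainRow.PK2 }
                     equals new { K1 = relatedRow.PK1, K2 = relatedRow.PK2 }
                     select relatedRow.RelatedText;
        return joined.ToList();
    }
}
```

The anonymous types on both sides must have the same property names and types in the same order; that is what makes the compiler treat them as one type with value-based equality, which is why the key parts are named explicitly.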

Option 3

Use the powerful SelectMany:

var join3 = mainTable
            .SelectMany(mainRow =>
                relatedTable.Where(relatedRow => mainRow.PK1 == relatedRow.PK1 && mainRow.PK2 == relatedRow.PK2)
                .Select(relatedRow =>
                    new JoinReturn
                    {
                        MainRowPK1 = mainRow.PK1,
                        MainRowPK2 = mainRow.PK2,
                        RelatedRowPK1 = relatedRow.PK1,
                        RelatedRowPK2 = relatedRow.PK2,
                        RelatedText = relatedRow.RelatedText
                    }
            ));

SelectMany is a very powerful feature, and the above fragment shows how you can leverage it to perform multi-key joins. Notice that here you can use operators such as <, >, <=, >= if your join involves them. However, this method isn't as efficient as Option 2 because it doesn't build an internal hash table to do its job in one fell swoop; instead it scans relatedTable separately for each row in mainTable. Still, keep this option in mind if your mainTable has only a few rows.

Option 4

Using partial joins:

var join4 = from mainRow in mainTable
            join relatedRow in relatedTable
            on mainRow.PK1 equals relatedRow.PK1 into subJoinTable
            from subJoinRow in subJoinTable
            //Narrow the PK1 matches down to the rows that also match on PK2
            where subJoinRow.PK2 == mainRow.PK2
            select new JoinReturn
            {
                MainRowPK1 = mainRow.PK1,
                MainRowPK2 = mainRow.PK2,
                RelatedRowPK1 = subJoinRow.PK1,
                RelatedRowPK2 = subJoinRow.PK2,
                RelatedText = subJoinRow.RelatedText
            };

Here we do slightly better than Option 1 by first joining on PK1 only, which gives an intermediate sequence that is then narrowed down on PK2. That intermediate sequence can still be pretty large if your input sequences are large (but not as bad as the full cross join in Option 1). This is a useful option if you know that PK1 is the major filter on the sequences. On average, Option 3 and Option 4 might perform comparably.

In general, if you are doing complex joins on very large sequences using Linq to Objects, then something is wrong and you are bound to lose on performance compared to relational databases. You might want to take this into consideration in your designs.

Posted by shitals | 1 Comments