In yesterday's post, I hinted at a method to improve memory usage in your applications. This trick can be applied anytime you have many strings in your application that have the same value but were allocated separately and thus each take up space of their own.
This is something that you may find whenever you're reading data from some external source into your application, where that data has no information to help you figure out whether it's a repeated instance or not. For example, when you read data from a database, typically character values will be read as different strings, even if they have the same value from one row to the next.
The process of taking a string and checking whether you already have one with the same value, so that you can reuse it, is called atomizing a string. This has two nice properties.
By the way, XmlReader already applies this mechanism to things like element names, so we can borrow the NameTable class to do this work for us. We will use the Add method to add a string value if we haven't seen it before ("atomizing it"), or get the reference to an already-atomized string with the same value.
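To make the Add behavior concrete, here is a minimal sketch. The class and method names (AtomizeDemo, SharesReference) are made up for illustration; only NameTable.Add and Object.ReferenceEquals come from the framework. The two input strings are built at run time so that the compiler cannot fold them into a single literal.

```csharp
using System;
using System.Xml;

public static class AtomizeDemo
{
    // Returns true when two separately allocated, equal strings
    // come back from the NameTable as one shared instance.
    public static bool SharesReference()
    {
        var nt = new NameTable();

        // Same value, two distinct objects on the heap.
        string a = new string(new[] { '1', 'Z', '9' });
        string b = new string(new[] { '1', 'Z', '9' });
        Console.WriteLine(ReferenceEquals(a, b)); // prints "False"

        // The first Add registers the value; the second Add finds it
        // and hands back the already-atomized instance.
        string atomA = nt.Add(a);
        string atomB = nt.Add(b);
        return ReferenceEquals(atomA, atomB);
    }

    public static void Main()
    {
        Console.WriteLine(SharesReference()); // prints "True"
    }
}
```

Once every holder of the value points at the atomized instance, the duplicates become garbage and can be collected. Note that Add throws an ArgumentNullException when given null, so null values need to be handled separately.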
We can go back to the MeasurePlainObject method we wrote yesterday and touch it up like this:
...
details.TrimExcess();

// While we're at it, try having less CarrierTrackingNumber instances.
System.Xml.NameTable nt = new System.Xml.NameTable();
foreach (var detail in details)
{
    detail.CarrierTrackingNumber = (detail.CarrierTrackingNumber == null) ?
        null : nt.Add(detail.CarrierTrackingNumber);
}

GC.Collect();
totalMemoryAfterWork = GC.GetTotalMemory(true);
...
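As an aside, the same loop can be tried outside the measurement harness. The sketch below is not the code from the post: the Detail class, the ByReference comparer, and the sample data are all hypothetical stand-ins, chosen so that the effect can be observed by counting distinct string instances rather than bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Xml;

// Hypothetical stand-in for the record type used in the post.
public sealed class Detail
{
    public string CarrierTrackingNumber { get; set; }
}

// Compares strings by object identity, so a set built with it
// counts distinct instances rather than distinct values.
public sealed class ByReference : IEqualityComparer<string>
{
    public bool Equals(string x, string y) { return ReferenceEquals(x, y); }
    public int GetHashCode(string s) { return RuntimeHelpers.GetHashCode(s); }
}

public static class AtomizeRecords
{
    // Six records with two distinct values, each allocated separately,
    // plus one record whose tracking number is null.
    public static List<Detail> MakeSample()
    {
        var list = new List<Detail>();
        for (int i = 0; i < 6; i++)
        {
            // Run-time concatenation allocates a fresh string every time.
            list.Add(new Detail { CarrierTrackingNumber = "TRK-" + (i % 2) });
        }
        list.Add(new Detail());
        return list;
    }

    public static int CountInstances(List<Detail> details)
    {
        var seen = new HashSet<string>(new ByReference());
        foreach (var d in details)
        {
            if (d.CarrierTrackingNumber != null) seen.Add(d.CarrierTrackingNumber);
        }
        return seen.Count;
    }

    public static void Atomize(List<Detail> details)
    {
        var nt = new NameTable();
        foreach (var d in details)
        {
            // NameTable.Add rejects null, so nulls pass through untouched.
            d.CarrierTrackingNumber = (d.CarrierTrackingNumber == null)
                ? null
                : nt.Add(d.CarrierTrackingNumber);
        }
    }

    public static void Main()
    {
        var details = MakeSample();
        Console.WriteLine(CountInstances(details)); // prints "6"
        Atomize(details);
        Console.WriteLine(CountInstances(details)); // prints "2"
    }
}
```

The NameTable here is a local variable, so the table itself becomes collectible as soon as the loop finishes; only the atomized strings still referenced by the records stay alive. The numbers that follow come from the post's own program, not from this sketch.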
Now when I run this on my machine, I get the following values.
C:\work\repro>mem.exe --poco
POCO (121317 records):
Bytes before work: 49160
Bytes after work: 13575952
Delta: 13526792
Recall that without this, these were the values I had for the POCO case.
C:\work\repro>mem.exe --poco
POCO (121317 records):
Bytes before work: 49160
Bytes after work: 15860472
Delta: 15811312
This is a cool 2,284,520 bytes saved (15,811,312 minus 13,526,792, or roughly 14% of the original delta) for very little code. A few things to bear in mind.
Enjoy!
How does this differ from using String.Intern()?
The difference between this method and using String.Intern() is that String.Intern atomizes the string into an internal CLR table. This means that even after there are no more references, or even after your AppDomain has completely unloaded, the interned string will still be around.
In this case, we "intern" the strings at one moment in time, but they are still regular string references. Once the last reference to the string is collectible, the GC can go reclaim the memory.
Duh. Why didn't I see that? I tend to work with strings that do live the life of the appdomain, so I'll take that as my weak excuse.
Thanks!