- Advanced resource debugging with Resview
-
In an earlier blog about resource fallback essentials, I said that the resource diagnostic tool resview is included with the framework sdk. That was wrong – it turns out we previously released resview as a gotdotnet sample, but it’s no longer accessible.
Since resview is helpful in our own investigations, we decided to re-release resview on code gallery [download resview].
About Resview
Resview looks specifically for resource embedding problems that can result in runtime resource lookup problems. It can tell you about critical errors; for example, if the assembly is missing resources or the embedded resource file name is incorrect. It also gives warnings about best practices – for example, reminding you to include NeutralResourcesLanguageAttribute.
How to use resview
You can run resview on a .resources file, a main assembly, or a satellite assembly. Let’s run resview on assemblies from the ResourceManagerSample project, mentioned in the previous blog.
To view the resources embedded in the main assembly, run “resview FallbackTest.exe”.
To view the resources embedded in a satellite assembly, run “resview FallbackTest.resources.dll” on any of the satellite assemblies.
Let’s look at the output when running on the main assembly. I’ve added numbers to the end of interesting lines, which will be described afterwards.
ResourceManagerSample>resview FallbackTest.exe
Microsoft (R) Visual .resources Viewer Version 1.83 [CLR version 4.0.20209.0]
FallbackTest.exe is a main assembly.
ResView : warning: This assembly does not have a NeutralResourcesLanguageAttribute.
Using this attribute gives a slight perf optimization for one culture, and
is required for localizable ClickOnce apps. (1)
Public key token: null [This must not be a shipping binary] (2)
Processing FallbackStrings.resources... (3)
Resources from FallbackStrings.resources...
greeting = hello
eow = weekend
Found 2 resources in FallbackStrings.resources. (4)
Found 1 resource file, with a total of 2 resources.
Encountered 0 errors, 1 warning.
Total size of all .resources files: 244
Total length of keys (chars): 11 Average: 5.5
Total length of String values (chars): 12 Average: 6.0
Number of Strings: 2 Byte[]'s: 0 Streams: 0 Others: 0
# of unique Types: 1
file est. overhead: 221 bytes 110.5 bytes/resource 90.6% of the file
Output time: 31 ms
This tells us the following:
1. The main assembly didn’t use the NeutralResourcesLanguageAttribute. We strongly recommend that you add this attribute; see my post about the NeutralResourcesLanguageAttribute for details.
2. There’s no public key token. This serves as a reminder, in case your assembly needs a strong name.
3. Resview found an embedded resources file named FallbackStrings.resources.
4. Resview lists the resources in that file.
In addition to the general diagnostic information resview provides (for example, if the files were named incorrectly), 3 and 4 combined can be useful for debugging resource failures. For example, if you get an error during resource lookup, you can check whether the resources file the ResourceManager is searching for is actually embedded in the assembly. If so, you can check whether the key/value pair is actually in the resources file.
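If you don't have resview handy, the two checks from steps 3 and 4 can be roughly approximated in code. This is only a minimal sketch and not how resview itself works; EmbeddedResourceLister and its method names are made up for illustration. It relies on Assembly.GetManifestResourceNames, Assembly.GetManifestResourceStream, and ResourceReader, and it assumes the resource values can be printed with ToString.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Resources;

// Hypothetical helper: lists embedded .resources files and their entries.
public static class EmbeddedResourceLister
{
    // Step 3: which .resources files are actually embedded in the assembly?
    public static List<string> ListResourceFiles(Assembly assembly)
    {
        List<string> names = new List<string>();
        foreach (string name in assembly.GetManifestResourceNames())
        {
            if (name.EndsWith(".resources", StringComparison.OrdinalIgnoreCase))
            {
                names.Add(name);
            }
        }
        return names;
    }

    // Step 4: which key/value pairs does each embedded file contain?
    public static void Dump(Assembly assembly, TextWriter output)
    {
        foreach (string name in ListResourceFiles(assembly))
        {
            output.WriteLine("Resources from " + name + "...");
            using (Stream stream = assembly.GetManifestResourceStream(name))
            using (ResourceReader reader = new ResourceReader(stream))
            {
                foreach (DictionaryEntry entry in reader)
                {
                    output.WriteLine(entry.Key + " = " + entry.Value);
                }
            }
        }
    }
}
```

Calling EmbeddedResourceLister.Dump(Assembly.LoadFrom("FallbackTest.exe"), Console.Out) would be one way to try it on the sample assembly.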
- Cleaning up after yourself (Dispose(void) doesn't get called for you)
-
Despite my large and growing number of Dispose-related blogs, I really don't like writing about Dispose. Or rather, I don't like that Dispose is in such a confusing state that it requires so much reading material. :) But here we are again. I'd like to start posting some of the common Dispose questions I get, in hopes that it will save time for others.
I got an email asking why StreamWriter's data isn't getting flushed to disk. In their scenario, the problem manifested as a partial file; i.e., the final block wasn't flushed to disk. But they also showed a smaller repro in which no data is flushed to disk.
using System;
using System.IO;
using System.Threading;
public class StreamWriterWrapper : IDisposable {
    public StreamWriter writer;
    public static void Main(string[] args) {
        StreamWriterWrapper test = new StreamWriterWrapper(@"C:\temp\test.txt");
        test.writer.WriteLine("some text");
        test.writer.WriteLine("some more text");
    }
    public StreamWriterWrapper(string filename) {
        writer = new StreamWriter(filename);
    }
    public void Dispose() {
        writer.Close();
    }
}
The user's comment was: it looks like .NET isn't calling Dispose for us, even though we implemented IDisposable and provided a Dispose method, and this is why the data isn't getting flushed.
The user is completely right. And this is a key difference between Dispose and finalization. Dispose(void) doesn't get called for you; your code needs to explicitly call it. This is different from finalization, and since Brian has already written up a great description of this difference, I'll refer you to his blog for more details:
http://blogs.msdn.com/bclteam/archive/2007/10/30/dispose-pattern-and-object-lifetime-brian-grunkemeyer.aspx
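For the repro above, the usual way to make sure Dispose is called is a using statement, which calls Dispose when the block exits, even if an exception is thrown. Here is a minimal sketch; UsingDemo and WriteAndReadBack are names I made up, and the caller supplies the path.

```csharp
using System;
using System.IO;

// Hypothetical helper showing that a using block flushes the writer.
public static class UsingDemo
{
    public static string WriteAndReadBack(string path)
    {
        // Dispose runs at the closing brace, which flushes and closes the file.
        using (StreamWriter writer = new StreamWriter(path))
        {
            writer.WriteLine("some text");
            writer.WriteLine("some more text");
        }

        // Both lines are on disk now because the writer has been disposed.
        return File.ReadAllText(path);
    }
}
```

The same applies to the StreamWriterWrapper class in the repro: wrapping its use in a using block (or calling test.Dispose() explicitly) is what triggers writer.Close().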
- What does the NeutralResourcesLanguageAttribute do?
-
NeutralResourcesLanguageAttribute marks the neutral culture for an assembly. That sounds self-referential, but a full description would require another blog post. To avoid getting bogged down, think of neutral culture roughly as the default language. (Fingers crossed that Michael Kaplan doesn't flame me for that oversimplification.)
The NeutralResourcesLanguageAttribute does two things:
1. Can speed up resource probes.
First note that the default location for neutral resources is the main assembly. In other words:
[assembly:NeutralResourcesLanguageAttribute("en-US")]
...is the same as:
[assembly:NeutralResourcesLanguageAttribute("en-US", UltimateResourceFallbackLocation.MainAssembly)]
If you have one of the above attributes on your assembly, then the ResourceManager looks for "en-US" resources directly in the main assembly, instead of searching first in an “en-US” folder. Since resource probes can be expensive, this attribute can help improve perf.
2. Specifies a fallback culture if resource probes fail; i.e. neutral resources are the resources that should always be there.
Suppose you want to deploy an app on a machine typically set to de-DE culture. Suppose you have partially localized strings for de-DE and de, and a full set of resources for fr-FR. You can use NeutralResourcesLanguageAttribute to say fr-FR resources are always there. So if the resource probe doesn't find an entry for de-DE or de, then it will fall back to fr-FR.
You can use the accompanying UltimateResourceFallbackLocation.Satellite value to say these resources are located in a satellite assembly, i.e.:
[assembly:NeutralResourcesLanguageAttribute("fr-FR", UltimateResourceFallbackLocation.Satellite)]
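To make the fallback order in point 2 concrete, here is a minimal sketch that lists cultures in the order described: the requested culture, then its parents, then the neutral resources culture. It only walks CultureInfo.Parent. It is not the ResourceManager's actual probing code, and FallbackChain.Build is a made-up helper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper that models the lookup order, most specific culture first.
public static class FallbackChain
{
    public static List<string> Build(string uiCulture, string neutralResourcesCulture)
    {
        List<string> chain = new List<string>();
        CultureInfo culture = new CultureInfo(uiCulture);

        // Walk from the specific culture up through its parents (de-DE, then de).
        while (!culture.Equals(CultureInfo.InvariantCulture))
        {
            chain.Add(culture.Name);
            culture = culture.Parent;
        }

        // Nothing more specific matched, so the neutral resources come last.
        chain.Add(neutralResourcesCulture);
        return chain;
    }
}
```

For the scenario above, FallbackChain.Build("de-DE", "fr-FR") yields de-DE, de, fr-FR.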
- Making a StreamWriter usable even after given garbage characters
-
I recently got a question from a customer using a StreamWriter with a UTF-8 encoding. The StreamWriter threw an EncoderFallbackException on an attempt to write “garbage” Unicode characters. For example, an attempt to write U+DFC9, which is only half of a surrogate pair and therefore not a complete Unicode character, threw an EncoderFallbackException.
That part seemed fine since the input was bogus. However, after that exception was thrown, the StreamWriter instance became effectively unusable; even calling WriteLine() on it threw an EncoderFallbackException. So the customer asked how to keep the writer usable after the exception.
This behavior seems bad, but it isn't a bug: by default, the StreamWriter becomes unusable after getting bogus data. This was a design decision (from long ago) to be tolerant of encoding errors when reading but very strict when writing. Anything you do subsequently (Flush(), Close(), etc.) hits the encoding error again. The idea is that when you encounter an initial error, you should probably be concerned about the fidelity of the rest of the stream, so you should bail out as soon as you detect that the stream is corrupt.
In this case, the customer was fine with not writing the garbage characters, but didn't want the StreamWriter to become unusable; for example, to avoid losing the previously written data.
Fortunately there's a solution since the encoding's EncoderFallback property can be set to emit fallback characters instead of throwing an exception. In this example, the encoding's default fallback behavior was to throw an exception; however, you can set the property to use a replacement character, e.g.: Encoding.EncoderFallback = EncoderFallback.ReplacementFallback. Then, instead of getting an EncoderFallbackException, the bogus characters are replaced with the fallback, and the StreamWriter continues to be usable.
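Here is a minimal sketch of that approach. One detail to be aware of: the EncoderFallback property can only be set on a writable Encoding instance, so the sketch sets it on a clone. LenientUtf8Writer and its methods are made-up names, and I picked "?" as the replacement string explicitly instead of relying on the default replacement.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: a StreamWriter that survives unencodable characters.
public static class LenientUtf8Writer
{
    public static Encoding CreateEncoding()
    {
        // Start from a strict UTF-8 encoding (no BOM, throws on invalid data),
        // then replace the throwing fallback on a writable clone.
        Encoding encoding = (Encoding)new UTF8Encoding(false, true).Clone();
        encoding.EncoderFallback = new EncoderReplacementFallback("?");
        return encoding;
    }

    // Writes each line and returns the bytes that reached the stream.
    public static byte[] Write(params string[] lines)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            using (StreamWriter writer = new StreamWriter(stream, CreateEncoding()))
            {
                foreach (string line in lines)
                {
                    // A lone surrogate is replaced with "?" instead of throwing.
                    writer.WriteLine(line);
                }
            }
            return stream.ToArray();
        }
    }
}
```

With this encoding, writing a string that contains U+DFC9 followed by a second, valid line succeeds, and both lines end up in the stream.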
- When to call Dispose
-
A recent internal email thread unearthed extreme differences of opinion about when Dispose(void) should be called on an IDisposable. This led to a long discussion and a realization that -- while it seems like we’ve said everything there is to say about Dispose -- it’s time for some more Dispose guidance. This blog summarizes our initial thoughts about when you should call Dispose. Input was provided by Jeffrey Richter, Mike Boilen, Brian Grunkemeyer, Joe Duffy, and Shawn Farkas. We'd like to hear your feedback as well.
Before diving in, some context:
- The question of when to call Dispose is just a small piece of the Dispose puzzle. For information about correctly implementing the Dispose pattern, see the updated Dispose guidelines on Joe Duffy’s blog.
- As a quick refresher, calling Dispose(void) releases resources deterministically (as opposed to nondeterministic cleanup at finalization).
The Debate
Let’s look at the different opinions.
“Always call Dispose” camp
The people in this camp, which included me, have been burned by cases in which failing to call Dispose can lead to bugs. For example, failure to call Dispose on a FileStream can lead to hard-to-spot bugs where the file is temporarily unavailable. Even worse, failure to explicitly dispose some .NET crypto classes can lead to an exception thrown on the finalizer thread. Based on these and other examples, we concluded the pit of success is to always call Dispose.
“Avoid calling Dispose” camp
Jeffrey Richter was the lone voice in this camp, but he was up to the challenge. He pointed out that many IDisposables are not as clear cut as FileStream, Socket, etc. For example, a WinForms app has IDisposables that are fonts, controls, etc. For these, explicit cleanup isn’t necessary in mainstream scenarios. Calling Dispose on each of these would be incredibly tedious – similar to (but not as bad as) C++ destructor style of cleanup.
Jeffrey also provided an example where Dispose shouldn't be called. The IAsyncResults returned by FileStream.BeginRead and BeginWrite have a WaitHandle member, which implements IDisposable. Jeffrey said some users think that they should aggressively fetch and dispose this WaitHandle. This can obviously have bad consequences if done prematurely, but it has another problem. The WaitHandle is lazily allocated, so fetching it just to dispose it causes an unnecessary allocation (i.e. negatively impacts performance).
He pointed out similar APIs where there is confusion over whether to call Dispose; in general, these are APIs where there is ambiguity about who owns the IDisposable.
Who won?
Well, everyone...or no one. (Actually, probably Jeffrey, given that I didn't think he could budge my opinion at all.)
In any case, from our discussion, it was obvious we haven’t enunciated clear guidance about when to call Dispose.
Given the already confusing state of Dispose, we’d like to keep this guidance as simple as possible. Previously (before that email thread), I thought we wanted to tell users to always call Dispose, since doing so prevents bad side effects described above. The IAsyncResult / unnecessary allocation example could be solved by telling users not to aggressively fetch and dispose members. (In fact, this needs to be advertised no matter what.) This guidance is very simple. So why complicate things?
Jeffrey’s point about the impact of this guidance on WinForms-like apps is crucial. Having to call Dispose on each font and control in a WinForms app could significantly impact coding patterns and would be viewed as tedious. It’s not _as_ tedious as C++ destructors (since at least managed memory is handled), but it’s still a lot of bookkeeping that would be nice to avoid, as long as it's safe.
Proposed guidance about when to call Dispose (draft)
The simplest story we found combining these two concerns (correctness and usability) was to divide IDisposables into resource categories and give specific guidance for each category. The key observation, provided by Mike Boilen, is that you should call Dispose when failing to do so can lead to visible side effects. This approach covers most of our Dispose concerns; special cases are listed in the next section.
Resource Categories
1. Named/shared OS resources
- Examples: files, sockets, named pipes, memory mapped files
- Failure to call Dispose: likely to have visible side effects. Even if the resource is wrapped with a SafeHandle or the class has a finalizer, problems can manifest as the resource being unavailable for some period of time.
- When to call Dispose: if you own it, Dispose it
2. Other/unnamed resources
- Examples: native fonts, controls, bitmaps (native memory)
- Failure to call Dispose: only has visible side effects when large amounts are used. These resources have higher limits than resource category 1. In “typical” use, limits are not hit*
- When to call Dispose: only if limit is likely to be hit. Rely on GC cleanup for typical use
To keep this simple for users, we could flag in the docs the classes that you must call Dispose on (i.e. resource category 1). Guidance for resource category 2 is left vague at the moment; the intent for now is to get feedback about whether this approach addresses usability concerns, while remaining as simple as possible.
*Scenarios that expect to stress the resource may hit limits and should consider calling Dispose.
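To illustrate the "visible side effects" for resource category 1, here is a minimal sketch of a file staying unavailable until its FileStream is disposed. FileLockDemo and its methods are made-up names; the file is opened with FileShare.None so that the conflict is easy to observe, and the caller supplies a scratch path.

```csharp
using System;
using System.IO;

// Hypothetical demo: an undisposed FileStream keeps the file unavailable.
public static class FileLockDemo
{
    // Returns true if the file can currently be opened for exclusive access.
    public static bool CanOpenExclusively(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException)
        {
            return false;
        }
    }

    // Returns { available while the stream is open, available after Dispose }.
    public static bool[] Run(string path)
    {
        File.WriteAllText(path, "data");

        FileStream owned = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
        bool whileOpen = CanOpenExclusively(path);

        // We own the stream, so we dispose it; the file is released right away
        // instead of whenever the finalizer happens to run.
        owned.Dispose();
        bool afterDispose = CanOpenExclusively(path);

        return new bool[] { whileOpen, afterDispose };
    }
}
```

If the Dispose call were left out, the second check would keep failing until the garbage collector finalized the stream, which is exactly the "temporarily unavailable" bug described earlier.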
Other Dispose-related special cases
Unfortunately we have to complicate the story. These will also have to be handled on a case-by-case basis.
1. Classes with very different lifetimes: We’d like to recommend not holding strong references to objects that have a shorter lifetime.
2. Impersonation-related problems: Dispose must be called for crypto classes whose cleanup would otherwise run on the finalizer thread under a different identity; the resulting exception tears down the process.
3. Bad/degenerate cases: if we suggest not to call Dispose for resource category 2, and a class using one of those resources has not handled cleanup via SafeHandles or finalizers, then we’re causing a new problem. Do we even care about this?
4. Ambiguous resource type: for some APIs that return an IDisposable, it’s not obvious what kind of resource is wrapped. If failure to call Dispose for any of the resources may lead to observable side effects, the docs must call this out.
5. Ambiguous ownership: for some APIs that return an IDisposable, it’s not clear whether the method allocated the IDisposable and you own the single reference, or you’re referencing a shared instance. Ownership ambiguity will require clarification in docs. In any case, you shouldn’t fetch IDisposable members and dispose them, as in the IAsyncResult case.
What’s next?
Whatever distinction we eventually use, API docs should explicitly call out IDisposables that must be Disposed.
Any comments?
- FileSystemWatcher doesn't fire events for monitored network drive after changing InternalBufferSize
-
Problem and .NET Fix
Some customers have observed that a FileSystemWatcher monitoring a network drive fails to fire events after setting InternalBufferSize to certain values. The problem is that the value provided to InternalBufferSize is invalid. FileSystemWatcher attempts to notify your error handler when you enable raising events, but without an error handler you won't know about the problem.
The sample code below demonstrates this problem. If you uncomment the line that adds an error handler, then you'll get notification. Note that this is one of the many reasons it's important to add an error handler.
Unfortunately this particular failure is a bit more complicated than adding an error handler. In current releases, FileSystemWatcher has a bug in which it returns the wrong error if you request an invalid buffer size. Even if you add an error handler, ErrorEventArgs.GetException() will show this exception:
System.ComponentModel.Win32Exception: The operation completed successfully
But this is what you should see:
System.ComponentModel.Win32Exception: The supplied user buffer is not valid for the requested operation
We've already fixed this bug and the fix will appear in the next major runtime release. To clarify, with the fixed version, you will still get the exception in the error handler, but the exception message will be correct.
Workaround
With or without the runtime bug, setting InternalBufferSize to 25 * 4096 fails. To find a good buffer size, you can use the fact that, if your buffer size is valid, then it won't result in your error handler being called (i.e. it won't really get called to tell you the operation was successful). So you can experiment with buffer size; for example this succeeds for me:
watcher.InternalBufferSize = 4 * 4096;
You can try this out using the sample code below.
Sample Code
using System;
using System.IO;
class Repro
{
    static void Main()
    {
        // string networkDir = ???;
        FileSystemWatcher watcher = new FileSystemWatcher(networkDir, "*.*");
        watcher.Created += OnCreated;
        // uncomment the line below to add the error handler
        // watcher.Error += OnError;
        watcher.IncludeSubdirectories = true;
        watcher.InternalBufferSize = 25 * 4096;
        // watcher.InternalBufferSize = 4 * 4096;
        watcher.NotifyFilter = NotifyFilters.FileName;
        watcher.EnableRaisingEvents = true;
        while (Console.ReadLine() != "x") { }
        watcher.Dispose();
    }
    private static void OnCreated(object source, FileSystemEventArgs e)
    {
        Console.WriteLine("File: " + e.FullPath + " " + e.ChangeType);
    }
    private static void OnError(object source, ErrorEventArgs e)
    {
        Console.WriteLine("Error!");
        Console.WriteLine(e.GetException());
    }
}
- VisualStudio 2008 install failing because of pre-release products
-
I can get my work machines into pretty odd states due to various private/beta versions of Microsoft products I've installed. While trying to install VS 2008 on one of my machines, install kept failing, saying I had a pre-release product installed. Problem was, I'd already uninstalled that product and it didn't show up in Add/Remove programs. Varun Gupta's blog posts led me through fixing this problem, so I wanted to show that process in case anyone else is running into this issue.
Let's start from the beginning.
The first time I tried to install VisualStudio 2008, I got the following error dialog, saying that a pre-release of Microsoft Document Explorer was installed.
![clip_image002[5]](http://blogs.msdn.com/blogfiles/kimhamil/WindowsLiveWriter/VisualStudio2008installfailingbecauseofp_AC34/clip_image002%5B5%5D_thumb.jpg)
That dialog links to http://www.microsoft.com/express/support/uninstall/, which describes every product that should be uninstalled. This list included Document Explorer. To be safe, I uninstalled every program in the list.
Then I launched the VisualStudio 2008 setup again, but got the same error dialog, saying that a pre-release version of Microsoft Document Explorer was installed. Microsoft Document Explorer no longer appeared in Add/Remove programs, so it was time for some Live searches. :)
I landed on Varun Gupta's VS 2008 Troubleshooting Guide. Note that section 1,B,3,b,ii in his guide applies to this case. Following his instructions there, I ran msiinv.exe to get the list of installed components. In the output file, there was this entry:
Microsoft Document Explorer 2005
Product code: {0913C927-03D9-3FE1-B8A4-7A4C0C435A4F}
Product state: (5) Installed.
...
Varun's next step says to use msiexec /x to remove the product:
msiexec /x {0913C927-03D9-3FE1-B8A4-7A4C0C435A4F}
I re-launched VS 2008 setup, and it worked! Thanks Varun for posting these useful instructions. Now I can get back to work.
- Difference between Disposing and Finalizing (referral)
-
I've gotten some pings on the difference between Disposing and Finalizing, which I didn't discuss in my Dispose/Close post. Fortunately Brian Grunkemeyer has written an excellent blog on the topic:
http://blogs.msdn.com/bclteam/archive/2007/10/30/dispose-pattern-and-object-lifetime-brian-grunkemeyer.aspx
- The never-ending saga of Close/Dispose
-
I made some quick updates in my recent post to describe the difference between Close and Dispose for SqlConnection:
http://blogs.msdn.com/kimhamil/archive/2008/03/15/the-often-non-difference-between-close-and-dispose.aspx
If you're wondering about the difference between Close and Dispose for classes I didn't mention, feel free to comment. I'm curious to unearth any other occurrences.
- The (often non-) difference between Close and Dispose
-
Some classes in the .NET framework, such as System.IO.FileStream, have both a Close() and Dispose() method. The natural question is what's the difference, and when you should use one versus the other.
The framework guidelines refer to Close and Dispose in the following context: occasionally you may prefer to use a domain-specific word instead of Dispose; for example, for files you may prefer to expose a method called Close(), because this word may be easier for users to discover. But even in that case, Close should just call Dispose.
That's pretty straightforward, but confusion abounds. A few months ago, while investigating a bug innocently titled "ResourceReader doesn't provide a Dispose method", I realized the cause of the confusion: understanding the variations of Dispose and Close throughout the framework, as well as the guidance going forward, requires a surprising amount of background. If you're confused by conflicting (or seemingly conflicting) information about Close/Dispose in forums, etc, I hope this will help make sense of it all.
What is the difference between Close and Dispose?
In most .NET framework classes, there is no difference between Close() and Dispose(). For example, these methods do the same thing in the System.IO.Stream hierarchy, and it doesn't matter which of the two methods you call. You should call one but not both.
There are exceptions; for example, System.Windows.Forms.Form and System.Data.SqlClient.SqlConnection have different behavior for Close() and Dispose().
But if you're looking for a rule of thumb, it's best if you think of Close() and Dispose() as the same in general, and others as special cases. We'll discuss these exceptions later.
Terminology note: I've been referring to Dispose() to distinguish it from Dispose(bool). Subsequently, when I use "Dispose" I mean Dispose().
But Close sounds less invasive than Dispose...?
This is a common perception about the difference between Close and Dispose, and it's a reasonable guess. We could have set the expectation up front that Dispose is always the name of the method that performs deterministic cleanup. Classes could have added a Close method to do something domain-specific without necessarily implying disposal. But as it stands, they generally do the same thing.
But why would they do the same thing?
This is where it gets weird. Before Whidbey, many classes used this guidance: if there's a domain-specific word that's more discoverable to users than Dispose, hide Dispose by explicitly implementing it, and instead expose the domain-specific word. This guidance was primarily intended for file-related classes, and the domain-specific word in that case is Close.
Any class that hid Dispose and instead exposed Close established that Close and Dispose were equivalent for that class (and its subclasses). Note that it's not obvious that Close and Dispose should do the same thing in general. For Sockets, some users initially guess that Close just closes the Socket (but you can reopen). But Socket.Close does the same thing as Dispose. (FYI, Socket.Disconnect is the method that makes it reopenable.)
Around Whidbey, it was realized that the rules around Dispose needed to be tightened up. C++ destructors -- which are deterministic -- needed to map to Dispose (the general Framework solution for deterministic cleanup). This motivated some investigation in which we realized some practices made it easy for classes to break disposal chains in a class hierarchy: these included hiding Dispose() and making Dispose() virtual.
To help prevent this problem, many framework classes (such as the Stream hierarchy) were cleaned up to ensure Dispose() was public and non-virtual, and the subclasses were properly chained via Dispose(bool).
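As a rough illustration of that shape, here is a minimal sketch of a two-level hierarchy chained through Dispose(bool). BaseResource, DerivedResource, and the log list are made up for this example; see the Dispose guidelines linked below for the complete pattern, including finalizers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: Dispose() is public and non-virtual.
public class BaseResource : IDisposable
{
    private readonly List<string> log;
    private bool disposed;

    public BaseResource(List<string> log)
    {
        this.log = log;
    }

    // Subclasses cannot override this, so they cannot break the chain here.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Subclasses extend cleanup by overriding this and calling the base.
    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;
        disposed = true;
        Record("BaseResource");
    }

    protected void Record(string entry)
    {
        log.Add(entry);
    }
}

public class DerivedResource : BaseResource
{
    private bool disposed;

    public DerivedResource(List<string> log) : base(log) { }

    protected override void Dispose(bool disposing)
    {
        if (!disposed)
        {
            disposed = true;
            Record("DerivedResource");
        }
        // Always chain to the base class so its cleanup runs too.
        base.Dispose(disposing);
    }
}
```

Disposing a DerivedResource through a BaseResource reference records "DerivedResource" and then "BaseResource", and a second Dispose call does nothing.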
There are still classes such as ResourceReader in which Dispose is hidden. This is because sealed classes were considered lower priority; the interesting problems happen for class hierarchies, in which Dispose(bool) needs to be chained.
Even after this cleanup, there remain at least 2 classes in which Close and Dispose are different. This is because these classes weren't hiding Dispose for Close: Close and Dispose really did something different for these classes.
Hiding Dispose, explicit implementation...what does that mean?
Hiding Dispose through explicit interface implementation means doing this:
void IDisposable.Dispose()
{
    Dispose(true);
}
...as opposed to this:
public void Dispose()
{
    Dispose(true);
}
Let's look at the consequences of using a class that explicitly implements Dispose. ResourceReader explicitly implements Dispose, so let's use that as an example.
ResourceReader rr = new ResourceReader(resourceStream);
The following yields a compile error: 'System.Resources.ResourceReader' does not contain a definition for 'Dispose'
rr.Dispose();
Casting to IDisposable lets you call Dispose:
((IDisposable)rr).Dispose();
Since Close and Dispose do the same thing, the above is equivalent to:
rr.Close();
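The same behavior can be reproduced with a small class of your own. The sketch below uses a made-up HiddenDisposeReader type (not ResourceReader itself) that hides Dispose behind an explicit interface implementation and exposes Close. It also shows that a using statement still works, because the compiler calls Dispose through the IDisposable interface.

```csharp
using System;

// Hypothetical class that hides Dispose and exposes Close instead.
public sealed class HiddenDisposeReader : IDisposable
{
    private bool closed;

    public bool IsClosed
    {
        get { return closed; }
    }

    public void Close()
    {
        closed = true;
    }

    // Explicit implementation: reachable only through an IDisposable reference.
    void IDisposable.Dispose()
    {
        Close();
    }
}

public static class HiddenDisposeDemo
{
    public static bool CloseViaCast()
    {
        HiddenDisposeReader reader = new HiddenDisposeReader();
        // reader.Dispose(); would not compile here.
        ((IDisposable)reader).Dispose();
        return reader.IsClosed;
    }

    public static bool CloseViaUsing()
    {
        HiddenDisposeReader reader = new HiddenDisposeReader();
        using (reader)
        {
            // The using statement calls IDisposable.Dispose on exit.
        }
        return reader.IsClosed;
    }
}
```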
Unfortunate side-effects of hiding Dispose
The guidance about hiding Dispose got applied in interesting ways. For example, in some crypto classes, Dispose is hidden and the domain-specific word for disposal is Clear. Shawn Farkas pointed out that this has caused problems: some people don't recognize Clear as a disposal method, and they don't even know they're supposed to dispose the object.
Hiding Dispose causes confusion for many users, who think of Dispose as the go-to method for disposal. From that perspective, hiding Dispose in favor of the domain-specific word can actually make the disposal method less discoverable. The need to cast it to IDisposable is unintuitive to many users.
What about the classes where Close and Dispose are different?
In my search, I only confirmed that 2 classes have different behavior for Close and Dispose: System.Windows.Forms.Form and System.Data.SqlClient.SqlConnection. There could be more because my search wasn't exhaustive; I identified likely candidates and only looked into those.
Let's look at System.Windows.Forms.Form. Here are some interesting excerpts from the docs for Form.Close:
When a form is closed, all resources created within the object are closed and the form is disposed.
…
The one condition when a form is not disposed on Close is when it is part of a multiple-document interface (MDI) application, and the form is not visible. In this case, you will need to call Dispose manually to mark all of the form's controls for garbage collection.
The second part seems to indicate they're different, but it's a bit buried in the description. Let's open it in Reflector to be sure.
public void Close()
{
    if (base.GetState(0x40000))
    {
        throw new InvalidOperationException(SR.GetString("ClosingWhileCreatingHandle", new object[] { "Close" }));
    }
    if (base.IsHandleCreated)
    {
        this.closeReason = CloseReason.UserClosing;
        base.SendMessage(0x10, 0, 0);
    }
    else
    {
        base.Dispose();
    }
}
So indeed, it won't go down the Dispose path if base.IsHandleCreated is true.
For System.Data.SqlClient.SqlConnection, the primary difference between Close and Dispose is that Dispose nulls out _userConnectionOptions and _poolGroup, as shown below. While I'm not familiar with the code in System.Data.dll, by browsing around in Reflector it looks like this has the effect of making the SqlConnection object unusable; i.e. it throws if you try to re-open. Anyone familiar with this class can feel free to chime in if there's a better way to describe this. :)
this._userConnectionOptions = null;
this._poolGroup = null;
Guidance now
Joe Duffy's blog has the complete updated Dispose Design Guidelines. This is great reading for detailed guidance around Dispose and finalization (including when they should be used, implications for class hierarchies, etc):
http://www.bluebytesoftware.com/blog/PermaLink.aspx?guid=88e62cdf-5919-4ac7-bc33-20c06ae539ae
Based on this experience, I'd encourage you to think very carefully before attempting to use a domain-specific name instead of Dispose -- we're now at a place where users are familiar with Dispose and it is the most discoverable method name.
I think we should also impose some order by making it very clear in our docs when Close and Dispose are the same, and clearly call out any cases that differ.
Thanks to Krzysztof Cwalina, Joe Duffy, and Brian Grunkemeyer, who provided valuable input to my barrage of emails on this topic. :)
- What's been happening with Long Paths?
-
It appears that part 3 of the long path blog series on the BCL blog is now nearly a year overdue! This sounds hard to believe, but BCL really has been that busy. On the bright side, we've been working on exciting projects, such as Silverlight, and now that we're coming up for air, we should have tons more stuff to blog about.
No more excuses. I know a year is way too long to wait for the conclusion of a series. Well, unless it's Lord of the Rings. So I've made getting back to long paths a top priority, and part 3 will be out this week.
- Hashtable and Dictionary thread safety considerations
-
Let’s start with the basics:
- System.Collections.Hashtable is multiple-reader, single-writer threadsafe.
- System.Collections.Generic.Dictionary<K,V> has no thread safety guarantees.
Consider a class that has a Dictionary member, and assume that multiple threads may access it simultaneously. Taking a lock is expensive, so what do you think about the following attempt to defer taking a lock?
class ElementManager {
    private Dictionary<String, Element> cache =
        new Dictionary<String, Element>();
    private Object o = new Object();
    // Not to ruin the punchline, but I don't want someone to copy
    // and paste this flawed code...
    // WARNING: BAD CODE! DON’T USE!
    public Element GetElement(String name) {
        Element theElement;
        bool result = cache.TryGetValue(name, out theElement);
        if (!result) {
            lock (o) {
                result = cache.TryGetValue(name, out theElement);
                if (!result) {
                    theElement = new Element();
                    cache.Add(name, theElement);
                }
            }
        }
        return theElement;
    }
}
GetElement only takes the lock if TryGetValue found no result. So from a performance perspective, it’s a nice try. But it’s not correct: the value returned may not actually correspond to the name. (That’s not just because value may be stale data; value may have never corresponded to name).
The problem occurs if a reader thread outside of the locked region gets an element while a writer thread adds an element and causes a resize in the Dictionary to occur. If the timing is just right (actually, just wrong) then the reader thread may index incorrectly into the Dictionary (internal implementation details) and return the wrong element.
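For comparison, here is a minimal sketch of a correct (if less ambitious) version: every access to the Dictionary, reads included, happens under the lock. SafeElementManager is a made-up name, and Element is an empty stand-in class so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the Element type used in the example above.
public class Element { }

public class SafeElementManager
{
    private readonly Dictionary<string, Element> cache =
        new Dictionary<string, Element>();
    private readonly object sync = new object();

    public Element GetElement(string name)
    {
        // Readers take the lock too, so no thread can observe the
        // Dictionary while another thread is resizing it.
        lock (sync)
        {
            Element element;
            if (!cache.TryGetValue(name, out element))
            {
                element = new Element();
                cache.Add(name, element);
            }
            return element;
        }
    }
}
```

This gives up the attempt to skip the lock on the read path, which is the price of correctness with a plain Dictionary.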
This thread-safety difference is a little-known twist in our guidance to convert to using generic collections. If you’re converting from Hashtable to generic Dictionary, you’ll want to consider whether you may have taken a dependency on Hashtable’s MR-SW thread safety without knowing it. (This isn’t very common, but it has happened.)
IMO, better concurrency support is one of the most compelling feature requests for the collections space. This is important for both performance and usability.
The usability argument may not be obvious, so here’s some background. Our non-generic collections provided synchronized wrappers, which inserted a lock around operations for thread safety. Synchronized wrappers weren’t included with the generic collections, with the argument that one often needs to synchronize at a higher level, over multiple operations (e.g. check if a stack is non-empty and then pop). However, we didn’t provide alternate framework solutions, meaning that users have to implement their own locking for generic collections. This is unnecessarily tedious in the common scenario when you don’t need to synchronize over multiple operations.
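The "check if a stack is non-empty and then pop" case can be sketched with the non-generic Stack. Each call on a synchronized wrapper is thread-safe on its own, but the check and the pop are two separate calls, so another thread can empty the stack in between. The sketch below takes the collection's SyncRoot around both operations; SyncWrapperDemo and TryPop are made-up names.

```csharp
using System;
using System.Collections;

// Hypothetical helper: check-then-pop as one atomic step.
public static class SyncWrapperDemo
{
    // Returns the popped item, or null if the stack was empty.
    public static object TryPop(Stack synchronizedStack)
    {
        // Hold the lock across both calls so nothing changes in between.
        lock (synchronizedStack.SyncRoot)
        {
            if (synchronizedStack.Count > 0)
            {
                return synchronizedStack.Pop();
            }
            return null;
        }
    }
}
```

A wrapper created with Stack.Synchronized(new Stack()) can be passed to TryPop; the point is that the wrapper alone doesn't remove the need for the outer lock in this scenario.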
This doesn’t mean we should add back the synchronized wrappers; there are options with much better performance, but more on that later.