Psychic classes have the appearance of ignoring data provided to it in an attempt to provide you with an answer they predict is better for the situation. It’s impossible to look at a the data provided to an instance of the class and understand what queries on the object will return because it may think of a better answer for you, or a better piece of data to look at.
This comes from an example I ran into about a month ago. I work on an IDE and naturally deal with a lot of parse trees and tokens. Parsing everything all the time is expensive so naturally the results are cached in various places for performance reasons.
While debugging one such cache I noticed some strange behavior. The cache wasn’t returning the right tree for the input it was provided. So I decided to dig into the code a bit.
This cache takes several different forms of input which has no common base class or interface. What I noticed though is that when resetting the input of the service, it would not clear the existing cache or the previous form of input. Also because of the way the code loaded certain forms of input had precedence over others. So even an explicit clear did not guarantee the “correct” input was used.
The result is a service that reads well in code, but will not always act as you expect it to. The service at times will seemingly ignore all input and pick a source it thinks is better. Take the following code as an example
pCache->SetSource(pSomeFile);
ParseTree* pTree = pCache->GetTree();
This code is very straight forward but is certainly not guaranteed to do what it appears to do.
I like to think the service is predicting the results rather than calculating them. Or better yet guessing the answer. From the perspective of a code reviewer, that’s what’s happening.
Obviously I was curious about the reason for this and did a bit of research. It’s a rather old class so I had to contact people who’d been on the team awhile back and dig through the history of the code base to understand what the purpose of this behavior was. It turns out it was done to fix a few impactful scenarios where an alternate source needed precedence over the typical source. Other devs didn’t fully understand the source semantics of the service and wrote methods that caused bad interactions. Eventually it evolved to it’s current odd state.
Thanks to Dustin for coining the term “psychic classes”. Other ones we considered were
- Jedi Mind Trick classes: Weak name
- Weatherman: It’s a prediction after all
And yes, we fixed this issue :)
I was playing around in the registry the other day and found the PowerShell API lacking in a key area. There does not appear to be a good way to detect the presence of a Registry Name/Value pair. All of the operations such as New, Delete, Rename are based off of the program knowing the presence or absence of the key before hand. This lacks symmetry with the rest of the APIs which have a test style function.
PS> test-path "some\file\path\data.txt"
True
I took a few minutes and sketched out a basic Test-ItempProperty function. It utilizes the Get-ItemProperty function and suppresses the rather loud red font error message via the –ErrorAction parameter (standard on all CmdLets).
function Test-ItemProperty() {
param ( [string]$path = $(throw "Need a path"),
[string]$name = $(throw "Need a name") )
$temp = $null
$temp = Get-ItemProperty -path $path -name $name -errorAction SilentlyContinue
return $temp -ne $null
}
Now I can finish up my happy scripting for the night
PS> test-path . IsItScriptingTime
True
Today Microsoft announced a new search engine called bing. I’ve been dogfooding this engine internally for some time now and from my experience it’s a step up from the previous engine in both functionality and appearance. It’s definitely worth playing around with for awhile.
Announcement: http://www.microsoft.com/presspass/press/2009/may09/05-28NewSearchPR.mspx
While the functionality and accuracy of the results are certainly improved, this is not the biggest selling point for me. I find Live Search’s results to be very good and noticeably improving over time. I use it at home and work and rarely find a need to switch to google.com for anything. But functionality isn’t everything.
For me the biggest selling point is the new name. Having a search engine with a 2 word name consisting of existing words prevented it from beating google in a key aspect of my life: the vernacular. Google works great as a verb and hence allows itself to be injected into the conversation. For instance “Can you google it real quick?”
What to do with live search though? How can I express that I’m using live search without being too wordy? “Can you live search it” just doesn’t have the same ring to it. For awhile I shortened the name to learching (Live sEARCHING). While that works, it doesn’t quite have a good ring to it.
Bing on the other hand works great
Can you bing it?
You can view the engine at www.bing.com and the team is also twittering at @bing.
When using an API you must take care to understand not only what it returns, but also for how long the data returned will be valid. This is very important to consider because programs must make either be making decisions on valid and predictable data or have appropriate fallback mechanisms for failure.
I often find bugs because people assumed certain APIs were giving back data about the present or future, but in fact the were giving back information about the past or much shorter present. Part of the problem stems from the fact that API names are usually written in the present tense regardless of what point in time the data returned is valid. For example “Exists” instead of “Existed” or “DidExist”. The language of the API lulls developers into a false sense of security and leads to programming errors.
I like to think of these values as a question of when they are valid. For instance, the data is, was or will be valid.
- Will – The data return is valid now and will be valid for the remainder of the program.
- Is – The data is valid now and will remain valid until the program or thread takes some action to alter the underlying data source.
- Was – The data was valid at some point in the past but by the time the program receives the result it can no longer be relied upon to be valid.
Working with APIs in the “was” category are the trickiest. Because the data we are working with is volatile we must approach them in a different way in order to provide reliable programs. Most algorithms work by validating the present or future state of the system and then executing code with the expectation of success. When working with a “was” API, there is no way validate the state of system because it cannot provide reliable information about its present or future state. Instead we must execute the algorithm with the expectation of failure and spend our time considering how to account for these failures appropriately. Failure must be the expected case, and not the exception.
Working with “will” and “is” APIs is a bit easier. Each has a level of predictability and it’s possible to use program state to make reliable decisions and hence produce reliable results.
Lets add some substance to this conversation by considering these types of APIs and the ContainsKey method on various hashtable style collections. This API is often written in the present tense regardless of how long the data is actually valid.
A “will” API is the safest to work with because the data will always be valid. This API cannot be influenced by side effects. Hence the the programmer is free to think only about the algorithm at hand. These APIs are rare and are typically associated with purely immutable data structures. This is the main type of structure for which data does not ever change. For instance consider this set of calls
var name = "bob";
ImmutableMap<string, Student> map = GetSomeImmutableMap();
if (map.ContainsKey(name)) {
SomeOtherCall();
var student = map[name];
}
This call to the indexer will succeed no matter what. The structure cannot be changed irrespective of what actually happens in SomeOtherCall. The ContainsKey method already asserted the key was present in the hashtable. Because the structure is immutable this assertion will be valid for the remainder of the program and hence the indexer must succeed.
An “is” API is the most common type of API. It is generally associated with mutable structures completely within the control of the program / thread. The data will remain valid until it is explicitly, or much worse implicitly, invalidated by the user. In a well factored program it is possible to reason about these types of APIs but hidden side effects can get you into trouble. For instance consider the following code in the context of a single thread.
var name = "bob";
Dictionary<string, Student> map = GetSomeDictionary();
if (map.ContainsKey(name)) {
Console.WriteLine(map[name]);
SomeOtherCall();
Console.WriteLine(map[name]);
}
The state of a Dictionary<TKey,TValue> structure is completely within the control of the program. As such the first call to Console.WriteLine will work fine. There is nothing which can alter the state of map between the call to ContainsKey and the actual accessing of the map in the first WriteLine call so the assertion is still valid.
However without knowing the hidden side effects of the SomeOtherCall API we cannot know the next call to WriteLine will succeed. It’s possible for SomeOtherCall to access the underlying Dictionary reference and Clear it thus making the next call fail. True a programmer must investigate this before writing the above code. But consider the opposite problem. I am the maintainer of SomeOtherCall and I get a bug assigned to me which necessitates clearing out that Dictionary. I must now consider everyone who calls me, directly or indirectly and consider if any of them have an implicit dependency on the state of the underlying Dictionary object. This can get very difficult in a large program.
As previously stated, a “was” API is the most dangerous type of API. They never give back data for which you can make a reliable decision about. These APIs are associated with a data source that is not completely in the control of the current thread or program. Thus it’s possible for the data source to be altered at any point in the execution of the program / thread. Threading is a prime example of this problem.
For an example, consider SynchronizedDictionary to be an implementation of Dictionary<TKey,TValue> which uses locks internally to prevent data corruption from reads / writes on multiple threads but provides no other synchronization capabilities .
var name = "bob";
SynchronizedDictionary<string, Student> map = GetSomeSynchronizedDictionary();
if (map.ContainsKey(name)) {
var student = map[name];
}
In this example it’s possible for the call to map[name] to fail because another thread came through and cleared the collection in between the if statement and the call to map[name]. The bug here is that the program was written as if ContainsKey gave information about the present or future but in fact only gave information about the past. This actual return of the API could be made clearer by simply altering the name of the API. A much more applicable name is “DidContainKey” or “DidAndMayStillContainKey”. This name is much more representative of the data returned and will hopefully give a developer pause before writing code like the above.
My personal pet peeve in this category is the file system and in particular File.Exists. Notice how this name is written in the present tense indicating a successful return means the file is currently in existence. Lets ignore permissions for a minute[1] and consider only the files existence. Any program on the machine is at liberty to change the file system and can do so at any point in time. Even the user can affect the file system by doing some thing as simple as yanking out a USB thumb drive or stumbling over a network cable. Any of these operations can alter the state of the file system and hence invalidate the existence of a File with 0 action from your program. Hence File.Exists cannot ever reliably determine if a file exists, it can only determine that a file “did” exist at some point in the past and may exist at some point in the future. A more representative name would be File.DidExist, or File.DidAndMayStillExist.
So when designing your APIs, take care to consider the lifetime of the data when naming them.
[1] Once you consider permissions you find that even File.DidExist in not a representative name. It probably should be called File.DidExistAndHadSomeMeasureOfAccess. Yes, that’s a terrible name :(
People love to chat about how to conduct a C++ interview on newsgroups. Eventually these topics will shift into a discussion about what questions a candidate must know in order for them to get a hire from a particular interview. Unfortunately these questions tend to be items like the following.
- What happens when you delete NULL?
- Explain how exceptions affect constructor initialization.
- Which constructors are called for the following: SomeType a = SomeType(b);
Sure these questions are valid parts of the language and knowing them is a plus and demonstrates a deal of mastery over the language. But not knowing them should by no means by a deal breaker. The goal of an interview is to spot good developers who will become a contributing member of the team. Good developers are motivated and passionate problem solvers. These questions are testing little more than memorization. It’s easy to memorize rules after the fact. It’s much more difficult to teach problem solving skills.
The problem with questions like the above is that a developer could be both passionate and competent in C++ and never have come across these rules. C++ is an unbelievably huge language and can operate in many different ways where these type of situations simply don’t come up. For instance, I’ve known several very good C++ developers that spent the majority of their time working with COM. COM prefers HRESULTs to Exceptions and hence they never used exceptions in C++. Once we introduced exceptions into the code base, it took about 30 minutes to explain the rules and we were done.
C++ interviews should focus on the qualities that make a good developer: problem solving and passion. Of course a language based interview should certainly focus on the foundations of C/C+. Items like pointers, stack vs. heap and the basics of memory management. But don’t hinge the deal on little items that can be quickly taught. Otherwise you’ll end up turning away good developers.
One argument I commonly hear against immutable collections is they are slow. I’ve held the opposite belief for some time but shamefully had yet to look at actual numbers on the CLR. Tonight I decided to change that by benchmarking one of my immutable collections against a mutable collection in the BCL. The goal is not to decide the overall best collection but instead to get a sense of how they perform relative to each other in certain scenarios.
For the immutable version, I chose to use ImmutableCollection. This class is a slight variation of Eric Lippert’s Double Ended Queue implementation. The core algorithm is the same but the style was changed to be more inline with other immutable collections I own. For the mutable class I chose to use the ever popular List<T> collection.
I chose to examine the following scenarios that I commonly use with collection style classes.
- Adding to the end of the collection
- Adding to the front of the collection
- Removing from the end of the collection
- Removing from the beginning of the collection
Each scenario was run against collections of 100, 1000 and 10000 elements. For each count, the run was executed 1000 times and the total and average time was calculated. The full code for the benchmark is available at the end of this post.
Looking at the data
Now before I get into the results, please assume the usual caveats that come with any benchmark. That is, approach it with a skeptical eye. These scenarios are obviously not something I do exactly in everyday programming (especially removing thousands of elements from the front of List<T>). However they are representative of general operations that I do use. Also I find it interesting to see how the collections perform relative to each other in extreme scenarios.
Most of the results were unsurprising. Remove from end is a very simple operation on a List<T>. It comes down to a bounds check, decrementing an index and updating a couple of internal state variables. Removing the end of an immutable collection requires considerable updating of the internal structure. It ends up being roughly 1 order of magnitude slower. Adding to the end has similar implementation and numbers.
Operations on the front of the list were significantly slower on List<T> than ImmutableCollection for suitably large collections. This is unsurprising given that removal and insertion at the front of a List<T> requires all of the other elements in the underlying array to be shifted up or down. This is a non-trivial operation and is evident in the benchmark.
The most interesting item however, is to look at how each collection scales. In almost all scenarios, ImmutableCollection scaled very closely to the size of the input. That is, an order of magnitude more input resulted in an order of magnitude of more time. List<T> does not have that behavior for all scenarios. Scenarios dealing with the front of the collection saw time rises faster relative to input size. In fact there is a very dramatic jump in both front scenarios between 1000 and 1000 elements. Each case resulted in roughly 2 orders of magnitude more time.
Conclusion
Winner of each category …
- Add to End: List<T>
- Add to Front: ImmutableCollection<T>
- Remove from End: List<T>
- Remove from Front: ImmutableCollection<T>
No single benchmark is definitive and this one won’t change that. This benchmark says nothing about the general use of the two classes. However it can provide some insight into these specific scenarios. It also serves as some level of proof that immutable collections can have acceptable performance for these scenarios.
Data
Add to End 100 Elements
List: Total: 00:00:00.0060473 Average: 00:00:00.0000060
Immutable: Total: 00:00:00.0267079 Average: 00:00:00.0000267
Add to End 1000 Elements
List: Total: 00:00:00.0337505 Average: 00:00:00.0000337
Immutable: Total: 00:00:00.2240444 Average: 00:00:00.0002240
Add to End 10000 Elements
List: Total: 00:00:00.4266014 Average: 00:00:00.0004266
Immutable: Total: 00:00:02.6715789 Average: 00:00:00.0026715
Add to Front 100 Elements
List: Total: 00:00:00.0162186 Average: 00:00:00.0000162
Immutable: Total: 00:00:00.0213764 Average: 00:00:00.0000213
Add to Front 1000 Elements
List: Total: 00:00:00.4028523 Average: 00:00:00.0004028
Immutable: Total: 00:00:00.2055935 Average: 00:00:00.0002055
Add to Front 10000 Elements
List: Total: 00:00:38.5943722 Average: 00:00:00.0385943
Immutable: Total: 00:00:02.6212590 Average: 00:00:00.0026212
Remove From End 100 Elements
List: Total: 00:00:00.0031299 Average: 00:00:00.0000031
Immutable: Total: 00:00:00.0213737 Average: 00:00:00.0000213
Remove From End 1000 Elements
List: Total: 00:00:00.0187998 Average: 00:00:00.0000187
Immutable: Total: 00:00:00.1623739 Average: 00:00:00.0001623
Remove From End 10000 Elements
List: Total: 00:00:00.1773381 Average: 00:00:00.0001773
Immutable: Total: 00:00:01.9615781 Average: 00:00:00.0019615
Remove From Front 100 Elements
List: Total: 00:00:00.0142981 Average: 00:00:00.0000142
Immutable: Total: 00:00:00.0192679 Average: 00:00:00.0000192
Remove From Front 1000 Elements
List: Total: 00:00:00.4407993 Average: 00:00:00.0004407
Immutable: Total: 00:00:00.1879243 Average: 00:00:00.0001879
Remove From Front 10000 Elements
List: Total: 00:00:39.7832085 Average: 00:00:00.0397832
Immutable: Total: 00:00:02.2451406 Average: 00:00:00.0022451
The Code
public class Program {
public static void ImmutableCollectionAddToEnd(List<string> list) {
var col = ImmutableCollection<string>.Empty;
foreach (var item in list) {
col = col.AddBack(item);
}
}
public static void ListAddToEnd(List<string> list) {
var col = new List<string>();
foreach (var item in list) {
col.Add(item);
}
}
public static void RunAddToEnd(int count) {
var list = Enumerable.Range(0, count).Select(x => x.ToString()).ToList();
Console.WriteLine("Add to End {0} Elements", count);
RunScenario("List", ListAddToEnd, () => list);
RunScenario("Immutable", ImmutableCollectionAddToEnd, () => list);
}
public static void ImmutableCollectionAddToFront(List<string> list) {
var col = ImmutableCollection<string>.Empty;
foreach (var item in list) {
col = col.AddFront(item);
}
}
public static void ListAddToFront(List<string> list) {
var col = new List<string>();
foreach (var item in list) {
col.Insert(0, item);
}
}
public static void RunAddToFront(int count) {
var list = Enumerable.Range(0, count).Select(x => x.ToString()).ToList();
Console.WriteLine("Add to Front {0} Elements", count);
RunScenario("List", ListAddToFront, () => list);
RunScenario("Immutable", ImmutableCollectionAddToFront, () => list);
}
public static void ImmutableCollectionRemoveFromEnd(ImmutableCollection<string> col) {
while (!col.IsEmpty) {
col = col.RemoveBack();
}
}
public static void ListRemoveFromEnd(List<string> list) {
while (list.Count > 0) {
list.RemoveAt(list.Count - 1);
}
}
public static void RunRemoveFromEnd(int count) {
Func<List<string>> listInputFunc = () => Enumerable.Range(0, count).Select(x => x.ToString()).ToList();
Func<ImmutableCollection<string>> colInputFunc = () => ImmutableCollection.Create(listInputFunc());
Console.WriteLine("Remove From End {0} Elements", count);
RunScenario("List", ListRemoveFromEnd, listInputFunc);
RunScenario("Immutable", ImmutableCollectionRemoveFromEnd, colInputFunc);
}
public static void ImmutableCollectionRemoveFromFront(ImmutableCollection<string> col) {
while (!col.IsEmpty) {
col = col.RemoveFront();
}
}
public static void ListRemoveFromFront(List<string> col) {
while (col.Count > 0) {
col.RemoveAt(0);
}
}
public static void RunRemoveFromFront(int count) {
Func<List<string>> listInputFunc = () => Enumerable.Range(0, count).Select(x => x.ToString()).ToList();
Func<ImmutableCollection<string>> colInputFunc = () => ImmutableCollection.Create(listInputFunc());
Console.WriteLine("Remove From Front {0} Elements", count);
RunScenario("List", ListRemoveFromFront, listInputFunc);
RunScenario("Immutable", ImmutableCollectionRemoveFromFront, colInputFunc);
}
public static void RunScenario<T>(string description, Action<T> del, Func<T> getInputFunc) {
// Run once to jit
del(getInputFunc());
const int times = 1000;
var total = new TimeSpan();
for (var i = 0; i < times; i++) {
// get the input outside the timer so input creation is not calculated
var input = getInputFunc();
var watch = new Stopwatch();
watch.Start();
del(input);
watch.Stop();
total += watch.Elapsed;
}
var average = TimeSpan.FromTicks(total.Ticks / times);
Console.WriteLine("{0,11}: Total: {1} Average: {2}", description, total, average);
}
static void Main(string[] args) {
var list = new int[] { 100, 1000, 10000 };
list.ForEach(RunAddToEnd);
list.ForEach(RunAddToFront);
list.ForEach(RunRemoveFromEnd);
list.ForEach(RunRemoveFromFront);
}
}
I’ve recently run across several APIs that have a dependency on only dealing with objects that are serializable (in the binary sense). Unfortunately determining if an object is serializable is a non-trivial task and rife with problems. These problems have a direct impact on the types of guarantees these APIs can make.
For all objects which are serializable, it’s only possible to prove that a very small subset of them actually are in code. Easier but less reliable tests are very easy to write. So APIs must make a trade off. Only accept instances of types which are provable serializable and miss out on a while class of objects. Or do a much less reliable check and open themselves up to failure further down in the algorithm.
Take System.Exception for example. It is possible to associate arbitrary data with an exception through the Data property. Associated just any object with Exception is problematic though because Exceptions should be serializable. In order for an Exception instance to store these objects and remain serializable, the objects must also be serializable. Since serializability is not provable, the authors of Exception had to make a trade off between an overly restrictive test, or a loose test. They chose the latter. As a result it’s impossible to determine before hand if a given Exception instance is actually serializable.
Why is this the case though that serialization is tough to determine? Lets start with what it takes to make a type serializable. There are two separate components
- Declaring that the type is Serializable by either having the SerializableAttribute on the class definition or by implementing ISerializable
- Making the type conform to the rules of serialization.
These are completely separate actions. It’s possible to have types which do any combination of the above but not both. Take for instance the following type declarations
[Serializable]
class DeclaredOnly {
private ConformsOnly m_conforms;
}
class ConformsOnly {
private string m_name;
}
Both of these types are legal C# code and both represent one of the two extremes listed above. Yet neither of these types are actually serializable. ConformsOnly is not because it has not actually declared itself to be serializable. DeclaredOnly is not because one of it’s members is not serializable.
Lets look at proving serialization by ensuring types follow both of the rules. Proving the first part of serialization is pretty straight forward. Simply check to see if a type implements ISerializable or is decorated with the Serialization attribute. The latter is directly supported in the type system via Type.IsSerializable. This property is also the source of the most common mistake I see with respect to determining if an object is serializable. Take the following code snippet for an example.
public static voidExample1(object o) {
if(o.GetType().IsSerializable) {
// Do something different
}
}
On the surface, this looks like reasonable code. But as we just pointed out, the property IsSerializable just determines the presence or absence of the Serializable attribute but nothing about the second part. A more descriptive attribute name would be IsSerializableAttributeDeclared. Yet many pieces of code attempt to equate this property with the ability to be serialized (A fun experiment here is to search for it’s use in Reflector)
Proving the second part involves two cases, types implementing ISerializable and types decorated with the Serializable attribute. Lets start with the attribute. Proving these is involved but a straight forward process. The type must …
- Be decorated with the Serializable Attribute
- One of the following items must be true for every field at all points in the hierarchy
- It must be decorated with the NonSerializedAttribute
- The type of the field must be sealed and must conform to all of these rules
Instances of types which meet these guidelines will always be serializable. Not meeting these rules though does not exclude a type from serialization. There are several sets of types decorated with Serializable which are serializable and do not meet these rules.
Take for instance types that violate rule 2.2. By having a field whose type is not sealed, it is possible to construct a runtime instance which contains a value whose type is not serializable. The following type fits into this category.
[Serializable]
class OnlyKnownPerInstance {
private object m_field1;
}
Whether or not an instance of this type is serializable depends on the value of m_field1. So the only way to prove it is serializable is to look at the runtime information. This makes any definitive analysis on the type impossible. The actual object must be inspected.
The other case to examine are types implementing ISerializable. Serialization is a custom task for instances of these types and is done in imperative code. Proving these types are serializable involves actual algorithm inspection and is beyond the scope of this blog post. But suffice to say, proving these are serializable is an order of magnitude more difficult.
Getting back to the crux of this article. What is the best way to determine if an object is serializable or not? Bottom line, there is no good way. The only 100% definitive way is to serialize the object and see if it succeeds or not. This is problematic because it is not future proof. It only tells you that the object was serializable. This is a very important distinction. It’s possible for the object to be mutated in a different state later on which will prevent it from being properly serializable.
If serialization of a parameter is very important to the semantics of an API this is the only way to ensure the semantics are not violated is to serialize the object immediately and store the binary data. Otherwise you can only make a loose guarantee that an attempt to serialize in the future will succeed.
Recently I ran into a situation on a personal project where I needed a hashtable like structure for a set of WeakReference values. When poking around for an existing implementation I saw found several versions which were very thin, type safe wrapper around a Dictionary<TKey,WeakReference> (usually the class even implements IDictionary<TKey,TValue> directly). While this produces a type safe API it fails to take into account the different nature of a WeakReference. Because a WeakReference is constantly being collected without explicit user action it alters the types of operations that be performed on them. Failing to take this into account produced APIs which lead users to write incorrect code.
Finding no suitable implementation I set off to build my own. It took several iterations and I thought the result and process would be fun to share the design experience as a blog post.
Let start with the basics. At a high level the API should appear to the user as a type safe Dictionary<TKey,TValue>. Under the hood all values will be stored in an instance of WeakReference in order to enable collection. But this is an implementation detail and should not be visible to the user. The user should only see type safe keys and values.
A standard Hashtable works on the concept of a key/value pair. A value is associated with a particular key in the table and at any time, the value can be retrieved from the hashtable with the specified key. A key can be determined to be valid simply by ascertaining it’s presence in the underlying table. The value is irrelevant, it’s mere presence makes it valid.
A weak hashtable will work on the same concept but have a much different implementation. Keys are only valid if they point to an actual value. Since the value in the hashtable is a WeakReference the mere presence of the key does not determine it’s validity. Only the presence of the key and the value contained within the weak reference determines the validity of a key.
This seems like an obvious assumption but it has a dramatic impact on the type of API a weak hashtable can have. Lets take a simple property such as Count for an example of why this is important. Count on a hashtable is used to determine the number of valid key/value pairs in the table. On a normal hashtable, this count is simply incremented and decremented with the corresponding Add and Remove API’s. With a weak hashtable, any given run of the garbage collector can affect the count of key/value pairs by collecting a value. This means a simple counter cannot be used to keep track of the valid key/value pairs.
In order to get the actual Count every singe value must be accessed an verified that it is still alive. What’s even worse is that once a value is determined to exist, it must be stored for the duration of the Count method. Otherwise a GC could occur in the middle of the loop and collect Values that were marked as still alive.
This is what Count would need to look like …
public class WeakHashtable<TKey,TValue> {
private Dictionary<TKey, WeakReference> _map;
public int Count {
get {
var list = _map.Values
.Select(x => x.Target)
.Where(x => x != null)
.ToList();
return list.Count;
}
}
}
Count transformed from a simple O(1) return of an internal counter to a O(N) method which allocates memory. Worse yet, the return value is practically useless. As soon as the value is returned it cannot be considered to be valid. A GC could kick in and invalidate half the table. Count would in fact be giving the user information about the object in the past.
In some ways, this problem is similar to issues encountered with multi-threaded applications. In between every line of your code there is another operation, in this case the GC, which can alter the state of your structure.
The only API’s that can ask questions about a value and still have a reasonable use by the user must return the value with the call. Returning the value will, at least temporarily, provide a GC root and prevent the object from being collected. It gives the user a chance to use the value before it’s taken out from under them.
A good API comparison here are operations such as Contains and TryGetValue. Contains holds no value to the user because as soon as the call returns the GC can collect the value. TryGetValue on the other hand returns the value in question thus locking it in memory and preventing a collection.
When designing the API for a weak hashtable I tried to keep it simple and stick to these ideas. I started with the IDictionary<TKey,TValue> interface and removed the methods which hold no value for the end user due to GC limitations. In the end I was left with only the following.
- void Add(TKey key, TValue value)
- bool Remove(TKey key)
- void Clear()
- bool TryGetValue(TKey key, out TValue value)
- List<TValue> Values { get; }
I also added the following methods
- List<Tuple<TKey,TValue>> Pairs { get; }
- Put(TKey key, TValue value)
- Option<TValue> TryGetValue(TKey key);
The Values property returns a List<TValue> implementation instead of IEnumerable<TValue> In order to guarantee the values remain in existence they must be rooted in some structure. The easy choice is a List<TValue>. Since a List<TValue> must be created anyways, why return a less accessible interface such as IEnumerable<TValue>?
At first I did consider a design where Values returned IEnumerable. It is fairly simple to implement with a C# iterator.
public IEnumerable<TValue> Values {
get {
foreach (var weakRef in _map.Values) {
var obj = weakRef.Target;
if (obj != null) {
yield return (TValue)obj;
}
}
}
}
The problem though is that anything more than a simple .ForEach() over the IEnumerable may behave unexpectedly. Consecutive calls to GetEnumerator can produce different enumerations with no explicit user alteration of the table. I’ve seen several APIs which (rightly or wrongly) make this assumption. Given the user is not explicitly modifying the collection, it is not a necessarily bad assumption to make. However it would not work for a collection of this type.
I intentionally left off Keys here. Keys are only valid when they have a live value in the table. Unless the Value is returned with Key this cannot be guaranteed. The Pairs property serves this role.
This post went a bit longer than I originally intended. I also wanted to discuss how compaction of the table should work in a weak hashtable. I’ll save that for next time.
Here is the implementation of the dictionary without any compaction support.
public sealed class WeakDictionary<TKey, TValue> {
private Dictionary<TKey, WeakReference> m_map;
public List<Tuple<TKey, TValue>> Pairs {
get {
return m_map
.Select(p => Tuple.Create(p.Key, p.Value.Target))
.Where(t => t.Second != null)
.Select(t => Tuple.Create(t.First, (TValue)t.Second))
.ToList();
}
}
public List<TValue> Values {
get { return Pairs.Select(x => x.Second).ToList(); }
}
public WeakDictionary()
: this(EqualityComparer<TKey>.Default) {
}
public WeakDictionary(IEqualityComparer<TKey> comparer) {
m_map = new Dictionary<TKey, WeakReference>(comparer);
}
public void Add(TKey key, TValue value) {
m_map.Add(key, new WeakReference(value));
}
public void Put(TKey key, TValue value) {
m_map[key] = new WeakReference(value);
}
public bool Remove(TKey key) {
return m_map.Remove(key);
}
public Option<TValue> TryGetValue(TKey key) {
WeakReference weakRef;
if (!m_map.TryGetValue(key, out weakRef)) {
return Option.Empty;
}
var target = weakRef.Target;
if (target == null) {
return Option.Empty;
}
return (TValue)target;
}
public bool TryGetValue(TKey key, out TValue value) {
var option = TryGetValue(key);
value = option.ValueOrDefault;
return option.HasValue;
}
}
In my last post we discussed the problems with designing a safer API for mutable thread safe collections that employ only an internal locking system. The result was an API that was more difficult to mess up, yet pretty much unusable. Lets take a look at this problem and see if we can come up with a usable API that still helps to eliminate mistakes.
One of the main issues I have with mutable thread safe collections is the use of decision procedures such as Count and Contains. Procedures such these only return information that pertains to the collection as it existed at a previous point in time. It can provide no relevant information to the collection in it’s current state and only encourages the user to write bad code. For example.
if (col.Count > 0) {
// Collection can be modified before this next line executes leading to
// an error condition
var first = col[0];
}
Therefore they have no place on a mutable thread safe collection. Yet, once you take away these procedures, you’re left with a collection that is virtually useless. It can only have a minimal API by which to access data. Here is the last example we were left with
public sealed class ThreadSafeList<T> {
public void Add(T value) { ... }
public bool TryRemove(T value) { ... }
public bool TryGet(int index, out T value) { ... }
}
This is hardly a usable API. What’s worse, as wekempf point out, is that I inadvertently exposed a decision procedure in this API. It’s possible to infer state about a lower or equal index by a successful return result from TryGet(). For example, a user may say that “if I can access element 2, then surely element 1 must exist”. The result would still be evident in code (ignoring the return value of a TryGet method should be a red flag). But a better choice for this method would have been a TryGetFirst().
At the end of the day, users are going to want some level of determinism out of their collections. It’s possible to program against API’s like the above, but most people won’t do it. In order to be more used, the collection must be able to reliably implement procedures such as Count and Contains and allow the user to use the return to reason about the state of the collection.
One way to do this is to simply exposed the internal lock to the consumer of the collection. Consumers can take the lock and then query to their hearts content. Lets do a quick modification of the original sample to allow for this.
public sealed class ThreadSafeList<T> {
private List<T> m_list = new List<T>();
private object m_lock = new object();
public object SyncLock { get { return m_lock; } }
public void Add(T value) {
lock (m_lock) {
m_list.Add(value);
}
}
public void Remove(T value) {
lock (m_lock) {
m_list.Remove(value);
}
}
public bool Contains(T value) {
lock (m_lock) {
return m_list.Contains(value);
}
}
public int Count { get { lock (m_lock) { return m_list.Count; } } }
public T this[int index] {
get { lock (m_lock) { return m_list[index]; } }
set { lock (m_lock) { m_list[index] = value; } }
}
}
Now we can go back to the original sample code and write a version which can use the decision procedures safely.
lock (col.SyncLock) {
if (col.Count > 0) {
var first = col[0];
...
}
}
This code will function correctly. But the API leaves a lot to be desired. In particular …
- It provides no guidance to the user as to which procedures must be accessed with the SyncLock object locked. They can just as easily write the original bad sample code.
-
All procedures used within the lock reacquire the lock recursively which is definitely not advisable. We could provide properties which do not acquire the lock such as CountNoLock that work around this problem. While ok in small doses, it's just a matter of time before you see this snippet in the middle of a huge mostly undocumented function
// Lock should be held at this point
int count = col.CountNoLock;
This code makes my eyes bleed
- The API provides 0 information to the user on exactly what the rules are for this lock. It would be left as an artifact in documentation (which you simply cannot count on users reading).
- There is really nothing telling the user that they ever have to unlock the collection. Surely, any user entering into the world of threading should know this but if they do a Monitor.Enter call without a corresponding Monitor.Exit, they will receive no indication this is a bad idea.
- Overall this collection requires a lot of new knowledge about the collection to use
This design though is exactly how a “synchronized” collection in 1.0 version of the BCL worked. This code is essentially what you would get by passing an ArrayList instance to ArrayList.Synchronized (and most other BCL 1.0 collections). It was problematic enough that all of the new collections in 2.0 did not implement this feature. Here’s the BCL team’s explanation on this http://blogs.msdn.com/bclteam/archive/2005/03/15/396399.aspx
Overall this design poses several problems because it exposes internal implementation details directly to the consumer. An improved design should seek to hide the lock from direct access. What we really want is a way to not even provide API’s like Count and Contains unless the object is already in a locked state. This prevents them from being used at all in an incorrect scenario.
Lets run with this idea to design a more usable thread safe queue. First we’ll divide the interface for a queue into two parts.
- All procedures that have 0 reliance on the internal state of the collection. Namely Enqueue, and Clear. No state is required to use these methods
- All procedures that rely on the internal state of the collection to function correctly.
The ThreadSafeQueue class will contain all of the methods in category #1. It will also provide a method which returns an instance of an interface which has all of the methods in category #2.
public interface ILockedQueue<T> : IDisposable{
int Count { get; }
bool Contains(T value);
T Dequeue();
}
The implementation of this interface object will acquire the internal lock of the original ThreadSafeQueue during construction and hold it for the duration if it’s lifetime. This effectively freezes the queue allowing for decision procedures to be used reliably. Implementing IDisposable and releasing the lock in the Dispose method provides a measure of lifetime management.
The rest of the code sample is below.
public sealed class ThreadSafeQueue<T> {
#region LockedQueue
private sealed class LockedQueue : ILockedQueue<T> {
private ThreadSafeQueue<T> m_outer;
internal LockedQueue(ThreadSafeQueue<T> outer) {
m_outer = outer;
Monitor.Enter(m_outer.m_lock);
}
#region ILockedQueue<T> Members
public int Count {
get { return m_outer.m_queue.Count; }
}
public bool Contains(T value) {
return m_outer.m_queue.Contains(value);
}
public T Dequeue() {
return m_outer.m_queue.Dequeue();
}
#endregion
#region IDisposable Members
public void Dispose() {
Dispose(true);
GC.SuppressFinalize(this);
}
private void Dispose(bool disposing) {
Debug.Assert(disposing, "ILockedQueue implementations must be explicitly disposed");
if (disposing) {
Monitor.Exit(m_outer.m_lock);
}
}
~LockedQueue() {
Dispose(false);
}
#endregion
}
#endregion
private Queue<T> m_queue = new Queue<T>();
private object m_lock = new object();
public ThreadSafeQueue() { }
public void Enqueue(T value) {
lock (m_lock) {
m_queue.Enqueue(value);
}
}
public void Clear() {
lock (m_lock) {
m_queue.Clear();
}
}
public ILockedQueue<T> Lock() {
return new LockedQueue(this);
}
}
This design now cleanly separates out the two modes by which the collection can be asked. It completely hides the explicit synchronization aspects from the users and replaces it with design patterns (such as IDisposable) that they are likely already familiar with. Now our original bad sample can be rewritten as follows.
static void Example1(ThreadSafeQueue<int> queue) {
using (var locked = queue.Lock()) {
if (locked.Count > 0) {
var first = locked.Dequeue();
}
}
}
No explicit synchronization code is needed by the user. This design makes it much harder for the user to make incorrect assumptions or misuses of the collection. The “decision procedures” are simply not available unless the collection is in a locked state.
As with most thread safe designs, there are ways in which this code can be used incorrectly
-
Using an instance of ILockedQueue<T> after it’s been disposed. This though is already considered taboo though and we can rely on existing user knowledge to help alleviate this problem. Additionally, static analysis tools, such as FxCop, will flag this as an error.
With a bit more rigor this can also be prevented. Simply add a disposed flag and check it on entry into every method.
- It’s possible for the user to maintain values, such as Count, between calls to Lock and use it to make an incorrect assumption about the state of the list.
- If the user fails to dispose the ILockedQueue<T> instance it will be forever locked. Luckily FxCop will also flag this as an error since it’s an IDisposable. It’s not a foolproof mechanism though.
- There is nothing that explicitly says to the user “please only use ILockedQueue<T> for a very short time”. IDisposable conveys this message to a point but it’s certainly not perfect.
- The actual ILockedQueue<T> implementation is not thread safe. Ideally users won’t pass instances of IDisposable between threads but it is something to think about.
The good news is that two of these flaws (1 and 3) are issues with existing types and tools are already designed to weed them out. FxCop will catch common cases for both of them.
Also, many of these cases are considered bad code in the absence of a thread safe collection. This allows users to rely on existing knowledge instead of forcing them to learn new design patterns for mutable thread safe collections.
Overall I feel like this design is a real win over the other versions. It provides an API which helps to limit the mistakes a user can make with a mutable thread safe collection without requiring a huge deal of new patterns in order to use.
Writing a collection which is mutable, thread safe and usable is an extremely difficult process. At least that’s what you’ve likely been told all through your schooling. But then you get out on the web and see a multitude of thread safe lists, maps and queues. If it’s so hard, why are there so many examples?
The problem is there are several levels of thread safe collections. I find that when most people say thread safe collection what they really mean “a collection that will not be corrupted when modified and accessed from multiple threads”. Lets call this “data thread safe” for brevity. This type of collection is rather easy to build. Virtually any collection that is not thread safe can be made “data thread safe” by synchronizing access via a simple locking mechanism.
For Example, lets build a data thread safe List<T>.
public sealed class ThreadSafeList<T> : IEnumerable<T> {
private List<T> m_list = new List<T>();
private object m_lock = new object();
public void Add(T value) {
lock (m_lock) {
m_list.Add(value);
}
}
public void Remove(T value) {
lock (m_lock) {
m_list.Remove(value);
}
}
public bool Contains(T value) {
lock (m_lock) {
return m_list.Contains(value);
}
}
public int Count { get { lock (m_lock) { return m_list.Count; } } }
public T this[int index] {
get { lock (m_lock) { return m_list[index]; } }
set { lock (m_lock) { m_list[index] = value; } }
}
// IEnumerable<T> left out
}
And there you have it. The lock statement prevents concurrent access from multiple threads. So the actual m_list instance is only ever accessed by a single thread at a time which is all it’s designed to do. This means instances of ThreadSafeList<T> can be used from any thread without fear of corrupting the underlying data.
But if building a data thread safe list is so easy, why doesn’t Microsoft add these standard collections in the framework?
Answer: ThreadSafeList<T> is a virtually unusable class because the design leads you down the path to bad code.
The flaws in this design are not apparent until you examine how lists are commonly used. For example, take the following code which attempts to grab the first element out of the list if there is one.
static int GetFirstOrDefault(ThreadSafeList<int> list) {
if (list.Count > 0) {
return list[0];
}
return 0;
}
This code is a classic race condition. Consider the case where there is only one element in the list. If another thread removes that element in between the if statement and the return statement, the return statement will throw an exception because it’s trying to access an invalid index in the list. Even though ThreadSafeList<T> is data thread safe, there is nothing guaranteeing the validity of a return value of one call across the next call to the same object.
I refer to procedures like Count as decision procedures. They server only to allow you to make a decision about the underlying object. Decision procedures on a concurrent object are virtually useless. As soon as the decision is returned, you must assume the object has changed and hence you cannot use the result to take any action.
Decision procedures are one of the reasons why Immutable Collections are so attractive. They are both data thread safe and allow you to reason about there APIs. Immutable Collections don’t change ever. Hence it’s perfectly ok to have decision procedures on them because the result won’t get invalidated.
The fundamental issue with ThreadSafeList<T> is it is designed to act like a List<T>. Yet List<T> is not designed for concurrent access. When building a mutable concurrent collection, in addition to considering the validity of the data, you must consider how design the API to deal with the constantly changing nature of the collection.
When designing a concurrent collection you should follow different guidelines than for a normal collection class. For example.
- Don’t add an decision procedures. They lead users down the path to bad code.
- Methods which query the object can always fail and the API should reflect this.
Based on that, lets look at a refined design for ThreadSafeList<T>
public sealed class ThreadSafeList<T> {
private List<T> m_list = new List<T>();
private object m_lock = new object();
public void Add(T value) {
lock (m_lock) {
m_list.Add(value);
}
}
public bool TryRemove(T value) {
lock (m_lock) {
return m_list.Remove(value);
}
}
public bool TryGet(int index, out T value) {
lock ( m_lock ) {
if( index < m_list.Count ) {
value = m_list[index];
return true;
}
value = default(T);
return false;
}
}
}
Summary of the changes
- Both the Contains and Count procedures were removed because they were decision procedures
- Remove was converted to TryRemove to indicate it’s potential to fail
- The TryGet property was added and is reflective of the fragile nature of the method. Sure it’s possible for users to simply ignore the return value and plow on with the invalid value. However the API is not lulling the user into a false sense of security
- The collection no longer implements IEnumerable<T>. IEnumerable<T> is only valid when a collection is not changing under the hood. There is no way to easily make this guarantee with a collection built this way and hence it was removed.
- The indexers were removed as well. I’m a bit wishy washy in this particular point as there is nothing in the API which gives a user a false sense of security. But at the same time mutable concurrent collections are dangerous and should be treated with a heightened sense of respect so the indexers were removed.
This version of ThreadSafeList is more resilient, but not immune to, accidental user failure. The design tends to lead users on the path to better code.
But is it really usable? Would you use it in your application?
Previously I blogged about PowerShell’s lack of closure support within a script block. This presents a significant hurdle in developing a LINQ like DSL for powershell which I’ve been working on. Imagine the following syntax
$a = from it in $source where {$it -gt 5 }
This would be the rough equivalent of the following C# code
var a = from it in source where it > 5;
In C# this code works because the where clause “it > 5” is converted to a lambda expression under the hood. The variable it is captured in the lambda expression via a closure. In order to get similar functionality out of powershell, the value $it must be resolvable when the “where” scriptblock is executed.
Luckily Powershell is incredibly flexible. When a script block executes it will attempt to resolve any variables by looking through the various scopes. The first scope is that of the script block, and then the local scope of the code in which the script block is executed. Using new-variable, we can create variables which match the name the script block is looking for and simulate a closure.
PS) $sb = { write-host $it }
PS) & $sb
PS) new-variable "it" 42 -scope local
PS) & $sb
42
Success! Now all we need to do is generalize this behavior by creating a function: Run-Scriptblock. It takes two arguments
- The scriptblock to execute
- A list of name/value pairs. Each one represents a variable that must be available for the execution of the scriptblock
Code:
#============================================================================
# Runs a script block. The $list parameter must be a list of string, value
# combinations. The script block will be executed with variables of the
# specified name and value in scope
#============================================================================
function Run-Scriptblock() {
param ( [scriptblock] $sb = $(throw "Need a script block"),
[object[]]$list= $(throw "Please specify the list of names and values") )
for ( $i = 0; $i -lt $list.Length; $i = $i+2 ) {
$name = [string]($list[$i])
$value = $list[$i+1]
new-variable -name $name -value $value -scope "local"
}
& $sb
}
Example Usage:
PS) $sb = { write-host $it }
PS) run-scriptblock $sb "it",42
42
PS) $it
Now we have the method by which to execute a “where” clause. Next time we’ll look at actually defining a LINQ DSL in powershell.
I published a .Net utility library on Code Gallery today called BclExtras. It’s a set of classes meant to be used in addition to the standard .Net base class libraries (BCL). The main focuses of the library are functional programming, multi-threading, LINQ extensions, unit testing and API’s designed to support type inference.
This project evolved from various classes and constructs I was using in personal projects. For the last year or so I’ve kept as a separate tested library. It started out with a lot of multi-threaded code constructs but lately is leaning to a lot of functional style API’s and collections.
The library includes source and binaries for .Net 2.0 and .Net 3.5. The .Net 2.0 version of the library includes many constructs added in 3.5 that don’t rely on any 2.0SP1 CLR features. Examples are Func<T>, Action<T>, the Extension attribute and a subset of the LINQ Enumerable class. It allows for most LINQ expressions in a 2.0 targeted application. These types are removed in the 3.5 version to avoid conflicts with types in System.Core.
I’ve previously released this library under the name RantPack. It originally started out as personal utility library of mine and hence ended up with the somewhat obscure name. But, besides to me, RantPack doesn’t really convey a useful meaning. So I decided to give it a more meaningful name for the general population.
The Take pair of functions are very similar to the Skip functions. The Take expression does essentially the opposite of the Skip functions. Skip is useful for getting elements further down the pipeline. Take is used for getting elements from the start of the pipeline.
#============================================================================
# Take count elements fro the pipeline
#============================================================================
function Take-Count() {
param ( [int]$count = $(throw "Need a count") )
begin {
$total = 0;
}
process {
if ( $total -lt $count ) {
$_
}
$total += 1
}
}
#============================================================================
# Take elements from the pipeline while the predicate is true
#============================================================================
function Take-While() {
param ( [scriptblock]$pred = $(throw "Need a predicate") )
begin {
$take = $true
}
process {
if ( $take ) {
$take = & $pred $_
if ( $take ) {
$_
}
}
}
}
Example
PS) 1..10 | take-count 5
1
2
3
4
5
PS) 1..10 | take-while {$args[0] -lt 6}
1
2
3
4
5
PS)
CLR 2.0 introduced IEquatable<T> which is an interface that allows for type safe equality comparisons. Previously, the best available method for comparing equality was the virtual Object Equals method. The method is loosely typed since it takes an object as a parameter. This is easy enough to deal with on the client with a simple cast to the appropriate type.
class Student {
public override bool Equals(object obj) {
var other = obj as Student;
if (other == null) {
return false;
}
// rest of comparison
}
}
IEquatable<T> is a significant improvement over this pattern because it provides a strongly typed equals method. This protects both the caller and callee from passing incompatible object types. Additionally it avoids the overhead of boxing for value types.
These benefits are nice but if you implement IEquatable<T> you still must override Equals(object) and GetHashCode. Not doing so is wrong and will cause you pain down the road. I’ve explored this topic briefly in the past but wanted to expand on it a bit with some concrete examples.
Before we get into the technical details of why, lets look at this from an expectation point of view. Implementing IEquatable<T> is a statement that “this object knows what it means to be equal.” This in effect adds a contract to your class declaring that it knows how to be compared for equality. Your object should live up to these expectations in order to avoid confusing other programmers who aren’t intimately familiar with your class. Confusing programmers is rarely a good idea.
Issue #1: IEqualityComparer<T> requires GetHashCode()
Strongly typed collections such as Dictionary<TKey,TValue> and HashSet<T> must be able to compare objects for equality in order to function. Starting in 2.0, the BCL provides an interface by which object equality semantics can be performed: IEqualityComparer<T>. This class is used in many other places besides collections, but inspecting the collection classes is the easiest way to get a feel for it’s use.
Lets take a look at the definition of IEqualityComparer<T>
public interface IEqualityComparer<T> {
bool Equals(T x, T y);
int GetHashCode(T obj);
}
The default definition is an internal class in the BCL named GenericEqualityComparer<T>. The default implementation of IEqualityComparer<T> relies on IEquatable<T> for it’s implementation.
But if it uses IEquatable<T> for it’s implementation how can it possible implement GetHashCode()? Simple, it uses Object.GetHashCode(). This means an object must implement IEquatable<T> and GetHashCode() in order to function correctly in places where IEqualityComparer<T> is used.
But wait, I don’t actually implement IEqualityComparer<T> anywhere so I’m safe right? Unfortunately no. Very few people actually implement IEqualityComparer<T>. Instead they use EqualityComparere<T>.Default to access a given IEqualityComparer<T> for a given type T.
In fact, the standard pattern for methods which take an IEqualityComparer<T> is to have an overload that doesn’t and pass EqualityComparer<T>.Default to the one that does.
public static class Example {
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source) {
return Distinct(source, EqualityComparer<T>.Default);
}
public static IEnumerable<T> Distinct<T>(
this IEnumerable<T> source,
IEqualityComparer<T> comparer) {
// implementation
}
}
If your object implements IEquatable<T> this will eventually cause it to create an instance of GenericEqualityComparer<T> and hence a reliance on GetHashCode.
Issue #2: Non-Strongly typed collections and Frameworks don’t use IEquatable<T>
IEquatable<T> only provides equality comparisons in strongly typed scenarios. It is not convenient to access this interface in less strongly typed scenarios. Consider for instance the original 1.0 collection classes: ArrayList, Hashtable, etc … These are all object based collections and have no way in which to cast to IEquatable<T>. Instead these collections must rely on the Object based methods of Equality.
Without implementing Object.Equals and Object.GetHashCode your type will not actually do any sort of comparison. This will cause lots of incorrect behavior for programmers who expect the class to understand equality.
class Person : IEquatable<Person> {
public readonly string Name;
public Person(string name) {
Name = name;
}
public bool Equals(Person other) {
if (other == null) {
return false;
}
return StringComparer.Ordinal.Equals(Name, other.Name);
}
}
static void EqualityCheck() {
var p = new Person("Bob");
var list = new ArrayList();
list.Add(p);
Console.WriteLine(list.Contains(p)); // Prints: True
Console.WriteLine(list.Contains(new Person("Bob"))); // Prints: False
}
This goes against expectation. Both Person instances in this case are equal by definition of Person yet Contains fails. Implementing Object.Equals and Object.GetHashCode will remove this confusion.
The list of frameworks which still use loosely typed collections include WinForms, WPF, WebForms, etc … It’s almost inevitable that you will end up using a loosely typed collection in your project somewhere.
Isssue #3: Equality and hash codes are linked in the BCL
Rightly or wrongly, equality and hash codes are unbreakably linked in the BCL. If an object can be compared for equality it also must be able to produce a hashcode. This implicit contract exists many places throughout the framework.
As previously displayed, implementing Object.Equals() after implementing IEquatable<T> is straight forward and boiler plate code. Object.GetHashCode can be a bit trickier because there are many implicit contracts for GetHashCode. Often mutable objects cannot provide an efficient hashing mechanism. In that case just return 1. This will satisfy all of the implicit contracts around GetHashCode() and takes little time to do. Yes, it will cause a Dictionary to effectively be a linked list. But that’s a heck of a lot better than simply not working at all.
Next up in the PowerShell LINQ series is SkipWhile. This LINQ function takes an enumerable instance and a predicate. The function will skip the elements in the enumerable while the predicate is true. The argument to the predicate is the current value of the enumerable.
The LINQ version takes a predicate in the form of a Func<T,TResult>. The PowerShell equivalent of a delegate is a script block. Unlike a .Net delegate, there is no way to type the Skip-While function to accept a particular number or type of arguments. The contract with the caller will be implicit.
Other than the strict typing, the function will match the contract for the LINQ version of SkipWhile.
#============================================================================
# Skip while the condition is true
#============================================================================
function Skip-While() {
param ( [scriptblock]$pred = $(throw "Need a predicate") )
begin {
$skip = $true
}
process {
if ( $skip ) {
$skip = & $pred $_
}
if ( -not $skip ) {
$_
}
}
end {}
}
Example Usage:
PS) 1..10 | Skip-While { $args[0] -lt 6 }
6
7
8
9
10
PS)