The Stygian Depths Of Hacked-Together Scripts
http://blogs.msdn.com/b/ericlippert/archive/2004/05/13/the-stygian-depths-of-hacked-together-scripts.aspx

Last time on FABULOUS ADVENTURES:
Eric Lippert: The second term is asymptotically insignificant compared to the first, so we'll throw it away, and declare that any sort on a list of n items has at least a worst case of O(n log n) comparisons.
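To make the quoted bound concrete, here is a small Python sketch (added for this transcript, not part of the original post) that counts the comparisons a merge sort actually performs. The count lands inside the n·⌈log₂ n⌉ envelope, while the lower bound says no comparison sort can beat roughly n log₂ n in the worst case:

```python
import math
import random

def merge_sort(items, counter):
    """Sort items, tallying element comparisons in counter (a one-element list)."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one element comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged

n = 1000
data = [random.random() for _ in range(n)]
comparisons = [0]
assert merge_sort(data, comparisons) == sorted(data)
# Each of the ceil(log2 n) merge levels does fewer than n comparisons:
assert comparisons[0] <= n * math.ceil(math.log2(n))
```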
Trackback: 250% of what, exactly? — Fabulous Adventures In Coding, Fri, 02 Sep 2005 21:16 GMT
I just got a question this morning about how to take two collections of items and determine how...

Trackback: Dude, Where's My Asymptotic Order Analysis? — Fabulous Adventures In Coding, Fri, 14 May 2004 22:12 GMT

re: The Stygian Depths Of Hacked-Together Scripts

Mike Dimmick — Fri, 14 May 2004 21:18 GMT

I've always thought the naming of ArrayList (and, obviously, of the new List<>) was weird. They're not lists, they're arrays. For a while I thought the name ArrayList implied a deque<> (from STL), but digging around with Reflector cleared it up: the nearest STL equivalent is a vector<>.
I still think in C++ ;-)

I assume that cache locality is a distinct factor in this kind of thing. If your behaviour is read-mostly, with new items appended rather than inserted, then an array typically has better locality of reference than a linked list. Unless you do what MFC's collections do and allocate whole blocks of nodes in one go, keeping freed nodes on a free list (and even then you get a lot of indirection through pointers).

Eric Lippert — Fri, 14 May 2004 19:31 GMT

Correct -- a doubly-linked list (or a singly-linked list with a pointer to the end) is required if you want to do O(1) inserts at the end.
In fact, the List<> implementation is not a linked list at all. It's a double-when-full array list. The amortized cost of insertion at the end of a DWF list is also O(1). The down side is that the cost of insertion at the beginning is O(n).

Mike Dimmick — Fri, 14 May 2004 19:22 GMT

Obviously Eric's example requires that your list contain a pointer to the _end_ of the list as well as the head. Most non-academic linked list code I've used does: it simply makes your linked list more flexible at the cost of one pointer. It's not quite a necessity for a doubly-linked list to have a pointer to the end, but it makes things more orthogonal (particularly if you use the pointer-to-pointer technique).
I agree that if you only had a pointer to the head of the list, it would be O(n), because you have to find the end if you want to add to it. But if you knew that, you'd add the new item to the head of the list rather than the end: we're not guaranteeing which order words with the same frequency appear in.

Eric Lippert — Fri, 14 May 2004 17:35 GMT

Let me run through a small example to see if I can convince you that the many-unique-words case is O(n). Consider a list of three words: A, B and C.
In the first pass I produce this hash table:

A -> 1
B -> 1
C -> 1

You agree that each insert into this hash table is O(1), and hence that building the hash table is O(n), right?

In the second pass, I build the "second pass" hash table like this. It starts empty. I extract (A, 1) from the first hash table and look up "1" in the second hash table. I don't find it, so I insert

1 -> < >

That's an O(1) operation.

Then I say "fetch me the list" and get back < >. That's an O(1) operation.

Then I insert A into the list. That's an O(1) operation, and now I have

1 -> < A >

OK, we've made it through the loop once. The second time through, I extract (B, 1) from the first hash table. I look up "1" in the second hash table, which is an O(1) operation. But I don't INSERT anything into the second hash table. I insert something into the LIST which I have allocated. I fetch a reference to the list, get back < A >, and insert B onto the end of that list to produce < A, B >. I don't change the hash table at all; I change the contents of the thing that the hash table refers to. So now we have

1 -> < A, B >

And we've made it through the second loop doing only O(1) operations.

Third time, same thing. Extract (C, 1), look up 1, get the list, insert C onto the end of the list.

Does that make sense?
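The walkthrough above can be transcribed almost line for line. This is a sketch in Python (the thread's actual code is C#, with Dictionary<> and List<>; here a plain dict plays both roles):

```python
# First pass: word -> count. Each insert/update is O(1) on average.
words = ["A", "B", "C"]
first_pass = {}
for w in words:
    first_pass[w] = first_pass.get(w, 0) + 1

# Second pass: count -> list of words. For each (word, count) pair we do
# one O(1) lookup, at most one O(1) insert of an empty list, and one O(1)
# append. No list is ever searched, so the whole pass is O(n).
second_pass = {}
for word, count in first_pass.items():
    if count not in second_pass:
        second_pass[count] = []          # 1 -> < >
    second_pass[count].append(word)      # 1 -> < A >, then < A, B >, ...

assert second_pass == {1: ["A", "B", "C"]}
```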
Eric Lippert — Fri, 14 May 2004 17:02 GMT

I'm sorry to have to continue to be so contradictory, Hogan.

Yes, checking for the collision might walk a hash chain. (Using "list" to mean "hash chain" is confusing, particularly when I have a hash table of lists.)

But as I said above, if the hash table is written using an extensible-hashing-with-chaining algorithm, then the average chain length is bounded by some small number.

In the JScript engine we use this algorithm in the name tables and thereby guarantee that the average number of comparisons is less than 2.5 no matter how big the table gets. Thus it's O(1) for collision detection, not O(n).

Raymond's link is about a different topic -- how to choose data for a hash table so as to deliberately produce bad performance. That's a separate topic that maybe I'll cover later -- writing a hash table to have good performance even when attacked is a hard problem. For my purposes, I'm assuming that the hash table has good performance on typical, non-hostile data.
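To make the "bounded average chain length" point concrete, here is a toy chained hash table in Python (a sketch for illustration, not the JScript engine's actual implementation) that doubles its bucket count whenever the average chain length would exceed four. The rehash keeps the load factor bounded, so lookups average O(1):

```python
class ChainedTable:
    """Toy hash table with chaining; rehashes when the average chain exceeds 4."""
    MAX_AVG_CHAIN = 4

    def __init__(self):
        self.buckets = [[] for _ in range(8)]
        self.count = 0

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                  # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.count += 1
        if self.count > self.MAX_AVG_CHAIN * len(self.buckets):
            self._rehash()                # O(n), but only ~O(1/n) of the time

    def _rehash(self):
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for chain in old:
            for k, v in chain:
                self._bucket(k).append((k, v))

    def lookup(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

table = ChainedTable()
for i in range(10_000):
    table.insert(i, i * i)
assert table.lookup(1234) == 1234 * 1234
# The invariant holds after every insert, so chains stay short:
assert table.count / len(table.buckets) <= ChainedTable.MAX_AVG_CHAIN
```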
Hogan Long — Fri, 14 May 2004 16:36 GMT

"But nowhere do I check to see if the key is already in the list. If I did, yes, that would be expensive. But I don't need to because we can prove logically that there will NEVER be a collision on this list."

You do walk the list when you perform secondPass.ContainsKey(count) -- that is how hash tables work... if there is a collision, you walk a linked list of items with that hash. AND YES there will be collisions... all words that occur once will hash to the same location, because the key is the number of times a word occurs.

For a good example of how this affects performance, see Raymond Chen's link above.

-----

It is interesting that your last post was about typical performance for InStr; you have the same problem here... for the best input data this will run at O(N), for the typical case (e.g. lots of words that only appear once) it will run at O(N^2), and for almost all cases it will run at O(N log N) or worse.
Eric Lippert — Fri, 14 May 2004 16:08 GMT

> And, while theoretically O(n) is better than O(n log n), in practice O(n) might be SomethingBig * n

Indeed, in practice you cannot disregard the constant factor. For example, when I was a foolish intern I suggested to Tim Paterson that we rewrite the InStr method to use a "better" string search algorithm. InStr uses an O(n m) algorithm, where n and m are the lengths of the search string and the pattern. There exist O(n + m) algorithms, which are obviously better for large n and m.

But the naive algorithm InStr uses is BLAZINGLY fast for common cases. Most of the work optimizes down to a single machine instruction. The algorithmically better methods do lots of expensive string preprocessing up front and end up being slower in typical cases.

Eric Lippert — Fri, 14 May 2004 16:03 GMT

> Why care about performance for a one-off task?
I don't. And clearly the time taken to fetch the 2193 web pages was way, way longer than the time taken to move the strings around in memory!

However, two things:

First, note that this is a VERY short algorithm that I put together very quickly. One of the aims of writing this blog is to give scripters new techniques that they might not have been exposed to before. Lots of people do not understand the power of tables for making small, fast programs.

Second, this is a somewhat contrived example that I'm using as an educational opportunity. There have been a number of threads on JOS and elsewhere lately on the value, or lack thereof, of a mathematical approach to computing. I'm trying to illustrate the intersection between theoretical computer science and practical down-n-dirty programming.

That's why I care what the order is.

Eric Lippert — Fri, 14 May 2004 15:57 GMT

> I missed the part where you do "ContainsKey" in less than O(log n). Can you explain?
Consider a hash table with chaining that rehashes whenever the average chain length exceeds, say, four. On average, such a hash table requires one hash and a maximum of four comparisons per lookup, no matter how big the table gets.

As Mike points out above, rehashing the table might get expensive. But don't forget that the amortized cost of the rehashing is spread over all the inserts, most of which will be very cheap. The cost of rehashing is O(n), but you only rehash about O(1/n) of the time, so it works out to O(1) per insert.

Eric Lippert — Fri, 14 May 2004 15:49 GMT

> When you insert "is" it will take 4 steps to see if "is" exists

No, it won't. At no time do I search a list. I search a hash table, yes, but not a list.

Eric Lippert — Fri, 14 May 2004 15:47 GMT

> you hash the index and then run down the *whole list* to check if an item already exists
No, I don't. It's a hash table FROM numbers TO lists of strings. Here's where I hash the number:

if (!secondPass.ContainsKey(count))

and here's where I insert the key into the list:

secondPass[count].Add(key);

But nowhere do I check to see if the key is already in the list. If I did, yes, that would be expensive. But I don't need to, because we can prove logically that there will NEVER be a collision on this list.

"Add" just adds the item to the end of a linked list, and that's cheap.

Hogan Long — Fri, 14 May 2004 15:11 GMT

"Also, I think you're misunderstanding the order of the nested loop. Yes, some of those lists could be a large fraction of n, but the TOTAL length of EVERY list added together is n or less.
Is that clear? I mean, suppose you've got "the cat in the hat", and you have the hash table

1 cat, in, hat
2 the

the fact that the first list is long means that the second will be short. The total order of the nested loop can't be more than O(n)"

No, this is not true, as I said above.

Because you do the linked-list walk for each item inserted, this becomes N^2.

To follow your example:
The cat in the hat is fun.

When you insert "is" it will take 4 steps to see if "is" exists. When you insert "fun" it will take 5 steps to see if "fun" exists, and so on... thus up to N steps N times, or N^2.

Hogan — Fri, 14 May 2004 14:39 GMT

"I think you're misunderstanding the code. The hash bucket for one only has one thing in it, and that's a linked list. Adding to the end of a linked list is O(1)."
No, I do understand it... I'm not talking about the insert part, I'm talking about the checking part... you hash the index and then run down the *whole list* to check if an item already exists. If most of the items are only seen once, then this degrades the hash from O(1) to O(N).

There are other problems with this algorithm, but this is the most obvious one.

As an aside, this algorithm (with some modification) looks like it might work quite well on multi-processor systems.

David Thomas — Fri, 14 May 2004 13:13 GMT

I don't accept the O(1) hashtable operations claim either. Some (admittedly not very well designed) quick tests on my machine, doing something very similar with integer keys and adding or incrementing a counter value every time a key is hit, suggest something else:
size=32768, time/op=5.66E-06
size=131072, time/op=6.57E-06
size=524288, time/op=9.53E-06
(any more and I start swapping :)

I concede that hashtables can be set up to do lookups in O(1) time, but in the case of the .NET 1.1 Hashtable at least, they probably aren't (cursory inspection using Reflector would appear to confirm this, but I may be missing something).

Centaur — Fri, 14 May 2004 11:52 GMT

Um, er, I’m going to “say something stupid™”.
Why care about performance for a one-off task? It’s a one-off task, you have a specific and not very large n, and you don’t actually care whether it takes O(n), O(n log n), or O(n^2).

And, while theoretically O(n) is better than O(n log n), in practice O(n) might be SomethingBig * n and O(n log n) might be n log n, and n might be 3, and for n = 3 bubble sort suddenly becomes the optimal algorithm :)

Peter Stuer — Fri, 14 May 2004 11:33 GMT

I missed the part where you do "ContainsKey" in less than O(log n). Can you explain?

Ian Hickson — Fri, 14 May 2004 10:55 GMT

Of course the real question is why were those "c++" terms written as "c++" and not "c%2B%2B"? Something fishy is going on there!

Mike Dimmick — Fri, 14 May 2004 09:34 GMT

You've not told the dictionary how many items to expect, so it's going to start with whatever the default minimum size is. When you try to add more than this, it has to create a new underlying array and copy all the items to it. This will be repeated if the new array still isn't big enough to hold all the items.
If Dictionary is implemented like Hashtable, it's based on skipping forward to another bucket if the bucket selected by the hash is already in use. It will also rehash the current table once the number of hash collisions reaches a significant amount (load factor multiplied by the number of buckets in the table). I did a bit of digging with Reflector - I don't have the VS2005 CTP here, so I can't confirm what Dictionary<> does.

When the underlying bucket array is expanded, the code searches for the smallest prime which is greater than the size you asked for. It has a small table of 70 primes (I assume starting around 100ish with a decent step between them); if you ask for a capacity bigger than that, it starts searching for prime numbers by dividing your candidate by every odd number up to Sqrt(candidate). When it expands automatically, the minimum size is twice the current size plus 1.

Yes, the best case is O(1), but if you get hash collisions or the table needs to rehash, it suddenly gets a lot slower.

I'm trying to think of the rough capacities required. The two obvious cases I can think of are where all the words are different - the first Dictionary needs a capacity of N, but the second only 1 - and where the words are all the same - both dictionaries only need a capacity of 1. I'm not too good at this O() stuff!

Eric Lippert — Fri, 14 May 2004 06:49 GMT

Also, I think you're misunderstanding the order of the nested loop. Yes, some of those lists could be a large fraction of n, but the TOTAL length of EVERY list added together is n or less.
Is that clear? I mean, suppose you've got "the cat in the hat", and you have the hash table

1 cat, in, hat
2 the

the fact that the first list is long means that the second will be short. The total order of the nested loop can't be more than O(n).

Eric Lippert — Fri, 14 May 2004 06:45 GMT

> Because most of your words are going to appear once, the hash bucket for 1 is going to be huge and slow

I think you're misunderstanding the code. The hash bucket for one only has one thing in it, and that's a linked list. Adding to the end of a linked list is O(1).

Eric Lippert — Fri, 14 May 2004 06:44 GMT

> The code below degrades to O(N^2)
You're assuming a very naive hash table algorithm. Sure, if you use a really terrible hash table, things go bad real fast.

As Raymond pointed out, it's complicated. But in many situations you can use extensible hashing and get pretty close to O(1) searches and inserts.

Hogan Long — Fri, 14 May 2004 05:26 GMT

The code below degrades to O(N^2) -- it degrades quickly even with a good hash algorithm if your hash table is small (that is, size N or less). I seem to remember that you need a hash table of size 2N (or was it N^2?) for the hash table to be able to assure performance... but I don't remember for sure.
In any case, the problem is that secondPass.ContainsKey(i) walks a linked list containing up to N items with that hash value.

for (int i = max ; i >= 1 ; --i)
{
    if (secondPass.ContainsKey(i))
    {
        Console.WriteLine(i);
        foreach (string s in secondPass[i])
        {
            Console.WriteLine(s);
        }
    }
}

Now that I think about it, all the loops degrade this way; this one is just the clearest because it has the nested loops.

Actually, this code is going to be very slow:

foreach (string key in firstPass.Keys)
{
    int count = firstPass[key];
    if (count > max)
        max = count;
    if (!secondPass.ContainsKey(count))
        secondPass[count] = new List<string>();
    secondPass[count].Add(key);
}

Because most of your words are going to appear once, the hash bucket for 1 is going to be huge and slow. This section will degrade as O(Z^2), where Z is the number of words that appear once.
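For anyone who wants to experiment with the thread's argument empirically, here is a Python transliteration of the two C# fragments quoted above (a sketch: Python's dict stands in for Dictionary<>, and the function name is my own). Note that the total number of appends across every bucket equals the number of distinct words, which is the heart of Eric's claim that the nested output loop is O(n) overall:

```python
def words_by_frequency(words):
    """Return output lines: each frequency (descending), then its words."""
    # First pass: word -> number of occurrences.
    first_pass = {}
    for word in words:
        first_pass[word] = first_pass.get(word, 0) + 1

    # Second pass: occurrence count -> list of words, tracking the max count.
    second_pass = {}
    max_count = 0
    for key, count in first_pass.items():
        if count > max_count:
            max_count = count
        if count not in second_pass:
            second_pass[count] = []
        second_pass[count].append(key)

    # Output pass: counts from max down to 1, then the words in each bucket.
    # Every word is printed exactly once, so this loop does O(n) total work.
    lines = []
    for i in range(max_count, 0, -1):
        if i in second_pass:
            lines.append(str(i))
            lines.extend(second_pass[i])
    return lines

print("\n".join(words_by_frequency("the cat in the hat".split())))
# prints: 2, the, 1, cat, in, hat (one item per line)
```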