Santalic tailfans, part two

Santalic tailfans, part two

Rate This
  • Comments 36

As I have said before many times, there is only one sensible way to make a performant application. (As an aside: perfectly good word, performant, deal with it!) That is:

  • Set meaningful, measurable, customer-focused goals.
  • Write the code to be as clear and correct as possible.
  • Carefully measure your performance against your goals.
  • Did you meet your goal? Great! Don't waste any time on performance analysis. Spend your valuable time on features, documentation, bug fixing, robustness, security, whatever.
  • If you did not meet your goal, use tools to discover what the worst-performing fixable thing is, and fix it.

Iterate on that process, updating your goals if it proves necessary, until you either have something that meets your goals or you give up.

My explicit goal for my little dictionary search program was that it give results fast enough that I would not be sitting there impatiently waiting for more than a few seconds. That's a very customer-focused goal, me being the only customer of this program. With only a very small amount of tweaking my program met that goal right away, so why would I spend any more time on it? The program takes just under two seconds to report the results for a typical query, which is faster than I can read.

But suppose that we did want to improve the performance of this program for some reason. How? Well, let's go down the list. We have goals, we have very clear code, we can measure the performance easily enough. Suppose we didn't meet the goal. The last thing on the list is "use tools to discover what the slowest fixable thing is".

A commenter conjectured that the performance bottleneck of this program was in the disk I/O. As you can see, every time we do a query we re-read the two megabyte dictionary file dozens of times. This has the benefit of using very little memory; we never have more than one line of the dictionary in memory at a time, instead of the whole four megs it would take to store the dictionary (remember, the dictionary is in ASCII on disk but two-byte Unicode if in strings in memory, so the in-memory size will double.)

That's a conjecture -- a reasonable conjecture -- but nevertheless, it's just a guess. If I've learned one thing about performance analysis it's that my guesses about where the bottleneck is are often wrong. I'll tell you right now that yes, the disk-hitting performance is bad, but it is not the worst thing in this program, not by far. Where's the real performance bottleneck? Any guesses? Could you know without using tools?

Here's the result of a timing profile run of a seven-letter queries with one blank, repeated 20 times:

43%: Contains
21%: ReadLine
14%: Sort
7%: ToUpperInvariant
15%: everything else

Holy goodness! The call to the Contains extension method in the query to test whether the dictionary word is in the rack set array is almost half the cost of this program!

Which makes sense, once you stop to think about it. The "Contains" method is by its highly general nature necessarily very naive. When given an array, 99.9% of the time it has to look through the entire 26-item array because 99.9% of the time, the word is not actually going to match any of the possible racks. It cannot take advantage of any "early outs" like you could do if you were doing a linear search on a sorted list. And each time through the array it has to do a full-on string comparison; there's no fancy-pants checks in there that take advantage of string immutability or hash codes or any such thing.

We have a data structure that is designed to rapidly tell you whether a member is contained in a set. And even better, it already does the "distinct" logic. When I replace

    var racks = (from rack in ReplaceQuestionMarks(originalRack)
                 select Canonicalize(rack)).Distinct().ToArray();

with

    var racks = new HashSet<string>(
                from rack in ReplaceQuestionMarks(originalRack)
                select Canonicalize(rack));

suddenly performance improves massively. The "Contains" drops down to 3% of the total cost, and of course, the total cost is now half of what it was before.

Another subtle point here: notice how when I changed the type of the variable "racks" from "array of string" to "set of string", I didn't have to redundantly change the type thanks to implicit typing of local variables. I want to emphasize the semantics here, not the storage mechanism. If I felt that communicating the storage mechanism was an important part of this code -- because it has such a strong effect on performance -- perhaps I would choose to emphasize the storage by eschewing the "var".

With this change, the program performance improves to about one second per query and the profile now looks like this:

39%: ReadLine
23%: Sort
11%: ToUpperInvariant
7%: the iterator block goo in FileLines
5%: ToArray (called by Join)
15%: everything else

Now the bottleneck is clearly the combination of repeated file line reading (48%) and the string canonicalization of every dictionary line over and over again (37%).

With this information, we now have data with which to make sensible investments of time and effort. We could cache portions of the dictionary in memory to avoid the repeated disk cost. Is the potential increase in speed worth the potentially massive increase in memory usage? We could be smart about it and, say, only cache the seven- and eight-letter words in memory.

We could also attack the canonicalization performance problem. Should we perhaps precompute an index for the dictionary where every string is already in canonical form? This in effect trades increased disk space, increased program complexity and increased redundancy for decreased time. Should we use a different canonicalization algorithm entirely?

All of these decisions are driven by the fact that I have already exceeded my performance goal, so the answer is "no". Good enough is, by definition, good enough. If I were using this algorithm to actually build a game AI, it would not be good enough anymore and I'd go with some more clever solution. But I'm not.

  • Smartass et al. are missing the point...

    All the counter-arguments that have been made rely on assumptions about what is "Good Enough".

    However, the customer (in this case, Eric) defines what is "Good Enough", and the programmer (us) works to that specification. In real-world software development (where deadlines and timesheets and chargeable time matter), it is wrong to spend even an hour "optimising" past that point.

    I've never experienced a performance problem that I couldn't have thrown hardware at if I chose to. Sometimes we've done that, sometimes we've spent time on performance analysis - based on our judgement about which was cheaper.

    What Eric is trying to say is that there's always a trade-off - speed for memory, speed for size, speed for development time, speed for readability, speed for maintainability, speed for stability - and that's a) not a trade-off that you always want to make, and b) most likely not a decision you should be making without consulting your client.

    "Hi, Mr Project Manager! I'm gonna deliver two days late because I want this function to run in 5 nanoseconds instead of 0.2 seconds, even though the spec says that 'a few seconds' will do. That ok?"

    Most relevantly, we take a rapid-development approach. I've written some shamefully poorly-performing code (I'm looking at you, LINQ-to-SQL Deferred Loading...) in the interests of getting our client hands-on with a system in-development sooner. "Good Enough" for a first-contact demo, or for UAT is quite different to "Good Enough" for production - it can help your project run more smoothly if you can do your performance optimisation later in the project (i.e. after you know that you need it), and deliver other things earlier. More than once it's turned out that my shameful code has been "Good Enough". At that point it's my job to work on something else, instead of spending valuable time fixing something that isn't broken.

    I don't think I'd be exagerating if I said "most" of our code doesn't even make it to production. Imagine what a waste of time it'd be if I "optimised" *all* my code for speed...

  • Great article. Given that so many of the comments show the authors clearly miss the point of the article, it's no wonder that so many projects exceed budget and are delivered late, if ever. More often than not, the best optimization you can make at any given time is to make none at all and move on to the next task.

    It is likely that I'm just not very good at explaining myself clearly. Writing well is hard, as it turns out! -- Eric

    BTW, given that the word that's "not a word" fits the textbook definition (Dictionary.com's definition, anyway) of a word, it's pretty hard to argue that it's not a word. And given that nearly everyone intuitively knows what the word means on first sight, regardless of how well liked the word is, "performant" is apparently quite performant!

  • I'd just like to provide another example of when CPU doesn't matter.  I'm a corp wage slave ASP.NET webforms developer, and I like to write fast C# code.  Unfortunately, I've found that no user has ever noticed just how fast my C# code that runs on the server side of a webform page when they're accessing the site from Norway to Houston over a VPN connection.  It wouldn't matter diddly if I was using Smartass' code or not.  They do often notice when I optimize things like Page size, or viewState size, or SQL stored procedures.

  • Very interesting Eric.  Actualy learnt quite a few things.  Ironically though, you attack one of the important part of the storage mechanism while an other, easier (in my mind anyways) solution is staring at you.

    At any given point in time, the list of racks you pass to your search query is a list of strings, all of the same length.  I would think the following might do a good bit to reduce the I/O part and maybe get another 20% performance gain.

    First, I would sort the word file by length/alpha (the alpha is probably useless - more to do with having a "clean" file)

    Then I would have:

       return from line in FileLines(dictionary,originalRack.Length)

              where racks.Contains(Canonicalize(line))

              select line;

    And

    private static IEnumerable<string> FileLines(string filename, int rackLength)

    {

       using (var sr = File.OpenText(filename))

       {

           while (true)

           {

               string line = sr.ReadLine();

               if (line == null)

                   yield break;

               // Stop reading any further if the words are longer than the current rack - this will reduce file I/O

               if (line.Length >  rackLength)

                   yield break;

               // Only output to the iterator if the line has the appropriate length - this, by limiting the number of records yielded will furthr reduce the impact of Contains (though now negligible since you're using a HashSet and number of iterations

               if (line.Length ==  rackLength)

                   yield return line;

           }

       }

    }

    On a last note, I would say that the oversight was relatively easy to make.  As much as I do love the convenience of LINQ, I did cringe the first time I saw the feature.  Being mostly a database warehousing person, I find that the mindset to deal with large amounts of data is usually very different than the mindset for object-oriented programming that most developers are used to.  LINQ is both a wonderful tool and a dangerously two-sided sword: it is easy to forget, through convenience the #1 rule in data warehousing: reduce the size of the input you're working on.

    In this case the "where line.Length == originalRack.Length" clause in your original version is perfectly correct and functionality equivalent to my proposal, and it is beautifully simple.  But, it unfortunately doesn't "seamlessly travel" to the underlying data system to actually reduce the number of scaned records.

    By sorting the dictionary file by word length and moving the length check inside the function we can pretty easily squeeze some more performance (though you most certainly got most of it with the HashSet).

    I would be curious to know how much we can get?  I had a much smaller dictionary file on hand (340k words), but statistically only 1/3 or the words were of a length under 8 characters (7 and 7+1), so this should reduce your I/O by 2/3.

    But very good and useful article, thank you!

  • I read an interesting post on Eric Lippert’s blog recently where he was using LINQ to Objects to find

  • Interestingly performant is a common german word, which means: with good performance.

    (http://dict.leo.org/?search=performant)

    It's not a bad thing to introduce new words from another languages, in Germany we do that all the time.

Page 3 of 3 (36 items) 123