In my last posting I made some recommendations about how to drastically improve the evaluation process based on the key observations that even the basic operations required to evaluate the truth of the facts were quite costly -- more so than all of the extra allocations.

I suggested a plan like this one:

  • preparse the predicates so that no string processing is required for evaluation
  • assign facts a number in sequence as they are encountered in the predicates
  • the preparsed predicate is a mix of fact numbers and operators probably in postfix order
  • when looking up facts simply access an array of truth bits indexed by fact number
  • as facts change simply change the array contents and then reevaluate predicates as desired
  • do the evaluation itself with a simple stack machine; no recursion in the evaluation

    And as promised here's some code that does just that:  parser_compling.cs

    Now how do we fare with this code?  Well the raw number I get on my machine is now 225ms to do those same predicate evaluations.  That's down from 4.839s -- so it's over 21 times faster.  Let's have a look at how the CPU usage is distributed now:

    Module/Function Exclusive %
    ParserCompiling.exe 93.50
       ParserCompiling.Program.EvaluatePredicate(int32[]) 76.52
       ParserCompiling.Program.Main(string[]) 16.98
       ParserCompiling.Program.CompilePredicate(string) 0.00
       ParserCompiling.Program.CompileOptionalNot() 0.00
       ParserCompiling.Program.GetToken() 0.00
       ParserCompiling.Program.CompileAnd() 0.00
       ParserCompiling.Program.CompileOr() 0.00
    mscorwks.dll 3.15
    ntdll.dll 1.47
    mscorjit.dll 0.63
    Unknown 0.42
    SHLWAPI.dll 0.21
    KERNEL32.dll 0.21 0.21
    MSVCR80.dll 0.21
    shell32.dll 0.00
    mscoree.dll 0.00

    Things have really improved at this point, the vast majority of the time (76.52%) is now being spent directly doing predicate evaluation -- that's just where we want the time to go.  Our new implementation is also much more cache friendly as the number of facts increases because now we simply have an array of booleans (it could be even denser if it was an array of bits).  Even if there were a few thousand facts we'd still have a tight representation -- the hash table approach tends to scatter things (deliberately) throughout a larger table.  There are no redundant string comparisons, no hashing, it just goes voom :)

    Now, what if we get out the big guns and start doing native code generation.  Could we do even better?  After all, this is managed code, we have the JIT at our disposal.

    Tune in tomorrow :)

    NOTE:  When I originally posted this I reported the times for the debug build... I had to restate the results because the release build is approximately three times faster.  While I was at it I re-verified that my previous numbers were on the release build and they were.  Sorry about that...

    Concluded in Performance Quiz #8 -- The problems with parsing -- Part 5