We recently had a question about the performance of JetIntersectIndexes(). Here's what Brett wrote up:
The point of intersecting indexes is to save disk IO at the cost of CPU time.
The cost of intersecting indices is the cost of iterating over ALL the index ranges (choice #3 below). That may not sound like a savings, HOWEVER index entries are stored together, and we try to maintain them sequentially … and so iterating over index entries is, generally speaking, very fast off disk. If you walk an index and retrieve columns that you have to get from the real record, you have to jump back and forth between the index and the table; you will end up seeking into the table, and seeking is slow if the cache is not full of that data. Likely your cache is full of data … but let me continue on the cold-case points.
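To make the shape of the operation concrete, here is a minimal C sketch of driving JetIntersectIndexes(): set an index range on one cursor per secondary index, intersect them, and then walk the resulting temp table of bookmarks. The index name, the long key column, and the pre-opened session/cursors are assumptions for illustration, and most error handling is elided; this is a sketch, not production code.

    #include <windows.h>
    #include <esent.h>

    /* Position a cursor over the inclusive range [lo, hi] on the named
       secondary index. Assumes (for illustration) that the index key is
       a single long column. */
    static JET_ERR SetRange(JET_SESID sesid, JET_TABLEID tableid,
                            const char *szIndex, long lo, long hi)
    {
        JET_ERR err;
        if ((err = JetSetCurrentIndex(sesid, tableid, szIndex)) < 0) return err;
        JetMakeKey(sesid, tableid, &lo, sizeof(lo), JET_bitNewKey);
        if ((err = JetSeek(sesid, tableid, JET_bitSeekGE)) < 0) return err;
        JetMakeKey(sesid, tableid, &hi, sizeof(hi), JET_bitNewKey);
        return JetSetIndexRange(sesid, tableid,
                                JET_bitRangeUpperLimit | JET_bitRangeInclusive);
    }

    /* Intersect two prepared ranges (cursors tableidA / tableidB), then
       seek into the base table once per surviving bookmark. Those
       per-bookmark JetGotoBookmark calls are the costly table seeks
       discussed above. */
    static JET_ERR IntersectTwo(JET_SESID sesid, JET_TABLEID tableidA,
                                JET_TABLEID tableidB, JET_TABLEID tableidBase)
    {
        JET_INDEXRANGE rgrange[2] = {
            { sizeof(JET_INDEXRANGE), tableidA, JET_bitRecordInIndex },
            { sizeof(JET_INDEXRANGE), tableidB, JET_bitRecordInIndex },
        };
        JET_RECORDLIST recordlist = { sizeof(JET_RECORDLIST) };
        JET_ERR err = JetIntersectIndexes(sesid, rgrange, 2, &recordlist, 0);
        if (err < 0) return err;

        unsigned char rgbBookmark[256];
        unsigned long cb;
        for (err = JetMove(sesid, recordlist.tableid, JET_MoveFirst, 0);
             err >= 0;
             err = JetMove(sesid, recordlist.tableid, JET_MoveNext, 0))
        {
            JetRetrieveColumn(sesid, recordlist.tableid,
                              recordlist.columnidBookmark, rgbBookmark,
                              sizeof(rgbBookmark), &cb, 0, NULL);
            JetGotoBookmark(sesid, tableidBase, rgbBookmark, cb);
            /* ... retrieve columns from the primary record here ... */
        }
        JetCloseTable(sesid, recordlist.tableid); /* free the temp table */
        return JET_errSuccess;
    }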
Some examples help with imagining the cold-IO cases:
Every time another index range is added, it generally takes a few IOs (roughly 30 ms) and, let's say, 1 sec of CPU per 250k entries in the range. For a small index range of, say, 100 entries (where the IO cost is dominant), if the selectivity of the eventual set of records you have to look up improves by only 3 records per 3 IOs spent, the extra range pays for itself, because the #2-type seeks it saves can only be performed at a measly 100 IOPS pace. For a large index range, if the selectivity of the eventual set of records improves by 100 records per 250k entries processed (CPU-wise), then you save enough costly #2-type disk seeks to win back the CPU cost: 100 saved seeks at 100 IOPS is roughly 1 second of disk time, against roughly 1 second of CPU. Keep in mind also that you have 4 cores to process things, but generally speaking only 1 disk … so again, spending CPU instead of disk is nicer to the system as a whole.
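As a rough illustration of that break-even arithmetic, here is a tiny C cost model. The constants are just the ballpark figures from above (~30 ms of cold IO per added range, ~1 s of CPU per 250k entries scanned, ~100 IOPS for random record seeks on an HD), not measurements.

    #include <stdio.h>

    /* Estimated cost of adding/scanning an index range: CPU to process
       the entries plus the cold IOs to bring the range in. */
    static double intersect_cost_sec(double entriesScanned, double coldIOs)
    {
        return entriesScanned / 250000.0 + coldIOs * 0.010;
    }

    /* Estimated cost of the #2-type primary-record seeks, at ~100 IOPS
       on a spinning disk. */
    static double seek_cost_sec(double recordSeeks)
    {
        return recordSeeks / 100.0;
    }

    int main(void)
    {
        /* Large-range example: scanning 250k extra entries (~1 s of CPU)
           pays off if it eliminates >= 100 primary-record seeks (~1 s of
           disk time). */
        printf("scan 250k entries, 3 cold IOs: %.2f s\n",
               intersect_cost_sec(250000, 3));
        printf("100 saved record seeks:        %.2f s\n",
               seek_cost_sec(100));
        return 0;
    }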
However, if everything is warm, it generally becomes an "even numbers" game, where more entries is just more processing. It probably would have been better to just select the most selective (smallest) index range and process its entries straight off, applying the leftover terms (the residual predicate, in query parlance) against the primary record … as that's 2000 index entries + 2000 seeks (but no IO, just CPU, since everything is cached). Every index range you add beyond that is a CPU-processing penalty added to your operational time, at likely little improvement. Which is likely what you're seeing here … but this, again, is another thing that is difficult to know a priori. (A sketch of this single-range approach follows below.)
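For contrast, here is a hedged sketch of that single-range strategy in C: walk the most selective index range (positioned beforehand, say, with the SetRange helper from the earlier sketch) and apply the leftover term against the primary record. The column handle and the equality test are assumptions for illustration.

    /* Walk the current (already-ranged) secondary index and apply a
       residual equality predicate on a column that is NOT part of the
       index. Because columnidOther is not in the index, each retrieve
       forces ESE to fetch the primary record -- the per-entry seek the
       post counts as "2000 index entries + 2000 seeks". */
    static JET_ERR ScanWithResidual(JET_SESID sesid, JET_TABLEID tableid,
                                    JET_COLUMNID columnidOther, long otherValue)
    {
        JET_ERR err;
        for (err = JetMove(sesid, tableid, JET_MoveFirst, 0);
             err >= 0;
             err = JetMove(sesid, tableid, JET_MoveNext, 0))
        {
            long value;
            unsigned long cb;
            err = JetRetrieveColumn(sesid, tableid, columnidOther,
                                    &value, sizeof(value), &cb, 0, NULL);
            if (err < 0) return err;
            if (value == otherValue)
            {
                /* record satisfies both terms; process it here */
            }
        }
        return (JET_errNoCurrentRecord == err) ? JET_errSuccess : err;
    }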
I am assuming here that we have generally good indices, like those over dates, auto-increment-like IDs, common terms or common names, etc. Keep in mind that some really randomized indices, like GUID indices, can have much poorer contiguity and thus can cost a lot more IO to traverse a range of them … and that can change the results dramatically.
Also, these numbers are for HDs … SSDs change things significantly, as they can often do 2000+ random IOPS and don't experience as much of a speed differential from sequential IOs.