Today, we'll talk about row goals. The optimizer in SQL Server has a feature that can bias plan choices to retrieve a certain number of rows quickly instead of the whole results. This shows up in a few places, but the primary areas are in TOP N queries and in subqueries that need to check for the existance of "a" row - WHERE EXISTS, for example. The query hint "option (fast N)" also translates into the same feature.
One of the problems that can arise with row goals is that the query plan can change when you add them into a query request. Especially with the option(fast N) hint, you can find cases where the first row may come back quickly but the whole results come back more slowly. So, if you send option(fast N) but retrieve the whole results, your system won't perform as well.
Effectively, we bias the optimizer to favor plans that can return a few rows quickly compared to the minimum cost to return all rows. In practice, this often means that joins will choose nested loops joins for row-goal limited plans and hash joins otherwise.
The following example demonstrates the issue:
use tempdb
create table A(col1 int, col2 binary(100), col3 int)
declare @i int
set @i = 0
while @i < 5000
begin
insert into A(col1, col2, col3) values (@i, 0x3333, rand()*1000)
set @i = @i + 1
end
create clustered index i1 on A(col1)
set statistics time on
-- should pick a series of hash joins
select A1.* from
A as A1 inner join A as A2 on A1.col1 = A2.col1
inner join A as A3 on A1.col1 = A3.col1
-- if there is a row goal, we’ll pick the loop join plan that returns one (or a few) row(s) quickly
option (fast 1)
You can run these on your installation to see the time difference for retrieving all rows with the hash join plan vs. the loop join plan (just keep adding rows until you see the difference).