Some of you are all set to flame me just because of my title. I know because a couple of years back, I would have probably been one of the first to say “NO!”, when anybody started talking about query hints. But, like everything else in this world, they have their place. First of all, I encourage you to try everything else before resorting to query hints and to avoid hints that specify index names and such directly. The hints I have found to be most useful are fairly generic and do not impose maintenance burdens on the database structure.
The rest of you may be wondering exactly what I’m talking about, so let me explain. When SQL Server evaluates a query, it takes it apart and looks for how it can optimize the retrieval, what indexes can be used. It looks at all sorts of things including selectivity of the indexes, number of physical and logical I/Os required, the parameter values, the search arguments (SARGs) and makes a best-guess stab. Any query you execute, you can select the “Show Query Plan” menu item (or CTRL-L) and get a graphical picture of just how SQL will execute the query. You can also save it in XML format. And it will actually detect “missing indexes”, indexes that would help performance, which I have found incredibly useful and powerful (although I don’t usually follow the index suggestions verbatim – more about that in another post). A query hint is basically telling the query optimizer that you have inside information that is better than what SQL knows. For more background on query plans, query optimization, etc, I suggest the book SQL Server 2008 Internals” by Karen Delaney (http://search.barnesandnoble.com/booksearch/results.asp?ATH=Kalen+Delaney). The SQL team has also created a special blog focused on query optimization at http://blogs.msdn.com/queryoptteam/. Understanding query plans and every single different hints are long blog posts in and of themselves, so I’m not going to attempt to cover this in this article.
My recent experience with query hints came in 2 different areas. Alas, I actually have stats to backup one of my observations. The first observation involved the use of the FAST N hint. This basically tells SQL to optimize only for the first N matches. This can be real useful when using a TOP N where N is a parameter value and you happen to know that it will usually be a small number of rows. SQL doesn’t have the knowledge of that luxury, so forcing an OPTION N hint will cause the optimizer to do SEEKs instead of SCANs which (and I am oversimplifying) means it goes right to the page with the page with the first N matches rather than trying to scan through an entire range or entire table. This can also be useful for partitioning/ranking queries where again you are looking for just the first few rows and not wanting to rank the entire set.
The other experience involved the use of my simulation application. I recently ran the simulations against about 6 years of stock history (almost 10 million rows) in order to simulate different entry/exit trading strategies. This generated nearly 2 million orders and over 1 million end-of-day portfolio rows. What was interesting was that at the beginning of the simulation before there were any rows in my portfolio Orders table, the buy/sell procedures were running very fast. They didn’t start slowing down until over 1 million rows, and even then they weren’t too bad, considering the volume of data. You can see this from the attached chart. At that time, the indexes were undoubtedly not pretty as some of them were being split a lot due to the various combinations of fields that had to be indexed in order to close out positions, evaluate trading effectiveness, etc.
If my routine was simply to generate a couple of million of rows each night, I could just add a post-processing step to rebuild all of the fragmented indexes, then I would have left well enough alone. (for a great tool to automate this, see http://blog.sqlauthority.com/2008/03/04/sql-server-2005-a-simple-way-to-defragment-all-indexes-in-a-database-that-is-fragmented-above-a-declared-threshold/ – I copied and pasted this into a SQL job step I run nightly and I don’t worry any more about index fragmentation and it runs a lot faster than using the built-in SQL index maintenance tools, because it only does what it needs to). In any case, turns out that what I really want to do is just update the data daily with a few new simulations and append to the existing one. For the append, the query plan was still good, since I still need to create a few thousand orders and update a few thousand (fill the buy orders, close the sell orders). But, for just adding a new simulation that goes back 7 years, it was taking inordinately long given the amount of data. You can see from the below chart, that even though only a few or in some cases no records were updated, the buy and sell procedures were taking almost as long as when 2 – 3,000 rows were being updated.
So, the question is, how can I still keep the batch-oriented performance optimized to process thousands of buy/sell orders all at once, while also getting good-turn around when we have a handful of orders? Enter the query hint “OPTION RECOMPILE”. Most of the time, using this hint doesn’t make sense, because it defeats one of the great features of SQL Server for auto-parameterizing execution. Auto-parameterization is another complex topic covered in the earlier links I listed, but basically allows SQL to look at similar queries, particularly when they are done in a stored procedure, and come up with a reusable plan. It does this based on the parameters for the execution. This works great if you’re calling a stored procedure repeatedly at short intervals, since it saves the recompilation step. However, in my case, my process is mainly batch, I am only calling my stored procedures once per market date for all of the simulations. With the exception of 1 row-oriented procedure, I basically simulate an entire' days worth of stock market activity in 7 stored procedures, so in my scenario I don’t care about the time being saved on the recompile. By using OPTION RECOMPILE, SQL will look at the stored procedure each time and look at all of the parameters and their values and then make the right decisions.
Lets take a look at the stored procedure. You can see for example in my stored procedure where dynamic evaluation of the parameter values would help. If for example, I am calling for just a single portfolio, that makes for an entirely different plan than when there are hundreds of portfolios. One approach involves scanning through the orders by key to look for qualifying orders and then matching back to the portfolios while the other approach involves looking up a single portfolio and evaluating only the orders for that portfolio.
CREATE PROCEDURE [Olap].[ProcessSellOrders]
@MarketDate DATE, @RowCount BIGINT OUT
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SET XACT_ABORT ON;
-- Assume bollinger history technique for now
-- Similar logic as Process Buys, except reverse the trading methods with same
-- logic for daily high/daily low, but update Close date, limit instead of Open date,
-- and check stop limits
-- We don't know if the low or high hit first, so to be on the safe side, we assume
-- that the worse happened and so check the stop limit first
SET FilledPriceClose = dbo.udf_CalculateFillPrice
CloseDate = @MarketDate
FROM olap.Portfolio pf,
WHERE (pf.StartDate <= @MarketDate and (pf.EndDate >= @MarketDate or pf.EndDate IS NULL))
AND (pf.LastUpdateDate < @MarketDate or pf.LastUpdateDate is null)
-- Last Update Date not set until EOD
AND PortfolioOrder.PortfolioId = pf.PortfolioId
AND ed.MarketDate = @MarketDate
AND ed.TradingSymbol = PortfolioOrder.TradingSymbol
AND PortfolioOrder.OpenDate <= @MarketDate
-- See if valid for day trade - An equity is valid for day trade if it was bought using the
-- opening price.
-- Don't get confused by new orders for next market date
AND PortfolioOrder.CreateDate < @MarketDate
-- Redundant since earlier than OpenDate, but include so that optimal index can be used
AND PortfolioOrder.OrderStatus = 2
AND ( (ed.PriceAtOpen >= PortfolioOrder.StopLimitPrice AND PortfolioOrder.TradingMethod = 'S' )-- Stop price hit at open
OR (ed.PriceAtOpen <= PortfolioOrder.LimitPriceClose AND PortfolioOrder.TradingMethod = 'S' ) -- Limit price hit at open
OR (ed.DailyHigh >= PortfolioOrder.StopLimitPrice AND PortfolioOrder.TradingMethod = 'S' )-- Stop price hit during the day
OR (ed.DailyLow <= PortfolioOrder.LimitPriceClose AND PortfolioOrder.TradingMethod = 'S' ) -- Limit price hit during the day
OR (ed.PriceAtOpen <= PortfolioOrder.StopLimitPrice AND PortfolioOrder.TradingMethod = 'L')-- Stop price hit at open
OR (ed.PriceAtOpen >= PortfolioOrder.LimitPriceClose AND PortfolioOrder.TradingMethod = 'L') -- Limit price hit at open
OR (ed.DailyLow <= PortfolioOrder.StopLimitPrice AND PortfolioOrder.TradingMethod = 'L')-- Stop price hit during the day
OR (ed.DailyHigh >= PortfolioOrder.LimitPriceClose AND PortfolioOrder.TradingMethod = 'L') -- Limit price hit during the day
AND ( (PortfolioOrder.IsValidForDayTrade = 1 AND PortfolioOrder.OpenDate = @MarketDate)
OR (PortfolioOrder.OpenDate < @MarketDate)
SET @RowCount = @@ROWCOUNT
SQL 20008 provides much more flexibility in how to supply query hints. For example, in the above stored procedure, we can just put the keywords OPTION (RECOMPILE) at the end of the query rather than using the legacy RECOMPILE option for the whole stored procedure. If there were multiple queries, we could put different query hints or no query hints with each query, so there is a great improvement in granularity for performing this kind of tweaking in SQL 2008.
Here’s the difference in performance after changing to using the RECOMPILE hint. Times are rounded to the nearest second, so 0 means less than .5 seconds. Here you can see the difference in the query plan when I am only calling for a single portfolio versus the case where all of the portfolios are being processed. OPTION RECOMPILE is particularly useful when there is a lot of variation in the parameters value that ultimately determine the search arguments and predicates for scenarios where one index may make much more sense than another index depending on the search criteria.
P.S: Since I wrote this, I put my indexed view back in and now the sell orders stored procedure even with the main simulation also performs in just a couple of seconds, even with the indexes fragmented.
For more about query hints, see http://technet.microsoft.com/en-us/library/ms181714(SQL.90).aspx