Sometimes the Simplest Solution Isn't the Best Solution (The Optional Parameter Problem)


Programmers should naturally gravitate toward the simplest, most elegant solution.  This is because the simplest coding solution is so often the best solution: simple solutions are cheaper to implement; easier for others to understand, maintain, and extend; and less prone to bugs.  Simplicity can also be associated with efficient and consistent execution, but performance is one area where Occam's Razor doesn't apply quite as consistently.  Performance requirements, more often than many other types of requirement, may demand introducing complexity into what would otherwise be a nice, tidy implementation.  I’m going to discuss one such case here. 

 

The All-In-One Search Query

Sometimes you need a query or a stored procedure to expose several search parameters but actually do the search only for the subset of the parameters that the caller cares about.  For example, consider a [find_customer] stored procedure that can search for a @name value, a customer @id, an @address, or any combination of these.  The most obvious solution to this problem looks something like this:

 

    SELECT [id], [name], [address], ...

    FROM [customers]

    WHERE (@name IS NULL OR [name] = @name)

        AND (@id IS NULL OR [id] = @id)

        AND (@address IS NULL OR [address] = @address)

 

This allows the caller to pass non-NULL values for the search criteria they care about, and to pass NULLs for any criteria that they want to be ignored.  This pattern is variously referred to as "all-in-one search query", "catch-all query", "dynamic search conditions", or "optional parameters".  Here’s a sample stored proc that shows a slightly simplified example with just two available search parameters:

 

    CREATE PROC find_customer

        @name nvarchar(128) = NULL,

        @id int = NULL

    AS

    BEGIN

        SELECT [id], [name], [address], [creation_date]

        FROM [customers]

        WHERE (@name IS NULL OR [name] = @name)

            AND (@id IS NULL OR [id] = @id);

    END;

 

This idiom is simple enough that it is relatively easy to understand.  Despite its simplicity, it allows a single query to meet many different needs (in this case: a search on name, a search on id, or a search on both name and id).  It is understandable that queries like this are so popular; it seems to be an elegant solution, and developers like elegance.  Unfortunately, it’s usually a bad idea for practical performance reasons.  The problems are significant enough that I regard this as an anti-pattern in my own code. 

 

What's Wrong With the Obvious Solution?

The first problem is that the OR operators will prevent an efficient index seek-based query plan.  If the search parameters are mutually exclusive (the caller is only supposed to provide one parameter value), you can avoid this problem by changing the query to:

 

    SELECT [id], [name], [address], [creation_date]

    FROM [customers]

    WHERE (@name IS NOT NULL AND [name] = @name)

        OR (@id IS NOT NULL AND [id] = @id);

 

That may allow an index seek, but you should expect that it will still perform poorly in some cases because it suffers from other problems that I’ll discuss next.  Also note that the query semantics aren't quite the same as the first query: if both parameters are NULL, the first query returns every row while this rewrite returns none, and if both parameters are non-NULL, this rewrite returns rows matching either criterion rather than both. 

 

The second problem, which affects both the original query and the attempted rewrite, is that SQL does a poor job of costing plans for queries like this.  In general, you should avoid filter predicates in which neither operand is a column, i.e. comparisons involving only constants, variables, and parameters (e.g. “where @var = 1”, or “where @parameter is null”).  These have a tendency to screw up the optimizer’s estimates because at plan compile time the optimizer has no idea whether the predicate will qualify all rows or disqualify all rows.  You are essentially guaranteed that its blind guess will be exactly wrong for some parameters.  The bad estimate is likely to cause inaccurate costing of plan alternatives, which in turn is likely to cause selection of a suboptimal plan. 
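One way to observe the blind guess (a sketch based on the example table; the exact estimates will vary by SQL version):

```sql
-- With a parameter-only predicate, the optimizer must guess the
-- selectivity at compile time; it produces the same estimate whether
-- @id is NULL (every row qualifies) or non-NULL (few rows qualify).
DECLARE @id int = NULL;

SELECT [id], [name], [address], [creation_date]
FROM [customers]
WHERE @id IS NULL OR [id] = @id;

-- Compare the estimated row count in the query plan against the actual
-- row count (e.g. via the actual execution plan in SSMS), once with
-- @id = NULL and once with a specific @id value.
```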

 

The third problem is that no single plan is appropriate for all of the different combinations of parameters, even if the optimizer were smart enough to predict the outcome of “variable = constant”-type predicates at compile time.  For example, suppose you had a nonclustered index on the [name] column and another nonclustered index on [id].  If SQL chose a plan that first did a lookup using the index on the [name] column, this would be a very poor choice whenever the caller passed NULL for @name.  Conversely, a plan that first scanned an index on the [id] column would be inefficient when @id was NULL and @name was non-NULL. 
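For reference, the two nonclustered indexes assumed in this example could be created along these lines (the index names are hypothetical):

```sql
-- One index to support seeks on [name], another to support seeks
-- on [id]; neither can serve both search paths on its own.
CREATE NONCLUSTERED INDEX [ix_customers_name] ON [customers] ([name]);
CREATE NONCLUSTERED INDEX [ix_customers_id] ON [customers] ([id]);
```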

 

The only situations where you should use queries like either of those shown above are (a) when you don't care about the performance of the query, or (b) when you can guarantee that the table will only ever contain a handful of rows, so that a table scan-based plan will never be noticeably slower than a seek-based plan. 

 

Possible Solutions

One possible solution would be to have a different query for each search parameter (or combination of search parameters, if that is legal input for the procedure).  This is shown below.  It may be a practical solution if you only have two or three possible search parameters. 

 

    CREATE PROC find_customer

        @name nvarchar(128) = NULL,

        @id int = NULL

    AS

    BEGIN

        IF (@name IS NULL)

        BEGIN

            SELECT [id], [name], [address], [creation_date]

            FROM [customers]

            WHERE [id] = @id;

        END

        ELSE IF (@id IS NULL)

        BEGIN

            SELECT [id], [name], [address], [creation_date]

            FROM [customers]

            WHERE [name] = @name;

        END

        ELSE BEGIN

            SELECT [id], [name], [address], [creation_date]

            FROM [customers]

            WHERE [name] = @name AND [id] = @id;

        END;

    END;

 

Note that parameter sniffing may cause bad plan selection even in this case if each search query is more complex than the queries shown here (if the queries include joins against large tables, for example).  You can protect against this problem by assigning the parameters to local variables and using the variables in the queries.  You can also avoid the problem by moving the search queries to a different compilation scope (child stored procedures, dynamic SQL, or "OPTION(RECOMPILE)" query hints). 
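As a sketch of the local-variable workaround (the procedure name and variable name here are hypothetical, not from the example above):

```sql
CREATE PROC find_customer_by_name
    @name nvarchar(128)
AS
BEGIN
    -- Copying the parameter into a local variable hides its value from
    -- the optimizer at compile time, so the plan is built from average
    -- column statistics rather than the first caller's specific value.
    DECLARE @local_name nvarchar(128);
    SET @local_name = @name;

    SELECT [id], [name], [address], [creation_date]
    FROM [customers]
    WHERE [name] = @local_name;
END;
```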

 

The second possible solution uses dynamic SQL to construct the WHERE clause dynamically. 

 

    CREATE PROC find_customer

        @name nvarchar(128) = NULL,

        @id int = NULL

    WITH EXECUTE AS OWNER

    AS

    BEGIN

        DECLARE @search_query nvarchar(max);

        SET @search_query = '

            SELECT [id], [name], [address], [creation_date]

            FROM [customers]

            WHERE 1 = 1 ';

       

        IF (@name IS NOT NULL)

        BEGIN

        SET @search_query = @search_query + ' AND [name] = @name';

        END;

       

        IF (@id IS NOT NULL)

        BEGIN

        SET @search_query = @search_query + ' AND [id] = @id';

        END;

       

        EXEC sp_executesql @search_query, N'@name nvarchar(128), @id int',

            @name = @name, @id = @id;

    END;

 

You should explicitly parameterize the query, as shown here, to avoid the risk of SQL injection attacks.  Note the EXECUTE AS OWNER used for this procedure, which may be appropriate to avoid the need to grant the end user SELECT permission on the base table(s).  (Caution: always triple-check to ensure that the proc is not vulnerable to SQL injection before enabling EXECUTE AS OWNER.)  Also note that the pattern is simplified by always passing all parameters to sp_executesql, even though the query may only make use of a subset of the parameters.  I generally don't like to use dynamic SQL where I can avoid it, but I make an exception for this case; the dynamic WHERE clause is actually my preferred solution when I'm faced with this problem on SQL 2005 and earlier. 
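During development it can help to inspect the statement text before it is executed; a missing space in the concatenation shows up immediately. This is a debugging sketch, not part of the pattern itself:

```sql
-- Temporarily print the assembled statement to verify the dynamic
-- WHERE clause before handing it to sp_executesql.
PRINT @search_query;

EXEC sp_executesql @search_query, N'@name nvarchar(128), @id int',
    @name = @name, @id = @id;
```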

 

If you're running SQL 2008 or later, there is a simpler solution that generally works well. More recent versions of SQL are better at predicting the effect of a predicate like "@p1 IS NULL" at compile time. You still have to worry about inappropriate plan reuse (getting a compiled plan that is good for the initial set of parameters but bad for subsequent parameters), though. That problem can be mitigated with an OPTION(RECOMPILE) query hint.    

 

 

    CREATE PROC find_customer

        @name nvarchar(128) = NULL,

        @id int = NULL

    AS

    BEGIN

        SELECT [id], [name], [address], [creation_date]

        FROM [customers]

        WHERE (@name IS NULL OR [name] = @name)

            AND (@id IS NULL OR [id] = @id)

        OPTION(RECOMPILE);

    END;

 

This will generate a new plan on each execution that is optimized for that execution's set of parameters. The code remains very simple. The additional per-execution compile cost can generally be tolerated if the procedure execution rate is low -- say, less than a dozen executions per second. If the proc execution rate is higher than that, you'll need to compare the compile cost to execution cost (you can use SET STATISTICS TIME ON) to see whether the small extra per-execution compile cost is offset by the more consistent performance that you should get with the hint. 
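A quick way to compare the compile cost against the execution cost for a given call (the parameter value below is a hypothetical sample):

```sql
-- SET STATISTICS TIME reports parse/compile time separately from
-- execution time in the Messages output, which makes it easy to see
-- how much OPTION(RECOMPILE) adds per execution.
SET STATISTICS TIME ON;

EXEC find_customer @name = N'Smith';

SET STATISTICS TIME OFF;
```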

 

I’ve seen the All-In-One search query cause perf problems in code written both inside and outside of Microsoft, and I’ve been burned by the problem in my own code.  If you find yourself needing this type of query, my recommendation is to first evaluate OPTION(RECOMPILE), assuming you're on SQL 2008 or later. On SQL 2005 and earlier, or if you're on a more recent version but can't tolerate the per-execution compile cost that the RECOMPILE hint introduces, the dynamic SQL approach would be my next choice in most cases.

 

 

 

UPDATE: Erland Sommarskog has a page on this very topic here: http://www.sommarskog.se/dyn-search-2005.html.  He calls it the problem of "dynamic search conditions" (a less awkward phrase than "All-In-One Search Query").  The page provides a very detailed discussion of the topic, and would be a great read if you really want to drill into the details. 

 

 

UPDATE #2: Another good read on a variation of this problem: http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/ (and, to a lesser degree, http://sqlinthewild.co.za/index.php/2009/09/15/multiple-execution-paths/), both from Gail Shaw.  Gail calls this class of query "Catch-All Queries". 

 

UPDATE #3: Another discussion of the optional parameter problem from Conor Cunningham: http://blogs.msdn.com/conor_cunningham_msft/archive/2010/04/22/conor-vs-optional-parameters.aspx


  • There is a downside to this approach. You now need to explicitly grant the "SELECT" permission to the user in the connection string. It would be better if we split each condition's logic into a separate UDF/SP to avoid the dynamic execution of the query, and at the same time avoid compromising on security.


  • @Gajhani -

    Actually, I think you're mistaken -- with the proc written as shown above, it is not necessary to grant select permission on the base tables.  Perhaps you overlooked this comment?

    >> "Note the EXECUTE AS OWNER used for this procedure, which may be appropriate to avoid the need to grant the end user SELECT permission on the base table(s)." <<

    Bart

  • Bart,

    Yes, I had overlooked that statement (EXECUTE AS OWNER); thanks for pointing it out. I did not know that there was a thing like this in SQL 2005. Thanks for the nice post, I did learn something new today from your blog! Please do also remember that there are still people like me who are still doing development in SQL 2000 :-). Thanks again!

  • Thank you for an excellent and well presented article that clearly explains some tricky concepts. Interestingly enough, I also adopt and have found the Dynamic SQL Where clause method to be the most effective approach for this scenario.

    Thanks again, I look forward to reading more.

  • @Gajhani -

    Yes, EXECUTE AS is new to SQL 2005.  You're not alone on SQL 2000 :) -- I should have mentioned in the article that this was only available on Yukon and later versions.  

    As an aside, Erland Sommarskog (a SQL MVP) also has a great article on dynamic SQL that discusses EXECUTE AS and some other approaches to mitigating the security issues that are typically associated with dynamic SQL.  It's a long read, but worth it IMO: http://www.sommarskog.se/dynamic_sql.html.  

    HTH,

    Bart

  • It is worth mentioning that this is a SQL Server-specific thing.  Other database systems store up to 16 different execution plans and use the best one based on the parameter values.  They do automatically what we have to do by hand, and most of us don't want to write out 16 IF-THEN statements.  

  • As you say, moving the queries out to child stored procs enables you to still get appropriate per-query plan allocation, and avoid dynamic SQL. But as the number of parameters increases, this becomes increasingly less viable.

    If you have to build a query dynamically, it's normally better to do it in the calling application and skip the stored proc altogether. Provided the calling app parameterises correctly (which ADO / ADO.Net commands will do), there's no perf. difference between that and the child-stored-proc approach.

  • I think the most obvious solution might just be to actually use a full-text search engine and let it do the work for the text pieces, and then use a stored procedure for min and max numericals and date-times. This will also allow you to default the parameters to the min and max of the data type respectively. Building the catalogue will require some additional effort as well, but it is well worth it.

  • This is a very good article and I'd like to learn more like this; it is very helpful for me, and the explanation is very clear.
