"Should I use Variables, Parameters, or Literals in my code?" We've all seen the pros and cons for each of them and had to choose the option with the least negative impact. What if I told you there was a fourth option?For years, I wished SQL Server would support constants, a value that you could declare at the database level, that would allow the best query plan and still be simple to change. Sadly, I can't say that they're being introduced (not that I'm aware of). But with a little work, we can duplicate their functionality. For lack of a better term, I'll be referring to it as a Pseudo-Constant.
Using AdventureWorks, consider a scenario where the Shipping department need a special report for all "Overnight Shipping" orders. We'd typically choose one of three solutions for for implementing this query:
While a constant would rarely replace the need for parameters, it's worth mentioning their behavior for the sake of comparison. Through a feature called, "parameter sniffing", parameters give the query optimizer the best value for your first compiled execution but can often cause issues with subsequent calls that don't represent data of a similar cardinality. Among other places, parameter sniffing is discussed here and here.
Like a real constant, the pseudo-constant doesn't replace the need for parameters or variables. Instead, it's something that you might consider when you want the performance of a literal without the overhead maintenance of updating hundreds of procedures when a change is required. Implementation:
Pseudo-Constants are not variables or parameters. Instead, they're simply views with one row, and enough columns to support your constants. With these simple rules, the SQL Engine completely ignores the value of the view but still builds an execution plan based on its value. The execution plan doesn't even show a join to the view!
To show its true power, we'll have to add some data to AdventureWorks database. I'm using the AdventureWorks2012 download from CodePlex and updating the SalesOrderHeader table to include a wider variety of data.
Data and Index Changes
[Sales].[SalesOrderHeader] currenly uses two of the five ship methods, which doesn't give us enough uniqueness of data. We're also missing an index on ShipMethodID so we'll make a few adjustments:
Our next step is to generate our Pseudo-Constants. As described above, we will create a table for each constant we require in our code. We'll also create a schema to keep things organized. This is not required but I find it improves readability. The data we're creating reflects the data in [Purchasing].[ShipMethod]
Testing Our Pseudo-Constants
Now that our view is in place, we can compare the overhead of a pseudo-constant with that of a variable and a literal. We'll start by turning on STATISTICS IO and then running the most selective query (OVERNIGHT J-FAST) against each implementation.
There are two interesting things we see from STATISTICS IO output. 1. While the use of a standard JOIN or a variable show reads from the ShipMethod table, there are no observable reads from using the view. 2. Compared to literal and Pseudo-Constant, the logical reads against SalesOrderHeader are significantly higher with a standard JOIN or a variable.
We can see here, that using a variable results in a clustered index scan. Since the value of @ShipMethodID is not known at compile time, the SQL engine has estimated that 104488.3 rows (one third of the total 31465 rows) will be returned. There are a lot of articles that describe this behavior including a great one by Kalen Delaney so I'm not going to cover it again here. I'll only mention that this is expected behavior when using variables in a query.
More interestingly, using Pseudo-Constants gives us the exact same execution plan as using a literal. The estimated and actual number of rows are a match because we know the value of ShipMethodID at compile-time.
Looking at the Index Seek operation, we even see that SQL Server is using the literal value 4 to determine the execution plan.
But Why Bother? Why not use Literals?
So why not just use literal values? On the surface, it seems like Pseudo-Constants are a lot of work with no additional benefit, but we get some terrific advantages in medium-to-large projects:
1. Easier Dependency Tracking. When using variables, constants, or even JOINing to tables, we're never sure which lookup values are actually used. The removal of a row from the ShippingMethod table won't cause any compile errors in our code and will only present a problem when a query is actually executed. We may or may not even notice the problem. Using Pseudo-Constants gives us immediate visibility into dependencies:
2. Easier Code Refactoring.
When using a literal, updating an ID becomes a painful search-and-replace project. Changing every stored procedure that used "ShipMethodID = 4" to a new value like, "ShipMethodID = 6" requires a manual review of the code to look for these hard-coded values. In a large enough project, you'll never be sure you got them.If you're using Pseudo-Constants, it's a simple process of querying the dependencies (as outlined aboce) or altering the underlying view to provide a new literal value.
3. The right data type every time.
From my own experience of seeing unexpected implicit data conversions, I know our literals and variables can sometimes use the wrong data type. This results in undetected performance hits when the SQL engine has to convert every row in the table to match the data type of your parameter. The Pseudo-Constant ensures you're only setting the data type in one place.
4. Instant Recompiles for data changes.
If the value of "OVERNIGHT J-FAST" changes, you're required to ALTER the view. This will result in all dependent objects to obtain a new plan for their next execution. While it's rarely valuable in this situation, it's extremely beneficial if you use Pseudo-Constants for date ranges.
1. Working in date ranges
Another exciting use of Pseudo-Constants is when working with date ranges. Pseudo-Constants can be used to supplement variables in returning a more selective dataset. I'll expand on this in a future post but imagine a query where the Psedo-Constant narrowed down your potential rows by 90% while the variable was used to get the exact range. This is far beneficial than a plan that returns all of your rows to be filtered at runtime.
Disclaimer and a heartfelt thanks: Whenever possible, I will endeavor to give credit to people that provided a solution, idea, or inspiration for my own blog posts. These posts would not have possible without all the amazing people in the SQL Server community sharing their own solutions. While I'm not aware of anybody using this solution, I don't claim to be the first person to have thought of it. If there are any other sources for anything I cover, please contact me so that I may give proper attribution.