If you’ve been following my blog, you may remember an article I did extolling the virtues of indexed views - http://blogs.msdn.com/microsoftbob/archive/2009/05/13/sql-data-warehousing-tip-2-indexed-views.aspx. I stand behind that and have found this feature extremely useful. In fact in my last post the 11 seconds it was taking to perform a few thousand transactions against a 10-million row history table with 3 million transaction went down to just 2 seconds using an indexed view.
However, before you go out and make every view into an indexed view, it is important to understand what an indexed view is and what are the consequences. I was reminded of this very sharply when I did the following “routine” update command on my table containing all of the trading symbols used for my simulation processing:
/*------------------------update dbo.EquitySymbol set Shortable = 1 from Load.ShortableList lwhere l.Symbol = dbo.EquitySymbol.TradingSymbol------------------------*/
(6334 row(s) affected)
Now, normally this had been taking less than 2 seconds. But, this was before I added the indexed view that happened to include the “Shortable” column. So, how long did it take with the indexed view in place?
Almost 19 minutes? Yes, 18 minutes and 51 seconds? That is actually about twice as long than it takes to create the indexed view along with all of it’s alternate indexes! Pardon the drama, but my point is that Indexed views are dramatic by their nature. They are dramatic in that on one hand you can analyze millions of rows and help enabling performing something as complex as generating 7 years of trading simulations consisting of over 3 million orders for 10 million rows in a space of a couple of hours, while on the other hand causing a small update to take over a thousand times longer than usual…
So, how do we explain this? First, we need to understand what happens when you create an indexed view. The first step is to enable schemabinding on the view. This ensures that SQL can tag a dependency between the table and the view so it can prevent and detect changes that might cause the view to stop functioning, that isn’t where the performance hit comes, (see http://msdn.microsoft.com/en-us/library/ms187956.aspx for an explanation of the process for creating an indexed view including the schemabinding step). The key threshold happens (pun intended) when you create the primary key for the view which is the unique clustered index for the view and required before you can add secondary indexes to the view. What that does it “materializes” the view such that it is no longer just a query into your tables, but it becomes a table itself with real data. And of course, that data must be maintained to match the underlying tables.
So, what was the situation in my case? Well, the “shortable” flag was part of the indexed view that joined the equity symbol table (which has about 9,000 rows) to the Equity History table, which happens to have about 10 million rows. My “little” update of the 6,000 or so Equity Symbol rows cascaded into about 7 million updates necessary to maintain all of the rows in the indexed view. Here you can see the major time-consuming aspects of the query plan with the indexed view in place:
Notice the large time for the sorts, these were apparently needed to prepare for secondary index updates that included the shortable column as an included column. And notice that the update time for the clustered index for the table actually containing the column ([EquitySymbol].[PK_EquitySymbol]) is estimated to take virtually 0 percent of the total query time.
So, what does this mean – avoid indexed views? Not at all! It just means as the title suggest: use caution. So what could I have done differently to avoid this issue. To start with, my update command is flawed. I told SQL to update every row and didn’t even look to see if the rows needed to be updated. SQL Server will happily do what you tell it to, it doesn’t have a built-in mechanism that says - “Hey, you don’t need to update that column, it already contains the value you specified”. It doesn’t do that for a very good reason. For example, what if I had a trigger on a table that checked to see if a particular column was updated. Since SQL trigger processing with the “IF UPDATE(column)” doesn’t check to see if the value is actually changed, just whether it is updated, it is easy for me to force a trigger to execute just by updating the column to its current value. So, what I should have done was:
update dbo.EquitySymbol set Shortable = 1 from Load.ShortableList l where l.Symbol = dbo.EquitySymbol.TradingSymbol AND dbo.EquitySymbol.Shortable <> 1
Since there were only a handful of rows that had changed since the last time this update command was run, doing it this way, should have taken only a few seconds, albeit it still would have required a couple thousand updates to cascade the updates to the related rows the view. Another option is that if I really did need to update a few thousand rows, would have been to simply drop the indexed view, perform the update, and recreate the indexed again. There may be some more exotic approaches such as disabling the indexes on the indexed view, but effectively the approach is the same.
So, here are a few suggestions when dealing with indexed views:
So, the bottom line is that like just about everything else in SQL Server 2008, indexed views have their place, but use them wisely…