How to warm up the Analysis Services data cache using the Create Cache statement


Goal

This document describes how to build Create Cache commands.  Create Cache for Analysis Services (AS) was introduced in SP2 of SQL Server 2005. It can be used to make one or more queries run faster by populating the OLAP storage engine cache first.

 

Some customers have found that running certain queries benefits later queries.  For example, ascmd.exe could be run every hour to execute all the queries in a directory, keeping the cache warm for subsequent user queries.  Another approach that has been used is to create a cache query for each user query.  This is feasible when the MDX query is part of a report: one simply adds another query whose side effect is to populate the cache, thereby speeding up the query that follows.

 

The root of the problem is that during a query, the AS server performs only local optimizations.  Calculations, mixed granularities, and other more complex MDX statements can result in chatty communication between the FE (Formula Engine) and the SE (Storage Engine).  In AS2000 and earlier, this also involved a network round trip.

 

By issuing a Create Cache statement, we can populate the cache with one or more subcubes that cover the regions of cube space the query will actually need.  Each subcube query often takes roughly the same amount of time, so the overall effect can be dramatic.

 

With this methodology, the collection of MDX queries executes in two phases: first inside the storage engine, then inside the formula engine.  In addition to reducing overall time, this makes the effect of multi-user load testing easier to predict, because the first phase uses 100% of all CPUs and the second phase uses 100% of one CPU.

 

Summary Steps

This is an iterative process.  Here is how to identify scenarios where Create Cache is likely to help:

 

  1. Run Profiler.
  2. Run the query, looking for non-cache SE queries (Query Subcube events filtered by subevent = 2).
  3. Compare the total time of the query with the sum of the times of the non-cache SE queries.
  4. If:
    1. they are fairly close, and
    2. there are many non-cache SE queries with similar durations (not just one or two time-consuming ones),

then Create Cache might help.
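As a sketch of this heuristic, the following Python sums the non-cache SE query durations from a Profiler trace exported to CSV. The column names (EventClass, EventSubclass, Duration) are assumptions about the export format, and the 80% "pretty close" threshold is illustrative:

```python
import csv
import io

def summarize_trace(csv_text, total_query_ms):
    """Sum durations of non-cache SE queries (Query Subcube, subevent 2)
    from a Profiler trace exported to CSV, and apply the heuristic above.
    Column names are assumptions about the export format."""
    durations = [
        int(row["Duration"])
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["EventClass"] == "Query Subcube" and row["EventSubclass"] == "2"
    ]
    se_total = sum(durations)
    # Heuristic: SE time close to total query time, and several
    # similarly sized non-cache SE queries rather than one outlier.
    close = se_total >= 0.8 * total_query_ms
    many = len(durations) >= 3
    return {"se_total_ms": se_total, "count": len(durations),
            "might_help": close and many}
```

If `might_help` comes back true, the query is a candidate for the Create Cache treatment described below.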

 

There could be variations, but here is one set of steps that has been used successfully:

 

1. Extract all MDX queries as separate files.

2. Add a corresponding Create Cache statement for every MDX query file.

3. Run Clear Cache, Create Cache, then the user query.

4. Verify that the Create Cache is effective, and improve it if necessary.

5. Work on the next query.

6. When done with all queries, combine the Create Cache statements into one or more Create Cache queries.

7. Verify the combined Create Cache.

 


Detailed Description

Below are details about each of the steps.

 

1. Extract MDX queries as separate files

By placing the queries in separate MDX files, you can work on them one by one and verify that each query is handled correctly by Create Cache before moving on.

 

1. Start a trace.

2. Run the report.

3. Stop the trace.

4. Extract the queries.  One way is with SQL Server Profiler: File / Export / Extract SQL Server Analysis Services Events / Extract All Queries.  This creates a text file with each query on a separate line.  Queries that span multiple lines can be confusing, but it is easy to add an extra line break or otherwise edit the file.  Either copy each query to a separate file, or, for the query currently being worked on, comment out the other queries.
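The copy-to-separate-files step can be scripted. This is a minimal sketch that assumes the simple case of one query per line in the exported file; adjust it if queries span multiple lines:

```python
from pathlib import Path

def split_queries(export_file, out_dir):
    """Split a Profiler 'Extract All Queries' export (assumed one query
    per line) into numbered .mdx files; returns the number written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Skip blank lines; each remaining line is treated as one query.
    queries = [q.strip()
               for q in Path(export_file).read_text().splitlines()
               if q.strip()]
    for i, q in enumerate(queries, 1):
        (out / f"query{i:03d}.mdx").write_text(q + "\n")
    return len(queries)
```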

 

2. Add Create Cache for every MDX query

Below is an example of a Create Cache statement. 

 

create cache
for [MyCube]
as (
    { [USA].[Oregon], [USA].[Colorado], [USA].[Florida], [USA].[Washington] }
    * { [Measures].[mybasemeasure] }
    * { [2006].children, parallelperiod( [Time].[Year], 1, [2006].[Q1].[Jan] ), YTD( parallelperiod( [Time].[Year], 1, [2006].[Q1].[Jan] ) ) }
    * { [Products].[Shoes].children }
)

 

As the example shows, the cache definition is basically a crossjoin of the members specified in the query, one set per dimension.  Note that set expressions are allowed.

 

2.a. Add Specified Members

First, add all members specified in the query.  NOTE: Create Cache relies on static analysis rather than dynamic; MDX whose members are only determined at execution time will not benefit from this approach.

 

 

2.b. Add Calculated Members and Definitions

During execution, the Formula Engine can issue a subcube query which includes calculated members. 

 

For example, if the cube has a calculated member:

            [MyDim].[Calc123] as [MyDim].[A] + [MyDim].[B]

one should include the following members in Create Cache:

             { [MyDim].[A], [MyDim].[B] }
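For this calculated member, a corresponding Create Cache statement might look like the following sketch (the cube name and measure are assumed, reusing names from the earlier example):

```mdx
create cache
for [MyCube]
as (
    // inputs of the calculated member [MyDim].[Calc123]
    { [MyDim].[A], [MyDim].[B] }
    * { [Measures].[mybasemeasure] }
)
```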

 

2.c. Account for Custom Rollups

During execution, there may be custom rollups that affect the calculation.  For example, a custom rollup may involve QTD() or YTD().  This can be hard to detect, since the actual calculations are stored in the relational database.  Custom rollups can be viewed from BI Development Studio.  The dimension must be processed (otherwise the calculations still exist only in the relational database).

 

1. Go to Solution Explorer, double-click on the dimension you want to examine.

2. Click on the Browser tab.

3. Find the icon for Member Properties, click on it.  Select "Custom Rollup", "Custom Rollup Properties".

4. It will display the Custom Rollup (formula) and other properties (such as solve order) for each member.

 

For example, if calculations include YTD or PeriodsToDate, the member list should account for that.  So, if the query includes:

 

            [Date].[2006].[June],

            ParallelPeriod( [Date].[Year], 1, [Date].[2006].[June] ),

            YTD( [Date].[2006].[June] )

 

the only calculations producing output are [Date].[2005].[June] and [Date].[2006].[Jan]..[Date].[2006].[June].  However, because of indirect relationships, you should also add the member set:

 

            YTD( ParallelPeriod( [Date].[Year], 1, [Date].[2006].[June] ) )
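Putting the pieces together, the time-related portion of the Create Cache set for this example might look like the following sketch (hierarchy and member names as in the example above):

```mdx
{ [Date].[2006].[June],
  ParallelPeriod( [Date].[Year], 1, [Date].[2006].[June] ),
  YTD( [Date].[2006].[June] ),
  YTD( ParallelPeriod( [Date].[Year], 1, [Date].[2006].[June] ) ) }
```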

 

3. Run Clear Cache, Create Cache, User Query

Now it is time to run the Clear Cache statement and examine its effect.  Note that the CubeID can be specified, or omitted to clear the cache entries for the entire database.

 

<!-- Can be executed through an ADOMD.NET ExecuteNonQuery call, passed to ADODB as CommandText, or run as an XMLA query in SQL Server Management Studio -->

 

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ClearCache>
    <Object>
      <DatabaseID>FoodCmp</DatabaseID>
      <CubeID>SalesCube</CubeID>
    </Object>
  </ClearCache>
</Batch>

 

3.a. Connection string parameters

We recommend adding the following parameters to the connection string.

 

Provider=msolap.3;Datasource=MyServer;Initial Catalog=MyDatabase;Timeout=300;Disable Prefetch Facts=true;Cache Ratio=1

 

Provider:           The name of the OLE DB provider, msolap.

Datasource:      The name of the server.  May be a named instance, such as MyServer\MyInstance.

Initial Catalog:   The name of the database to use.

Timeout:           Optional number of seconds for the command timeout.  This helps avoid very long runs until the queries are optimized to run faster.

Disable Prefetch Facts: Optional new parameter that disables the Formula Engine heuristic of querying for possibly more data than is requested.

Cache Ratio:     Optional parameter that controls the conversion from the specified set of members to the subcube query sent to the Storage Engine (SE).  Use 1 to send only the specified list.
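A small helper can assemble this connection string from its parts; this is a sketch, with parameter names taken from the article and defaults matching the example string above:

```python
def build_connection_string(server, database, timeout=300,
                            disable_prefetch=True, cache_ratio=1):
    """Assemble the recommended connection string.  msolap.3 is the
    SQL Server 2005 OLE DB provider named in the example above."""
    parts = {
        "Provider": "msolap.3",
        "Datasource": server,
        "Initial Catalog": database,
        "Timeout": timeout,
        "Disable Prefetch Facts": str(disable_prefetch).lower(),
        "Cache Ratio": cache_ratio,
    }
    # OLE DB connection strings are semicolon-separated key=value pairs.
    return ";".join(f"{k}={v}" for k, v in parts.items())
```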

 

3.b. Ways to Execute Queries

Because the connection string needs to be modified, SQL Server Management Studio cannot be used.  There are several alternatives:

 

1. One could write a small program in C#.

2. ascmd.exe can execute queries and report execution times.  The ascmd utility is part of the Yukon Samples kit, located here:

http://www.microsoft.com/downloads/details.aspx?FamilyID=e719ecf7-9f46-4312-af89-6ad8702e4e6e&DisplayLang=en

Then look in the C:\Program Files\Microsoft SQL Server\90\Samples\Analysis Services\Administrator\ascmd folder.

3. Use the MDX Sample Application from AS2000.

 

4. Verify Create Cache is Effective

The goal of using Create Cache is to isolate the storage engine and formula engine work, so the user query does not execute subcube commands.  The SE is multi-threaded while the FE is single-threaded, so one can use Task Manager to verify whether the SE is running.

 

During a good run, initially all the CPUs are at 100% since the SE is busy with the Create Cache command.  Then, only one CPU reaches 100% because the FE is busy with the formulas and preparing the result set.

 

Task Manager is the general way to verify, but to be sure, create a trace and look for the Query Subcube event with subevent 2, Non-cache data.  The cached subcube queries are fast and generally not an issue for AS.  (In fact, this shows how effective it is to create a cache first.)

 

4.a. Improve, if necessary

If the user query still produces CPU spikes in Task Manager (caused by subcube queries), one needs to determine which members from the main query are missing from the Create Cache statement.

 

It can be helpful to look at the trace information for Query Subcube Verbose.  Comparing the events for the Create Cache with those for the user query may reveal differences, which can help identify which members should be added to the Create Cache query.

 

Another approach is to simplify the query until the problem no longer occurs.  A binary search can help find the critical part of the user query that causes the extra subcubes.

 

4.b. Eliminate All Subcubes

This general technique works well when all subcubes can be eliminated.  However, it might not be possible to know the set of members in advance (static analysis), for instance with complex calculations; in that case the Create Cache technique might not be as beneficial.  Adding members that are not used can be cheap in some cases, for example adding one more product to an already large specified set.  Or it might be expensive, for example including another large set of data that would otherwise allow entire partitions to be skipped.  One needs to experiment and measure the results; because it will sometimes be a win and sometimes a loss, we recommend testing with several user parameter values.

 

5. Work on Next Query

 

6. After working with all the main queries, combine the Create Cache statements into one or more Create Cache statements

The best number of Create Cache queries depends on their structure (which dimension members involved, which measure group) and member sets. 

 

1. Different measure groups should be kept separate unless the dimension members are consistent.  In that case it is just a matter of convenience whether to combine, since under the covers the measure groups are physically separate.

2. If one query is covered or almost covered by another one, they should be combined.

3. If queries are disjoint or mostly disjoint, keep separate. 

 

We have not experimented extensively, but here is one suggestion.  Execute both scenarios, separate and combined.  Measure the execution time and the size of the resulting data cache (from the perfmon counter).  Use a single combined Create Cache if the combination reduces the time to 67% of the separate commands and consumes less than 3 times as much space.
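The rule of thumb above can be written down directly; this is a sketch of the decision, not a definitive threshold:

```python
def prefer_combined(time_combined_ms, time_separate_ms,
                    size_combined, size_separate):
    """Apply the suggested rule: prefer a single combined Create Cache
    when it runs in at most 67% of the separate commands' time and
    uses less than 3x their data cache space (from perfmon)."""
    return (time_combined_ms <= 0.67 * time_separate_ms
            and size_combined < 3 * size_separate)
```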

 


7. Verify the combined Create Cache

Execute all queries, in this pattern:

 

1. Clear Cache

2. Create Cache (possibly many queries)

3. User queries

 

Verify the set of queries in the same way as each single query.

 

 

Contributors:

Eric Jacobsen

Cristian Petculescu
