Why did we only use the MOLAP storage mode in Project REAL? There is also ROLAP and HOLAP storage, but they weren’t used at all. Also, why doesn’t Project REAL illustrate pro-active caching?
The Analysis Services 2005 Performance Guide contains a more complete discussion of the storage modes in Analysis Services in Appendix B. Because the question comes up from time to time regarding Project REAL, this write-up gives a summary.
Every partition in AS has an associated storage mode. MOLAP partitions store aggregations and a copy of the source data (fact and dimension data) in a multidimensional structure on the Analysis server. This is the optimal storage mode for several reasons:
HOLAP partitions store aggregations in a multidimensional structure on the Analysis server, but do not copy fact data from the original relational database. As a result, whenever Analysis Services needs to resolve a query against fact data stored in a HOLAP partition, Analysis Services must query the relational database rather than using a multidimensional structure stored on the Analysis server. Some administrators choose HOLAP because HOLAP appears to require less total storage space while yielding high performance queries. However, the disadvantages almost always outweigh the advantages:
ROLAP partitions store aggregations in the same relational database that stores the fact data. ROLAP partitions take longer to process, and because all queries must be redirected to the relational database, ROLAP partitions are slower to query against as well.
Given all this, it sounds like only MOLAP should ever be used! For most implementations that is the case. The exception is when near-real-time access is needed to data that changes throughout the day. Even there, pro-active caching is the first technique of choice. However, if even pro-active caching does not allow frequent enough updates, then the use of one ROLAP partition in the measure group is a way to handle the incoming data.
The recommended way to handle near-real-time scenarios is to take advantage of the fact that the storage mode is set on a per-partition basis. A vast majority of data in an OLAP cube does not change throughout the day. That data should be stored in MOLAP partitions. One partition in the measure group is designated to receive updates as they come in. That way, only that partition needs to be updated. This partition can be updated using proactive caching or it can be defined as a ROLAP partition with zero aggregations. In the latter case, all queries to this partition will be sent to the relational source, and current data will always be as current as that table.
For more details about near-real-time data access, see the section “Near Real-Time Data Refreshes” in the Analysis Services 2005 Performance Guide.
An example of how such a partitioning scheme would be updated might look like this:
Returning to the original question: Why did we only use the MOLAP storage mode in Project REAL? The simple answer is that MOLAP storage performs the best. The slightly extended answer is that Project REAL is based on the scenario in the Barnes & Noble data warehouse. The business model centered on daily updates, not near-real-time access. Therefore, there was no need for proactive caching or ROLAP partitions.
- Len Wyatt