Sign In
Scalability Notes
[Read -> Think -> Write]
Translate This Page
Translate this page
Powered by
Microsoft® Translator
Options
Blog Home
Email Blog Author
Share this
RSS for posts
RSS for comments
Search
Advanced search options...
Search In:
Everything
Blogs
Forums
People
Groups
Places
Pages
Date range:
All Time
Last Year
Last 6 Months
Last 3 Months
Last Month
Last Week
Last Two Days
Tags
database
distributed system
engineering
hpc
network
parallel
scalability
search
Archive
Archives
December 2010
(1)
September 2010
(1)
August 2010
(1)
April 2010
(1)
February 2010
(2)
January 2010
(4)
December 2009
(1)
November 2009
(1)
October 2009
(1)
September 2009
(1)
August 2009
(4)
June 2009
(2)
May 2009
(1)
April 2009
(1)
March 2009
(2)
February 2009
(4)
January 2009
(1)
Database Technologies for Decision Support System
MSDN Blogs
>
Scalability Notes
>
Database Technologies for Decision Support System
Database Technologies for Decision Support System
changl
5 Feb 2010 1:58 AM
Comments
0
Database Technologies for Decision Support System
Database technologies can be applied into two types of scenarios:
-
Transaction Processing
(
OLTP
)
-
Analytic Processing
, using
statistical
method(
OLAP
) or
machine/computational
learning
method(
Data Mining
)
OLTP, which is based on E.F. Codd's relation model, is the traditional (maybe most popular) application type of DBMS and most people are very familiar with it. This post tries to summarize related technologies in analytic processing, which is widely adopted in decision support systems.
Part I - What Data to Analyze
?
In
Decision Support
(or
Business Intelligence
) system, data to be analyzed usually comes from operational system, i.e., OLTP relational database. These relational databases are often located at different departments/sites, may be using different DBMS vendor's products, using different data schema and merely contain data within a relatively short time span.
To make a good business decisions, it's strongly desired to hold historical data, view them in a uniformed way and not bothering the daily operational environment. Thus comes out the
Data Warehouse
, which is a
repository
of an organization's electronically stored data and is designed to facilitate reporting and analysis.
Operational data (in OLTP system) is
extracted
,
transformed
(also cleaned) and
loaded
(by
ETL
subsystem)
into the data warehouse for further analyzing. OLAP and DM systems read these data, analyze them, produce useful reports and present them to end(business) users. (See diagram below)
Decision Support System Life Cycle, based on
MS SQL Tech Doc
Part II - How to Analyze?
There are two ways to analyze data in data warehouse:
1. Data Mining
Data Mining
is
the extraction of hidden predictive information from large databases. Someone also defines
it
as knowledge discovery process(using machine learning algorithms) in database.
Technical highlights:
- Typical Data Mining enabled information systems process data in terms of
Record
(or
Case
).
- Such system also provides some
L
anguage Extension
to facilitate composing data mining related queries. (for example, MS SQL Server provides
DMX - Data Mining Extensions
)
The challenging part of data mining is various mining algorithms. Here is a list of
data mining algorithms available in MS SQL Server Analysis Service
.
See -
A Basic Data Mining Tutorial using MS SQL Server
2. OnLine Analytical Processing(OLAP)
OLAP
is the processing of large scale multidimensional data using statistical based methods. A typical OLAP system provides:
-
Multidimensional Model
- Analytical Query Language (for example, MS SQL Server provides
MDX - Multidimensional Expression
)
- Analyzing Server(or Engine) that executes analytical queries
See -
A Basic Tutorial for OLAP in MS SQL Server
2.1 Multidimensional Model
Multidimensional model view data as as cubes that generalize spreadsheets to any number of dimensions. It categorizes data either as numerical values(a.k.a.
measures
) associated with some
facts
or textual values(a.k.a.
dimensions
) that characterize the
facts
.
Facts
represent the subject - the interesting event in a enterprise that need to be analyzed.
Dimensions
represent context information for facts, perspectives to view facts.
Measures
represent those numeric properties of facts that decision makers want to analyze.
For example, in a shoe shop, shoe purchasing events are the facts, the selling price is a measure attribute and the color, the size, the manufacture and the brand are all dimension attributes.
More tutorial explanation of this model can be found in [7][8].
2.2 OLAP Server Architecture
There are three ways to implement multidimensional model:
-
ROLAP
(Relational OLAP)
Fact data is stored in relation model based storage system and some special induces technologies may be adopted. In this architecture, measures are derived from the records in the fact table and dimensions are derived from the dimension tables.
from
star schema example
This architecture can be further divided into two types:
Star Schema
Snowflake Schema
from
snowflake schema example
Both of them contains fact table and dimension table, but in star schema, there is only one table for one dimension, while in snowflake schema, there are usually multiple tables for one particular dimension.
-
MOLAP
(Multidimensional OLAP)
Fact data is stored in an optimized multi-dimensional array storage, i.e., the server supports the multidimensional model directly.
It's usually regarded to be faster but less scalable than ROLAP.
-
HOLAP
(Hybrid OLAP)
It is a combination of ROLAP and MOLAP. HOLAP allows storing part of the data in a MOLAP store and another part of the data in a ROLAP store.
Part III - Other technologies
Other related technologies include data visualization, metadata management, analytical query parallelizting etc.
Analytical database technology is very promising and also complicated, more survey paper can be found at[5][7][8].
[
Reference
]
General
[01]
Course on DM & OLAP
[02]
OLAP & Data Mining Links
[03]
OLAP & Data Warehouse Bibliography
[04]
A Brief Tutorial on Data Mining and OLAP
[05]
An Overview of Data Warehousing and OLAP technology
[06]
Providing OLAP to User-Analysts: An IT Mandate
[07]
An Overview of Data Warehouse, OLAP and Data Mining Technology
[08]
Multidimensional Database Technology
Data Warehousing
[21]
Data Warehouse Architecture
[22]
Oracle Data Warehousing Guide
Data Mining
[31]
Data Mining FAQ
[32]
Data Mining Introduction
[33]
Data Mining Tech Summary
[34]
Oracle Data Mining Concept
[35]
Microsoft SQL Server Analysis Service - Data Mining
OLAP
[41]
OLAP Introduction
[42]
OLAP Overview
[43]
OLAP Council
[44]
OLAP Wiki
[45]
Oracle OLAP User Guide
[46]
Microsoft SQL Server Analysis Service - OLAP
0 Comments
database
,
parallel
Blog - Comment List MSDN TechNet
Comments
Loading...
Leave a Comment
Name
Comment
Please add 2 and 4 and type the answer here:
Post