Windows Azure provides several options for storing, processing and retrieving data in structured, semi-structured and unstructured layouts. In this series, I'll cover ways to implement each of these options.
In this entry, I'll cover the SQL Server Fast Track Data Warehouse and show an example of its implementation. This is a Relational Database Management System (RDBMS), which provides full ACID (http://en.wikipedia.org/wiki/ACID) compliance for when you need the highest level of transaction consistency.
Within the RDBMS family, there are various implementation options that allow the system to perform at its peak. RDBMS servers excel at On-Line Transaction Processing (OLTP) (http://en.wikipedia.org/wiki/OLTP) workloads using a highly normalized data structure. They are also often used in On-Line Analytical Processing (OLAP, more commonly called Business Intelligence) (http://en.wikipedia.org/wiki/OLAP) applications, where the data structure is more de-normalized, with more repeated data and more aggregations, and where performance is tuned for read-heavy queries rather than constant updates.
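To make the normalized-versus-denormalized distinction concrete, here's a minimal sketch in Python (the table names, keys and values are hypothetical, chosen only for illustration). The OLTP-style shape stores each fact once and joins at query time; the OLAP-style shape repeats the customer name and precomputes the aggregate so analytical reads avoid the join.

```python
# Normalized (OLTP-style): facts reference a customer table by key;
# nothing is repeated, and aggregates are computed at query time.
customers = {1: "Contoso", 2: "Fabrikam"}
orders = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 250.0},
    {"customer_id": 2, "amount": 75.0},
]

def total_for(name):
    """Join customers to orders and aggregate at query time."""
    ids = [cid for cid, cname in customers.items() if cname == name]
    return sum(o["amount"] for o in orders if o["customer_id"] in ids)

# Denormalized (OLAP-style): the customer name is repeated on each row
# and the total is precomputed, so a query is just a lookup.
sales_by_customer = [
    {"customer": "Contoso", "total_amount": 350.0},
    {"customer": "Fabrikam", "total_amount": 75.0},
]

# Both shapes answer the same question; the denormalized one just reads it.
assert total_for("Contoso") == sales_by_customer[0]["total_amount"]
```

The trade-off is the usual one: the normalized shape is cheaper to update consistently, while the denormalized shape is cheaper to query, which is why warehouse workloads lean toward the latter.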
OLAP implementations often further segment by the type of audience they serve. At the highest aggregation level sits the "Enterprise Data Warehouse," which usually contains all (or nearly all) of the data within an organization, but at a fairly coarse grain of aggregation. For instance, it might contain all sales from all regions for all time, but in a yearly or quarterly breakdown. The next level of granularity is the Data Warehouse. In this implementation, there is often more detail within the aggregates, and the Data Warehouse most often serves a more specific audience. Note that in some companies, the terms "Data Warehouse" and "Enterprise Data Warehouse" are used interchangeably.
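The difference in grain can be sketched as a simple rollup (the dates and amounts below are hypothetical): a departmental Data Warehouse might keep daily detail, while the Enterprise Data Warehouse stores the same facts summed up to quarters.

```python
from collections import defaultdict

# Daily sales detail, the grain a departmental Data Warehouse might keep.
daily_sales = [
    ("2013-01-15", 100.0),
    ("2013-02-03", 200.0),
    ("2013-04-10", 50.0),
    ("2013-11-30", 300.0),
]

def rollup_quarterly(rows):
    """Aggregate daily rows up to the coarser quarterly grain an EDW might store."""
    totals = defaultdict(float)
    for date, amount in rows:
        year, month = int(date[:4]), int(date[5:7])
        quarter = (month - 1) // 3 + 1
        totals[(year, quarter)] += amount
    return dict(totals)

quarterly = rollup_quarterly(daily_sales)
# 2013 Q1 combines the January and February rows into one coarse-grained fact.
assert quarterly[(2013, 1)] == 300.0
```

Once the rollup is stored, the daily detail can be dropped from (or never loaded into) the EDW, which is what "fairly coarse grain" means in practice.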
Microsoft SQL Server contains engines, components and features to be used as either an OLTP or an OLAP system (or both, which is not advisable), but there are settings, configurations and hardware choices that lend themselves best to one mode or the other. In the past, Microsoft has teamed up with hardware vendors to create a Data Warehouse Appliance, a blend of tuned hardware and specific settings in SQL Server to run a Data Warehouse workload at high performance. You can now download a whitepaper and instructions to create your own high-performance Data Warehouse system, called a "Fast Track Data Warehouse" or FTDW.
Microsoft has now introduced an image into the Windows Azure Virtual Machine Gallery that is pre-optimized for an FTDW workload. In three steps and in less than 20 minutes, you can have a system licensed by the hour, for development and testing or for full production.
To begin, you'll need either a production account, a free account, or your MSDN benefits. See the http://windowsazure.com site for more information on using one or all of these types of accounts. From there, the process has three steps:
To start, I've logged in to my production account on the Windows Azure Portal site.
From there I select the large "plus-sign" icon at the bottom left of the page, then "New Virtual Machine", then "From Gallery". I select "SQL Server 2012 SP1 for data warehousing on WS 2012 R2" from the menu that appears.
Clicking the arrow at the bottom of that panel allows me to assign a name, a size, and an administrative account and password to the image. Microsoft recommends an "A6" VM size for production workloads, but I'll select "Small" here for my test system. It's important to note the size you choose: a chart located here (http://i.msdn.microsoft.com/dynimg/IC686301.gif) shows the drives you'll need to attach in the next few steps.
Clicking the arrow brings me to the screen where I can select the Cloud Service, Storage Account, and Availability Set for the deployment.
The next panel lets me set the RDP port redirect so that I can log into the system, and also the port for PowerShell for remoting. I take the defaults.
Now the system starts the build process for the Virtual Machine.
In just a few minutes, the system is provisioned and running - but not complete. I don't want to log in yet, because logging in kicks off the script that builds out the system, and the data disk needs to be attached first.
At the bottom of the screen I click the "Attach" button.
I select the name of the VM I created, make sure the storage account is in the same region, set the size to 1023 (per the chart above) and turn off the host caching.
The system creates the drive in Windows Azure Blob Storage and attaches it to my Virtual Machine.
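If you script this step instead of using the portal, it's worth guarding the two settings that matter. Here's a minimal sketch - the `validate_data_disk` helper is hypothetical, not part of any Azure SDK - that checks the walkthrough's choices: a data disk size of at most 1023 GB (the maximum for a single Azure data disk) and host caching disabled.

```python
def validate_data_disk(size_gb, host_caching):
    """Sanity-check data disk settings against the choices in this walkthrough.

    Returns True if the settings match; raises ValueError otherwise.
    """
    if not (1 <= size_gb <= 1023):
        raise ValueError("Azure data disks must be between 1 and 1023 GB")
    if host_caching != "None":
        raise ValueError("FTDW data disks should have host caching turned off")
    return True

# The settings used above: 1023 GB, host caching off.
assert validate_data_disk(1023, "None")
```

A check like this is cheap insurance when you're attaching several disks per the size chart, since a silently cached data disk can skew warehouse scan performance tests.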
From there, I log into the machine using RDP (clicking "Connect" on the VM in portal makes it automatic) and navigate to the logs location shown below:
The first log shows a summary of the process and the success of the build. Drilling into the subdirectory there shows more detail.
And that's it. In less than 20 minutes I have a full data warehouse system ready for use.
Fast-Track Data Warehouse reference document: http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/reference-architecture.aspx
FTDW document for Windows Azure: http://msdn.microsoft.com/en-us/library/dn387396.aspx
I read the FTDW document for Windows Azure, and didn't see one question answered. A big part of fast track implementations has been very specific I/O patterns for large table scans. Is the I/O implementation for this particular VM guaranteeing more spindles or otherwise superior I/O subsystems compared to other VMs in Azure?
Richard - the layout is optimized for the deployment, but the storage is the same. Improvements are made constantly to the physical and logical layers of storage for performance, but you'll want to test for your specific workload.
Great overview Buck! In Fast Track implementations I've seen, there is one virtual disk and mount point per RAID group and files and file groups are spread across them. What is the recommendation for allocating files and file groups in this Azure Fast Track implementation?
Chris - check with this team for details: mailto:firstname.lastname@example.org?subject=Fast Track Data Warehouse Reference Guide for SQL Server 2012
Before retiring from Microsoft, the inestimable Lubor Kollar presented DW Best Practices to members of SQL 2014 TAP (Technology Adoption Program). He stated that he & the team were leveraging Storage Spaces. The program group is advising us at MSIT to do so as well for our BI implementations which require throughput. I'm currently working on guidance which I'm likely to share publicly. Stay tuned (but don't hold your breath--it'll be a bit).