Welcome to MSDN Blogs Sign in | Join | Help

News

  • Visit the SQL Server Best Practices site (http://sqlcat.com) for real-world guidelines, expert tips, and rock-solid guidance to take your SQL Server implementation to the next level.
PowerPivot Jump Start!

As you may already know, the new name for Project Gemini is PowerPivot and the official names are PowerPivot for Excel and PowerPivot for SharePoint.  For information and to sign up for notification on the release dates, check out the official PowerPivot web site at http://www.powerpivot.com/

Shortly after the SharePoint conference, our Analysis Services teammates independently created their own very interesting blogs:

  • Rob Collie’s PowerPivotPro.com: Rob is one of the founding engineers behind PowerPivot and has a great theme around football statistics to demo and explain PowerPivot.
  • Dave Wickert’s PowerPivotGeek.com: If you’ve been working with Analysis Services for any length of time, you should already be very familiar with Dave.  He puts his knowledge, humor, and personality into this blog on PowerPivot.

In addition, some solid new content / aggregator blogs include:

  • Vidas Matelis’s PowerPivot-Info.com: Vidas has recreated his SSAS-Info.com resource hub to be dedicated to PowerPivot.
  • As well, there is PowerPivotTwins.com, the name which is a moniker Dave Wickert and I had created at SQLPASS (yes, it is a dork-y homage to the Wonder Twins!).

Enjoy!

 

Are you going to SQL PASS Nov. 3rd- Nov. 5th?

Are you going to SQLPASS in Seattle on Nov 3rd-Nov 5th?   If you are going or even thinking about it, then read about how you can spend some quality time with the SQLCAT team.   We will be there with our famous ugly lime green shirts, so you won’t miss us.   The new exciting addition the SQLCAT will partake in this upcoming PASS is the SQL Server Clinic.   The SQL Server Clinic will be open every day during PASS starting after the keynote until 6:00pm.  In the SQL Server Clinic will be the powerful combination of the CSS SQL Server escalation team combined with the SQLCAT team.   The CSS SQL Server escalation team is the best in the world at troubleshooting SQL Server.  You can bring any problem, question or performance challenges and they will be there to assist you. The other half of the SQL Server Clinic will be the SQLCAT team.   We will have the vast knowledge of architecture, design and performance present to assist you.   You can bring your application specs and they will review the architecture and design with you.  You can bring your ideas and they will help you shape it into working system.   And yes, in case you are wondering, these amazing services will be free to all attendee(s) at PASS in order to help make your PASS the most impactful one yet

. 

Along with the exciting new SQL Server Clinic,  the SQLCAT team will be delivering the following presentations at this year’s conference:

 

Session

Speaker

SQLCAT: Analysis Services Consolidation & Virtualization

Carl Rabeler

SQLCAT: A Preview of Gemini Best Practices

Denny Lee

SQLCAT: Customer Experiences Deploying Data Warehouse Solutions using the FastTrack Architecture

Stuart Ozer

SQLCAT: SharePoint on SQL Server - Implementation, Configuration and Tuning

Burzin Patel

SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008

Denny Lee

SQLCAT/SQL Server Consolidation Series - Consolidation and virtualization best practices and recommendations

Lindsey Allen

SQLCAT: SQL Server Always On Series. Part 3: SQLCAT Customer Deployments, Best Practices & Panel Discussion

Prem Mehra

SQLCAT: Customer Experiences with Data Compression

Sanjay Mishra

SQLCAT: Designing High Performance I/O for SQL Server

Thomas Kejser

SQLCAT: Complex Event Processing Early Customer Experiences

Kun Cheng

SQLCAT: Analysis Services Performance Monitoring and Tuning

Carl Rabeler

SQLCAT: Tuning ETL and ELT

Thomas Kejser

SQLCAT: SQL Server Query Optimization and Processing

Juergen Thomas

SQLCAT: Madison Overview and Madison Technology Preview Results

Jesse Fountain

SQLCAT: SQL Service Broker: High Performance Distributed Applications in Real World Deployments

Michael Thomassy

 

The entire SQLCAT team will be there for the entire 3 days.  Stop anyone with a lime green shirt at any time and say hello, ask a question or just introduce yourself as we are there to meet and learn from you too.

 

I look forward to seeing you there.

 

Mark Souza

Director – SQL Server Team

SQL Server Customer Advisory Team

 

Looking deeper into SQL Server using Minidumps

Author: Thomas Kejser

Reviewers and Contributors: Bob Ward, Michael Thomassy, Juergen Thomas, Hermann Daeubler, Mark Souza, Lubor Kollar, Henk van der Valk (Unisys) and Peter Scharlock

For advanced troubleshooting and understanding of SQL Server, you can sometimes benefit from creating a dump file of the sqlservr.exe process.

What is a dump? It is a file containing a snapshot of the running process – and parts or all of the memory space of that process. The snapshot also contains the call stack of every thread the process has created.

There are two major types of dump files:

  • A full dump – this contains the entire process space. A full dump can take a VERY long time to run. If you are only interested in learning more about SQL Servers internal structures – do not use this type of dump.
  • A minidump – this much smaller file contains the memory for the call stack of all threads, the CPU registers and information about which modules are loaded. If you are just curious about the internals of SQL Server, this is the type of dump you want to create.

Using a debugger, such as WinDbg, you can analyze a dump file. Remember that you are only looking at a snapshot in time of the process – but even then, just showing the call stacks can be quite enlightening. In WinDdg, the command to show all call stacks is: ~*kn. If you are the kind of person who likes to have a mental model of how a product works – minidump can help you deepen your understanding. A trick I often use is to take three minidumps in a row, waiting a few seconds between each one. By comparing the thread stacks of dumps, I can get a rough idea what threads inside the SQL Server process space are doing. But remember – these are the threads of the sqlservr.exe process itself – not the session_id that you see from the DMV.

If you should run into errors in SQL Server itself, minidumps help CSS support you and perform deep level investigation. CSS will sometimes request that you perform such a minidump of the process – the engineer in CSS can then use the dump to analyze the issue.

Minidump files have the extension *.mdmp. In rare cases, you may find some of these files in your SQL Server or Analysis Services directory. CSS may request these files from you if you have a case open with them.

There are several ways to generate a minidump. One way is to use the sqldumper.exe file that ships with SQL Server. You can read about sqldumper.exe in:

  • KB 827690 - How to use Sqldumper.exe to generate dump files for Windows applications

in Windows 2008 Server there is an easy way to get dumps from the GUI. If you bring up task manager and right click on a process, you get this new option:

clip_image002

But watch out! In the default configuration of Windows 2008 Server you will get a full dump. For a large SQL Server installation with hundreds of GB of memory – generating such a dump can take hours. And while the dump happens, the SQL Server process is frozen.

If you only want the minidump, you can re-configure Windows 2008 Server to generate mini dumps instead of full dumps. This is documented in:

  • KB 931673  - How to create a user-mode process dump file in Windows Vista

Now that you know how to create minidumps. Let me show you an example of a curious investigation using a minidump. A question we often get is: Why do I see such high waits for CXPACKET in sys.dm_os_wait_stats. CXPACKET is a wait that SQL Server uses to coordinate parallelism – and you can generally ignore it. But, for those of you curious to know more, minidumps gives you the ability to understand this elusive wait type better.

Recently, I was running an highly parallel INSERT…SELECT statement. I was using the new minimally logged heap operations and the SELECT statement was doing a lot of hash joining. After some time, I saw a lot of tasks blocked on CXPACKET in sys.dm_os_waiting_tasks. I decided to perform a minidump to learn a bit more about SQL Server Parallelism. After opening the dump in WinDbg and running ~*kn I could now see all the thread call stacks in the snahpshot. I saw a lot of threads with this call stack:

Thread: <Many> call stack
ntdll!ZwWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::SuspendNonPreemptive
sqlservr!SOS_Scheduler::Suspend
sqlservr!EventInternal<Spinlock<153,1,0> >::Wait
sqlservr!EventInternal<Spinlock<153,1,0>   >::WaitAllowPrematureWakeup
sqlservr!CXPacketList::RemoveHead
sqlservr!CXPipe::Pull
sqlservr!CXTransLocal::AllocateBuffers
sqlservr!CQScanXProducerNew::AllocateBuffers

sqlservr!CQScanXProducerNew::GetRowHelper
sqlservr!FnProducerOpen
sqlservr!FnProducerThread
sqlservr!SubprocEntrypoint
sqlservr!SOS_Task::Param::Execute

From the highlight, it does not take much to guess that this is probably the CXPACKET waiting tasks. Searching for a task that is not waiting, I found this:

Thread: 117 call stack
msvcr80!memcpy
sqlservr!RowsetBulk::InsertRow
sqlservr!CXRowset::InsertRow

sqlservr!CValRow::SetDataX
sqlservr!CEs::GeneralEval
sqlservr!CQScanUpdateNew::GetRow
sqlservr!CQScanProfileNew::GetRow
sqlservr!CQueryScan::GetRow
sqlservr!CXStmtQuery::ErsqExecuteQuery
sqlservr!CXStmtDML::XretDMLExecute
sqlservr!CXStmtDML::XretExecute
sqlservr!CMsqlExecContext::ExecuteStmts<1,0>
sqlservr!CMsqlExecContext::FExecute
sqlservr!CSQLSource::Execute
sqlservr!process_request
sqlservr!process_commands
sqlservr!SOS_Task::Param::Execute

Now, you don’t really need source code access to guess what is going on here: SQL Server is inserting the rows and other threads are waiting to feed the insert thread. I can even see that the execution is using what looks like the bulk load function.

If you are curious to learn more about analyzing minidumps there is an excellent article about it found here:

  • KB 315263 - How to read the small memory dump files that Windows creates for debugging
Using HierarchyID in SQL Server

Implementing a hierarchy structure in a relational data base normally takes a bit of work.  The new SQL Server data type for hierarchyID gives a good shortcut to the old methods, makes it faster to get a solution in place and makes it much easier to maintain.

MSDN has a good tutorial on HierarchyID showing the old method using a relational table design, followed by another design using the new HierarchyID.  This article shows a good tip of how to easily get all descendents of a parent by adding a computed column using the GetAncestor(1) method that comes with the HierarchyID.  It also has a complete list of other methods available.

Rather than repeat what is already on the MSDN tutorial, I will give examples of where it works well and where it doesn’t.

First, take a quick refresher course in the difference between a tree and a graph.  These links are quick reads and will refresh your memory on those old courses you took at the university so many years ago.  This is important to do before continuing because HierarchyID works well with trees but not graphs.  Ok, that is a general statement for which there are some exceptions that I explain below.

WHERE IT WORKS WELL

The HierarchyID works well in projects where there is a hierarchical structure where each child has a single parent, like in a product/product classification hierarchy.   In this example each product can only belong to one product class.  You may have other levels, like product family where each product class can belong to only one product family. 

Another, slightly more complex example would be a bill of materials structure for manufacturing a product.  A part can belong to a subcomponent, which can belong to another subcomponent, which gets assembled into the final product.  It is more complex than the first example because a part can belong to many subcomponents.  HierarchyID may or may not work well in this case, it all depends on how many time a part can appear in the final product.  If your graph is relatively unconnected, either due to relatively few parents or low links/node, then hierarchyId can be used for this type of graph using a primary key of (RootId int, Path hierarchyId). 

 

All XML documents are trees and the HierarchyID works well with these projects. 

 

WHERE IT DOESN’T WORK

1.       When each child node in the graph has multiple parents.  We tried this in a genealogy/ancestry project and discovered that to get this to work you would have to add a HierarchyID column for each parent.  This is called a highly connected graph, where the number of paths is substantially more than the number of nodes.   In this scenario, the traditional hierarchy design for relational databases works well, especially when combined with the CTE query pattern.

2.       If subtree movement forms a substantial part of the workload.  This is O(1) for parentiD/childId and O(subtree size) for hierarchyId.  So if you are constantly updating the tree and cause the nodes to move, then the HierarchyID is not the best solution.

3.       If subtree query isn’t a substantial part of the workload.  This is O(1) for hierarchyId and O(subtree) for parentId/childId – but if not common this isn’t an advantage.  In other words, if you don’t walk the tree often then it may not be worth using the Hierarchy data type.

 

Kevin Cox, Peter Carlin

 

Gemini - Self Service BI!

Gemini is the code name for the new breakthrough Self-Service Business Intelligence (BI) capabilities being delivered in the SQL Server 2008 R2 release. Gemini enables end users to build BI applications by integrating data from a variety of sources, modeling, refining and analyzing the data, adding business logic, building reports and visualizations and ultimately sharing it with their coworkers in an environment that is managed and secured by IT.

You should bookmark the Gemini Team Blog for the latest information on Gemini.  As well, see SQL Server 2008 R2 | Self-Service Business Intelligence for more information.  Once we start compiling best practices and lessons learned for Gemini, we will also publish more information on sqlcat.com.

 

Multi user SSAS writebacks may result to blocks on similar functions and new connections.

  Author: Nicholas Dritsas

       

         Reviewers: Akshai Mirchandani, Kevin Cox, Lubor Kollar, Thomas Kejser

 

Writeback consists of two distinct processes.  The first one is an update cube process that updates the current session with the changes.  Only the current user sees the changes and he can continue with updates and what/if analysis.  The second process is a commit so the changes get committed in the database and all users can see the results.

 

The challenge is when you have many users issuing writeback commits.  A writeback commit requires an SSAS database write lock. If it gets it, the other commit requests (eg. from other writebacks, cube process, alter roles etc) will have to wait.  Also, during this time, new connections cannot be made since a new connection requires a read lock of the database. 

 

So, as you can imagine, the queue can get rather long. New connections cannot be made and they will start timing out.  Queued writeback commits can take a long time since they will not timeout.  To make things more interesting, writeback commit will not enforce the ForceCommitTimeout server setting that is already used by cube process.  ForceCommitTimeout kills all the processes, reads or writes, ahead of it to make sure the issued commit completes within a reasonable time.

 

This is the observed behavior in SQL server today.  The SSAS team decided to introduce an enhancement as part of the sql 2008 sp1/cu3.

 

Under this enhancement, writeback commit will observe the ForceCommitTimeout server setting.  So, if it waits more than the set value, typically 30 seconds, it will go ahead and kill any queries or processes ahead of it to ensure it commits within a timely fashion.  This is new behavior and applications need to be modified to handle the new error case.

 

Additionally, SSAS now requests a database commit write lock at a later stage and for a shorter period of time than before.  When you enable MOLAP Writeback, SSAS updates both the relational engine that holds the table with the deltas plus SSAS storage with the updates. An SSAS cube transaction wraps around the relational transaction and it will only commit the relational transaction if it is ready to commit the cube transaction. If the MOLAP update fails, then SSAS will rollback its transaction which will also rollback the relational transaction.

 

During a writeback commit, there are three operations involved:

-       Update MOLAP

-       Update relational

-       Commit

 

Previously, a Database Commit Write lock was required at the beginning of these three operations. Now, we have introduced a more granular process, such as:

 

-       Temporarily take read commit lock on DB – note that this will not prevent other sessions/queries from executing.  Commit Read protects a database from being committed in another transaction.  It’s only taken to safely take read/write locks on the affected objects later on.

-       Analyze writeback to determine which objects are going to be affected.

-       Take write locks on these objects and read locks on their dependent objects.  Write locks protect an object while it is being written such that no other writers/readers can affect it for the duration of the transaction.

-       The DB read commit lock is then released.

-       Now, we do the two update steps (update MOLAP then update relational in a single transaction).

-       We enter Commit and only now we take the Database Commit Write lock and finish the Commit operation. Commit Write protects the database from being read/written in another transaction.  This is also where the ForceCommitTimeout value is utilized.  All the read/write locks are released at commit/rollback time. They’re held till then to prevent other transactions from updating the objects until the commit/rollback says that this transaction is done with them.

 

With this enhancement we already see customers gaining higher levels of concurrency during writebacks with no or minimum blocking of other processes and connections.

Fast ordered delete

We have a visit from MySpace database team in our lab this week. I have used this opportunity and asked Kevin Stephenson, MySpace’s Senior Database Developer, about “pain points” he encounters while doing his everyday job supporting SQL Server. He brought up a problem of efficient ordered delete. He needs to delete large portions of older custom log entries periodically and he needs to delete them in certain order. He also knows that it is efficient to break huge update operations into more small ones to allow truncating log. Kevin is aware of several methods. He immediately illustrated his problem on a small example:

drop table t1
create table t1 (a int primary key, b char (100))
declare @i int
set @i=1
SET NOCOUNT ON
while (@i<100000)
begin
insert into t1 values (@i,'x');
set @i=@i+1
end
 
set statistics io on
set statistics time on
delete from t1 where a in (select top (10000) a from t1 order by a);
set statistics time off
set statistics io off

And I got the following result messages:

Table 't1'. Scan count 2, logical reads 30564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 20152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 80 ms.

The delete query in the above script has the following plan (before performing the delete):

Pic1

We can see the table t1 is accessed twice; the lower branch is performing the “select top (10000) a from t1” subquery and joining the result to the second occurrence of the same table t1. So we need to read the “deleted” portion of the table twice to identify the rows to delete and then once more to perform the delete. The question was, is it possible to perform the same operation more efficiently?

The answer is using a tip – a view with ORDER BY. You cannot use ORDER BY in a view definition unless you use also TOP in the SELECT clause in the same view. And we know we want to delete 10000 rows. So the view definition is as follows:

create view v1 as (select top (10000) * from t1 order by a)

and we can delete the “top” rows using simply

delete from v1

The query plan for this delete is much simpler.

Pic2

and the I/O and cpu statistics demonstrate the improvement:

Table 't1'. Scan count 1, logical reads 654, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 75 ms.

Kevin Stephenson and Lubor Kollar

Assigning surrogate keys to early arriving facts using Integration Services

In data warehouses, it is quite common that fact records arrive with a source system key that has not yet been loaded in the dimension tables. This phenomena is known as “late arriving dimensions” or “early arriving facts” in Kimball terminology.

When you see a fact record that cannot be resolved to a dimension surrogate key, the typical solution is this:

  • Create a dummy member in the dimension table using the source system key
  • Assign a surrogate key to this dummy member
  • Use the newly create surrogate key and assign it to the fact record

If you use T-SQL to load the data warehouse, it means you have to pass over the input fact rows twice. First, you have to discover which keys are not present in the dimension (and create surrogates for them). Second, you will have to look at the input data again and use the newly generated surrogate keys to load the the fact table.

Using Integration Services, early arriving facts can be populated with just one pass over the source rows, which means less read I/O operations. Nice!

In project Project REAL, a script component is used to achieve this effect. If many of your dimension have early arriving facts, this creates a lot of copy/paste code. There is a cleaner solution that does not use script components.

There is a way handle early arriving facts without relying on script components. It is best illustrated with an example. Let us create these three tables:

/* The input table */
CREATE TABLE Stage_Fact
(
  NK_A CHAR(10) NOT NULL /* The late arriving source system key */
)

/* The late arriving dimension table */
CREATE TABLE Dim_A
(
  SK_A INT PRIMARY KEY IDENTITY(1,1) /* The surrogate key*/
  , NK_A CHAR(10) NOT NULL /* The natural, source system key */
)

/* The final destination table */
CREATE TABLE Fact
(
  SK_A INT NOT NULL /* Surrogate key from dimension */
)

Now, use this script to generate 16M rows in the input table and create a 9000 row dimension table:

/* Create some staging data */
INSERT Stage_Fact WITH (TABLOCK)
SELECT RIGHT(REPLICATE('0', 10) + CAST(K AS VARCHAR(10)), 10) AS NK_A
FROM (SELECT ABS(binary_checksum(*) % 10000) AS K 
FROM sys.trace_event_bindings eb1
CROSS JOIN sys.trace_event_bindings eb2) AS stuff

/* Populate Dim_A with 90% of the keys from the fact table */
INSERT Dim_A WITH (TABLOCK) (NK_A)
SELECT DISTINCT NK_A FROM Stage_Fact
WHERE NK_A < '0000009000'

With this data, there will be 1000 late arriving dimension keys in Stage_Fact in around 1.8M rows. When we see a non-matched key in Stage_Fact, we want to generate a new surrogate key in Dim_A. But here is the catch: We only want to generate the surrogate once, and we do NOT want to do a roundtrip to the database the second time we see the same key.

Project Real uses a .NET hash table to track the generated keys and perform quick lookups the next time we see the key. But, we already have a fine hash table available without using script components: the lookup transformation. Let us see how we solve the early arriving fact problem with Integration Services, au natural:

clip_image002

The non-matched rows from Lookup SK_A go into the second lookup (New SK_A Cache). New SK_A Cache is where we want to handle the early arriving facts.

First, configure New SK_A Cache as a partial cache:

clip_image004

Now, we play a clever trick: Whenever a partial lookup cache first receives a non-matched row, it will call a SQL statement and fetch data to populate the lookup  cache. The default is a SELECT statement, but it does not have to be a SELECT statement. We could replace it with a stored procedure that returns the same result as the SELECT. Actually, let us do exactly that:

clip_image006

Now, the FIRST time the partial lookup cache sees a early arriving fact, it will call Generate_SK_A. I have mapped the NK_A (the source system, natural key) column to the input parameter. To finish the trick, we just have to create a simple stored procedure that uses NK_A to lookup SK_A (the Surrogate Key), and if not found, create a new key:

CREATE PROCEDURE Generate_SK_A
  @NK_A CHAR(10) /* The key to find a surrogate for */
AS
SET NOCOUNT ON

/* Prevent race conditions */
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

/* Check if we already have the key (procedure is idempotent) */
DECLARE @SK_A INT
SELECT @SK_A = SK_A
FROM Dim_A
WHERE NK_A = @NK_A

/* The natural key was not found, generate a new one */
IF @SK_A IS NULL BEGIN
 
INSERT Dim_A (NK_A) VALUES (@NK_A)
 
SET @SK_A = SCOPE_IDENTITY()
END

/* Return the result. 
  IMPORTANT: must return same format is the SELECT statement we replaced */
SELECT @SK_A AS SK_A, @NK_A AS NK_A

Simple isn’t it?... No need to use any .NET script components here. Have a look at the attached files to study the technique further and you will be handling early arriving facts elegantly in no time.

Proper partitioning can improve dramatically the writeback process when dealing with large data sets

Proper partitioning can improve dramatically the writeback process when dealing with large data sets

Author: Nicholas Dritsas

Reviewers: Richard Tkachuk, Akshai Mirchandani

CASE:

Customer is using SQL 2008 and SSAS's writeback abilties to do 52-weeks rolling sales forecasting.  Their fact table has reached 250 million rows and they want to add 40 million records into the writeback table each week using weight allocation when updating 2 measures and executing 20,000 update cube statements per week.  This process takes 11 hours currently using MOLAP writeback storage.  They have 5 dimensions, with the biggest been the item dimension with 130,000 members and 7 levels.  They do not have properly defined aggregations or attribute relationships yet.  Overall, as number of records in writeback table is increased, performance is progressively slower.  The number of records in a writeback partition will have an impact on both query performance and writeback performance.

 SUGGESTIONS:

Suggestion was around proper partitioning.  Transferring rows from the writeback partition into multiple regular partitions is likely to help only from the standpoint of standard partitioning optimization techniques.  All relevant partitions (and not just the writeback partition) will be scanned for the deltas. The Writeback partition usually doesn’t have good partition slices, and by distributing records to other partitions that have good partition slices you may get better performance. For example, if you move all updates for Year 2008 into the Year 2008 partitions – now any subsequent updates to Year 2009 cells will avoid unnecessarily scanning all of those records. 

The number of records in the writeback partition will have an impact on both query performance and writeback performance – particularly if the partition is ROLAP.  However, if this partition is MOLAP then it should behave very much like a standard MOLAP partition for most situations – although 250 million rows will still probably result in perf slowdowns to calculate the deltas at the leaf level.

 SOLUTION:

By doing the following operations, we improved writeback performance from 11 hours to 2 hours 46 minutes.

1.       We introduced partitions to the fact and writeback tables.  We aligned them so they correspond to each other.  We added partitions for years 2005, 2006, 2007 and 2008 plus 9 partitions for 9 weeks todate for 2009.  So, a total of 13 partitions for fact and 13 for writeback tables.  The 2005, 2006, 2007 and 2008 partitions were done using a WHERE clause against the main single table.  For 2009, we have a separate physical table per week.  ETL process will be creating a new weekly table and SSAS partition each week.

2.       We applied time slicer to all partitions, even if they were MOLAP.

 

Profiler result for one update cube statement.

A.     Before : When writeback was executed  to 200908, update cube accessed three partitions:

1. 200908 Partition

2. All Partition

3. Writeback Partition

 

B.      After : When writeback is executed  to 200909, update cube accessed two partitions:

1. 200909 Partition

2. 200909 Writeback Partition

 

SQL Resources for Compliance

Author: Denny Lee
Reviewers: Prem Mehra, Kevin Cox 

With the terabytes of data that are being stored today - truly representing the data explosion that we've always talked about - it's becoming more and more difficult to provide an audit trail or utilize tools that can ensure that your SQL environment is compliant.  To help resolve this problem, we've introduced some guidance in the form of our SQL Server Reaching Compliance Guide.  We have built up more resources and case studies so you can reference them to help you build a compliant SQL Server environment.  These resources include:

And recently added is SQL Server helping with HIPAA compliance in action:

As a teaching hospital of Harvard Medical School, and the largest member of the CareGroup Healthcare System, Beth Israel Deaconess Medical Center needs an IT infrastructure with auditing capabilities to help it remain in compliance with regulations including those of the Health Insurance Portability and Accountability Act (HIPAA). The organization has more than 2 terabytes of information, including data warehouses, clinical applications, and other solutions hosted using Microsoft® SQL Server® database software. BIDMC is upgrading to Microsoft SQL Server 2008 Enterprise to take advantage of advanced auditing and transparent data encryption features in the latest version of SQL Server to help it meet HIPAA and other regulatory requirements. As early adopters, CareGroup IT professionals helped Microsoft create the new SQL Server 2008 Compliance Software Development Kit.

As well, for all of your interested in evidence that SQL Server can be used for PCI compliance as per the Payment Credit Industry Data Security Standards (PCI DSS), check out:

 

 

 

Using ProcessingGroup Dimension property option ByTable vs. ByAttribute may error with string keys

Author: Nicholas Dritsas

Reviewers: Thomas Kejser, Stuart Ozer

In SSAS 2005 and later, there is a dimension property called ProcessingGroup.  It has two values; ByAttribute (default) and ByTable. 

 

When you use ByAttribute, SSAS will send a SELECT DISTINCT query to the relational engine for each attribute PLUS, at the end, an additional SELECT DISTINCT that combines all the attributes plus key.  As you can imagine, if you have several attributes and a very large dimension table, this process can take a while.  Our customer in this case has a 100 million members dimension table in Oracle.

 

Using the option ByTable, SSAS sends one table scan query to the relational engine and temporarily caches in memory the results.  This can work well only if you have enough memory for the cache (see related blog for potential issues here). 

 

Now, even if you have enough memory, you may encounter a different problem.  If some of the attribute keys are strings, you may hit the 4gb limit of SSAS for the string store.  This limitation is faced typically on an attribute level, if you process ByAttribute.  But, since we process ByTable here, the limit applies to the whole table.  So, if you have a very large dimension table, you will hit this limit even if no single attribute size is more than 4gb.

 

There are a few workarounds, but, the problem of having a very large dimension with many attributes that have string keys is still hard to manage currently:

 

1.    ByAttribute and configure relational engine to perform better by adding indexes.  For more details, please look at the SQL Server 2008 Analysis Services Performance Guide here.

2.    ByTable and reduce the dimension string sizes at the key.

Useful links for upgrading to SQL Server 2008

There is plenty of material available to help you upgrade to SQL Server 2008.  This blog is intended as a short list for the most useful guidance that I have found.   And you may have different experiences upgrading from SQL Server 2000 and SQL Server 2005 so please take the time to do proper preparation work and advanced studying.  And test your upgrades before you do it for real in production. 

There is one bug that you need to know about that potentially affects your upgrade experience.  If you are planning on using backup/restore to move your database over to your new SQL Server 2008 environment, then you will need to start with the latest Cumulative Update.  Get that installed before you do the restore as it contains a fix you will need.  The problem is that the log replay after the restore may take a very long time, up to 10x the time you expected.  It only happened when log files were very large and had many VLFs (virtual log files).  Detach/Attach method is not affected.  This is the KB article for the error: http://support.microsoft.com/kb/967178/EN-US/. The fix is in CU4, but you should get the latest one available (as of the time of this writing it is CU4).

Here are the minimum steps I recommend:

Start with the SQL Server 2008 upgrade advisor:  http://www.microsoft.com/downloads/details.aspx?familyid=F5A6C5E9-4CD9-4E42-A21C-7291E7F0F852&displaylang=en. Read all output in detail because it can be easy to miss a small warning that could be important. 

 

Then get the 490 page document available on the web that contains all the upgrade best practices.  http://www.microsoft.com/downloads/details.aspx?FamilyID=66d3e6f5-6902-4fdd-af75-9975aea5bea7&displaylang=en.  It has a good section on Windows failover clustering.

 

Another great site is http://msdn.microsoft.com/en-us/library/cc936623.aspx. Although there is some overlap with the 490 page document, it contains more detail for some sections, especially clustering.   This site is an absolute requirement to use if you have Windows failover clustering. 

 

Specialty Links:

See my other blog on this site that I co-wrote with Glenn Berry on how to use database Mirroring to minimize downtime if you are upgrading from SQL Server 2005.  http://blogs.msdn.com/sqlcat/archive/2009/02/09/minimize-downtime-with-db-mirroring.aspx

 

Moving DTS 2000 packages: http://blogs.msdn.com/sqlcat/archive/2009/03/20/remember-to-move-dts2000-packages-when-upgrading-msdb-from-2005-to-2008.aspx

 

Motivation to upgrade if you are using failover clustering: http://sqlcat.com/top10lists/archive/2008/11/20/six-failover-clustering-benefits-realized-from-migrating-to-sql-server-2008.aspx

 

Kevin

 

Remember to move DTS2000 packages when upgrading msdb from 2005 to 2008

Some customers prefer to upgrade to SQL Server 2008 by detaching all databases from their SQL Server 2005 instance and then reattaching them to a completely new SQL Server 2008 instance.

 

If you choose this upgrade strategy, you should be aware that you cannot detach an msdb database from a SQL Server 2005 instance and reattach it to a SQL Server 2008 instance. If you try, you will get an error message. Backup/restore will not work for msdb in this upgrade scenario either.

 

The msdb database may contain objects that are relevant to your installation – for example agent jobs and DTS 2000 packages. When upgrading, you should make sure you move all relevant objects in msdb. You can use tasks in Integration Services to move most of these objects. However, DTS 2000 packages cannot be moved using the Integration Services tasks. Instead, you must manually move the contents sysdtspackage.

 

The CSS team has a great blog entry describing how to move DTS 2000 packages to a SQL Server 2008 instance: How to copy DTS 2000 packages between servers (and from SQL 2000 to SQL 2005 and SQL 2008)

 

Analysis Services Partition Size

Thomas and I updated the SQL Server 2008  Analysis Services Performance Guide principally to address features available in the latest release. But there was one important change related to partition size driven by changes in hardware. The earlier version of the performance guide for SQL Server 2005 Analysis Services Performance Guide stated this:

 

“In general, the number of records per partition should not exceed 20 million. In addition, the size of a partition should not exceed 250 MB.”

 

This guidance is reasonable but implies a strict size limit that should not be exceeded when in fact no such hard and fast limit exists. Query performance scales well as partition sizes exceed 20 million rows or 250MB, so that isn’t the primary concern here. Rather, the main criterion is manageability and processing performance. Partitions can be processed in parallel, so the more there are the more can be processed at once (assuming that I/O can keep up). Furthermore, the smaller a partition is the less memory is required for the ProcessIndex step to build the indexes and aggregations. Therefore, the likelihood of spilling temporary files to disk during processing (and the associated performance hit) decreases with smaller partitions.

 

So lots of partitions is good, right? Well, not always. The more partitions you have the more things you have to manage. That carries a cost. Also, partitions have some metadata and when the number exceeds several thousand, the cost of managing this metadata becomes apparent.

 

There is much more detail in the updated performance guide on best practices around partition sizing, but there are some simple rules of thumb to keep in mind. These are (in order or priority):

a)      Keep the number of partitions fewer than a few thousand

b)      Get the most out of your hardware and process as much data as possible by processing multiple partitions in parallel

c)       Avoid spilling to disk when processing indexes and aggregations (the ProcessIndex step of the processing stage)

d)      Target 20 million rows or 250 MB per partition

 

Some simple examples:

-          If you’re building a 1TB cube, the first rule takes precedence and you should exceed the 250 MB suggestion to keep the partition count down. In this case, sizing partitions between 500MB to 1 GB is reasonable.

-          If records come in at the end of each day and history does not change you can get the quickest processing with daily partitions for recent data and monthly or annual partitions for historical data.

-          If you have to reprocess a good portion of your cube on some regular basis, the size of partitions can be driven by how many can be processed in parallel keeping CPU and/or I/O usage as high as possible.

SQL Server 2008’s Oracle destination fast load option may fail if certain Oracle system views are missing

Author: Nicholas Dritsas

Reviewers: Lubor Kollar, Michael Thomassy, Sanjay Mishra 

 

SQL Server 2008’s latest feature pack, that can be found here, contains a new Oracle connector by Attunity that supports Oracle versions 9.2.0.4 and higher. When you use this connector in SSIS 2008 to send data into Oracle, you may not be able to use fast load if some Oracle system views are missing.  This is the case in Oracle 9.2.0.7, but, not 10.2.x.

 

When we tried to use fast load option using Attunity’s Oracle destination task to 9.2.0.7, we received the following SSIS error message:

 

[Oracle Destination [268]] Error: Fast Load error encountered during PreLoad or Setup phase. Class:  OCI_ERROR Status: -1 Code:   0 Note:   At:     ORAOPRdrpthEngine.c:735 Text:   ORA-00942: table or view does not exist.

 

We also switched on tracing at the Oracle instance level, and then scanned the trace files looking for exceptions.  This is what we found:

 

PARSE ERROR #4:len=207 dep=0 uid=26 oct=3 lid=26 tim=63762441650753 err=942
SELECT DECODE(COUNT(*), 0, 0, 1)   FROM SYS.LOADER_NESTED_VARRAYS                                               
WHERE TABLE_NAME = :tname AND TABLE_OWNER = :owner

 

PARSE ERROR #4:len=51 dep=0 uid=26 oct=3 lid=26 tim=63762441654278 err=942
SELECT VALUE FROM SYS.LOADER_SKIP_UNUSABLE_INDEXES
XCTEND rlbk=0, rd_only=1

 

The loader_ (sql loader) views are usually created when the catalogue script has been executed. The specific script that creates the loader_ objects is (.\rdbms\admin\catldr.sql).

 

In this case, these two views causing the errors don't exist in the catldr.sql file for 9.2.0.7 - We have confirmed this to the case for both Unix (aix 51) and Windows Server 2003.

 

However - both these views do exist in the 10.2.0.2 catldr.sql file for Windows, i.e.:

 

rem
rem $Header: catldr.sql 31-aug-2004.15:07:06 msakayed Exp $ ulview.sql
rem
Rem Copyright (c) 1990, 2004, Oracle. All rights reserved. 
Rem NAME
Rem    catldr.sql
Rem  FUNCTION
Rem    Views for the direct path of the loader
Rem  NOTES
Rem    This script must be run while connected as SYS or INTERNAL.
Rem  MODIFIED
Rem     msakayed   08/30/04 - column encryption support (project id 5578)
Rem     rphillip   05/08/03 - Add view to get full attribute name
Rem     msakayed   02/10/03 - Add security clause to loader views
Rem     rphillip   12/02/02 - Add view to check for nested varray tables
Rem     preilly    11/22/02 - Add view to get version of type used for a column
Rem     msakayed   11/11/02 - remove hard tabs
Rem     msakayed   11/01/02 - Bug #2643907: add LOADER_SKIP_UNUSABLE_INDEXES

 

We did create these missing views (see script below) on our DEV instance (running 9.2.0.7 on Aix) and the connector now seems to be functioning correctly with the both the fast load (direct path) and the table name access modes.

 

However, we have noticed some other unusual behavior with the Fast load - direct path method, especially on special characters and space trimming.  It seems trailing spaces are removed and special characters may convert to ?.  But, if you do not use fast load option, the trailing spaces and special characters are loaded into Oracle fine.  We have filed a bug and we follow up with a blog when there is a fix.

 

Missing system views script (provided as is per Oracle’s 9.2.07 scripts)

CREATE OR REPLACE VIEW SYS.LOADER_NESTED_VARRAYS

(

    TABLE_OWNER,

    TABLE_NAME

)

AS

select u.name as table_owner, o.name as table_name

        from col$ c, obj$ o, user$ u, ntab$ nt

        where o.obj# = nt.ntab# and o.owner# = u.user# and

              c.obj# = nt.obj#  and c.type#  = 123 and c.intcol# = nt.intcol#

              and (o.owner# = userenv('schemaid')

                    or o.obj# in

                         (select oa.obj#

                          from sys.objauth$ oa

                          where grantee# in ( select kzsrorol

                                              from x$kzsro

                                            )

                         )

                    or /* user has system privileges */

                      exists (select null from v$enabledprivs

                              where priv_number in (-45 /* LOCK   ANY TABLE */,

                                                    -47 /* SELECT ANY TABLE */,

                                                    -48 /* INSERT ANY TABLE */,

                                                    -49 /* UPDATE ANY TABLE */,

                                                    -50 /* DELETE ANY TABLE */)

                               )

                   )

/

GRANT SELECT ON SYS.LOADER_NESTED_VARRAYS TO "PUBLIC"

/

 

 

 

CREATE OR REPLACE VIEW SYS.LOADER_SKIP_UNUSABLE_INDEXES

(

    VALUE

)

AS

select count(*) as value from v$parameter

   where upper(name) = 'SKIP_UNUSABLE_INDEXES'

   and value = 'TRUE'

/

GRANT SELECT ON SYS.LOADER_SKIP_UNUSABLE_INDEXES TO "PUBLIC"

/

 

 

More Posts Next page »
Page view tracker