|
|
Ramblings of all kinds of Sql related tasks - usually related to customer questions, problems, etc. (that's what I do and all)
-
Due to a few different reasons, I'll be moving my blog from here at MSDN to 2 other locations. Please update your rollers and links, I'm hoping to see everyone at the new locations below!
http://blogs.mssqltips.com/blogs/chadboyd
http://www.chadhoc.net
(NOTE: the chadhoc.net isn't up yet - probably be out in a few days...)
I've moved most of my posts from here to these locations, and will be building on some of the more popular posts (32bit memory management is popular still apparantely, I'll expand on that and 64bit as well, Katmai/2008 will continue to be a popular topic, I'll get back to the SYTYKS posts also, etc.).
Thanks everyone, hope to see you at the new locations!
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
|
-
So, I think most of my close friends have heard by now, but if you are a friend, and you haven't heard, and this is how you're hearing about it, sorry about that - give me a call and I'll make it up to you over dinner and drinks or something :-).
I'm getting married to (in my opinion) the greatest girl in the world - you can read/see more about it at our little wedding website located here:
http://www.theknot.com/ourwedding/ChadBoyd&TiranaMcDermott
There's also a few pics of us out there if you've ever for some reason been curious what I look like (i'm nothing special).
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
|
-
I get many requests from customers for something like sp_who/sp_who2 for Sql 2005 with new columns/features/etc. specific to Sql 2005 (sp_who/sp_who2 are both included in Sql 2005, but they return exactly the same information as they did in Sql 2000). Well, here's one version of it - I've included the code below. You'll find a UDF that can be used to query inline with standard Sql statements (I call it "fn_getProcessData") that includes 3 parameters (@activeOnly, @includeSqlText, @includeQueryPlan) that are probably self-explanatory. In addition, there's a stored procedure I call "sp_who2k5" which basically wraps the "fn_getProcessData" function for querying via stored procedure syntax. Please let me know if you already have any code modules you use to return similar information, I'm curious to know how others are getting the data, what they're returning, etc.
Enjoy!
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
---------------------------------------------------------------------- ------------------ CODE ONLY BELOW ------------------ ----------------------------------------------------------------------
use master; go
if object_id('dbo.fn_getProcessData') > 0 drop function dbo.fn_getProcessData; go
create function dbo.fn_getProcessData ( @activeOnly bit = 0, -- If set (1), we return information on active requests only... @includeSqlText bit = 0, -- If set (1), we return the full sql text for the available sql_handle's... @includeQueryPlan bit = 0 -- If set (1), we return the XML query plan for the available plan handle's... ) returns @retTable table ( sessionId int, requestId int, requestStartTime datetime, requestStatus nvarchar(60), requestCommand nvarchar(32), sqlHandle varbinary(64), planHandle varbinary(64), dbName nvarchar(256), blockingSessionId smallint, waitType nvarchar(120), waitTime int, lastWaitType nvarchar(120), waitResource nvarchar(512), openTranCount int, openResultSetCount int, percentComplete real, requestCpuTime int, requestElapsedTime int, requestReads bigint, requestWrites bigint, requestLogicalReads bigint, requestRowCount bigint, requestQueryMemory int, sessionStatus nvarchar(60), loginTime datetime, hostName nvarchar(256), programName nvarchar(256), loginName nvarchar(256), originalLoginName nvarchar(256), sessionLastRequestStartTime datetime, sessionLastRequestEndTime datetime, sessionReads bigint, sessionWrites bigint, sessionLogicalReads bigint, sessionRowCount bigint, connectTime datetime, netTransport nvarchar(80), connectionPacketReads int, connectionPacketWrites int, netPacketSize int, clientNetAddress varchar(48), connectionMostRecentSqlHandle varbinary(64), sessionCpuTime int, sessionMemUsage int, sessionScheduledTime int, sessionElapsedTime int, clientInterface nvarchar(64), contextInfo varbinary(128), authScheme nvarchar(80), connectionLastRead datetime, connectionLastWrite datetime, sqlText nvarchar(max), queryPlan xml ) as /* -- All information for all sessions select * from master.dbo.fn_getProcessData(default,0,0)
-- Only data for currently active requests... select * from master.dbo.fn_getProcessData(1,1,1);
*/ begin
if @activeOnly > 0 insert @retTable (sessionId,requestId,requestStartTime,requestStatus,requestCommand,sqlHandle,planHandle,dbName,blockingSessionId,waitType,waitTime, lastWaitType,waitResource,openTranCount,openResultSetCount,percentComplete,requestCpuTime,requestElapsedTime,requestReads,requestWrites, requestLogicalReads,requestRowCount,requestQueryMemory,sessionStatus,loginTime,hostName,programName,loginName,originalLoginName, sessionLastRequestStartTime,sessionLastRequestEndTime,sessionReads,sessionWrites,sessionLogicalReads,sessionRowCount,connectTime, netTransport,connectionPacketReads,connectionPacketWrites,netPacketSize,clientNetAddress,connectionMostRecentSqlHandle,sessionCpuTime, sessionMemUsage,sessionScheduledTime,sessionElapsedTime,clientInterface,contextInfo,authScheme,connectionLastRead,connectionLastWrite) select coalesce(s.session_id, r.session_id, c.session_id) as sessionId, r.request_id as requestId, r.start_time as requestStartTime, r.status as requestStatus, r.command as requestCommand, r.sql_handle as sqlHandle, r.plan_handle as planHandle, isnull(db_name(r.database_id),'N/A') as dbName, r.blocking_session_id as blockingSessionId, r.wait_type as waitType, r.wait_time as waitTime, r.last_wait_type as lastWaitType, r.wait_resource as waitResource,r.open_transaction_count as openTranCount, r.open_resultset_count as openResultSetCount, r.percent_complete as percentComplete, r.cpu_time as requestCpuTime, r.total_elapsed_time as requestElapsedTime, r.reads as requestReads, r.writes as requestWrites, r.logical_reads as requestLogicalReads, r.row_count as requestRowCount, r.granted_query_memory as requestQueryMemory, s.status as sessionStatus, s.login_time as loginTime, s.host_name as hostName, s.program_name as programName, s.login_name as loginName, s.original_login_name as originalLoginName, s.last_request_start_time as sessionLastRequestStartTime, s.last_request_end_time as sessionLastRequestEndTime, s.reads as sessionReads, s.writes as sessionWrites, s.logical_reads as sessionLogicalReads, s.row_count as sessionRowCount, c.connect_time as connectTime, c.net_transport as netTransport, c.num_reads as connectionPacketReads, c.num_writes as connectionPacketWrites, c.net_packet_size as netPacketSize, c.client_net_address as clientNetAddress, c.most_recent_sql_handle as connectionMostRecentSqlHandle, s.cpu_time as sessionCpuTime, s.memory_usage as sessionMemUsage, s.total_scheduled_time as sessionScheduledTime, s.total_elapsed_time as sessionElapsedTime, s.client_interface_name as clientInterface, s.context_info as contextInfo, c.auth_scheme as authScheme, c.last_read as connectionLastRead, c.last_write as connectionLastWrite from sys.dm_exec_sessions s with(nolock) join sys.dm_exec_connections c with(nolock) on s.session_id = c.session_id and c.session_id > 50 join sys.dm_exec_requests r with(nolock) on s.session_id = r.session_id and r.session_id > 50 where s.session_id > 50 else insert @retTable (sessionId,requestId,requestStartTime,requestStatus,requestCommand,sqlHandle,planHandle,dbName,blockingSessionId,waitType,waitTime, lastWaitType,waitResource,openTranCount,openResultSetCount,percentComplete,requestCpuTime,requestElapsedTime,requestReads,requestWrites, requestLogicalReads,requestRowCount,requestQueryMemory,sessionStatus,loginTime,hostName,programName,loginName,originalLoginName, sessionLastRequestStartTime,sessionLastRequestEndTime,sessionReads,sessionWrites,sessionLogicalReads,sessionRowCount,connectTime, netTransport,connectionPacketReads,connectionPacketWrites,netPacketSize,clientNetAddress,connectionMostRecentSqlHandle,sessionCpuTime, sessionMemUsage,sessionScheduledTime,sessionElapsedTime,clientInterface,contextInfo,authScheme,connectionLastRead,connectionLastWrite) select coalesce(s.session_id, r.session_id, c.session_id) as sessionId, r.request_id as requestId, r.start_time as requestStartTime, r.status as requestStatus, r.command as requestCommand, r.sql_handle as sqlHandle, r.plan_handle as planHandle, isnull(db_name(r.database_id),'N/A') as dbName, r.blocking_session_id as blockingSessionId, r.wait_type as waitType, r.wait_time as waitTime, r.last_wait_type as lastWaitType, r.wait_resource as waitResource,r.open_transaction_count as openTranCount, r.open_resultset_count as openResultSetCount, r.percent_complete as percentComplete, r.cpu_time as requestCpuTime, r.total_elapsed_time as requestElapsedTime, r.reads as requestReads, r.writes as requestWrites, r.logical_reads as requestLogicalReads, r.row_count as requestRowCount, r.granted_query_memory as requestQueryMemory, s.status as sessionStatus, s.login_time as loginTime, s.host_name as hostName, s.program_name as programName, s.login_name as loginName, s.original_login_name as originalLoginName, s.last_request_start_time as sessionLastRequestStartTime, s.last_request_end_time as sessionLastRequestEndTime, s.reads as sessionReads, s.writes as sessionWrites, s.logical_reads as sessionLogicalReads, s.row_count as sessionRowCount, c.connect_time as connectTime, c.net_transport as netTransport, c.num_reads as connectionPacketReads, c.num_writes as connectionPacketWrites, c.net_packet_size as netPacketSize, c.client_net_address as clientNetAddress, c.most_recent_sql_handle as connectionMostRecentSqlHandle, s.cpu_time as sessionCpuTime, s.memory_usage as sessionMemUsage, s.total_scheduled_time as sessionScheduledTime, s.total_elapsed_time as sessionElapsedTime, s.client_interface_name as clientInterface, s.context_info as contextInfo, c.auth_scheme as authScheme, c.last_read as connectionLastRead, c.last_write as connectionLastWrite from sys.dm_exec_sessions s with(nolock) left join sys.dm_exec_connections c with(nolock) on s.session_id = c.session_id and c.session_id > 50 left join sys.dm_exec_requests r with(nolock) on s.session_id = r.session_id and r.session_id > 50 where s.session_id > 50
if @includeSqlText > 0 update r set r.sqlText = t.text from @retTable r outer apply sys.dm_exec_sql_text(isnull(r.sqlHandle,r.connectionMostRecentSqlHandle)) t where isnull(r.sqlHandle,r.connectionMostRecentSqlHandle) is not null;
if @includeQueryPlan > 0 update r set r.queryPlan = t.query_plan from @retTable r outer apply sys.dm_exec_query_plan(r.planHandle) t where r.planHandle is not null;
return; end go
use master go
if object_id('dbo.sp_who2k5') > 0 drop procedure dbo.sp_who2k5 go
create procedure dbo.sp_who2k5 @activeOnly bit = null, -- If set (1), we return information on active requests only... @includeSqlText bit = 0, -- If set (1), we return the full sql text for the available sql_handle's... @includeQueryPlan bit = 0 -- If set (1), we return the XML query plan for the available plan handle's... as /*
NOTE: This procedure requires the following modules:
1) master.dbo.fn_getProcessData() */ set nocount on;
-- Print some usage data for usability if needed... if @activeOnly is null begin print 'USAGE: '; print ' exec dbo.sp_who2k5 @activeOnly, @includeSqlText, @includeQueryPlan;'; print ''; print ' @activeOnly - If set (1), we return information on active requests only'; print ' @includeSqlText - If set (1), we return the full sql text for the available sql_handles'; print ' @includeQueryPlan - If set (1), we return the XML query plan for the available plan handles'; print ''; print ' Wrapper procedure around the UDF master.dbo.fn_getProcessData() - accepts exact same arguments.'; print ''; end
-- Format as needed... select @activeOnly = case when @activeOnly > 0 then @activeOnly else 0 end;
-- Get the response... select sessionId,blockingSessionId,requestStatus,dbName,waitType,waitTime,lastWaitType,requestCommand,loginName,originalLoginName, openTranCount,openResultSetCount,percentComplete,requestCpuTime,requestElapsedTime,requestReads,requestWrites,waitResource, requestId,requestStartTime,requestLogicalReads,requestRowCount,requestQueryMemory,sessionStatus,loginTime,hostName,programName, sessionLastRequestStartTime,sessionLastRequestEndTime,sessionReads,sessionWrites,sessionLogicalReads,sessionRowCount,connectTime, netTransport,connectionPacketReads,connectionPacketWrites,netPacketSize,clientNetAddress,connectionMostRecentSqlHandle,sessionCpuTime, sessionMemUsage,sessionScheduledTime,sessionElapsedTime,clientInterface,contextInfo,authScheme,connectionLastRead,connectionLastWrite, planHandle,sqlHandle,queryPlan,sqlText from master.dbo.fn_getProcessData(@activeOnly, @includeSqlText, @includeQueryPlan);
GO
|
-
With Sql 2008 (in the current CTP you have in your hand), thanks to a new extension to the group by clause referred to as 'grouping sets', you now have the ability to use sets of grouping columns in your group by clauses, allowing you to define basically multiple groupings in the same single query - i.e., instead of providing a single group by column-set, you can define multiple 'sets' of grouping columns, and have the resultset include the concatenated results for each set (think of this as effectively the UNION ALL of multiple identical select statements that are grouped by different column sets).
Some sample code showing some simple usage is here below, enjoy!
use tempdb; go if object_id('dbo.tblGroupTest') > 0 drop table dbo.tblGroupTest; create table dbo.tblGroupTest (id int, year smallint, quarter smallint, month smallint, amt bigint); go -- Fill up some sample yearly, quarterly, monthtly data for 2000 - 2007... declare @i int set @i = 2000 while @i <= 2007 begin insert dbo.tblGroupTest (id, year, quarter, month, amt) select row_number() over (order by a.object_id), @i, ntile(4) over (order by a.object_id), ntile(12) over (order by a.object_id), a.object_id from ( select object_id from sys.columns union all select object_id from sys.columns ) a; set @i = @i+1; end go -- Report select year, case when grouping(quarter) = 1 then '-- TOTAL --' else cast(quarter as varchar(5)) end as quarter, case when grouping(month) = 1 then '-- TOTAL --' else cast(month as varchar(5)) end as month, sum(amt) as sumAmt, avg(amt) as avgAmt from dbo.tblGroupTest group by grouping sets ( (year, quarter, month), (year, quarter), (year) ) order by year, isnull(quarter,10), isnull(month,15);
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
|
-
This post will cover an initial overview of the data compression feature that will be included in Sql Server 2008, and I'll also cover briefly some information on the Backup Compression feature as well (note that these are 2 totally separate and distinct features, since they are very different, as you'll hopefully see by the end of this post). This will actually be a fairly lengthy post, but I'd rather it be that way to ensure I cover as much as possible completely. NOTE that this feature is NOT in a CTP yet, so you won't be able to make use of it in the current CTP...
Reducing the size of a data-footprint can usually be done in a few different ways: 1) limit the amount of data that is stored via archival processes, 2) compressing the actual data. Let's assume that you've either already done as much of #1 as possible, and/or can't do #1, and you're data is still large - you're left with option #2. Compressing the data can also typically come in a few different flavors: 1) compress the data-file on disk using traditional compression algorithms, 2) storing the data more efficiently within the row/page (in this case anyhow). Compressing the data via traditional methods typically requires that you compress "larger" chunks of data in order to perceive a benefit (larger in this case than a typical 8/64k size related to Sql Server storage), and also doesn't provide any in-memory benefit (i.e. the compression is only beneficial on-disk, accessing that data and pulling it into cache requires decompression, and hence the memory footprint required is the same as the data in an uncompressed format). Storing the data more efficiently in-row/page on the other hand provides additional in-memory benefits, since the data is stored in the efficient format in-memory just as on-disk. This is the way that Sql Server 2008 implements data compression (which, by the way, is similar to the way that certain competitor products that shall remain nameless also implement compression).
Let's start with a couple of simple points:
- Data compression in Sql 2008 is easily enabled/disabled for a table and/or index (via new options in traditional create/alter table/index DDL) - Compression can be enabled/disabled for only certain partitions of a table/index - There are 2 levels of compression that can be enabled, which include 3 key changes in Sql 2008
- ROW level compression
This relates to more efficient storage of data in-row for Sql Server. We'll talk about this more below. - PAGE level compression This is a superset of (and implies) row level compression. It also makes use of 2 other new changes in Sql 2008 - "Column prefix compression" and "Page dictionary compression". We'll talk about each of these more below.
- Enabling compression is performed via rebuilding an index/table - CAN be online - Can NOT currently work with XML, BLOB, MAX data types - This will most likely be an Enterprise-only feature (i.e. in the Enterprise/Developer SKUs)
Ok, so with some of those points out of the way, onward...
In reality, Sql Server 2005 SP2 included a form of data compression - the vardecimal storage format. I'm not going to discuss the vardecimal storage format here, but you can think of it as extremely similar for decimal data as varchar is compared to char data. However, Sql Server 2008's data compression is vastly different from this (though it still supports/includes the vardecimal format as well) - so different in fact, that if you enable data compression on a given table/index, the underlying row/page format is different - yes, that's right, you heard correctly - if you use compression (ROW or PAGE), the row/page format for Sql 2008 will be DIFFERENT from the existing row/page format (only for the table/index(es) you are using compression for). So, in Sql 2008, there are 2, yes 2, row/page formats for data. You may now be wondering "well, if the row/page format changed, how in the world did you have enough time to re-engineer every component in Sql Server that is aware of the format in such a small amount of time? The answer is that we didn't - the Storage Engine is the ONLY component in Sql 2008 that is aware of the new row/page format. Once data is passed up the food-chain so to speak (i.e. to the Relational Engine for example), the format is converted from the new format to the old format and vice versa (again, this is ONLY if you are using compression for a given table/index/partition). This has some repercussions to be aware of as well - since the data is "uncompressed" when it is passed from the Storage Engine (i.e. out of cache to be consumed), this means that if you bcp out the data, or push the data out to an external consumer in any way, the data will be translated to the original, uncompressed row format. The Storage Engine does encompass many IO bound features however, so much is gained here (backup, dbcc, scans, etc.).
So you may be wondering why the row/page format had to change - good question. For row-level compression, in the simplest answer it's because the existing row format used for variable-length data didn't scale very well (if you understand how varchar type data is stored compared to fixed-char type data, you know that there is a 2-byte overhead associated with each variable length column for determining the beginning offset in the row for the data (even if there is NO data for the given column), among other considerations for things like NULL values, etc.). Given that we're basically in effect trying to be more efficient about storing data with the compression feature in 2008, using this existing type of row format simply wouldn't cut the mustard for getting the benefits required (though it is very similar to how the vardecimal format is stored in Sql 2005 and 2008 if you're using it and not using the new compression features). For page-level compression (which includes row-level compression), the format will change because we are adding a new structure at the beginning of each page refereed to as the 'CI' structure (CI stands for "compression information") - this structure will be made up of 2 components: 1) the "column prefix" portion, and 2) the "page dictionary" portion - we'll discuss these a bit later.
Ok, now that all that is out of the way, let's discuss the internals of ROW compression a bit. The basic jist of row-level compression is more efficient storage for all kinds of data (int, float, money, datetime, character (including varchar), etc.) - it works primarily by optimizing the row-level metadata for variable length storage (i.e. compared to the current model for storing variable length data). In Sql 2005 and prior, if you store an INT value of 1, you wind up with 4 bytes of fixed-length storage for a value that takes less than a byte. With variable length data, a varchar column with a value of 'hi' would take 2 bytes for storage + 2 bytes of meta-data overhead (and vardecmial would be the same for decimal values). With Sql 2008 and ROW compression, this meta-data storage is cut from 2 bytes to 4 BITS per value (bits, not bytes), in addition to efficient storage of the actual data on byte-boundaries (for example, to store the value 1 for an int datatype would require a single byte, not 4 bytes - for the value 10,000 in int, this would require 2 bytes). So, the storage for this fixed data now becomes the number of bytes to store the value + 4 bits of meta-data overhead (per value/column). I'm not going to discuss the actual row layout/format for the new structure at the moment, as it is quite complex, but will possibly go into it in a future post - the key thing to understand currently is that actual amounts used for storing ROW compressed data, both fixed and variable length data types (but NOT LOB/XML/MAX data types currently). And yes, there are special considerations for NULL and 0s and zero-length strings to efficiently store this type of data as well.
How about PAGE compression - as mentioned above, PAGE compression includes ROW compression (so, if you enable PAGE compression, you get ROW compression as well). In addition to the row-compression stuff from above, PAGE compression also uses "column prefix" matching, as well as the concept of a "page dictionary". So, what exactly are each of these concepts you say????
Let's start with "column prefix", which, as you may have guessed, has much to do with repeating patterns at the beginning of the values for a given column (which, not coincidentally is quite common for the contents on many pages, especially in the cases of index pages, since a single page most likely contains similar data). Assume you have a column of data on a single page of rows that contain values like 'Chad', 'Chadwick', 'Chadly', 'Chad', 'Chadster', 'Chadwick', and 'Chadly' (values repeated purposely) - as you can tell, there's quite a bit of redundant data 'prefixing' each of the rows in this column on this page, yes? So, what you might end up with in a scenario like this would be a column prefix value of 'Chad' stored in the CI structure, and each column ending up with pointers to that prefix value - resulting in values like this on-disk: '<empty>', '1wick', '1ly', '<empty>', '1ster', '1wick', and '1ly' (NOTE: this isn't necessarily exactly what you'd end up with, it's an example - you may end up with 2 prefix values, 1 for 'Chad' and 1 for 'Chadwick' for example, etc., etc...all depends on the data and what the algorithm decides is best for that page). In this scenario, we've now gone from using something in the lines of 44 bytes of storage to using more like ~25 bytes of storage (4 bytes in the CI structure for Chad, plus the sum of values) plus a little meta-data overhead - an approximately 38% savings of storage space in this case.
Now let's move on to the "page dictionary" concept - this is basically what it sounds like - a 'dictionary' of repeating column values on the given page is created in the CI structure, and again, pointers to the value in the dictionary are left in the original column value location. The page dictionary is generated on top of "column prefix" values, so you can most definitely end up with values in the dictionary that have pointers to column prefix values (this is by design). So, using the same column value samples as above with the Chad's, after the "column prefix" values are calculated and stored as mentioned above, you'd potentially end up with a page dictionary that contained the values '1ly', and '1wick', then the in-line row-values would ultimately look something like '<empty>', '2', '3', '<empty>', '1ster', '3', and '2'. In this case, we went from something in the lines of our original ~25 bytes of storage to around ~17 bytes of storage - another ~30+% savings.
First off, notice that in both cases I'm using approximate values - this is because the savings depends on lots of things (including a very complex format), and also on the amount of meta-data that is being stored to track the information. The good news is that we are also providing a few procedures with Sql 2008 that will allow you to estimate the savings you will be able to perceive for a given table/index, which is called "sp_estimate_data_compression_savings", which you'll be able to read extensively about in BOL.
Now, a couple of things to ensure we're clear on with PAGE level compression:
- Each page is distinct from all other pages - the CI structure for each page is built on and made up of only values/rows/etc. for that page, independent of other pages - Page dictionary data builds on column prefix data - so, you may have dictionary values that contain column prefix pointers - this is a good thing - Page level CI structures aren't built until a page is full (or pretty near full) - this makes sense in multiple ways if you think about it: 1) since you're using compression to conserve space, and space is allocated to the engine on page level boundaries anyhow, you wouldn't be conserving any space by compressing a 1/2 full page to a 1/4 full page (the db would still have a full page allocated); 2) it would be kind of tough to build the column prefix and page dictionary data without any data to analyze, yes?
Additionally, ROW and PAGE level compression are 2 different forms of compression that can be enabled/disabled at the table/index/partition level independently from other table/index/partitions - yes, this means that within a single index made up of multiple partitions, you can have some partitions that are ROW compressed, some that are PAGE compressed, and/or some that are not compressed at all, providing you all the power and flexibility to manage your data as you see fit.
Now for some of the other questions that I'm sure you're curious about:
1) How will this impact performance? Well, my favorite answer, that depends :-)...it's true though. Obviously, compression of any form (including this implementation) incurs CPU time - that doesn't necessarily mean however that your CPU utilization is guaranteed to go up. Compression in this manner will decrease the following: - I/O utilization - Memory utilization (less # of pages in cache, better memory utilization) - Potentially Page latching (i.e. the less # of pages in cache that exist, less cache that must be scanned, less # of latches to be taken) If the benefit from the savings of CPU cost for the above outweighs the upward cost of CPU utilization for "compression", the net effect would be lower CPU utilization - now this would entirely depend on your workload, system, etc. It's entirely possible that you currently have a totally CPU-bound workload due to true CPU-bound reasons, and you'll see increased CPU utilization with no offset (assuming you have no IO for example, or plenty of cache). Tests with your environment and your workload would need to be performed to see what benefit you'd receive. If you're workload is IO bound, or memory bound, then sacrificing a bit of CPU time to increase your overall throughput will be an easy decision. Additionally, the whole idea is to increase throughput, so if your throughput goes up, but your CPU also goes up a few % points, that's generally a win as well. 2) How much compression will I see / what will the compression ratio for my data be? Again, it depends - which is hopefully somewhat obvious from the discussions above. This answer depends entirely on your data - for ROW level compression, it depends entirely on the size of data, type of data, sparse-ness, etc. of the data. If you have lots of null values, or lots of small values compared to the storage size of the datatype, then you'll see very good compression ratios - if not, you won't see great compression ratios. For PAGE compression, in addition to the same considerations for ROW compression, it will additionally depend on the repetition of data, density/selectivity of data, etc. Again, you'll want to use the "sp_estimate_data_compression_savings" procedure to get an idea of what you might see.
Finally, no the manageability/supportability side, many tools include enhancements for insight (DBCC PAGE, new perfmon counters "CI Computations Succeeded" and "CI Computations Attempted", page counts in DMVs, etc.).
Well that covers the data compression component, how about backup compression? Well, this is a much easier story to tell - backup compression takes advantage of your typical standard on-disk compression methods. You most certainly can have a database that contains data compression in tables/index/partitions AND is also compressed via backup compression at backup time. Backup compression is implemented via a new option in the BACKUP syntax, and it is either ON or OFF (much was researched around implementing different levels of backup compression, but if you've ever worked with backup compression software that supports levels of compression, it's usually very easy to find the point at which the trade-off from compression achieved vs. CPU time spent is a no-brainer - this is why it's a simple ON/OFF option at this point). You can also set a new system level sp_configure option to define the default compression value (on/off) that will be the default anytime a BACKUP is taken without specifying the compression option. Backup compression is supported in ALL backup types, and can be levereged with log backups for improving things such as log shipping throughput across a WAN. A few points about backup compression:
- Backup compression will most likely be an Enterprise-only feature (i.e. in the Enterprise/Developer SKUs) - Regardless of the SKU support, ALL editions will support RESTORING compressed backups - Compression ratios will depend on your data as usual, but we've seen around 5:1 ratios on 'standard' data sets. Again, this will be entirely dependent on your data set - no guarantees with anything here, you'll need to test on your dataset and see what you can see. - Database mirroring has been enhanced to leverage compression algorithms for streaming data to the mirror (big improvements here)
Ok, so that hopefully covers data compression and backup compression a bit, by all means post with any questions/comments/etc. and I'll be sure to take note.
Finally, much of the information included herein was relayed to me from Sunil Agarwal (a PM for Sql Server here at Microsoft), so lots of thanks should go to him for the content - I'm sure if there's anything misstated herein, that it was due to my interpretation and not from his information for sure.
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
|
-
So, I've been getting quite a few pings about providing a list of new features to look for in Katmai (Sql 2008) and/or those that I'll be either blogging about and/or presenting on at the DC area groups I mentioned here: http://blogs.msdn.com/chadboyd/archive/2007/07/18/katmai-sql-server-2008-presentations-in-the-dc-area-novasql-caparea-net-cmap.aspx
Well, I put together a list below, and this isn't even everything (though it's a lot of the major stuff) - I even tried to categorize it (and some things may appear in multiple categories). I'll be blogging/presenting/etc. on many, many of these in the coming months, so stay tuned:
Security/Auditing - Transparent Data Encryption (encryption while data is 'still' on disk, transparent to applications) - External Key Management (Consolidation of key management, integration with external products) - Data Auditing (1st-class 'AUDIT' objects; DDL support; audit objects, principals, data, etc.; support for multiple logging targets) Availability/Reliability - Pluggable CPU support - Enhanced Database Mirroring (compression of mirror streams, enhanced performance, automatic page-level repair for principal/mirror) Performance - Data compression (easy to enable/disable online, more efficient data storage (this is NOT traditional data compression)) - Backup stream compression (server level control or backup statement control, all backup types) - Performance data collection (single, common framework for data collection, reporting, and storage/warehousing) - Improved Plan Guide support (plan freezing, pull plans directly from plan cache, SSMS integration, etc.) - Resource Governor (create pools and groups to govern, define classifications based on built-in functions, segment resource utilization amoung groups) Management - Policy-based management framework (manage via policies vs. scripts, enterprise-wide support, automated monitoring/enforcement, etc.) - Integrate with Microsoft System Center - Extended Events (high perf lightweight tracing infrastructure, NOT sql trace, integrated with ETW, unprecidented insight into goings-on) Development Enhancements - Improved datetime datatypes (100th nanosecond precision (7 digits past second), time-zone datetime offset, date only, time only) - HierarchyID datatype (hierarchical-aware data type, ORDPath values, built-in functions, methods, etc.) - Entity Data Model support (develop 'business entities' vs. tables, model complex relationships, retrieve entities vs. rows/columns) - LINQ - Sql Server Change Tracking (Change Data Capture, get 'diff' data changes WITHOUT a comparible value (i.e. datetime, timestamp, etc.)) - Table Valued Parameters - MERGE statement ('upsert' data, also includes deletion functionality) - Large UDT's (no more 8000 byte limit on CLR-based UDTs, no more 8000 byte limit for UDA's) - Spatial data (GEOMETRY and GEOGRAPHY data types, built-in spatial function support, spatial indexes) - XML enhancements (support for lax validation, office 12 support, xs:dateTime support, lists/union types, LET FLOWR support, etc.) - Inline initialization and compound assignment Service Broker - New UI and Tools for working with (add/drop/edit functionality within SSMS, Diag tools, ) - Conversation Priority (set message ordering, send/receive impact, 1-10 levels) Data Storage - Data compression (see above) - FILESTREAM attribute (get the 'best of both' functionality from BLOBs in the DB vs. BLOBs on filesystem, no more "to blob or not to blob") - Integrated Full Text Search (FTS fully integrated into DB engine, no external storage, no external service, more efficient and reliable costing) - Sparse columns (more efficient storage for 'wide' tables with many columns that repeat and don't contain data) - New index types (spatial indexes, hierarchical indexes, FILTERED indexes (indexes on filtered values within columns), etc.) Data Warehousing/ETL - Partitioned Table Parallelism (no more thread limit per partition) - Star Join support (no special syntax, optimizer based, full backward syntax support) - Data compression (see above) - Resource Governor (see above) - Persistent Lookups in SSIS (no more re-querying for lookup operators, cache lookups in multiple ways, persist lookups to disk) - Improved thread scheduling in SSIS (shared thread pool, pipeline parallelism) - Change Data Capture (see above) - MERGE statement (see above, great uses with slowly changing dimensions) - Scale-out analysis services (read-only storage supports multiple AS servers) - Subspace computations - New Tools for Cube design - Best Practice Design Alerting - Backup cubes with better scalability - Data-mining add-ins for Excell Reporting - IIS Agnostic Reporting Services Deployment (no IIS required to run RS any longer) - Rich-text support - Enhanced visualiztion (graphing) - New Word rendering (render reports to Microsoft Word) Deprecation - Many 'old' features ARE REMOVED/GONE (those that have been deprecated for some time - 60/65/70 compat modes, nolog / truncateonly syntax, etc.)
That's the list I've got for now...I'm sure I missed something or 2, and I'll continue to add to the list if I see something left out. I'm sure there are LOTS of questions around each of these, so I'll be continuing to blog on subsets of the features in-depth over the next few weeks/months. If there are specific questions, by all means post to this blog and/or email me via this blog, and I'll be sure to answer them directly or in a post covering the topic in question. Stay tuned!
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
|
-
In Sql 2008 (Katmai), a couple of the 'smaller' features that are currently in the latest CTP include inline variable initialization and compound assignment (something you App Dev folks have had for years). So, the following types of code now work in Sql 2008:
declare @d datetime = getdate(), @i int = 1, @s varchar(100) = 'test'; -- Show the values... select @d, @i, @s -- Increment i, append to s... select @i += 1, @s += 'ing'; -- Show the new values... select @i, @s;
These operators also work inline with DML statements and columns...for example, a simple UPDATE statement in a table called 'testTable' with a column called 'testColumn' where you wanted to increment the value of testColumn by 1 could be:
update testTable set testColumn += 1; Also would work with other columns...if another column called testColumn2 existed:
update testTable set testColumn += testColumn2;
I'll be posting other tidbits in Sql 2008 ongoing...enjoy!
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
|
-
In August, September, and October I'll be presenting on Sql Server 2008 (Katmai) at a variety of groups in the DC/Baltimore area. We'll be demoing lots of the new functionality with Sql 2008, including but not limited to Change Data Capture, MERGE, Table Valued Parameters, Declarative Management Framework, FILESTREAM, enhancements to Mirroring and Service Broker, etc. I'm hoping to also blog a bit on many of the new and coming changes over the next few months as well in addition to the typical Sql stuff...
My current known schedule is as follows:
August 7th, CMAP.NET group (http://www.cmap-online.org)
August 27th, NovaSql group, my "home" group :-) (http://www.novasql.com)
October 23rd, CAPEREA.NET Group (http://www.caparea.net)
Please join us at one of the coming meetings to learn about the next generation of Sql Server!
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
|
-
So you think you know SQL...Administration Part I
The 2nd in the series...this one revolves around Administration and Internals of Sql Server, and is the 1st of 2-4 groups of questions in just Administration and Internals...again, these are seperated into easy, moderate, and difficult groups. As always, if there are any questions/comments/etc., post a comment and we'll discuss it, enjoy!
For answers to these questions, go here when you're ready to see them:
Part I: http://www.mssqltips.com/tip.asp?tip1299 Part II: http://www.mssqltips.com/tip.asp?tip1281
Part III: http://www.mssqltips.com/tip.asp?tip1288
-------------------------------------------------------------------------------------------------- EASY --------------------------------------------------------------------------------------------------
Q: What can you do to guarantee any backup is valid?
Q: What is the name of the system database that contains all executable system objects (i.e. system stored procedure code, system view definitions, etc.)
Q: Can you update system tables in Sql Server, and if so, how?
Q: If a database goes down for whatever reason (assume data file is corrupt for example), and the database cannot be brought back online (i.e. you are in a recovery situation), what must be done first to ensure you can retrieve the latest data modifications (assume FULL recovery model)?
Q: What are the 3 recovery models SQL Server exposes?
Q: What 3 basic phases does database recovery go through during recovery, and in what order do they occur?
Q: Consider a scenario where you take a full backup, then some log backups, then a diff backup, some more log backups, then another diff, then some more log backups, then you crash - if all the diff backups are bad, when is the latest you can restore to? i.e. can you recover the database to the current point in time without using any of the diff backups?
FOLLOW-UP Q: Assume the same scenario, however instead of taking diff backups, everywhere that the diff backups were taken were instead FULL backups (so you have 3 full backups) - all the full backups are corrupt with the exception of the first full backup - can you recover the database to the current point in time here?
Q: What options/arguments can be specified in a BACKUP LOG statement to keep inactive log records from being truncated?
Q: What methods are available for removing fragmentation (any kind) on an index in Sql Server?
-------------------------------------------------------------------------------------------------- MODERATE: --------------------------------------------------------------------------------------------------
Q: Name as many operations as possible that cannot be performed on the model database.
Q: What page verification options are available in Sql Server and how do they work?
Q: What is the fundamental unit of storage in SQL Server data files and what is it’s size?
Q: What is the fundamental unit of storage in SQL Server log files and what is it’s size?
Q: Name as many different types of Pages in SQL Server as possible
Q: Is the sequence of data rows on given data/index page guaranteed to physically match the logical order of rows (i.e. by index key, row id, etc.)?
Q: What is the fundamental unit in which space and allocation is managed within Sql Server and what is it’s size?
Q: What are the primary differences between an index reorganization and an index rebuild?
Q: If you need to REBUILD a non-clustered index that is 10gb in size and have 5gb of free data-file space available with no room to grow the data file(s), how can you accomplish the task?
Q: During an index reorganization operation, if the index spans multiple files, will pages be allowed to migrate between files?
-------------------------------------------------------------------------------------------------- DIFFICULT: --------------------------------------------------------------------------------------------------
Q: Explain the difference between a fully-logged and minimally-logged operation
Q: What special type of page within Sql Server is responsible for tracking (via a simple bitmap) extents that have been modified by bulk-logged operations since the last "BACKUP LOG" statement?
Q: What special type of page within Sql Server is responsible for tracking (via a simple bitmap) extents that have been modified since the last "BACKUP DATABASE" statement?
Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
|
-
The first in hopefully a fairly long series of posts on interview type questions/scenarios/etc., this one will cover Concurrency and Locking related Q/A (simple Q/A only at this point). The questions are separated into easy, moderate, and difficult groups. If you have any comments, by all means please post and lets discuss!
Typically, answers and content can be found in Books Online and/or MSDN...in this case, Craig Freedman's blog also contributed to the answers/scripts (http://blogs.msdn.com/craigfr/default.aspx).
For answers to these questions, go here when you're ready to see them: http://www.mssqltips.com/tip.asp?tip1253.
Additionally, see the attached script for some demo-ing of concurrency behaviors in different scenarios...
EASY:
Q: What isolation levels will provide completely read-consistent views of a database to all transactions?
Q: Within the READ_COMMITTED isolation level, during a read operation how long are locks held/retained for?
Q: Within the REPEATABLE_READ and SERIALIZABLE isolation levels, during a read operation and assuming row-level locking, how long are locks held/retained for?
Q: Can locks ever be de-escalated?
MODERATE:
Q: What are the different types of lock modes in Sql Server?
BONUS Q: Can you explain scenarios where each type of lock would be taken?
Q: What is lock escalation and what triggers it?
Q: Name as many of the lockable resources as possible in Sql Server 2005?
Q: What requirements must be met for a BULK-UPDATE lock to be granted, and what benefit do they serve?
Q: What is the least restrictive type of lock? What is the most restrictive?
Q: What is a deadlock and how is it different from a standard block situation?
Q: Which 2 isolation levels support optimistic/row-versioned-based concurrency control?
Q: What database options must be set to allow the use of optimistic models?
Q: What is the size of a lock structure?
DIFFICULT:
Q: Describe the differences between the pessimistic SERIALIZABLE model and the optimistic SNAPSHOT model in terms of transactional isolation (i.e., not the concurrency differences, but instead how the exact same transactional modifications may result in different final outcomes).
Q: In what circumstances will you see key-range locks, and what are they meant to protect against?
Q: Explain the purpose of INTENT locks
Q: Can deadlocks occur on resources other than database object?
BONUS Q: What different types of resources that can deadlock?
| |
|