Welcome to MSDN Blogs Sign in | Join | Help

Using time zone data in SQL Server 2008

 

In SQL Server 2008 Microsoft has introduced a number of new date and time data types.  One of these is the datetimeoffset data type. This data type includes an offset from UTC time as well as the datetime value and ensures that the datetime can be retrieved in UTC or a particular timezone based on this offset.  There are also new functions to allow for conversions between different time zones using the new function SWITCHOFFSET(). 

 

An example from SQL Server 2008 Books On Line (BOL):

CREATE TABLE dbo.test

    (

    ColDatetimeoffset datetimeoffset

    );

GO

INSERT INTO dbo.test

VALUES ('1998-09-20 7:45:50.71345 -5:00');

GO

SELECT SWITCHOFFSET (ColDatetimeoffset, '-08:00')

FROM dbo.test;

GO

--Returns: 1998-09-20 04:45:50.7134500 -08:00

SELECT ColDatetimeoffset

FROM dbo.test;

--Returns: 1998-09-20 07:45:50.7134500 -05:00

 

One of the most common questions we are asked is why we use the offset and not a timezone name.  A timezone name is much easier to remember than an offset, and most people do not know an offset without looking it up, making queries more difficult. 

 

Unfortunately, there is no current international standard authority for timezone names and values.  Each system needs to use a system of their own choosing, and until there is an international standard, it is not feasible to try to have SQL Server provide one, and would ultimately cause more problems than it would solve.  However, there are a couple of common systems that are well recognized around the world.  One is the Dynamic timezone data that is stored in the Windows Vista registry.  This data can be read from the registry into a file, which is then imported into SQL Server.


Another cross-platform standard is the public domain Olson Timezone database (
http://www.twinsun.com/tz/tz-link.htm). There are many public domain programs for extracting a time zone from these files, but at this time most are programmatic solutions.   So a programmatic function could be written in the CLR, but to provide full database functionality and query-ability, a table is needed.

 

The programmatic solutions take a date, then apply the many different rules that determine when a zone is in daylight savings time and when it is not.  However, there are also historical changes.  Daylight savings time changed in 2007, meaning that determining what the offset for a particular zone is at a particular time is different depending on the year.  Then there are times when leap seconds need to be added.  Therefore any data-driven solution must have rows that have valid time ranges as well.

 

The approach recommended here is to take one of the DLLs found on the web and instead of providing a programmatic solution around a specific date – to write all of the rows out into a database as an offset with the valid ranges.  Currently this example uses the Windows standard naming conventions for timezones, with a mapping to the Olson timezone names, but you could easily add other names in other languages as well.

 

Working with timezones is very complex, and the following is a suggestion only for some ideas on how to use time zone data more effectively.   This is an example program (no guarantees) that uses a .NET library and writes the data from the Olson tz files in table format, and which can then be imported into SQL Server.    The .NET timezone library can be found at http://www.babiej.demon.nl/Tz4Net/main.htm and they request a small donation.   

 

Here is some sample code to write to files the timezone data (no guarantees – does not include leap seconds):

 

            StreamWriter sr = File.CreateText(@"D:\TZMapping.txt");

            StreamWriter tr = File.CreateText(@"D:\TZZones.txt");

 

            string[] zoneNames = OlsonTimeZone.AllNames;

            sr.WriteLine("ID\tDaylightName\tStandardName\tRawUtcOffset\tOffsetSeconds\tWin32Id");

            tr.WriteLine("ID\tTransitionStart\tTransitionEnd\tDeltaSeconds\tDST");

 

            for (int i = 0; i < zoneNames.Length; i++)

            {

                OlsonTimeZone tz = OlsonTimeZone.GetInstanceFromOlsonName(zoneNames[i].ToString());

                sr.Write(i.ToString() + "\t");

                sr.Write(tz.DaylightName.Trim() + "\t");

                sr.Write(tz.StandardName.Trim() + "\t");

                sr.Write(tz.RawUtcOffset.ToString() + "\t");

                sr.Write(tz.RawUtcOffset.TotalSeconds.ToString() + "\t");

                sr.WriteLine(tz.Win32Id == null ? "" : tz.Win32Id.Trim());

 

                DaylightTime[] times = tz.AllTimeChanges;

                for (int j = 0; j < times.Length; j++)

                {

                    tr.Write(i.ToString() + "\t");

                    tr.Write(times[j].Start.ToString("yyyy-MM-dd HH:mm:ss") + "\t");

                    tr.Write(times[j].End.ToString("yyyy-MM-dd HH:mm:ss") + "\t");

                    tr.Write(times[j] is StandardTime ? "0\t" :times[j].Delta.TotalSeconds.ToString() + "\t");

                    tr.WriteLine(times[j] is StandardTime ? false.ToString() : true.ToString() );

                }

            }

            tr.WriteLine();

            sr.WriteLine();

            tr.Close();

            sr.Close();

 

Import the TZMapping file, which will become the parent table, with the ID as the primary key.   Your table structure might look like this:

  

 

 

 Please note:  If you use the Flat File Datasource in the Import Data Wizard in SQL Server 2008 Management Studio, you will need to open the Advanced Tab to set the source OutPutColumnWidth to greater than the default of 50.   Then import the TZZones file, which will become the child table with the ID, TransitionStart, and TransitionEnd as the composite primary key with a foreign key reference to the TZMapping table.  The TZZones table includes historical timezone data.    Joining these new tables into your data into queries now allows for queries that include standard names, Windows IDs, etc.

 

For example, offsets can now be retrieved by a preferred name:

 

select UtcOffset from TZmapping where StandardName = 'US/Pacific (PST)'

 

The following two queries return different offset amounts for the same day in two different years.  This is because the US changed daylight savings time, and the date in March now falls into daylight savings when it did not before.

 

 

select (dbo.TZMapping.OffsetSeconds + dbo.TZZones.DeltaSeconds)/3600

from dbo.TZMapping 

join dbo.TZZones

on dbo.TZMapping.id = dbo.TZZones.id

where dbo.TZMapping.StandardName = 'America/Los_Angeles (PST)'

and '2006-03-15'

between dbo.TZZones.TransitionStart

and dbo.TZZones.TransitionEnd

 

 

 

select (dbo.TZMapping.OffsetSeconds + dbo.TZZones.DeltaSeconds)/3600

from dbo.TZMapping 

join dbo.TZZones

on dbo.TZMapping.id = dbo.TZZones.id

where dbo.TZMapping.StandardName = 'America/Los_Angeles (PST)'

and '2007-03-15'

between dbo.TZZones.TransitionStart

and dbo.TZZones.TransitionEnd

 

 

Again, timezones are a complex area and each application will need to address how you are going to handle time zone data to make programs more user friendly.  This is just one small example.

Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv | 5 Comments

Increase your SQL Server performance by replacing cursors with set operations

You have probably heard many times, from different sources, that as a best practice; avoid using TSQL cursors.

During a recent visit to a partner we ran into a common cursor case, which I wanted to use as an example to demonstrate why you should avoid TSQL cursors in most cases, and how to convert cursor logic to simple set join operations. Now there are certain scenarios where using a cursor makes sense. For example, a cursor is ideal for row by row processing that can’t be accomplished by set based operations. A cursor is flexible in that it provides a window, or subset, of data and that allows manipulation of the data in various ways. Study carefully what you want to achieve on case by case basis before using a cursor. Keep in mind SQL Server, as a modern RDBMS system, performs much better with set operations.

Here is simplified version of a real cursor that was used to update a big table with over 200 million rows.

DECLARE @EntityId Varchar(16)

DECLARE @PerfId Varchar(16)

DECLARE @BaseId Varchar(16)

DECLARE @UpdateStatus Int

 

DECLARE outerCursor CURSOR FOR

SELECT EntityId, BaseId

FROM outerTable

--Returns 204,000 rows

 

OPEN outerCursor

FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId

 

WHILE @@FETCH_STATUS = 0

BEGIN

            DECLARE innerCursor CURSOR FOR

            SELECT PRFMR_ID

            FROM innerTable

            WHERE ENTY_ID = @BaseId

                       

            OPEN innerCursor

            FETCH NEXT FROM innerCursor INTO @PerfId

            SET @UpdateStatus = @@FETCH_STATUS

           

            WHILE @UpdateStatus = 0

            BEGIN

                  UPDATE 200MilRowTable

                  SET ENTY_ID = @EntityId

                  WHERE PRFMR_ID = @PerfId

           

                  FETCH NEXT FROM innerCursor INTO @PerfId

                  SET @UpdateStatus = @@FETCH_STATUS

            END

           

            CLOSE innerCursor

            DEALLOCATE innerCursor --clean up inner cursor

                       

            FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId

END

 

CLOSE outerCursor

DEALLOCATE outerCursor –cleanup outer cursor

 

 

You might notice that this is a nested cursor with 204,000 loops in total for outerCursor. The innerTable has 10 million rows but innerCursor varies in number of loops depending on @BaseId of outerCursor. When I arrived at the customer this cursor had been running for over a day. The developer was “hoping” that it would finish soon given another day or two. The problem was nobody knew for sure how much time this thing would need to complete. Well, we can find out how much progress it has made so far to make an educated guess:

SELECT execution_count, st.text

FROM sys.dm_exec_query_stats as qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st

WHERE st.text like '%FETCH NEXT FROM outerCursor INTO%'

 

This would tell us how many times the outer cursor has looped already. It returned 107. That’s only around 0.5% (107/204,000) after 30 hours of running. If the trend were to continue, the cursor would need another 8+ months!!!

A rewrite of the batch to take advantage of set operations is as simple as this:

SELECT i.PRFMR_ID, o.EntityId INTO #tempTable

FROM innerTable i join outerTable o on i.ENTY_ID = o.BaseId

Go

UPDATE 200MilRowTable

SET m.ENTY_ID = t.EntityId

FROM 200MilRowTable m join #tempTable t on m.PRFMR_ID = t.PRFMR_ID

Go

--note this is only one of a few ways to rewrite.

In this particular case, “SELECT INTO” is minimally logged under simple recovery mode. The two statement approach makes it easier to understand the conversion logic.

This batch took approximately 17 hours to complete. Between the statement, I also put the database into simple recovery mode and added appropriate indexes to the temp table. I also dropped indexes from 200MilRowTable that touched “ENTY_ID” to speed this up. Adding indexes back took another 7 hours. The total time was approximately 24 hours, which is just a small fraction of the original cursor batch. I need to point out that the non-cursor batch uses more resources since the UPDATE now spawns multiple threads to process parallely. Remember our goal here is to make this finish faster not worrying about how much resources it consumes.

Note: this might not be a perfect example because the nested cursor is magnifying the slow performance. 

However, the bottom line is; aviod cursors if possible and use joins / set operations whenever you can.

Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv | 1 Comments

Appending Data Using SQL 2008 Filestream

SQL Server 2008 has a new feature called Filestream, which allows you to save large binary files in the file system, instead of in the database.  This is targeted directly at the scenario that many document management and web applications have today where they save some metadata in the database, with a column holding the path to the actual file.  However, there is no transactional context between the two; therefore, the data in the database can be changed to no longer point to the actual location, or the file moved without the pointer being updated. Users want to be able to have documents, pictures, video, etc. in the file system with the streaming support the file system provides, but have the metadata be consistent.  This is exactly what the Filestream feature is targeted for.

 

When a large binary files gets ‘updated’ most users/applications will replace the file in its entirety (you do not usually update part of an image through the database). Filestream does not currently support in-place updates.  Therefore an update to a column with the filestream attribute is implemented by creating a new zero-byte file, which then has the entire new data value written to it.  When the update is committed, the file pointer is then changed to point to the new file, leaving the old file to be deleted at garbage collection time.  This happens at a checkpoint for simple recovery, and at a backup or log backup.

 

Code that updates large value data types (varchar(max), nvarchar(max), and varbinary(max)) may use the .Write(offset) UPDATE clause; however since an in-place update of a column with the Filestream attribute is not supported, the .Write(offset) UPDATE clause will error out. Therefore, even though filestream is only a storage attribute, the update code does not work against both systems, requiring knowledge of underlying storage mechanism.

 

Many of the update actions taken against large data types are actually appends.  And many logging scenarios continuously append to an ever-growing file.  Filestream offers an option for this case, to avoid the scenario where the original data is pulled to the client, data appended, and then a new file written with the combined data – leaving the old file to be garbage collected later.

 

 In order to avoid this round trip, a device FS control (FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT) can be issued to trigger a server-side copy of the old content.  This avoids the data moving over the network. This control is not yet documented; note that the DeviceIoControl ControlCode constant that is needed is: 599392.  This will be added to the Filestream API documentation at a future date.

 

Example code:

[DllImport(“kernel32.dll”, SetLastError = true)]

Static extern bool DeviceIoControl(

IntPtr hDevice,

Uint dwIoControlCode,

IntPter lpInBuffer,

Uint ninBufferSize,

[out] IntPtr lpOutBuffer,

Uiny nOutBufferSize,

Ref uint lpBytesReturned,

IntPtr lpOverlapped );

 

                IntPtr hFile = IntPtr.Zero;

Uint bytesReturned;

                // hFile = mySafeHafle.DangereousGetHandle();  or something similar

               

Bool result = DeviceIoControl ( hFile, 599392, IntPtr.Zero, 0, IntPtr.Zero, 0, ref lpBytesReturned, InPtr.Zero );

 

As Filestream is minimally logged, if there is no activity other than filestream activity, there is usually not enough activity to trigger a checkpoint, and the old files will not get deleted.  In non-simple recovery mode, the log will not increase, but the db will grow until those files get garbage collected.  This will occur in a background thread, and can impact the server through this activity.

 

Therefore, it is a best practice to manually issue that checkpoint if in simple recovery mode; and to maintain an optimal transaction log backup to avoid the file growth.

 

Other quick notes:

 

Filestream deletes are significantly faster than blob deletes.

If an application needs to commonly delete large data values, filestream will be more scalable as the metadata is changed, then the file garbage collected asynchronously.  For instance, the delete of a 6GB nvarbinary(max) file that took 6 seconds, happens in milliseconds with a filestream delete.

 

Filestream maximum size is different for SNAC 9 and SNAC 10.

Files using SNAC9 have a maximum size of 2GB.  SNAC 10 supports unlimited size.

 

Note:  Altering an existing column without a filestream attribute to having the filestream attribute is not supported.

This means that moving data from a blob storage mechanism to filestream mechanism requires that the data be copied to the new datatype with the filestream attribute.

 

 

Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv | 1 Comments

Now is time to try SQL Server 2008

Did you know that SQL Server 2008 is just around the corner, and that the SQL development team has just publicly released the latest Community Technology Preview (CTP 6) version?

This latest beta version of SQL 2008 includes many exciting new features like filtered indexes, sparse columns, data compression, Microsoft sync, integrated full text search, Report rendering in Microsoft Word, Data visualization enhancements, extensive Books Online documentation, the first feature pack, and many, many more.

Now is great time to try it for yourself! Download SQL Server 2008 here .

Also, the SQLCAT teams (including us on the SQLCAT ISV team) have been busy working with our customers and partners implementing pre-release (CTP) builds. We have many lessons learned to share with you; these can be found here

Additionally, we have a number of great best practices papers coming up:

· Index Defragmentation Best Practices

· Best Practices for using Resource Governor with a mixed (OLTP and Reporting) workload

· Data Compression: Design and Implementation Best Practices, Performance Impact

· XEvents Toolkit

· SQL 2008 Usage-Based Optimization Efficacy (subject to name change)

· Scale-Out Querying with Analysis Services 2008

· Best practices for Data Warehousing in SQL Server 2008

 

 

Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv | 1 Comments

Detecting Overlapping Indexes in SQL Server 2005

 

When SQL Server has an optimal index that satisfies the search predicates of a query the optimizer performs an index SEEK operation as opposed to an index (or table) scan to retrieve the required rows; this is desirable.  Based on this, one may be led to believe that having an index for every possible query predicate set would result in all the queries executing optimally.  While true, one has to keep in mind that the indexes need to be maintained when the underlying table data in the column included in the index changes, which amounts to overhead for the database engine.  So as you may guess, there are advantages of having indexes, but having too many can result in excessive overhead. This implies that you need to carefully evaluate the pros and cons before creating indexes.

 

In your first evaluation scenario you clearly want to avoid the case of having overlapping indexes as there is no additional value that an overlapping index provides.  For example, consider a table ‘TabA’ and its three associated indexes created with the following definitions.

 

CREATE TABLE TabA

( Col1  INT, Col2  INT, Col3  INT, Col4 INT );

GO

 

CREATE INDEX idx1 ON TabA ( Col1, Col2, Col3 );

CREATE INDEX idx2 ON TabA ( Col1, Col2 );

CREATE INDEX idx3 ON TabA ( Col1 DESC, Col2 DESC );

GO

 

In the table structure above the index idx1 is a superset (overlap) of the index idx2, and therefore redundant.  As can be expected any query that needs to perform a search on Col1 and Col2 could use index idx1 just as well as the index idx2 as seen in the graphical query plan below.

 

Figure1 Here

 

 

Such overlapping indexes are often a result of multiple developers working on the same product and not evaluating and understanding the existing schema before making additions.  Once created, detecting such overlapping indexes in a database can often be a laborious task requiring detailed analysis.  More importantly, most DBAs do not like to disable or drop indexes because they are not certain of the queries they were created to help with and fear the side effects the action may have.

 

The script below uses the new system catalog introduced in SQL Server 2005 to report all duplicate indexes in the current database context.

 

CREATE FUNCTION dbo.INDEX_COL_PROPERTIES (@TabName nvarchar(128), @IndexId INT, @ColId INT)

RETURNS INT

WITH EXECUTE AS CALLER

AS

BEGIN

      DECLARE @IsDescending INT;

      SELECT @IsDescending = is_descending_key

      FROM sys.index_columns SYSIDXCOLS

      WHERE OBJECT_ID(@TabName) = SYSIDXCOLS.object_id

      AND @IndexId = SYSIDXCOLS.index_id

      AND @ColId = SYSIDXCOLS.key_ordinal;

 

      -- Return the value of @IsDescending as the property

      RETURN(@IsDescending);

END;

GO

 

-- Find Duplicate Indexes in SQL Server Database

CREATE VIEW IndexList_VW AS

SELECT

      SYSOBJ.[name] AS TableName,

      SYSIDX.[name] AS IndexName,

      SYSIDX.[is_unique] AS IndexIsUnique,

      SYSIDX.[type_desc] AS IndexType,

      SYSIDX.[is_disabled] AS IsDisabled,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 1 ) AS Column1,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 2 ) AS Column2,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 3 ) AS Column3,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 4 ) AS Column4,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 5 ) AS Column5,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 6 ) AS Column6,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 7 ) AS Column7,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 8 ) AS Column8,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 9 ) AS Column9,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 10 ) AS Column10,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 11 ) AS Column11,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 12 ) AS Column12,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 13 ) AS Column13,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 14 ) AS Column14,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 15 ) AS Column15,

      INDEX_COL( SYSOBJ.[name], SYSIDX.index_id, 16 ) AS Column16,

      dbo.INDEX_COL_PROPERTIES( SYSOBJ.[name], SYSIDX.index_id, 1 ) AS Column1_Prop,

      dbo.INDEX_COL_PROPERTIES