|
|
-
Okay, it's time to come clean. As a technical evangelist, particularly one that focuses on driving platform adoption, I have to be familiar with an ever-increasing array of technologies. Compound that with the aging process, vast quantities of fried brain cells from my misspent youth and various and sundry character flaws (such as the over use of superlatives), and the result is that I don't always have a complete understanding of everything of which I speak. Shocking, I know... Usually I do a pretty good job of motoring past these soft spots and steering the conversation into more familiar territory before whomever I'm talking with flips my bozo bit. But sometimes there's just no way out and I get caught. I had just such an experience last week, and in the interest of full disclosure (and laughs) I thought I'd come clean. So I'm doing this podcast for Greg Low of Solid Quality Mentors in Australia on SQL Server 2008 developer enhancements. Now Greg's a pretty sharp guy, and I know going in that he's not going to learn much from me, but what the heck its a great chance to reach out to the developer community and get people excited about SQL Server 2008. The interview is going pretty well, and Greg's doing a great job of filling in the weak spots, until we reach the end and start talking about spatial support in SQL Server 2008. Then I did it. I used the word "tessellation". As soon as the word came out of my mouth I regretted it. Greg perked up and asked me to define it, at which point I fell on my sword and said I had absolutely no clue. DECLARE @rogerBozoBit [bit] = 1; We both had a chuckle, and he offered to remove it from the podcast. I told him I thought he should leave it in for laughs, but he removed it after all. Thanks Greg for protecting my dignity, but I have none. Besides, I didn't want to loose the opportunity to actually define the word so that other fast talking evangelists won't fall into the same trap. There's an excellent discussion of the topic in the SQL Server 2008 books online that I will try to summarize here. Generally speaking, tessellation is a technique used to simplify the manipulation of complex structures by forming the structure out of a bunch of polygons. In the context of SQL Server 2008 spatial support, tessellation comes into play when you are talking about spatial indexes. By all accounts we've provided a pretty powerful and flexible indexing capability for spatial data that decomposes an indexed space into a grid hierarchy. When searching for a match, SQL Server uses tessellation to fit an instance of spatial data into this grid hierarchy. In this case, where a shape is being compared to a grid hierarchy, SQL Server uses various tessellation rules to see how many cells in the the grid hierarchy are touched by the current shape being compared to the index, and how deep down the grid hierarchy the search needs to go. SQL Server provides a couple of different tessellation options when creating a spatial index which map directly to usage of the geometry vs. the geography type. So there, I've grossly oversimplified tessellation so the term can be butchered by evangelists worldwide.
|
-
In our last series of posts Zach and I talked about using SQL Server's new FILESTREAM support to store, manage and retrieve media in SQL Server to enable rich user experiences. We've been at it again this week trying to get some traction with the new spatial data types and query semantics in SQL Server 2008. This new functionality makes it possible to integrate location awareness into just about any type of application. What could be more compelling than allowing a user to interact with information that is relevant to locations that they care about, like home, work, school or vacation destinations? This technology has long been the domain of sophisticated GIS applications, but by integrating spatial capabilities into SQL Server 2008, Microsoft is making location awareness available to the average database developer. Zach has some experience with GIS applications, but I have none at all. So the challenge was whether a couple of database geeks could figure out how to leverage this spatial stuff in an application in a couple of days. I'm happy to say that we made great progress and we'd like to share some of our experiences in hopes that it will demystify the spatial functionality in SQL Server 2008. Rather than regurgitate the same stuff you can read in books online when CTP5 comes out, I'm going to try to focus on some of the key discoveries we made in hopes that it might save you some time. The scenario we focused on was "geo-locating" the media we stored in SQL Server 2008 in our previous FILESTREAM sample. By tagging these images and videos with some location information, it's possible to do spatial queries that retrieve all the pictures that were taken in a particular geographic area. There are a ton of cool applications for this kind of functionality, I'll leave it to your imagination to think of some. I figured a logical place to start was by simply adding a geography column to our eventMedia table like so: CREATE TABLE [dbo].[eventMedia] ( [mediaId] [bigint] NOT NULL IDENTITY PRIMARY KEY, [mediaKey] [uniqueidentifier] NOT NULL ROWGUIDCOL UNIQUE, [title] [nvarchar](256) NOT NULL, [dateCreated] [datetime] NOT NULL DEFAULT(GETDATE()), [createdBy] [nvarchar](256) NOT NULL, [fileName] [nvarchar](256) NOT NULL, [contentType] [nvarchar](256) NOT NULL, [location] [geography] NOT NULL, [binaryFile] [varbinary](max) FILESTREAM DEFAULT(0x)); GO Next I started thinking about how to get shapes into our new geography column. Turns out it was a whole lot easier than I expected. Under the covers, the geography type is implemented as a UDT. That means geography is not a scalar type like an integer or date, rather its an object with methods and properties. This is a great example of how the .NET Framework Common Language Runtime integration introduced in SQL Server 2005 is now enabling a whole new generation of features and capabilities in SQL Server 2008. Everyone knows how to assign values to scalar types, but how do you create an instance of an object in Transact-SQL? In the case of the geography type there are a variety of ways, but the simplest that I found is the Parse() method. This takes a string representation of a shape as an argument. The format of that string is defined by the Well Known Text (WKT) specification published by the OGC. For example, here's some Transact-SQL code that creates an instance of a geometry object and initializes its value to a POINT shape: DECLARE @g [geography]; SET @g = geography::Parse('POINT (39.96296 -75.16834)'); PRINT @g.ToString(); In this example the point shape's coordinates are the latitude and longitude of my old apartment building in Philadelphia. It's worth mentioning that the geography type utilizes a default coordinate system known as WGS 84 which is used by most GPS systems. Once you've initialized a geography instance, there's a ton of methods and properties you can use to interrogate them, compare them, and create new instances from existing ones. Most of these capabilities are defined in the OGC's Simple Features for SQL Specification. The Parse() method is great because it can deal with all of the different types of shapes, allowing me to code a single INSERT statement that can load any kind of valid shape into the geography column. Here's some C# code we used to insert geography data into our our eventMedia table: SqlCommand cmd = new SqlCommand("INSERT [dbo].[eventMedia]([mediaKey], [title], [createdBy], [fileName], [contentType], [location]) VALUES( @mediaKey, @title, @createdBy, @fileName, @contentType, geography::Parse(@location));", cxn, txn); cmd.Parameters.Add("@mediaKey", SqlDbType.UniqueIdentifier).Value = mediaKey; cmd.Parameters.Add("@title", SqlDbType.NVarChar, 256).Value = title; cmd.Parameters.Add("@createdBy", SqlDbType.NVarChar, 256).Value = createdBy; cmd.Parameters.Add("@fileName", SqlDbType.NVarChar, 256).Value = sourceFileName; cmd.Parameters.Add("@contentType", SqlDbType.NVarChar, 256).Value = contentType; cmd.Parameters.Add("@location", SqlDbType.NVarChar).Value = location; Next we started thinking about what kind of shapes to load into our new geography column. My first inclination was to start loading a bunch of map data. Zach and I really didn't want to write a bunch of graphics code to draw maps, there are great services out there like Virtual Earth that draw maps way better than we could. We quickly realized that there really wasn't any need to load a whole map, all we needed to demonstrate "geo-locating" our media files was to create some shapes that represent the map and regions that we are interested in, then tag each of our media files with a specific location within that map. To keep things simple, we decided to store those shapes in the same table as our media files so we could query them all in the same place. Our demo design uses the following "types" of rows in the eventMedia table: - Map row: There is only one of these. It's a rectangle represented as a geography POLYGON instance whose four points are the lat/long coordinates of our total map area. For good measure we store an image of the map region from a map drawn in Microsoft Streets & Trips. I simply drew a rectangle on top of an existing map in Streets and Trips, then used the location sensor tool in Streets & Trips to get the lat/long coordinates of the four corners.
- Region rows: There are three of these. They are irregular closed polygons that define regions of interest contained within the overall map. Like the map, we store an image of the region I created in Streets & Trips along with the lat/long coordinates for each of the points in the polygons.
- Media rows: There are several of these, which represent pictures stored as varbinary(max) FILESTREAM instances. We tag them with location information using a geography POINT shape whose lat/long coordinates correspond to the location where the picture was taken. By comparing these points to our regions, the demo shows how you can retrieve media based upon its location within a map or region of interest.
For the demo we decided to focus on the Pro Cycling Tour International Championship in Philadelphia, PA. It's relatively self-contained and has some cool sections that correspond neatly to our concept of "regions of interest". Map and Region Rows | Region | Map | Race Area | Wall Area | Parkway Area | | Description | This is the total map area we will deal with in this sample. This helps constrain things to a specific geography in Philadelphia rather than the whole earth. | The entire Pro Cycling Championship race route is contained inside this irregular polygon. | One of the most interesting parts of the race route. Contains the infamous 17% grade climb known as the "Manayunk Wall". | The race starts and finishes here on the Benjamin Franklin Parkway. | | JPEG Image: |  |  |  | | | WKT Shape Def | POLYGON(( 39.95372 -75.23177, 39.95372 -75.16456, 40.03618 -75.16456, 40.03618 -75.23177, 39.95372 -75.23177)) | POLYGON(( 39.95601 -75.17031, 39.95778 -75.16786, 39.97789 -75.18870, 39.99237 -75.18521, 40.00677 -75.18603, 40.01136 -75.19922, 40.03142 -75.21746, 40.02586 -75.22534, 40.01430 -75.21052, 40.00634 -75.19192, 39.99570 -75.19248, 39.98374 -75.20526, 39.97704 -75.19437, 39.96920 -75.19087, 39.95601 -75.17031)) | POLYGON(( 40.02387 -75.22280, 40.02810 -75.21442, 40.03142 -75.21746, 40.02586 -75.22534, 40.02387 -75.22280))
| POLYGON(( 39.95601 -75.17031, 39.95778 -75.16786, 39.96874 -75.17921, 39.96512 -75.18441, 39.95601 -75.17031)) | Media Rows | Media | Parkway Photo | Wall Photo (Bottom) | Wall Photo (Top) | Logan Circle Photo | | Description | Taken on the Ben Franklin Parkway near the finish line. | This shot was taken from the bottom of the Manayunk Wall. | This shot was taken at the top of the Manayunk Wall. | This is another shot from the Benjamin Franklin Parkway. | | JPEG Image: | 
| | | | | WKT Shape Def | POINT (39.96045 -75.17396) | POINT (40.02593 -75.22457) | POINT (40.02920 -75.21986)
| POINT (39.95813 -75.17052) | One interesting thing I ran into when creating the geography polygons was that you have to define the polygon's points in a counter-clockwise fashion to have the proper "ring orientation". If you define geography points in a clockwise fashion you will get the following error: Msg 6522, Level 16, State 1, Line 2 A .NET Framework error occurred during execution of user defined routine or aggregate 'geography': Microsoft.SqlServer.Types.GLArgumentException: 24205: The specified input does not represent a valid geography instance because it exceeds a single hemisphere. Each geography instance must fit inside a single hemisphere. A common reason for this error is that a polygon has the wrong ring orientation. Microsoft.SqlServer.Types.GLArgumentException: at Microsoft.SqlServer.Types.GLNativeMethods.ThrowExceptionForHr(GL_HResult errorCode) at Microsoft.SqlServer.Types.GLNativeMethods.GeodeticIsValid(GeometryData g) at Microsoft.SqlServer.Types.SqlGeography.IsValidExpensive() at Microsoft.SqlServer.Types.SqlGeography.ConstructGeographyFromUserInput(GeometryData g, Int32 srid) at Microsoft.SqlServer.Types.SqlGeography.STGeomFromText(SqlChars geometryTaggedText, Int32 srid) at Microsoft.SqlServer.Types.SqlGeography.Parse(SqlString s). I don't confess to understand this 100%, but Ed Katibah told me its kind of like turning the world inside-out. Ed also mentioned that sister data type geometry polygons don't have this limitation and can be loaded in either order (clockwise or counter-clockwise). Without getting into too much detail here, that led to a conversation about what the right type was to use for our demo. I had thought that geometry was limited to standard x and y coordinates and wouldn't handle lat/long coordinates, but was surprised to learn that it does. The basic difference between the two is that geography types account for the curvature of the earth, while geometry types don't. That means for relatively small surface areas they are roughly equivalent, but if you are dealing with larger surface areas you will definitely want to stick with the geography type. Another important thing to remember about polygons is that you need to "close" them with a final point that is the same as the first point in the polygon. If you don't do this you get the following error: Msg 6522, Level 16, State 1, Line 2 A .NET Framework error occurred during execution of user defined routine or aggregate 'geography': System.FormatException: 24119: The Polygon input is not valid because the start and end points of the exterior ring are not the same. Each ring of a polygon must have the same start and end points. System.FormatException: at Microsoft.SqlServer.Types.GeometryDataBuilder.EndFigure() at Microsoft.SqlServer.Types.OpenGisWktReader.ParseLineStringText(FigureAttributes attributes) at Microsoft.SqlServer.Types.OpenGisWktReader.ParsePolygonText() at Microsoft.SqlServer.Types.OpenGisWktReader.ParsePolygonTaggedText() at Microsoft.SqlServer.Types.OpenGisWktReader.ParseGeometryTaggedText() at Microsoft.SqlServer.Types.OpenGisWktReader.ReadGeometry() at Microsoft.SqlServer.Types.SqlGeography.STGeomFromText(SqlChars geometryTaggedText, Int32 srid) at Microsoft.SqlServer.Types.SqlGeography.Parse(SqlString s). This points out one of my few beefs about the CLR implementation in SQL Server, and that is that CLR exceptions always generate the same Transact-SQL error number (6522). Somebody on the CLR team once explained to me why it has to be this way, I think it has to do with limitations of the existing Transact-SQL error handling framework and RAISERROR in comparison to the extremely rich exception handling capabilities in the .NET Framework. I'm sure my buddy Bob Beauchemin would be able to explain this in gory detail. One last tip about polygons, and any other shape that uses lat/long coordinates. I made several typos when initially recording my lat/long coordinates, resulting in less than satisfying results for the demo. I called Ed to get some debugging recommendations. Ed mentioned a great spatial partner named Safe Software who is working on a beta version of their spatial ETL tool named FME, which can be used to move spatial data in and out of SQL Server 2008 from other formats. One of their components will actually allow you to visualize shapes stored in a geography column, making it a heck of a lot easier to determine if the polygon you thought you created is actually the polygon you created. Anyway back to the matter at hand, we created a .NET Framework command-line utility named FileStreamLoader designed to populate our eventMedia table. This was the easiest way for us to pump in all of our pictures and videos using FILESTREAM. It's tough to do that kind of stuff in a Transact-SQL script, so database geeks please go learn .NET programming it will make your life a whole lot easier. I'll post the code for FileStreamLoader up on CodePlex once CTP5 ships. So after loading up our map/region rows using FileStreamLoader I started running some queries to see how they worked. Here's a good one that retrieves all of the polygons from eventMedia: SELECT N'Map/Region Row' As [Label], [mediaKey] As [Key], [title] As [Title], ROUND([location].STArea(),2) AS [Area], ROUND([location].STLength(),2) AS [Length], [location].ToString() AS [Location] FROM [dbo].[eventMedia] WHERE [location].STDimension() = 2; I used a few spatial methods to illustrate some of the rich functionality you get when using the geography type. The STArea() values are the total area of the polygon in square meters. The STLength() values are the total length around the polygon in meters. The ToString() values are WKT string representation of the geography instance. I used the STDimension() method to filter the result set down to just 2-dimensional objects. The results look like this: | Label | Key | Title | Area | Length | Location | | Map/Region Row | cc80cfee-bc14-435b-97ee-6dc1f2ca8f9c | Map | 52552559.9 | 29791.3 | POLYGON ((39.95372 -75.23177, 39.95372 -75.16456, 40.03618 -75.16456, 40.03618 -75.23177, 39.95372 -75.23177)) | | Map/Region Row | 1254b31c-c37b-4d12-86da-9404fd9e42cd | Race Area | 6432902.35 | 22165.07 | POLYGON ((39.95601 -75.17031, 39.95778 -75.16786, 39.97789 -75.1887, 39.99237 -75.18521, 40.00677 -75.18603, 40.01136 -75.19922, 40.03142 -75.21746, 40.02586 -75.22534, 40.0143 -75.21052, 40.00634 -75.19192, 39.9957 -75.19248, 39.98374 -75.20526, 39.97704 -75.19437, 39.9692 -75.19087, 39.95601 -75.17031)) | | Map/Region Row | ffd2bfed-195c-46da-8b6b-0a8728d8a43a | Parkway Area | 689476.79 | 4015.39 | POLYGON ((39.95601 -75.17031, 39.95778 -75.16786, 39.96874 -75.17921, 39.96512 -75.18441, 39.95601 -75.17031)) | | Map/Region Row | dbd03037-7451-4451-8a26-b7b95fd2486f | Wall Area | 334024.82 | 2529.11 | POLYGON ((40.02387 -75.2228, 40.0281 -75.21442, 40.03142 -75.21746, 40.02586 -75.22534, 40.02387 -75.2228)) | The demo application needs to determine what the overall map space is. We can hard code this, but its better if the app can determine it dynamically using a spatial query. This query gets the map region from the table, which is the 2-dimensional geography object with the biggest area: DECLARE @mapKey [uniqueidentifier]; DECLARE @map [geography]; SELECT TOP 1 @mapKey = [mediaKey], @map = [location] FROM [dbo].[eventMedia] WHERE [location].STDimension() = 2 ORDER BY [location].STArea() DESC; Next, the demo application needs a list of sub-regions within the overall map space. Again, its best if the app can get this list using a spatial query: SELECT N'Region Row' AS [Label], [mediaKey] AS [Key], [title] AS [Title] FROM [dbo].[eventMedia] WHERE ([mediaKey] <> @mapKey) AND ([location].STDimension() = 2) AND ([location].STIntersects(@map) = 1); The STIntersects() function is one of the most important functions for enabling spatial queries. In this case, each 2-dimensional geography instance in the location column is evaluated against the map instance to see if they intersect. The results represent the set of "sub-regions" in our map: | Label | Key | Title | | Region Row | ffd2bfed-195c-46da-8b6b-0a8728d8a43a | Parkway Area | | Region Row | 1254b31c-c37b-4d12-86da-9404fd9e42cd | Race Area | | Region Row | dbd03037-7451-4451-8a26-b7b95fd2486f | Wall Area | While I don't discuss it in this blog post, its important to note that the STIntersects() method supports usage of the new spatial index type in SQL Server 2008. We didn't really need one for such a small number of rows, but when dealing with larger scale GIS systems indexing can become critical. Finally, on to the basic spatial query the demo application needs to support. We need to retrieve a list of pictures that were taken within the boundaries of our map. This query gets all of the pictures that have been geo-located within our map: SELECT N'Map Picture Row' AS [Label], [mediaKey] AS [Key], [title] AS [Title], [location].ToString() AS [Location], LEN([binaryFile]) AS [File Size] FROM [dbo].[eventMedia] WHERE ([contentType] = N'image/jpeg') AND ([location].STDimension() = 0) AND ([location].STIntersects(@map) = 1); The results look like this: | Label | Key | Title | Location | File Size | | Map Picture Row | 1135AEF5-E36A-40F5-AA31-9AFD8808C7F9 | Finish Line | POINT (39.96045 -75.17396) | 44826 | | Map Picture Row | 181C2858-D0A3-44FF-BF6B-5BE571306EDA | The Manayunk Wall | POINT (40.02593 -75.22457) | 49931 | | Map Picture Row | 62DF5D51-60B1-45D0-93B8-76A1D2D76CCB | Top of the Manayunk Wall | POINT (40.0292 -75.21986) | 510277 | | Map Picture Row | CD0D0373-B706-48C4-9C99-BD22470A4F44 | Logan Circle | POINT (39.95813 -75.17052) | 46612 | Since the pictures are stored as a FILESTREAM, the demo application can go out and retrieve them using new Win32 file streaming API support in SQL Server 2008. Now let's do a final spatial query that gets all of the pictures in one of our "areas of interest", namely the Manayunk Wall: DECLARE @wallArea [geography]; SELECT @wallArea = [location] FROM [dbo].[eventMedia] WHERE [wallKey] = 'dbd03037-7451-4451-8a26-b7b95fd2486f'; SELECT N'Wall Picture Row' AS [Label], [mediaKey] AS [Key], [title] AS [Title], [location].ToString() AS [Location], LEN([binaryFile]) AS [File Size] FROM [dbo].[eventMedia] WHERE ([contentType] = N'image/jpeg') AND ([location].STDimension() = 0) AND ([location].STIntersects(@wallArea) = 1); Here are the results: | Label | Key | Title | Location | File Size | | Wall Picture Row | 181C2858-D0A3-44FF-BF6B-5BE571306EDA | The Manayunk Wall | POINT (40.02593 -75.22457) | 49931 | | Wall Picture Row | 62DF5D51-60B1-45D0-93B8-76A1D2D76CCB | Top of the Manayunk Wall | POINT (40.0292 -75.21986) | 510277 | And there we have it! These are the two pictures that were taken in the Manayunk Wall area. Obviously the demo gets more interesting with more photos and more regions. Now you know how to use the new spatial functionality in SQL Server 2008 to geo-locate media files in your application. Hopefully all you database developers out there won't be afraid to dive in and start using this amazing new functionality. Zach is going to do some additional posts in our spatial series which discuss how to leverage these new spatial features within a WPF application. - Roger
|
-
The table structure I proposed in my last post used the following column definition for my FILESTREAM column: CREATE TABLE [dbo].[eventMedia] ( ... [file] [varbinary](max) FILESTREAM);
So by default I am allowing NULL's to be stored in the file column. Zach and I changed the design a bit to the following: CREATE TABLE [dbo].[eventMedia] ( ... [file] [varbinary](max) FILESTREAM DEFAULT(0x)); So by default we are initializing every file instance to an empty binary string. The net effect of this change is a zero-length file will get created in the file system when new rows get inserted into the eventMedia table. So why did we do this? Without stealing Zach's thunder, you can't open up a handle to an instance if the column is NULL. If you can't open up a handle to it, that means you can't write to it either, which will result in a dead-end. So all you DBA's creating FILESTREAM columns take note! - Roger
|
-
Zach and I just spent a couple of days figuring out how to make use of the new FILESTREAM support in SQL Server 2008 and we thought we'd share a little bit about the experience in hopes it might save somebody some time. There's a ton of information out there regarding FILESTREAM, but in case you need more detail you can check out some of the SQL Server 2008 CTP content on Connect. FILESTREAM support will be enabled in CTP5. From the perspective of user experience, FILESTREAM is going to enable some interesting scenarios. SQL Server has always provided the capability to store binary data, and thus you could grab any type of file and stuff it into a SQL Server varbinary(max) column. But blobs have different usage patterns than relational data, and SQL Server's storage engine is primarily concerned with doing I/O on relational data stored in pages and extents, not streaming blobs. So the bottom line is that storing blob's in SQL Server have always had some limitations from a performance perspective. Most developers resorted to storing files in the file system, then just storing a path to the file in the database. Perfectly legitimate approach, and for some apps may still be the right way to go even with the advent of FILESTREAM. But when you do that you introduce a whole new set of challenges around manageability, backup and concurrency that should already be obvious to the reader. Enter FILESTREAM. Now SQL Server 2008 can store blobs in its own private namespace on the local NTFS filesystem instead of in-line with relational data. That's good cuz the NTFS file system was built to stream blobs. NTFS is even more interesting form a database perspective because it's transactional and supports recovery. So you can imagine the SQL Server Storage Engine and NTFS having a little mutual self-respect love fest. Again this has obvious advantages from a manageability, backup and concurrency perspective that I won't reiterate here. But what I will do is talk about our first experience with FILESTREAM from a database perspective, and Zach's gonna talk about it from an application development perspective. Our first project was to wire up a rich WPF app written in Visual Studio 2008 with some video stored in FILESTREAM in a pre-release build of SQL Server 2008 CTP5. We quickly found out that you can't connect to CTP5 from managed code written in Visual Studio 2008 Beta 2 due to a TDS compatibility issue. Once we tracked down a release candidate of Visual Studio 2008 we were good to go. Next I went into SSMS and built out a database to store our video. My first surprise was IntelliSense support in the Transact-SQL Editor! Check it out: In order to use the FILESTREAM attribute on a varbinary(max) column you have to enable FILESTREAM support for the SQL Server instance using: EXEC [sp_filestream_configure] @enable_level = 3; At this point I got a message that I needed to reboot the server in order for the setting to take effect. The dev team is working on removing this requirement for RTM. Next I created the database. In order to use FILESTREAM you have to have a special filegroup for storing the FILSTREAM data which I called FileStreamGroup1: CREATE DATABASE AdventureWorksRacing ON PRIMARY ( NAME = AdventureWorksRacing_data, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksRacing_data.mdf', SIZE = 2MB, MAXSIZE = 50MB, FILEGROWTH = 15%), FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM ( NAME = AdventureWorksRacing_media, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksRacing_media') LOG ON ( NAME = AdventureWorksRacing_log, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksRacing_log.mdf', SIZE = 5MB, MAXSIZE = 25MB, FILEGROWTH = 5MB); GO When I went out to the AdventureWorksRacing_media folder there was some initial NTFS folders with GUID's for names, and some header files and log folders. This is all storage engine gobbledy-goop for creating the FILESTREAM namespace. Next I created a table to store our video with a FILESTREAM column: CREATE TABLE [dbo].[eventMedia] ( [mediaId] [uniqueidentifier] NOT NULL ROWGUIDCOL PRIMARY KEY, [dateCreated] [datetime] NOT NULL DEFAULT(GETDATE()), [createdBy] [nvarchar](256) NOT NULL, [fileName] [nvarchar](256) NOT NULL, [mediaType] [nvarchar](256) NOT NULL, [location] [geometry] NULL, [file] [varbinary](max) FILESTREAM); GO; Tables with FILESTREAM columns required a ROWGUIDCOL column. This is used by the storage engine to keep track of instances in the filesystem. Next I wrote some TSQL to insert some garbage to see the affect in the AdventureWorksRacing_media folder. I'm not showing that code here because every lame FILESTREAM sample shows it and it's worthless and makes me crazy. The correct way to insert data is using a native or managed client that can actually put real binary data into the column as opposed to a hello world string cast as binary data. Anyway I digress... After inserting some garbage I saw some new folders and files: Next I tried to delete some rows and see the affect on the file system. Nothing changed right off the bat, and when I talked to the devs about it they said that storage gets freed up on a filestream filegroup when a valid log truncation point occurs. Otherwise you wouldn't get proper backup/restore behavior. This made perfect sense to me! One quick word of warning, trying to open up a query results grid in SSMS on gobs of filestream data is not a good idea. Take it from me. As usual, our tools are powerful and flexible enough to enable a dummy like me to shoot myself in the foot. Anywho we were up and running with a database to store our videos. Next up Zach's gonna talk about how we wired our WPF client up to it to play videos in this post.
|
-
One of the things Zach and I are trying to accomplish is to get developers to understand that SQL Server 2008 isn't just some bland bit bucket for storing and retrieving data. It's THE engine that powers immersive user experiences! Catchy huh? I'll give Zach the credit for coming up with that one. Whether you are talking about applications that serve up documents, rich media or maps, SQL Server 2008 has rich data types, powerful search, and flexible data management capabilities that make it the ideal platform for building applications for Windows and the next web now. Zach and I got our greedy little hands on an early build of CTP5 and are busy building some applications that exploit these new features and capabilities. We'll be blogging about that and other stuff over the next few weeks. - Roger
|
-
It's hard to believe that SQL Server 2005 has been out there for a couple of years, but it has. Things have really stabilized with Service Pack 2, and all of the Office 2007 Business Intelligence functionality that exploits SQL Server 2005 is shipping (with the exception of Office PerformancePoint Server 2007 which is nearing completion). I just completed a year of driving Microsoft Business Intelligence adoption in the ISV community through a program I called Microsoft BI Voyage. We worked with nearly 300 partners worldwide to help them understand how to build value-added reporting and analytics functionality using the Microsoft BI Stack.
So what's next? Well the SQL Server 2008 wave has begun! You can check out some early Community Technology Previews on Microsoft's Connect site. I'll continue to focus on driving early adoption of SQL Server 2008 in the developer community, and I'm happy to say that I won't be doing it alone! I am joined this time around by Zach Owens, newly hired Technical Evangelist. Together Zach and I will be responsible for driving early adoption of SQL Server 2008 through a variety of programs sponsored by Microsoft's Developer and Platform Evangelism Division. Zach comes to the team with a wealth of BI expertise, so he's going to take the lead on BI. I'll shift my focus to Developer, Database Infrastructure/Scalability, and Upgrade / Application Compatibility.
I'm also pleased to announce that we've selected some top services partners to join the evangelism team. They will help us build and deliver all that interesting new SQL Server 2008 content. SQLSkills will be helping us with our Developer, BI and Database Infrastructure/Scalability content. Scalability Experts will be helping us with Upgrade / Application Compatibility content. I'm extremely excited to have these partners on board because of the depth of SQL Server experience they bring to the table.
Things will be a bit quiet for the time being. I'm going on vacation at the Jersey Shore in Stone Harbor for a week in September, and Zach is new to Microsoft and still ramping up. But you will be begin to see a trickle of SQL Server 2008 related R & D posts from various members of the team in the next few weeks. Eventually we hope to have tons of good stuff up on Channel 9, and we'll be hitting the road in November 2008 to bring all the new content to a Microsoft location near you.
|
-
In my last post I talked about associating the SQL Server 2005 Reporting Services (SSRS) application pool with a domain service account rather than NETWORK SERVICE to enable access to a data source outside a firewall. One thing I forget to talk about that warrants a bit more detail is the need to register a Service Principle Name (SPN).
Users will need to connect using integrated Windows authentication with their Active Directory credentials, so I started testing access to the SSRS instance.. After I set up my new application pool for SSRS, I could connect to the report server locally, but when I tried to connect remotely I got prompted for credentials, none of which worked.
After a more thorough reading of the patterns and practices article in my last post, I saw the section at the end entitled "Creating Service Principal Names (SPNs) for Domain Accounts". In order to connect to my SSRS instance over HTTP using Kerberos, I needed to register two different Service Principle Names with Active Directory for the domain service account that I associated with the SSRS application pool.
This is accomplished using the SETSPN.EXE tool that comes with the Windows Support Tools add-on for Windows Server 2003. I recommend using the new version of the tool that ships with SP1 as it has more informative error messages.
Here's the syntax and the output I used for my first failed attempt:
C:\Program Files\Support Tools>setspn -A HTTP/servername domain\svcaccount
Registering ServicePrincipalNames for CN=Service Account,OU=UserAccounts,DC=domain,DC=corp,DC=microsoft,DC=com HTTP/servername Failed to assign SPN on account 'CN=Service Account,OU=UserAccounts,DC=domain,DC=corp,DC=microsoft,DC=com', error 0x2098/8344 -> Insufficient access rights to perform the operation.
Apparently registering an SPN with Active Directory requires a level of privileges that my lowly user account did not have. So after a quick call to the help desk I was able to have a support technician with the appropriate privileges register the SPN for me. I also asked the technician to register a second SPN for the FQDN of the server. Note that this operation occurs against Active Directory, so you can run it from anywhere on the domain.
After the SPNs were was registered I was able to verify them using the following command:
C:\Program Files\Support Tools>setspn -L domain\svcaccont Registered ServicePrincipalNames for CN=Service Account,OU=UserAccounts,DC=domain,DC=corp,DC=microsoft,DC=com: HTTP/servername.domain.corp.microsoft.com HTTP/servername
Now that my SPNs are registered, domain users can connect to the site remotely using Kerberos authentication.
Here's a recap of the infrastructure I've established in order to start building and deploying operational reports:
- Production Airframe Database
- Hosted outside the firewall
- SQL authentication enabled
- SQL login reporting created on production SQL Server 2005 instance to be used for reporting connections
- SQL database user reporting created in the Airframe database and associated with login reporting. User added to the db_datareader database role for read-only access to all the tables in the database
- Hot Standby Airframe Database
- Hosted outside the firewall
- Hosted on a different server than the production database
- Updated hourly via log shipping, database is left in standby mode to enable read-only access.
- Used for disaster recovery and to offload overhead associated with operational reporting from the production database
- SQL login reporting created with the same SID as the production instance
- SQL database user reporting in the Airframe Hot Standby database associated with SQL login reporting since it has the same SID.
- SSRS Operational Reporting Instance
- Hosted inside the firewall
- Created new application pool for the Report Server and Report Manager components using the Reporting Service Configuration Tool and associated it with a domain service account
- Registered two SPNs for the domain service account to enable Kerberos authentication for my new SSRS instance.
Now that I have all of the plumbing in place, we can proceed with the development of the operational reports and start granting users access.
|
-
In my last post I discussed how to configure database security on two SQL Server 2005 instances outside a firewall to enable operational reporting. In this post I’ll examine the security configuration of the SQL Server 2005 Reporting Services (SSRS) instance inside a firewall to be used for operational reporting.
To quickly review the scenario, I have a production SQL Server 2005 database for the Airframe application that is hosted by a third party outside the firewall and used by customers and partners. Using log shipping, I created a hot standby of the production database on a second instance of SQL Server 2005, also hosted outside the firewall. The hot standby is refreshed hourly, and is to be used for both disaster recovery and as a data source for operational reports.
The operational reports are intended for internal employees who need better access to Airframe data to do their jobs, not for customers and partners who are users of the Airframe system itself. One approach would be to add the operational reports directly to the Airframe application. I decided against this since the application is already in production and the resources required to add operational reporting functionality to the core application were cost prohibitive. Also, operational reporting is not a core requirement for the customers and partners who use Airframe, but are intended for internal employees who may not even have access to the Airframe application who need better access to the data.
I decided on a much simpler approach, which was to develop an incremental operational reporting capability using SSRS that is separate from the Airframe production application. The data for the operational reports would need to come from the hot standby system outside the firewall to avoid reporting overhead on the production environment. Users of the operational reports, all of which are internal employees, will want fast access to them using internal SharePoint sites, so the SSRS instance used will need to be hosted inside the firewall and secured using the internal user’s Active Directory credentials via Windows authentication.
Enough background, let’s get into some implementation. I think I mentioned that we need these reports yesterday, so I wanted to build some simple reports and get them deployed quickly to test connectivity and performance. After locating a server to host my SSRS instance for operational reporting, I did a quick default installation of SSRS to test my connection to the data source outside the firewall.
Next, I created a new SSRS report server project on my workstation in Visual Studio 2005 and added a shared data source to the project for my hot standby server. I used the FQDN of the hot standby server as the server name, and set the credentials to use the reporting account I blogged about in my previous post which uses SQL Server Authentication.
At first I was unable to connect since the server is outside the firewall, so I installed the ISA Server 2004 firewall client on my workstation and I was able to establish a connection from within the development environment. I developed a quick report that displays all of the children users for a parent user, and deployed it to my SSRS operational reporting instance, and tried to view the report from the SSRS report manager web site.
At this point I ran into a snag. The Report Server
My first feeble attempt was to install the ISA Firewall Client on my operational reporting server, create a data source and a simple report, and try to run the report. I kept getting the following error from Reporting Services when attempting to view the report:
Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: Cannot create a connection to data source 'Airframe Hot Standby'. ---> System.Data.SqlClient.SqlException: An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: TCP Provider, error: 0 - A socket operation was attempted to an unreachable host.)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Connect(Boolean& useFailoverPartner, Boolean& failoverDemandDone, String host, String failoverPartner, String protocol, SqlInternalConnectionTds connHandler, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity, SqlConnection owningObject, Boolean aliasLookup)
at System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(SqlConnection owningObject, SqlConnectionString connectionOptions, String newPassword, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, Object providerInfo, String newPassword, SqlConnection owningObject, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnection owningConnection, DbConnectionPool pool, DbConnectionOptions options)
at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
at Microsoft.ReportingServices.DataExtensions.SqlConnectionWrapper.Open()
at Microsoft.ReportingServices.ReportProcessing.ReportProcessing.ReportRuntimeDataSourceNode.OpenConnection(DataSource dataSourceObj, ReportProcessingContext pc)
From the error I was able to determine that SqlClient (the .NET data provider SSRS uses to connect to SQL Server) was reach the host for the data source which is outside the firewall. To help diagnose the problem I wrote a quick C# console app that tests the connection to the hot standby database outside the firewall using System.Data.SqlClient and copied it to the server that hosted the SSRS operational reporting instance. At first the connection test failed, but after installing the firewall client the test succeeded.
SSRS however was still unable to connect even with the firewall client installed. As I dug a little deeper, I found that some firewalls disable access to built-in accounts like NETWORK SERVICE. SSRS, which is basically an ASP.NET 2.0 application, runs in an application pool whose identity is set to NETWORK SERVICE by default. There’s a great Patterns and Practices article on MSDN that discusses this in detail here. So my next step was to reconfigure SSRS to use a different application pool associated with a real domain service account.
The application pool settings for SSRS can be configured using the Reporting Services Configuration tool. Just connect to the server, then click on the Web Service Identity section . You can create a new application pool for both the Report Server and Report Manager components, and associate them with a different identity. In my case I used the same service account we use for our SQL Server 2005 services.
After I applied these changes and restarted everything, Reporting Services was able to connect to my hot standby server outside the firewall via the firewall client. In my next post, I’ll focus on how I built my first set of operational reports.
|
-
In my last post I talked about how I used SQL Server 2005 log shipping to create a hot standby of the Airframe production database to be used for both disaster recovery and as a data source for operational reporting. In this post I'll discuss how to configure security on the production and hot standby servers to enable operational reporting.
Both servers (production and hot standby) are hosted outside the firewall. The SQL Server 2005 Reporting Services (SSRS) installation for the operational reports will be hosted inside the firewall. This means that the data source used by SSRS to connect to the hot standby database will have to use SQL Server authentication. It also means that the server hosting the SSRS operational reports will need the Microsoft Firewall Client for ISA Server 2004 installed since the hot standby server is outside the firewall.
This presents a tricky security configuration problem, namely how to configure a SQL Server login and database user on the production server that can be used to connect to the database on the hot standby server. Database users added to the production database will show up automatically on the hot standby database via log shipping. Logins however are in the master database and need to be added manually on the hot standby server.
My first attempt was to create SQL Server authentication logins on both servers with the same name. Then I added a database user for the login to my production database, and waited for the next log shipping cycle to occur for the database user to show up on the hot standby server. This didn’t work. After the next log shipping event occurred, I found that the database user in the hot standby database had no associated login and couldn’t be used to connect to the database.
After some investigation I realized I had a SID problem. Every SQL Server login has a unique identifier known as a SID, which stands for security identifier. The SID is used to associate logins in the master database with database users. Logins that use Windows authentication get their SID from Windows, but logins that use SQL Server authentication have their SID’s generated by default using a certificate that is unique to the SQL Server instance. This means that even though my logins had the same name on the production and hot standby servers, they had different SID’s. Since database users are mapped to logins using SIDs, the database user in the hot standby database was orphaned and unusable.
Here’s the work-around I used to get security properly configured:
1. Production SQL Server (outside firewall)
a. Set server authentication to “SQL Server and Windows Authentication mode”
b. Create a new SQL Server login named reporting using SQL Authentication
c. Add a new user to the production database named reporting associated with the login named reporting created in the previous step
d. Add the reporting user to the db_datareader role which permits the user to read from all user tables in the database without having to grant explicit permissions (I’ll talk about why I did this in a future post)
e. Get the SID for the reporting login using the following query: SELECT [SID] FROM [master].[sys].[sql_logins] WHERE [name] = 'reporting'
2. Hot Standby SQL Server (outside firewall)
a. Set server authentication to “SQL Server and Windows Authentication mode”
b. Manually create a new SQL Server login using the following TSQL Command where somesid is the varbinary value of the SID retrieved in step 1-e above: CREATE LOGIN reporting WITH PASSWORD = 'somepassword’, SID = somesid
c. Wait for next log shipping cycle to occur
d. Check the reporting user in the hot standby database to make sure it is associated with the reporting login created in step 2-b above. This can be done by viewing the properties for the user in SQL Server Management Studio and confirming the association with the login name reporting. If the properties screen indicates “Without login” you probably didn’t use the right SID.
In my next post I’ll talk about the security configuration of the operational reporting server inside the firewall which will be used to host the SSRS reports that target the hot standby data source.
|
-
In my last post I introduced the first phase of the Airframe RA project called Operational Reporting. The gist of this phase is to augment an existing production system, in this case Airframe, with some quick and dirty reports to help everyday users of the system within Microsoft get their jobs done a bit easier.
We need these reports yesterday, so I don't have a lot of time to massage the data into a more useable form. The existing production database schema we use in Airframe will have to do. While it was tempting to build the reports right on top of the production database, I didn't want to do anything to compromise performance for users of the production system.
What I needed was a quick and dirty way to create a copy of the production database that is reasonably up-to-date, and host that copy on a completely different server to offload the overhead associated with the Operational Reports to a separate system. I decided to use a feature in SQL Server 2005 called Log Shipping to accomplish this.
Basically log shipping takes regular transaction log backups from a primary database and applies them to a secondary database. There's a handy wizard in SQL Server Management Studio for setting it up which creates a couple of SQL Agent jobs on the primary and a couple on the mirror, and you can have email sent to an operator if things get out of whack.
The secondary database can optionally be left in Standby mode, which makes it available for read-only operations. This killed two birds with one stone for me. We can use the secondary database both as a data source for our operational reports, and as a hot standby in case something catastrophic happens to the primary database.
The problem with wizards, especially complex ones that work with distributed systems, is that sometimes they don't work right. Unfortunately I had a problem with the log shipping wizard which required some manual intervention. The wizard failed when it tried to initialize the secondary database and put it in Standby mode. Basically it was trying to create a standby file in a system directory that SQL Server didn't have access to, and I couldn't find a way to redirect it to a folder where SQL Server had the right priveleges. The specific error I got was:
During startup of warm standby database 'AirframeHotStandby' (database ID 6), its standby file ('C:\WINDOWS\system32\AirframeHotStandby_20060630063016.TUF') was inaccessible to the RESTORE statement. The operating system error was '5(Access is denied.)'. Diagnose the operating system error, correct the problem, and retry startup. (Microsoft SQL Server, Error: 3441)
Fortunately most of the wizards in SQL Server Management Studio support a scripting option, so I was able to save off the wizard actions as a T-SQL script and execute it in stages to get around the problem. Basically I initialized the secondary database myself by doing a restore and put the database into standby mode. The UI for RESTORE DATABASE did allow me to specify the location of the standby file.
I wound up with the following:
- An initialized secondary copy of my primary database on a different server in standby mode
- A job on the primary server which backs up the primary databases transaction log every hour to a file share on the primary
- A job on the secondary server which copies new backups from the primary to the secondary every hour
- A job on the secondary which restores new transaction log backups to the secondary database every hour
The jobs clean up old backup files to conserve space, I set the threshold to 72 hours. They also send alerts if things don't happen in a timely fashion. I set the alert threshold to 2 hours.
So in a couple of hours I was able to create a hot standby of my production database on a separate server that gets refreshed hourly. I can start building my operational reports against the secondary database without slowing down the production system, and from an operations perspective we have an up-to-date version of our production database staged on a separate server for disaster recovery.
My next task will be to start building my first operational reports, but that's not going to happen tonight because my flight is here. Whew!
|
-
Well I have to apologize for my long absence. I'm just not a very consistent blogger. I'll just have to do better. Here's the good news: you won't see any smarmy posts from me about my personal life, I prefer to be a nameless / faceless Microsoft automaton ;-).
Now that SQL Server 2005 has shipped, I'm moving on to focus on some BI evangelism through a program we are creating called Microsoft BI Voyage (more on that later). I'm going to start a series of posts related to a dogfood project we are working on internally to add some reporting and analytics to a production web application known as Airframe.
Airframe is a sort of evangelism engine we use to work with partners in the various Ascend and Touchdown programs, and a variety of different teams within Microsoft need good visibility into the data we're collecting in that system. I'm going to use the Microsoft Business Intelligence stack end-to-end starting with SQL Server and ending with PerformancePoint Server, and hopefully share some useful tidbits with folks who are working on similar projects.
The name of the project is Airframe Reporting & Analytics (Airframe RA). We've decided to build it in two stages. The first stage we're calling Airframe Operational Reports, and we'll build some stop-gap relational reports in SQL Server 2005 Reporting Services (SSRS) to help the folks who administer and work with Airframe on a daily basis do their job a little more smoothly.
The second stage is called Airframe Performance Management, which will involve the creation of some scorecards that will help the folks who manage these programs keep track of some key performance indicators. I think these two stages represent a pretty good example of how the data assets in an existing production system can be cracked open and leveraged to provide more value to the organizations that use them.
In my next post I'll talk about the first stage of the project where I set up log shipping to create a data source for the operational reports.
|
-
How are these two topics possibly related you ask? Well, I'm in Mexico City, I'm eating a burrito, and I'm bogging about data flow tasks in SQL Server 2005 Integration Services. So there you go. I'm going to start to use some Business Intelligence / Data Mart / Data Warehousing terminology in my posts. If you aren't a member of the BI elite, don't worry about it. Remember, its just a bunch of databases, queries and front ends. Anybody who is smart enough to learn .NET, XML, Web Services, ASP.NET and the like can figure this stuff out. Don't let the terminology scare you away. Also remember that the terminology I'm using represents only one school of thought. The first set of terminology I'm going to throw out there is a data mart. A data mart is nothing more than a relational database used for reporting and analytics that follows a particular design pattern known as a dimensional model. Think of it as an OLTP schema that gets re-organized to make it easier to do reporting and analysis. Typically a data mart consists of some tables known as "fact tables" that contain stuff you want to measure (price, quantity, date, count) and some related tables known as "dimension" tables that are used to analyze your facts (product, customer, time). Dimension tables also represent useful hierarchies. For example, a product dimension would define not only products, but product categories that the products belong to. Together these fact tables and dimension tables form what is known as a star schema. Tools like SSIS help you keep your fact and dimension tables up-to-date with the latest information represented in your OLTP database schema. Okay, so now that you know everything about data marts, let's get back to SSIS. In my last post I talked about how I was to use the XML Source Adapter to read some data I extracted from SharePoint. In this post I'm going to talk about what I did with the source data in my data flow tasks. In its simplest form, a data flow task reads data from a source, runs it through a series of on-the-fly transformations, and writes it out to a destination. I'm going to pick up at the point were the XML Source Adapter has produced a result set that I now need to transform and load into SQL Server. I'll talk about two different transforms that are useful in constructing data flow tasks: - multicast transforms
- derived column transforms.
Now that I had a result set containing my source data, I wanted to persist a copy of that result set in a table before doing anything with it. This is a good design-time best practice because it allows you to compare your source data unmolested with whatever you wind up with at the end of your data flow task. To accomplish this, I simply connected my XML Source Adapter to a Multicast transform with a data flow. Then I routed the data flow outputs from my multicast transform to both to an OLE DB Destination Adapter targeting SQL Server to persist the original source data, and to subsequent transformations in my data flow task. I needed a target table to use as a destination for my raw data. Now normally I would switch out of the SSIS design environment and go write some scripts. But one nice design-time feature of the OLE DB Destination Adapter (and other destination adapters for that matter), is that you can create the target on-the-fly at design time. It simply examines the metadata for the input, and generates DDL for a compatible table structure. It shows the CREATE TABLE command in an editor dialog so you can make changes, then you just click OK and it builds a compatible destination table for you on the target. Very sweet, especially for tables with a lot of columns. I did have to go back later and auto-generate my scripts in SQL Server 2005 Management Studio so I could rebuild my data mart, but having this at design time made me more productive. So next I tested my dataflow, and it didn't work. Problem is that my multicast didn't work due to a bug in the Feb CTP. It manifested itself by throwing unknown exceptions further down in my data flow on the other branch of my multicast. So I just left my multicast in as a stub, and I'll add the OLE DB Destination back in later to persist my raw data after the bug is fixed. Again, the joys of dogfood. At this point I need to digress. Warning, here comes more elite BI babble. Again, don't be afraid, this is simple stuff. If you are familiar with reporting and analytics, you may be familiar with the concept of a time dimension. A time dimension is a fancy word for a lookup table that provides improved sorting and grouping performance on dates. A typical time dimension has one row per day, and one column for each date part. For example, for 3/21/05 my row in my time dimension would look like this: TimeKey: 20050305 (int primary key) Date: 3/21/05 (datetime) Year: 2005 (int) Month: 3 (int) Day: 5 (int) Quarter: 1 (int) Fiscal Year: 2005 (int) Fiscal Quarter: 3 (int) Why persist this data you ask? Because query processors can perform sorting, grouping and aggregation much faster against this persisted data than against datetime expressions. I needed a time dimension for my data mart, and I needed it fast. After a few google searches I was unable to turn up a quick time dimension generator, so I wrote my own in TSQL. Here it is: CREATE TABLE [dbo].[timeDim]( [timeKey] int PRIMARY KEY, [timeValue] datetime NOT NULL, [calendarYear] smallint NOT NULL, [calendarQuarter] tinyint NOT NULL, [calendarMonth] tinyint NOT NULL, [calendarDay] tinyint NOT NULL, [fiscalYear] smallint NOT NULL, [fiscalQuarter] tinyint NOT NULL) GO CREATE PROCEDURE [buildTimeDim]( @yearSpan smallint = 10, -- # of years to generate @fiscalStartMonth tinyint = 7) AS BEGIN DECLARE @today datetime, @startDate datetime, @stopDate datetime, @curDate datetime, @fiscalDate datetime IF @yearSpan <= 0 BEGIN RAISERROR('Invalid parameter value. @yearSpan must be greater than or equal to 0.', 16, 0) RETURN 1 END SET @today = '1/1/' + CAST(DATEPART(year,GETDATE()) AS char(4)) SET @startDate = DATEADD(year, (ROUND(@yearSpan / 2, 0) * -1),@today) SET @stopDate = DATEADD(year, ROUND(@yearSpan / 2, 0),@today) SET @curDate = @startDate WHILE @curDate <= @stopDate BEGIN SET @fiscalDate = DATEADD(month, @fiscalStartMonth -1, @curDate) INSERT INTO[dbo].[timeDim] ([timeKey] ,[timeValue] ,[calendarYear] ,[calendarQuarter] ,[calendarMonth] &nb | |
|