Inside SQL Server

Technology perspectives, tips, futures, and occaisonal humor from inside Redmond's building 35.

  • Tricks to help DBA's get more sleep

    If you are a DBA, the chances are that your sheep counting is interrupted by a pager or cellphone going off in the middle of the night. While you can't eliminate it, having an explicit Service Level Agreement will help you to handle outages that occur and to establish an understanding of the level of uptime that the business users will get given the level of funding for the systems. Availability is complicated by the fact that it means different things to different people, but one thing is clear - your ultimate availability is measured by the users of the applications who access the database. Taking into account this perspective, you can clarify how you pursue your system availability investments by ensuring you consider and capture the availability system design goals and resulting technologies using a framework:

    Minimize downtime
    Planned
    Unplanned
    Total amount of yearly downtime (9’s)
    Total number of events
    Includes improving detection of downtime and surfacing resolution options
    Also includes tracking historical downtime
    Minimize the percentage of users affected by downtime
    Fault isolation
    Minimize data loss
    Total number of events
    Total amount of data lost
    Includes need to detect and report the scope of data loss
    Graceful degradation when SLA performance criteria are not met

     

    -Matt

  • MTBF is useless - how to decide when to use RAID for Database High Availability

    Building reliable database systems is complicated by the fact that, as engineers, we tend to think about system design in a narrow way that is usually focused on the individual technologies. In order to make the right choices about when to use specific high availbility technologies, it is important to back up and take a big picture view. The first step is to understand what it means for a database system to be highly available. Sounds simple, right? What we typically want to enable is the end user of the database application to achieve a specific level of servicability - and this is defined by an explict or implicit service level agreement. The service level agreement should be formally defined and include the response time for specific operations, availability times, data consistency needs, times services will be available, need for protection against site failure etc. Once the SLA is understood, we can take the next step and evaluate how to acheive the best uptime based on technology that's available.

    Now on to disks and RAID..

    Taken in the broad context, RAID for HA is used to provide data redundancy. There are other techniques for data redundancy. It's important to think holistically about your SLA requirements, because it may be more cost effective to rely on one of the other forms of data redundancy. As an example, if you need to provide site level disaster protection, you may already be using database mirroring, log shipping, dual commit, etc. to keep another copy of the data for availability. In this case, it is important to consider if you also need local RAID to protect against local disk failures. It might make sense instead to invest in ensuring fast failover times. One caveat if you decide to go this route is that a failover using most existing technologies causes a brief outage from the application perspective. This may not meet the SLA requirements.

    Another interesting factor to consider is how likely a disk failure will be the cause of an outage. From a pure investment perspective, it makes sense to invest in the areas the provide the most cost effective improvement in system availability first. You might be tempted (as I once was) to use MTBF of the disk drives to help to understand how likely disk failures are to occur. Unfortunately, MTBF bears little or no relationship to disk failure rates due to the way it is measured. I once read a posting on the web that made the point that if we predicted our death rate based on MTBF, we'd have a MTBF of about 600 years!For single disk systems, I typically assume a failure rate of about 2% chance per year per drive. A 2% failure rate is very low in the list of possible downtime for a system during a year. This may lead you to not use RAID on small systems - which is in fact what most choose to do. As the number of disks increases, the chance of failure rapidly climbs. As an example, the chance of failure rises to over 72% if the system has 64 disks (1-(1-.02)^64*100))! Using this 64 disk case, if I add RAID 1 or 5 to the disks, I am able to reduce the failure rate back to about the same level as a single disk system - 2.5% (1-((1-.02)^2)^64*100)). 

    Thinking back to the SLA, you must also take into account what the tolerance is for data loss. Some disk failures (for example the database log drive(s)) are not the same as other failures from a data recoverability perspective. If you want to improve the 2% chance of data loss which would occur with a log disk failure, you should ensure that the log data has redundancy. This means either using RAID on the log drive(s) or implementing log based redundancy as part of your geographic data redundancy solution (mirroring, log shipping, etc.).

    In practical terms, it may make sense to change our thought process when it comes to high availability of database systems. Given the need to protect against data loss and to provide site redundancy, systems tend to be moving towards a geo-redundancy approach as a basic requirement which makes the importance of RAID decline somewhat in the overall decision making process.

    One final thought - the above discussion actually ignores the biggest issue which prevents high database system uptime - patching and upgrades. This is typically excluded from HA analysis since a maintenance window is often assumed for the systems. As system uptime requirements continue to shrink the available times for maintenance, it is clear that our database systems will require the ability to implement patches and upgrades without system downtime. While this is a difficult problem to solve, I'm confident that we will be up to the task.

     -Matt

     

     

  • When you need to fix copies of data that get out of sync with each other

    I can't tell you how many times I've been in the situation of trying to figure out what has changed between two tables. Often this happens when you create copies of schemas. Maybe you have a dev, test, QA and production environment and they all contain a reference table (for example sales tax by state) but the table has been updated over time. How do you tell that the data between the environments is different? How do you tell which data has diverged? How do you bring it back in sync?

    There is a really cool trick that can quickly determine which tables don't match. Using SQL Server 2005, all you have to do is configure transactional replication between the two databases, while being careful to specify that the subscriber already has the data. If you don't know how to setup transactional replication, you can find more details in the books online documentation, but basically you launch the new publication wizard from replicatin->publications folder in management studio and specify the tables you want to include. You then follow that up with the new subscription wizard from the publication node and specify the destination database. When creating the subscription, be sure and uncheck the option to initialize the subscriber. Warning: if you don't uncheck "initialize", replication will copy all of the data from the publication database to the subscrption database, wiping out all the data that used to be at the subscriber. Once replication is configured, right click on the subscription and select the option to validate. By default, replication will count the rows in each table while also generating a unique checksum based on the data in the table. This list of tables, rowcounts, and checksums will be sent to the destination subscription database and will be automatically calculated and compared. This is very efficient - much more efficient and simpler than trying to do a manual set of queries. To view the results, just launch the replication monitor (right click the replication node) and then select the "subscription watch list" and double click on the subscription in the right hand pane. The results will look something like this for tables that don't match:

    Table 'test' might be out of synchronization. Rowcounts (actual: 0, expected 1). Checksum values (actual: 0, expected: 467540).

    Pretty cool, huh? Now for the next step. For each table that you find doesn't match, you can use a little tool that I snuck into SQL Server 2005 called tablediff. Tablediff is a command line tool in the SQL Server\90\com directory that compares the rows in tables and produces a report showing what rows are different. Basically you specify the source and destination servers, instances, and tables. You can get more detail in the books online or via tablediff /? on the command line. So now you know which tables are different and which rows are different. Last but not least, you may choose to bring the two tables into sync with each other. You have two options at this point. If you have already have setup replication and just want to overwrite the subscriber with all the changes from the source publication, you can select the subscription and right click and select the re-initialize option. After the agents run, the data will be automatically loaded into the subscriber via parallel bcp. It's very fast. The second option to fix up the subscriber is to just use the tablediff tool. Tablediff has a "-F" option which is handy in this case. This option generates a SQL script that will fixup the subscriber. All you have to do is run the resulting script on the subscriber.

    Hope that helps out, please feel free to comment or post questions and suggestions

    -Matt

  • Got database scale?

    As you might expect given the size of Microsoft, there are many websites which must be able to scale to very high levels. Have you ever wondered how this is acheived? It would be impossible to use a single server to support the databases on the highest volume sites. Solutions for scaling across multiple servers typically revolve around two approaches which are sometimes combined:

    1. Splitting the data across different servers based on some form of unique identification, aka logical partitioning

    This is the key for improving scalability of insert, update and delete statements. Since each server has it's own data, the system scales linearly as you add servers. This technique is not well suited for data that must be read consistently to provide aggregate reports or views across all the data due to the cost in coordinating the servers to produce the result. For this latter type of workload, it is best to aggregate the data as it is moved to a centralized server for reporting. Customer id is an example of a common partitioning key seen in practice, as each customer typically only needs to see their own data...except in the case of something like the catalog of products they might be browsing to complete their order. For that case, we move to the second option.

    2. Keeping copies of the data synchronized across different servers

    This is the key for improving the scalability of select queries. Since each server has a copy of the data, the system scales linearly as you add servers. This technique is not well suited for data that must scale for insert, update, or delete since each server must perform the same insert, update, and delete statements. The product catalog in the above example is a great choice for keeping copies on multiple servers.

    From a technology perspective, the first option requires that you write some application specific logic to route the application to the appropriate server for the appropriate partition. This is usually a very small amount of code. For the second option, the most common technology used for this purpose is SQL Server transactional replication coupled with a small application modification to acheive adequate load balancing. Sometimes network load balancing is used (for example the newsgroups on MS.com use NLB). Transactional replication scales well to large numbers of servers.

    Combining the two techniques appropriately as needed can produce near infinite scale, using the technology you already have.

    -Matt

  • Bio

    I just realized I hadn't included my bio when I started blogging.

    After graduation in 1990, I started working as a developer on the mainframe for GTE Data Systems in Florida and moved into DB2, Informix, Datacom, and Oracle administration in the centralized DBA team. GTEDS  produced software for GTE (now Verizon) as well as international software development across all the available platforms of the day - Cray, Bull, IBM, Dec, various Unices etc. They were the biggest non-governmental purchaser of IBM mainframes. After five years, I decided that I wanted to live in the Puget sound area, so I called up Oracle and accepted a position in Bellevue as a consultant in the Telco and Cable and Wireless industry area working with the core database and financials for a couple of years. I next spent a bit over a year locally working as the database manager at SSA and doing some development work. SSA is a large shipping company which operates seaports around the world. I joined Microsoft in the SQL Server group in 1999 and have served multiple roles across the product culminating in my current position as one of seven Senior Program Managers in the cental SQL Server strategy group. While I am involved in many aspects of the technology and the business, one my focus areas is the High Availability technical direction and strategy for SQL Server.

    Outside of work, I spend much of my free time restoring my 1912 Craftsman farmhouse in Redmond.

  • How safe is your database?

    Ensuring your database is protected from loss is about as enjoyable as a dentist visit. We all know that we should have backups, and maybe even have thought about where we store the backups. I want to mention two items to think about today to save you pain later. The first is obvious when you think about it, but you may not have time to put it in perspective during your normal day.

    1. Your database can only be recovered to the point of your last backup.

    Of course this is true. Why I mention it is that many may think that since the database can recover up until the last transaction in the log that they are protected because they know they have been diligent in implementing a backup plan that includes transaction log backups. The problem is that the transaction log backups occur on an interval and aren't continuous. This means that a loss of the log drive(s) will be a single point of failure for losing data, even if you have the backups. Fortunately there is a good solution to ensure you don't have this single point of failure - Database Mirroring. Database Mirroring in High Availability mode ships the log data to a Mirror copy of the database in real time so you are always protected. This is available in SQL Server 2005 SP1 and is unrestricted in the Enterprise Edition. Best of all, you don't need to buy a license for the Mirror server so long as it is only used as a passive Mirror. Since it takes only a few minutes to setup Mirroring from an existing restored backup, it's really a no brainer. Are your databases protected?

    2. If your database log is on an IDE drive, you may lose data if the server crashes

    I was at a conference recently and a customer told me that they wanted to use mirroring, but got repeated database corruptions when testing it. I immediately knew there was something else involved given the level of testing we completed before supporting Database Mirroring in SQL Server 2005 SP1. Then I remembered the issue of IDE drives. As you know, databases must ensure that the log is physically written to disk during each commit. The API when writing to disk has a flag which you can set which indicates that you want to wait for the data to be successfully written. Unfortunately, when using IDE drives, the API will return a successful write before the data is guaranteed to be written to disk. When I asked the customer what type of machines he used to test mirroring, he replied "I used my small lab machines". Case solved. So, how many of you have important databases residing on IDE drives?

    -Matt

    matthol@microsoft.com

  • When inserts don't scale, it's not time to bail

    I often encounter questions on why an insert heavy workload doesn't scale well when moving a workload from a small server in a test environment to a much more powerful production server. Usually customers resolve the issue by adding multiple parallel insert processes or enlarging the size of the transactions. The reason that this is true is tied to how the database writes to the log. To maintain the reliability of data, the database must ensure that the log is written to disk before the database returns a success from the commit - this covers the case of a power failure. When the application issues the commit, the modified data buffers will be written out to disk sometime in the future, but the number one priority is to flush the log buffers from the transaction to disk. Once the data is confirmed by the OS as having been written to disk, the application will receive the successful message and continue. If a power outage occurs between the commit and the response, all is well because the application never received a successful response. The database ensures the data is either committed or not.

    Performance would be a really big problem in this case if there were no special optimizations. The time it takes to write the log is fixed and is a fairly significant delay introduced to every committing transaction - possibly on the order of 10 milliseconds. That means that you might only be able to support 100 transactions per second regardless of the processor power of the machine. By making the transactions bigger, you can of course dramatically speed up the insert performance, mainly because the percentage of the transaction time spent writing to disk is small vs. the size of the transaction. The second trick is what speeds up aggregate throughput when you utilize multiple parallel processes inserting data. This trick is called "group commit". The basic idea of "group commit" is to combine multiple concurrent commit requests into one effective commit. This is equivalent to issuing a larger transaction initially.

    There are other ways to improve insert performance. For more information, check out the SQL Server Customer Advisory Blog: http://blogs.msdn.com/sqlcat/archive/tags/Performance+and+Scalability/default.aspx

    -Matt

    matthol@microsoft.com

     

  • Great SQL Server resources

    The customer advisory team works directly with our biggest customers. Check out their blogs here: http://blogs.msdn.com/sqlcat/

    Euan has a good blog with particular emphasis on tools. Euan owned the DTS product (now SSIS) before his current role http://blogs.msdn.com/euanga/

    The SQL Server engine blogs here: http://blogs.msdn.com/sqlserverstorageengine/default.aspx

  • SQL Server 2005 SP2 is just around the corner

    In case you were wondering...

    The SP2 relese of SQL Server is likely to be available in late March or early April. There is a really great new performance optimization in the SP: compressed vardecimal. Numeric and Vardecimal will save significant space (and improve performance). This is especially useful for data warehousing applications, but also is important in other scenarios. For more information on SP2, check out

    http://download.microsoft.com/download/2/b/5/2b5e5d37-9b17-423d-bc8f-b11ecd4195b4/WhatsNewSQL2005SP2.htm.

    Matt

    matthol@microsoft.com

  • Database virtualization is virtually certain

    You may not realize it, but operating system virtualization is at least 40 years old. That's eons in the technology world. The first virtualization offering that I remember was IBM VM, however there were others before that point. IBM had an internal project around 1966 that eventually surfaced as VM, but not before IBM released MVS (Multiple Virtual System), which was a partially virtualized operating system for the mainframe. MVS/ESA was the first operating environment I used as a programmer upon graduation. Why didn't they release the original VM version of the OS before MVS? The rumor was that there were concerns about how to license the technology in a way that allowed them to recapture the development costs of the operating environment. For more information, check out http://en.wikipedia.org/wiki/VM/CMS

    So this brief background brings me to two questions that relate to SQL Server:

    1. Why are virtual operating systems re-emerging?

    In the "old days" when hardware was very expensive, centralized systems made the most of precious commodities. The technology is re-emerging on commodity hardware for a different but related reason. Today the hardware is very inexpensive and powerful (especially with multi-core processors), and it is important to adequately utilize all the capacity. Space and power consumption are important factors. Of course consolidation of multiple systems onto fewer servers can make it much easier to manage large environments. Moving applications between servers often requires a re-install of all of the applications when not using virtualization. With a virtualized OS, you can move an environment to a new server in minutes and in some cases without more than a pause visible to the application users. Virtual Server is the current virtualization offering from Microsoft. 

    2. What are the SQL Server licensing issues for virtual environments?

    SQL Server is supported running within a Virtual Server environment. While it is possible to run SQL Server using other virtualized environments, we currently do not support them directly. Like all companies, we must be very careful to keep our test matrix to manageable proportions especially since the demands tend to increase exponentially along with the spread of new technology. When licensing virtual environments, you basically just treat each virtual machine environment like a non-virtual environment. Therefore, you will need a server license for each instance deployed for a CAL based approach or a processor license for each processor utilized in a virtual machine. Because the virtual environment licensing is relatively new, we may make adjustments to account for the types of deployments that emerge over the next several years. For current information, check out: http://www.microsoft.com/sql/howtobuy/virtualization.mspx

    Virtualization has some overhead, so it is not the only solution when consolidating workloads on servers. Since SQL Server provides multi-instance support, this will likely always be an option for highly performance sensitive database workloads. However, given the flexibility and cost savings of virtualized server environments, I think that the advance of database virtualization is a virtual certainty.

    *Updated* - I got this in email:

    Question:

    I am ...confused about licensing issues for virtual environments. You had said that 'When licensing virtual environments, you basically just treat each virtual machine environment like a non-virtual environment. Therefore, you will need a server license for each instance deployed for a CAL based approach'

     

    I believe that this is contradicting

    http://www.microsoft.com/sql/howtobuy/virtualization.mspx which states that 'When Licensed per Server/Client Access License. Workgroup, Standard, and Enterprise editions of SQL Server 2005 now allow for unlimited instances within each virtual or physical operating environment.'

     

    Am I correct in sating that you can install many SQL Server instances on separate virtual servers with a single SQL Server license?

     

    Answer:  The question you list above is related to a separate issue - how many instances can be installed within a virtual machine. As that article states, there is no limit to the number of instances that can be installed within a virtual machine. In other words, you can install as many virtualized servers as you want on any of the listed editions. Whether the machine is virtual or physical, if you license per proc you just need licenses for each virtual or physical proc regardless of the number of instances on the machine/image. With the server/CAL model you need server licenses for every instance of SQL Server regardless of the whether the instance is virtualized or native to the machine. You can find the detailed legalese here: http://www.microsoft.com/sql/howtobuy/processor.mspx

     Matt

    matthol@microsoft.com

  • Zero to high availability in 60 seconds...

    I am a car fanatic. My first car was a Triumph GT6+ which I noticed under a tarp in my neighbors garage. It had been sitting idle for ten years.  Six years ago, I modified a Mustang extensively, adding Brembo brakes and a supercharger, exhaust headers, a modified suspension, etc and it will accelerate from zero to sixty miles per hour in just over five seconds. When new car magazines come out, I pore over new car statistics, constantly in search of exciting cars across the performance, style, price and other measures. Sometimes friends will ask me which car I like the best. I can never answer this question because each car has a different personality based on the trade off decisions made during the design of the car. The recent ice storm in Seattle makes the point well – the Mustang was completely immobile on the ice.

    I often field questions asking which high availability technology is best. This is the same type of question as the one asking which car is best. It is relatively easy to make generalizations that are somewhat helpful: choose Microsoft Clustering Services if you want a simple way to provide local redundancy for a server running SQL Server and other products. If you want to provide disaster recovery across geographic locations, choose Database Mirroring. Providing a method of recovering from user errors will likely lead you to use Log Shipping, which allows you to specify a delay to be introduced when the remote system is synchronized, giving you time to stop and recover using an older version of the data. And finally, replication is a good choice if you may need to use the remote copies of the data for other workloads instead of it being a purely passive backup copy.

    Of course there are other factors to consider. What if you need two of the characteristics in one solution? Which technologies work seamlessly together? Which technologies actually provide more than one capability? For example, if I use peer to peer replication to provide multiple updatable sites with copies of the data, should I also use clustering to protect against local site failures, or just include another local P2P replica to provide the redundancy? These and other questions require a bit more thought about the trade offs that best suit the individual situation. If you have questions about which technology is right for you, let me know and I’ll post interesting questions and commentary. Just don’t send me mail asking which car is best for you.

     

    - Matt

    matthol@microsoft.com


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker