SQL Databases on File Shares - It's time to reconsider the scenario.


For those who have been around databases for any length of time, the idea of putting a database that you care about, from either a reliability or performance perspective, on an SMB (Server Message Block) file share seems crazy. But recent developments have made SMB-based file shares a very viable platform for production SQL Server databases, with some very interesting advantages.

Historically, the perspective has been:

  • File shares are slow.
  • The connections to the share may be unreliable.
  • The storage behind the file share may be flaky.
  • SMB consumes large amounts of CPU if you can get it running fast enough.

Over the past few years, all of these conditions have changed; in particular, the work done on the 2.2 revision of the SMB protocol has produced some stunning results.

So let’s look at these one by one:

File Shares are slow

There are two components to this one:

The raw speed of Ethernet versus Fibre Channel, and the speed and efficiency of the SMB protocol itself.

The transport layer has seen very significant improvement in recent years.  Where at one point Ethernet was orders of magnitude slower than Fibre Channel, this is no longer the case.  Current commodity Ethernet runs at up to 10 gigabit, with 40 gigabit being tested and on the near horizon.  This puts Ethernet on par with Fibre Channel from a bit-rate perspective, and the projection is that the two technologies will leap-frog each other from here on out, with neither one being a clear leader.

On the protocol front, the original SMB 1.x protocol was chatty, inefficient, and slow.  Over the last couple of years, while developing the 2.2 version of the protocol, the Windows file server team has been using SQL Server running a TPC-C workload as one of its performance benchmarks.

The benchmark configuration was to take a fibre-attached array, connect it to a server, and run TPC-C.  Then an identical server was added, connected to the first with a single 1 Gb link, and the TPC-C database was run on the new server with the original server functioning as a file server.

When they started, TPC-C over a file server ran at roughly 25% of the speed of direct-attached storage.  The team discovered several performance problems in the stack, but one particular bug on the client side made a stunning difference.  The current result is that TPC-C running over an SMB link as described above performs at 97% of the speed of direct-attach.  That is a stunning result, and one which is not limited to Windows file servers, since the fix is on the client side.

So now, we have an SMB implementation running at speeds comparable to a fibre-attached array.

Connections to the share may be unreliable

Again, there are multiple parts to this one.  One aspect is that the underlying networking hardware has gotten much more reliable in recent years; consumers and enterprises alike just wouldn't put up with flaky network connections these days.  The popularity of FCoE (Fibre Channel over Ethernet) is an indicator of how much confidence in Ethernet as a storage transport has grown.

The second aspect again comes back to the work done in the 2.2 version of the SMB protocol.  With this version, SMB has a number of resiliency features built in.  In the past, if a link momentarily dropped, the connection would be lost and the file handle broken.  With the 2.2 version of the protocol, the link is automatically re-established, and the application never sees the event beyond a momentary stall in outstanding IO.

If we take the configuration a step further, the file server itself can be clustered, and now has the capability to failover a share from one file server to the other without losing handle state.  To clarify, SQL Server running an active workload can have the file server hosting the database files fail over, planned or unplanned, and SQL sees only a momentary drop in IO rates.
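
To make that configuration concrete, here is a minimal T-SQL sketch of creating a database whose files live on such a share.  The share name \\FS1\SQLData and the database name SalesDB are hypothetical, and the sketch assumes the SQL Server service account has been granted rights on the share; on older SQL Server versions, trace flag 1807 has historically been needed before network paths are accepted for database files.

    -- Create a database with its data and log files on a (hypothetical)
    -- continuously available SMB share hosted by a clustered file server.
    CREATE DATABASE SalesDB
    ON PRIMARY
        ( NAME = SalesDB_data, FILENAME = '\\FS1\SQLData\SalesDB.mdf' )
    LOG ON
        ( NAME = SalesDB_log,  FILENAME = '\\FS1\SQLData\SalesDB_log.ldf' );

From SQL Server's point of view the UNC path is just another file location; the transparent failover described above happens beneath it, in the SMB client and the clustered file server.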

The storage behind the file server may be flaky or unreliable.

While it is always possible to put together an unreliable server, the tools now exist to incorporate very sophisticated reliability features right in the box.  Particularly with the recently announced Windows 8 features, we have a pretty good toolset native to the OS.  We can create pools of storage which can be dynamically expanded, and pools can be assigned a variety of RAID levels.  Many of the features which were previously only available in fibre-attached arrays are now available with direct-attached storage on a Windows file server.  When you add in the capability for failover and scale-out clustering, the reliability becomes very impressive.

SMB consumes large amounts of CPU if you can get it going fast enough.

This is actually a painful aspect of Ethernet which has hurt iSCSI as well as SMB and other protocols.

A recent transport development is RDMA (Remote Direct Memory Access), which enables data to flow directly from the network wire into user space without being copied through kernel memory buffers.  This produces a huge reduction in CPU utilization at high data rates.  How much?  I've seen an Infiniband-based SMB connection sustain more than 3 gigaBYTES per second (roughly 24 gigabits per second on the wire) while consuming around 7% CPU, using SQLIO 512 KB writes as the workload.  We've seen prototype units performing at twice that rate in the lab.

Additional benefits

Now that we've discussed why the factors that were previously blockers no longer apply, let's look at some of the additional benefits:

Manageability

Consider the steps required for a DBA to move a database from one server to another:

SAN:

  • Take database offline/detach
  • File request to remap LUNs
  • Meet with storage admin
  • LUNs are unmapped from original server
  • LUNs are mapped to new server
  • LUNs get discovered and mounted on new server
  • Database is attached to new server
  • Database is brought online

SMB:

  • Take database offline/detach
  • Database is attached to new server using UNC path
  • Database is brought online

Additionally, you have one set of tools for configuring storage, as opposed to separate tooling for each SAN vendor you use.
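
As a rough sketch of what the SMB column above looks like in practice, assume the same hypothetical \\FS1\SQLData share and a database named SalesDB, and that the new server's SQL Server service account already has access to the share.  The move then reduces to a detach on the original server and an attach from the UNC path on the new one:

    -- On the original server: take the database offline by detaching it.
    EXEC sp_detach_db @dbname = N'SalesDB';

    -- On the new server: attach directly from the UNC path.
    -- No LUN remapping, rescanning, or remounting is involved.
    CREATE DATABASE SalesDB
    ON ( FILENAME = '\\FS1\SQLData\SalesDB.mdf' ),
       ( FILENAME = '\\FS1\SQLData\SalesDB_log.ldf' )
    FOR ATTACH;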

Cost

Cost is always a concern, and with the capabilities this platform brings to bear, we can accomplish for a fraction of the cost what previously required a much more expensive solution, without sacrificing performance, reliability, or manageability.

Example

As one example of the whole package, a reference configuration for a SQL deployment had originally been built with Infiniband for communications and several small SAN arrays, one per server in a rack.  By converting that configuration to a single clustered file server, the total cost of the solution dropped dramatically: roughly $50,000 in FibreChannel hardware was saved, and the savings from moving from multiple FC arrays to the clustered file server were also very substantial.  The kicker, though, was that the performance of the new solution was better than the original configuration, which had previously bottlenecked on the storage processors in the arrays.

So, the overall cost is substantially lower, the required features are delivered, and the performance is improved. 

What’s not to like?

Comments
  • What about the case where your SMB file share is a NAS (and, for licensing cost reasons, you don't have iSCSI)?

  • This is all well and good, but from what I'm reading so far it seems more academic than practical. Can we by chance see hard numbers on these tests and some more details? Operating system version/patch level? SQL Server version and workload tested?

  • Brian,  The File Server team has spent the last 6-9 months meeting with the major NAS vendors, laying out how SMB 2.2 works, and getting commitments from them to implement.  So, the protocol-related parts will be available across NAS vendors to a large extent.

    SQLChicken, We are running our automated test suites against SMB configurations and direct-attach side-by-side in the lab now.    There were a number of talks at the BUILD conference that discussed this work as well.  One good example would be: channel9.msdn.com/.../SAC-444T

  • The Manageability section seems to imply that servers will just use one network connection for both SMB traffic and regular network traffic.  "Mount and go" means there's no permissions - the files are just available to everyone without security issues.  Is that really the case you expect in the real world?

    Assuming it's not, and that network traffic and security issues will match typical SAN storage, I'm not sure how manageability is easier for SMB.  I mean, sure, if you say anyone can access any mdf/ldf, manageability is easier - but then you can do that same thing in a SAN.  Just don't zone any storage and you get the same effect.

    Of course, there's two reasons we zone storage - performance and security.

  • By the way, just to be clear, as long as it's secured and the traffic isn't fighting for the same network adapters, I love this.  Makes perfect sense in virtualization environments.  Just wanna be really, really sure we're not using the regular networks in virtualization environments - that's a recipe for performance disaster.

  • Brent,

    I can see where it also has applications for smaller organizations who can afford a more lightweight NAS and want a reasonable HA solution. Not talking SAN. Now if the physical server SQL is installed on drops, it's pretty quick to recover if you have another one, especially if you're extracting logins on a regular basis.

  • In fact, in some cases money can solve everything, but the problem here is not money; performance can only be improved through technology. Thank you for sharing.

  • Brent,

    I assume that someone who is laying out a storage network will not intermix it with the office email network.  With Fibrechannel that's easy because you can't, but the same principles should apply.

    So I would always recommend a dedicated network to isolate impact in both directions.

    As to permissions, I would expect that permissions on the share would be locked down, but that either the servers would all have access to shares which they might have need to mount, or that adding the permissions would not be an onerous task.  Once added/configured, my comments about the ease of operation should stand.

  • Kevin - great, can you update your manageability steps above then?  In the SMB column, you'll need to add steps to get permissions on the shares for the database.  If it's a free-for-all, then on the SAN side, you'll need to remove the steps about LUN mapping.

    You can't have it both ways - you can't say that SMB is easy to manage because there's no security, and then add security over on the SAN side.  Is that fair?

  • While SMB *may* be a choice, things shift. You may have a security issue. So yes, you take the complexity away from LUN masking/mapping/et al., but cause another potential hurdle. I've tested it, and the SQL service account needs admin rights.

    Another thing as it relates to availability. Network reliability is still an issue. Then there's ensuring the SMB share itself is highly available (read: on a clustered file share). Sure, things get easier in Win8 with built-in NIC teaming for fault tolerance, but to outright assume the network won't be a potential availability problem anymore is not true - at least not from what I've seen at customers.

    If SMB 2.2 requires Win8 (which I believe it does), it's already a non-starter for the near future.

    Then there's network performance. I would be very hesitant to stick my most demanding, mission-critical (and large) DB on an SMB share if my network sucked. So while in theory it's true networking itself is better, how a company implements it is a different ballgame. Testing is still an important aspect.

    Don't get me wrong. On a well designed infrastructure with dedicated 10 gigabit NICs, SMB could fly. I've tested it with current functionality and it works. But using SMB is far from a slam dunk right now.

  • I would be curious to know if the stale read and lost write conditions associated with CIFS have gone away. If they haven't, that would be a compelling reason to avoid this approach.

    Quite like all the new features though
