Applications connecting to SQL Server sometimes experience connections breaks due back-end failures.  In general, they tend to be uncommon in LAN environment and more frequent in WAN topologies, when connectivity tends to be challenged by geographically dispersed links.

The introduction of Windows Azure SQL Database (WASD), which is a cloud-based SQL Server, added new availability and reliability challenges to the connectivity space. As a result, applications connecting to WASD witness frequent connection breaks due to backend reconfigurations and failures, forcing developers to include special logic for every database operation in order to handle these unexpected situations.

Basically, every time a connection drops, the application needs to decide to either reconnect (if it was a recoverable error (e.g. “server is busy)) or return an exception to the end user (if it was a non-recoverable one (e.g. “invalid password”)).  In order to better handle these situations, developers can use a set of resources and best practices to avoid unwelcomed surprises after deploying applications to production.  The Transient Fault Handling Framework for SQL Database, Windows Azure Storage, Service Bus & Cache is a free component that encapsulates all the logic required to handle most of the disconnections happening in SQL Database.  For more information, please refer to its MSDN’s documentation here.

Also, for general resiliency guidance, please refer to Failsafe: Guidance for Resilient Cloud Architectures.

The Idempotency Issue

Now, special care must be considered when writing to a SQL Database table because a simple IUD (INSERT, UPDATE or DELETE) statement can potentially lead to data corruption.  This is because there is no way to know if the statement was executed by the server immediately after a connection failure.  Consider the code snippet below:

public class Data

{

public void Query(SqlConnection sqlConnection, string szQuery)

{

try

{

using (SqlConnection conn = new SqlConnection("…"))

{

conn.Open();

SqlCommand sqlCommand = new SqlCommand(szQuery, conn);

sqlCommand.ExecuteNonQuery("UPDATE productList SET Price=Price*1.01");

}

}

catch (SqlException)

{

//...

throw;

}

}

}

If the connection fails immediately after sqlCommand.ExecuteNonQuery, there is no way for the application to know if it needs to reissue the query (the UPDATE didn’t execute) or not (the UPDATE executed) upon reconnection.

In order to avoid this problem, the application must track if a command was uniquely executed (idempotent command) or not.  If it were, then, there is nothing to do, but if it weren’t, then the application can safely re-execute it.  Currently, SQL Database does not offer any idempotency capability, transferring this responsibility to the application.

One way to track idempotent activities is to track all writing database operations in a table, like the one below.  This way, the application can query the table to know if a command was executed or not.

 Once the tracking information is available, there are many ways to query it in order to reissue the command.  Personally, I like to encapsulate the code in stored procedures and confine the idempotent logic to the database.  This way, it becomes transparent to the application and simplifies the use of the Transient Fault Handling Framework for SQL Database, Windows Azure Storage, Service Bus & Cache component.  The snippet below exemplifies this approach:

CREATE PROCEDURE UpdateBranch

-- Add the parameters for the stored procedure here

@Operation AS UNIQUEIDENTIFIER,

@ParentId AS INT,

@BranchDbId as INT

AS

BEGIN

DECLARE @COUNT_ID AS INT = 0

-- SET NOCOUNT ON added to prevent extra result sets from

-- interfering with SELECT statements.

SET NOCOUNT ON;

SET @COUNT_ID = (SELECT COUNT([ID]) FROM dbo.IdempotencyTracker WHERE ID = @Operation)

IF (@COUNT_ID = 0)

BEGIN

BEGIN TRANSACTION

UPDATE [dbo].[Branch] SET [BranchDbId] = @BranchDbId WHERE ParentId = @ParentId

INSERT INTO [dbo].[IdempotencyTracker] ([ID]) VALUES (@Operation)

COMMIT

END

END

In the scenario above, the application is responsible for generating and saving @Operation value in order to avoid losing track of the operation.

Concluding, although using the Transient Fault Handling Framework for SQL Database, Windows Azure Storage, Service Bus & Cache component is the right way to avoid connectivity issues in SQL Database, it’s not enough.  You really need to implement an idempotency strategy in order to avoid data duplications or inconsistencies when writing to the database.

 

Luiz Santos

Sr. Program Manager

SQL Server Team