Last week I had the good fortune of presenting at the the Seattle PHP Meet Up on a topic I knew only a little about: transaction isolation levels. It was fun doing the homework to learn more and I’ll share what I learned in this post. However, before diving in, I want to call out two things:

  1. This post is only somewhat PHP-specific. I’ll focus largely on the concepts behind database transactions and isolation levels.
  2. Most of the concepts I’ll look at are database agnostic (although I will use SQL Server as the vehicle for explaining the concepts) and are supported by most relational database management systems out there (e.g. SQL Server, MySQL, Oracle, DB2, etc.).

So, consider this post to be Database Transactions and Isolation Levels 101.

Database Transactions

A database transaction is a unit of work (i.e. a group of queries) that should either completely succeed or completely fail. A classic example used to illustrate transactions is what happens when you transfer money from your checking account to your savings account. Typically, this would involve two update statements (showing a $10 transfer):

UPDATE BankAccount SET Checking = Checking - 10 WHERE AccountId = 1
UPDATE BankAccount SET Savings = Savings + 10 WHERE AccountId = 1

In order for the transfer to be successful, BOTH of these queries must succeed. And, if one of them fails, it would be best for the other to fail too (thus meaning that the transfer was unsuccessful, but money didn’t “magically” appear or disappear in one account). This is exactly what happens if we make these statements part of a transaction. When statements are part of a transaction, we can then commit the transaction (i.e. permanently save the results) if everything went according to our liking, or rollback the transaction (i.e. return to the state the database was in before we started the transaction) if something went wrong. A bit of PHP code makes this idea clear:

$serverName = "ServerName\sqlexpress";
$connectionInfo = array( "Database"=>"DatabaseName", "UID"=>"uid", "PWD"=>"pwd");
$conn = sqlsrv_connect( $serverName, $connectionInfo);

//Initiate transaction.
sqlsrv_begin_transaction( $conn );
//Initialize parameter values.
$money = 10;
$accountId = 1;

// Set up and execute the first query.
$tsql1 = "UPDATE BankAccount SET Checking = (Checking - ?) WHERE AccountId = ?";
$params1 = array($money, $accountId);
$stmt1 = sqlsrv_query( $conn, $tsql1, $params1 );

// Set up and execute the second query.
$tsql2 = "UPDATE BankAccount SET Savings = (Savings + ?) WHERE AccountId = ?";
$params2 = array($money, $accountId);
$stmt2 = sqlsrv_query( $conn, $tsql2, $params2 );

//If both queries were successful, commit the transaction.
//Otherwise, rollback the transaction.
if( $stmt1 && $stmt2 )
{
    sqlsrv_commit( $conn );
    echo "Transaction was committed.<br/>";
    $stmt3 = sqlsrv_query($conn,
                          "SELECT * FROM BankAccount WHERE AccountId = ?",
                          array($accountId));
    $row = sqlsrv_fetch_array($stmt3);
    echo "AccountId: ".$row[0]." Checking: ".$row[1]." Savings: ".$row[2]."<br/>";
}
else
{
    sqlsrv_rollback( $conn );
    echo "Transaction was rolled back.<br/>";
    $stmt3 = sqlsrv_query($conn,
                          "SELECT * FROM BankAccount WHERE AccountId = ?",
                          array($accountId));
    $row = sqlsrv_fetch_array($stmt3);
    echo "AccountId: ".$row[0]." Checking: ".$row[1]." Savings: ".$row[2]."<br/>";
}

Note: The PHP code above uses the transaction APIs of the SQLSRV driver (sqlsrv_begin_transaction, sqlsrv_commit, sqlsrv_rollback) to handle transaction operations. This is the recommended way to begin and commit or rollback transactions even though it is possible, to use plain ‘ol SQL to do so (e.g. BEGIN TRANSACTION). The reason for this is that queries that are not part of an explicit transaction are part of an implicit transaction (this behavior is common to most RDMSs). If you begin a transaction like this, sqlsrv_query($conn, “BEGIN TRANSACTION”), that query itself is part of an implicit transaction (i.e. you have nested transactions). Rather than try to predict the outcome of such transactions, it is simply recommended that you use the driver APIs to handle transaction operations.

The code above should successfully commit the transaction. To see what happens if part of the transaction fails, change this line, $params2 = array($money, $accountId);, to this: $params2 = array(“ABCD”, $accountId);. This will cause the second UPDATE query to fail (“ABCD” is not the correct data type for the Savings column), thus forcing the entire transaction to be rolled back. The values in the accounts will be restored to their values before the transaction was initiated.

 

Transaction Isolation Levels

Transactions are relatively simple until we start wondering what happens when multiple users are trying to read and/or modify data while a transaction is in progress. Suppose that while the transaction above was in progress another user with access to the bank account tried to find out the balance for the checking account. What should this user see? The amount before the transaction was initiated? The amount after the transaction is committed? What if the transaction is rolled back? This is where transaction isolation levels come into play. However, it is difficult to talk about the effects of isolation levels with out first talking about three concepts related to concurrency:

  • Dirty read: A “dirty read” occurs when one transaction reads data that is being modified by another transaction and the uncomitted data is returned. For example, suppose there is $100 in my checking account. When I try to read the balance of my checking account while my wife is in the process of transferring $10 to savings and I see $90, that is a dirty read. I saw the uncomitted version of data that was in transaction.
  • Non-repeatable read: A “non-repeatable read” occurs when a read cannot be repeated within a transaction. For example, suppose I begin a transaction that reads my checking balance (say $100) and my wife begins another transaction that is transferring $10 to savings. If I try to read my checking balance again (within the same transaction) and I see $90, then my first read of the data was a non-repeatable read.
  • Phantom read: A “phantom read” occurs when data seems to appear out of nowhere. For example, suppose I started a transaction that read the balances of all bank accounts. And, further suppose that I read all the data again within the same transaction. If someone opened a new bank account between those two reads and I see a new row of data in the second read, that is a phantom read (i.e. a new row appeared out of nowhere).

Those concepts are important in understanding isolation levels.

Set Up

Here, I will describe how the following transaction isolation levels can be used to change what multiple users can see when accessing data that is in transaction: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE (all of which are supported by SQL Server, MySQL, Oracle, DB2). (For information about SNAPSHOT isolation, see this post: SQL Server Driver for PHP Connection Options: Snapshot Isolation.)

To effectively show what each of these isolation levels does, I need to be able to begin a transaction and leave it open (i.e. not commit it or roll it back). This is very easy to do using SQL Server Management Studio (SSMS) and Transact-SQL, but not so easy using PHP scripts since it’s difficult to execute only part of a script (at least it’s difficult when it comes to demonstrating transactions). You can download and install (for free) SQL Server Express and SQL Server Management Studio here in case you want to work along with this post. (Be sure to select Database with Management Tools or Database with Advanced Services.)

Here is the script for creating the database that I’ll use:

CREATE DATABASE [TestDB]
GO

USE [TestDB]
GO

CREATE TABLE [dbo].[BankAccount](
    [AccountId] [int] NOT NULL,
    [Checking] [money] NOT NULL,
    [Savings] [money] NOT NULL,
CONSTRAINT [PK_BankAccount] PRIMARY KEY CLUSTERED
(
    [AccountId] ASC
)
) ON [PRIMARY]
GO

INSERT INTO BankAccount (AccountId, Checking, Savings)
VALUES (1, 100, 100)
GO

With that database in place, I’ll demonstrate how to set the various transaction isolation levels and what affect they have.

 

READ UNCOMITTED Isolation Level

When a transaction is run under the READ UNCOMITTED isolation level, other users can read data that has not yet been committed (i.e. dirty reads are allowed). For example, consider the following transaction (which you can execute in SSMS). Note that the transaction is still open (i.e. it has not been committed) since the COMMIT TRANSACTION command is commented out:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRANSACTION

DECLARE @money INT
DECLARE @accountId INT
SET @money = 10
SET @accountId = 1

UPDATE BankAccount SET Checking = Checking - @money WHERE AccountId = @accountId
UPDATE BankAccount SET Savings = Savings + @money WHERE AccountId = @accountId
--COMMIT TRANSACTION

When another query tries to read the data that is in the transaction, the uncommitted data is returned. For example, if you execute this query in a new SSMS window…

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

BEGIN TRANSACTION

SELECT * FROM BankAccount

COMMIT TRANSACTION

…you’ll see values of 90 and 110 returned for Checking and Savings respectively (i.e. uncommitted data is readable). If you highlight the COMMIT TRANSACTION in the first query window and press F5, you will commit the transaction. As I mentioned earlier, whether you want users to be able to read uncommitted data (also known as “dirty” data) will depend on the needs of your application.

The READ UNCOMITTED isolation level is the most permissive level. Not only does it allow dirty reads, it allows non-repeatable reads and phantom reads (which are demonstrated in the examples below).

READ COMMITTED Isolation Level

When a transaction is run under the READ COMMITTED isolation level, users cannot read data that has not yet been committed; only committed data is readable. (This is the default isolation level for most databases.)  To see this, execute this query in SSMS (note that the transaction has not been committed):

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

BEGIN TRANSACTION

SELECT * FROM BankAccount

--COMMIT TRANSACTION

You should see data returned for Checking and Savings.

Now execute the following query in a new SSMS window (again note that the transaction has not been committed):

BEGIN TRANSACTION

DECLARE @money INT
DECLARE @accountId INT
SET @money = 10
SET @accountId = 1

UPDATE BankAccount SET Checking = Checking - @money WHERE AccountId = @accountId
UPDATE BankAccount SET Savings = Savings + @money WHERE AccountId = @accountId
--COMMIT TRANSACTION

Now return to the first query, highlight SELECT * FROM BankAccount, and press F5 (this executes the SELECT statement as part of the open transaction). , You will see this at the bottom of the query window:

image

Because the the isolation level is READ COMMITTED, only committed data will be returned. The query is blocked from reading any data because it is trying to access uncommitted data. When you commit the data (select COMMIT TRANSACTION in the second transaction and press F5), you will see data returned by the first query (it is no longer blocked because the data it is selecting is now committed).

Be sure to select COMMIT TRANSACTION in the first transaction and press F5 (so we don’t leave any transactions open).

While the READ COMMITTED isolation level does not allow dirty reads, it does allow non-repeatable reads and phantom reads (again, demonstrated below).

REPEATABLE READ Isolation Level

When a transaction is run under the REPEATABLE READ isolation level, users are guaranteed to get the same data from one read to the next within a transaction. To see this, execute the following query in SSMS:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

BEGIN TRANSACTION

SELECT * FROM BankAccount

--COMMIT TRANSACTION

You should see data returned for Checking and Savings.

Now execute the following query in a new SSMS window:

BEGIN TRANSACTION

DECLARE @money INT
DECLARE @accountId INT
SET @money = 10
SET @accountId = 1

UPDATE BankAccount SET Checking = Checking - @money WHERE AccountId = @accountId
UPDATE BankAccount SET Savings = Savings + @money WHERE AccountId = @accountId
COMMIT TRANSACTION

You will see this at the bottom of the query window:

image

Now return to the first query, highlight SELECT * FROM BankAccount, and press F5 (this executes the SELECT statement as part of the open transaction). Note that the same data is returned (you repeated a read). When you commit the transaction (highlight COMMIT TRANSACTION and press F5), you’ll find that the second query completes (i.e. the data is updated).

Because the the isolation level is REPEATABLE READ, the second query is blocked from updating data so that reads can be repeated in the first query. Running the transaction under the READ UNCOMMITTED or READ UNCOMMITTED isolation levels would not block the updating query from executing (thus allowing for a non-repeatable read). Note that the REPEATABLE READ isolation level does allow for phantom reads (explained below).

 

SERIALIZABLE Isolation Level

When a transaction is run under the SERIALIZABLE isolation level, other users will not get dirty reads, unrepeatable reads, or phantom reads. You can repeat any of the above scenarios with the isolation level set to SERIALIZABLE to see that dirty reads and non-repeatable reads are not possible. To see how this isolation level prevents phantom reads, consider the following example. (Note: to see an example of a phantom read, follow the example with the isolation level set to any of the other levels mentioned above).

Execute the following query in SSMS:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

BEGIN TRANSACTION

SELECT * FROM BankAccount

--COMMIT TRANSACTION

In a new SSMS window, execute the following query:

BEGIN TRANSACTION

INSERT INTO BankAccount (AccountId, Checking, Savings) VALUES (2, 200, 200)

COMMIT TRANSACTION

You will now see the familiar…

image

…indicating that the query is blocked. Return to the first query, highlight SELECT * FROM BankAccount, and press F5. Note that you do not see a second row returned (phantom reads are not possible). When you select COMMIT TRANSACTION and press F5 in the first query, the INSERT in the second query will then be allowed to complete.

 

Summary of Isolation Levels

Isolation levels govern what data concurrent users can see during transactions. The most permissive isolation level is READ UNCOMMITTED and the most restrictive is SERIALIZABLE. The following table shows this in more detail:

Isolation Level Dirty read Non-repeatable read Phantom read
READ UNCOMMITTED Yes Yes Yes
READ COMMITTED No Yes Yes
REPEATABLE READ No No Yes
SERIALIAZBLE No No No

As demonstrated above, data is locked according to the isolation level of a transaction. The implications of locked data should be considered carefully when designing an application that utilizes transactions.

 

Setting Transaction Isolation Levels in the SQL Server Drivers for PHP

Both the SQLSRV and PDO_SQLSRV drivers (downloadable here) support transactions and transaction isolation levels. The transaction isolation level is set when establishing a connection to the server. For more information, see the IsolationLevel keyword in the Connection Options topic of the driver documentation. Also note that the driver APIs for executing transactions (sqlsrv_begin_transaction, sqlsrv_commit, sqlsrv_rollback in the SQLSRV driver, and PDO::beginTransaction, PDO::commit, PDO::rollback in the PDO_SQLSRV driver) should be used instead of executing a SQL such as sqlsrv_query($conn, “BEGIN TRANSACTION”).

That’s it for today. Hope this proves helpful.

Thanks.

-Brian

Share this on Twitter