In my previous post on searching encrypted data, I mentioned that the SQL Server 2005 encryption procedures are salted and that this prevents an index on encrypted data from being useful for any type of cleartext searches. Today, I will illustrate why encryption should be salted by presenting a small C# encryption demo.

Interest in searching encrypted data comes primarily from the use of social security numbers and credit card numbers as identifying information, even though they are also supposed to be confidential. Thus, the requirement to search this data is combined with the need to protect it via encryption. To make the demo relevant to these requirements, I will use two made-up credit card numbers, but the issues I will point out apply to social security numbers as well as to other types of data.

Before I present the code, I should explain some technical details about encryption. In the following discussion, plaintext (or cleartext) will refer to the unencrypted data and ciphertext will refer to the encrypted data.

Block encryption algorithms: TripleDES is a block encryption algorithm with a block size of 64 bits (8 bytes). This means that the algorithm encrypts chunks of 64 bits at a time: it takes the first 64-bit block of the plaintext, encrypts it, then moves on to the next 64-bit block. This also means that if we encrypt two pieces of plaintext with the same key and they have an identical 64-bit block, then, unless something else is mixed in, the corresponding encrypted block will be identical in both encryptions. Most encryption algorithms in common use today are block algorithms; AES (128-bit block) and RC2 (64-bit block) are two other examples.
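
To make the point about identical blocks concrete, here is a minimal sketch that encrypts each block independently (ECB mode, which is what "no chaining" amounts to in .NET). The class and method names (EcbBlockDemo, EncryptWithoutChaining, SameBlock, NewKey) are made up for this illustration, and the two numbers are the ones used in the demo further down.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class EcbBlockDemo
{
    const int BlockSize = 8; // TripleDES block size in bytes

    // Encrypts every 8-byte block on its own (ECB mode, no chaining, no padding).
    // data.Length must be a multiple of BlockSize.
    public static byte[] EncryptWithoutChaining(byte[] key, byte[] data)
    {
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.Mode = CipherMode.ECB;
            tdes.Padding = PaddingMode.None;
            tdes.Key = key;
            using (ICryptoTransform enc = tdes.CreateEncryptor())
            {
                return enc.TransformFinalBlock(data, 0, data.Length);
            }
        }
    }

    // Compares the block at blockIndex in two ciphertexts.
    public static bool SameBlock(byte[] a, byte[] b, int blockIndex)
    {
        for (int i = blockIndex * BlockSize; i < (blockIndex + 1) * BlockSize; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Generates a random TripleDES key.
    public static byte[] NewKey()
    {
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.GenerateKey();
            return tdes.Key;
        }
    }

    static void Main()
    {
        byte[] key = NewKey();
        // Two 16-byte plaintexts whose first 8-byte block is the same.
        byte[] c1 = EncryptWithoutChaining(key, Encoding.ASCII.GetBytes("4011023411013758"));
        byte[] c2 = EncryptWithoutChaining(key, Encoding.ASCII.GetBytes("4011023410254622"));

        Console.WriteLine(SameBlock(c1, c2, 0)); // True: identical plaintext blocks
        Console.WriteLine(SameBlock(c1, c2, 1)); // False: the second blocks differ
    }
}
```

The key is random, so the ciphertext bytes change from run to run, but the first blocks match on every run.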

Chaining Modes: To mitigate the fact that the same plaintext block would encrypt to the same ciphertext block, various chaining modes can be used for encryption. The chaining mode I will use in this demo is called Cipher Block Chaining (CBC) and is a commonly used mode. In this mode, when a plaintext block is about to be encrypted, it is first combined with the previous ciphertext block (through an XOR operation) before the actual encryption transformation takes place. This way, even if two blocks of plaintext are identical, the corresponding ciphertext blocks will still be different, except when everything that preceded them in the chain was identical as well.
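
The chaining step can be sketched by hand: XOR each plaintext block with the previous ciphertext block, then encrypt the result as a single block. This is only an illustration under my own naming (CbcByHandDemo, EncryptCbcByHand, EncryptCbcBuiltIn are hypothetical); it uses no padding, so the input must be a whole number of blocks, and it compares the result with what the framework's own CBC mode produces for the same key and starting block.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class CbcByHandDemo
{
    const int BlockSize = 8; // TripleDES block size in bytes

    // CBC built from single-block encryptions: C[i] = E(P[i] XOR C[i-1]),
    // where the starting block takes the place of C[-1].
    public static byte[] EncryptCbcByHand(byte[] key, byte[] startBlock, byte[] data)
    {
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.Mode = CipherMode.ECB;       // raw block encryption only
            tdes.Padding = PaddingMode.None;
            tdes.Key = key;
            using (ICryptoTransform enc = tdes.CreateEncryptor())
            {
                byte[] output = new byte[data.Length];
                byte[] previous = (byte[])startBlock.Clone();
                byte[] mixed = new byte[BlockSize];
                for (int offset = 0; offset < data.Length; offset += BlockSize)
                {
                    for (int i = 0; i < BlockSize; i++)
                    {
                        mixed[i] = (byte)(data[offset + i] ^ previous[i]);
                    }
                    previous = enc.TransformFinalBlock(mixed, 0, BlockSize);
                    Buffer.BlockCopy(previous, 0, output, offset, BlockSize);
                }
                return output;
            }
        }
    }

    // The same encryption done by the framework's CBC mode, for comparison.
    public static byte[] EncryptCbcBuiltIn(byte[] key, byte[] startBlock, byte[] data)
    {
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.Mode = CipherMode.CBC;
            tdes.Padding = PaddingMode.None;
            using (ICryptoTransform enc = tdes.CreateEncryptor(key, startBlock))
            {
                return enc.TransformFinalBlock(data, 0, data.Length);
            }
        }
    }

    static void Main()
    {
        byte[] key, startBlock;
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.GenerateKey();
            tdes.GenerateIV();
            key = tdes.Key;
            startBlock = tdes.IV;
        }

        // A plaintext made of two identical 8-byte blocks.
        byte[] data = Encoding.ASCII.GetBytes("4011023440110234");
        byte[] byHand = EncryptCbcByHand(key, startBlock, data);
        byte[] builtIn = EncryptCbcBuiltIn(key, startBlock, data);

        // Both approaches compute the same chain.
        Console.WriteLine(Convert.ToBase64String(byHand) == Convert.ToBase64String(builtIn)); // True

        // Print the two ciphertext blocks so they can be compared by eye.
        Console.WriteLine(BitConverter.ToString(byHand, 0, BlockSize));
        Console.WriteLine(BitConverter.ToString(byHand, BlockSize, BlockSize));
    }
}
```

Even though the two plaintext blocks are identical, the two printed ciphertext blocks should differ, because the second block was mixed with the first ciphertext block before being encrypted.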

Initialization Vector: So the CBC mode will combine the currently processed plaintext block with the previous ciphertext block. But what will happen if we just started encryption and we're processing the first block of plaintext - there is no previously generated ciphertext block yet! This is where the Initialization Vector comes into play - an IV is just a randomly generated block that plays the role of the "previous ciphertext block" for the first plaintext block. By using a different IV for each encryption, prefix patterns are hidden, because the first plaintext block is combined with a different value each time, so the first ciphertext block, and every block chained after it, comes out different. Using a fixed IV will allow prefix patterns to be reflected in the output of the encryption, as we will see in the demo. The use of a random IV for each encryption is also referred to as salting.

So, putting these together, block encryption can leak information about the plaintext having identical blocks, because when this happens, the ciphertext blocks will be identical too. Chaining modes combined with random IVs for each encryption will strengthen the encryption by hiding patterns in the cleartext, but both these mechanisms have to be used for this to happen. The encryption procedures in SQL Server 2005 use both CBC and random IVs to prevent patterns in the cleartext data from being reflected in the ciphertext data.
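
As a rough sketch of what "CBC plus a random IV for each encryption" can look like in application code, the example below generates a fresh IV for every message and stores it in front of the ciphertext so that decryption can find it again. The IV-first layout and the names (RandomIvDemo, Encrypt, Decrypt, NewKey) are assumptions made for this illustration; I am not claiming this is the format SQL Server uses internally.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class RandomIvDemo
{
    const int BlockSize = 8; // TripleDES block (and IV) size in bytes

    // Encrypts with a fresh random IV and returns IV || ciphertext.
    public static byte[] Encrypt(byte[] key, byte[] plaintext)
    {
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.Mode = CipherMode.CBC;
            tdes.Key = key;
            tdes.GenerateIV(); // new random IV for this message only
            byte[] iv = tdes.IV;
            using (ICryptoTransform enc = tdes.CreateEncryptor())
            {
                byte[] body = enc.TransformFinalBlock(plaintext, 0, plaintext.Length);
                byte[] message = new byte[iv.Length + body.Length];
                Buffer.BlockCopy(iv, 0, message, 0, iv.Length);
                Buffer.BlockCopy(body, 0, message, iv.Length, body.Length);
                return message;
            }
        }
    }

    // Reads the IV back from the first block, then decrypts the rest.
    public static byte[] Decrypt(byte[] key, byte[] message)
    {
        byte[] iv = new byte[BlockSize];
        Buffer.BlockCopy(message, 0, iv, 0, BlockSize);
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.Mode = CipherMode.CBC;
            using (ICryptoTransform dec = tdes.CreateDecryptor(key, iv))
            {
                return dec.TransformFinalBlock(message, BlockSize, message.Length - BlockSize);
            }
        }
    }

    // Generates a random TripleDES key.
    public static byte[] NewKey()
    {
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.GenerateKey();
            return tdes.Key;
        }
    }

    static void Main()
    {
        byte[] key = NewKey();
        byte[] number = Encoding.ASCII.GetBytes("4011023410254622");

        // Same key, same plaintext, two separate encryptions.
        byte[] first = Encrypt(key, number);
        byte[] second = Encrypt(key, number);
        Console.WriteLine(BitConverter.ToString(first));
        Console.WriteLine(BitConverter.ToString(second));

        // Each message still decrypts to the original number.
        Console.WriteLine(Encoding.ASCII.GetString(Decrypt(key, first)));  // 4011023410254622
        Console.WriteLine(Encoding.ASCII.GetString(Decrypt(key, second))); // 4011023410254622
    }
}
```

The two hex lines should have nothing in common, which is exactly why an index built over such values cannot help with searching for a given plaintext.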

In the demo, we will encrypt two credit card numbers using a TripleDES key. The two numbers share their first eight digits; I chose eight because we will process the numbers as ASCII, and 8 characters require 8 bytes - the size of a TripleDES block. The initial encryptions will be done using the same IV. We will also encrypt one of the credit card numbers with the same TripleDES key but with a different IV, to show the difference made by changing the IV. Then we'll look at the encrypted data and see what we can learn from it.

1. First, let's start by creating a TestCryptoSalt.cs file and copying the code below into it:

//////////////////
///
/// TestCryptoSalt
///
/// Small demo of encryption and salting
///
/// compile command:
/// csc TestCryptoSalt.cs
///
//////////////////
using System;
using System.Collections;
using System.Text;
using System.Security.Cryptography;

namespace TestCryptoSalt
{
    /// <summary>
    /// Showing effect of salt on encryption
    /// </summary>
    class TestCryptoSalt
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            crypto_test_no_salt();
        }

        /// <summary>
        /// Helper function for hexadecimal display of a byte array
        ///
        /// Arguments:
        /// str - a description to be output before the hex print
        /// data -  the byte array to print
        /// </summary>
        static void print_bytes(string str, byte[] data)
        {
            Console.WriteLine("\n" + str);
            Console.Write("\t");
            for (int i = 0; i < data.Length; i++)
            {
                Console.Write(String.Format("0x{0:X}", data[i]));
                Console.Write(' ');
                if ((i + 1) % 8 == 0)
                {
                    Console.Write("\n\t");
                }
            }
            Console.WriteLine();
        }

        /// <summary>
        /// The main test
        /// </summary>
        static void crypto_test_no_salt()
        {
            // Two made-up credit card numbers with identical first half
            //
            string strCC1 = "4011023411013758";
            string strCC2 = "4011023410254622";

            // Display CC1
            //
            Console.WriteLine("\nCC1: " + strCC1);
            byte[] bytesCC1 = Encoding.ASCII.GetBytes(strCC1);
            print_bytes("CC1 (bytes):", bytesCC1);

            // Display CC2
            //
            Console.WriteLine("\nCC2: " + strCC2);
            byte[] bytesCC2 = Encoding.ASCII.GetBytes(strCC2);
            print_bytes("CC2 (bytes):", bytesCC2);

            // Create a TripleDES key
            // We'll use the Cipher Block Chaining encryption mode
            //
            TripleDES tdes = TripleDES.Create();
            tdes.KeySize = 128;
            tdes.Mode = CipherMode.CBC;
            tdes.GenerateKey();
           
            // Display the key and its Initialization Vector
            //
            print_bytes("Triple DES key (bytes):", tdes.Key);
            print_bytes("Triple DES IV (bytes):", tdes.IV);

            // Encrypt CC1
            //
            ICryptoTransform tdes_encr = tdes.CreateEncryptor();
            byte[] bytesCC1Encrypted = tdes_encr.TransformFinalBlock(bytesCC1, 0, bytesCC1.Length);
            print_bytes("Triple DES encryption of CC1 (bytes):", bytesCC1Encrypted);

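            // Encrypt CC2
            // TransformFinalBlock resets the transform to its initial state,
            // so reusing tdes_encr here encrypts CC2 with the same key and IV as CC1
            //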
            byte[] bytesCC2Encrypted = tdes_encr.TransformFinalBlock(bytesCC2, 0, bytesCC2.Length);
            print_bytes("Triple DES encryption of CC2 (bytes):", bytesCC2Encrypted);

            // Set a new Initialization Vector
            //
            tdes.GenerateIV();

            // Display the key and its Initialization Vector
            //
            print_bytes("Triple DES key (bytes):", tdes.Key);
            print_bytes("Triple DES IV (bytes):", tdes.IV);

            // Re-encrypt CC2 using the new IV
            //
            tdes_encr = tdes.CreateEncryptor(tdes.Key, tdes.IV);
            byte[] bytesCC2EncryptedNewIV = tdes_encr.TransformFinalBlock(bytesCC2, 0, bytesCC2.Length);
            print_bytes("Triple DES encryption of CC2 using new IV (bytes):", bytesCC2EncryptedNewIV);

            // Now attempt to decrypt, just for fun, to show CBC's nice recovery
            // Only the last encryption will be fully decrypted correctly,
            // because the first two were made using a different IV
            //
            ICryptoTransform tdes_decr = tdes.CreateDecryptor();
            Console.WriteLine("\nRetrieving encrypted CC1: "
                + Encoding.ASCII.GetString(tdes_decr.TransformFinalBlock(bytesCC1Encrypted, 0, bytesCC1Encrypted.Length)));
            Console.WriteLine("\nRetrieving encrypted CC2: "
                + Encoding.ASCII.GetString(tdes_decr.TransformFinalBlock(bytesCC2Encrypted, 0, bytesCC2Encrypted.Length)));
            Console.WriteLine("\nRetrieving encrypted CC2: "
                + Encoding.ASCII.GetString(tdes_decr.TransformFinalBlock(bytesCC2EncryptedNewIV, 0, bytesCC2EncryptedNewIV.Length)));
        }
    }
}

2. Now, let's compile this file using the csc compiler. For example, on my system, the command for compiling the demo looks like this:

C:\WINNT\Microsoft.NET\Framework\v2.0.50727\csc.exe TestCryptoSalt.cs

3. Let's execute the program. The output will differ each time the program is run because a new encryption key and new IVs are generated with each execution. On my system, I got the following output:

CC1: 4011023411013758

CC1 (bytes):
        0x34 0x30 0x31 0x31 0x30 0x32 0x33 0x34
        0x31 0x31 0x30 0x31 0x33 0x37 0x35 0x38


CC2: 4011023410254622

CC2 (bytes):
        0x34 0x30 0x31 0x31 0x30 0x32 0x33 0x34
        0x31 0x30 0x32 0x35 0x34 0x36 0x32 0x32


Triple DES key (bytes):
        0xD 0xDC 0xFC 0xC4 0xD7 0xD 0xB7 0xB8
        0x92 0xC5 0x77 0xA8 0xD 0x96 0x1 0xBD


Triple DES IV (bytes):
        0x1A 0x42 0xD0 0x17 0x9E 0xA1 0xE6 0xF4


Triple DES encryption of CC1 (bytes):
        0x26 0x8B 0xEC 0x7C 0x56 0x5F 0x5A 0xD3
        0x13 0xB4 0xB6 0xCB 0x7F 0x80 0x52 0x49
        0x12 0xBB 0x89 0x7 0x37 0xF6 0x9F 0x69


Triple DES encryption of CC2 (bytes):
        0x26 0x8B 0xEC 0x7C 0x56 0x5F 0x5A 0xD3
        0x3B 0x8F 0xB0 0xAD 0x4C 0x14 0x1 0xB2
        0x76 0xBD 0x56 0x8C 0x52 0x6C 0x22 0xD


Triple DES key (bytes):
        0xD 0xDC 0xFC 0xC4 0xD7 0xD 0xB7 0xB8
        0x92 0xC5 0x77 0xA8 0xD 0x96 0x1 0xBD


Triple DES IV (bytes):
        0x4D 0x25 0x8E 0xEC 0x38 0xF7 0xA6 0xE9


Triple DES encryption of CC2 using new IV (bytes):
        0xC6 0x49 0xD3 0x43 0x79 0x96 0xF7 0xE6
        0x50 0xC3 0xAC 0x5E 0x27 0x9C 0xF9 0xF
        0x60 0x3 0xB 0x2B 0x4A 0xA1 0x8D 0x2B


Retrieving encrypted CC1: cWo??ds)11013758

Retrieving encrypted CC2: cWo??ds)10254622

Retrieving encrypted CC2: 4011023410254622

4. Let's discuss the program output.

We start by printing out the two credit card numbers in ASCII and hexadecimal representations. The hexadecimal printouts are arranged so that we have 8 bytes per line - the size of a TripleDES block. This will make it easier to compare identical blocks.

We then print the TripleDES key - it has 16 bytes - 128 bits. The IV is 8 bytes, the size of a block.

Looking at the encryptions of CC1 and CC2 that are done with the same IV, we notice that the first blocks are identical. This happens because the two credit card numbers share their first eight digits, and the encryption discloses this fact. The credit card numbers actually share their first nine digits, but the ninth digit is encrypted as part of the second block, so we cannot see that it is shared just by looking at the encrypted output.
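
The comparison we just did by eye is easy to automate, which is what makes this leak practical. The sketch below counts how many leading 8-byte blocks two ciphertexts share, using the ciphertext bytes copied from the sample output above; the names (SharedPrefixDemo, CountSharedLeadingBlocks) are made up for this illustration.

```csharp
using System;

static class SharedPrefixDemo
{
    // Returns how many leading blocks of the two ciphertexts are identical.
    public static int CountSharedLeadingBlocks(byte[] a, byte[] b, int blockSize)
    {
        int blocks = Math.Min(a.Length, b.Length) / blockSize;
        for (int block = 0; block < blocks; block++)
        {
            for (int i = block * blockSize; i < (block + 1) * blockSize; i++)
            {
                if (a[i] != b[i]) return block;
            }
        }
        return blocks;
    }

    // Ciphertexts copied from the sample output above.
    public static readonly byte[] Cc1 = {
        0x26, 0x8B, 0xEC, 0x7C, 0x56, 0x5F, 0x5A, 0xD3,
        0x13, 0xB4, 0xB6, 0xCB, 0x7F, 0x80, 0x52, 0x49,
        0x12, 0xBB, 0x89, 0x07, 0x37, 0xF6, 0x9F, 0x69 };

    public static readonly byte[] Cc2 = {
        0x26, 0x8B, 0xEC, 0x7C, 0x56, 0x5F, 0x5A, 0xD3,
        0x3B, 0x8F, 0xB0, 0xAD, 0x4C, 0x14, 0x01, 0xB2,
        0x76, 0xBD, 0x56, 0x8C, 0x52, 0x6C, 0x22, 0x0D };

    public static readonly byte[] Cc2NewIv = {
        0xC6, 0x49, 0xD3, 0x43, 0x79, 0x96, 0xF7, 0xE6,
        0x50, 0xC3, 0xAC, 0x5E, 0x27, 0x9C, 0xF9, 0x0F,
        0x60, 0x03, 0x0B, 0x2B, 0x4A, 0xA1, 0x8D, 0x2B };

    static void Main()
    {
        // Same IV: the shared 8-digit prefix shows up as one shared block.
        Console.WriteLine(CountSharedLeadingBlocks(Cc1, Cc2, 8));      // 1

        // Different IV: nothing in common any more.
        Console.WriteLine(CountSharedLeadingBlocks(Cc1, Cc2NewIv, 8)); // 0
    }
}
```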

The program then generates a new IV and prints the key and this IV again. We can observe that the key is the same as before; only the IV has changed. The second credit card number is now encrypted using this new IV, and this time the output has no common pattern with the previous outputs. We can no longer detect that this encryption was made for a plaintext that has anything in common with the plaintexts from which the previous two ciphertexts were generated.

Finally, using the last IV, we attempt to decrypt the three ciphertexts. This is just a side-demo that illustrates how the CBC chaining mode recovers from an error. Because we are using the wrong IV for the first two ciphertexts, we are not able to decrypt those ciphertexts completely: the first block is garbled because the IV is incorrect. The subsequent blocks, however, are decrypted correctly, because each of them depends only on the preceding ciphertext block, so we can retrieve the second halves of the credit card numbers. Of course, the decryption of the third ciphertext works as expected and recovers the entire credit card number.
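
To see why only the first block is affected, recall that CBC decryption computes the first plaintext block as D(C1) XOR IV. Decrypting with a wrong IV therefore yields P1 XOR IV_used XOR IV_wrong, and XORing that garbled block with both IVs brings P1 back. The sketch below checks this on one of the demo numbers; the names (CbcWrongIvDemo, DecryptWithWrongIv, RepairFirstBlock) are made up for this illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class CbcWrongIvDemo
{
    const int BlockSize = 8; // TripleDES block size in bytes

    // Encrypts under ivUsed, then decrypts under ivWrong (CBC, default padding).
    public static byte[] DecryptWithWrongIv(byte[] key, byte[] ivUsed, byte[] ivWrong, byte[] plaintext)
    {
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.Mode = CipherMode.CBC;
            byte[] ciphertext;
            using (ICryptoTransform enc = tdes.CreateEncryptor(key, ivUsed))
            {
                ciphertext = enc.TransformFinalBlock(plaintext, 0, plaintext.Length);
            }
            using (ICryptoTransform dec = tdes.CreateDecryptor(key, ivWrong))
            {
                return dec.TransformFinalBlock(ciphertext, 0, ciphertext.Length);
            }
        }
    }

    // Undoes the damage in the first block: garbled XOR ivUsed XOR ivWrong.
    public static void RepairFirstBlock(byte[] decrypted, byte[] ivUsed, byte[] ivWrong)
    {
        for (int i = 0; i < BlockSize; i++)
        {
            decrypted[i] = (byte)(decrypted[i] ^ ivUsed[i] ^ ivWrong[i]);
        }
    }

    static void Main()
    {
        byte[] key, ivUsed, ivWrong;
        using (TripleDES tdes = TripleDES.Create())
        {
            tdes.GenerateKey();
            key = tdes.Key;
            tdes.GenerateIV();
            ivUsed = tdes.IV;
            tdes.GenerateIV();
            ivWrong = tdes.IV;
        }

        byte[] decrypted = DecryptWithWrongIv(key, ivUsed, ivWrong,
            Encoding.ASCII.GetBytes("4011023410254622"));

        // The second block never depended on the IV, so it is already correct.
        Console.WriteLine(Encoding.ASCII.GetString(decrypted, BlockSize, BlockSize)); // 10254622

        // Knowing both IVs is enough to repair the first block.
        RepairFirstBlock(decrypted, ivUsed, ivWrong);
        Console.WriteLine(Encoding.ASCII.GetString(decrypted)); // 4011023410254622
    }
}
```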

Conclusions

We have seen in this demo that by using a fixed IV, prefix patterns in cleartext data can be reflected in the ciphertext data. The common prefix has to be at least as large as an encryption block for this to happen. In the example, because the numbers were stored as ASCII, an 8 digit common prefix was reflected in the ciphertext. If we had encrypted the credit card numbers as Unicode (UTF-16, two bytes per character), a common 4 digit prefix would show up in the ciphertext. It's also not uncommon for credit card numbers or social security numbers to have common prefixes; for example, the leading digits of a credit card number identify the card network and issuer.
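
The arithmetic behind the 8 digit and 4 digit figures can be checked with a few lines. This is just a small sketch; the names (EncodingBlockDemo, DigitsPerBlock) are made up for this illustration.

```csharp
using System;
using System.Text;

static class EncodingBlockDemo
{
    // Number of digits that fit into one cipher block for a given text encoding.
    public static int DigitsPerBlock(Encoding encoding, int blockSizeInBytes)
    {
        int bytesPerDigit = encoding.GetByteCount("0");
        return blockSizeInBytes / bytesPerDigit;
    }

    static void Main()
    {
        const int TripleDesBlockSize = 8; // bytes

        // ASCII stores each digit in one byte.
        Console.WriteLine(DigitsPerBlock(Encoding.ASCII, TripleDesBlockSize));   // 8

        // Encoding.Unicode is UTF-16, so each digit takes two bytes.
        Console.WriteLine(DigitsPerBlock(Encoding.Unicode, TripleDesBlockSize)); // 4
    }
}
```

So the wider the encoding, the shorter the shared prefix needed to fill a whole block and show up in the ciphertext.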

Of course, knowing that two credit card numbers have the same prefix doesn't automatically allow us to find out what those credit card numbers are. But it's a very good start. If we can identify one number from some other information (maybe we inserted one of the records), then we automatically find out information about the other number, and this doesn't require any number crunching.

So, the reason why the SQL Server 2005 encryption procedures use a random IV for each encryption is to provide the safest use of encryption. It's not sufficient to use a good encryption algorithm; it's also necessary to use it well, and block encryption algorithms should be used with a chaining mode and a random IV for each encryption.