<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx</link><description>Encrypted data and indexes One thing I have been asked many times is how to create an index on top of encrypted data in SQL Server 2005. In SQL Server 2005 the encryption functions are nondeterministic, which means that every time a function is called,</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Great article on how to index encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#550357</link><pubDate>Mon, 13 Mar 2006 15:49:33 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:550357</guid><dc:creator>Michael Zilberstein SQL Server Blog</dc:creator><description>In SQL 2005 encryption functions are nondeterministic, which means that every time a function is called,...</description></item><item><title>re: Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#576022</link><pubDate>Thu, 13 Apr 2006 23:33:20 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:576022</guid><dc:creator>jdyoumans</dc:creator><description>Good article. &amp;nbsp;The more I read, the dumber I feel. &amp;nbsp;Thanks!</description></item><item><title>re: Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#1350613</link><pubDate>Sat, 23 Dec 2006 05:16:29 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1350613</guid><dc:creator>Mike C#</dc:creator><description>&lt;p&gt;Good stuff, quick question. &amp;nbsp;It seems as if this MAC method is designed to provide &amp;quot;indexing&amp;quot; of encrypted data with protection against dictionary attacks, but it doesn't look like it provides additional protection against frequency analysis. &amp;nbsp;Is this a fair assessment? &amp;nbsp;Thanks!&lt;/p&gt;</description></item><item><title>re: Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#1358952</link><pubDate>Mon, 25 Dec 2006 00:08:47 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1358952</guid><dc:creator>raulga</dc:creator><description>&lt;p&gt; &amp;nbsp;Thanks a lot for your comment Mike C#.&lt;/p&gt;
&lt;p&gt; &amp;nbsp;The main objective of this article was to suggest alternative mechanisms to allow identity searches (i.e. SELECT … WHERE &amp;lt;column&amp;gt; = &amp;nbsp;&amp;lt;value&amp;gt;) in SQL Server 2005 without the need to do a linear search (i.e. decrypt every single row) and explain why the IV is used in SQL Server 2005 EncryptByKey builtin.&lt;/p&gt;
&lt;p&gt; &amp;nbsp; The idea behind the HMAC solution I proposed was thought considering concealing patterns in the cipher text column that could be recovered (i.e. keep using the CBC cryptographic mode with a random IV generated, just like SQL Server does today) but creating a column based on the plaintext (in such a way that the application can still query using the known plaintext) and using well known cryptographic operations to solve the indexing problem our customers had. As you suggested, I decided to use an HMAC to help against dictionary attacks as opposed to using a simple hashing function.&lt;/p&gt;
&lt;p&gt; &amp;nbsp;I am not a cryptographer or a cryptoanalyst myself, and to b honest I didn’t consider frequency analysis when I discussed about this idea with my peers mostly because I assumed, based on my understanding, that it shouldn’t be a problem. I don’t know how the HMAC column (or the alternative Id column I also suggested in the article) can be used in a frequency analysis attack, but as I understand this technique, using an HMAC shouldn’t help in a frequency analysis as the result of the operation (A cryptographic hash of the plaintext + some key) should not contain patterns of the plaintext that may leak information on the distribution of characters (or other information blocks) in it, but I may be misinterpreting or missing some information.&lt;/p&gt;
&lt;p&gt; &amp;nbsp;I will appreciate if you or anyone would provide more information (i.e. a link or a white paper or book suggestion) on how frequency analysis can be used against an HMAC or other attacks against such solution. I will also try to look some information about this by myself, and I will be more than glad to update this article with the information I find as I am sure a lot of people will be interested.&lt;/p&gt;
&lt;p&gt; &amp;nbsp;-Raul Garcia&lt;/p&gt;
&lt;p&gt; &amp;nbsp;SDE/T&lt;/p&gt;
&lt;p&gt; &amp;nbsp;SQL Server Engine&lt;/p&gt;
</description></item><item><title>re: Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#1602128</link><pubDate>Mon, 05 Feb 2007 09:34:18 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1602128</guid><dc:creator>Mike C#</dc:creator><description>&lt;P&gt;Hi Raul,&lt;/P&gt;
&lt;P&gt;I guess the point I was trying to make is that the random IV generation is specifically designed to prevent frequency analysis on the data sets, so that duplicate data encrypted with the same key does not generate the same result. &amp;nbsp;But you circumvent the random IV generation with a hash or MAC code stored in separate columns (or tables) related back to the encrypted data.&lt;/P&gt;
&lt;P&gt;Just as a very simple example (extremely simple...), if the Title column of the HumanResources.Employee AdventureWorks table were encrypted, you would not be able to tell which employee performed which job without encrypting the data. &amp;nbsp;If you added a hash code column (or MAC column), you could look at the frequency of occurrence of certain hash codes (or MAC codes) as a starting point for determining that the items that occur the least are probably higher-level management positions (e.g., Finance Manager, Q.A. Manager, CEO, CFO) versus those that occur most (Product Technicians, Sales Reps, etc.)&lt;/P&gt;
&lt;P&gt;Would it help eliminate these patterns if you were to store multiple MAC keys for a single table? &amp;nbsp;For instance with three or four MAC keys you would cut down the number of occurrences of duplicate hash or MAC codes significantly. &amp;nbsp;With 10 MAC keys you could potentially reduce the recurrences of "Product Technician - WC60" from my example to the same levels as "Network Administrator" or "Database Administrator". &amp;nbsp;And for the sample data, you could potentially reduce the occurrences of all data to about the same level with about 25 or so different MAC keys on the same table. &amp;nbsp;I don't know if joining to another table to calculate the MACs would defeat the efficiency gains though?&lt;/P&gt;
&lt;P&gt;For your example of encrypting SSN's though, unless there were a lot of duplicate SSN's (unlikely), the frequency analysis should not be an issue. &amp;nbsp;The problem, I think, comes in when you are encrypting columns that could contain duplicate entries in different rows.&lt;/P&gt;</description></item><item><title>re: Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#1686439</link><pubDate>Fri, 16 Feb 2007 04:35:42 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1686439</guid><dc:creator>raulga</dc:creator><description>&lt;p&gt; &amp;nbsp; I see what you are saying… my understanding of frequency analysis is analyzing the patterns within the same ciphertext. I know the problem you mention as a correlation or statistical analysis, and you are correct.&lt;/p&gt;
&lt;p&gt; &amp;nbsp;If you use the encrypted (i.e. encryption without the IV, HMAC-based keys, or any other variation) you are leaking statistical information that can be used to guess the nature of the plaintext even if the attacker cannot directly decrypt it. &lt;/p&gt;
&lt;p&gt; &amp;nbsp; I have never seen this problem being solved by using n unique keys as you suggested per plaintext; I am guessing that the attacker can still analyze the information available by joining multiple tables, bt I am not an expert on this area myself.&lt;/p&gt;
&lt;p&gt; &amp;nbsp;A solution I have read for this particular problem is to create buckets (i.e. use the last 4 digits of the SSN to generate the key on the table, or simply randomly assign the buckets), but the keys will not be unique anymore, and extra processing is necessary (i.e. decrypt all rows from the bucket until you find the unique row you are really looking for).&lt;/p&gt;
&lt;p&gt; &amp;nbsp;-Raul garcia&lt;/p&gt;
&lt;p&gt; &amp;nbsp; SDE/T&lt;/p&gt;
&lt;p&gt; &amp;nbsp; SQL Server Engine&lt;/p&gt;
</description></item><item><title>re: Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#1831314</link><pubDate>Thu, 08 Mar 2007 02:05:45 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1831314</guid><dc:creator>raulga</dc:creator><description>&lt;p&gt; &amp;nbsp;K07 found a bug in the sample code (&lt;a rel="nofollow" target="_new" href="http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1306030&amp;amp;SiteID=1"&gt;http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1306030&amp;amp;SiteID=1&lt;/a&gt;). The column storing the ciphertext is declared as nvarchar instead of varbinary. It works properly for the example because of the implicit conversion from varbinary to nvarchar, but I have to recommend against this practice as the ciphertext is not a proper Unicode string, and some other tools may have problems when accessing the raw data.&lt;/p&gt;
&lt;p&gt;-- Table encryption example:&lt;/p&gt;
&lt;p&gt;CREATE TABLE t_RawCustomer( &lt;/p&gt;
&lt;p&gt;	-- MAC-based index (SSN)&lt;/p&gt;
&lt;p&gt;	SSN_index	varbinary(20) PRIMARY KEY, 	&lt;/p&gt;
&lt;p&gt;	-- ciphertext (SSN)&lt;/p&gt;
&lt;p&gt;SSN_cipher	varbinary(120), &amp;nbsp;--nvarchar(60), &amp;nbsp;			&lt;/p&gt;
&lt;p&gt;	-- Two extra columns with information. &lt;/p&gt;
&lt;p&gt;name	nvarchar(max),&lt;/p&gt;
&lt;p&gt;	ExtraData	nvarchar(100) &lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp; )&lt;/p&gt;
&lt;p&gt;go&lt;/p&gt;
&lt;p&gt; &amp;nbsp;After running a couple of quick tests after the correction, I realized that the insert calls will require an explicit conversion from nvarchar to varbinary, for example:&lt;/p&gt;
&lt;p&gt;insert into t_RawCustomer values ( null, convert( varbinary(100), N'111-11-1111'), N'customer 1', N'New data 1' )&lt;/p&gt;
&lt;p&gt; &amp;nbsp;It is a relatively small change, but I think it was worth mentioning it.&lt;/p&gt;
&lt;p&gt; &amp;nbsp;Thanks a lot,&lt;/p&gt;
&lt;p&gt;-Raul&lt;/p&gt;
</description></item><item><title>re: Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#3524369</link><pubDate>Mon, 25 Jun 2007 20:53:39 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3524369</guid><dc:creator>kpantiga</dc:creator><description>&lt;p&gt;Does it makes sense just to store the hash of the plain text and not the encrypted column?&lt;/p&gt;
&lt;p&gt;Thanks&lt;/p&gt;</description></item><item><title>re: Indexing encrypted data</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#3530777</link><pubDate>Tue, 26 Jun 2007 01:23:50 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3530777</guid><dc:creator>raulga</dc:creator><description>&lt;p&gt; &amp;nbsp;It really depends on what do you want to do with them. Hashes are one-way functions, and therefore if you only store the hashes it will be very unlikely to recover the original information, but if you know the message, you can always calculate its hash and match it. &lt;/p&gt;
&lt;p&gt; &amp;nbsp;If you intent to store hashes as a mechanism to identify if somebody knows a secret (i.e. password table), I would recommend using a salt in order to avoid rainbow attacks and protect the table (i.e. grant SELECT permission only to privileged principals in your system).&lt;/p&gt;</description></item><item><title>Cryptography in SQL Server 2005 and 2008</title><link>http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx#7938245</link><pubDate>Fri, 29 Feb 2008 02:29:17 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7938245</guid><dc:creator>John Hicks’ Data Page</dc:creator><description>&lt;p&gt;Introduction It seems to be the case that cryptography is often employed to solve problems for which&lt;/p&gt;
</description></item></channel></rss>