<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx</link><description>Don't use just the hash to check a file. Also use the file size.</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135017</link><pubDate>Wed, 19 May 2004 15:28:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135017</guid><dc:creator>Nick Parker</dc:creator><description>Very interesting, I wonder how often people check the file size along with the hash result to confirm the validity of their document.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135037</link><pubDate>Wed, 19 May 2004 15:38:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135037</guid><dc:creator>Keith Moore [exmsft]</dc:creator><description>MD5 (and most other hash algorithms I've studied) pads the message to an integral number of blocks. This padding includes the length of the original message, so I really don't see how this attack could possibly work.&lt;br&gt;</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135044</link><pubDate>Wed, 19 May 2004 15:40:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135044</guid><dc:creator>Keith Moore [exmsft]</dc:creator><description>{Replying to my own post...}&lt;br&gt;&lt;br&gt;If len(alternate) + len(reset garbage) was exactly 2^64 (bits), then the reset could work, but I think people would notice this extra data.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135053</link><pubDate>Wed, 19 May 2004 15:52:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135053</guid><dc:creator>Raymond Chen</dc:creator><description>If the length is just encoded in the padding, then just act as if the original message were pre-padded with its original length (so MD5 won't add its own padding). Pad your replacement message, and (due to the block nature) the garbage will necessarily be a multiple of the block size, so no padding necessary there.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135061</link><pubDate>Wed, 19 May 2004 16:07:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135061</guid><dc:creator>Keith Moore [exmsft]</dc:creator><description>Hmmm... I don't think this would work. MD5 *always* pads, and *always* includes the message bit length in the padding, even if the message is already a multiple of the block length.&lt;br&gt;&lt;br&gt;You're trying to reset MD5 back to its initial state. Part of that state is the bit length of the message (stored in a 64-bit integer).</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135068</link><pubDate>Wed, 19 May 2004 16:13:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135068</guid><dc:creator>Eric Lippert</dc:creator><description>Here are some interesting notes on MD5.&lt;br&gt;&lt;br&gt;&lt;a target="_new" href="http://www.rsasecurity.com/rsalabs/node.asp?id=2253"&gt;http://www.rsasecurity.com/rsalabs/node.asp?id=2253&lt;/a&gt;&lt;br&gt;&lt;br&gt;To my knowledge, no MD5 collisions have been found yet.  Apparently in 1994 researchers estimated that brute-forcing a hash collision would take $10 million dollars of equipment 24 days.  Assuming Moore's law, and inflation that would be about $200000 worth of equipment today -- and that's to find A collision, not a particular collision that has evil properties.&lt;br&gt;&lt;br&gt;But of course, that's conjecture, not proof.  When I did the hashing algorithm for code signing on Windows Script Files I included the length of the file in the hash -- it's the right thing to do, because we don't know for SURE whether the algorithm is effectively unbreakable.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135075</link><pubDate>Wed, 19 May 2004 16:18:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135075</guid><dc:creator>Raymond Chen</dc:creator><description>Ah, you're right, Keith; I missed that the size is *always* appended.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135083</link><pubDate>Wed, 19 May 2004 16:28:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135083</guid><dc:creator>Keith Moore [exmsft]</dc:creator><description>The reset is still possible (in theory) -- it would require 2 exabytes of data to do it.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135126</link><pubDate>Wed, 19 May 2004 17:02:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135126</guid><dc:creator>Jim Davis</dc:creator><description>I'd like to think that would be noticable.&lt;br&gt;&lt;br&gt;&amp;quot;Say... this Half-Life mod is **HUGE**.&amp;quot;</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135166</link><pubDate>Wed, 19 May 2004 17:55:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135166</guid><dc:creator>Cooney</dc:creator><description>&amp;gt; Heck, maybe what really happened was that somebody on the deployment got their MR2 into a car accident...&lt;br&gt;&lt;br&gt;Not that hard to do in Seattle - midengine car + rain + some more rain = hours of fun.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135410</link><pubDate>Wed, 19 May 2004 22:35:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135410</guid><dc:creator>josh</dc:creator><description>istm that the possibility of something in the datastream causing the hash to reset would be a major flaw in the code.  The only case where that might be reasonable is if you had multiple files in one stream, and then you'd get two files and two hash values out, which defeats the point of this attack.  Presumably whatever else you're operating on the stream with will &amp;quot;reset&amp;quot; at this point too, or you have a bug there.&lt;br&gt;&lt;br&gt;So how can this possibly happen?</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135421</link><pubDate>Wed, 19 May 2004 22:52:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135421</guid><dc:creator>Raymond Chen</dc:creator><description>True, resettability would be a flaw in the algorithm, but algorithms have been known to be found flawed years after their development. My point is that with a tiny amount of additional work, you can significantly reduce your exposure if such a flaw were to be found.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135436</link><pubDate>Wed, 19 May 2004 23:10:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135436</guid><dc:creator>asdf</dc:creator><description>Are you saying to store the file length as part of the stream (so the algorithm hashes it also), to store the original file length in addition to the hash, or to do something else with the hash and file length?</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135454</link><pubDate>Wed, 19 May 2004 23:24:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135454</guid><dc:creator>Raymond Chen</dc:creator><description>Save both the hash and the original file size.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135506</link><pubDate>Thu, 20 May 2004 01:28:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135506</guid><dc:creator>Shawn</dc:creator><description>I wrote a bit about this back in March (&lt;a target="_new" href="http://blogs.msdn.com/shawnfa/archive/2004/03/05/84799.aspx"&gt;http://blogs.msdn.com/shawnfa/archive/2004/03/05/84799.aspx&lt;/a&gt;) -- its an interesting attack, but as you pointed out, its very easily mitigated by simply knowing anything else about the data being hashed.</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#135516</link><pubDate>Thu, 20 May 2004 01:51:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:135516</guid><dc:creator>Raymond Chen</dc:creator><description>Oops, sorry, Shawn.  Looks like my article is pretty much identical to yours!</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#136011</link><pubDate>Thu, 20 May 2004 18:13:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:136011</guid><dc:creator>Jeremy Hilliker</dc:creator><description>&amp;gt; Next, generate the bytes necessary to &amp;quot;reset&amp;quot; the engine.&lt;br&gt;&lt;br&gt;Finding these bytes is as dificult as finding a message which will match the original.&lt;br&gt;&lt;br&gt;Start with hash algorithm that is preimage, and second preimage resistant, otherwise it isn't useful in the first place.  Given this, you want to find a string that will &amp;quot;reset&amp;quot; the algorithm to its initial state.  Assume that you can do this, then you have found a message which will hash to the algorithm's initial state.  IE, you have found a message which hashes to an given hash value.  IE, your algorithm is not preimage resistant, which contradicts the initial statement.  Therefore your assumption is false.  Therefore you cannot find the string that will &amp;quot;reset&amp;quot; the algorithm.&lt;br&gt;&lt;br&gt;I only see this attack as being usefull against an algorithm that is not already preimage resistant -- which means that it was broken in the first place.&lt;br&gt;&lt;br&gt;Am I missing something here?  Or are you allowing for broken implementations of the algorithm which can be reset?</description></item><item><title>re: Beware the hash reset attack</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/05/19/134937.aspx#140095</link><pubDate>Mon, 24 May 2004 06:13:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:140095</guid><dc:creator>Michael Grier [MSFT]</dc:creator><description>The original context of the internal discussion was around proper use of hash functions and what you should and shouldn't use them for.&lt;br&gt;&lt;br&gt;The secure hash functions in use were designed to ensure that messages could not be easily tampered in transmission.&lt;br&gt;&lt;br&gt;One of the big differentiating factors between the secure hashes and other digest methods like checksums and CRCs is that they're intended to be &amp;quot;cryptographically hard&amp;quot; to &amp;quot;break&amp;quot;.  I'll leave &amp;quot;cryptographically hard&amp;quot; mostly undefined as some sort of reference to an NP-complete type of problem like curve fitting or factoring.  Specifically, it's easy to stay ahead of the brute-force solutions by extending sizes.  I will define &amp;quot;break&amp;quot; as the ability to, with little or no effort, product another byte stream with the same hash value.&lt;br&gt;&lt;br&gt;Checksums, parity, CRCs and simple hashing algorithms like the ones you could find in any textbook are /insecure/ hashes.  Meaning that they vary sufficiently for a typical data set but they also explicitly don't attempt to make &amp;quot;breaking&amp;quot; the hash difficult.  My personal favorite string hash function { hash = 0; for (i=0; i&amp;lt;l; i++) hash = (hash * prime) + string[i]; return hash; } is trivially breakable.&lt;br&gt;&lt;br&gt;Back to the original topic, there are a lot of people who want to have a pseudokey/hash where they can use it in a searching algorithm /instead/ of actually comparing the data.  Well it's true that if the hash values are different there's no possible way that the bytestreams are equal, but that in no way implies that the pseudokeys being equal means that the bytestreams are equal.&lt;br&gt;&lt;br&gt;Which then comes back to the point.  What are you going to do with this?  I personally feel that this falls into the category of &amp;quot;if it doesn't actually have to work right, I can make it as fast as you like&amp;quot;.&lt;br&gt;&lt;br&gt;Secure hashes are great for their intended use.  Given a message which must conform to some syntactic requriements (readable English, XML, ASN.1, whatever), it is computationally difficult to construct a message which still fits the syntactic requirements, has the same hash value and is different from the original message /in a useful way/.&lt;br&gt;&lt;br&gt;If you can change the message from &amp;quot;send the troops through the northern pass&amp;quot; to &amp;quot;send the troops via the southern river valley&amp;quot; easily, man, you're in trouble.  If the end result looks more like &amp;quot;send the troops via the southern river valley &amp;lt;big honking string of unreadable text&amp;gt;send the troops through the northern pass&amp;quot;, well, it's pretty obvious that there was tampering and the hash has accomplished its task.&lt;br&gt;&lt;br&gt;Expecting these algorithms to stand for arbitrary unformatted data is actually not reasonable in the long run.  The birthday effect is likely to show its head before too long.&lt;br&gt;&lt;br&gt;Expecting these algorithms to give you a unique enough number that you don't have to worry about collisions in a context which requires correctness is also unreasonable.&lt;br&gt;&lt;br&gt;Which then leads me to the point that if all you want is statistical uniqueness in order to decrease the number of times you have to compare byte streams that are different, you can use a significantly (computationally)&lt;br&gt;cheaper hashing algorithm than one of the secure hash algorithms.&lt;br&gt;</description></item></channel></rss>