- Office 2007 SP2 Encryption Settings
Now that we've actually shipped SP2, some of you may be curious about how to use the shiny new encryption. Here are the registry settings:
Registry keys

Base keys (also corresponding Policy keys): HKCU\Software\Microsoft\Office\12.0\<appname>\Security\Crypto

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| CompatMode | DWORD | 0 | Controls encrypted database compatibility: 0 = Legacy format for new files; 1 = NextGen format for new files only; 2 = All files saved with NextGen format |
| Context | String | | Restrict encryption parameters to those defined in this CNG context |
| CipherAlgorithm | String | | Cipher algorithm to use, optional, CNG string |
| CipherKeyBits | DWORD | | Number of bits to use when creating the cipher key, rounded down to a multiple of 8, optional |
| CipherChaining | String | | Cipher chaining mode to use, optional, CNG string |
| HashAlgorithm | String | | Hash algorithm to use, optional, CNG string |
| RngAlgorithm | String | | Random number generator algorithm to use, optional, CNG string |
| SaltBytes | DWORD | 16 | Bytes of salt to use, optional |
| PasswordSpinCount | DWORD | 100000 | Number of times to spin (e.g. rehash) the password verifier, optional |
| NewKeyOnPwdChange | DWORD | 1 | If non-zero, a new intermediate key is generated when the password is changed. This will cause any extra key encryptors to be removed on save. |
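As a rough illustration of how the settings above fit together, here is a small C++ sketch that builds the per-application key path (relative to HKCU) and applies the documented rounding of CipherKeyBits down to a multiple of 8. The function names and the example application name "Word" are assumptions made for illustration, not anything Office ships. Actually writing the values would go through the normal Win32 registry APIs, which are left out so the sketch stays portable.

```cpp
#include <string>

// Hypothetical helper: builds the base key path listed above, relative to HKCU.
// appName replaces the <appname> placeholder (for example L"Word", an assumption).
std::wstring CryptoKeyPath(const std::wstring& appName)
{
    return L"Software\\Microsoft\\Office\\12.0\\" + appName + L"\\Security\\Crypto";
}

// Hypothetical helper: CipherKeyBits is described as being rounded down
// to a multiple of 8, so 130 requested bits would become 128.
unsigned long RoundKeyBits(unsigned long requestedBits)
{
    return requestedBits - (requestedBits % 8);
}
```

The corresponding Policy key would be built the same way from its own base path, which is not spelled out here.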
Many thanks to my tester for giving me the information in such a nicely formatted and well documented table. Once you have Office 2010 Technical Preview available to you, the same settings should work there as well. Many more thanks to Dan Jump for carefully implementing our design. Note that if you use the new format, the converters for Office 2003 and earlier won't be able to read those files until we update the converters to understand the new encryption.
- Legacy RC4 Example on Codeplex
Just a quick note on this – a customer had a question about the old RC4 40-bit encryption yesterday, and this prodded me into taking some memory dumps of intermediate steps and figuring out where my own example code wasn't working. Fortunately, it wasn't really a problem with the documentation – I'd just made a dumb mistake where I put '5' in a loop instead of '16'. I also had a bug in the ManagedRC4 project, so if you already have that, pick up the new one, too.
At any rate, there are now working examples for all of the currently shipping encryption techniques specified in MS-OFFCRYPTO.
Maybe next we can work on getting code up to do signatures, though I think the xmldsig stuff is fairly well covered by the .NET framework already.
The project is at http://www.codeplex.com/offcrypto
- MS-Offcrypto Example Update
Just a quick note that I've updated the examples. I added an example for the CAPI RC4 encryption that does work. Along the way, I got smarter about managed C++ and C# interop, which turned out to be a bit of an adventure. I didn't find the documentation on MSDN exceptionally helpful in this area. Maybe there's a good book on the topic, but I haven't found it yet. We had a huge amount of snow for this area – accumulations of about 2-3 feet at my house, and I couldn't go anywhere, so that's what I ended up doing.
The reason for the interop adventure was that there is no RC4 implementation in .NET. There are a couple of those on the Internet, but I didn't want to burden people with getting and installing some 3rd party library. Along the way, I figured out a mystery of CAPI RC4 encryption. Turns out that if you set a 40-bit RC4 key, there are 3 modes of operation:
- CRYPT_NO_SALT – nothing is added to the key. Office doesn't set this flag.
- Salt added – some bunch of random bits from the hash is added to the key – seems only useful for temporary keys. We don't use this one, either.
- Default – basically, if you import a 5-byte (40-bit) key, it is the exact same thing as importing a 16-byte (128-bit) key where the last 11 bytes are all 0, since CAPI does that for you under the covers unless you set CRYPT_NO_SALT.
The sample code that I have now makes this explicit, and we'll update the document to make this clear as well. This only applies to 40-bit keys.
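The default mode described above can be sketched in a few lines of C++. This is only an illustration of the padding rule (5 key bytes followed by 11 zero bytes), not the code from the CodePlex sample, and the function name is made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Expands a 5-byte (40-bit) RC4 key to the equivalent 16-byte form:
// the original 5 bytes followed by 11 zero bytes.
std::array<std::uint8_t, 16> ExpandRc4Key40(const std::array<std::uint8_t, 5>& key40)
{
    std::array<std::uint8_t, 16> key128{};  // value-initialized, so every byte starts as 0
    for (std::size_t i = 0; i < key40.size(); ++i)
        key128[i] = key40[i];
    return key128;
}
```

An RC4 implementation that takes the key length as a parameter would then be given all 16 bytes rather than the first 5.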
A last thing that ought to be tidied up is that when you decrypt the two parts of the verifier (encrypted salt and encrypted hash of salt), you do not reset the RC4 stream between the two decryption operations.
I'm working on doing the legacy RC4 example, and when that's done, it will be posted to www.codeplex.com/offcrypto. After that, I'll probably move on to the new encryption and some of the signing. If you're interested in this, I'd suggest signing up to the RSS feed on the offcrypto project on CodePlex – I may not remember to post here when I change things.
- MS-Offcrypto Examples
In response to some questions I've gotten about details of MS-OFFCRYPTO, I've created a CodePlex project to contain sample code demonstrating the documentation. You can find it at http://www.codeplex.com/offcrypto. I had originally wanted to include sample code in MS-OFFCRYPTO itself, but we couldn't do that. Instead, we can put sample code on CodePlex. To keep it real, I did the work from my home system where I don't have access to the original source, and wrote it in C# instead of the original C++ to shake out library differences.
Please note that the sample code is not intended to replace proper documentation. In the course of helping a customer with their attempt to implement the AES encryption, we figured out a problem with the CryptDeriveKey documentation, and got that updated. If there are any nuances of the approach that are in the sample, but not the documentation, we'll update the document to match. The sample code is there to verify that the documentation is complete, and to help anyone who wants to do this.
Currently, there are 2 projects – the first is ExtractStream. I'd needed a way to get streams out of structured storage so that the rest of the sample code could be a lot simpler, and I'm also not too good at managed-unmanaged interop. You use this app to extract the stream you'd like to parse – the rest of the examples will use this. It may turn out to be a good thing having this as a stand-alone project – there are some other features we can build on this that might be helpful.
The second project is OoxmlEncrypt – it demonstrates parsing and validating an EncryptionInfo stream, as well as validating the password.
I have a third project that I need to post which does the same for the CAPI RC4 encryption, which is default for encrypted PowerPoint files, and can show up as a format for Word and Excel files. That's done – I just need to post it.
A fourth project that I'm working on (hindered by the fact that .NET has no RC4) is to demonstrate the legacy 40-bit RC4.
After that, as I find time, I'll move on to signatures. BTW, if someone else would like to contribute to the project, that's certainly possible – just let me know, and sign a release, then we can add other people as devs.
- CVE Count and Statistics
Larry Seltzer had some interesting comments on my post about the rate of Office vulnerabilities at Vulnerabilities and Office Versions:
There may be a little flaw in the analysis in that LeBlanc studied reports during the period from 9/18/2007 to 11/17/2008. By that time earlier Office versions had been around for a long time and many vulnerabilities had already been reported on them. But even so, it makes the numbers all the more impressive for the new versions; the older ones had already had the low-hanging fruit picked clean and yet they still had CVE numbers in excess of the new ones. It seems there is no low-hanging vulnerability fruit in new versions of Office.
Having had more grad school than I'd like to admit, I have a more than passing acquaintance with statistics. While there are certainly potential flaws in the numbers I posted, I don't think this is one of them. I'll argue that comparing vulnerability rates over the same time frame for two applications that are very similar, and which both have large market share, is better than comparisons of some number of days since release. If we have the same time frame, then the techniques used by the attackers are likely to be similar, and when we're looking at multiple versions of the same thing, we can get a good estimate of how resistant one version is to attacks that another version is susceptible to.
What will be a problem in my analysis is how small the overall sample size is, and the fact that updates tend to ship at most 3-4 times per year for most of these apps. For example, this month's set of bulletins is going to skew the results considerably, but the overall trend of substantial improvement will still show up. Once I get updated numbers, I'll work them up and post them here.
- Office Crypto KDF Details
I've gotten a couple of questions asking how our key derivation function works. The technique is very similar to that described in RFC 2898, also known as PKCS #5. There are two key derivation functions (KDF) documented in this RFC – PBKDF1 and PBKDF2. Our KDF implementation is very similar to PBKDF1 (section 5.1), with the following changes:
- 8 bytes of salt are recommended, we use 16 by default
- 1000 iterations are recommended, we use 50,000
- The function documented hashes the previous hash a number of times. We concatenate the counter with the previous hash, and hash that. This was done on the recommendation of Paul Leach, and makes the hash harder to optimize.
- We will allow more advanced hashing algorithms than SHA-1 in the Agile encryption.
- The final output is passed into CryptDeriveKey as documented in MS-OFFCRYPTO; agile encryption does something slightly different.
So we're not using a completely standard KDF, but it's close – and stronger than the standard.
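The iteration described in the list above can be sketched roughly as follows. This is a minimal C++ outline, not the Office implementation: the hash is passed in by the caller (SHA-1 in the default case), the names are hypothetical, and the exact byte layout (a 32-bit little-endian counter placed before the previous hash, and how the password is encoded) is an assumption that should be checked against MS-OFFCRYPTO. The final CryptDeriveKey step is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using HashFn = std::function<Bytes(const Bytes&)>;  // caller supplies SHA-1 or similar

// Sketch of the iterated password hash: hash salt + password once, then
// repeatedly hash (counter + previous hash) spinCount times.
// Assumption: the counter is a 32-bit little-endian value that precedes the hash.
Bytes SpinPasswordHash(const HashFn& hash, const Bytes& salt,
                       const Bytes& password, std::uint32_t spinCount)
{
    Bytes input(salt);
    input.insert(input.end(), password.begin(), password.end());
    Bytes h = hash(input);  // initial hash of salt followed by password bytes

    for (std::uint32_t i = 0; i < spinCount; ++i) {
        Bytes block;
        for (int shift = 0; shift < 32; shift += 8)  // counter, low byte first
            block.push_back(static_cast<std::uint8_t>((i >> shift) & 0xFF));
        block.insert(block.end(), h.begin(), h.end());
        h = hash(block);  // hash of counter followed by previous hash
    }
    return h;
}
```

Because the counter changes the input on every round, an attacker cannot simply reuse work from one iteration in the next, which is the point of the third item in the list.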
- New, Improved Office Crypto
If you're enough of an Office crypto geek to stay on top of the most recent changes in MS-OFFCRYPTO, you already know about some of this, but my assumption is that most people aren't going to want to parse something that hard to read. What we're doing is introducing some substantial improvements in our encryption in Office 2007 SP2, which are known to MS-OFFCRYPTO as 'ECMA-376 Agile Encryption'.
Our goal as a company is to make all of our encryption agile. Many government customers have requirements to use encryption algorithms specific to their own governments, and we'd like to eventually get to a point where (for example) Office 16 can emit a document with improved encryption that Office 2007 can just pick up and use. In Office, our first attempt at this was how we created password verifiers – we let you use any hashing algorithm we can support on the operating system, and the spin count is also agile. I don't think we created a way to configure Office 2007 to emit password verifiers with a configurable spin count, but if one shows up with a higher spin count created by a future version, it should work – which was the short-term goal.
The encryption we used in Office 2007 for the new file format is pretty robust – even with very advanced techniques, Elcomsoft reports getting around 5000 cracks/sec. Compare this to Acrobat 9, which allows 74.5 million cracks/sec – meaning it takes about 15,000 times more computing power to crack a password on an Office 2007 document than an Acrobat 9 encrypted PDF. If you're interested in the details of where Adobe made some mistakes, check out "With 256-bit encryption, Acrobat 9 passwords still easy to crack".
Our default encryption uses an iterated SHA-1 hash with an iteration count of 50,000, and AES128 as the encryption algorithm. In order to check a password on our system, you have to perform 50,002 SHA-1 hashing operations, and 2 AES128 decryptions. On Acrobat 9, that's just one SHA-256 hashing operation. To make matters worse, SHA-256 is more efficient than SHA-1. To be fair, we very nearly made the same error until our tester found the problem and I fixed it prior to release.
The key here is that when a password is used to protect something, that's going to be the weakest link, and using more bits for the encryption algorithm will not make anything harder to access. Breaking the encryption itself (assuming a modern block cipher) isn't feasible. Brute-forcing the password is feasible, so the order of the day is one of the least well regarded of the Saltzer and Schroeder design principles – work factor. If you can't absolutely stop something, make it so much work that it isn't worth it.
With the new encryption, we're introducing a bunch of very cool stuff:
- Configurable symmetric encryption
- Cipher-block chaining (CBC or CFB)
- Configurable hashing algorithms
- Support for block ciphers with block sizes from 2 to 4096 bytes
- Configurable salt, up to 64k bytes
- Iterated hashing of passwords up to 10 million iterations (default raised to 100,000)
- Integrity checking
If you're running on Vista, this means you can use anything that you can write a CNG plug-in to support, which means we'll be able to support Suite-B compliant encryption. To be clear, if you're using a strong password, the existing Office 2007 encryption will do a very good job, but you're restricted to one set of algorithms, and you're missing cipher block chaining and integrity protection.
The reason we're shipping this in a service pack is that we'd like customers to be able to be Suite-B compliant using Office 2007 SP2, or use other private encryption mechanisms. The second reason is that we need Office 2007 SP2 users to be able to consume encrypted documents created by Office 14 when we ship it. We do have some more cool stuff we're working on and I'll blog about that when we get closer to being able to release Office 14. When I first came to Office, we had some fairly awful encryption (see Office Crypto Follies), and with the help of a PM who also serves on the Microsoft crypto board, and some really great devs and testers, I think we now have a first-class product with some of the best document encryption in the industry.
- SafeInt Compiles on gcc!
[update 12-1-08] I now have it completely compiling on gcc, with a test harness that exercises every method of the class for every combination of types (all 15 of them). Version 3.0.12p is now moved to release status.
Once I got SafeInt posted on CodePlex, Niels Dekker grabbed a copy and started figuring out what needed to be done in order to get SafeInt to compile using gcc. Niels was one of the first people to give me bug reports during the development of SafeInt 1.0.x, and he's been extremely helpful over the years – hard to believe I've been working on SafeInt for 5 years now. When we first tried to compile it with gcc, gcc didn't have the template support that SafeInt needed – it was a complete mess, and we gave up quickly – about as bad as trying to get it to compile using Visual Studio 6.
As it turns out, I'm also working on updating the "19 Deadly Sins of Software Security" – there's going to be 2 dozen sins this time around, and I set up a Linux system so I can test things. I originally started running Linux in 1993 – I think it was version 1.0.3, and have tried to keep one running for most of the time since. Having a Linux system handy, I also had g++ available, and I could then actually test some of the issues Niels pointed out – and found a couple more – seems that gcc on cygwin != gcc on Linux. It took about 3-4 hours, but since Niels had already pointed me in the right direction, it went much more quickly than if I'd had to puzzle it out completely by myself. Seems that gcc is a little stricter about some things than Visual Studio, and some of this is a very good thing.
Here's an example of an insidious issue I'd never given much thought to – say you had some function with the following signature:
void Foo(unsigned int& cch);
If you passed it some argument, you would expect it to get modified, right? That's what references are for. Now let's say you called it like so:
unsigned int x;
Foo((unsigned int)x);
Would the cast change anything? How about if we did this?
void Foo(bar_class& cch);
bar_class x;
Foo((bar_class)x);
As it turns out, the cast creates a temporary copy, and the temporary copy _might_ be the one that gets modified. The Visual Studio compiler will do the right thing if you're dealing with a native type – like unsigned int – and go ahead and discard the cast and modify the argument as you'd expect. Unfortunately, if it is not a simple type (haven't tested with things like simple structs), it is the temporary copy that gets modified, and the argument does not. The most recent released Visual Studio compiler doesn't even complain about this. >8-O OTOH, the gcc compiler does, and while it wasn't a runtime bug in SafeInt, it was non-standard C++, and it's now fixed in the 3.0.12p line.
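To make the temporary visible, here is a small standard-conforming sketch. It uses a const reference, since standard C++ will not bind a temporary to a non-const reference, and a made-up class named Tracked that counts its own copies. Both names are hypothetical and exist only for this illustration.

```cpp
// Hypothetical class that counts how many times it has been copied.
struct Tracked {
    static int copies;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

// Reports whether the reference is bound to the caller's original object
// or to some other object, such as a temporary created by a cast.
bool RefersTo(const Tracked& ref, const Tracked* original)
{
    return &ref == original;
}
```

Calling RefersTo(x, &x) binds the reference to x itself, while RefersTo(static_cast<Tracked>(x), &x) goes through the copy constructor and binds the reference to a different object, so anything the callee did to its argument would never reach x.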
One place where gcc complains, and where I think Visual Studio's leniency is not a bad thing, is in the area of implied template parameters – say you did this:
template <typename T, typename U> class Foo {
    template <typename T> void FooMethod(){ Bar::foo<U>();}
};
You'd expect the U parameter to be the same as the enclosing class, and with the Visual Studio compiler it is – but gcc complains. In order to fix it, you have to use 'template' keywords, like so:
template <typename T> void FooMethod(){ Bar::template foo<U>();}
Despite Niels pointing me to a helpful page, I'm still not sure why the template keyword is really anything helpful. Maybe someone reading this can enlighten me on the topic.
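For what it's worth, the usual explanation is about parsing. When the thing to the left of :: depends on a template parameter, the compiler cannot yet know that the member is itself a template, so without the keyword the < is read as a less-than operator. A minimal sketch that compiles on a conforming compiler, with Helper and Combined as made-up names and Helper<T> standing in for Bar:

```cpp
#include <cstddef>

// Hypothetical helper: a class template with a static member function template.
template <typename T>
struct Helper {
    template <typename U>
    static std::size_t SizeSum() { return sizeof(T) + sizeof(U); }
};

template <typename T, typename U>
struct Foo {
    static std::size_t Combined()
    {
        // Helper<T> depends on T, so 'template' tells the parser that
        // SizeSum names a template and that '<' opens its argument list.
        return Helper<T>::template SizeSum<U>();
    }
};
```

Some specialization of Helper could define SizeSum as a plain data member, in which case the same characters would legitimately be a comparison, which is why the language asks you to say which one you mean.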
Some of you may be wondering why I care about SafeInt on gcc – after all, I do only really write code for Windows, and have for a long time. The first reason I care about this is that I'd like it if SafeInt were something that could be used by people writing cross-platform code – a lot of our customers do this, and if I can eliminate one more reason to have to fork code (or not use SafeInt), that would be nice. The second reason is that we do have quite a few people developing for MacOS here – MacOffice is the biggest example, and Apple for some reason requires gcc to compile. It would be really nice if we could use SafeInt on all of our code, not just Windows code.
If you'd like to check out a version of SafeInt that does compile on gcc, you can get it here - http://www.codeplex.com/SafeInt/Release/ProjectReleases.aspx?ReleaseId=19785
I've also included the start of a public test harness for the class, which isn't complete just yet – which is why this release is marked planned, not released.
- Improvements in Office Security
We now have a pretty neat internal web site where I can easily search for CVE entries and bulletin counts by product. It shows some interesting trends that I hope will continue to hold. First, let me preface this by saying that CVE entry count is a better (though not perfect) way to measure how secure something is than bulletin count. We might sometimes package fixes for several CVE entries into one bulletin, and an older product might be vulnerable to all of them, but a newer product might only be vulnerable to around half.
We did a lot of work to make Office 2003 more secure in service pack 3 – one question I've had is just how much that's paid off? It has been about a year, and if I search from 9/18/2007 to 11/17/2008 (today), I get the following:
| Product | CVE count |
| --- | --- |
| Office 2000 SP3 | 33 |
| Office XP SP3 | 40 |
| Office 2003 SP2 | 35 |
| Office 2003 SP3 | 20 |
| Office 2007 Gold | 19 |
| Office 2007 SP1 | 16 |
The trending here is pretty clear – while we did a lot of good work to try and make Office 2003 more secure than previous versions, against the attacks we're seeing in 2007, it wasn't any better than Office XP. Now if you factor in huge amounts of work (no magic, no silver bullet, just lots and lots of work) that we did fixing fuzz bugs in Office 2007 and Office 2003 SP3, it looks like we've cut the incoming vulnerability rate by approximately half. If we look at it app-by-app, I think PowerPoint is a clear winner – they've had 5 CVE entries for older versions and only 1 for PowerPoint 2007 since 1/1/2007! Word has also done very well, dropping from 11 and 12 CVE entries in prior versions to only 2 for Word 2007 over the same period.
We're continuing to do that level of work on anything that still has a service pack left – next SP will be SP2 for Office 2007. It will be interesting to see how much additional gain that gives us. I'd like to see us do even better over time – while we've clearly made some significant gains, we still have more work remaining. We are currently doing about as many fuzzing iterations per weekend as we're required to do to meet SDL requirements for the entire product cycle (to be fair, the requirement is for clean runs, and we're not there yet, and when we do get there, we use a different fuzzer). We've done twice as many fuzz iterations against Office 2007 SP2 as we did against Office 2007 during the entire product cycle, and 4x more against Office 14 than against Office 2007.
If there's anyone out there still on Office 2003 SP2, I hope I've given you some convincing data that shows an upgrade to SP3 or better yet Office 2007 is going to pay off in much better security.
- MS-OFFCRYPTO, W7 Engineering blog, etc
We have a new version of MS-OFFCRYPTO out. The big change is that CryptDeriveKey was documented incorrectly on MSDN, and we copied that description, which made our document incorrect as well. As it turns out, for AES, CryptDeriveKey always uses the same code path that it uses when the hash output is shorter than the needed key. Several people have written me about this, which I do appreciate – it's how we first knew about the problem and got it fixed. If you're trying to make something work with Office 2007 encryption, get the new version.
There's also some interesting stuff that gives a preview of how we'll be doing encryption in the future. Some of this will change in the next update, but the general outline still holds. Some neat stuff that I'll talk about in more detail later – don't have time today.
The other thing that came to my attention is that there's a really interesting blog – the "Engineering Windows 7" blog. Today's entry is written by someone I've known the longest at MS – Larry Osterman. You can read his post here - http://blogs.msdn.com/e7/archive/2008/10/15/engineering-7-a-view-from-the-bottom.aspx. The overall blog is some interesting stuff, and explains a lot of why we make certain trade-offs.
A quick story about Larry that's funny in hindsight. Back around 1997, I was working at ISS, and one of our devs had a VPN into our corporate network. Everything on the ISS corporate network at the time was subject to abuse/testing by the ISS Internet Scanner, and I was dev lead on the Windows version. Our dev had his POP3 server keep falling over, the service would hang, and he had to restart his server. Since we tested the scanner many times per day, his server fell over a lot. We got to the bottom of it, and it had a simple buffer overrun. Since ISS has always done responsible disclosure (before it even had a name), the dev called the company responsible, and they didn't do anything. So I called them, trying to explain how serious this was. They replied to my e-mail by telling me that buffer overruns on Windows were not exploitable (ROTFLMAO, yeah, right – even in 1997, we knew better). I tried in vain to convince them that this was something serious, and told them I would go public with an advisory if they didn't do something.
Finally, it became apparent they weren't going to do anything – they'd quite rudely told me I didn't know what I was talking about, they wouldn't fix it, and that was that. So I wrote up an advisory – pretty informal stuff by today's standards – and because I don't encourage criminal activity, I didn't bother to create an exploit – I just said it crashed, looked exploitable, and the company wouldn't fix it, so your only real option was to use someone else's product. As soon as I did that, I got a very special letter from the company's legal department the very next day. Sigh. One of the reasons I came here – everything I ever asked Microsoft to fix when I was at ISS eventually did get fixed, so I've liked Microsoft's attitude towards fixing security issues since I first interacted with Jim Kelly (another story here) back in early 1996. If you search BUGTRAQ archives on my name and POP3, you might be able to dig up the only time I've ever gone public without a fix.
A public discussion ensued, I think on the old NTBUGTRAQ list. The company continued to assert I was an idiot and the problem wasn't exploitable (simple stack overrun), and Larry came to my defense, which I've always appreciated.
An interesting thing about Larry's blog post – Steven Sinofsky used to run Office. How Larry describes Windows 7 working now is how Office has run for quite a while, though the "feature crew" concept was new to the O12/Office 2007 development cycle. It works well, and makes life as a dev much better than some other experiences I've had, and also helps ship more predictable, higher quality code, which benefits everyone.
- SafeInt 3 on CodePlex!
I have finally found a stable place to keep SafeInt. It can now be found at http://www.codeplex.com/SafeInt. In terms of the code, this is exactly the same stuff as we're using internally. This version is documented a little better than the master copy that's checked into the Office development depot, but other than that, should be the same.
Some notable changes from 2.0 –
- Replaceable exceptions – the exception handler class is now a template argument, so if your project has well developed exception classes, this makes it easier.
- Solved the short-circuiting problem – I previously had overloads for the || and && operators, which seemed to be the lesser of several evils. As it turns out, adding a cast to bool and removing the overloads causes things to short-circuit properly.
- Added limited floating point support – you can now initialize a SafeInt with a floating point number, as well as assign to a float.
- Several non-throwing functions have been added to check operations when you'd prefer not to throw on failure. All of the internal code for all of the operators is available in both throwing and non-throwing forms in the header.
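To give a feel for the non-throwing style mentioned in the last item, here is a hand-rolled sketch of a checked unsigned addition. It is an illustration only: the function name is made up, and this is neither SafeInt's code nor its exact API, so look at the header itself for the real functions.

```cpp
#include <limits>

// Illustrative only: a non-throwing checked addition in the spirit of the
// functions described above. Returns false instead of throwing on overflow.
bool CheckedAdd(unsigned int a, unsigned int b, unsigned int& result)
{
    if (b > std::numeric_limits<unsigned int>::max() - a)
        return false;  // a + b would wrap around; result is left untouched
    result = a + b;
    return true;
}
```

The caller tests the return value and decides what to do on failure, which suits code paths where exceptions are not welcome.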
If you're currently using SafeInt 2.0, this should just drop in without causing much work. There have also been a small number of bug fixes, which will not be ported back to 2.0. This is the only supported version. To make "support" clear, this is released under the Microsoft Public License (Ms-PL). I believe that it works well, it passes our internal test rig, and we're using it in several thousand places in Office, Windows, and other projects. However, it is still up to you to test it. If you think you've found a problem, please let me know. Likewise, if you have a suggestion, also let me know. This should be very stable at this point.
Speaking of suggestions, I had to talk to a number of people about the issues with actually publishing code that we're really using, as opposed to sample code. If you have suggestions, write them out. Sample code illustrating a bug is OK. Please do not send me a diff – I can't take it. If you want to make major changes, feel free to fork it and maintain your version somewhere else, though you need to heed the license.
If you happen to be curious about why this is 3.0.11p, that's major version 3, minor version 0, 11th check-in internally, and 'p' is for public.
- Chrome Getting a Bit Rusty
Put this one in the rant category –
I'm honored that Google has been paying attention to my blog and decided to use my sandboxing approach to try and make their app more secure. Very cool stuff, and they did some interesting things that I want to better analyze. It would be even nicer if they gave credit to where the idea came from, but that's OK.
We've now seen an entire termite mound's worth of bugs come out of it – see http://www.computerworld.co.ke/articles/2008/09/09/security-agencies-rally-against-google-chrome.
I've been watching Google claim that they'd invented a new programming paradigm – I think it's release early and often. Rubbish. It's really the "code like hell" seat of the pants programming model. This is nothing new – we've been coding that way since we invented programming. It isn't the best way to ship quality, and it is sure to emit security bugs. While I'm certainly an SDL heretic – whether or not people work hard at security and get it is really the most important component, kind of like what happens if you make bread with good or bad yeast – creating secure software requires some level of thought and planning that you just don't get with the code like hell approach. This comes out when you look at some of the problems – their sandbox is so restrictive that some of the most dangerous code (plug-ins) can't run in it, so breaking out is as easy as finding a vuln in a plug-in, which I think happened the first day. Even a cursory threat model (TM) would have called this out.
While I'm off on a software development cycle rant, let me correct a common misconception that if you're not using Agile (Beware Of Things With Leading Caps – Usually, It Is Overblown), then it must be the waterfall model. As was documented more than 10 years ago in "Rapid Development", there's a whole spectrum of techniques that can be used. Agile most closely resembles the spiral process, just with really short milestones. I'm not advocating overly cumbersome processes (where overly cumbersome varies depending on what you're doing). I am advocating _thinking_ about what you're doing, considering the security implications (as well as perf, robustness, and so on) up front. Planning really helps deliver more secure software.
It is quite possible to create software that's _more_ secure using Agile (I think XP is really well suited for this), but that's a topic for another day.
[9/15 - update]
From a link about Google's identity server problem:
The developers decided to forget about the SAML specification as it’s written and just “do their own thing.” As great as this kind of move might be on the dance floor, it’s dangerous when it comes to protecting peoples’ resources and privacy. In fact it is insideous since the claim that Google SSO implemented a well vetted protocol tended to give security professionals a sense of confidence that we understood its capabilities and limitations.
Full text here - http://www.identityblog.com/?p=1011
The thing to remember is that it's all connected - we all need to make secure software. As one of my professors used to say (loudly in a deep, booming voice with an Indian accent):
You have to THINK about it.
He didn't mean just a little - security is some subtle stuff.
- Why can't you comment?
This is because $#@!!!! spammers can screw up anything. I have to disallow anonymous comments, or I get a bazillion blog spam comments, I check comments a week later, and there's 200 of these that I can only delete 10-20 at a time. Annoying to say the least.
How you can comment is to register a user for our blogs. You don't have to have a blog here to be a user. I know it is annoying to have yet another user-password pair to remember, but this is the best I can do.
The second thing you can do is send me e-mail. I try to respond quickly to e-mail posted via the blog, and if you want it added to the comments, just ask. Sometimes when people ask really good questions, it turns into a whole post, which is good for everyone (actually, this is an example).
- Ptrdiff_t is evil
Well, not really, but here's a code problem that confounded some really smart devs – and it looks so simple!
void IncPtr( unsigned int cElements )
{
    if( m_pMax - m_pCurrent > cElements )
        m_pCurrent += cElements;
    else
        throw;
}
OK, so here's the question – if an error has happened, and m_pCurrent is > m_pMax, which implies the difference in the pointers is negative, which code branch do we execute? Assume cElements is a reasonably small number.
Hmmm – the immediate answer would be that a negative difference can't be greater than cElements, so the comparison fails and we throw, so this code is safe, right?
The answer, unfortunately, is a solid maybe. Back in engineering school, I got acquainted with something called dimensional analysis, where you work something out based on its dimensions. For example, if you want to know how to get gallons from some number of miles and miles/gallon, figuring (miles / (miles/gallon)) shows the answer is in gallons. A similar approach for integers is often helpful. Let's look at the types involved in the if statement. First, what is the type of a pointer difference? That's a ptrdiff_t – a signed integer with the same number of bytes as a pointer, which means that on a 64-bit build it is an __int64, and on a 32-bit build it is a 32-bit int.
What we now have is:
if( ptrdiff_t > unsigned int )
If you have a 32-bit build, this works out to:
if( int > unsigned int )
Which then implies:
if( (unsigned int)int > unsigned int )
The cast turns a negative lhs into a very large positive number, which is greater than any reasonably small cElements. The comparison succeeds, the error goes uncaught, we add to a pointer that's already probably bogus, and things will get worse from here. Do note that if you're compiling with all the warnings on, this will cause a signed/unsigned mismatch warning, which you'd probably ignore, or cast away, being oblivious to the doom behind it.
Now consider 64-bit:
if( __int64 > unsigned int )
This gets cast very differently…
if( __int64 > (__int64)unsigned int )
You won't get a warning on your 64-bit build because the cast from unsigned int (like the cast from unsigned short to int) preserves value, and the comparison is done as signed 64-bit math. Under the error condition outlined here, the negative difference stays negative, the comparison fails, and we throw.
As you can see, which branch gets executed depends on whether you're building 32 or 64-bit!
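A quick way to see the two builds diverge without switching compilers is to write the implicit conversions out by hand with fixed-width types. This is only a sketch: the function names are made up for illustration, and the explicit casts stand in for what the compiler does silently in each build.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the 32-bit build: ptrdiff_t is a 32-bit int, so the signed
// side is converted to unsigned before the comparison.
bool IncrementBranchTaken32(std::int32_t diff, std::uint32_t cElements)
{
    return static_cast<std::uint32_t>(diff) > cElements;
}

// Stand-in for the 64-bit build: ptrdiff_t is a 64-bit int that can hold
// every unsigned int value, so the unsigned side is widened to signed.
bool IncrementBranchTaken64(std::int64_t diff, std::uint32_t cElements)
{
    return diff > static_cast<std::int64_t>(cElements);
}
```

With a difference of -4 and cElements of 10, the first function says the `>` comparison from IncPtr succeeds and the second says it fails. For an in-range difference such as 100, the two agree.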
The solutions and lessons are:
- Be wary of pointer math in porting code to 64-bit: ptrdiff_t is signed, and changes size.
- Using the bit flipping of negative to very large positive effect is really programming with side effects, and being clever, neither of which is a good practice.
- Explicitly determine which path to take when dealing with negative numbers in a comparison.
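Here is one way the last point could look in code. It is a minimal sketch, not the original class: IntCursor and its constructor are hypothetical stand-ins for whatever owns m_pCurrent and m_pMax, and the choice of exception types is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the class that owns m_pCurrent and m_pMax.
class IntCursor
{
public:
    IntCursor(int* pCurrent, int* pMax) : m_pCurrent(pCurrent), m_pMax(pMax) {}

    void IncPtr(unsigned int cElements)
    {
        const std::ptrdiff_t cRemaining = m_pMax - m_pCurrent;

        // Handle the negative case by name instead of hoping a conversion catches it.
        if (cRemaining < 0)
            throw std::logic_error("m_pCurrent is already past m_pMax");

        // cRemaining is known to be non-negative, so this cast cannot change its value.
        if (static_cast<std::size_t>(cRemaining) > cElements)
            m_pCurrent += cElements;
        else
            throw std::out_of_range("not enough elements left");
    }

    int* Current() const { return m_pCurrent; }

private:
    int* m_pCurrent;
    int* m_pMax;
};
```

Because the sign is tested before any mixed signed/unsigned comparison happens, the same branch is taken on 32-bit and 64-bit builds.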
BTW, SafeInt will be available very shortly on CodePlex – I have just one more internal hoop to jump before I can post it.
- Office Crypto Follies
-
What I've been working on lately that has kept me from doing nearly anything else can be found at:
http://msdn.microsoft.com/en-us/library/cc313071.aspx
MS-OFFCRYPTO is very detailed documentation of exactly how we do cryptography for binary and OOXML documents. Overall, it covers:
- IRM
- Encryption and obfuscation
- Password to modify
- Digital signing
Before someone else figures out some of this and makes fun of us on Slashdot, let me be the first to detail what's really going on. Hang on – some of it is (in the immortal words of Warren Zevon) not that pretty at all. On a happier note, in the even more immortal words of Monty Python, "it got better" – what we're shipping now is quite good (for encrypting OOXML documents), and we have plans to make it even better.
Let's start with the worst of it – XOR. You may note that I consistently refused to ever say "XOR encryption", preferring the more accurate "XOR obfuscation". Not only is it the worst way to protect a document, but it was horrible to try and explain. We did all sorts of silly things to make this hard to figure out; they did nearly nothing to actually protect the data, but they sure were no fun to document in a normative style. I believe this obfuscation dates back to around 1994. Here's some pseudo-code to show you the sheer horror of it all – this is from one of the two password verifier approaches:
FUNCTION CreatePasswordVerifier_Method1
    PARAMETERS Password
    RETURNS 16-bit unsigned integer
    DECLARE Verifier AS 16-bit unsigned integer
    DECLARE PasswordArray AS array of 8-bit unsigned integers
    SET Verifier TO 0x0000
    SET PasswordArray TO (empty array of bytes)
    SET PasswordArray[0] TO Password.Length
    APPEND Password TO PasswordArray
    FOR EACH PasswordByte IN PasswordArray IN REVERSE ORDER
        IF (Verifier BITWISE AND 0x4000) is 0x0000
            SET Intermediate1 TO 0
        ELSE
            SET Intermediate1 TO 1
        ENDIF
        SET Intermediate2 TO Verifier MULTIPLIED BY 2
        SET most significant bit of Intermediate2 TO 0
        SET Intermediate3 TO Intermediate1 BITWISE OR Intermediate2
        SET Verifier TO Intermediate3 BITWISE XOR PasswordByte
    ENDFOR
    RETURN Verifier BITWISE XOR 0xCE4B
END FUNCTION
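If you'd rather poke at this in a debugger, the pseudo-code translates to C++ roughly as follows. This is my own transcription, not code from the product: the function name is made up, and it assumes the password has already been converted to single-byte characters and is short enough for its length to fit in one byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumption: password is already single-byte text, shorter than 256 bytes.
std::uint16_t CreatePasswordVerifierMethod1(const std::string& password)
{
    std::uint16_t verifier = 0;

    // One round: rotate the low 15 bits left by one, then XOR in the byte.
    auto mix = [&verifier](std::uint8_t passwordByte) {
        const std::uint16_t carry = (verifier & 0x4000) ? 1 : 0;
        const std::uint16_t shifted =
            static_cast<std::uint16_t>((verifier << 1) & 0x7FFF);
        verifier = static_cast<std::uint16_t>((carry | shifted) ^ passwordByte);
    };

    // PasswordArray is [length, password bytes...] walked in reverse order,
    // so the password bytes go in last-to-first and the length byte goes in last.
    for (auto it = password.rbegin(); it != password.rend(); ++it)
        mix(static_cast<std::uint8_t>(*it));
    mix(static_cast<std::uint8_t>(password.size()));

    return static_cast<std::uint16_t>(verifier ^ 0xCE4B);
}
```

Written this way, it is easier to see that the whole thing is a 15-bit rotate and an XOR per byte, which is about as far from a real hash as you can get.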
We'd have been much better off just taking a CRC16 of the password. I wanted to see just how bad this was, and wrote up a quick app to try the first 2^40 alpha passwords, and I started seeing cycles in the collisions. Values would go from under-represented to over-represented very quickly. Further inspection shows that for reasonably short passwords, you can immediately tell the number of characters from this value. Seems that for a 1 character password, the first 7 bits vary, 2 characters vary 8 bits, and so on.
Oddly enough, this is so bad that it actually has a benefit. There are so many collisions that while you may find a password that will work, there's no assurance that you found the password used to obfuscate the file, so you're not as likely to be able to get into other things by brute-forcing the 16-bit verifier. Somewhere around 8 billion passwords only generate about 16,000 verifiers, so there are literally hundreds of thousands of possible passwords that could have created any given verifier.
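You can get a feel for the collision rate with a few lines of code. The sketch below is not the app I used; the function names are invented, the verifier is a compact copy of the Method1 pseudo-code so the block stands alone, and it only walks lowercase a-z passwords of one fixed length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Compact copy of the Method1 verifier (single-byte passwords assumed).
std::uint16_t Verifier(const std::string& password)
{
    std::uint16_t v = 0;
    auto mix = [&v](std::uint8_t b) {
        v = static_cast<std::uint16_t>((((v >> 14) & 1) | ((v << 1) & 0x7FFF)) ^ b);
    };
    for (auto it = password.rbegin(); it != password.rend(); ++it)
        mix(static_cast<std::uint8_t>(*it));
    mix(static_cast<std::uint8_t>(password.size()));
    return static_cast<std::uint16_t>(v ^ 0xCE4B);
}

// Counts distinct verifiers over every a-z password of exactly `length` letters.
std::size_t CountDistinctVerifiers(std::size_t length)
{
    std::set<std::uint16_t> seen;
    std::string candidate(length, 'a');
    for (;;)
    {
        seen.insert(Verifier(candidate));

        // Odometer-style increment: roll 'z' back to 'a' and carry.
        std::size_t pos = 0;
        while (pos < length && candidate[pos] == 'z')
            candidate[pos++] = 'a';
        if (pos == length)
            break; // every position rolled over, so we are done
        ++candidate[pos];
    }
    return seen.size();
}
```

There are 26^4 = 456,976 four-letter lowercase passwords, but the top bit of the result is always set (it is zero before the final XOR with 0xCE4B), so no more than 32,768 distinct values can ever come back and collisions are guaranteed.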
What we have here is someone who is actually a very good general dev (he's now a well-thought-of dev manager) trying to roll his own crypto and implement a simple hashing function. The moral of the story is DO NOT DO THIS.
If you look deeply into the obfuscation array initialization, you'll see another fairly ghastly mistake – the part of the array that coincides with the number of characters in the password actually varies, but the remainder of it is initialized based on values hard-coded into the binary along with values that end up in the document. This makes it possible to write a tool to directly extract quite a bit of information, and then there's the obvious disaster of what happens when you XOR chosen plain-text.
Our next attempt at encryption first showed up in Office 97, and featured RC4. As those of you who are at all familiar with encryption know, RC4 is really hard to do correctly, and this is an example of most of the mistakes you can make with RC4. The number one rule of stream ciphers is to NEVER re-use a key stream, since the cipher text is the result of the key stream XOR'd with the plain text. If you reuse a key stream, an attacker can XOR the two cipher texts together to get the XOR of the two input plain texts, and that's often very easy to sort out. We did this more than once. The next rule of RC4 is that you have to have an integrity check, or you're subject to bit-flipping attacks, and there's no integrity check. Finally, you should toss out the first 1k or so of the key stream, but we didn't know to do that.
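To make the first two rules concrete, here is a small sketch using a textbook RC4. None of this is Office code: the helper names are invented, the key and messages are made up, and nothing here reproduces how Office derives its keys.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

Bytes ToBytes(const std::string& s) { return Bytes(s.begin(), s.end()); }

// Textbook RC4, for illustration only. The key must not be empty.
Bytes Rc4(const Bytes& key, const Bytes& input)
{
    std::array<std::uint8_t, 256> S;
    for (int k = 0; k < 256; ++k)
        S[k] = static_cast<std::uint8_t>(k);

    // Key scheduling.
    std::uint8_t j = 0;
    for (int k = 0; k < 256; ++k)
    {
        j = static_cast<std::uint8_t>(j + S[k] + key[k % key.size()]);
        std::swap(S[k], S[j]);
    }

    // Key stream generation, XOR'd into the input as we go.
    Bytes output(input.size());
    std::uint8_t i = 0;
    j = 0;
    for (std::size_t n = 0; n < input.size(); ++n)
    {
        i = static_cast<std::uint8_t>(i + 1);
        j = static_cast<std::uint8_t>(j + S[i]);
        std::swap(S[i], S[j]);
        const std::uint8_t keyStreamByte = S[static_cast<std::uint8_t>(S[i] + S[j])];
        output[n] = static_cast<std::uint8_t>(input[n] ^ keyStreamByte);
    }
    return output;
}

// XOR of two equal-length buffers.
Bytes Xor(const Bytes& a, const Bytes& b)
{
    Bytes out(a.size());
    for (std::size_t n = 0; n < a.size(); ++n)
        out[n] = static_cast<std::uint8_t>(a[n] ^ b[n]);
    return out;
}
```

Encrypt two equal-length messages under the same key and `Xor(Rc4(key, p1), Rc4(key, p2))` comes out equal to `Xor(p1, p2)`: the key stream cancels and the attacker never needs the key. Likewise, flipping one bit of the cipher text flips exactly that bit of the decrypted plain text, which is why the missing integrity check matters.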
Next, take into account that encryption was considered a munition at the time, and we were limited to 40-bit initial keys. These days, you can work your way through a 40-bit key space in minutes using only one computer. No need to bother with password cracking, just go directly to the key and attack it. The moral of this story is that agile encryption is a serious requirement, because while it may have taken some time to brute force 2^40 keys on a 286, a modern system will make short work of it – and in fact, it would be possible to store all the keys in a single file – it would only consume about 18 GB (much less if we did it as a tree of some kind). A second and more interesting flaw that our tester uncovered was that they thought they were doing an iterated hash (though only 16 iterations), but what they were doing in reality was concatenating the first 40 bits of the MD5 hash of the password, a 16 byte salt, and then repeating this concatenation 16 times. The old encryption library we used made this an understandable error, and it's harder to do with CryptoAPI or CNG. Moral of this story is that crypto is unlike any other code, and you should always get an expert to review what you're doing.
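The difference between concatenating and iterating is easier to see side by side. The sketch below uses a toy FNV-1a hash as a stand-in for MD5 purely so it compiles without a crypto library; it is NOT a cryptographic hash. The function names are invented, the final hash over the concatenated buffer is my assumption about what happens to it next, and the call counter exists only to show how much work each approach costs an attacker per password guess.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy stand-in for MD5 (64-bit FNV-1a). Not cryptographic; illustration only.
Bytes ToyHash(const Bytes& data, int& callCount)
{
    ++callCount;
    std::uint64_t h = 14695981039346656037ull;
    for (std::uint8_t b : data)
    {
        h ^= b;
        h *= 1099511628211ull;
    }
    Bytes out(8);
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>(h >> (8 * i));
    return out;
}

// What the old code did: hash once, then repeat (first 40 bits + salt) 16 times.
// Assumption: the repeated buffer is then hashed a single time.
Bytes ConcatenatedDerive(const Bytes& password, const Bytes& salt, int& callCount)
{
    const Bytes h0 = ToyHash(password, callCount);
    Bytes buffer;
    for (int i = 0; i < 16; ++i)
    {
        buffer.insert(buffer.end(), h0.begin(), h0.begin() + 5); // first 40 bits
        buffer.insert(buffer.end(), salt.begin(), salt.end());
    }
    return ToyHash(buffer, callCount);
}

// What an iterated hash looks like: each round hashes the previous output.
Bytes IteratedDerive(const Bytes& password, const Bytes& salt, int rounds, int& callCount)
{
    Bytes input = salt;
    input.insert(input.end(), password.begin(), password.end());
    Bytes h = ToyHash(input, callCount);
    for (int i = 0; i < rounds; ++i)
        h = ToyHash(h, callCount);
    return h;
}
```

The concatenated version costs two hash calls per guess no matter how many times the buffer repeats, while the iterated version costs one call per round plus one, which is the whole point of iterating.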
The RC4 encryption was then "strengthened" to use CryptoAPI, and could be configured to go to 128-bit keys, though unfortunately, the old 40-bit stuff was still default – this all happened in Office XP. Sadly, most of the implementation flaws remained. I found one place where there was triple key stream re-use (though only for 8 bytes) in the same spot. The unfortunate attempt at an iterated hash dropped back to a non-iterated hash of the password for reasons I don't understand. Some of the applications, notably PowerPoint, don't suffer as much from key stream re-use as others, and if you chose to use 128-bit RC4 and used a good password, a presentation could be relatively well protected.
As of Office 2007, we do warn you that the encryption we do on the binary documents is weak. Most of the time, it's so weak that it will only act as a mild deterrent. In some cases, we missed encrypting things entirely (which was actually called out in a KB article some time ago). My advice is that if you must encrypt a binary document, use a 3rd party tool to do it. These flaws are called out in the Security Considerations section of MS-OFFCRYPTO.
When we get to OOXML document encryption, the picture gets a lot better. I personally fixed some of the problems. We moved to AES, which is a block cipher, went to an iterated hash (50,000 cycles) that far exceeds the iteration count RFC 2898 recommends, and as I stated in a previous post, don't forget your password, or you may never see your data again. The only problems we had here were that we didn't support CBC (cipher block chaining) mode, and we didn't have an integrity check, though a 1 bit flip would result in 16 bytes of junk, so it isn't as high priority a problem as with RC4. We'll address both of these problems in the future, as well as some other improvements I'll talk about once we ship it.
If you take a good look at the ODF specification (as of 1.1) with regard to crypto, you see some of the same sorts of issues. They do some interesting things:
- The number of times to iterate on the password hash is relatively low (1000), and is a fixed value.
- The encryption algorithm must be Blowfish. Blowfish may well be a nice algorithm, but it hasn't been seriously validated, isn't FIPS compliant, isn't Suite-B compliant, and simply can't be used by some customers that require FIPS, Suite-B, or the analogues in their countries. Bruce Schneier himself has this to say about it – "At this point, though, I'm amazed it's still being used. If people ask, I recommend Twofish instead." It would be better to require an algorithm that has been validated, and better yet to allow it to be agile.
- The integrity check and the password verifier are the same thing. There is no way to know whether the user has the wrong password, or whether a bit got flipped. This has the potential for data loss, though I'd suppose one could build a special tool to just try the decryption.
The really good news is that we have some people who are seriously good with cryptography on the Office TWC team now – our tester is really sharp, we've got a PM who previously worked in the crypto group in Windows and is on the Microsoft-wide crypto board, and we have some devs who know this stuff as well. I'm happy to say that when you encrypt an OOXML document, it will be very hard to brute force the password and retrieve the information – and it will keep getting better.