Larry Osterman's WebLog

Confessions of an Old Fogey

More proof that crypto should be left to the experts

Apparently, two years ago, someone ran the dynamic analysis tool Valgrind against OpenSSL as packaged in the Debian Linux distribution. Valgrind reported an issue (a use of uninitialized memory) in the OpenSSL package distributed by Debian, so the Debian team decided that they needed to fix this "security bug".

Unfortunately, the fix they chose to implement apparently removed essentially all entropy from the OpenSSL random number generator. As the OpenSSL team commented, "Had Debian [contributed the patches to the package maintainers], we (the OpenSSL Team) would have fallen about laughing, and once we had got our breath back, told them what a terrible idea this was."

And it IS a terrible idea. It means that for the past two years, all crypto done on Debian Linux distributions (and Debian derivatives like Ubuntu) has been done with a weak random number generator. While this might seem geeky and esoteric, it's not. It means that every cryptographic key generated on a Debian or Ubuntu system needs to be regenerated (after you pick up the fix). If you don't, any data that was encrypted with the weak RNG can easily be decrypted.

Bruce Schneier has long said that cryptography is too important to be left to amateurs (I'm not sure of the exact quote, so I'm paraphrasing). That applies to all aspects of cryptography, including random number generators - even tiny changes to an algorithm can have profound effects on its security. He's right - it's just too easy to get this stuff wrong.

The good news is that there IS a fix for the problem; users of Debian or Ubuntu should read the advisory and take whatever actions are necessary to protect their data.

  • "As has already been commented several times in these discussions, the Debian maintainer did ask the package maintainers: http://marc.info/?t=114651088900003&r=1&w=2"

    Don't defend this guy, Kurt Roeckx:

    i) he touched code without analyzing the consequences (I'm an amateur programmer and I *do* analyze every piece of dependent code before removing anything)

    ii) he committed this patch without consensus from the package maintenance team (see the bug report; you will see that Kurt Roeckx unilaterally decided to commit the infamous patch)

    iii) he didn't run any tests to check that the RNG kept working OK

    etc., etc.

    How ironic that the stupidity of one man can trash the (well-earned) reputation of a great distro like Debian.

    The remedy:

    Tighter policies, unit testing, peer review, code audits, and clear consensus before commit.

  • @Ned:

    > a simple hotfix doesn't do *anything* to close your vulnerability

    Which is why it's more than a simple hotfix; there are also updates to openssh and other packages which, in openssh's case, actively blacklist the affected keys. After that update is installed, the compromised keys won't work anymore (a sketch of the idea follows).
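    A minimal sketch of that kind of fingerprint blacklisting, assuming a plain-text file of hex key fingerprints - the helper names and file format here are invented for illustration, not the actual openssh-blacklist mechanism:

    ```python
    import hashlib

    def fingerprint(pubkey_blob: bytes) -> str:
        """Hex MD5 fingerprint of a raw public-key blob."""
        return hashlib.md5(pubkey_blob).hexdigest()

    def load_blacklist(path: str) -> set:
        """One hex fingerprint per line (hypothetical file layout)."""
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    def is_compromised(pubkey_blob: bytes, blacklist: set) -> bool:
        return fingerprint(pubkey_blob) in blacklist
    ```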

  • Ned: Yep, any data ever sent to a box whose TLS cert's RSA private key was generated on an Etch/recent Ubuntu system must be considered compromised.

    It should be possible to generate a list of all possible RSA key pairs that could be made by openssl on x86 and x64, however. The broken RNG had a whopping 32768 (!) possible output states.

    It'd certainly be an interesting exercise to go around with such a list of known-bad 1024-bit pubkey values and see how many https sites you've visited in the past two years match - see the sketch below.
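    A toy model of why that search is feasible - the derivation below is invented for illustration (the real bug left the process ID, at most 32768 values, as the only seed material), but the enumeration is the point:

    ```python
    import hashlib

    def weak_stream(pid: int, nbytes: int) -> bytes:
        """'Random' bytes that depend only on the PID (toy model)."""
        out = b""
        counter = 0
        while len(out) < nbytes:
            out += hashlib.md5(pid.to_bytes(2, "big") +
                               counter.to_bytes(4, "big")).digest()
            counter += 1
        return out[:nbytes]

    # An attacker can precompute the key material for every possible PID:
    candidates = {pid: weak_stream(pid, 16) for pid in range(1, 32768)}
    print(len(candidates), "possible seeds - a trivially searchable space")
    ```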

  • RNG - Random Number Generator: When a flaw is discovered in the module that generates crypto keys, the problem

  • He did include the patch in that thread, albeit not in context. He clearly said he was going to remove both of them, and it was assented to by the OpenSSL devs - Ulf Möller said "[it does not contribute much to the entropy]. If it helps in debugging, I'm in favor of removing them."

    But it very obviously wasn't tested, as the original check-in didn't even compile!

    http://svn.debian.org/viewsvn/pkg-openssl/openssl/trunk/crypto/rand/md_rand.c?rev=173&r1=167&r2=173

    This was a WTF all the way around, and clearly the "many eyes make bugs shallow" theory does not apply if those eyes are not looking at the code, or do not understand the code they're looking at.

  • By the way, those advocating unit testing - how do you use unit tests to verify something is /not/ deterministic?

  • Mark, read Knuth volume 2 - there are a number of tests that can measure the randomness of an RNG. Given that this change removed all the randomness of the seed, it should have been pretty easy to tell that something had been broken.

    So yeah, a unit test could have been written to catch this - something like the sketch below.
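    A sketch of what such tests could look like - rng_new here is a hypothetical constructor wrapping the RNG under test, not an actual OpenSSL API:

    ```python
    import os

    def test_seed_affects_output(rng_new):
        # Two RNGs seeded with different material must diverge. The
        # Debian patch silently discarded the seed buffer, so both
        # streams would have come out identical and this would fail.
        a = rng_new(seed=os.urandom(32)).read(64)
        b = rng_new(seed=os.urandom(32)).read(64)
        assert a != b, "RNG output does not depend on its seed"

    def test_bit_frequency(rng_new):
        # Crude monobit check (Knuth vol. 2 has far stronger tests):
        # roughly half the bits in a long output run should be set.
        stream = rng_new(seed=os.urandom(32)).read(100_000)
        ones = sum(bin(byte).count("1") for byte in stream)
        assert abs(ones - 400_000) < 4_000, "bit frequency badly skewed"
    ```

    (As a later comment points out, the frequency test alone wouldn't have caught this particular bug, because the output was post-processed with MD5; the first test, which exercises the seeding path directly, would have.)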

  • >But the vast majority of the developers at MSFT aren't

    >crypto experts, that's why we rely on the crypto team to

    >review our decisions that are crypto-specific.

    I think that's the difference between MS and Debian: this was a failure of change control and the development process, not necessarily a failure of crypto. The problem arose because a single maintainer decided to make a change to some code that they didn't understand, and no one ever reviewed it before it was committed. Presumably at MS a single developer wouldn't be able to slip a change like that through without it being reviewed by a domain expert.

    (Having said that, Benny Pinkas and co. showed that the original Windows CryptGenRandom implementation wasn't so hot either :-).

  • >Mark, read Knuth volume 2 - there are a number of tests that

    >can measure the randomness of a RNG.  Given that this change

    >removed all the randomness of the seed, it should have been

    >pretty easy to tell that something had been broken.

    Not necessarily, since it was still being postprocessed with MD5. I can (for example) use MD5 to hash the sequence { 0, 1, 2, 3, 4, 5, 6, ... } (entirely predictable) and yet no amount of entropy testing will be able to tell that the output isn't random. If it could, then you'd have a distinguisher for MD5, which means you've broken the hash function. This is why you need to do your goodness-testing on the input to the hash function. However, since hash functions are often used to accumulate entropy from low-entropy sources, you can't just reject apparently low-entropy input out of hand either. In general there's no easy way to determine something like this; if you specifically want to check for cycles you can use the n/2n-step trick to try to locate them, but if the cycle is too large, or the fault is something other than short cycles, this won't detect it.

    (It's a lot more complex than that; I could go on about this for a while :-).
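    A quick demonstration of that point - MD5 over a plain counter produces output that a frequency test can't tell from real randomness:

    ```python
    import hashlib

    # An entirely predictable input sequence: 0, 1, 2, 3, ...
    stream = b"".join(hashlib.md5(i.to_bytes(8, "big")).digest()
                      for i in range(10_000))

    ones = sum(bin(byte).count("1") for byte in stream)
    total = len(stream) * 8
    print(f"{ones} of {total} bits set ({ones / total:.2%})")
    # Prints very close to 50.00% - zero entropy in, statistically
    # "random"-looking bits out.
    ```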

  • It doesn't appear that all the programs that use cryptography on Debian-based systems were affected. gnupg, for example, is likely not affected by this particular issue.

  • Peter, both of your points are totally valid. This *was* a failure of process, not crypto. I'm also not saying that it couldn't happen at MSFT (there but for the grace of G_d go I) - stuff happens, and it sucks bigtime when it does. I hope that the checks and balances we have in place would have caught this, but sometimes problems get missed.

    Also, I hadn't realized they ran the output of the RNG through MD5 - you're right, that would complicate the ability to detect the amount of entropy in the RNG.

    And Ryan, again, you're right - it was only OpenSSL that had the issue, so only SSL and SSH keys were busted, not all crypto on the platform.

  • It has been interesting to read this discussion, particularly after reading more blogs in the last 2 days than can be considered within healthy limits (most of them startlingly ill-informed). Nice to see some informed discussion here. As one of the openssl devs who responded to the original posting from the debian package maintainer, I want to clarify what I think is an important point - because I've seen this "but he *did* mail the openssl list and they botched it, so it's openssl's fault too" stuff floating about with troubling regularity. Far be it from me to chalk this up to debian fanboyism ... <ahem> ... but:

    Please consult the openssl FAQ and search the page for valgrind, that may help put this in context. We get this same question *frequently* on the mail-list - some fresh intern or overeager basement-dweller shows up on the list and says "I ran valgrind on <openssl-dependent-app> and found this bug, I'm gonna save the world". More jetsam amidst the flotsam that we all have too little time to adequately respond to. My response (one of the two devs who responded) was to note that one of the quoted lines had a comment mentioning purify and "That's your first clue, build with -DPURIFY". Another FAQ bites the dust, I think to myself. But oh no, this innocent FAQ posting was to later resurface in the most uncomfortable of fashions... Ulf's response to the two quoted lines (essentially identical to one another, out of any kind of context, and without any real hint that a review was warranted) was tantamount to saying he'd like to see the uninitialised usage go away, presumably because he was fed up with this question coming up and perhaps based on a suspicion that something daft would, one day, happen. (FWIW, the use of uninitialised data from the caller's buffer at best adds a little entropy that is caller-pattern-dependent, and at worst won't do any harm.) There was no patch posted, no indication that this was anything other than a FAQ (that the poster *should* have determined for himself, before or after posting), and most importantly, no indication that this was going to be patched onto a popular distribution affecting thousands of mission-critical systems around the globe.

    It was a giant package-management cockup, and I welcome Larry's comment that patching an application's code should ideally be left to those who understand that code best. IMHO distributions need to get a little less preoccupied with filling in some delusional (and competitive) "value-add" between their users and up-stream applications, and a little more preoccupied with managing risk and ensuring that the apps their users run match as closely as possible the source code that the rest of the world is reviewing (i.e. the up-stream code). This "problem" was not a distribution- or usage-specific thing; the maintainer *thought* it was a fundamental glitch in the code and, had this been true, must have assumed it affected everyone in the same way - not just debian. So why did it stay local, unannounced, and unreviewed? Rather than blaming openssl or even FOSS more generally, one could reasonably argue that from a code-review perspective, this maintainer created a more or less *closed-source* variant of openssl. But this being an MSDN blog, I'd better leave that whole discussion alone ... ;-)
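    To illustrate the "at best adds a little entropy, at worst won't do any harm" point above, here is a toy model of hash-based pool mixing in the spirit of RAND_add - not OpenSSL's actual md_rand.c implementation:

    ```python
    import hashlib

    class Pool:
        """Toy entropy pool: state is folded through a hash on every mix."""
        def __init__(self):
            self.state = bytes(32)

        def mix(self, data: bytes) -> None:
            # The old state is always an input to the hash, so mixing in
            # known (or uninitialised, attacker-visible) bytes can add
            # entropy but never removes what the pool already holds.
            self.state = hashlib.sha256(self.state + data).digest()

    pool = Pool()
    pool.mix(b"genuinely unpredictable material")  # adds entropy
    pool.mix(b"\x00" * 16)                         # known data: harmless
    ```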

  • Geoff, thanks for stopping by - I appreciate the insightful comment.

    I totally agree that this was a failure of management - while I have made changes to other portions of Windows before (most recently to WMP, the shell, and the HID control logic), I ALWAYS get the people who actually own the code (and who will ultimately be responsible for my bugs) to carefully review my code.

    And those developers have found bonehead bugs in my code that they would never have made, and the code review caught them (this works both ways - I've found some stupid bugs in changes other developers have made to code I own).

    I don't blame FOSS for this; it has absolutely nothing to do with the FOSS vs OTS argument, IMHO.

    One advantage our model has over the FOSS model is that there is effectively only one distribution of Windows, so it's much harder to have long-term variations between components [1]. In FOSS terms, because we're a closed-source shop, all external patches are submitted to the maintainers of the branch.

    [1] The sustained engineering division has its own set of branches of the Windows source code that it uses to create hotfixes and service packs, but even those changes eventually get merged back into the mainline distribution.

  • Larry, I totally agree. You make an interesting point, as the "FOSS advantage" of having millions of eyes is very dependent on them all looking at the code that everyone is *using*. If everyone is looking at their own forked version of the code then the value of the code-review is diminished and coordination of any issues that get found is quasi-impossible in any case. Worse still, if everyone's looking at the same code but everybody is *using* derivations of the code that *nobody* is reviewing, the risks should be clear to even the dullest among us.

    I'm not interested in debating FOSS vs OTS here, in case that wasn't clear. But I would comment that FOSS, in isolation, has the valuable (nay, fundamental) characteristic that users can assume a significant responsibility for identifying problems directly - either preventatively (audit, study, curiosity, etc.) or retrospectively (debugging, diagnosis, localised support). So, to some extent the solution-domain is pushed out to meet the problem-domain. You can test and test and test, but it's always the noob who buys your product that's going to tickle out those last few bugs after release - I dare say you're familiar with this. :-) FOSS has this strength, but the corresponding weakness is precisely what you identify - it presumes that users do not (yield to human nature and) fork, hoard, and re-badge the up-stream code. Linux distributions straddle this strength and this weakness, and to my despair, all too frequently veer the wrong way. In particular, they raise the bar too high for users to help themselves. Users are typically running a legacy version, patched and built via a pathological packaging system that they are usually shielded from, in which even just identifying the source code that is compiled into their system is *hard*. As you know, if you're going to give users an essentially closed-source system (in terms of the ease with which users can engage problems or look for them before they even occur), you'd better have a serious budget for identifying such problems yourself. This debian issue has been an excellent example of this - the widely-reviewed code that users figured they were relying on did not contain the bug that was present on their system, and yet the distributed package had minimal internal review because, like most FOSS, it relies on users and peers (and up-stream quality) to meet the appropriate levels of Q/A.

  • What if your whole business was about trust? More specifically, what if your entire business was about providing a long number (a very long number) to companies, so that nobody knows that number except you and the company. And then, suddenly, you find out that the number
