May, 2008

Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Cool Schwag :)

    • 7 Comments

    Not surprisingly, I'm the security contact for my small part of the Windows organization (it's called the Devices&Media group, which is within the WEX division).  As such, I'm responsible for providing security guidance and reviewing the threat models for our group (I've done a lot of them over the past few months :)).

    Earlier this morning, one of the PMs for one of the teams in D&M stopped by my office with a thank you gift for the work I've done with his team.  He had noticed my 20 year old office tool kit and his team decided to replace it with something newer (and way cooler):

     

    0515081331a

    I sent them a private thank-you, but I've got to say publicly that I'm really touched - it was extraordinarily nice of them and I truly appreciate it.

     

     

    PS: before anyone asks, the photo was taken with the toolkit resting on a test laptop (it's an old Toshiba M5).  In the background, you can see the Blibbet Hat I had made about 20 years ago at a local fair.

  • Larry Osterman's WebLog

    More proof that crypto should be left to the experts

    • 41 Comments

    Apparently two years ago, someone ran a static analysis tool named "Valgrind" against the source code to OpenSSL in the Debian Linux distribution.  The Valgrind tool reported an issue with the OpenSSL package distributed by Debian, so the Debian team decided that they needed to fix this "security bug".

     

    Unfortunately, the solution they chose to implement apparently removed all entropy from the OpenSSL random number generator.  As the OpenSSL team comments "Had Debian [contributed the patches to the package maintainers], we (the OpenSSL Team) would have fallen about laughing, and once we had got our breath back, told them what a terrible idea this was."

     

    And it IS a terrible idea.  It means that for the past two years, all crypto done on Debian Linux distributions (and Debian derivatives like Ubuntu) has been done with a weak random number generator.  While this might seem to be geeky and esoteric, it's not.  It means that every cryptographic key that has been generated on a Debian or Ubuntu distribution needs to be recycled (after you pick up the fix).  If you don't, any data that was encrypted with the weak RNG can be easily decrypted.

     

    Bruce Schneier has long said that cryptography is too important to be left to amateurs (I'm not sure of the exact quote, so I'm using a paraphrase).  That applies to all aspects of cryptography (including random number generators) - even tiny changes to algorithms can have profound effects on the security of the algorithm.   He's right - it's just too easy to get this stuff wrong.

     

    The good news is that there IS a fix for the problem, users of Debian or Ubuntu should read the advisory and take whatever actions are necessary to protect their data.

  • Larry Osterman's WebLog

    Resilience is NOT necessarily a good thing

    • 66 Comments

    I just ran into this post by Eric Brechner who is the director of Microsoft's Engineering Excellence center.

    What really caught my eye was his opening paragraph:

    I heard a remark the other day that seemed stupid on the surface, but when I really thought about it I realized it was completely idiotic and irresponsible. The remark was that it's better to crash and let Watson report the error than it is to catch the exception and try to correct it.

    Wow.  I'm not going to mince words: What a profoundly stupid assertion to make.  Of course it's better to crash and let the OS handle the exception than to try to continue after an exception.

     

    I have a HUGE issue with the concept that an application should catch exceptions[1] and attempt to correct them.  In my experience handling exceptions and attempting to continue is a recipe for disaster.  At best, it takes an easily debuggable problem into one that takes hours of debugging to resolve.  At it's worst, exception handling can either introduce security holes or render security mitigations irrelevant.

    I have absolutely no problems with fail fast (which is what Eric suggests with his "Restart" option).  I think that restarting a process after the process crashes is a great idea (as long as you have a way to prevent crashes from spiraling out of control).  In Windows Vista, Microsoft built this functionality directly into the OS with the Restart Manager, if your application calls the RegisterApplicationRestart API, the OS will offer to restart your application if it crashes or is non responsive.  This concept also shows up in the service restart options in the ChangeServiceConfig2 API (if a service crashes, the OS will restart it if you've configured the OS to restart it).

    I also agree with Eric's comment that asserts that cause crashes have no business living in production code, and I have no problems with asserts logging a failure and continuing (assuming that there's someone who is going to actually look at the log and can understand the contents of the log, otherwise the  logs just consume disk space). 

     

    But I simply can't wrap my head around the idea that it's ok to catch exceptions and continue to run.  Back in the days of Windows 3.1 it might have been a good idea, but after the security fiascos of the early 2000s, any thoughts that you could continue to run after an exception has been thrown should have been removed forever.

    The bottom line is that when an exception is thrown, your program is in an unknown state.  Attempting to continue in that unknown state is pointless and potentially extremely dangerous - you literally have no idea what's going on in your program.  Your best bet is to let the OS exception handler dump core and hopefully your customers will submit those crash dumps to you so you can post-mortem debug the problem.  Any other attempt at continuing is a recipe for disaster.

     

    -------

    [1] To be clear: I'm not necessarily talking about C++ exceptions here, just structured exceptions.  For some C++ and C# exceptions, it's ok to catch the exception and continue, assuming that you understand the root cause of the exception.  But if you don't know the exact cause of the exception you should never proceed.  For instance, if your binary tree class throws a "Tree Corrupt" exception, you really shouldn't continue to run, but if opening a file throws a "file not found" exception, it's likely to be ok.  For structured exceptions, I know of NO circumstance under which it is appropriate to continue running.

     

    Edit: Cleaned up wording in the footnote.

Page 1 of 1 (3 items)