In the lifetime of every single project I've ever worked on, there's this little "game" that's informally played at the very end of the ship cycle. It's never been formalized, but it's always been played (at least it was in every group I've ever worked in).
I call it "Last Checkin Chicken"; it's kinda like the children's game of "Hot Potato".
The way the game works is that no group wants to be the last group that makes a checkin in a particular release of the product. I've never really understood why this is. Maybe it's because we could have shipped one day earlier if it hadn't been for that darned last bug (which is silly, but).
Since nobody wants to make the last checkin, whenever you make a checkin, you hold your breath and hope and pray that somebody else makes a checkin (and thus your checkin isn't the last one in the product). If you ARE the last person making a change, then you don't win a prize or have to wear goat horns, or anything, it's just an invisible badge of shame :)
Last Checkin Chicken has variants. Some groups hold informal office pools about which group is going to be the one that makes the last checkin ("Heh, I bet it's going to be the Glibert team, their code is just garbage", "Nah, it's going to be Snorklewanger team, they always find stuff at the last minute"). And within the individual teams, there is even further speculation, either by feature ("I know we've had trouble with the FLOMBERT feature, if it's our team that has the last bug, it's going to be in that feature") or developer ("Man, Terry writes crappy code, bet the last fix is in code he wrote").
Ultimately, Last Checkin Chicken is something that developers do during the somewhat tense final hours of a project as a strange way of blowing off steam, and it's harmless.
A couple of weeks ago, my old laptop's motherboard gave up the ghost, so after much price and feature shopping, I got a new Toshiba Satellite A100 laptop for work.
In general, I love this box. It's pretty fast, has a good modern processor, a decent video card (NVidia Go 7300), 1GB of RAM, and more extras than I would expect for a value laptop (it cost under $1000). This thing's got Bluetooth, a fingerprint reader, an SD flash reader, IR, and other stuff - way more than I'd expect for a low end laptop.
I have only two major complaints about it. The first is that the monitor, while a 15 inch widescreen monitor, can only do 1280x800 (my old laptop could do 1280x1024 and I miss that).
The other major complaint is that I can't run Vista on it. Toshiba hasn't yet released the display drivers for the video card, so I have the choice of running XP in 1280x800 or Vista in 1024x768 VGA mode. And I'm addicted to the resolution.
I actually tried Vista for a while, but it was too painful - everything worked (even the silly fingerprint reader had drivers on Windows Update), but not having the screen working just got on my nerves.
What's interesting is that after using Vista exclusively at work for 6 months and almost exclusively at home for over a month (I have one machine at home I use that doesn't have Vista on it), I've decided that the XP UI is just broken. I've gotten totally used to the Vista UX and hate not having it. I miss being able to type into the pearl, and I actually miss breadcrumbs (which is weird, because I hated them when I first encountered them).
I can't wait until Toshiba finally gets the drivers for my machine finished. XP feels so dated.
I was reading this month's issue of Dr. Dobb's Journal, and I ran into the column "Illusions of Safety" by Pete Becker. Pete writes about enhancements to the C language, and I usually really enjoy his columns.
This month was an exception. Pete basically spent the entire column explaining why you don't have to worry about the "unsafe" C runtime library functions like gets() and strcpy() as long as you design your application correctly.
I'm just fine until he gets to the 2nd page of the article, "Prevention by Brute Force". First off, he presents a "safe" version of strcpy called safe_strcpy which isn't actually a safe version of strcpy. It's actually a replacement for strncpy, and preserves one of the critical bugs in strncpy - strncpy isn't guaranteed to null terminate the output string, and neither is his "safe" replacement. He also describes testing for success on the strcpy as "tedious". Yeah, I guess that ensuring that you handle the failure of APIs is "tedious". I'd also describe it as "correct".
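To make the strncpy complaint concrete, here's a minimal sketch of my own (it is not Pete's safe_strcpy, and the function names are made up for illustration). It shows strncpy filling a buffer without leaving a terminator, and the usual manual workaround:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..size) contains a '\0', 0 otherwise. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* When the source is at least as long as the count, strncpy copies
   exactly that many bytes and never writes a terminator. */
int strncpy_terminates_long_input(void)
{
    char buf[4];
    strncpy(buf, "abcdef", sizeof buf);     /* buf holds 'a','b','c','d' */
    return is_terminated(buf, sizeof buf);
}

/* The usual workaround: copy one byte less and terminate by hand. */
int patched_copy_terminates_long_input(void)
{
    char buf[4];
    strncpy(buf, "abcdef", sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return is_terminated(buf, sizeof buf) && strcmp(buf, "abc") == 0;
}
```

The first function reports that the buffer is NOT terminated, and the second reports that it is. Note that the workaround silently truncates, which is its own kind of problem.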
But it's when we get to the 3rd page "Prevention by Design" that Pete totally loses me.
You see, on Page 3, Pete decides that it's OK to use strcpy. You just need to make sure that you put an assert() to make sure it doesn't overflow. And you need to make sure that the inputs to your functions are checked upon entry to the system.
But that's just plain wrong. First off, the assert() won't be there in non debug code - assert()'s disappear in production code. So your production code won't have the protection of the assert. And it completely ignores the REAL cause of the problem - what happens when the vulnerable function is called from an unchecked code path. If (when) that happens, you've got a security hole. And the bad guys ARE going to find it. Michael Howard gives LOTS of examples where developers added checks to a vulnerable code path at the entry point of a function without realizing that there was another code path to the vulnerable function. You also don't know what will happen four or five years in the future - it may be that a future maintainer of your code won't realize that your low level routine has such a constraint and calls it improperly.
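Here's a small sketch of what "the assert won't be there" means in practice. I'm assuming a release build that defines NDEBUG (most do), and the helper names are hypothetical. The check is instrumented with a counter so you can see that it never runs:

```c
#include <stddef.h>
#include <string.h>

#define NDEBUG              /* what a typical release build defines */
#include <assert.h>         /* with NDEBUG set, assert(x) expands to nothing */

int checks_evaluated = 0;

/* Hypothetical length check that also counts how often it runs. */
int fits(const char *src, size_t dst_size)
{
    checks_evaluated++;
    return strlen(src) < dst_size;
}

/* Returns how many times the check ran while "guarding" an oversized input. */
int count_checks_in_release_build(void)
{
    checks_evaluated = 0;
    assert(fits("far too long for the buffer", 4));   /* compiled out */
    /* A strcpy into a 4 byte buffer at this point would overflow unchecked. */
    return checks_evaluated;
}

#undef NDEBUG
#include <assert.h>         /* restore the normal assert for any code that follows */
```

The counter stays at zero: the expression inside the assert is never even evaluated, so the "protection" exists only in debug builds.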
It's far better to replace the strcpy() call with a strcpy_s() call and make the caller pass in the size of the target buffer. That way you don't rely on others to protect your vulnerability. There is at least one known vulnerability that was caused by someone taking Pete's advice. A development organization had a vulnerability reported to them, and they fixed it by adding a check up-front to cut off the vulnerability. Since they'd removed the vulnerability, they released a patch containing the fix. Unfortunately, they didn't realize that there was another path to the vulnerable function in their code, and they had to re-release the patch. This time they did what they should have done in the first place and not only did they add the up-front test, they also removed the vulnerable call to strcpy - that way they'll never be caught by that vulnerable call again.
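strcpy_s isn't available in every C runtime, so here's a rough portable stand-in to show the shape of the idea. The name bounded_strcpy is mine, and this is a sketch of the approach rather than the real strcpy_s implementation: the caller has to pass the size of the target buffer, and the copy fails loudly instead of overflowing.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in in the spirit of strcpy_s.
   Returns 0 on success. On failure it returns a nonzero error code and,
   when possible, leaves dst as an empty string. */
int bounded_strcpy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0) {
        return EINVAL;
    }
    if (src == NULL) {
        dst[0] = '\0';
        return EINVAL;
    }
    for (size_t i = 0; i < dst_size; i++) {
        dst[i] = src[i];
        if (src[i] == '\0') {
            return 0;               /* copied everything, terminator included */
        }
    }
    dst[0] = '\0';                  /* source did not fit: fail closed */
    return ERANGE;
}
```

The point is that the check lives right next to the copy, so it doesn't matter which code path reaches the function. And yes, the caller has to test the return value. Tedious, but correct.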
On page 4, he claims that it IS possible to use gets() correctly. It's ok to have programs that read from the console as long as those programs are only called from other, known programs. But of course, this totally ignores the reality of how the bad guys work. Security vulnerabilities are rarely exposed by someone using an API or protocol in the way it was intended. They almost always occur because someone DIDN'T use a program in the way it was intended. For instance, I know there was at least one *nix vulnerability that was explicitly caused by a setuid application calling gets. Someone fed a bogus command line string to that application and presto! they had root access to the machine.
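For comparison, the bounded alternative to gets() is fgets(), which takes the buffer size. A minimal sketch (read_line is a hypothetical helper of mine, not something from Pete's column) that treats an overlong line as an error instead of writing past the buffer:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf and strips the newline.
   Returns 0 on success, -1 on EOF or error, and -1 if the line did not
   fit (the rest of that line is discarded). Assumes size fits in an int. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL) {
        return -1;
    }
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }
    /* No newline in the buffer: either the line ended exactly here,
       or there is more of it than we have room for. */
    int c = getc(in);
    if (c == EOF || c == '\n') {
        return 0;
    }
    while ((c = getc(in)) != EOF && c != '\n') {
        /* discard the remainder of the overlong line */
    }
    return -1;
}
```

Whatever the caller does with the failure, the input can no longer choose how much memory gets written.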
Here's another way of thinking about it (oh no, not another stupid automotive analogy). If you drive safely, you don't need seat belts (or airbags). But I wouldn't even DREAM of riding in a car without at least seat belts. Why? Seatbelts and airbags are a form of defense in depth. Your primary defense is safe driving. But just in case, your seat belts (or airbags or both) may save your life.
Bottom line: A good architecture is no substitute for Defense in Depth. Sure, apply the architectural changes that Pete described. They're great. But don't believe that they are adequate to ensure safety.
I got an email from someone using the contacts form asking:
There is an article on MSDN about using VirtualAlloc to reserve then commit memory pages. Here is the link: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/reserving_and_committing_memory.asp
The article demonstrates a use of SEH to handle a page fault and then commit the appropriate page at runtime. My question: is SEH being used for performance reasons? I mean alternatively we could write a special allocator function that checks the allocation range and commits new pages when necessary without triggering a page fault. Of course such code would be run for every allocation, whether or not it actually required a new page to be committed.
Can you elaborate?
It's actually a good question. It turns out that even though I've come down hard on the use of SEH as a mechanism to ensure reliability, there ARE a couple of places where SEH is not only a good idea, but it's required.
IMHO, the techniques shown are valid, but they're less likely to be used than the article suggests. However, if you were going to implement a sparse memory manager where you'd like to reserve a huge chunk of memory and commit the pages as needed, it might make sense.
And the function DOES show one of the three places where SEH is reasonable. They are: accessing memory mapped files, making RPC calls, and accessing data passed across a security boundary.
In all of these cases, it's required that you use SEH. The first two because SEH is used to propagate out-of-band error information, the third because you cannot trust the contents of memory handed to you by someone on the other side of a security boundary.
For Memory Mapped files (and RPC), the system has to have a way of communicating error status to the caller. If you attempt to read from a memory mapped file and an error occurs when reading the file, there's no way of "failing" a read - it's just a MOV CPU instruction, and it has no failure semantics. As a result, the only way that the system can "fail" the operation is to abort the instruction with some form of access violation.
The only way to catch such an access violation is to use SEH to wrap the access to the memory. The "Reserving and Committing Memory" article shows how to do that, and shows some techniques to inspect the actual cause of the failure.
For RPC, they have a similar problem. RPC allows an application to define the full semantics of the function being remoted - there's no way of communicating transmission failures to the application, so once again, the system needs to have a way of propagating out-of-band error information. That's why RPC calls should always be wrapped with RpcTryExcept/RpcExcept/RpcEndExcept sequences.
The third case in which SEH is reasonable is when dealing with accessing data that is passed across security boundaries. When data is passed across a security boundary, you cannot EVER trust the caller, because that leads to security holes. There have been a number of security bugs in both Windows AND *nix caused by this problem. To resolve this, you need to copy all the data from the user into a kernel data structure. If you don't, you'll bluescreen the system (on Windows). The same thing holds true for high privileged services (and other security boundaries). The advantage that services have is that they live in another address space, so it's less likely that their caller has direct access to their address space (it can happen if your service communicates using named shared memory though).
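The "copy it first" rule can be sketched in plain C. This is a user mode simulation with made-up names (struct request, handle_request), not kernel code; in a real Windows driver the copy from the caller's memory is the part that gets probed and wrapped in SEH. The idea is to take one private snapshot and then validate and use only the snapshot, so the caller can't change the data after you've checked it:

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAYLOAD 64

/* Hypothetical message layout shared with an untrusted caller. */
struct request {
    size_t length;
    char   payload[MAX_PAYLOAD];
};

/* Hypothetical handler on the trusted side of the boundary.
   Returns 0 on success, -1 if the request is rejected. */
int handle_request(const struct request *untrusted, char *out, size_t out_size)
{
    struct request captured;

    /* Capture once. After this line the caller's memory is never read
       again, so later changes on the other side cannot affect us. */
    memcpy(&captured, untrusted, sizeof captured);

    /* Validate the private copy, not the caller's buffer. */
    if (captured.length > MAX_PAYLOAD || captured.length >= out_size) {
        return -1;
    }

    memcpy(out, captured.payload, captured.length);
    out[captured.length] = '\0';
    return 0;
}
```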
Bottom line: Don't use SEH unless you're in one of the three scenarios above. And even then, think long and hard about it.
Recently someone posted the attached screen shot on the internal self hosting alias.
What's wrong with this English?
It's the use of "Can" instead of "May". This happens to be one of my minor pet peeves with common English usage. The difference between "Can" and "May" can be quite subtle and most people don't catch it. "Can" reflects the ability to do something, "May" requests permission to do something.
To use my kids as an example, a dialog with them might go something like:
"Dad, can I go to the store?" "Absolutely you can - it's just down the street, so it's not a big deal."
"Dad, may I go to the store?" "No you may not, without a parent accompanying you."
The first question asks if the kid asking has the ability to go to the store - of course they do, it's nearby. The second question asks permission to go to the store.
In this dialog's case, the prompt is asking if Windows has the ability to collect more information. Of course Windows has the ability to collect more information. It's asking permission to collect more information, so the correct prompt should be "May Windows collect more information about this problem?".
Unfortunately I didn't notice this until yesterday, so it's too late to get this fixed for Vista, but it'll be fixed in a subsequent release. And it's going to annoy the heck out of me every time I see it (which shouldn't be that often :->).
Edit: Fixed typo pointed out by Peter Ritchie (I love the power of the edit button to make me look less stupid) :)
I don't usually do "Me Too", but I just had to share this. James Senior has released a short podcast with the new Vista sound scheme!
It's super awesome :)
Check it out here!
I was just chatting with one of the other developers in my group (Mitch), and he mentioned a bug that just popped up on our radar. In our bug tracking system, there's a field called "NumInstances" which indicates the number of people who have encountered the bug. The OCA system is hooked up to this field, so each time a distinct customer reports a particular problem, the OCA system automatically updates NumInstances (at least that's the way it looks like it works :))
On this particular bug, the NumInstances was high - more than triple digits, all of which came from the RC1 build (which has been out for about a month now).
IMHO, this illustrates the huge difference between bugs in the OS and bugs elsewhere. Bugs in the OS show up and immediately affect millions of customers. A long time ago, I wrote the parable of "One in a million is next Tuesday". That also applies to bugs in Windows - if a bug in Windows affects one in a million users, that means that there are hundreds and hundreds of people who are affected.
Mitch was talking to his brother Mike (who used to work with me on Exchange), and he mentioned this bug. Mitch commented on how many people had seen this problem, and Mike commented that it was sort of like Exchange bugs.
When Mitch relayed their conversation to me, my only comment was "Yeah, the Windows bug is much worse, but in a better way".
You see, the problem has to do with the direction of impact. Both Windows and Exchange bugs are horrible, and cause distress. But Windows bugs tend to affect individuals, while Exchange bugs tend to affect enterprises. A Windows bug typically affects one user at a time, but Exchange bugs affect THOUSANDS of users at a time. Now of course, it depends on what part of Windows and what part of Exchange you're dealing with. A bug in a disk driver for Windows can affect thousands of users, and a bug in the anti-spam filtering system might only affect one mailbox. Similarly, a problem in parts of Windows (like the Active Directory) can take out an Exchange server.
There's a corollary to this "direction of impact" issue. Because a Windows bug affects individuals and there are many hundreds of millions of users running Windows, each bug has the potential to affect millions of users (and thus have an insanely high NumInstances). But an Exchange bug might only affect one or two enterprises (and thus have a NumInstances of one or two). Each of them has the same aggregate impact, but they express themselves in radically different ways.
So I decided to take the advice people mentioned in the previous post, and my laptop is now happily running Vista. I get a WEX score of 3.0 based on the video perf (expected); other than the video, the machine gets a 4.4, and I'm a very happy camper.
Thanks to everyone who suggested solutions.
I wanted to shout out to two bloggers I read religiously but haven't commented on.
The first is Ryan Bemrose, who writes as the Audio Fool. Ryan's been writing a bunch about signal quality and DSP in audio (which makes sense since he's one of the testers working on the audio quality tests in Vista).
The other one is Jeff Jones' Security Blog. Jeff's been having a huge amount of fun(?) slicing and dicing security vulnerability counts across a bunch of different products. He also writes about other random security issues. Jeff's written a number of "wow, I can't believe he wrote that" posts, but he's got the numbers to back them up. He very rarely takes an "our platform is better" position (because frankly, it's pointless), he just puts numbers out there and says "make up your own mind". His most recent post (analyzing vulnerability trends across the past 5 years) was fascinating: it's clear from the numbers that the bad guys have shifted their focus from vulnerabilities in the OS to vulnerabilities in the application space (across all operating systems). Anyone who has been looking at the various vulnerability reporting lists over the past year has clearly seen this, but it's interesting seeing the data laid out clearly. This is also not very surprising - in general operating systems have gotten harder and harder to attack, so the bad guys have shifted their focus from the operating system to the application space.
Anyway, enjoy :)