One question that keeps on coming up when you’re writing a server in NT is: “Why can’t I access remote resources from my server when impersonating my client?” It shows up on our internal aliases about once a month in one form or another.
This situation is also known as “delegation.”
There are actually two different answers to the question.
The first answer applies if you’re not running with Active Directory; the other applies if you’re using AD.
If you’re not using Active Directory, then your clients are typically authenticated via the NTLM or DIGEST authentication mechanism. Both of these mechanisms work in similar fashions: when the user logs into the client machine, the machine logging the user on computes a one-way hash of the user’s credentials.
When the client connects to the server, the server computes a “challenge”, which is tailored to the client, and sends the challenge to the client. The client then encrypts the challenge using the one-way hash as a key, and sends the encrypted challenge to the server. The server, in turn, hands the challenge and the encrypted result to the domain controller to validate that the challenge and the response could only have been generated by the user (if the user’s a local user, it uses the local security authority (LSA)). Note that the server doesn’t ever see the user’s password OR the hash of the user’s password. The server just knows the challenge and the response to the challenge, neither of which contains the user’s password.
When the domain controller responds to the server, it includes enough information to allow the server’s machine to create the token for the user. So far so good.
Now let’s consider what happens when the server tries to access a remote resource, say a file on another machine. The server impersonates the user, and attempts to open the file. The remote machine sees the new connection, computes the challenge, and sends it to the server, which is playing the client’s role on this connection. Now the server needs to encrypt the challenge with the hash of the user’s password. But the server doesn’t KNOW the hash of the user’s password. It was never told that information. The domain controller does, and so does the client, but the server doesn’t. All the server could know is the challenge and the encrypted challenge from its own exchange with the client. And thus, since the server has no way of responding to the new challenge, the delegation fails.
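To make the failure concrete, here is a minimal sketch of the two hops. It is a toy model, not the real NTLM protocol: SHA-256 stands in for the credential hash, HMAC stands in for "encrypting the challenge with the hash", and every name in it (`DomainController`, `Server`, `respond`, the user "alice") is made up for illustration.

```python
import hashlib
import hmac
import os


def one_way_hash(password: str) -> bytes:
    # Stand-in for the hash computed at logon (NOT the real NTLM hash).
    return hashlib.sha256(password.encode("utf-8")).digest()


def respond(challenge: bytes, credential_hash: bytes) -> bytes:
    # Stand-in for "encrypt the challenge using the one-way hash as a key".
    return hmac.new(credential_hash, challenge, hashlib.sha256).digest()


class DomainController:
    """Knows every user's hash, so it can check a challenge/response pair."""

    def __init__(self):
        self._hashes = {}

    def add_user(self, user: str, password: str) -> None:
        self._hashes[user] = one_way_hash(password)

    def validate(self, user: str, challenge: bytes, response: bytes) -> bool:
        expected = respond(challenge, self._hashes[user])
        return hmac.compare_digest(expected, response)


class Server:
    """Only ever sees challenges and responses, never the hash itself."""

    def __init__(self, dc: DomainController):
        self.dc = dc
        self.seen = {}  # user -> (challenge, response) from the last logon

    def new_challenge(self) -> bytes:
        return os.urandom(16)

    def authenticate(self, user: str, challenge: bytes, response: bytes) -> bool:
        self.seen[user] = (challenge, response)
        return self.dc.validate(user, challenge, response)


dc = DomainController()
dc.add_user("alice", "correct horse")
front_end = Server(dc)
file_server = Server(dc)

# Hop 1: the real client holds the hash, so it can answer the challenge.
client_hash = one_way_hash("correct horse")
c1 = front_end.new_challenge()
hop1_ok = front_end.authenticate("alice", c1, respond(c1, client_hash))

# Hop 2: the front-end server only has (c1, response1). The file server
# issues a fresh challenge, and the old response does not answer it.
c2 = file_server.new_challenge()
_, old_response = front_end.seen["alice"]
hop2_ok = file_server.authenticate("alice", c2, old_response)

print("client -> server:", hop1_ok)
print("server -> remote machine:", hop2_ok)
```

The first hop succeeds and the second fails, for exactly the reason described above: the only things the front-end server holds are a challenge and its response, and neither helps with a different challenge.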
Now for the other case: if the client authenticated with the server using the Kerberos authentication package (which requires Active Directory), then the picture’s a bit brighter. It turns out that one of the design criteria of Kerberos was supporting delegation. In Kerberos authentication, the domain controller gives the server a “ticket” that identifies the client. The cool thing about the ticket is that it can be handed to a remote machine, which can then validate the ticket against the domain controller. So if you’re using Active Directory, then it’s POSSIBLE for the client to be authenticated with the remote machine.
But, as always, there’s a catch. NT wants to limit the number of services that can do delegation, since it’s conceivable that such a service could be used to mount an attack (since a ticket can be saved and re-used for several hours after it’s been granted, a service that can do delegation can act as if it was the client user for several hours beyond when the user actually connected). So you need to mark the service account for your service as trusted for delegation (or the machine account if the service’s running as LOCALSYSTEM, LOCALSERVICE, NETWORKSERVICE, or if you’re using a named pipe). This KB article shows how to do this in Win2K; the same techniques work for Win2K3.
Btw, see my post on Security Terms for clarification on any vocabulary that’s confusing :)
Today’s post is a bit boring.
I’ve got a bunch of security-related articles that I’d like to write up, but I’ve realized that doing this requires that I define some terms up front to give a common framework for the articles. I’ll be using this post as a reference for these posts in the future.
So please bear with me while I define some commonly used terms that show up a lot in discussions about security.
First off, there’s “authentication”. This is the process of knowing that the guy at the other end of the pipe (or keyboard) is really who she says she is. Authentication is hard. REALLY hard. So hard that any company that thinks it can design its own authentication scheme might as well be putting out a sign saying “Hack Me!” There are lots of examples of this; Google for NTLMv1 vulnerability sometime for some horror stories about when Microsoft tried it. Or, my personal favorite: Dark Age of Camelot, an MMORPG put out by Mythic. As this post to Bugtraq points out, they made some slight errors in their authentication protocol (like sending the encryption key from the server to the client, then transmitting all the customer’s billing information to the client, including credit card number and expiration date) :)
When you authenticate a user, you establish the user’s “security principal”. A “security principal” isn’t the head of the security school; instead a security principal uses the 2nd definition of principal: “A main participant in a situation”. In NT, a principal is identified by their SID, or Security IDentifier. A security principal doesn’t have to be a user; it can also be a group (a collection of users). Again, because groups are principals, they have SIDs.
When you authenticate a user, one of the byproducts of that authentication (in NT) is the “token” of the user. The token of the user contains the user’s SID, and the SID for all the security groups of which the user is a member (there are other types of groups that don’t participate in security like Distribution Groups).
The next thing to consider is “authorization”. This is the process of knowing if the principal you just authenticated has the rights to access a particular piece of data. People often confuse authentication and authorization, but it’s important to keep them straight. In NT, the principal mechanism used for authorization is the AccessCheck API.
When you’re doing an access check, all the SIDs in the user’s token are considered during the access check. If a principal in the token (a SID in the token) is present, then the principal is considered to be “active” in the user’s token. It is also possible to create a “restricted” token from an existing token. When you create the restricted token, you can specify principals in the token to be disabled. When principals are disabled in a token, they are only considered for deny access in an access check.
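Here’s a rough sketch of how disabled principals behave in an access check. It is a deliberately simplified model and not the real AccessCheck algorithm (there are no access masks, and the ACL is just an ordered list of allow/deny entries); the `Token`, `Ace` and `access_check` names and the SID strings are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ace:
    kind: str  # "allow" or "deny"
    sid: str


@dataclass
class Token:
    sids: set
    deny_only: set = field(default_factory=set)

    def restrict(self, sids_to_disable):
        """Return a new token with the given SIDs marked deny-only."""
        disabled = set(sids_to_disable) & self.sids
        return Token(sids=set(self.sids), deny_only=self.deny_only | disabled)


def access_check(token: Token, acl: list) -> bool:
    # Walk the ACL in order; the first matching entry decides the outcome.
    for ace in acl:
        if ace.sid not in token.sids:
            continue
        if ace.kind == "deny":
            # Deny entries match enabled AND disabled (deny-only) SIDs.
            return False
        if ace.kind == "allow" and ace.sid not in token.deny_only:
            # Allow entries only match SIDs that are still enabled.
            return True
    return False  # nothing granted access


# Hypothetical SIDs, written as labels rather than real S-1-5-... values.
full = Token({"SID:larry", "SID:Administrators"})
restricted = full.restrict({"SID:Administrators"})

admins_only = [Ace("allow", "SID:Administrators")]
no_admins = [Ace("deny", "SID:Administrators"), Ace("allow", "SID:larry")]

print(access_check(full, admins_only))        # the group SID grants access
print(access_check(restricted, admins_only))  # disabled SID can no longer grant
print(access_check(restricted, no_admins))    # but it can still deny
```

The point of the last line is that disabling a group in a restricted token never lets you dodge a deny entry written against that group.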
In order to perform authorization, it is often useful to “impersonate” the client. To impersonate a client, the server’s process takes on the identity of the client. This is done via the ImpersonateLoggedOnUser (or the RpcImpersonateClient or the CoImpersonateClient) API. More on impersonation later in this post.
Ok, those are the biggies, now for some more terms.
First off, there’s “Quality of Service”, or QoS. When you’re dealing with networking, QoS typically means a guaranteed throughput, which is useful for multimedia. When you’re talking about security, QoS means the quality of the security information sent from the client to the server. RPC exposes this as the RPC_SECURITY_QOS structure. For COM, the CoInitializeSecurity API can be used to set these fields.
In the RPC structure, there are basically 3 fields of interest. The first sets up the capabilities of the connection. This determines which authentication package should be used, and whether or not the connection should be encrypted.
The second defines the impersonation level – this controls the level of the authentication.
The third indicates whether the security is dynamic or static.
Let me take all these in turn.
First the capabilities of the connection. This determines if the connection will be authenticated with a security package that supports mutual authentication (so the client can trust the server or not), etc.
Next, the impersonation level – the impersonation level determines what can be done once the server’s authenticated the user.
There are four levels of impersonation:
1) Anonymous – there’s no authentication of the user at all. If you use Anonymous QoS to identify the user, the server authenticates the client, but the RPC server can’t use that information. This is usually counter-intuitive, so it’s not recommended. If you want to see what the Anonymous user can access, then you can use ImpersonateAnonymousToken to impersonate the anonymous user, and then do your access checks.
2) Identify – The server validates the user. The server can call AccessCheck to verify the user’s access to a resource, but it cannot access resources outside the process space (like files, registry keys, shared memory regions, etc).
3) Impersonate – The server has validated the user. In addition, the server can act on the user’s behalf to access local resources. For many security providers, there is no difference between Identify and Impersonate.
4) Delegate – The server has validated the user. In addition, the server can act on the user’s behalf to access both local AND remote resources.
And finally, there’s static or dynamic QoS. Static QoS indicates that the security token of the user is fixed at the time of connection. Dynamic QoS indicates that changes to the token on the client will be reflected in the client’s token on the server. Similarly, if the client application is impersonating a user on the client machine, then that impersonation will be reflected on the server. If the client then reverts to his own identity, the token that the server is using will also revert to the client’s original identity.
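As a quick way to keep the four impersonation levels straight, they can be written down as an ordered enumeration where each level allows everything the previous one did. This is only a sketch of the descriptions above: the enum values simply follow the numbering of the list and aren’t claimed to match any Windows constant, and `allowed_operations` is a made-up helper.

```python
from enum import IntEnum


class ImpersonationLevel(IntEnum):
    # Values mirror the numbering of the list in this post, nothing more.
    ANONYMOUS = 1
    IDENTIFY = 2
    IMPERSONATE = 3
    DELEGATE = 4


def allowed_operations(level: ImpersonationLevel) -> dict:
    """What the server may do with the client's identity at a given level."""
    return {
        # Identify and above: the server knows who the client is.
        "identify_client": level >= ImpersonationLevel.IDENTIFY,
        # Impersonate and above: act as the client on the local machine.
        "access_local_resources": level >= ImpersonationLevel.IMPERSONATE,
        # Delegate only: act as the client on other machines too.
        "access_remote_resources": level >= ImpersonationLevel.DELEGATE,
    }


for level in ImpersonationLevel:
    print(level.name, allowed_operations(level))
```

Keep in mind the caveat from the list: for many security providers there is no practical difference between Identify and Impersonate, so treat the table as the intent of each level rather than a guarantee.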
Thanks go to Richard Ward for his review.
Never let those rewinding fees from your local video store stop you again, someone’s now selling a DVD rewinder!
It’s not by the Buy Dehydrated Water folks, but…
Well. I honestly didn’t expect to get traction on my post about getting infected with a virus, but I guess sending a politely worded complaint to the CEO and Chief Software Architect of Microsoft gets things done :)
On Monday, I received a fascinating response from the CIO which was truly enlightening.
The bottom line is that I was infected because it’s not possible to keep a network as big as Microsoft’s 100% virus free.
There are 285,000 Windows machines running on the Microsoft network at any time. Microsoft runs constant scans and forces patches on all unpatched machines within a set timeline (24 hours for emergencies, 14 days for critical updates). Microsoft’s IT department gets patches to more than 99% of the machines on the corporate network within 8 days of the announcement of a critical vulnerability via either voluntary patches, SMS deployed patches, logon script patching or port shutdowns (if none of the previous things work).
The problem is that it’s impossible to get to 100% compliance. Machines get constantly added to the network from mobile employees (sales force, etc), contingent staff (temporary workers), new hires, etc. There’s also the problem of the machines in the conference rooms that run the projectors – they’re often turned off, which means that they don’t get updated. And when they’re brought online, they’re vulnerable until the vulnerability scanners pick them up.
Last week (when I was infected), Microsoft’s IT department detected 330 machines on the corporate network that were vulnerable to Sasser. 249 of them had their network ports shut off; the other machines were force-patched by one of the techniques above.
My problem was that Sasser propagates VERY quickly. Which means that during the time my machine was vulnerable, one (or more) of the 330 vulnerable machines was also infected. So even though roughly 99.9% of the machines on Microsoft’s network weren’t vulnerable to Sasser, enough were to cause me grief.
One of the key takeaways from Ron’s email was that the IT department strongly suggests that people use Remote Installation Service to upgrade their machines instead of using my technique of rebooting from the unpatched XP CD that they’ve been carrying around for years. The RIS images that OTG deploys have most of the patches deployed on them already, so if you reinstall via RIS, your machine won’t be vulnerable.
The truly frustrating thing for me is that I wish I had known that our RIS technology supported a non-destructive reinstall option (as Saurabh Jain pointed out). If I had known about that, I would have tried the RIS option to reinstall XP SP1 without reformatting when I decided to back out the interim XP SP2 build.
One other fascinating tidbit in the email is that apparently there are some releases in the future that will render the network even more secure. Unfortunately I can’t talk about them :( because I don’t believe they’ve been announced yet, but in the future vulnerable machines won’t even be allowed to get on our network.
The Exchange team just sent me email that they’ve posted the 3rd of my “Exchange 2000 ACL” ‘blog entries, so here’s a link to it.
This one’s my favorite, because it finally shows the cool things you can do with NT ACLs.
Following on the heels of Eric Lippert’s posts on Hungarian and of course Rory Blyth’s classic “Die, Hungarian notation… Just *die*”, I figured I’d toss my hat into the fray (what the heck, I haven’t had a good controversial post in a while).
One thing to keep in mind about Hungarian is that there are two totally different Hungarian implementations out there.
The first one, which is the one that most people know about, is “Systems Hungarian”. Systems Hungarian is also “Hungarian-as-interpreted-by-Scott-Ludwig” (Edit: For Scott's side of this comment, see here - the truth is better than my original post). In many ways, it’s a bastardization of “real” (or Apps) Hungarian as proposed by Charles Simonyi.
Both variants of Hungarian have two things in common. The first is the concept of a type-related prefix, and the second is a suffix (although the Systems Hungarian doesn’t use the suffix much (if at all)). But that’s where the big difference lies.
In Systems Hungarian, the prefix for a type is almost always related to the underlying data type. So a parameter to a Systems Hungarian function might be “dwNumberOfBytes” – the “dw” prefix indicates that the type of the parameter is a DWORD, and the “name” of the parameter is “NumberOfBytes”. In Apps Hungarian, the prefix is related to the USE of the data. The same parameter in Apps Hungarian is “cb” – the “c” prefix indicates that the parameter is a count, the “b” type indicates that it’s a count of bytes.
Now consider what happens if the parameter is the number of characters in a string. In Systems Hungarian, the parameter might be “iMaxLength”. It might be “cchWideChar”. There’s no consistency between different APIs that use Systems Hungarian. But in Apps Hungarian, there is only one way of representing the parameter; the parameter would be “cch” – the “c” prefix again indicates a count, the “ch” type indicates that it’s a character.
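If you’re wondering why it’s worth distinguishing a count of characters from a count of bytes at all, here’s a tiny sketch. It assumes a wide-character (UTF-16) string, which is my assumption for the example, and the helper names `cch_of` and `cb_of` are made up; the variable names just borrow the Apps Hungarian prefixes.

```python
def cch_of(sz: str) -> int:
    """Count of characters (UTF-16 code units), not including a terminator."""
    return len(sz.encode("utf-16-le")) // 2


def cb_of(sz: str) -> int:
    """Count of bytes needed to hold the same characters as UTF-16."""
    return len(sz.encode("utf-16-le"))


sz_src = "Hungarian"
cch_src = cch_of(sz_src)  # how many characters to copy
cb_src = cb_of(sz_src)    # how many bytes those characters occupy

print("cch:", cch_src)
print("cb: ", cb_src)
```

For a wide-character string the two numbers differ by a factor of two, so handing a “cb” to something that expects a “cch” (or the reverse) is a real bug, and one that the name makes visible at the call site.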
Now please note that most developers won’t use “cch” or “cb” as parameters to their routines in Apps Hungarian. Let’s consider the Win32 lstrcpyn function:
LPTSTR lstrcpyn( LPTSTR lpString1, LPCTSTR lpString2, int iMaxLength);
This is the version in Systems Hungarian. Now, the same function in Apps Hungarian:
LPTSTR Szstrcpyn( LPTSTR szDest, LPCTSTR szSrc, int cbLen);
Let’s consider the differences. First off, the name of the function changed to reflect the type returned by the function – since it returns an LPTSTR, which is a variant of a string, the function name changed to “SzXxx”. Second, the names of the first two parameters changed. Instead of “lpString1” and “lpString2”, they changed to the more descriptive “szDest” and “szSrc”. The “sz” prefix indicates that the variable is a null terminated string. The “Src” and “Dest” are standard suffixes, which indicate the “source” and “destination” of the operation. The iMaxLength parameter, which indicates the number of bytes to copy, is changed to cbLen – the “cb” prefix indicates that it’s a count of bytes, the standard “Len” suffix indicates that it’s a length to be copied.
The interesting thing that happens when you convert from Systems Hungarian to Apps Hungarian is that now the usage of all the parameters of the function becomes immediately clear to the user. Instead of the parameter name indicating the type (which is almost always uninteresting), the parameter name now contains indications of the usage of the parameter.
The bottom line is that when you’re criticizing Hungarian, you need to understand which Hungarian you’re really complaining about. Hungarian as defined by Simonyi isn’t nearly as bad as some have made it out to be.
This is not to say that Apps Hungarian was without issue. The original Hungarian specification was written by Doug Klunder in 1988. One of the things that was missing from that document was a discussion about the difference between “type” and “intent” when defining prefixes. This can be a source of great confusion when defining parameters in Hungarian. For example, if you have a routine that takes a pointer to a “foo” as a parameter, and internally the routine treats the parameter as a single pointer to a foo, it’s clear that the parameter name should be “pfoo”. However, if the routine treats the parameter as an array of foo’s, the original document was not clear about what should happen – should the parameter be “pfoo” or “rgfoo”? Which wins, intent or type? To me, there’s no argument, it should be intent, but there have been some heated debates about this over the years. The current Apps Hungarian document is quite clear about this: intent wins.
One other issue with the original document was that it predated C++. So concepts like classes weren’t really covered and everyone had to come up with their own standard. At this point those issues have been resolved. Classes don’t have a “C” prefix, since a class is really just a type. Members have “m_” prefixes before their actual name. There are a bunch of other standard conventions but they’re relatively unimportant.
I used Hungarian exclusively when I was in the Exchange team; my boss was rather a Hungarian zealot and he insisted that we code in strict Apps Hungarian. Originally I chafed at it, having always assumed that Hungarian was stupid, but after using it for a couple of months, I started to see how it worked. It certainly made more sense than the Hungarian I saw in the Systems division. I even got to the point where I could understand what an irgch was without even flinching.
Now, having said all that, I don’t use Hungarian these days. I’m back in the systems division, and I’m using a home-brewed coding convention that’s based on the CLR standards, with some modifications I came up with myself (local variables are camel cased, parameters are Pascal cased (to allow easy differentiation between parameters and local variables), class members start with _ as a prefix, globals are g_Xxx). So far, it’s working for me.
I’ve drunk the kool-aid from both sides of the Hungarian debate though, and I’m perfectly happy working in either camp.
I’m not an evangelist like Scoble, but I do love toys. And you get to work on a lot of really cool toys in the multimedia group :)
A couple of months ago I was redirected from working on Longhorn to work on a project code named Fjord. Fjord’s public name is “Windows Media Connect”, and it’s really pretty awesome.
Essentially, WMC is a service that runs on your Windows XP SP2 machine that turns your PC into an audio/video jukebox. It allows you to export the multimedia content on your machine to any device that supports the UPnP MediaRenderer protocol. There are a bunch of these devices available; Sony sells one called the RoomLink, and similarly Omnifi sells one called a DMS1. The Sony supports audio, video and pictures; the Omnifi is audio-only.
I’ve got to say that in all seriousness these things will totally change the way that people experience multimedia on their PC. I currently have 3 machines in my office (the Omnifi, the RoomLink and a 3rd that’s not yet been announced). These things rock. You can play your MP3 files, your WMA files, your AVI files (I’m not sure which codecs are supported) on them. You get the convenience of having your media on your PC but you get to play it on your home stereo. Some of the devices are wired, some are wireless, so you can hook these things up to your in-house wireless network without too much difficulty (you do have an in-house wireless network, don’t you? :) )
Even in my office, I LOVE the fact that I can now use my office stereo to listen to my ripped CDs. So much that I’ve been aggressively ripping them instead of just keeping them in their jewel cases (yes, I’m a dinosaur – up until about 6 months ago, I didn’t have ANY ripped music, all my office CDs were still in their jewel cases).
I’ll have to be honest and say that the UI on the devices can be a bit “esoteric”, but I’m confident that this will change as more devices come on line and competition heats up. Some of them are REALLY quite nice (especially the one that’s not yet been announced).
I hate hydraulic elevators.
Really I do. My building has 4 of them, and they work intermittently at best. My personal favorite is #4, which I call the moaner.
You get in it, punch the floor and it starts groaning at you. All the way up it groans and moans.
I never know if I’m going to get stuck in it or not.
I hate hydraulic elevators :)
Anyway, more technical stuff soon, I just wanted to get this off my chest :)
Well, it finally happened. For the first time in my 20 year history at Microsoft, I had to reformat a computer because it got hit by a virus.
I’m not sure how the virus got inside the firewall, my guess is someone brought it inside on a laptop or something, but it happened.
You see, I was running an interim build of XP SP2, and wanted to update to the RC build. So I uninstalled the interim build (we only support upgrading from public releases).
And my machine puked. This happens; there was probably a bug in the interim build’s uninstaller, no big deal, it’s not like I’ve not done this dozens of times before.
So I figured I’d reinstall XP and re-install the patches. Again, nothing new here. I’ve done this dozens of times; it’s part of the cost of running interim builds.
But this time, things went horribly wrong. Seconds after I installed the RTM bits, I got the dreaded “Access violation in LSASS.EXE at 0x00000023” that indicates I was infected with Sasser.
I tried about 6 different ways of removing this from my machine – reinstalling again, reinstalling clean, reinstalling into another partition. Nothing worked, and I was left with wiping the machine.
Now I’m reinstalling Windows again, after the reformat. I guess I know what I’m going to be doing for the rest of the day :(
The reality is that once I got infected I had no choice but to reformat my machine, I was just holding off on the inevitable. Why would I have to reformat the machine? Well, because there’s no way of knowing what the payload of the infection is. It could have been an innocuous payload that popped up a “Hey, you got infected!” popup every 10 minutes – Annoying but harmless. It could have been a rootkit that would use my machine as a doorway for hackers to gain access to the Microsoft corporate network. And once you’re rooted, there is NO way of knowing that you’re rooted – A good root kit covers its tracks so that it is essentially undetectable.
This is important: IMHO, once you’ve confirmed that you’re infected with a virus, you really have no choice but to wipe the machine since you have no way of knowing what’s been compromised. Hopefully you have a recent backup, or you have a way of saving your critical files before the reformat. I recently saw a report (I’m not sure where now) that someone discovered a worm that was infecting the system restore partitions on some machines – these are backup partitions that are installed by OEM’s on machines with a copy of the image that they use to create the system – it’s a replacement for the OEM install CD that used to come with computers. The worm was modifying the files on the master copy, so if you used the OEM’s “recover my system” procedure, you just re-infected your machine. The only recourse from this one was to find a copy of a Windows CD and reinstall from that.
I’ve always been a staunch advocate of safe computing. At my home network (with only 7 computers), before I installed broadband, I bought a hardware firewall (first a Netgear RO318, now a DLINK DI604 (a truly sweet piece of hardware btw)). I made sure that all 7 machines were kept up-to-date on patches. Every machine has antivirus installed on it and the signatures are kept up-to-date. I was smug in my self-assured knowledge that I was safe because I was doing the right thing. I berated my parents for not having a firewall on their broadband connections.
So I’ve just had my first taste of what it feels like to be on the other side of the firewall. And it leaves a very bitter taste in my mouth.
So as President Clinton once said: “I feel your pain”.
There’s been an interesting confluence of discussions at http://weblogs.asp.net about debugging other people’s code. JeremyK started the ball rolling, and Eric’s picked up with it. So I figured I ought to add some more details from my end. First off, Eric’s “part two” article is an absolute must read.
Just about every week I end up having to debug a problem in somebody else’s code. Either it’s something I’m testing that doesn’t work (Hmm. After my changes, why doesn’t winamp play music any more?), or it’s someone on my team that’s having a problem (Can you help me figure out why CoCreateInstance isn’t creating my object?).
The first thing I do when I’m debugging is to ensure that windbg is installed on the machine, and that the NT symbols are up-to-date (or that they’re using Microsoft’s public symbol server). Btw, the symbols that Microsoft publishes for Windows are almost exactly the same symbols we use internally; the internal symbols have some more information like line number information and structure definitions (and routine names for static functions), but I rarely need that information when debugging – the routine names are almost always enough to get me started.
Oftentimes I come into people’s offices and ask to use windbg, but they say “I’ve got Visual Studio, why can’t you use that?” Well, the answer is simply: “Because Visual Studio doesn’t have the level of command line support that windbg has”. It really doesn’t, although it’s improved immensely in recent versions. Windbg offers an essentially unlimited length command history window – I can look backwards through the history window and see what’s changed.
Also, when I’m debugging (even when I have source code available) I almost always debug in assembly language single step mode. This way, even if I miss the decision point that caused the failure I can look back in the history to see what failed. Windbg’s command line window is essential for that – I get registers AND code at the same time. I’ve had other developers at Microsoft look over my shoulder as I’m debugging and exclaim in surprise “I didn’t know anyone actually ever looked at the assembly language any more!”. Well, I do. Sue me :)
The next thing that’s important is to be fearless. I can’t think of the number of times that people have said to me “Wait, that’s OLE’s code. Why are you debugging in OLE’s code?” Well, if you want to understand the problem, you need to look at the code. Even if you don’t have the symbols, you need to look at the code. It can be quite daunting to debug someone else’s code, but press on. At a minimum, you might learn something.
The other thing I always keep in mind relates to Eric’s Calvin&Hobbes comment: “I have got to start listening to those quiet, nagging doubts.”
Look at the routines that are being called. If I’m debugging something, then at every procedure call, I ask myself “Could this be the source of the error?” If it is, I step into it. If it isn’t, I step over it. But when I do, I always look at the EAX register. That’s where the C calling convention leaves the return value of the function (AL or AX if it’s a function that returns a bool or a word). So if I’m debugging CoCreateInstance, then if I see it call “CoCreateInstanceEx”, I’ll step into it – it’s likely that CoCreateInstance is just a wrapper around CoCreateInstanceEx and CoCreateInstanceEx is going to be the real routine that returns the failure.
The next thing is to iterate over the failure. At some point you’ll step over the real cause of the failure. When this happens, restart the app and retry the failure case. And this time, step into the function instead of stepping over the function. Keep an eye out as to what is going on. Every function that fails should trigger a quiet nagging doubt. Please keep in mind though, it’s entirely possible that the failure is expected – you need to use critical thinking when evaluating the failure. For example, if the function calls RegOpenKeyEx and the RegOpenKeyEx call fails, then check the registry key in question – see if the failure is supposed to happen or not. Maybe the registry key they’re opening is an optional key.
The other thing to keep in mind is that this stuff takes practice. I’ve debugged through COM activation enough times that I know where to put the breakpoints right away. That wasn’t always the case; I’ve spent enough time looking at problems that I’ve pretty much learned my way around the code by trial and a great deal of error.
Of course all the discussion above assumes that the problem you’re debugging is simple and easily reproducible. This is true for the vast number of problems I’ve debugged over the years, but every once in a while you run into one that takes hours of work to reproduce. Those are harder to deal with, especially when the crash appears to be in someone else’s code. If it takes a long time to reproduce the problem then you need to be very careful when stepping through the code. It can be quite frustrating, I know.
Oh, and always take every debugging session as an opportunity to learn something new. For instance, as I just mentioned above, CoCreateInstance is a wrapper around CoCreateInstanceEx. Well, if your application uses both CoCreateInstance and CoCreateInstanceEx, then you can speed up your application’s load time slightly by replacing all the calls to CoCreateInstance in your application with calls to CoCreateInstanceEx, which removes one routine that needs to be loaded into your DLL.
Barry Dorrans made a comment on Monday’s blog post that reminded me of the old IBM PC technical reference manual.
In my opinion, this document is the only reason that we’re all using Wintel computers these days (as opposed to Apple Macintoshes or Commodore Amigas).
You see, when IBM first developed the IBM PC, they entrusted the project to a visionary named Don Estridge. Don’s vision was to produce a platform whose design was closed but whose architecture was totally open. When IBM first shipped the PC, they also made available a reference manual for the PC. This reference manual included EVERYTHING about the PC’s hardware. The pin-outs on the cards. The source code to the System ROMs. And most importantly, they even included the schematics of the original PC.
They continued this tradition throughout the original IBM PC line – for every major revision of the original PC line, there was a technical reference manual that accompanied the product. The XT, AT and network cards all got their own technical reference manuals.
This was an EXTRAORDINARY admission. For most of the other PC manufacturers, their schematics and ROM source code were tightly held secrets. They didn’t want people designing hardware for their platforms or messing with their system ROMs, because then 3rd parties could produce replacement parts for their PCs and undercut their hardware business. For instance, the original Mac didn’t even have an expansion ability – you could plug a keyboard, a mouse and a power cord into it and that was about it.
For whatever reason, Don Estridge decided that IBM should have a more open policy, and so he published EVERYTHING about the IBM PC. The ROM sources were copyrighted, but other than that, everything was fully documented – Everything, from the pin-outs and timing diagrams on the parallel interface, to the chip specifications of the various processors used on the motherboard. As a result, a thriving 3rd party hardware market ensued providing a diverse hardware platform far beyond what was available on other platforms. In addition, they licensed MS-DOS and published full documentation for it as well. When I was writing the BIOS for MS-DOS 4.0, I had a copy of the Intel components data catalog and a ream of chip spec sheets on my desk at all times so I could look up the detailed specifications for the system. I used the timing diagrams in the system to debug a bunch of problems with the printer drivers, for example (there was a bug in the printer hardware on the original IBM PC that prevented using the printer interrupt to allow interrupt driven printing – IIRC, the INTR line was raised before the “data ready” line was raised, which meant that the printer interrupt would be generated before the printer was actually ready to accept the next byte of data – they later fixed this on the PC/AT machines).
As a result, a confluence of documented hardware and software platforms existed which allowed software developers to take full advantage of the hardware platform, and the IBM PC platform grew and flourished. When IBM didn’t provide graphics support for their monochrome monitors, an OEM, Hercules, stepped up and provided it. When IBM/Microsoft didn’t provide spreadsheet support, an ISV, Lotus, stepped up and provided it.
But it was the synergy of open hardware and open software that made all the magic come together. None of the other PC manufacturers provided that level of openness at the time.
This openness wasn’t always to IBM’s advantage – it also allowed OEMs like Compaq to clone the IBM hardware and produce their own interoperable IBM clone machines – but it did allow the platform to thrive and succeed.
In my honest opinion, THIS is the reason that the IBM PC architecture (ISA, later called Wintel) succeeded. It was because IBM and Microsoft let anyone produce products for their platform and NOT because of any marketing genius on IBM’s (or Microsoft’s) part.
A couple of months ago, I wrote about Sharron's competing at the Whidbey Equestrian Center's Spring Training show.
After the show, we convinced her to write up her experiences as a story for her 3rd grade class. The story she finally wrote was pretty good, so we submitted it to Flying Changes, which is the newsletter for Region 6 (Pacific Northwest) of the USDF.
Well, we just got word yesterday that her story has been accepted for the July edition! With luck it'll appear on their web site sometime in early July.
So my daughter is now the first member of the Osterman family to be a published author.
I cannot say how proud I am of her for this.
Ok, a bit of somewhat embarrassing, but kind-of cool history time.
Back in the early 1980’s, Microsoft got this rather grandiose idea of building a complete reference library of PC technology. The idea was to have a 5ish volume set covering topics like MS-DOS, CD-ROM’s (the CD-ROM was cutting edge technology at the time), Multimedia, etc. The collection of books was to be called the “Microsoft Reference Library”.
It was one of the first projects of the fledgling Microsoft Press division of Microsoft, a group that has done some remarkable projects over the years (many of the books I come back to again and again are Microsoft Press books).
The first volume in the Microsoft Encyclopedia was the MS-DOS® (Versions 1.0-3.2) Technical Reference Encyclopedia. As the introduction states: “This book was conceived as the ultimate resource for anyone writing applications for MS-DOS”. It included detailed API references, the official story of Microsoft and MS-DOS, details of all the MS-DOS command line utilities, etc. It’s a really remarkable book.
Unfortunately, however, there was a bit of a gotcha.
You see, the authors of the encyclopedia didn’t want to spend a huge amount of time pestering the MS-DOS development team while they were writing the book, especially since the MS-DOS team was heads down working on the various MS-DOS releases. So, in addition to the existing MS-DOS documentation (which was actually quite good), they got access to the source code to MS-DOS and proceeded to enhance the existing API documentation. One of the really cool things they did that they thought would be an incredible help was to write a flow chart of the internal data flow of every MS-DOS system call. Unfortunately, for some of the functions, they included a bunch of information that was Microsoft confidential. In addition, the encyclopedia included code examples that weren’t correct, and there were many inaccuracies in the text. Even though MS-Press had given copies to the MS-DOS team for review, somehow this wasn’t discovered until after the MS-DOS encyclopedia had gone to press. So Microsoft was forced to pull all the copies of the Encyclopedia from the shelves. As far as I know, no copies of the encyclopedia actually made it out of the warehouse. There were a couple of copies circulated around Microsoft internally, and somehow I managed to get a hold of one of the few remaining copies in existence, and it proudly lives on my bookshelf.
So if Microsoft pulled the MS-DOS Encyclopedia, why do people keep finding references to it on the web? Well, it turns out that the idea of an MS-DOS encyclopedia as a sort-of uber-reference book was a really good idea, even if the execution of the first version was flawed. So after the first version, MS Press went back to the drawing board and totally re-created the MS-DOS encyclopedia. The new version had a very similar layout to the original, but was composed with the close cooperation of the MS-DOS team (I remember getting drafts every week for review). In addition, MS Press went out and got many of the top authors of MS-DOS books to provide the content. The authors’ biographies read like a veritable who’s-who of programming authors at the time (Van Wolverton, Ray Duncan, Charles Petzold, Steve Bostwick, Chip Rabinowitz, etc). The new version is really a very nice piece of writing; I still go back to it from time to time.
It’s also fun to read the technical advisors section – it’s a fairly complete listing of all the MS-DOS developers that were still at Microsoft when it was written :). At a minimum, to my knowledge, it’s the very first time my name appeared in an actual printed book :).
One of my early blog posts was one I entitled “Every software engineer should know roughly what assembly language their code generates”. It got a lot of commentary.
Well, I just ran into (via /.) the following article by Randall Hyde entitled: “Why learning assembly language is still a good idea”.
His article is much more in-depth and better written than my post, but essentially restates my premise. I'm always happy when other people post stuff that agrees with me :)
I’m having a really hard time coming up with something technical for today’s blog entry, and I have 3 or 4 others out for review (after my last experience with off-the-cuff posts, I’ve learned my lesson – if I’m not 100% sure about a ‘blog post, I have the real experts review it for accuracy).
I’m also really swamped at work wrestling with fxcop (a truly useful tool, but it’s an iterative process cleaning up my code). So I’ve not had the time to come up with something new for the day – I’ve tossed out several post ideas and others will take too much time to develop.
So it’s time for an office prank story instead.
As I mentioned earlier, Microsoft employees LOVE trashing each other’s offices. It’s sort-of a sport around here. Another of my favorites (which Valorie mentioned in the comments section of my previous post) was when some of the summer interns in the Lan Manager group decided to “shorten” the office of a developer on the server team. He worked in an inside office, so they removed all the furniture from the office, and built a false wall at the back of the office, shortening the office by six inches. This was a professional looking job – fully mudded and taped, and they even relocated the outlets into the new wall. When they were done, they came in and repainted the entire office to match the original colors. And then, of course, they put all the stuff back into his office.
They did all this work on a Friday night, so that on Monday morning when the developer came back in, he wouldn’t notice the smell.
He came in, and started working just like normal. And they sat there and waited for him to notice the changes. They kept on coming by his office all day. I’m sure he was totally confused as to why he was suddenly the most popular developer in the server group, but he ignored it.
Well, the end of the day came and went, and he left for home. And he came in the next day, started working and they once again waited for him to notice the changes.
And the end of that day came and went, and he left for home. At this point, the interns were starting to get a little panicky. Was he just unobservant? Or had he noticed and was he playing a trick back on them by NOT reacting to the shortened office?
It started driving them up the wall. They couldn’t spill the beans to him, since it sort-of would defeat the whole point of the joke. And he never did seem to notice it.
It wasn’t until six weeks later that the developer in question noticed that his office was smaller than it was before and started asking what happened…
The interns all breathed a sigh of relief at that: the cat was finally out of the bag.
And they went in and restored the original wall back to its initial location the next day :)
Sweet! One more for the good guys!
PS: No snarky comments about Microsoft holes facilitating the break-in, please, there's absolutely no evidence either way about how the hackers broke into Valve.
Actually, it’s not.
First, a quick review. I know that lots of others have gone into this in great detail before (Dare and Eric in particular), but a bit of refreshing always helps.
The system calls a DLL’s DllMain entrypoint when a DLL is loaded into a process, when the DLL is unloaded from the process, when a thread is created and when a thread is destroyed. Four messages are used for this, DLL_PROCESS_ATTACH, DLL_PROCESS_DETACH, DLL_THREAD_ATTACH, and DLL_THREAD_DETACH respectively.
When your DllMain entrypoint receives a DLL_PROCESS_DETACH, there is an additional piece of information provided: The lpvReserved parameter to DllMain is NULL if the DLL is being unloaded because of a call to FreeLibrary, it’s non NULL if the DLL is being unloaded due to process termination.
Ok, so much for the review.
When NT unloads a process gracefully (due to a call to ExitProcess()), it calls all the DLL entrypoints in roughly the reverse order that the DLL_PROCESS_ATTACH entrypoints were called (there’s absolutely no guarantee of the order though). NT tries to issue the DLL_PROCESS_DETACH message to a DLL after it’s issued DLL_PROCESS_DETACH messages for all the DLLs that depend on that DLL, but it doesn’t always happen (because of circular dependencies, etc).
So consider the case where you have a DLL that instantiates a COM object at some point during its lifetime. If that DLL keeps a reference to the COM object in a global variable, and doesn’t release the COM object until the DLL_PROCESS_DETACH, then the DLL that implements the COM object will be kept in memory during the lifetime of the COM object. Effectively the DLL that holds the reference to the COM object has become dependent on the DLL implementing the COM object. But the loader has no way of knowing about this dependency. All it knows is that the DLLs are loaded into memory.
Now the process terminates gracefully. The loader calls the DllMain entrypoint on all the DLL’s in the process, specifying DLL_PROCESS_DETACH. It’s entirely possible (in fact highly likely in this case) that the DLL_PROCESS_DETACH message for the DLL implementing the COM object will be called BEFORE the DLL_PROCESS_DETACH message for the DLL that holds the reference to the COM object.
So this means that the DLL that implements the COM object will get the DLL_PROCESS_DETACH message, even though there are still live COM objects that use the code in the DLL!
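The ordering problem can be illustrated with a toy model in Python. This is only a sketch of the rule described above (reverse load order, adjusted for the dependencies the loader knows about). It is not how the NT loader is actually implemented, and the function name and the DLL names are made up for the example.

```python
from typing import Dict, List, Set


def detach_order(load_order: List[str], known_deps: Dict[str, Set[str]]) -> List[str]:
    """Return a DLL_PROCESS_DETACH order for a toy loader.

    Start from reverse load order, but never detach a DLL while a DLL that
    the loader *knows* depends on it is still attached.
    """
    remaining = list(reversed(load_order))
    order: List[str] = []
    while remaining:
        for dll in remaining:
            dependents = {d for d in remaining if dll in known_deps.get(d, set())}
            if not dependents:
                break
        else:
            dll = remaining[0]  # circular dependency: give up and pick one
        order.append(dll)
        remaining.remove(dll)
    return order


# holder.dll is loaded first and later creates a COM object, which pulls in
# comobj.dll. The loader never hears about that relationship.
LOAD_ORDER = ["holder.dll", "comobj.dll"]

what_the_loader_does = detach_order(LOAD_ORDER, known_deps={})
what_we_wanted = detach_order(LOAD_ORDER, known_deps={"holder.dll": {"comobj.dll"}})
```

With no recorded dependency, comobj.dll gets its detach first, while holder.dll is still sitting on a live reference. Only when the dependency is visible to the model does holder.dll get to clean up first.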
We ran into this with some of our leak detection code: it was generating a false positive. It reported a leak in the DLL_PROCESS_DETACH code when in fact the objects were being referenced by another DLL.
When I brought this up on an internal alias, one of the people on the NT base team indicated “There’s basically no way you can do anything other than freeing memory” in the case where a DLL_PROCESS_DETACH message is called from process shutdown. You can do reliable processing on the FreeLibrary case, but not in the process termination case.
Ultimately, I believe that the real culprit here is the DLL that keeps the COM object reference alive. That DLL is violating the “It is not safe to call FreeLibrary from a DllMain routine” stricture, because
(a) There’s no way of knowing if CoInitialize has been called on the current thread – COM might not be initialized currently.
(b) It’s possible that the call to ComObject->Release() would cause FreeLibrary to be called.
A little known fact about me: My cousin, Jeff Pevar is actually a famous musician. Really. And he even has a sort-of blog (no RSS feed though)
He’s been touring for several years as the P in “CPR” (David Crosby (yes, the David Crosby), Jeff Pevar, and James Raymond (David’s son)), and apparently Jeff and James are going to be touring as members of the new Crosby Stills and Nash concert tour this summer!
From his site:
"I'd been hoping this opportunity would come around one day. After working with David and Graham in all these various combinations for over 10 years now, to have a chance to see what the synergy will be also working with Stephen, is an amazing opportunity and admittedly, a dream come true for me. I learned so much about music and inspired guitar playing from all their records, to now have a chance to work with the 3 of them together is such an honor. I am up for the challenge and ready to bring to the table whatever I can to support their incredible art"- jp
Wow! I had heard that CSN was touring this summer, but I didn’t realize that Jeff was going to be with the band.
And they’ll be appearing in Woodinville at Chateau Ste. Michelle on the 22nd of September!
In my previous post about OCA, the comments thread has a long discussion started by Shannon J Hager about Mozilla’s behavior when you attempt to access https://winqual.microsoft.com. If you attempt to access this web site using Firefox (or other Mozilla variants), you get the following dialog box:
Which is weird, because of course the web site works just fine in IE. No big deal, right – Microsoft’s well known for sleazing the rules for its own products, so obviously this is Microsoft’s fault – they probably did something like hard coding in trust to the Microsoft issuing CA. But I was kinda surprised at this, so I spent a bit of time checking it out...
The way that SSL certificate verification is supposed to work is that if the issuer of a certificate isn’t trusted, then the code validating the certificate is supposed to check the parent of the issuer to see if IT is trusted. If the parent of the issuer isn’t trusted, it’s supposed to check the grandparent of the issuer, and so forth until you find the root certificate authority (CA).
The issuing CA of the certificate on the winqual web site is the “Microsoft Secure Server Authority”, so it’s not surprising Mozilla doesn’t trust that one. The parent of the issuing CA is the “Microsoft Internet Authority”; again, no surprise that Mozilla doesn’t trust it.
But the grandparent of the issuing CA is the “GTE CyberTrust Root”. This is a well known CA, and Mozilla should be trusting it. And what do you know, Mozilla DOES claim to trust that root CA:
Well, Cesar Eduardo Barros actually went and checked using openssl to see why the CA isn’t trusted. He tried:
$ openssl s_client -connect winqual.microsoft.com:443 -showcerts
depth=0 /C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/CN=winqual.microsoft.com
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 /C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/CN=winqual.microsoft.com
verify error:num=27:certificate not trusted
verify return:1
depth=0 /C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/CN=winqual.microsoft.com
verify error:num=21:unable to verify the first certificate
verify return:1
CONNECTED(00000003)
---
Certificate chain
 0 s:/C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/CN=winqual.microsoft.com
   i:/DC=com/DC=microsoft/DC=corp/DC=redmond/CN=Microsoft Secure Server Authority
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
---
Server certificate
subject=/C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/CN=winqual.microsoft.com
issuer=/DC=com/DC=microsoft/DC=corp/DC=redmond/CN=Microsoft Secure Server Authority
---
No client certificate CA names sent
---
SSL handshake has read 1444 bytes and written 324 bytes
---
New, TLSv1/SSLv3, Cipher is RC4-MD5
Server public key is 1024 bit
SSL-Session:
    Protocol  : TLSv1
    Cipher    : RC4-MD5
    Session-ID: [...]
    Session-ID-ctx:
    Master-Key: [...]
    Key-Arg   : None
    Start Time: [...]
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)
---
DONE
Decoding the certificate it gave me above (openssl x509 -text) I get the same information Mozilla gives me and a bit more, but no copy of the issuer. The only suspicious thing in there is:
Authority Information Access:
    CA Issuers - URI:http://www.microsoft.com/pki/mscorp/msssa1(1).crt
    CA Issuers - URI:http://corppki/aia/msssa1(1).crt
Getting that URI gives me a blank HTML page with a 0.1 second redirect to itself. (The CRL one seems valid, however.)
So I was confused: why wasn’t OpenSSL able to verify the certificate? So I started asking the security PMs here at Microsoft what was up. One of the things they told me was that Microsoft doesn’t hard code ANY intermediate certificates in our browser. Instead, our browser relies on the referral information in the certificate to chase down the CA hierarchy.
So why can’t Mozilla do the same thing? Is there something wrong with our certificates that’s preventing this from working? I kept on pestering and the PM’s kept on digging. Eventually I got email from someone indicating “IE is chasing 48.2 AIA”.
Well, this isn’t very helpful to me, so I asked the security PM in question to explain it in English. Apparently the root cause of the problem is that IE is following the Authority Information Access 48.2 OID (1.3.6.1.5.5.7.48.2) to find the parent of the certificate, while Mozilla isn’t.
Inside the Microsoft certificate is the following:
And if you go to http://www.microsoft.com/pki/mscorp/msssa1(1).crt you’ll find the parent CA for the certificate on the winqual web site. So now it’s off to figure out if the IE behavior is according to standard, or if it’s another case of Microsoft ignoring web standards in favor of proprietary extensions.
A few minutes of googling discovers that the AIA 48.2 field is also known as the id-ad-caIssuers OID. The authoritative reference for this OID is RFC2459 (the RFC that defines the x.509 certificate infrastructure). It describes this field as:
The id-ad-caIssuers OID is used when the additional information lists CAs that have issued certificates superior to the CA that issued the certificate containing this extension. The referenced CA Issuers description is intended to aid certificate users in the selection of a certification path that terminates at a point trusted by the certificate user.
In other words, IE is correctly chasing the AIA 48.2 references in the certificate to find the root issuing CA of the certificate. Since it didn’t have direct knowledge of the issuing CA, it correctly looked at the AIA 48.2 field of the certificate for the winqual web site and chased the AIA 48.2 references to the root CA. Mozilla (and OpenSSL and GnuSSL) apparently don’t follow this link, which is why they pop up the untrusted certificate dialog.
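To make the difference concrete, here is a minimal Python sketch of the two chain-building strategies. It’s a toy model, not how IE, Mozilla or OpenSSL are actually implemented: the Cert class, the fetch_issuer callback, the dictionary standing in for the web, and the second URL are all invented for illustration, and no signatures are checked (a real validator has to verify every signature in the chain).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass(frozen=True)
class Cert:
    """Toy certificate: just names, no keys or signatures (hypothetical model)."""
    subject: str
    issuer: str
    aia_ca_issuers: Optional[str] = None  # URL from the id-ad-caIssuers AIA entry


def chain_is_trusted(
    leaf: Cert,
    trusted_roots: Set[str],
    local_store: Dict[str, Cert],
    fetch_issuer: Optional[Callable[[str], Optional[Cert]]] = None,
    max_depth: int = 8,
) -> bool:
    """Walk from the leaf toward a trusted root.

    The issuer certificate is looked up in local_store first. If it is
    missing and fetch_issuer is supplied, the AIA caIssuers URL is chased.
    """
    cert = leaf
    for _ in range(max_depth):
        if cert.issuer in trusted_roots:
            return True
        parent = local_store.get(cert.issuer)
        if parent is None and fetch_issuer and cert.aia_ca_issuers:
            parent = fetch_issuer(cert.aia_ca_issuers)
        if parent is None or parent.subject != cert.issuer:
            return False  # "unable to get local issuer certificate"
        cert = parent
    return False


# The hierarchy described in the post; the second URL is invented for the example.
LEAF = Cert("winqual.microsoft.com", "Microsoft Secure Server Authority",
            "http://www.microsoft.com/pki/mscorp/msssa1(1).crt")
ISSUING = Cert("Microsoft Secure Server Authority", "Microsoft Internet Authority",
               "http://example.invalid/mia.crt")
INTERMEDIATE = Cert("Microsoft Internet Authority", "GTE CyberTrust Root")

FAKE_WEB = {LEAF.aia_ca_issuers: ISSUING, ISSUING.aia_ca_issuers: INTERMEDIATE}
ROOTS = {"GTE CyberTrust Root"}

without_aia = chain_is_trusted(LEAF, ROOTS, local_store={})
with_aia = chain_is_trusted(LEAF, ROOTS, local_store={}, fetch_issuer=FAKE_WEB.get)
```

With an empty local store and no AIA chasing, the walk stops at the very first issuer. With the callback supplied, the same walk reaches the trusted root.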
Issue solved. Now all someone has to do is to file bugs against Mozilla and OpenSSL to get them to fix their certificate validation logic :).
Btw, I want to give HUGE kudos to Cesar Eduardo Barros for tirelessly trying to figure this out, and to Michael Howard and the lead program manager for NT security for helping me figure this out. If you look at the info from the certificate that Cesar posted above, he correctly caught the AIA 48.2 fields inside the CA. It was a huge step in the right direction; all that remained was to figure out what it really meant.
Edit: Fixed picture links.
Edit2: Fixed line wrapping of reference from RFC2459.
Continuing on the previous theme of cool Win32 APIs that many people ignore, this week’s entry is one of my favorites: DisableThreadLibraryCalls().
DisableThreadLibraryCalls was added in NT 3.5 as a part of the performance enhancements we added to the system. As we measured the system, it quickly became clear that one major contributor to the overall system working set was the number of pages that were occupied by the DllMain entrypoint in the various system DLL’s.
The reason for this was that a DLL’s DllMain entrypoint is called whenever a thread is created or destroyed in an application. This is critical for DLLs that maintain per-thread state like the C runtime library or Winsock. But for 99% of the DLL’s on the system, the routines simply ignore the DllMain DLL_THREAD_ATTACH and DLL_THREAD_DETACH messages. Since the system couldn’t determine if a DLL ignores the DLL_THREAD_XXX messages, it always called the DllMain entrypoint whenever a thread was created.
This caused the page that contained the DllMain entrypoint for the DLL to be brought into memory, which increased the application’s working set.
The NT loader guys added the DisableThreadLibraryCalls API to the system to fix this problem. When an application calls this routine, it lets the system know that the module specified in its parameter doesn’t care about DLL_THREAD_XXX messages, and thus the loader won’t call into the DLL on thread creation.
This API is so useful that ATL’s CAtlDllModule.DllMain() method always calls DisableThreadLibraryCalls(). If your DLL doesn’t rely on thread creation/destruction messages, then it should too.
When you’re writing a windows application, there are often times that you need to signal your UI thread that an event has occurred. One of the ways to do this of course is to post a message to a window on the UI thread.
But sometimes using a window message isn’t convenient. For example, if you’re writing to a file asynchronously, you could create an event, stick it in an LPOVERLAPPED structure, and call WriteFile asynchronously. When the write completes, the event’s set to the signaled state.
In this case, it’s often convenient to put a call to MsgWaitForMultipleObjects in your message loop – this routine will block until either a message is posted to the thread (qualified by the dwWakeMask parameter to MsgWaitForMultipleObjects), or until the events are signaled.
The thing that makes this API “tricky” is the bWaitAll parameter. The bWaitAll parameter is documented the same as WaitForMultipleObjects’ bWaitAll parameter:
[in] If this parameter is TRUE, the function returns when the states of all objects in the pHandles array have been set to signaled and an input event has been received. If this parameter is FALSE, the function returns when the state of any one of the objects is set to signaled or an input event has been received. In this case, the return value indicates the object whose state caused the function to return.
But this isn’t quite the case. For example, if you were only waiting on a single event (as in the example above), you might be tempted to set bWaitAll to TRUE. After all, since there’s only one object being waited on, all of the objects will be signaled when the one event is set to the signaled state, right?
Actually no. In this case, MsgWaitForMultipleObjects will block until both the event is set to the signaled state, AND a message that meets the dwWakeMask criteria is posted to the thread. The reason for this is simple (but subtle). A hint as to the reason can be found in the documentation for the nCount parameter:
[in] Number of object handles in the array pointed to by pHandles. The maximum number of object handles is MAXIMUM_WAIT_OBJECTS minus one.
Why on earth is it MAXIMUM_WAIT_OBJECTS minus one? Well, it’s because under the covers, MsgWaitForMultipleObjects takes your wait array, and adds an additional handle that’s associated with the thread. It then forwards the call to WaitForMultipleObjectsEx. This additional handle is set to the signaled state when a message that meets the dwWakeMask criteria is queued to the thread’s message queue. Since the bWaitAll parameter is simply passed on to WaitForMultipleObjectsEx, all of the handles have to be set to the signaled state, including the handle that was added to the wait list.
So the bottom line is: you need to be very, very careful when calling MsgWaitForMultipleObjects – if you specify bWaitAll as true, you may find that your application doesn’t wake up when you expected it to.
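The behavior is easy to see in a toy model. The Python below is just a sketch of the wake-up rule as described above; it doesn’t call any Win32 APIs, and the two function names are made up for the example.

```python
from typing import List


def wait_satisfied(signaled: List[bool], wait_all: bool) -> bool:
    """Model of the WaitForMultipleObjectsEx wake-up rule for a handle list."""
    return all(signaled) if wait_all else any(signaled)


def msg_wait_satisfied(events: List[bool], message_queued: bool, wait_all: bool) -> bool:
    """Model of MsgWaitForMultipleObjects as described in the post.

    The caller's handles get one extra, hidden entry that is signaled when
    a message matching dwWakeMask is queued; the combined list is then
    waited on with the caller's bWaitAll value.
    """
    return wait_satisfied(events + [message_queued], wait_all)


# One event, already signaled, no message in the queue.
wakes_with_wait_all = msg_wait_satisfied([True], message_queued=False, wait_all=True)
wakes_with_wait_any = msg_wait_satisfied([True], message_queued=False, wait_all=False)
```

In the model, the single signaled event wakes the thread only when bWaitAll is FALSE. With bWaitAll set to TRUE the hidden message handle has to be signaled as well, which is exactly the surprise.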
I normally don’t do “me too” posts, since I figure that most of the people reading my blog are also looking at the main weblogs.asp.net/blogs.msdn.com feed, but I felt obliged to chime in on this one.
A lot of people on weblogs.msdn.com have been posting this, but I figured I’d toss in my own version.
When you get a “your application has crashed, do you want to let Microsoft know about it?” dialog, then yes, please send the crash report in. We’ve learned a huge amount about where we need to improve our systems from these reports. I know of at least three different bug fixes that I’ve made in the audio area that directly came from OCA (online crash analysis) reports. Even if the bugs are in drivers that we didn’t write (Jerry Pisk commented about Creative Labs’ drivers here for example), we still pass the info on to the driver authors.
In addition, we do data mining to see if there are common mistakes made by different driver authors and we use these to improve the driver verifier – if a couple of driver authors make the same mistake, then it makes sense for us to add tests to ensure that the problems get fixed on the next go-round.
And we do let 3rd party vendors review their data. There was a chat about this in August of 2002 where Greg Nichols and Alther Haleem discussed how it’s done. The short answer is you go here and follow the instructions. You have to have a Verisign Class 3 code-signing ID to participate though.
Bottom line: Participate in WER/OCA – Windows gets orders of magnitude more stable because of it. As Steve Ballmer said:
About 20 percent of the bugs cause 80 percent of all errors, and — this is stunning to me — one percent of bugs cause half of all errors.
Knowing where the bugs are in real-world situations allows us to catch the high visibility bugs that plague our users that we’d otherwise have no way of discovering.
I recently ran into this post from Alex Papadimoulis’s “Daily WTF”, and it reminded me of one company’s response to mandatory source disclosure (no, this isn’t really another open source discussion, really – I’ve learned my lesson :)).
This company (which will remain nameless) licensed the sources to its code to Microsoft for integration in a Microsoft product (no, I’m not going to name names).
As a matter of fact, giving away the source code was one of the selling points of their product. They licensed the source code to any and everyone who bought the product. This was important because some of their customers were government agencies with source code availability requirements. It also allowed for their code to run on lots of different platforms: all you needed was a compiler (and of course the work to adapt the program to your platform, which they were more than happy to provide).
But of course, if you’re giving away the source code to your product, how do you prevent the people who have your source code from using it? How do you continue to make money off the product once your customers have the source code? What’s to prevent them from making the bug fixes for you? Why should they continue to pay you lucrative contracting fees so that you’ll continue to get revenue from the product? And more importantly, how do you prevent your customers from making an incompatible (or incorrect) change to their server? If your customers have the source, you lose the ability to ensure quality of fixes. This latter issue is a very real issue btw. I see this all the time on the IETF IMAP mailing list. About once a semester or so, someone posts a “fix” for the U.W. IMAP server, and Mark Crispin immediately jumps on the fix explaining how the guy got it wrong. So it’s important that you make sure that your customers, who have the source code to your product, only make the fixes that you authorize.
Well, this company hit on what I think is a novel solution to the problem. Since their code had to be platform independent, they already had a restriction that none of their identifiers could be more than 6 characters in length (to work around limitations in the linkers on some of their supported platforms). So they took this one step further and purposely obfuscated their entire source code.
Every single function name in the source code took up exactly 6 letters. So did all the structures and local variables. And they stripped most of the comments out of the code. They had a book (on paper) that translated the obfuscated names of their functions to the human-readable names, and their support guys (and internal development) all had copies of the book.
The customers weren’t allowed to have the book, only employees of the company got the book.
So the customers couldn’t really figure out what was going on inside the source code, the only thing they could do was to call support and have them look at the code.
A clever solution to the problem, if a bit difficult for the customer :)
Oh, and before you ask, no, this is NOT what Microsoft does when it licenses the source to someone. If you license the source to a Microsoft product, as far as I know, you get the real source.
“Derek” posted a comment to my previous post about validating inputs to functions that’s worth commenting on.
IMHO, the user shouldn't be able to crash the app. The app should verify all information from any untrustworthy source (user input, file, network, etc.) and provide feedback when the data is corrupt. This makes it a "firewall" of sorts. The app itself is trustworthy, or at least it should be once it's debugged. The application is better equipped to deal with errors than API's, because (1) it knows where the data comes from, and (2) it has a feedback mechanism.
He’s absolutely right.
He’s more than right. This is (IMHO) the key to most of the security issues that plague the net today. People don’t always validate their input. It doesn’t matter where your input comes from, if you don’t validate it, it WILL bite you in the rear. Just about every form of security bug out there today is caused by one form or another of not validating input – SQL injection issues are caused by people not validating items typed into forms by users, and buffer overflows are often (usually?) caused by people passing inputs into constant sized buffers without checks.
This applies to ALL the user’s input. It applies if you’re reading a source file from disk. It applies when you’re reading data from a network socket. It applies when you’re processing a parameter to an RPC function. It applies when you’re processing a URL in your web server.
What’s fascinating is how many people don’t do that. For Lan Manager 1.0 and 2.0, validation of incoming packets was only done on our internal debug releases, for example. Now this was 15 years ago, and Lan Manager’s target machines (20 megahertz 386 boxes) didn’t have the horsepower to do much validation, so there’s a lot of justification for this. Back in those days, the network services that validated their inputs were few and far between – it doesn’t justify the practice but… There was a huge amount of internal debate when we started working on NT (again, targeted at 33MHz 386 machines). Chuck Lenzmeier correctly insisted that the NT server had to validate EVERY incoming SMB. The Lan Manager guys pushed back saying that it was unnecessary (remember – Lan Manager comes from the days where robustness was an optional feature in systems). But Chuck stood his ground and insisted that the input validation had to remain. And it’s still there. We’ve tightened up the checks on every release since then, adding features like encryption and signing to the CIFS protocol to even further reduce the ability to tamper with the incoming data.
Now the big caveat: If (and only if) you’re an API, then some kinds of validation can be harmful – see the post on validating parameters for more details. To summarize, check your inputs, obsessively, but don’t ever use IsBadXxxPtr to ensure that the memory’s valid – just let the user’s app crash if they give you garbage.
If you’re a system service, you don’t have that luxury. You can’t crash, under any circumstances. On the other hand, if you’re a system service, then the memory associated with your inputs isn’t handed to you like it is on an API. This means you have no reason to ever call IsBadXxxPtr – typically you’ve read the data you’re validating from somewhere, and the thing that did the reading gave you an authoritative length of the amount of data received. I’m being vague here because there are so many ways a service can get data – for instance, it could be read from a file with ReadFile, it could be read from a socket with recv, it could come from SQL server (I don’t know how SQL results come in :), but I’m willing to bet that the length of the response data’s included), it could come from RPC/COM, it could come from a named pipe, etc.
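Here’s a rough sketch of what “trust the authoritative length, not the packet” can look like, in Python. The wire format (a 2-byte type followed by a 2-byte payload length, little-endian), the 4096-byte cap and the parse_message name are all assumptions I made up for the example; no real protocol is being described. The point is that every length claimed inside the data is checked against the byte count the reader actually reported.

```python
import struct

HEADER = struct.Struct("<HH")  # hypothetical header: message type, payload length
MAX_PAYLOAD = 4096             # hypothetical upper bound for this sketch


def parse_message(buf, received):
    """Validate one message against the byte count the reader reported.

    received is the authoritative length, for example the value returned
    by recv or the bytes-read count from a file read.
    """
    if received < 0 or received > len(buf):
        raise ValueError("bad received length")
    data = memoryview(buf)[:received]
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    msg_type, payload_len = HEADER.unpack_from(data, 0)
    if payload_len > MAX_PAYLOAD:
        raise ValueError("payload too large")
    # The length claimed by the sender must match what actually arrived.
    if payload_len != len(data) - HEADER.size:
        raise ValueError("length mismatch")
    return msg_type, bytes(data[HEADER.size:])
```

A sender that claims a 1000-byte payload but only sends 3 bytes gets an error back instead of a read off the end of the buffer, and leftover junk past the received count is never looked at.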
Rule #1: Always validate your input. If you don’t, you’ll see your name up in lights on bugtraq some day.