I’ve been mulling writing this one for a while, and I ran into the comment below the other day which inspired me to go further, so here goes.
Back in May, James Gosling was interviewed by Asia Computer Weekly. In the interview, he commented:
One of the biggest problems in the Linux world is there is no such thing as Linux. There are like 300 different releases of Linux out there. They are all close but they are not the same. In particular, they are not close enough that if you are a software developer, you can develop one that can run on the others.
He’s completely right, IMHO. Just like the IBM PC’s documented architecture meant that people could create PC’s that were perfect hardware clones of IBM’s PCs (thus ensuring that the hardware was the same across PCs), Microsoft’s platform stability means that you could write for one platform and trust that it works on every machine running on that platform.
There are huge numbers of people who’ve forgotten what the early days of the computer industry were like. When I started working, most software was custom, or was tied to a piece of hardware. My mother worked as the executive director for the American Association of Physicists in Medicine. When she started working there (in the early 1980’s), most of the word processing was done on old Wang word processors. These were dedicated machines that did one thing – they ran a custom word processing application that Wang wrote to go with the machine. If you wanted to computerize the records of your business, you had two choices: You could buy a minicomputer and pay a programmer several thousand dollars to come up with a solution that exactly met your business needs. Or you could buy a pre-packaged solution for that minicomputer. That solution would also cost several thousand dollars, but it wouldn’t necessarily meet your needs.
A large portion of the reason that these solutions were so expensive is that the hardware cost was so high. The general purpose computers that were available cost tens or hundreds of thousands of dollars and required expensive facilities to manage. So there weren’t many of them, which meant that companies like Unilogic (makers of the Scribe word processing software, written by Brian Reid) charged hundreds of thousands of dollars for installations and tightly managed their code – you bought a license for the software that lasted only a year or so, after which you had to renew it. It was particularly ugly when Scribe’s license ran out (it happened once at CMU by accident): the program would delete itself off the hard disk.
PCs started coming out in the late 1970’s, but there weren’t many commercial software packages available for them. One problem developers encountered was that the machines had limited resources, but beyond that, software developers had to write for a specific platform – the hardware was different on all of these machines, as was the operating system, and introducing a new platform linearly increases the amount of testing required. If it takes two testers to test one platform, it’ll take four testers to test two platforms, six testers to test three platforms, and so on (this isn’t totally accurate – there are economies of scale – but in general the principle applies: the more platforms you support, the more test resources you need).
There WERE successful business solutions for the early PCs – Visicalc first came out for the Apple ][, for example. But they were few and far between, and were limited to a single hardware platform (again, because the test and development costs of writing to multiple platforms are prohibitive).
Then the IBM PC came out, with a documented hardware design (it wasn’t really open like “open source”, since only IBM contributed to the design process, but it was fully documented). And with the IBM PC came a standard OS platform, MS-DOS (actually IBM offered three or four different operating systems, including CP/M and the UCSD P-System, but MS-DOS was the one that took off). In fact, Visicalc was one of the first applications ported to MS-DOS; it was ported to DOS 2.0. But it wasn’t until 1983 or so, with the introduction of Lotus 1-2-3, that the PC was seen as a business tool and people flocked to it.
But the platform still wasn’t completely stable. While MS-DOS did a great job of virtualizing the system storage (with the FAT filesystem), keyboard, and memory, it did a lousy job of providing access to the screen and printers. The only built-in support for the screen was a simple teletype-like console output mechanism. The only way to get color output or the ability to position text on the screen was to load a replacement console driver, ANSI.SYS – and console output routed through the driver was painfully slow.
Obviously, most ISVs (like Lotus) weren’t willing to put up with the console driver’s performance, so they started writing directly to the video hardware. On the original IBM PC, that wasn’t a big deal – there were two choices, CGA or MDA (Color Graphics Adapter and Monochrome Display Adapter). Two choices, two code paths to test. So the test cost was manageable for most ISVs. Of course, the hardware world didn’t stay still. Hercules came out with their graphics adapter for the IBM monochrome monitor. Now we have three paths. Then IBM came out with the EGA and VGA. Now we have FIVE paths to test. Most of these were compatible with the basic CGA/MDA, but not all, and they all had different ways of providing their enhancements. Some had “unique” hardware features, like the write-only hardware registers on the EGA.
At the same time as these display adapter improvements were coming, disks were also improving – first 5 ¼ inch floppies, then 10M hard disks, then 20M hard disks, then 30M. And system memory increased from 16K to 32K to 64K to 256K to 640K. Throughout all of it, the MS-DOS filesystem and memory interfaces continued to provide a consistent API to code to. So developers continued to write to the MS-DOS filesystem APIs and grumbled about the costs of testing the various video combinations.
But even so, vendors flocked to MS-DOS. The combination of a consistent hardware platform and a consistent software interface to that platform was an unbelievably attractive combination. At the time, the major competition to MS-DOS was Unix and the various DR-DOS variants, but none of them provided the same level of consistency. If you wanted to program to Unix, you had to choose between Solaris, 4.2BSD, AIX, IRIX, or any of the other variants, each of which was a totally different platform. Solaris’ signals behaved subtly differently from AIX’s, and so on. Even though the platforms were ostensibly the same, there were enough subtle differences that you either wrote for only one platform, or you took on the burden of running the complete test matrix on EVERY version of the platform you supported. If you ever look at the source code of an application written for *nix, you can see this quite clearly – there are literally dozens of conditional compilation options for the various platforms.
On MS-DOS, on the other hand, if your app worked on an IBM PC, your app worked on a Compaq. Because of the effort put forward to ensure upwards compatibility of applications, if your application ran on DOS 2.0, it ran on DOS 3.0 (modulo some minor issues related to FCB I/O). Because the platforms were almost identical, your app would continue to run. This commitment to platform stability has continued to this day – Visicalc from DOS 2.0 still runs on Windows XP.
This meant that you could target the entire ecosystem of IBM PC compatible hardware with a single test pass, which significantly reduced your costs. You still had to deal with the video and printer issue however.
Now along came Windows 1.0. It virtualized the video and printing interfaces, providing, for the first time, a consistent view of ALL the hardware on the computer, not just disk and memory. Now apps could write to one API and not worry about the underlying hardware. Windows took care of all the nasty bits of dealing with the various vagaries of hardware. This meant that you had an even more stable platform to test against than you had before. Again, this was a huge improvement for ISVs developing software – they no longer had to worry about the video or printing subsystems’ inconsistencies.
Windows still wasn’t an attractive platform to build on, since it had the same memory constraints as DOS had. Windows 3.0 fixed that, allowing for a consistent API that finally relieved the 640K memory barrier.
Fast forward to 1993 – NT 3.1 comes out providing the Win32 API set. Once again, you have a consistent set of APIs that abstracts the hardware and provides a stable API set. Win9x, when it came out, continued the tradition. Again, the API is consistent. Apps written to Win32s (the subset of Win32 intended for Windows 3.1) still run on Windows XP without modification. One set of development costs, one set of test costs. The platform is stable. With the Unix derivatives, you still had to either target a single platform or bear the costs of testing against all the different variants.
In 1995, Sun introduced its new Java technology to the world. Its biggest promise was that it would, like Windows, deliver platform stability. In addition, it promised cross-operating-system stability: if you wrote to Java, you’d be guaranteed that your app would run on every JVM in the world. In other words, it would finally provide application authors the same level of platform stability that Windows provided, and it would go Windows one better by providing that stability across multiple hardware and operating system platforms.
In his comment, Gosling is just expressing his frustration with the fact that Linux isn’t a completely stable platform. Since Java is supposed to provide a totally stable platform for application development, Java needs to smooth out the differences between operating systems, just as Windows needs to smooth out the differences between the hardware on the PC.
The problem is that Linux platforms AREN’T totally stable. The kernel might be the same on all distributions (and it’s not, since different distributions use different versions of the kernel), but the other applications that make up the distribution might not be. Java needs to be able to smooth out ALL the differences in the platform, since its bread and butter is providing a stable platform. If some Java facilities require things outside the basic kernel, then Java has to deal with all the vagaries of the different versions of those external components. As Gosling commented, “They are all close, but they are not the same.” These differences aren’t that big a deal for someone writing an open source application, since the open source methodology fights against packaged software development. Think about it: how many non-open-source software products can you name that are written for open source operating systems? What distributions do they support? Does Oracle support Linux distributions other than Red Hat Enterprise Linux? The reason there are so few is that the cost of development for the various “Linux” derivatives is close to prohibitive for most shrink-wrapped software vendors; instead they pick a single distribution and use that (thus guaranteeing themselves a stable platform).
For open source applications, the cost of testing and support is pushed from the developer of the package to the end user. It’s no longer the responsibility of the author of the software to guarantee that their software works on a given customer’s machine; since the customer has the source, they can fix the problem themselves.
In my honest opinion, platform stability is the single greatest thing that Microsoft’s monoculture has brought to the PC industry. Sure, there’s a monoculture, but that means that developers only have to write to a single API. They only have to test on a single platform. The code that works on a Dell works on a Compaq, works on a Sue’s Hardware Special. If an application runs on Windows NT 3.1, it’ll continue to run on Windows XP.
And as a result of the total stability of the platform, a vendor like Lotus can write a shrink-wrapped application like Lotus 1-2-3, sell it to millions of users, and be able to guarantee that their application will run the same on every single customer’s machine.
What this does is allow Lotus to reduce the price of their software product. Instead of a software product costing tens of thousands of dollars, software costs have fallen to the point where you can buy a fully featured word processor for under $130.
Without this platform stability, the testing and development costs go through the roof, and software costs escalate enormously.
When I started working in the industry, there was no volume market for fully featured shrink wrapped software, which meant that it wasn’t possible to amortize the costs of development over millions of units sold.
The existence of a stable platform has allowed the industry to grow and flourish. Without a stable platform, development and test costs would rise and those costs would be passed onto the customer.
Having a software monoculture is NOT necessarily an evil.
At the end of this blog entry, I mentioned that when I drop a new version of winmm.dll on my machine, I need to reboot it. Cesar Eduardo Barros asked:
Why do you have to reboot? Can't you just reopen the application that's using the dll, or restart the service that's using it?
It turns out that in my case, it’s because winmm is listed in the “KnownDLLs” for Longhorn. And Windows treats KnownDLLs as special – if a DLL is a KnownDLL, then it’s assumed to be used by lots of processes, and it’s not reloaded from disk when a new process is created – instead, the pages from the existing DLL are just remapped into the new process.
But that, and a discussion on an internal alias, got me thinking about DLLs in general. This also came up during my previous discussion about the DLL C runtime library.
At some point in the life of a system, you decide that you’ve got a bunch of code that’s being used in common between the various programs that make up the system.
Maybe that code’s only used in a single app – one app, 50 instances.
Maybe that code’s used in 50 different apps – 50 apps, one instance.
In the first case, it really doesn’t matter if you refactor the code into a separate library or not. You’ll get code sharing regardless.
In the second case, however, you have two choices – refactor the code into a library, or refactor the code into a DLL.
If you refactor the code into a library, then you’ll save in complexity because the code will be used in common. But you WON’T gain any savings in memory – each application will have its own set of pages dedicated to the contents of the shared library.
If, on the other hand you decide to refactor the library into its own DLL, then you will still save in complexity, and you get the added benefit that the working set of ALL 50 applications is reduced – the pages occupied by the code in the DLL are shared between all 50 instances.
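The working-set arithmetic is easy to sketch – a toy calculation with made-up numbers, assuming, hypothetically, that the shared code occupies 10 pages and 50 different processes use it:

```python
# Toy arithmetic: static library vs. DLL for 50 processes.
# The page counts are invented for illustration.
CODE_PAGES = 10   # pages of shared code (hypothetical)
PROCESSES = 50    # number of running processes using that code

# Static library: every executable carries its own copy of the code,
# and those copies can't be shared across different executables.
static_lib_total = CODE_PAGES * PROCESSES

# DLL: one copy of the code pages, mapped into all 50 address spaces.
dll_total = CODE_PAGES

print(static_lib_total)  # 500 pages
print(dll_total)         # 10 pages
```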
You see, NT's pretty smart about DLLs (this isn’t unique to NT, btw; most other operating systems that implement shared libraries do something similar). When the loader maps a DLL into memory, it opens the file and tries to map it into memory at its preferred base address. When this happens, memory management just says “the memory from this virtual address to this other virtual address should come from this DLL file”, and as the pages are touched, the normal paging logic brings them into memory.
If the DLL’s pages are already in memory (because another process has the DLL loaded at the same address), the system doesn’t go to disk to get the pages; it just remaps the existing pages into the new process. It can do this because the relocation fixups have already been applied (the relocation fixup table is basically a table within the executable that contains the address of every absolute jump in the code – when an executable is loaded in memory, the loader patches up these addresses to reflect the actual base address of the executable), so absolute jumps will work in the new process just like they would in the old. The pages are backed by the file containing the DLL – if a page containing the DLL’s code is ever discarded from memory, the system simply goes back to the DLL file to reload it.
If the preferred address range for the DLL isn’t available, then the loader has to do more work. First, it maps the pages from the DLL into the process at a free location in the address space. It then marks all the pages as copy-on-write so it can perform the fixups without dirtying the pristine copy of the DLL (it wouldn’t be allowed to write to the pristine copy anyway). It then applies all the fixups to the DLL, which creates a private copy of every page containing a fixup – and those pages can no longer be shared.
This causes the overall memory consumption of the system to go up. What’s worse, the fixups are performed every time the DLL is loaded at an address other than its preferred address, which slows down process launch.
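As a toy model of what the fixup pass does (nothing like the real PE loader – just the arithmetic): the relocation table lists the offsets of absolute addresses in the image, and each one gets adjusted by the difference between the actual and preferred base addresses.

```python
def apply_fixups(image, fixup_offsets, preferred_base, actual_base):
    """Patch absolute addresses in a toy 'image' (a list of ints)."""
    delta = actual_base - preferred_base
    patched = list(image)          # the real loader dirties the pages in place
    for off in fixup_offsets:
        patched[off] += delta      # every absolute address shifts by delta
    return patched

# A 3-word image holding two absolute pointers (at offsets 0 and 2) into itself.
image = [0x40010, 123, 0x40020]
print(apply_fixups(image, [0, 2], preferred_base=0x40000, actual_base=0x50000))
# [0x50010, 123, 0x50020] - the non-pointer word at offset 1 is untouched
```

If `actual_base` equals `preferred_base`, `delta` is zero and no page is ever written – which is exactly why a DLL loaded at its preferred address stays shareable.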
One way of looking at it is to consider the following example. I have a DLL. It’s a small DLL; it’s only got three pages in it. Page 1 is data for the DLL, page 2 contains resource strings for the DLL, and page 3 contains the code for the DLL. Btw, DLLs this small are, in general, a bad idea. I was recently enlightened by some of the Office guys as to exactly how bad this is; at some point I’ll write about it (assuming that Raymond or Eric don’t beat me to it).
The DLL’s preferred base address is at 0x40000 in memory. It’s used in two different applications. Both applications are based starting at 0x10000 in memory, the first one uses 0x20000 bytes of address space for its image, the second one uses 0x40000 bytes for its image.
When the first application launches, the loader opens the DLL and maps it at its preferred address. It can do this because the first app only uses the address space between 0x10000 and 0x30000 for its image. The pages are marked according to the protections in the image – page 1 is marked copy-on-write (since it’s read/write data), page 2 is marked read-only (since it’s a resource-only page) and page 3 is marked read+execute (since it’s code). When the app runs and executes code in the 3rd page of the DLL, that page is brought into memory. The instant the DLL writes to its data segment, the first page of the DLL is forked – a private copy is made in memory and the modifications are made to that copy.
If a second instance of the first application runs (or another application runs that can also map the DLL at 0x40000), then once again the loader maps the DLL at its preferred address. And again, when the code in the DLL is executed, the code page is loaded into memory. And again, the page doesn’t have to be fixed up, so memory management simply maps the physical page that’s already in memory (from the first instance) into the new application’s address space. When the DLL writes to its data segment, a private copy is made of the data segment.
So we now have two instances of the first application running on the system. The DLL is consuming 4 pages (roughly; there’s overhead I’m not counting). Two of the pages are the code and resource pages. The other two are two copies of the data page, one for each instance.
Now let’s see what happens when the second application (the one that uses 0x40000 bytes for its image) launches. The loader can’t map the DLL at its preferred address (since the second application occupies 0x10000 through 0x50000). So the loader maps the DLL into memory at (say) 0x50000. Just like the first time, it marks the pages for the DLL according to the protections in the image, with one huge difference: since the code pages need to be relocated, they’re ALSO marked copy-on-write. And then, because it knows that it wasn’t able to map the DLL at its preferred address, the loader patches all the relocation fixups. These cause the page that contains the code to be written to, so memory management creates a private copy of the page. After the fixups are done, the loader restores the page protection to the value marked in the image. Now the code starts executing in the DLL. Since it’s been mapped into memory already (when the relocation fixups were done), the code simply executes. And again, when the DLL touches the data page, a new copy is created for the data page.
Once again, we start a second instance of the second application. Now the DLL’s using 5 pages of memory – there are two copies of the code page, one for the resource page, and two copies of the data page. All of which are consuming system resources.
One thing to keep in mind is that the physical memory page that backs the resource page in the DLL is kept in common among all the instances, since there are no relocations in the page, and the page contains no writable data – thus the page is never modified.
Now imagine what happens when we have 50 copies of the first application running. There are 52 pages in memory consumed by the DLL – 50 pages for the DLL’s data, one for the code, and one for the resources.
And now consider what happens if we have 50 copies of the second application running. Now we get 101 pages in memory, just from this DLL! We’ve got 50 pages for the DLL’s data, 50 pages for the relocated code, and still the one remaining for the resources. Twice the memory consumption, just because the DLL wasn’t rebased properly.
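The page accounting above can be sketched in a few lines (a toy model of our 3-page DLL, not a real memory manager): each instance gets a private copy of the data page, the resource page is always shared, and the code page is shared unless relocation forced private copies.

```python
def dll_pages(instances: int, relocated: bool) -> int:
    """Toy model of pages consumed by a 3-page DLL (data/resources/code)."""
    data = instances                       # data is copy-on-write: one copy per instance
    resources = 1                          # never written, never relocated: always shared
    code = instances if relocated else 1   # relocation forces a private copy per instance
    return data + resources + code

print(dll_pages(2, relocated=False))    # 4   - two copies of the first app
print(dll_pages(2, relocated=True))     # 5   - two copies of the second app
print(dll_pages(50, relocated=False))   # 52
print(dll_pages(50, relocated=True))    # 101
```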
This increase in physical memory isn’t usually a big deal when it happens only once. If, on the other hand, it happens a lot, and you don’t have the physical RAM to accommodate it, then you’re likely to start to page. And that can result in “significantly reduced performance” (see this entry for details of what can happen if you page on a server).
This is why it's so important to rebase your DLLs – it guarantees that the pages in your DLL will be shared across processes. This reduces the time needed to load your process, and means your process working set is smaller. For NT, there’s an additional advantage – we can tightly pack the system DLLs together when we create the system. This means that the system consumes significantly less of the application’s address space. And on a 32-bit processor, application address space is a precious commodity (I never thought I’d write that an address space that spans 2 gigabytes would be considered a limited resource, but...).
This isn’t just restricted to NT, by the way. Exchange has a script that’s run on every build that knows which DLLs are used in which processes, and it rebases the Exchange DLLs so that they fit into unused slots regardless of the process in which the DLL is used. I’m willing to bet that SQL Server has something similar.
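A script like that could be sketched along these lines – a toy version, assuming (hypothetically) that we know each DLL’s size and simply pack preferred base addresses downward from a chosen ceiling, rounding each size up to the 64K allocation granularity the way real rebasing tools do:

```python
GRANULARITY = 0x10000  # 64K allocation granularity

def assign_bases(dll_sizes, ceiling=0x78000000):
    """Assign non-overlapping preferred base addresses, packing downward.

    dll_sizes: dict of DLL name -> image size in bytes (hypothetical inputs).
    """
    bases = {}
    next_top = ceiling
    for name, size in dll_sizes.items():
        # round the image size up to the allocation granularity
        rounded = (size + GRANULARITY - 1) // GRANULARITY * GRANULARITY
        next_top -= rounded
        bases[name] = next_top
    return bases

bases = assign_bases({"a.dll": 0x24000, "b.dll": 0x8000})
print(hex(bases["a.dll"]))  # 0x77fd0000
print(hex(bases["b.dll"]))  # 0x77fc0000
```

Because every DLL gets a non-overlapping preferred range, none of them ever needs to be relocated at load time, so their code pages stay shared.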
Credits: Thanks to Landy, Rick, and Mike for reviewing this for technical accuracy (and hammering the details through my thick skull). I owe you guys big time.
Many years ago, Valorie and I gave money to a gun-control initiative here in Washington State (a friend and former boss was heavily involved in the campaign). It went down in flames, but not before I got put on the Democratic Party's mailing list as a likely donor (although I don't entirely understand why donating to a gun-control initiative necessarily marks me as a Democrat).
Well, about 6 months later, at dinner time, the phone rings. For whatever reason, I answered it.
It was Jay Inslee, our congressman, asking for money. I like Jay; I think he's done a pretty good job in Washington (I also liked his Republican predecessor, but who's counting).
Before he got very far into his pitch, I cut him off (yeah, I'm rude like that - cutting off a U.S. Congressman, but w.t.h. he interrupted my dinner).
“Jay, if you ever call my house again, I'm immediately going to make a donation to your opponent.”
He's never called my house again, neither has any other candidate.
Someone asked on an internal mailing list why the documentation of security impersonation levels has the following quote:
When the named pipe, RPC, or DDE connection is remote, the flags passed to CreateFile to set the impersonation level are ignored. In this case, the impersonation level of the client is determined by the impersonation levels enabled by the server, which is set by a flag on the server's account in the directory service. For example, if the server is enabled for delegation, the client's impersonation level will also be set to delegation even if the flags passed to CreateFile specify the identification impersonation level.
The reason’s actually fairly simple: The CIFS/SMB protocol doesn’t have the ability to track the user’s identity dynamically (this is called Dynamic Quality of Service or Dynamic QOS). As a result, the identity of the user performing an operation on a networked named pipe is set when the pipe is created, and is essentially fixed for the lifetime of the pipe.
If the application impersonates another user’s token after opening the pipe, the impersonation is ignored (because there’s no way of informing the server that the user’s identity has changed).
Of course, if you’re impersonating another user when you call CreateFile, then that user’s identity will be used when opening the remote named pipe, so you still have some ability to impersonate other users; it’s just not as flexible as it could be.
I don’t normally do “me too” posts, but Robert Scoble posted this link to some truly amazing pictures of SpaceShipOne’s first flight and I wanted to share :)
In yesterday’s post, there was one huge, glaring issue: It completely ignored internationalization (or i18n in “internet-lingo”).
The first problem occurs in the very first line of the routine:
if (string1.Length != string2.Length)
The problem is that two strings can be equal even if their lengths aren’t equal! My favorite example is the German sharp-s character, which appears in the German word straße. The length of straße is 6 characters, and the length of the upper-case form of the word (STRASSE) is 7 characters. These strings are considered equal in German, even though the lengths don’t match.
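You can see the same effect outside of .NET – for instance, Python’s str.upper applies the full Unicode case mapping, so the sharp-s expands to “SS” and the length changes (Python here is just a convenient stand-in for the managed example):

```python
s = "straße"
upper = s.upper()      # full Unicode case mapping turns ß into SS

print(len(s))          # 6
print(upper)           # STRASSE
print(len(upper))      # 7 - same word, different length
```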
The next problem occurs 2 lines later:
string upperString1 = string1.ToUpper();
string upperString2 = string2.ToUpper();
The call to ToUpper() doesn’t specify a culture. Without a specific culture, String.ToUpper uses the current culture, which may or may not be the culture that created the string. This can create some truly unexpected results, like the way that Turkish treats the letter “I” as Tim Sneath pointed out in this article.
Even if you call String.ToUpper() BEFORE making the length check, it’s still not correct. If you call String.ToUpper() on a sharp-s character, you get a sharp-s character back. So the lengths of the strings don’t change when you upper-case them.
The good news is that the .NET Framework provides a culture-neutral culture, System.Globalization.CultureInfo.InvariantCulture. The invariant culture is similar to English, but it’s not associated with any region, so the rules are consistent regardless of the current UI culture. This allows you to avoid the security bug that Tim pointed out in his article, and gives you deterministic results. This is particularly important if you’re comparing strings against constants (for example, when checking arguments to a function). If you call String.Compare(userString, “file:”, true) without specifying the culture, then, as Tim pointed out, if userString contains one of the Turkish ‘I’ characters, you won’t match correctly. If you use the invariant culture, you will.
As an interesting piece of trivia that I learned while writing the test app for this ‘blog entry: if you use System.String.Compare(“straße”, “STRASSE”, true, System.Globalization.CultureInfo.InvariantCulture), it reports the two strings as equal :)
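The same kind of locale-independent, case-insensitive match can be illustrated with Python’s str.casefold, which exists precisely for caseless comparison (again, Python as a stand-in for the .NET call):

```python
def caseless_equal(a: str, b: str) -> bool:
    # casefold() does an aggressive, locale-independent case mapping;
    # it maps the sharp-s to "ss", which plain lower() does NOT
    # ('ß'.lower() is still 'ß').
    return a.casefold() == b.casefold()

print(caseless_equal("straße", "STRASSE"))  # True
```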
Btw, in case it wasn’t obvious, the exact same issue exists in unmanaged code (I chose managed code for the example because it was simpler). If you use lstrcmpi to compare two strings, you’ll have exactly the same internationalization problem. The lstrcmpi routine compares strings in the system locale, which is not likely to be the locale you want. To get the invariant locale, you want to use CompareStringW, specifying the LOCALE_INVARIANT locale and NORM_IGNORECASE. Please note that LOCALE_INVARIANT isn’t described on the CompareString API page; it’s documented in the table of language identifiers and in the MAKESORTLCID macro documentation.
The bottom line: Internationalization is HARD. You can follow some basic rules to make yourself safe, but…
Now for kudos:
Francois’ answer was the most complete of the responses in the forum; he correctly pointed out that my routine compares only code points (the individual characters) and not the strings. He also picked up on the culture issue.
Mike Dunn’s answer was the first to pick up on the first error; he correctly realized that you can’t do the early test to compare lengths.
Carlos pointed out the issue of characters that form ligatures, like the Unicode combining diacritical characters (props to charmap for giving me the big words that people use to describe things). The sharp-s character is another example. In English, the ligatures used for fi, fl, ff, and ffi are yet another example (Unicode defines compatibility code points for these ligatures in the Alphabetic Presentation Forms block; they also show up in some system fonts).
Anon pointed out the Turkish i/I issue.
Sebastien Lambla pointed out the issue of invariant cultures explicitly.
Several readers pointed out that I don’t check parameters for null. That was intentional (ok, not really, but I don’t think it’s wrong). In my mind, passing null strings to the routine is an error on the part of the caller, and the only reasonable answer is for the routine to throw an exception. Checking the parameters for null would just let me throw a different exception. The routine could also have been defined so that a non-null string compared with a null string was unequal, but that’s not what I chose (this is fundamentally stricmp()).
Also, some people believed that the string comparison routine should ignore whitespace; I’m not sure where that came from :)
Edit: Fixed stupid typo in the length of german strings. I can't count :)
Edit2: tis->this. It's not a good day :)