# April, 2008

• #### Book me the next flight to Tokyo, no wait, the second flight

I know someone who has a brother who works in Taipei, Taiwan. He travels to Japan often on business, and one day he needed to fly to Tokyo on short notice. He instructed his assistant to book the next flight from Taipei to Tokyo.

This is what he got. (Warning: Contains strange music.)

That's right. His assistant booked him on Hello Kitty Air, initially a daily flight from Taipei to Fukuoka, but soon extended to a second run to Tokyo. Everything on this plane is Hello Kitty. The paint scheme, the flight attendants, the boarding passes, the luggage tags, the chopsticks, the sugar packets, the in-flight meals, even the barf bags. I'm told that most people who take this flight are women who are way too into this Hello Kitty thing.

Anyway, he learned his lesson. Now when he has an urgent meeting in Tokyo, he tells his assistant, "Book me the next flight to Tokyo, but not the Hello Kitty one."

This story made me wonder what a "guy version" of the Hello Kitty plane would be. When I brought it up with my colleagues, we quickly agreed that it would be a Star Wars-themed airplane. But what would be the in-flight movie? (And which Leia will the female flight attendants be dressed as?)

Sidebar: There's also a Hello Kitty MMORPG. (Warning: Contains annoyingly cute music.)

• #### Some other places atoms (and the magical 0xC000) arise

The moment the Windows developers got a system for converting strings into numbers, they could use it anywhere they need to, well, convert a string into a number. Sometimes these integers are officially declared as atoms, but most of the time they are just integers that happen to be atoms under the covers.

I'll start with registered window messages, created by the `RegisterWindowMessage` function. These are not officially atoms; they are just integers that happen to lie in the range `0xC000` to `0xFFFF`, just like atoms. But yeah, internally, they're atoms. Of course, you shouldn't rely on it since it's not contractual. Think of it as a fantastic coincidence.

Registered clipboard formats created by the `RegisterClipboardFormat` function are also not officially atoms; they're just `UINT`s. The numeric range for registered clipboard formats isn't even specified; that they hang out in the `0xC000` range is just an implementation detail. Someday, registered clipboard formats may have values like `0x1234`, who knows.

Window properties are also stored in the form of atoms, but unlike the other examples above, the atomic nature of window properties is contractual. You can set a property either by passing the property name `SetProp(hwnd, TEXT("PropertyName"), hData)` or by passing the property atom `SetProp(hwnd, MAKEINTATOM(atm), hData)`, where `atm` was obtained from an earlier call to `GlobalAddAtom`. There is additional weirdness with the way these atoms are tracked, which I'll defer to Friday's article, though it is hinted at in the documentation for `SetProp` which cautions that you need to remove all the properties from a window before it is destroyed.

Window classes also have atoms. The return value of the `RegisterClass` function is an `ATOM`, and you can also retrieve the atom later by calling `GetClassWord(hwnd, GCW_ATOM)` on a window of that class. We'll see more about that atom next time.

• #### Racking up the frequent shopper points at the register office

In Scotland, a 24-year-old woman got married for the fourth time. The first three ended under unusual circumstances. Let's see, for starters, marriage number one ended when her husband ran off to marry her mom, and the woman even served as a bridesmaid at her mother's wedding. Oh, and a musical thong was also involved.

When I read this article, I thought, "Certainly the UK has a counterpart to The Jerry Springer Show, doesn't it?" It appears that I am not disappointed.

• #### Why do atoms start at 0xC000?

There are two types of atoms, so-called integer atoms, which are just small integers, and, um, atoms that don't have a special adjective; these plain vanilla atoms come from functions like `AddAtom`. (For the purpose of this discussion, I'll call them string atoms.) The atom zero is invalid (`INVALID_ATOM`); atoms 1 through `MAXINTATOM-1`† are integer atoms, and atoms from `MAXINTATOM` through `0xFFFF` are string atoms. Why is the value of `MAXINTATOM` `0xC000`?

The reason has its roots in 16-bit Windows. Atoms are kept in a, well, atom table. The details of the atom table aren't important aside from the fact that the nodes in the atom table are allocated from the heap, and each node corresponds to one string in that atom table.

Since we're working in 16-bit Windows, the pointers in the atom table are 16-bit pointers, and all memory blocks in the 16-bit heap are 4-byte aligned. Alignment on 4-byte boundaries means that the bottom two bits of the address are always zero. Therefore, there are only 14 bits of value in the node pointer. Take that pointer, shift it right two places, and set the top two bits, and there you have your atom. Conversely, to convert an atom back to a node, strip off the top two bits and shift the result left two places.
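The shift-and-mask dance described above can be sketched in a few lines of C. This is only an illustration of the arithmetic, not the actual 16-bit Windows source: the type names `NODEPTR16` and `ATOM16` and the helper names `atom_from_node` and `node_from_atom` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for illustration; not taken from any Windows header. */
typedef uint16_t NODEPTR16; /* 16-bit pointer to an atom table node */
typedef uint16_t ATOM16;    /* 16-bit atom value */

/* Encode: shift the 4-byte-aligned pointer right two places,
   then set the top two bits. */
static ATOM16 atom_from_node(NODEPTR16 node)
{
    assert((node & 0x3u) == 0); /* bottom two bits of an aligned pointer are zero */
    return (ATOM16)((node >> 2) | 0xC000u);
}

/* Decode: strip off the top two bits, then shift left two places. */
static NODEPTR16 node_from_atom(ATOM16 atom)
{
    return (NODEPTR16)((atom & 0x3FFFu) << 2);
}
```

Under these assumptions, a node at `0x1234` becomes the atom `0xC48D`, and decoding `0xC48D` gives back `0x1234`. A node at `0x0000` would become `0xC000`, the smallest possible string atom.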

Why encode the pointer this way? Well, you have 14 bits of information and you want to return a 16-bit value. You have two bits to play with, so your decisions are where to put those bits and what values they should have. It'd be convenient if all the integer atoms were contiguous and all the string atoms were contiguous, to make range checking easier. Now you're down to two options. You have a 49152-value range of integer atoms and a 16384-value range of string atoms. Either you put the integer atoms at the low end (`0x0000-0xBFFF`) and the string atoms at the high end (`0xC000-0xFFFF`), or you put the string atoms at the low end (`0x0000-0x3FFF`) and the integer atoms at the high end (`0x4000-0xFFFF`). You probably don't want zero to be a valid string atom, since that's the most likely value for an uninitialized atom variable, so putting the string atoms at the top of the range wins out.
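Here is a small sketch of how cheap the range check becomes once the two kinds of atoms are contiguous. The names `AtomKind`, `classify_atom`, and `MAXINTATOM_VALUE` are invented for this example (the last one just mirrors the value of `MAXINTATOM` so the sketch doesn't depend on `windows.h`); this is not how any particular version of Windows is known to implement the check.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the value of MAXINTATOM; renamed to keep the sketch header-free. */
#define MAXINTATOM_VALUE 0xC000u

/* The two ranges together cover the whole 16-bit space. */
static_assert(MAXINTATOM_VALUE == 49152u, "low range (including zero) has 49152 values");
static_assert(0x10000u - MAXINTATOM_VALUE == 16384u, "string atoms have 14 bits of values");

/* Hypothetical classification for illustration. */
typedef enum {
    ATOM_KIND_INVALID, /* zero */
    ATOM_KIND_INTEGER, /* 1 through MAXINTATOM-1 */
    ATOM_KIND_STRING   /* MAXINTATOM through 0xFFFF */
} AtomKind;

static AtomKind classify_atom(uint16_t atom)
{
    if (atom == 0) {
        return ATOM_KIND_INVALID; /* likely an uninitialized atom variable */
    }
    if (atom < MAXINTATOM_VALUE) {
        return ATOM_KIND_INTEGER;
    }
    return ATOM_KIND_STRING;
}
```

With string atoms at the top of the range, a single comparison against `0xC000` separates the two kinds, and a zero value is never mistaken for a string atom.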

Now, with the conversion to Win32, the old implementation of atoms was thrown out. Atoms are no longer encoded pointers, but the new implementation still had to adhere to the breakdown of the 16-bit atom space into integer atoms and string atoms.

Over the next few entries, we'll take a look at other consequences of the way string atoms are assigned and surprising things you can do with atoms. (Not necessarily good things, but still surprising.)

Footnotes

†The `MAXINTATOM` symbol adheres to the classic Hungarian convention that "max" is one more than the highest legal value.

• #### In search of: Rossio Train Station

While wandering around the central Lisbon area, I remember walking past an ornate building that had people streaming out of it. As I neared the entrance, I realized, "Oh, it's the train station."

And then some days later, I wanted to catch the train to Sintra, but forgot exactly where the Rossio Train Station was. "It's at the south end of this square, or was it that square? I remember it was on the right hand side as you head south..."

The guide book said that it was right off of Rossio Square, but it's a big square, and, truth be told, I had all my squares confused for a long time. I mistook the neighboring Praça da Figueira for Rossio Square because the first time I got off the Metro at the Rossio stop, I emerged in that square. From then on, I was disoriented without even realizing it.

I popped into a clothing store on (the real) Rossio Square and asked the young woman working there if she could tell me where the train station was. My Portuguese is of course effectively nonexistent (save for the social niceties like Olá and Obrigado), but the guide book did have a picture of the train station. It didn't help. She didn't know where the train station was.

Turns out the train station was just a few steps away. So much for asking the locals how to get around.

The Rossio Train Station is off the northwest corner of the square, on the tiny street connecting Rossio to Restauradores. In fact, the closest Metro stop to the Rossio Train Station is not Rossio; it's Restauradores.

• #### Why does clearing document history also clear Run history?

Commenter John Topley wondered why clearing document history also clears Run history. Simple: Because people wanted it to.

Originally, the button merely erased your recent document history, but with the increasing paranoia over what records the computer kept on your past usage, people began to expect more and even filed beta bugs saying, "I cleared my document history, but I went to Location X and the names of documents I used recently were still visible."

I guess these people were afraid of being raided by the F*I, or more likely (but nobody will admit to it) by their spouse.

• #### News flash: Universities are more likely to admit students whose parents gave lots of money

Earlier, we saw that alumni give money to universities in order to increase the likelihood that the university will admit their children. Today, we learn that the tactic works. The children of big-donor alumni are more likely to be accepted. In fact, you don't even have to be the child of alumni. Just make sure your parents give lots of money.

• #### On the relationship between specific cases and general cases

One of the frustrations I have is people who don't see the relationship between specific cases and general cases.

Statement: In general, you can't do X.

Refutation: You're wrong. Y can do X.

Example: The statement "In general, you can't trust the return address" comes with the refutation "You're wrong. .NET trusts the return address." (Sometimes, it's not so much a refutation as just the assumption that the .NET folks are somehow stupid for not understanding this general rule.)

If I use this same argument in a different context, the fallacy becomes more clear.

Statement: In general, you can't sell cigarettes to people who appear to be underage without checking their ID.

Refutation: You're wrong. A bar can sell cigarettes to anyone.

The flawed refutation introduces new assumptions, namely that you're in the special case of being in a bar. But the bar has its own restrictions, namely that you have to be at least 21 years old to enter. Therefore, they don't need to check your ID because they can assume that everybody in the bar is already at least 21 years old.

Similarly, .NET can trust the return address since it doesn't let programs modify arbitrary memory. Memory access must first pass the verifier before the CLR will run it. (Of course, you can access arbitrary memory by using unsafe code or p/invoke, but you have to be granted full trust in order to do it, so you haven't actually accomplished anything.)

In the general case, however, a called function does not have control over what other code is running in the same process.

A related source of frustration comes from people who treat all statements as absolute metaphysical truths, even if they are clearly generalizations once you accept that people have free will. I've gotten in the habit of inserting weasel words like typically or generally, but not everybody remembers to do this in the comments, exposing themselves to nitpicky responses from other commenters. Welcome to my world.

• #### Take Our Daughters and Sons to Work today, unless you work at main Microsoft campus, in which case, wait until summer

Today is the fourth Thursday of April, which is national Take Our Daughters and Sons to Work Day. The main Microsoft campus is not participating today, but there's a good reason for this.

The Washington State Assessment of Student Learning, better known as the WASL (pronounced "WAH-s'l"), is a four-day battery of standardized tests administered to all elementary school and high school students starting from grade 3. You can download sample tests and answer keys for the reading and mathematics sections of the test to see whether you would have passed high school had you gone to school in Washington. (Sample questions for other grades are also available, including calibration samples for the writing test.)

The problem is that the WASL is administered... at the end of April. Elementary schools have some discretion in choosing exactly which days they administer the test, and if they happen to choose a date that conflicts with Take Our Daughters and Sons to Work Day, then children in that school district are unable to participate in the event. Microsoft employees indicated that they would prefer to have the event in the summer to avoid the schedule conflict.

• #### User interface code + multi-threaded apartment = death

There are single-threaded apartments and multi-threaded apartments. Well, first there were only single-threaded apartments. No wait, let's try that again.

First, applications had only one thread. Remember, 16-bit Windows didn't have threads. Each process had one of what we today call a thread, end of story. Compatibility with this ancient model still exists today, thanks to the dreaded "main" threading model. The less said about that threading model the better.

OLE was developed back in the 16-bit days, so it used window messages to pass information between processes, there being no other inter-process communication mechanism available. When you initialized OLE, it created a secret `OleMainThreadWnd` window, and those secret windows were used to communicate between processes (and in Win32, threads). As we learned some time ago, window handles have thread affinity, which means that these communication windows have thread affinity, which means that OLE has thread affinity. When you made a call to an object that belonged to another apartment, OLE posted a message to the owner thread's secret `OleMainThreadWnd` window to tell it what needs to be done, and then it went into a private message loop waiting for the owner thread to do the work and post the results back.

Meanwhile, the OLE team realized that there were really two parts to what they were doing. There was the low-level object and interface management stuff (`IUnknown`, `CoMarshalInterThreadInterfaceInStream`) and the high-level "object linking and embedding" stuff (`IOleWindow`, `IOleDocument`) that was the impetus for the OLE effort in the first place. The low-level stuff got broken out into a functional layer known as COM; the high-level stuff kept the name OLE.

Breaking the low-level and high-level stuff apart allowed the low-level stuff to be used by non-GUI programs, which for quite some time were eyeing that object management functionality with some jealousy. As a result, COM grew two personalities, one focused on the GUI customers and another focused on the non-GUI customers. For the non-GUI customers, additional functionality such as multi-threaded apartments was added, and since the customers didn't do GUI stuff, multi-threaded apartments weren't burdened by the GUI rules. They didn't post messages to communicate with each other; they used kernel objects and `WaitForSingleObject`. Everybody wins, right?

Well, yes, everybody wins, but you have to know what side your bread is buttered on. If you initialize a GUI thread as a multi-threaded apartment, you have violated the assumptions under which multi-threaded apartments were invented! Multi-threaded apartments assume that they are not running on GUI threads since they don't pump messages; they just use `WaitForSingleObject`. This not only clogs up broadcasts, but it can also deadlock your program. The thread that owns the object might try to send a message to your thread, but your thread can't receive the message since it isn't pumping messages.

That's why COM objects involved with user interface programming nearly always require a single-threaded apartment and why `OleInitialize` initializes a single-threaded apartment. Because multi-threaded apartments were designed on the assumption that there was no user interface. Once you're doing user interface work, you have to use a single-threaded apartment.
