
Signs that the symbols in your stack trace are wrong

One of the things programmers send to each other when they are trying to collaborate on a debugging problem is stack traces. Usually something along the lines of "My program does X, then Y, then Z, and then it crashes. Here is a stack trace. Can you tell me what's wrong?"

It helps if you at least glance at the stack trace before you send it, because there are often signs that the stack trace you're about to send is completely useless because the symbols are wrong. Here's an example:

We are testing our program and it gradually grinds to a halt. When we connect a debugger, we find that all of our threads, no matter what they are doing, eventually wind up hung in kernel32!EnumResourceLanguagesA. Can someone explain why that function is hanging, and why it seems all roads lead to it?

   0  Id: 12a4.1468 Suspend: 1 Teb: 000006fb`fffdc000 Unfrozen
kernel32!EnumResourceLanguagesA+0xbea00
kernel32!EnumResourceLanguagesA+0x2b480
bogosoft!CObjMarker::RequestBlockForFetch+0xf0
...

   1  Id: 12a4.1370 Suspend: 1 Teb: 000006fb`fffda000 Unfrozen
kernel32!EnumResourceLanguagesA+0xbea00
kernel32!EnumResourceLanguagesA+0x2b480
bsnetlib!CSubsystem::CancelMain+0x90

   2  Id: 12a4.1230 Suspend: 1 Teb: 000006fb`fffd8000 Unfrozen
NETAPI32!I_NetGetDCList+0x117e0
kernel32!EnumResourceLanguagesA+0x393a0
ntdll!LdrResFindResource+0x58b20
...

   3  Id: 12a4.cc0 Suspend: 1 Teb: 000006fb`fffd6000 Unfrozen
kernel32!EnumResourceLanguagesA+0xa80
bsnetlib!BSFAsyncWait+0x190
...

  4  Id: 12a4.1208 Suspend: 1 Teb: 000006fb`fffd4000 Unfrozen
kernel32!EnumResourceLanguagesA+0xbea00
kernel32!EnumResourceLanguagesA+0x2b480
bogosoft!TObjList<DistObj>::Get+0xb0

  5  Id: 12a4.1538 Suspend: 1 Teb: 000006fb`fffae000 Unfrozen
kernel32!EnumResourceLanguagesA+0xbf3d0
kernel32!EnumResourceLanguagesA+0x2c800
bsnetlib!Tcp::ReadSync+0x340
...

   6  Id: 12a4.16e0 Suspend: 1 Teb: 000006fb`fffac000 Unfrozen
ntdll!LdrResFindResource+0x61808
ntdll!LdrResFindResource+0x1822a0
kernel32!EnumResourceLanguagesA+0x393a0
ntdll!LdrResFindResource+0x58b20 
...

This stack trace looks suspicious for a variety of reasons.

First of all, look at that offset EnumResourceLanguagesA+0xbea00. It's unlikely that the EnumResourceLanguagesA function (or any other function) is over 750KB in size, as this offset suggests.

Second, it's unlikely that the EnumResourceLanguagesA function (or any other function, aside from obvious cases like tree walking) is recursive. And it's certainly unlikely that a huge function will also be recursive.

Third, it seems unlikely that the EnumResourceLanguagesA function would call NETAPI32!I_NetGetDCList. What does enumerating resource languages have to do with getting a DC list?

Fourth, look at those functions that are allegedly callers of EnumResourceLanguagesA: bogosoft!CObjMarker::RequestBlockForFetch, bsnetlib!CSubsystem::CancelMain, bsnetlib!Tcp::ReadSync. Why would any of these functions want to enumerate resource languages?

These symbols are obviously wrong. The huge offsets are present because the debugger has access only to exported functions, and it's merely showing you the name of the nearest symbol, even though it has nothing to do with the actual function. It's just using the nearest signpost it can come up with. It's like if somebody gave you directions to the movie theater like this: "Go to city hall downtown and then go north for 35 miles." This doesn't mean that the movie theater is in the downtown district or that the downtown district is 35 miles long. It's just that the person who's giving you directions can't come up with a better landmark than city hall.
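To make the "nearest signpost" behavior concrete, here is a minimal sketch of how a symbolizer that knows only exported names might label an address. The export table, its addresses, and the 64KB threshold are all invented for illustration; they are not real kernel32 values, and this is not how any particular debugger is implemented.

```python
import bisect

# Hypothetical export table: (address, name), sorted by address.
# These addresses are made up for the example.
EXPORTS = [
    (0x1000, "EnumResourceLanguagesA"),
    (0x200000, "SomeLaterExport"),
]

def symbolize(address, exports=EXPORTS):
    """Label an address with the nearest export at or below it."""
    addresses = [addr for addr, _ in exports]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return hex(address)  # no export precedes this address
    base, name = exports[index]
    return f"{name}+{address - base:#x}"

def offset_looks_suspicious(offset, limit=0x10000):
    """Rule of thumb (an assumption): few functions are larger than 64KB."""
    return offset > limit

print(symbolize(0x1000 + 0xBEA00))       # EnumResourceLanguagesA+0xbea00
print(offset_looks_suspicious(0xBEA00))  # True
```

The address in the example sits far inside unexported code, yet it still gets labeled with the last export before it, which is exactly the pattern in the stack traces above.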

This is just another case of the principle that you have to know what's right before you can see what's wrong. If you have no experience with good stack traces, you don't know how to recognize a bad one.

Oh, and even though the functions in question are in kernel32, you can still get symbols for that DLL with the help of the Microsoft Symbol Server.

Posted by oldnewthing | 11 Comments

The day the coffee machine exploded

Some time ago, Microsoft began installing Starbucks coffee makers in the kitchens, and caffeine addicts waited anxiously for the machines to reach their building. Or at least that's what happened on the main Redmond campus. But what about the satellite offices?

I'm told that each satellite office qualified for an iCup machine when the number of employees at the office reached some magic value. One of my colleagues who works at the office in New York City told me that they eagerly awaited the arrival of the machine when they learned that they reached that threshold. The long-anticipated day arrived: The coffee machine was installed in the kitchen.

And it exploded.

Okay, it didn't really explode. But the receptacle for holding the spent grounds overflowed and burst, spilling its guts out onto the kitchen floor. If you didn't know what happened, you'd have thought it had exploded.

The reason it exploded was that, although the New York office is rather small, it does have a very high number of visitors. As you can imagine, clients pay visits to the New York offices for meetings, presentations, all that stuff that clients visit offices for; but the underlying algorithm for determining how many coffee machines each office receives doesn't take into account how many visitors each location receives.

Oh, and happy Guy Fawkes Day. Try not to blow up any coffee machines.

Posted by oldnewthing | 18 Comments

In the product end game, every change carries significant risk

One of the things I mentioned in my talk the other week comparing school with Microsoft is that in school, as the deadline approaches, the work becomes increasingly frantic. On the other hand, in commercial software, as the deadline approaches, the rate of change slows down, because the risk of regression outweighs the benefit of the fix.

A colleague of mine offered up this example from Windows 3.1: To fix a bug in GDI, the developers made a very simple fix. It consisted of setting a global flag when a condition was detected and checking the flag in another place in the code and executing a few lines of code if it was set. The change was just a handful of lines, it was very tightly scoped, and it did not affect the behavior of GDI if the flag was not set. They tested the code, it fixed the problem, everything looked good. What could possibly go wrong?

A few days after the fix went in, the GDI team started seeing weird crashes that made no sense in code completely unrelated to the places where they made the change. What is going on?

After some investigation, they discovered a memory corruption bug. In 16-bit Windows, the local heap came directly after the global variables, and local heap memory was managed in the form of local handles. A common error when working with the local heap was using a local handle as a pointer rather than passing it to the LocalLock function to convert the handle to a pointer. The developers found a place where the code forgot to perform this conversion before using a local handle. (In Windows 3.1, most of GDI was written in assembly language, so you didn't have a compiler to do type checking and complain that you're using a handle as a pointer.) Using the handle as a pointer resulted in a global variable being corrupted.

Investigation of the code history revealed that this bug had existed in the code since the day it was first written. Why hadn't anybody encountered this bug before?

The handle that was being used incorrectly was allocated at boot time, so its value was consistent from run to run. The corruption took the form of writing a zero into memory at the wrong location, and it so happened that the variable that was accidentally being set to zero was not used often, and at the time the corruption occurred, it happened to have the value zero already.

Adding a new global variable shifted the other global variables around in memory, and now the accidental write of zero hit an important variable whose value was usually not zero.
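The mechanism can be modeled with a toy simulation. This is only a sketch: the variable names, their values, the slot index of the stray write, and the position of the new flag are all hypothetical, and real 16-bit data segments are of course not Python lists.

```python
# Hypothetical global variables and their usual values.
VALUES = {"important": 42, "rarely_used": 0, "new_flag": 0}

def clobbered(layout, stray_slot, values=VALUES):
    """Return the name of the variable damaged by a stray write of zero.

    layout lists the globals in memory order. stray_slot is the fixed
    location the buggy code writes to (the handle value misused as a
    pointer). Returns None when the write happens to change nothing.
    """
    victim = layout[stray_slot]
    return victim if values[victim] != 0 else None

# Before the fix: the stray zero lands on a variable that is already zero.
print(clobbered(["important", "rarely_used"], stray_slot=1))              # None

# After the fix: the new flag shifts everything by one slot, and the same
# stray write now lands on a variable whose value is usually nonzero.
print(clobbered(["new_flag", "important", "rarely_used"], stray_slot=1))  # important
```

The buggy write never moved. Only the thing sitting underneath it changed.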

In the product end game, every change carries significant risk. It's often a more prudent decision to live with the bug you understand than to fix it and risk exposing an even worse bug whose existence may not come to light until after you ship.

Posted by oldnewthing | 20 Comments

Good advice comes with a rationale so you can tell when it becomes bad advice

A customer asked for guidance in software design:

Is there an issue with creating and using COM objects from a UI thread which was initialized as STA? I have heard that it is a best practice to create and use COM objects on a background thread which is MTA. I would like to have some more information as to why. Any help?

(I still have trouble with the phrase best practice, especially when it is combined with the indefinite article: a best practice. It's like asking "Where is a tallest building?")

Good advice comes with a rationale so you can tell when it becomes bad advice. If you don't understand why something should be done, then you've fallen into the trap of cargo cult programming, and you'll keep doing it even when it's no longer necessary or even becomes deleterious.

In fact, you will find that if you try to follow this advice to the letter, most shell objects will stop working, because shell objects tend to require an STA. But in the absence of a rationale document or any other context, it's unclear what the scope of the original advice was. Maybe it makes sense in context, but right now it's just a statement with no discussion or rationale.

I asked the customer, "Can you provide the documents that made this recommendation? Perhaps this 'best practice' makes sense in context. Right now, it's just a bare recommendation with no discussion or rationale."

The customer never wrote back.

Posted by oldnewthing | 18 Comments

When asked to choose among multiple options, the politician will pick all of them

During the run-up to a local election some time ago, the newspaper posed the same set of questions to each of the candidates and published the responses in a grid format so the readers could easily compare them.

The candidates agreed on some issues, had opposing positions on others, but the question whose answers struck me was one of the form "If budget cuts forced you to eliminate one of the following four programs, which would you cut?"

  • Candidate 1: "I have no intention of letting our budget get into a situation in which this would become an issue. All of these programs are very important to our community, and under my leadership, they will continue to be funded."
  • Candidate 2: "I don't believe we need to eliminate any of these popular programs. If we review our financial situation, we will find that we can continue to provide for all of them."
  • Candidate 3: "Much as I personally enjoy Program X, it ranks as a lower priority to me than the other options. Program X was originally a community-run program, and I would encourage residents and the business community to step forward and keep alive this program which has greatly benefited our community over the years."

Notice that the first two candidates, when asked to make a tough decision, opted to make no decision at all. (Compare another election in which the mainstream candidates rated everything as high priority.) The first candidate said, "This would never happen." The second candidate said, "It's not happening." The third candidate is the only one who sat down and made the call to cut one of the programs. The first two were playing politics, afraid to make a decision for fear that it would alienate some portion of the electorate. The third understood the situation and made the hard decision.

I voted for the third candidate.

Today is Election Day in the United States. Don't forget to vote. (Void where prohibited.)

Posted by oldnewthing | 13 Comments

Microspeak: Net out

It started out in finance, but the term has crept into more mainstream usage (at least within Microsoft) and along the way picked up its own meaning:

Where did we net out on this?
Customers want you to net out the business value.
Note any significant changes to the forecast and explain the reasons why. Net out changes to start conversation.

Include the following points in your presentation:

  1. ...
  2. Net out action plan moving forward

The next citation is a bullet point from a PowerPoint slide:

Agenda
Each district/vertical will answer/report back on:

  • ...
  • Net out top 3 business asks

(I also have some finance citations, but they aren't relevant to Microspeak, so I've left them out.)

In finance, to net out is to cancel out positive and negative amounts. For example, you might net out an account by cancelling amounts owed against amounts due in order to eliminate offsetting transactions. When calculating tax liability, you net out your gains against your losses to determine your net change for the tax period.

In Microspeak, well, I'm not sure what it means. In that first citation, it appears to be a synonym for come to a conclusion. The question appears to be a rephrasing of "What was our conclusion on this?" or "What did we finally decide on this?"

In the second citation, it appears to be a synonym for summarize in terms of net benefit/loss. "Customers want you to show the net benefit of the product."

In the third citation, it appears to be used merely to mean summarize.

And in the final two citations, it appears to be simply a verb meaning to produce.

Note that net out is unrelated to that other Microspeak phrase net net, discussed earlier.

Posted by oldnewthing | 5 Comments

Hey, is there somebody around to accept this award?

Back in the late 1990s, some large Internet association conducted a survey in order to bestow awards in categories like Best Web server and Best Web browser, and one of the categories was Best Web authoring tool.

We didn't find out about this until the organization contacted the Windows team and said, "Hi, we would like to present Microsoft with the award for Best Web authoring tool. Please let us know who the author of Notepad is, so that we can invite them to the award ceremony."

Yup, Notepad won the award for Best Web authoring tool.

The mail went out to the team. "Hey, does anybody remember who wrote Notepad?"

Even a decade ago, the original authorship of Notepad was lost to the mists of time. I think the person who ended up going was the original author of the multi-line edit control, since that's where the guts of Notepad lie.

Posted by oldnewthing | 66 Comments

Still working out the finer details of how this Hallowe'en thing works

Here's an excerpt from a conversation on the subject of Hallowe'en which I had with my niece some time ago. Let's call her "Cathy". (This is a different Cathy from last time.)

"Cathy, what do you do on Hallowe'en?"

"You get all dressed up and people give you candy."

"What do you say when people come to the door?"

"Chuck-E-Cheese!"

Posted by oldnewthing | 31 Comments

What is the format for FirstInstallDateTime on Windows 95?

Public Service Announcement: Daylight Saving Time ends in most parts of the United States this weekend.

Windows 95/98/Me recorded the date and time at which Setup was run in the registry under HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion as a binary value named FirstInstallDateTime. What is the format of this data?

Take the binary value and treat it as a 32-bit little-endian value. The format of the value is basically DOS date/time format, except that the seconds are always 0 or 1 (usually 1), due to a programming error.
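As a rough sketch, decoding such a value might look like the following. It assumes the usual packed layout for a 32-bit DOS date/time, with the date in the high 16 bits and the time in the low 16 bits; the post does not spell this out, so treat the word order as an assumption. The function name and the sample bytes are invented for illustration.

```python
import struct

def decode_dos_datetime(raw):
    """Decode 4 little-endian bytes as a packed DOS date/time.

    Assumed layout: high word = date, low word = time.
    Returns (year, month, day, hour, minute, second).
    """
    (value,) = struct.unpack("<I", raw[:4])
    date_word = value >> 16
    time_word = value & 0xFFFF

    year = 1980 + (date_word >> 9)       # bits 9-15: years since 1980
    month = (date_word >> 5) & 0x0F      # bits 5-8
    day = date_word & 0x1F               # bits 0-4

    hour = time_word >> 11               # bits 11-15
    minute = (time_word >> 5) & 0x3F     # bits 5-10
    second = (time_word & 0x1F) * 2      # bits 0-4 count 2-second units

    return year, month, day, hour, minute, second

# Made-up sample bytes, not taken from a real registry.
print(decode_dos_datetime(bytes.fromhex("c173d924")))
# (1998, 6, 25, 14, 30, 2)
```

Given the programming error the post describes, the seconds field of a real FirstInstallDateTime carries no useful information, so only the date, hour, and minute are worth trusting.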

Exercise: What error would result in the seconds always being 0 or 1 (usually 1)?

[Update: Falcon is the first to post the correct answer.]

Posted by oldnewthing | 19 Comments

What this batch file needs is more escape characters

(Employing the snowclone "What this X needs is more Y.")

Each time you add a parsing pass to the batch processor, you have to add another layer of escaping. This is just a special case of the more general rule of thumb: any problem in quoting can be solved by adding another layer of escaping.

(Okay, it's not actually true, nor is it a rule of thumb, but it's still something to keep in mind.)

When you enable delayed variable expansion, you add another parsing pass to the batch processor. It used to expand % variables at the time the line is read, but now you've told it that, oh wait, just before executing the command, it should expand the line a second time (this time looking for ! variables).

Which means that if you want to echo an exclamation point, you have to protect the exclamation point so the parser won't treat it as a delayed expansion.

echo Error^^!

The ^^ collapses to a ^ during the first parsing pass. On the second parsing pass, the ^! turns into a !
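The two-pass collapse can be mimicked with a toy model. This is emphatically not the real cmd.exe parser, which has many more rules; it is a minimal sketch that assumes each pass does nothing except treat ^ as "take the next character literally". The function name is made up.

```python
def caret_pass(text):
    """One simplified parsing pass: ^ escapes the character after it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "^":
            out.append(next(chars, ""))  # keep the escaped character, drop the caret
        else:
            out.append(ch)
    return "".join(out)

after_first = caret_pass("Error^^!")
after_second = caret_pass(after_first)
print(after_first)   # Error^!
print(after_second)  # Error!
```

With only one caret, the first pass would already produce `Error!`, leaving a bare exclamation point for the delayed-expansion pass to trip over. That is why the extra layer of escaping is needed.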

Remember, the batch language was not designed; it evolved. I admire the approach taken by commenter Nick, in a tip of the hat to Douglas Adams:

Much like the universe, if anyone ever does fully come to understand Batch then the language will instantly be replaced by an infinitely weirder and more complex version of itself. This has obviously happened at least once before ;)
Posted by oldnewthing | 16 Comments

Warning: Not much useful content inside

Remember, this Web site is for entertainment purposes only. Sometimes it takes people a little while before they realize this:

I apologize for posting the link to the "Old New Thing" blog. [...] I have read a few articles in the "Old New Thing" blog and so far I have not seen much that is useful there.
Posted by oldnewthing | 20 Comments

Why does the Photo Gallery show all my photos with a colored tinge?

When you view your pictures with the Photo Gallery program which comes with Windows Vista, and which is also available for download from live.com, you might see a colored tinge. Where is the tinge coming from, and how do you get rid of it?

Ironically, what you're actually seeing is the absence of a tinge, but you got so used to seeing the tinge, your eyes established the tinge as the new baseline.

Not all display devices show exactly the same color when you ask them to display a particular RGB. The Windows Color System takes into account the color characteristics of output devices so that these variations can be compensated for when rendering to those devices. (Not that you could have figured this out from reading the official description, which just rambles for two paragraphs of marketing nonsense without actually saying what it does.) The goal is to make the color you see on the screen match the color that comes out on the printer, and have both match the color the person who created the image intended you to see.

If you don't want Windows to perform this color correction, open your Start menu and run the Color Management tool by typing its name into the Search box, or by hunting for it inside your Control Panel. Once you manage to launch it (by whatever means), go to your display device, check Use my settings for this device and then remove the color profile.

That was the tip. Now come da history.

The feature now known as the Windows Color System was introduced in Windows 95 under the name Independent Color Management. This explains why the color profile files have the *.icm extension.

But Independent Color Management was not the original name for the feature. The original name was Device-Independent Color, but the name was changed because the original name resulted in an unfortunate acronym that was lost on nobody. When Device-Independent Color was being written, one of the programmers in the user interface group reviewed the work in progress and sent an update to the rest of the team. She wrote, "I just looked at David's DIC, and (since I know you're all going to ask)... it looks good."

Posted by oldnewthing | 31 Comments

If aluminum pull tab redemption is a rumor, what happens to all the tabs?

Everybody should know by now that it is not true that pull tabs from aluminum cans can be redeemed for time on a dialysis machine. Of course, not everybody actually knows this, and then the next question is, well, what happens to all those pull tabs collected by misinformed people?

The Snopes article explains that it depends on where you turn in the tabs. They might get recycled at the going scrap rate and the proceeds donated to the National Kidney Foundation or the Ronald McDonald House. But I was most fascinated by this resourceful researcher who played the game of "follow the tabs" from a State Police office to a high school to a hospital to the Shriners to a library to a community college, where the trail finally runs cold.

I related this story to a friend of mine, who did a double-take. "Wait a second. That guy where the trail runs cold? I know that guy. I used to work for him!"

That the trail runs cold was hardly surprising to my friend. Apparently, my friend's former boss was the sort of person who would never admit that he made a mistake, no matter how obvious the error. My friend speculated that when his boss discovered that nobody would take the aluminum pull tabs, he certainly wasn't going to admit, "Oops, sorry everybody." In order to keep up appearances, he had to keep collecting them, even though he had nowhere to dispose of them. And of course, all the independent pull-tab collectors who couldn't find anybody to give them to gradually learned about this guy who will take them, which meant that more and more of them kept coming in. My friend figured, "He probably has an enormous pile of aluminum pull tabs just sitting in his garage."

Posted by oldnewthing | 25 Comments

Freudian typo: The accidental emoticon

Some time ago, I ran across the following Freudian typo in a mail thread discussing plans for the project after Milestone 3, commonly abbreviated M3.

I'd like to talk with you about your plans for this area after <3.

On the US-English keyboard layout, the M and comma keys are adjacent, and a shifted comma is a less-than sign. A simple off-by-one-key typo resulted in M3 turning into an emoticon.

Posted by oldnewthing | 5 Comments

Why won't my computer go to sleep? Where is the energy going?

The powercfg utility has been around for a while, but in Windows 7, it gained a little bit more awesome.

powercfg /energy will analyze your computer's power consumption and report on things like devices that prevent the computer from sleeping, devices which won't suspend, and processes which are increasing your battery drain.

Another neat flag is powercfg /requests which will report on why your computer can't go to sleep, for example, because it has open files on the network, or because the clown will eat it.

Posted by oldnewthing | 20 Comments