Callstack to the Rescue!

Been busy getting Office 14 out the door. One thing keeping me busy is crash buckets from Office 2007. When an application crashes, it sends back information about where the program was when it crashed.

These crashes are analyzed and grouped together into similar “buckets”. We take the top buckets and try to determine what went wrong so we can fix them.

It pays to be good with a debugger and know assembly code here since all you have is registers, code and some memory at the time of a crash.

But what can be even more helpful is if someone else has already found a way to hit this crash. It’s 100x more useful to have repro steps which hit the same crash than it is to just have a dump. You can use a debug build (sometimes), you can use breakpoints, you can change code to see how it affects the crash and later you can write automation to make sure it doesn’t come back!

How do you find these mythical repro steps? By searching the bug database for similar callstacks. If a tester has found a crash, ideally they also logged the callstack. If the callstack is similar to the dump file’s, it’s likely the same issue.

A Punted Bug

Two years ago, before Office 2007 shipped, we were all testing it trying to find a recall class problem. I came across a crash that involved SmartArt, text fields, multiple slides and undo/redo. Given the very convoluted way the crash reproed, we decided it wasn’t worth investigating and punted the bug.

Fast forward to today and we have a number of crashes in SmartArt with no known cause. Turns out, people are hitting the same crash I found two years ago, although likely with simpler steps.

If I hadn’t copied the callstack into the bug, we may never have found out why this was happening.

Thank you, two years past self.

"How do I fix more bugs?"

I was asked “How do I fix more bugs?”

The answer is to own buggier areas! Devs who own buggy areas (best when not through some fault of their own) will resolve more bugs because more bugs will be incoming.

Another is to file bugs and fix them. Not dumb bugs, but real bugs. Sometimes only devs can see the problems while looking at the code. If you feel bad filing and fixing your own bugs, assign them to PM first to prioritize.

More importantly, seek out buggy areas. This shows initiative and desire to help ship the product. If instead you wait until you run out of bugs and someone assigns you something else to do, it’s less impressive. “He fixed it, but only after I gave it to him”.

Of course, if the area remains buggy, you haven’t helped yourself. If bugs keep incoming, it could be the feature needs to be scoped again. This usually means talking with PM and test about cutting or stripping down features. Sometimes, a small change in spec cuts off an entire area of bugs. This is just as good as fixing a lot of bugs, since you identified a key area that if cut would dramatically reduce bug counts.

One such example would be supporting rotation of entire SmartArt diagrams. It’s a simple user scenario, but it affects every aspect of editing the diagram plus you need to deal with the content pane when the diagram is rotated and the list will go on and on. In the end, not supporting rotation simplifies the entire implementation and cuts off a whole chain of bugs. (Today of course you can just ungroup it and rotate the group if you really want).

Better bug ordering means better fix rate

While there are many bug priorities, they are rarely stacked in a way that leads to the most bugs getting fixed. I’ve found fixing everything strictly in priority order will get you stuck on the hard bugs the entire cycle.

Instead, I always have one machine used to fix hard, important bugs and another free to fix easy, “fun” bugs. This gets rid of lower priority fit and finish issues that we wouldn’t normally get to if we didn’t just fix them. It also gives me a break from working on any particular monotonous and difficult bug, which usually leads to a better fix anyway as I’ve had more time to think about it.

Fixing lower priority bugs in between higher ones prevents a backlog of stuff which PMs will have to go through and make cuts from. You can’t save every bug and many bugs should be punted, but getting rid of the easy ones means fewer things PMs have to cut. Plus it makes you look good with a higher resolve rate and fewer bugs assigned to you.

Negative space

Additionally, the most severe bugs from the user’s perspective are not always the most important to fix first. Many bugs which come to be known as “Work Items” in Office need a lot of ground level support to really fix. It’s not enough to just fix the symptom of a particular bug; there tends to be a larger cause that needs addressing to prevent future bugs.

These are probably the most important things for devs to identify since no one else really knows that the future holds many more problems if the area doesn’t get looked at.

And that’s probably the most valuable thing to put in a review, but the hardest to prove: that you prevented the downfall of the entire feature because you reworked the area. If you hadn’t done it, you’d be drowning in bugs, but because you did it there’s no proof there was ever a problem. It’s productivity in “negative space”.

One way to show this is to show how bugs in the area tended to go down after your fix. Another is to show a certain class of bugs that kept coming up over and over but stopped after your fix.

Either way, hammer it home during reviews. Every manager should know exactly how much awesomer the product is now that you’ve graced it with your presence.

Airports and Cylons

I’ve noticed one thing in my travels through airports: power is king.

I routinely bring a laptop and MP3 player stocked with music and movies to watch during the flight, but I always overprepare in case I get stuck waiting.

But all of this is for naught if you can’t find an outlet.

As I sit in DIA waiting another 3 hours for my flight, I’ve observed huddles of people clinging to floor space near coveted outlets. Wires are strung across open space from phone booths to chairs 2 feet away. The most ambitious sit at a computer booth next to someone else and charge their cell phone on the one outlet the occupier isn’t using.

When it snows, DIA must turn into a modern day Lord of the Flies as people degenerate into amp hoarding gangs of travelers, fending off others’ drained cellphones and computers. Entire tribes die as a circuit breaker trips and knocks out power to gate A25, leading the survivors to mount an attack on Concourse B.

Maybe the smartest thing to pack is a power strip. It would give me absolute power, converting two outlets into 6 or more! Or at least let me leech off an outlet someone else has already claimed.

As for today, it turns out Midwest is done with flights and their gates are nearly abandoned. A sweet spot next to a column with two untouched outlets allows me to write blog posts and watch Battlestar Galactica, saving me from certain social doom if my friends were to find out on Monday that I’d never seen the show.

If they really need a Cylon detector, why can’t they just use X-rays and metal detectors? The writers clearly never spent 6 hours in an airport.

Best SmartArt Feature Ever!

Like SmartArt but wish there was more flexibility in customizing its shapes? We added just such a feature in Office 2007 SP2 which gives you the ability to ungroup a given diagram in PowerPoint.

Just right click the border of a top level diagram and choose “Ungroup”:

 

Below are the original diagram on the left and the resulting shape group on the right:

 

Now that the diagram is a group, you can manipulate the shapes individually which means they won't affect the size or position of other shapes in the diagram (the "smartness" of SmartArt will be gone). This also includes dragging the shapes outside of the original border, editing as freeform and adding text to all diagram shapes. Finally it allows for more tailored animations as each shape can now be animated individually in the Custom Animation pane (if you ungroup the resulting group first).

Why am I so excited about this feature? Because it was an idea I had after Office 2007 shipped. I convinced our PMs to create a spec for it, and it turned out to be so useful during our testing that we back ported it from Office 14 into Office 2007 SP2! Quite a rare event having a feature thought up by a developer making its way into a previous version of Office.

Being able to demonstrate a nearly fully functional prototype also helped seal the deal. Many features are great ideas on paper, but run into trouble when we try to implement them. Doing a solid prototype proves the feature isn't downright impossible. You can also send the feature to a tester to bang on for a while to see if any showstopper bugs rear their heads.

So thanks to a bunch of people's hard work, we shipped the best and what will surely be the most beloved SmartArt feature of all time (except for all those still to come).

Office 2007 SP2!

A quick note: you can now download Office 2007 SP2 or read more about it here. There are some pretty sweet things in this update that I got to work on; I'll touch on "...more control over the appearance of SmartArt graphics" in a later post.

Productivity in the Car

Practically every book I see advertised on The Daily Show I've gotten at my local library. But this sometimes leaves me with tons of books piled up to read. So I started getting the books on CD when I can and using the CD changer in my car to queue up 6 CDs at once. I've since been able to "read" several books just by going to and from work, time that was in the past spent focused on the slow driver in front of me or lamenting the bug I was going to have to fix that morning.

When Bugs Knock on Your Door

I bought a place back in September and had just this week gotten things mostly settled. Then today some people knocked on my door. I opened it, and there were 3 men, 2 women and a baby. It sounded like the makings of some kind of movie sequel, but not the kind I wanted to be in the center of.

 

They said they had called earlier and wanted to see the place.

 

Well that’s interesting, I thought. For one, I'm under the age of 30 and so have never had an actual landline telephone for someone to call me on. For another, this must be the sign of a wetware malfunction since I had no recollection of putting the place up for sale.

 

They showed me the sheet which had my address and a picture of the place on it. I silently noted how the "asking price" was $15,000 more than what I paid. I informed them I had no interest in this many roommates and suggested they try some of the other places down the street.

 

I went back to my room and hopped on the source of all true information: the Internet. But it failed, or at least lied, since Realtor.com and Zillow.com both show the place as sold at the price I paid.

 

But the question still picks at my brain. What database hasn’t been notified that the house has sold? And what program (or person) increased the listing price?

 

I think a human error is more likely than a computer bug in this case, but the consequences of a computer bug would have had the same effect: people showing up at my door wondering why there was still furniture in the place. A software error manifesting itself in the physical world, now that’s a creepy thought.

 

Which is why I’m very glad I don’t work on any kind of mapping software, where that mistake in code makes the computer call out “turn left” and suddenly you’re not in Kansas anymore. We don’t need walking metal later-to-be governors with machine guns from the future to kill us all; just write a virus that invades MapQuest and let people do themselves in.

 

Oh, and who the hell did they talk to?

Today’s Bug is Brought to You by “Don’t Mix C and C++”

Take this for example:

#include <iostream>
#include <iomanip>
#include <tchar.h>      // _tmain and _TCHAR (VC++ specific)

class Storage
{
public:
      Storage() : m_val( -1 ) {}   // non-trivial constructor
      int m_val;
};

class Holder
{
public:
      Holder() : m_double( 50000 ) {}

      union
      {
            double m_double;
            struct                 // anonymous struct sharing memory with m_double
            {
                  Storage m_storage;
                  int m_id;
            };
      };
};

int _tmain(int argc, _TCHAR* argv[])
{
      Holder t;
      std::cout << std::setprecision(20) << t.m_double;
      return 0;
}

 

On VC++ 9 this prints roughly 50000.03125 instead of 50000. So where does the extra 0.03125 come from? It’s nowhere in the code.

It comes from mixing the 50000 passed into m_double and the constructor for Storage later assigning m_val to be -1.
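To see where the stray fraction comes from, here is a minimal sketch that reproduces the overwrite without the illegal union. It assumes a little-endian machine with a 64-bit IEEE 754 double and a 32-bit int (as on x86 Windows), and the helper name overwrite_low_bytes is made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: mimics Storage's constructor writing m_val over the
// first four bytes of the memory that m_double already occupies.
// Assumes little-endian hardware and a 64-bit IEEE 754 double.
double overwrite_low_bytes(double value, std::int32_t overlay)
{
    static_assert(sizeof(double) == 8, "sketch assumes a 64-bit double");
    // On little-endian hardware the first four bytes hold the low 32 bits
    // of the mantissa, so the exponent and high mantissa bits survive.
    std::memcpy(&value, &overlay, sizeof overlay);
    return value;
}
```

Calling overwrite_low_bytes(50000.0, -1) sets the low 32 mantissa bits to ones. At this magnitude one unit in the last place is 2^-37, so the result is 50000 + (2^32 - 1) * 2^-37, which is a hair under 50000.03125.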

But why? In what order do things get initialized in a union? What if I have two classes which each have their own constructors? Which one gets called, since they both occupy the same memory?

The C++ standard says thusly:

An object of a class with a non-trivial constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects.

So, the above should result in a compiler error, but it doesn’t as of VC++ 9. In fact, if you change the code to have a named struct:

      union
      {
            double m_double;
            struct MyStruct
            {
                  Storage m_storage;
                  int m_id;
            };
            MyStruct m_struct;
      };

You get the correct C2620 error:

 error C2620: member 'Holder::m_struct' of union 'Holder::<unnamed-tag>' has user-defined constructor or non-trivial default constructor

A similar coding error has existed in the Office code base for a while, but went undetected because of how the memory corruption occurs. Since 50000 becomes 50000.03125, nobody noticed that the sub-pixel rendering of certain shapes was slightly incorrect until a separate, seemingly unrelated assert kept popping up saying two numbers it expected to be equal were not. If the corruption had occurred in an integer, or the storage class and m_id were in a different order, or the compiler had correctly shown an error on this, or the original developer had used a named struct, the problem would have been easily identified. But since none of these things happened, we have small deviations in the original double value.

What did happen was good test automation: it routinely hit the assertion, which let us trace the root cause of several bugs and fix the problem.

Would You Add Easter Eggs To Software Produced At Work?

According to many Slashdot commenters, the answer was clearly “Yes! Just be tactful and careful, and make it über geeky.”

Of course, those of us who live in the real world know the correct answer to be “Did you ask your boss first?”

Otherwise, coding random stuff that no one knows about in a large product is the mother of all bad ideas.

If you work on a large piece of software, chances are you’re not the only developer on it. So if you’re entitled to have a little fun adding code no one ordered to the product, why shouldn’t everyone else be? (Didn’t your mother teach you anything?)

So now we have a shipped product that took 50 people to code which now has 50 different hidden things in it to find.

Oh boy, a treasure hunt! But no one knows what’s actually in the code since everyone decided to do their own thing, bypassing the company’s localization, legal and geopolitical checks. Who knows what time bomb is waiting to be found? If some lyrics nobody screened can force the worldwide recall of a game, a little message box that pops up with your favorite saying can cause just as much damage.

What’s worse is the author of the article mentions he’s at the end of developing a large product. Remember, shipping is a feature, and as such you need to carefully consider taking changes the closer you are to release. Any bug you introduce now has a high risk of not being found before release.

Any easter egg added to the product needs to go through the same triage process as fixing any other bug. It is in fact a feature you’re trying to add. You need to ask yourself if it’s worth breaking something else to add it. Recall that the business value of any easter egg is zero.

Also consider the cost. Cardboard boxes are expensive these days and it’s no fun having to get one to haul your stuff away. Any company which pays you to code is not going to pay you to secretly sabotage their products from the inside. And that’s exactly what an easter egg coded up and known only to an individual is: sabotage.

Work long enough and you’ll discover that no change is safe and simple things break big things in complex ways. Once someone spends 8 hours debugging a customer problem only to find out it’s your code trapping the CTRL+SHIFT keys to pop up some little mini game, your management will be especially unhappy that not only did you cost them money to fix the problem, but that you also spent a good chunk of company time coding it up in the first place. Tetris is cool, Tetris in the shipped product minus your job is not.

Easter Eggs Done Right

Many companies do add stuff to their products so that the people working on it get to add their personal touch. They plan to do it, they test it and they screen whatever people add.

Watch many games’ credit screens and you’ll see pictures of developers with their little catch phrases. I guarantee these were a planned feature of the final game, not something coded up at zero hour. Plus they would be reviewed by lawyers and regional experts. This is important because the company is taking a risk letting their employees add that personal saying. If there’s a lawsuit, the company is responsible.

So the moral is, ask your boss first. What kind of personalization you can add depends on what you ship.

No Hackers in Deep Space

NASA recently tested their “Deep Space Internet” protocol. It sounds pretty slick, enabling more complex missions with multiple satellites or rovers all communicating with each other and back to Earth.

Unfortunately, we won’t be pinging any satellites from our “Close Earth Internet” any time soon:

Unlike TCP/IP on Earth, the DTN does not assume a continuous end-to-end connection. In its design, if a destination path cannot be found, the data packets are not discarded. Instead, each network node keeps the information as long as necessary until it can communicate safely with another node.

This works great until some alien manages to send a bunch of data to a non-existent satellite, forcing your node to queue up all the information indefinitely and triggering the interplanetary equivalent of a SYN flood.
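The worry is easier to see in code. Below is a rough sketch of a store-and-forward node, not NASA's DTN implementation: the class name Node, its methods and the capacity cap are all assumptions made up for illustration. Without some limit like the cap, every bundle addressed to an unreachable destination would sit in the store forever.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical store-and-forward node (illustrative only).
class Node {
public:
    explicit Node(std::size_t capacity) : capacity_(capacity) {}

    // Keeps the bundle until a next hop appears. Returns false when the
    // store is full, which is what stops an unreachable destination from
    // growing the queue without bound.
    bool accept(const std::string& bundle)
    {
        if (stored_.size() >= capacity_)
            return false;
        stored_.push_back(bundle);
        return true;
    }

    // Called when a link to another node comes up: hands over as many
    // stored bundles as the next hop will take, oldest first.
    std::size_t forward_all(Node& next_hop)
    {
        std::size_t sent = 0;
        while (!stored_.empty() && next_hop.accept(stored_.front())) {
            stored_.pop_front();
            ++sent;
        }
        return sent;
    }

    std::size_t stored() const { return stored_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::string> stored_;
};
```

A node built with Node relay(2) accepts two bundles and refuses the third; once a link is available, relay.forward_all(probe) drains the store into the next node.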

This raises the question: what is the security around communicating with interplanetary probes?

I assume there's some form of encryption, but these probes exist for a long time. What's the lifetime of their encryption schemes?

Does NASA do testing to account for rogue people with satellite dishes sending data to their probes? Or is the real barrier that the probe won't be aligned to your dish's location so it's impossible to send data?

Are You and Your Date Binary Compatible?

Many exciting things have been happening since we shipped Office 2007, but most have been "hush hush". As in "Chris, don't tell your friends you're working on TOTAL AWESOME FEATURE X as you will make them jealous."

But not all my time is spent on cool new features. Some of it is spent on servicing Office 2007 and previous releases. This can be in the form of service pack work, security fixes or "hot fix" requests. These types of fixes have their own set of neat technical challenges because they have to be fixed after we’ve shipped the product.

The biggest technical hurdle I’ve found is “binary compatibility”. When you make a fix after shipping, sometimes that fix will occur in more than one EXE or DLL (referred to as a “binary”). And the kicker is: patching does not update all binaries in the product at once.

Really, who wants to download all of Office for every single patch? The answer is no one, so patches usually contain a minimum amount of changes.

Also, during the course of the Office life cycle, there are a number of patches that go out and all of these patches are built from the same source tree (otherwise you could install a patch, but the next patch would uninstall it). So, since we don’t ship updates to every file in Office for every patch, and it’s entirely likely that some customers will apply some patches and not others, we have to be careful to change the code in a way that does not break when only part of a patch is installed.

Take this for example:

 

            Application.EXE    Library.DLL
Update #1   Patched!           Patched!
Update #2   Patched!           (No Change)

 

Installing Update #2 will also include the code for Update #1, but only in Application.EXE. Library.DLL remains with the same code as it did before.

The gist of the constraint is: don’t make a code change such that another binary would not function if it does not have the change.

Things that break binary compatibility and how to address them:

Adding a new function to a DLL and calling it without checking for its existence in another DLL or EXE.

If you hardcode a call to an exported function from a DLL and the function isn’t there, it's dump town for you (crash dump that is).

The fix!

Export your new function by name and use GetProcAddress to check for its existence first. Also make sure your code is fail safe in all binaries. Meaning, if the function doesn’t exist, the product behaves reasonably. Usually this means falling back to the behavior that was present before. If instead you fail with “Function BLAHBLAH123@S@Z cannot be found!” after installing a patch, you haven’t fully addressed the problem.
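As a rough sketch of that pattern: the export name "ImprovedLayout", the functions find_improved_layout, legacy_layout and layout, and the use of "Library.dll" as the module name are all hypothetical. GetModuleHandleW and GetProcAddress are the real Win32 calls; the non-Windows branch exists only so the sketch compiles elsewhere and simply behaves like an unpatched DLL.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Signature of the hypothetical new export.
using LayoutFn = int (*)(int);

// Hypothetical lookup: returns the new export if the loaded DLL has it,
// or nullptr when the DLL is unpatched (or not loaded at all).
LayoutFn find_improved_layout()
{
#ifdef _WIN32
    HMODULE library = ::GetModuleHandleW(L"Library.dll");
    if (library == nullptr)
        return nullptr;
    return reinterpret_cast<LayoutFn>(
        ::GetProcAddress(library, "ImprovedLayout"));
#else
    return nullptr; // no Windows loader here: treat as an unpatched DLL
#endif
}

// Behavior that shipped originally; always available.
int legacy_layout(int width) { return width; }

// Fail-safe call site: use the new export only when it exists.
int layout(int width, LayoutFn improved)
{
    return improved != nullptr ? improved(width) : legacy_layout(width);
}
```

The call site would look like layout(width, find_improved_layout()). When the export is missing, the caller silently gets the old behavior instead of a load-time failure.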

Adding new values in the middle of a shared enumeration.

enum Versions
{
   OfficeXP,
   Office2003,
   Office2003_SP2, // < Just added!
   Office2007
};

This causes the values to drift between patched and unpatched binaries. If a patched binary passes Office2003_SP2 to an unpatched binary, the unpatched binary will interpret it as Office2007.

The fix!

enum Versions
{
   OfficeXP,
   Office2003,
   Office2007,
   Office2003_SP2 // < Just added!
};

Add stuff at the end. But note that the unpatched binary must be able to handle an enumeration value greater than what it was originally compiled with.
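Here is a small sketch of both halves of that advice. The namespaces and the function describe_unpatched are hypothetical names used only to put the three versions of the enum side by side in one file; in real life each would live in a differently built binary.

```cpp
#include <cassert>
#include <cstring>

namespace shipped {          // what unpatched binaries were compiled with
enum Versions { OfficeXP, Office2003, Office2007 };
}
namespace patched_middle {   // new value inserted in the middle (bad)
enum Versions { OfficeXP, Office2003, Office2003_SP2, Office2007 };
}
namespace patched_end {      // new value appended at the end (good)
enum Versions { OfficeXP, Office2003, Office2007, Office2003_SP2 };
}

// Hypothetical unpatched code: it receives the raw value across the binary
// boundary and must tolerate values it was not compiled with.
const char* describe_unpatched(int raw)
{
    switch (raw) {
    case shipped::OfficeXP:   return "OfficeXP";
    case shipped::Office2003: return "Office2003";
    case shipped::Office2007: return "Office2007";
    default:                  return "unknown"; // fall back, don't crash
    }
}
```

With the middle insertion, patched_middle::Office2003_SP2 has the value 2, so the unpatched code reports "Office2007". With the value appended, existing values keep their meaning and the new one lands in the default branch.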

Adding or subtracting member variables from a shared class.

class Handler
{
...
public:
   int GetNewValue() const { return newValue; } //< Just added

private:
   int value1;
   int value2;
   int newValue; //< Just added
};

This changes the size of the class between the binaries. Note that here the accessor GetNewValue is defined with the class, meaning it’s likely to get inlined. Creating the class in an unpatched binary and then accessing the new value in a patched binary will at best cause Access Violations and at worst cause undefined app behavior as you read or write to memory you don’t own.

You can also leak memory. If you add a new complex member variable (something that holds memory and has a destructor), then create the object in the patched binary and destroy it in the unpatched binary, the destructor for the new member may not be called.

The Fix

This one is tough; there is no straightforward way to address this. You can try to have a global map where you map pointers to your class to the new data you want to add, but this isn’t ideal and is not always thread safe. You can try to store the data on whatever object owns this one, assuming that object resides privately in one binary.

The best approach is to plan for this ahead of time. You could choose to not define functions in the header and instead export them from the DLL. Or, if you’re willing to sacrifice a little memory, add a dummy void* to the end, then use this to store the extra data if you need it post shipping.
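A minimal sketch of the reserved-pointer idea, under the assumption that the spare void* shipped in the original layout and is always null in unpatched binaries. The names reserved, HandlerExtra and get_new_value are hypothetical, and ownership of the extra data is deliberately left to the caller here, since who frees it is exactly the leak risk described above.

```cpp
#include <cassert>

// Layout as originally shipped: one spare pointer reserved for the future.
// The patch does not add members, so the size stays the same everywhere.
struct Handler {
    int value1;
    int value2;
    void* reserved; // nullptr unless a patched binary attached extra data
};

// Hypothetical data added by a later patch; it lives outside the class.
struct HandlerExtra {
    int newValue;
};

// Patched accessor: falls back when the object came from an unpatched
// binary and therefore never had the extra data attached.
int get_new_value(const Handler& handler, int fallback)
{
    const HandlerExtra* extra =
        static_cast<const HandlerExtra*>(handler.reserved);
    return extra != nullptr ? extra->newValue : fallback;
}
```

An object created by old code has reserved == nullptr and yields the fallback; an object a patched binary has extended yields the stored newValue.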

In The Real World

These may seem like edge cases and in a way they are. Most fixes are changes to logic in a function, not sweeping changes across binaries. But some things I fix do have this scope. One thing that I wrote is the new ability to ungroup SmartArt in PowerPoint coming your way in Office 2007 SP2 (announced here). But much of the SmartArt logic is shared by several Office apps and is not present in the PowerPoint executable. So I had to deal with the same issues mentioned above and not make blind calls to new DLL functions. This way in case you get the SmartArt bits and not the PowerPoint bits, PowerPoint and SmartArt will still behave as they did before.

Office and the Movies

It’s finally arrived, proof of the Microsoft engineer’s celebrity status!

What’s the first thing you see in the following picture?

 

(collider.com) (much bigger version)

 

That’s right, it’s Office 2007!

I always wonder what low level movie set intern gets to create the dummy documents which show up on screen in the background. Did someone begrudgingly type out a full page of numbers and dates? Or did they use some scripts to generate it for them? Did they drop out of Berkeley for their big break in the movie business only to end up as secret document creator for blurry movie backdrops?

If you want to get really nerdy, you can tell that the document was originally a DOC file since the string “[Compatibility Mode]” appears in the title bar. Perhaps they wrote this on Word 2003, installed 2007, opened the document and filmed. I wager that’s the case since the Office Button blinks during the entire shot, meaning no one ever bothered to click it.

Or perhaps this goes towards defining the character on screen. Perhaps Chad from Hard Bodies is so hardcore he only ever uses the keyboard, thus never using the mouse to close, open, print or even publish his documents to XPS or PDF. Which raises the question, why does he prefer to view his documents zoomed in so far that they need a horizontal scroll bar?

I’m not sure what the Coen brothers were going for here.

 

With the Cost of Water These Days

In an effort to cut down on my water and towel bills, I’ve started biking to work. Thanks to free showers and free towels (plus free soap and free locker), I can now spend the money saved by showering at work on things needed to maintain the bike.

Honestly, I hate bikers. Why ride on the sidewalk when you can ride on the side of the road and force all the cars to swerve around you? It’s like you live to inconvenience everyone else!

Then *I* started biking. I have to say, the feeling of environmental superiority over those driving is pretty sweet. It’s also a “get out of work free” card:

“We need you to fix these bugs, how come you weren’t in this morning?”

“I biked to work today in a surefire effort to save my grandchildren from a hurricane.”

Of course, by biking two times last week I may have overdone it. Don’t want to cut out too many greenhouse gases and cause a runaway “global cooling”. We’d probably lose our free towels.

My Official Genius Certification Has Arrived

http://channel8.msdn.com/Posts/Hey-Genius-Meet-Chris-Becker/

Virtualized Awesome

I’ve been using Hyper-V for a couple of weeks and have been pleasantly surprised at how much benefit it’s brought me. What I love best is how I can start up a VM on my server box, add a network card and then log into it using remote desktop on another box. It's seamless and easy, not something I expected from just sitting down and using a server component.

Some benefits I’ve noticed in my daily routine:

Multitasking

Certain large tools only run in certain environments and it’s not possible to run them at the same time. With a virtual machine, I can run both simultaneously on the same physical box.

Long Operation Pausing

Some programs I have to run can take a few days to complete. It’s a pain to have to have a machine monopolized while it runs. But now I can run the operation on a virtual machine and if I need to use the physical machine for something else I simply save the virtual machine’s state. (While the virtual machine isn’t as fast, the operation usually finishes overnight anyway.)

Debugging Different OSes

Testers have been using virtualization to test our changes on different operating systems for a while now. Instead of having four separate machines (or one with multiple OSes installed), you have one with many virtual machines.

I realize now that there are the same benefits for developers: investigating OS specific bugs is a complete pain, usually involving having a tester give you access to their machine. Now, having a virtual machine server, I simply start up the desired OS image and remote debug the problem.

(Side note: It was totally awesome when I noticed out of the blue I was getting Aero effects running through remote desktop to a virtualized Vista machine.)

Application Hosting

Many times people will want to try out a change I've made for themselves. Usually this involves sending them all the bits and they have to have a spare machine to install it on. Now I start up “ChrisDemo” and let people remotely log in.

 

This week’s geek level:  Frighteningly High
