
Good luck NASA, or "The Dreaded No Repro"

Poor NASA.  Yesterday's shuttle launch was scrubbed at the last minute due to a malfunctioning fuel sensor.  This particular sensor is responsible for shutting down the engines if the shuttle runs out of fuel.  Apparently, their engine can run without fuel, but it's a very bad thing.  They have 2 other redundant sensors in the system that are working, but they made the right call in postponing the mission until the problem was understood.

This isn't the first time NASA engineers have seen this particular problem.  Back in April, as I understand it, the same sensor malfunctioned briefly.  Then it worked.  Then it malfunctioned.  Then they replaced a whole bunch of "magic shuttle parts" and it worked again.  Problem solved... or so they thought.  The key is that they never figured out why the original part malfunctioned.  This is the dreaded no repro bug.

I face these daily as a software tester.  You find a bad bug, you narrow down the steps, you file a detailed bug report, and the developer resolves it as "no repro" because they can't reproduce the problem in their own test environment.  There are a million possible reasons why this happens:  insufficient repro information in the bug report, a misinterpretation of the repro steps on the developer's part (possibly due to ambiguous repro information), significant changes to the product since the bug was found, a different configuration environment (e.g., you're running Windows XP, but the dev has Windows Server 2003 installed), sunspots, etc.
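One of those causes, a mismatch between the tester's and the developer's environment, is easy to check mechanically. What follows is only a minimal sketch, assuming both sides can run a small script: the helper names `environment_snapshot` and `diff_environments` and the `product_build` field are hypothetical and not part of any real bug-tracking tool.

```python
import platform


def environment_snapshot(product_build):
    """Collect basic facts about the machine a bug was found on.

    product_build is a hypothetical label for the build under test.
    """
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "machine": platform.machine(),
        "product_build": product_build,
    }


def diff_environments(tester_env, developer_env):
    """Return {setting: (tester_value, developer_value)} for each difference."""
    keys = sorted(set(tester_env) | set(developer_env))
    return {
        key: (tester_env.get(key), developer_env.get(key))
        for key in keys
        if tester_env.get(key) != developer_env.get(key)
    }
```

Attaching the tester's snapshot to the bug report and diffing it against the developer's machine will not explain every "no repro", but it rules one common cause in or out quickly.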

At this point, you have several options:

  1. Blindly trust the developer (and if you do, I have some prime land in Florida to sell you) and close the bug
  2. Try to reproduce the bug yourself again.  This may be difficult if you've already updated your system to a newer build than the one you found the bug on.  Or it might just take a long time, or it might work!  If you can reproduce it, provide remote access to the developer or ask them to come look at your machine in person.
  3. Go to the developer's machine and try to reproduce it there
  4. Broadcast the bug report to your team to see if anyone else has supporting data or has seen the bug themselves

Of course, which path you choose depends on lots of things, too many to go into here.  In the end, you need to reproduce the problem and understand the core issue before you can have confidence in the fix.  It's important to remember that you can always prove something doesn't work, but it's very hard, and often impossible, to prove that something does work.  This is the big challenge NASA faces today.

Good luck, NASA.

 

Published Thursday, July 14, 2005 9:51 AM by JasonBa

Comments

Wednesday, July 20, 2005 8:45 AM by zheyehu

# re: Good luck NASA, or "The Dreaded No Repro"

As a tester located in Taiwan (you know how big our team is), I have only about 2 of your 4 options for dealing with my no-repro bugs. The reasons are obvious. I cannot ask developers to come to my machine in person. I cannot go to a developer's machine in person. I can hardly even ask my teammates to help me, because we have too many areas to test and no one is more familiar with the stuff I am testing than I am. Things can be even worse. My testing environment is often different from the developer's. That means I have a bigger chance of getting no-repro bugs due to the different repro environment. I am not even living in the same time zone as the developers!

Ok. Here are my ways to deal with this situation:
1. Provide remote access. However, due to the time zone issue, this does not always work.
2. Ask some kind people to build a testing environment identical to mine for the developer. This works sometimes because most people are nice.
3. Repro it myself, refine the repro steps, and reactivate the no-repro bug. This is what I do most often. I may also provide remote access before reactivating a no-repro bug.

I think I can understand the challenge NASA is facing.
Monday, August 01, 2005 2:58 AM by Rob Caron's Blog - A Team System Nexus

# Suggested Reading - 2005-07-31

Cleaning-out my “To Blog” file again…
Architects

Handling data in service oriented systemsEdward...
Thursday, August 11, 2005 9:53 PM by the blog of michael eaton

# How to have confidence in your bug fixes

