How do we test our libraries?


Hi, my name is Jeff Peil and I'm the QA lead on the VC++ libraries team. Today I wanted to talk a little bit about how we test our libraries. One of the biggest challenges with libraries is that we don't just ship them to our customers directly; our code gets included in the applications that are built using our libraries. The most exciting part of working on the libraries is that if you do something great, a huge number of people can benefit (because it doesn't just benefit people building applications with our tools, it can benefit the users of all those applications!) The most terrifying part of working on libraries is that if we let something slip through, like a security hole, the potential impact is that much bigger. Thus testing our libraries and catching everything we can is critical.

 

When I first joined the libraries team and started to think about how we should test our libraries, one thing was particularly clear: there are many different kinds of problems that can occur. For instance, here are a few of them:

• Shape of the API (is it usable/discoverable/flexible?)
  ◦ Even more subtle is that the shape of an API can directly impact the likelihood of someone misusing it in a way that will introduce a bug in their code. Many buffer overruns in code using libraries can be prevented just by getting the shape of the API right (see the sketch after this list).
• Correctness (does the function/class do what it claims to do?)
• Performance
• Leaks (memory/handles)
• Thread safety
• Size/space limits (files larger than 4 GB, large memory allocations on 64-bit platforms, …)
• Security
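
To make the API-shape point concrete, here is a minimal sketch (assuming MSVC's Secure CRT; the buffer size and strings are illustrative, not from the post) contrasting strcpy, whose shape invites overruns, with strcpy_s, whose shape forces the caller to state the destination's capacity:

    #include <cstdio>
    #include <cstring>

    int main()
    {
        char dest[8];

        // Overrun-prone shape: strcpy has no idea how big dest is, so a
        // too-long source silently writes past the end of the buffer:
        //     strcpy(dest, "far too long for dest");   // undefined behavior

        // Safer shape: strcpy_s (Secure CRT) makes the caller pass the
        // destination capacity, so a source that doesn't fit fails fast
        // (invalid parameter handler / ERANGE) instead of corrupting memory.
        if (strcpy_s(dest, sizeof dest, "ok") == 0)
            std::printf("copied: %s\n", dest);
        return 0;
    }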

 

Additionally, our libraries span a huge scope of problems: from UI controls in MFC to file I/O to sockets to heap management to locales, you run into completely different problem sets.

 

Just as a thought exercise, consider what it means to properly test a function like rand. You need to:

• Come up with a way to make sure it's good enough at generating random numbers
  ◦ I've seen people who think it's sufficient to do things like call rand 10 times and see if the same number came up twice in those 10 calls, but that notion is completely wrong.
  ◦ Generally the best suggestions involve executing rand a large number of times, tallying all of the results, and checking the statistical probabilities (see the sketch after this list).
    § But would this catch a broken implementation of rand that just returned an incremented integer after each call?
• Make sure that rand is fast enough
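
Here is a minimal sketch of that tally approach (the seed, bucket count, sample size, and threshold are arbitrary choices for illustration, not values from the post):

    #include <cstdio>
    #include <cstdlib>

    int main()
    {
        const int kBuckets = 16;
        const int kSamples = 1000000;
        long counts[kBuckets] = {};

        std::srand(12345);  // fixed seed keeps the test reproducible
        for (int i = 0; i < kSamples; ++i)
            ++counts[std::rand() % kBuckets];

        // Pearson's chi-squared statistic against a uniform distribution.
        const double expected = double(kSamples) / kBuckets;
        double chi2 = 0.0;
        for (int b = 0; b < kBuckets; ++b) {
            const double d = counts[b] - expected;
            chi2 += d * d / expected;
        }
        // 15 degrees of freedom; ~25.0 is the 5% critical value.
        std::printf("chi-squared = %.2f (investigate if well above 25.0)\n", chi2);

        // The caveat from the list above: a "rand" that just returned
        // 0, 1, 2, 3, ... would fill the buckets perfectly evenly and pass
        // this test, so a real suite also needs serial-correlation checks.
        return 0;
    }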

Compared to a function like sprintf, the problems are very different, and the testing required is very different. Further, the problems require very different domain expertise.
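
For contrast, directed correctness testing of a function like sprintf is largely table-driven: known inputs checked against exact expected outputs. A minimal sketch (the cases are illustrative, not from an actual suite):

    #include <cstdio>
    #include <cstring>

    struct Case { const char* fmt; double value; const char* expected; };

    int main()
    {
        // Each row pins down one piece of documented formatting behavior.
        const Case cases[] = {
            { "%f",    1.5,     "1.500000" },
            { "%.2f",  3.14159, "3.14"     },
            { "%8.3f", 2.5,     "   2.500" },
            { "%+.2f", 2.5,     "+2.50"    },
        };

        int failures = 0;
        for (const Case& c : cases) {
            char buf[64];
            std::sprintf(buf, c.fmt, c.value);
            if (std::strcmp(buf, c.expected) != 0) {
                std::printf("FAIL %s: got \"%s\", expected \"%s\"\n",
                            c.fmt, buf, c.expected);
                ++failures;
            }
        }
        std::printf("%d failure(s)\n", failures);
        return failures != 0;
    }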

 

Given all of these problems, how do we deal with them? Well, we break our types of testing down into categories:

• App building (to prove the shape of the library is good)
• Directed testing (mostly correctness testing)
• Benchmarking (for perf)
• Stress testing (long-running tests to look for leaks and threading issues; see the sketch below)

For every feature we plan, we identify what testing in each of these areas makes sense. Our goal is to automate all of the testing we do (so if we build an app, we want to make sure that it keeps building, and we want to understand any breaking changes we're introducing and what migration pain they would create.)
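
As one hedged example of what the leak-hunting side of stress testing can look like, here is a sketch using the MSVC debug CRT's heap-state APIs (debug builds only; exercise_feature is a hypothetical stand-in for whatever API is under test):

    #include <crtdbg.h>
    #include <cstdlib>

    // Hypothetical function under stress -- a stand-in for the real API.
    void exercise_feature()
    {
        void* p = std::malloc(64);
        std::free(p);  // a missing free here would show up in the diff below
    }

    int main()
    {
        _CrtMemState before, after, diff;
        _CrtMemCheckpoint(&before);

        // Long-running loop: even a tiny per-iteration leak accumulates
        // into an unmissable difference between the two checkpoints.
        for (int i = 0; i < 100000; ++i)
            exercise_feature();

        _CrtMemCheckpoint(&after);
        if (_CrtMemDifference(&diff, &before, &after))
            _CrtMemDumpStatistics(&diff);  // report what leaked
        return 0;
    }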

 

Depending on the features we are working on, the balance between these areas can vary dramatically (for instance, we didn't do any directed app building around the Secure CRT work; instead, we ported major code bases to it.)

 

Once we have a plan in place, we’ll loop in domain experts where possible, as well as more senior team members to review our plans and make sure we don’t have any holes in our test plan.  After we know what we need to do, we can get to implementing tests.

 

So how do we know we have sufficient testing in an area? We have some techniques that can help. Code coverage is certainly useful for identifying test holes (but note that the presence of coverage does not mean you're in good shape: coverage is great at finding what isn't exercised, but it can't tell you how well the covered code is actually verified.) We also, of course, leverage other code where possible (such as building and shipping Visual Studio with the current CRT.) Finally, we track bug trends in each area to see whether they indicate any problems, and we use betas, community feedback, and CTPs as yet another tool to help us identify whether we've missed something. Your feedback is incredibly valuable to us.

 

To wrap things up, thanks for taking the time to read this. If you have any questions or comments, please feel free to email me (jpeil@microsoft.com).

 

Jeff Peil

QA Lead

Visual C++ Libraries team

 


  • I could not find the word "regression check" in your post. Maybe that explains why MFC ships with so many regressions in each new major VS release.
  • Stephane,

    When we discover a regression, we're very diligent about creating a regression test to make sure it doesn't regress again.

    MFC has made many intentional breaking changes over time, I'm not sure if you are referring to those, or if you are referring to a large number of bugs you've hit in MFC?  If you want to give more specific details that'd be great (either here, or more ideally using the Product Feedback center at http://connect.microsoft.com/feedback/default.aspx?SiteID=210 )

    With that said, one thing I didn't touch on was how we tested historically.  Historically MFC's testing was heavily focused on app building activities, the upside to this is that it really did help drive the usability of the framework.  The downside is that it means that in some areas of MFC the amount of automated directed testing carried over from previous releases is more limited than I would like.  

    The good news is that we are fairly aggressively increasing the amount of automated coverage we have for the areas of MFC where we've identified weaknesses. That historical gap may help explain why we haven't caught some of the issues you hit.

    That said, from looking at the incoming rate of issues in MFC, it certainly doesn't look to me like we've had a large number of regressions in MFC relative to the size and scope of MFC.  If there are issues you are hitting but haven't reported, I'd love to hear about those.
  • Do your above approaches to testing apply to third party libraries such as STL (dinkumware) that you also include in the product?

    It would be nice to have memory leak regression tests in the STL as well, for example for the locale leak in iostream (hotfix KB 919280), so that type of problem can't happen again in the future.
  • Mike,

    We license from Dinkumware both the STL and their STL test suite, and primarily rely on that for testing of the STL.

    In addition to that, we do create regression tests (such as for the locale leak) when they are discovered, and of course we do test areas where we've done our own feature work in/around the STL (such as the checked iterators in VS2005.)
  • Interesting post.

    But can you describe in more detail how you test for thread safety? What's the methodology/plan, and what kinds of tools are used?

    In short, how do you CERTIFY that certain classes/types/members are thread safe or not?