Last fall (2013), I published two papers internally to Microsoft. I wrote them to challenge the status quo and paint a vision of where I felt the future of software testing was headed. The papers have resonated across Microsoft and some elements have leaked out to the Internet. The first paper, and the more heretical, was about Minimum Viable Quality (MVQ), and the other was on my personal perspective that every software project, and even every device project, should be developed like a service. This second concept I call Everything as a Service “yes!” (EaaSy). The “yes!” is there both to make the acronym easier to remember and to signify that it’s about time. This is my first public post on the concepts of MVQ and EaaSy. I did present the concepts in a talk at the ALM Forum in Seattle in April of 2014 and the slides are currently available on the event site.
There are several other individuals writing about moving to lean, data driven quality. I have worked with all of these bloggers and shared ideas back and forth with them. Brent Jensen wrote a post in March of 2014 where he commented on MVQ. In his post, however, he was focusing not just on MVQ but also on “Shifting the Tester Mindset.” Alan Page focuses on a similar aspect of tester culture with his calls for testers to stop writing automation. See Alan’s May 2014 post, “Stop If You Want To.” Seth Eliot highlights rich fast data and its many uses in his writings and speeches on Data Driven Quality (DDQ) and Testing in Production (TiP).
DDQ goes much deeper into all the ways that data is changing all approaches to testing and quality. Seth and I have plans to write more on the topic of DDQ later this summer. Those are all great posts that I recommend you also read.
So, let’s dive in.
Minimum Viable Quality (MVQ) was born out of my frustration at being a speed bump. Early in my career I thought my job was to stop bad bits from getting to end users. I was there to stand in the gap, and when an evil senior manager wanted to ship a product that wasn’t ready for primetime, I would raise my staff over my head, strike the tip of it on the ground and yell, “You shall not pass!” Down would go the evil Balrog along with the bad code. Or at least that’s what I thought would happen. What really happened was that despite having great test results showing the problems with the code, quite often the decision was made to ship anyway.
Well, after getting run over a few times and squashed flat like an ice-cream cone dropped onto a busy freeway, I realized my job wasn’t to stand in the gap, but rather to help the team figure out how we could ship the product on time with good enough quality. MVQ is just another way of saying the quality of the product is good enough for the target audience and well, no more than that either. You could consider MVQ to be lean testing. Just enough testing and no more than is needed at that point in time.
EaaSy is actually a framework that stems from my fifteen plus years developing and shipping online services and applying those lessons to developing and delivering all software products and even devices. To really adopt an MVQ approach to quality you need to implement the EaaSy framework. With that in mind I’ll introduce EaaSy and MVQ and then dive into each in more detail.
In the future I don’t think we will call what we do to ensure software is of the right quality “software testing.” As we trend toward everything as a service, quality will shift toward being a data driven service itself. What are these trends that are driving us toward EaaSy and MVQ? Well they all stem from a set of very dominant industry trends.
The market dynamics mean that we must think of new ways of shipping software at a lower cost, ship it more frequently, and maintain an acceptable level of quality. To help frame that discussion I created EaaSy as the foundational capabilities for enabling this new faster iteration cycle and then created MVQ as a way of aligning the goals of the quality engineers with the new business imperatives.
EaaSy is my framework for communicating a set of five key capabilities (see Figure 1) required to allow just about any product to look like and be managed like it was a service. They are all geared around reducing risk when developing and releasing software products. Managing Risk is essential when building complex products.
Ken aside – It occurred to me some number of years ago that my role in test was less about finding and fixing bugs and more about producing a signal for management to better understand risk to quality. Think about it: if you have a complex product that has not yet been released to the public, you will have very little data available to you for assessing whether the product is of acceptable quality. Test results are a proxy for real information. Of course with Testing in Production we now have real data, so test results are becoming a bit less valued.
When in place, the five capabilities allow any software engineer to ship faster with well-reasoned and managed risk in terms of catastrophic bugs impacting large numbers of users. In fact, as the math will bear out, it is safer to ship faster with EaaSy in place than it is to invest in significant up front testing.
Figure 1: there are five key capabilities required to adopt EaaSy.
There are five key capabilities required to successfully implement EaaSy and to make your product look like and behave like a service. Those five capabilities are componentization, continuous delivery, user segmentation, flighting, and rich fast data.
With these capabilities in place a product should be able to be updated with new code on a daily basis. Now getting all these capabilities in place on your product team won’t be easy and they won’t happen overnight but when you arrive, you will be able to ship updates much more easily than ever before.
I want to start diving into EaaSy at the foundation. To me, componentization is the foundation of moving into a services architecture. One thing we learned early on in the services world was that you couldn’t stand up all the pieces of a large scale service all at the same time. This forced us to define and maintain contracts between service components. The canonical example of this is authentication. Most online services today do not run their own authentication service but instead trust other cloud services such as Twitter, Facebook, Google, and Microsoft.
Figure 2: Example of common components (sometimes called layers) of a service
In the software, apps, and devices world it is common to think in terms of getting something on a retail store shelf. That leads toward a philosophy of sim-shipping all components at once and tends to break down component boundaries. The product ships as one unit so it should just all work together as one SKU. Componentization and backward compatibility are often lost in this release model.
Ken aside – Every time I think of sim-shipping I think of the Titanic. I think of a massively complex product with hundreds of passengers all betting that the voyage will be a fabulous success. The problem is that when that ship hits an iceberg, the entire product is now in jeopardy. Things start to flood, fall apart, the ship (a metaphor for our software product) is sinking. Ship together and sink together as one.
The antithesis of sim-ship is to launch a hundred self-sustaining little boats. Together those boats can get all the passengers across the ocean with much less total risk. If one of them or even a dozen hits an iceberg and sinks the relative impact to the overall customer experience is less than the big ship going down. Now I’m not saying a hundred little boats is the same experience as a big grand ship like the Titanic. No, I’m using this as a metaphor for relative risk. From a pure engineering perspective it would be less risky to have more boats than fewer.
Think about it.
From a tester perspective componentization is nothing more than problem decomposition, something we do all the time. I see a bug, let me figure out where the root of the bug is by isolating factors. The most common type of bug I find in a service is a regression and those are caused by some sort of change either in the code or the environment in which the code is running. With a regression we often focus on what changed when and how that led to the bug. Problem decomposition.
With services when we would ship updates to any one of the service layers we would need to ensure we were both forward compatible with the new features and that we maintained backward compatibility. Each service component could be updated independently and the integration points as well as the end to end experience validated.
Componentization and asynchronous releasing is actually easier than trying to get one version of all the parts of a product to work relatively bug free in a single release. It is easier because componentization that is reinforced with asynchronous release processes teaches us to design better for forward and backward compatibility and to code for it up front. Risk reduction becomes built into the engineering processes.
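A minimal sketch of what coding for forward and backward compatibility up front can look like at a component boundary, using authentication as the example. The payload fields, the version scheme, and the function names are all assumptions made for illustration; they are not taken from any real service.

```python
from dataclasses import dataclass

# Hypothetical contract between an authentication component and its callers.
# The field names and the version number are illustrative assumptions.
SUPPORTED_MAJOR = 1


@dataclass
class AuthResult:
    user_id: str
    display_name: str


def parse_auth_response(payload: dict) -> AuthResult:
    """Read an auth response while tolerating older and newer producers."""
    version = str(payload.get("version", "1.0"))
    major = int(version.split(".")[0])
    if major != SUPPORTED_MAJOR:
        # A major version bump signals a breaking change to the contract.
        raise ValueError(f"unsupported contract version: {version}")

    user_id = payload["user_id"]
    # Backward compatible: older producers never sent display_name,
    # so fall back to the user id instead of failing.
    display_name = payload.get("display_name", user_id)
    # Forward compatible: fields this consumer does not know about
    # are ignored rather than rejected.
    return AuthResult(user_id=user_id, display_name=display_name)
```

With a reader like this, the producing component can add fields in a minor release without waiting for every consumer to ship, which is what lets each component keep its own schedule.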
Continuous Delivery (CD) is quite simply less risky and it leads to better componentization. Let me start with the impact CD has on componentization and then get back to the risk part.
While componentization sounds great, I have rarely seen it work unless it was combined with Continuous Delivery. The reason for this, I think, is simply repetition. The more experience a team can gain with a new practice the stronger and quicker it will grow. When I was working on MSN in the late 90s we would aim to release the MSN client CDs once a year just before back to school season. AOL followed a similar model.
Throughout the year the different service components would work on their features and continually validate they were working in a test environment we called the Integrated Test Environment (INT). Near the end of the yearlong cycle we all started upgrading the production services. We did this in pieces but we kept breaking each other. The reason for this, in my opinion, was that while we created contracts between the layers, we engineered and tested as if we were one cohesive unit. The INT environment was actually our downfall for componentization. It wasn’t until years later while working on Bing that I was able to observe the role continuous delivery with independent ship decisions had on the success of componentization.
Let me go back to the risk reduction side of CD.
From an Operations Engineer or a Test Engineer’s perspective change is synonymous with risk. If I get a brand new PC, take it out of the box, and turn it on, it pretty much always works. Over time new patches get applied, adware sneaks on and before you know it I’m better off reinstalling the operating system from scratch and starting over. Change leads to bugs. The same is true in the services world but the impact of a bad change on a large scale service can be catastrophic.
Ken aside – It’s not good to yell at your engineers when things go wrong. It creates backlash behavior that can be even worse than the original error. I have had to go through two major un-learnings over the past few years and it occurred to me that both of them were results of being unduly harangued for a problem in production.
The first bad decision was helping design the INT test environment. The test managers of MSN at the time felt it was possible to mimic production well enough in a lab that we could catch all the critical bugs before we shipped. INT was directly born out of the frustration of test managers being blamed when a bug would get into production.
The second unlearning comes from my experiences in operations. Ops engineers, more than any other type of engineer, know that change leads to issues. That is why operations is so focused on change management processes and documentation. Why is this? Well when a bad change goes in, everyone turns to ops and asks how they let this happen. The change management system is a way to track and assign blame.
The problem with both over testing in INT and excessive change management is that they slow down innovation and do not prevent bad bugs from getting through. They are counter to continuous delivery and don’t accomplish what they were designed to do.
It seems counterintuitive to say that shipping more frequently reduces risk but it does. Pushing code into production more quickly means less code churn per release. Changes are smaller and thus less risky. Additionally CD promotes componentization.
While I believe all five EaaSy capabilities are required to adopt MVQ I have to admit that user segmentation is the most important of the five. The reason for this is simply around managing risk. If you cannot divide the users of your product up into different segments and expose new code and features by user segment then by definition, when you first release your code, you must hit the release ready quality bar. When properly implemented, the great thing about user segmentation is that you can now create pools of users with different characteristics such as heavy or light users and you can create risk pools.
In Microsoft many teams are adopting the rings of risk model (see Figure 3). The rings signify that at each layer the number of users actively using the product is increasing and the tolerance for defects is decreasing. A typical model would have four or more pools of users divided across the rings.
Figure 3: Code Quality increases as the code is exposed to different types of users with ever increasing expectations of higher quality.
At the innermost ring you have the feature development team itself. These groups are actively working on the product and it’s not uncommon to even try out a buddy build from a team member. Even within Windows and Office, teams do this regularly. The next ring might be a sizable product team. Think of anything from a team developing the Office Outlook client to a team the size of Windows. The team as a whole is pretty tight-knit and committed to shipping a great product with great final quality. There should be a high tolerance for bugs, in fact excitement to find them within the team before release, as long as there are not so many defects that productivity becomes completely blocked.
When you go to the next ring of an entire company the quality needs to be pretty good. These individuals may be committed to building the company but that doesn’t mean they will be excited to lose their work or have to try something two or three times before it works. Still, they will understand that finding bugs in house is better than releasing them on customers.
At the edge between the company and a public beta is a special divider. This is what we call the disclosure line. When you start to bring beta users onto a product you have to expect leaks so you will need to have a communication plan in place. Additionally if your company wants to protect its intellectual property, in most cases those patents will need to be filed before the features are shared with the beta user pool.
External public beta users are in many ways the most exciting ring of all. They are typically enthusiasts that understand they will be getting a pre-release version of a product that will likely have bugs. They are willing to take on some of this pain for the chance to be an early adopter. Perhaps the most exciting aspect of beta users is that they are not typically engineers and will likely use the product in ways none of the smaller rings ever thought of trying. Going down new code paths helps discover new bugs.
The final ring is everyone that may be interested in your technology product. At this point you should be trying to monetize your product and individuals using it will expect that it is of good quality. That is true for products they purchase and also for free products that are perhaps monetized through ads.
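The rings described above can be sketched as a small promotion rule. The ring names follow the text, but the tolerated defect rates and the promotion logic are invented purely for illustration; a real team would pick its own signals and thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ring:
    name: str
    max_defect_rate: float  # tolerated failures per session (made-up values)


# Ordered from the innermost ring outward: more users, less tolerance.
RINGS = [
    Ring("feature team", 0.20),
    Ring("product team", 0.10),
    Ring("company", 0.03),
    Ring("public beta", 0.01),
    Ring("everyone", 0.002),
]


def next_ring(current_index: int, observed_defect_rate: float) -> int:
    """Return the ring index a build should occupy after an evaluation.

    The build moves one ring outward only when the defect rate observed
    in its current ring already meets the tolerance of the next ring.
    Otherwise it stays where it is.
    """
    if current_index >= len(RINGS) - 1:
        return current_index  # already released to everyone
    if observed_defect_rate <= RINGS[current_index + 1].max_defect_rate:
        return current_index + 1
    return current_index
```

For example, a build in the feature team ring with an observed rate of 0.05 would move to the product team ring, while one at 0.15 would stay put until it improves.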
Many companies use similar models to manage how they release features. Google is known for running hundreds of experiments a day on their search portal. Gmail was labeled a beta for years. When Facebook rolled out their new timeline UX early adopters could opt into the new experience. Similarly when Microsoft re-launched Hotmail as Outlook.com users had a very long period to opt into the new look.
In the AppStore model you are also seeing the adoption of user segmentation. These features for identifying and managing pools of users for Alpha and Beta releases are becoming common place in all the major AppStores.
Image 1: Article from http://support.google.com/googleplay touting the beta and alpha features of the Google Play store platform.
User segmentation isn’t just for services. There are countless examples of products shipping on CDs and DVDs that have leveraged user segmentation. There have been Beta releases of Windows since before the very first version was ever released. This method of segmentation was usually fairly manual and often spread out over long periods of time. The difference with EaaSy is that we are now taking products like Office and Windows and XBOX and building and releasing them like services. This requires a fully automated way of managing user pools and the versions of the code they receive.
Inside of Microsoft we use a term Flighting to mean something similar to the industry terms experimentation and A/B Testing and the associated capability of turning code paths on and off via configuration settings in a cloud service. In this case flighting also integrates with user segmentation (EaaSy capability 4) such that we can ship code to users within a particular ring and then configure the new code on and off at a later time. Some groups inside Microsoft call this shipping dark. The code is out there but it is not turned on until the configuration change is made in the cloud.
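As a rough sketch of shipping dark, the code below keeps a new code path in the product but only runs it for rings that a configuration entry enables. The dictionary stands in for a cloud configuration service, and the feature name, ring names, and function names are hypothetical.

```python
# Hypothetical in-memory stand-in for a cloud configuration service.
# An empty set means the feature has shipped dark: the code is deployed
# but no ring has it turned on.
FLIGHT_CONFIG = {
    "new_inbox_layout": {"enabled_rings": set()},
}


def is_enabled(feature, user_ring, config=FLIGHT_CONFIG):
    """Return True when the feature is switched on for the user's ring."""
    entry = config.get(feature)
    return entry is not None and user_ring in entry["enabled_rings"]


def render_inbox(user_ring, config=FLIGHT_CONFIG):
    """Both code paths ship together; configuration picks one at run time."""
    if is_enabled("new_inbox_layout", user_ring, config):
        return "new layout"
    return "classic layout"
```

Turning the feature on for the inner rings is then a configuration change rather than a new deployment, for example by setting `enabled_rings` to `{"feature team", "product team"}`.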
All of these capabilities come from lessons learned from A/B testing. In the case of A/B testing it is common to have the ability to dynamically route users to different experiences or different code paths. This is typically done to experiment on a UX treatment to determine which one has better engagement with end users.
Image 2: Example of A/B testing routing users to different code paths. Image from “The Ultimate Guide to A/B Testing” by Paras Chopra
Examples of A/B tests for a service might be a new treatment for where Ads are placed on a portal page. The goal of the experiment would be to answer the question of whether or not users will click more often or if the page is now so obnoxious users will stop using the service. With cloud based flags, users are routed to the control and the experiment. These two treatments are different code paths living side by side in production.
With flighting this same dynamic routing capability is leveraged by quality to roll out new code in the dark state and then have real users hit the new code path. In this way you can control the number of users hitting a new code path while answering the basic question of whether or not the new release is good enough or too buggy.
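One common way to control how many users hit a new code path is deterministic hash bucketing, sketched below. This is an illustrative technique rather than a description of any particular company's routing system; the function names and the flight name are assumptions.

```python
import hashlib


def bucket(user_id: str, flight: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given flight."""
    digest = hashlib.sha256(f"{flight}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def treatment(user_id: str, flight: str, exposure_percent: int) -> str:
    """Route roughly exposure_percent of users to the new code path."""
    if bucket(user_id, flight) < exposure_percent:
        return "experiment"
    return "control"
```

Because the bucket depends only on the user and the flight name, a user stays on the same side as long as the exposure setting is unchanged, and raising the exposure from 5 to 20 percent only adds users to the experiment without moving anyone back to control.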
This technique of flighting as a way to manage new code exposure to real users is a variation on what services call rolling upgrade. The difference here is that access to the new code is managed via a configuration. This improves rollback because removing the new code from production use is simply an undo on the configuration management service.
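The idea that rollback is just an undo on the configuration can be sketched with a store that keeps its change history. The class and the setting name are hypothetical and only illustrate the shape of the idea.

```python
class FlightConfigStore:
    """Hypothetical config store that keeps history so a change can be undone."""

    def __init__(self, initial):
        self._history = [dict(initial)]

    @property
    def current(self):
        # Return a copy so callers cannot mutate the stored history.
        return dict(self._history[-1])

    def set(self, key, value):
        """Record a new configuration version with one setting changed."""
        updated = dict(self._history[-1])
        updated[key] = value
        self._history.append(updated)

    def undo(self):
        """Roll back to the previous version; the initial version is kept."""
        if len(self._history) > 1:
            self._history.pop()
        return self.current
```

Exposing the new code path to 5 percent of users is a `set`, and pulling it back is a single `undo`. No binaries are redeployed in either direction.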
This technique is being used by all types of technology. Apps acquired in app stores are actively using this technique. The following article snippet describes how Facebook developed and released their iOS 7 app through the Apple AppStore while implementing their own ability to flight and test different UX experiences (see references). It seems they value the capabilities they have with their online presence so much that they built those capabilities into the app and supported it from their own service infrastructure.
How Facebook secretly redesigned its iPhone app with your help
To get around these issues, Facebook built an entire system for creating alternate versions of the native app within the native app. The team could then turn on certain new features for a subset of its users, directly, and measure the results. Starting in early 2013, the team put together a system of “different types of Legos — that we can reconfigure really easily … and see the results on the server in real time.”
From article on The Verge by Dieter Bohn September 18, 2013.
One of the things we discovered early on in the world of services is that data is the life blood of quality. With services the ability to get telemetry off of the servers vastly increased our ability to detect problems and fix them. We also learned how to mine data for a better understanding of how users were trying to interact with features.
I’m referring to this as Rich-Fast-Data (RFD) because to really have your product work like a service you need a lot of good quality data and you need some of it in near real time. With RFD all sorts of systems can be put in place to automate risk mitigation. RFD works with user segmentation, with componentization, and with flighting and the configuration management systems. It is the final element to enable a team to move fast and safely with EaaSy.
Ken aside – User behavior and engagement are the ultimate arbiter of quality. If you ship new code and your monitors and your telemetry tell you everything is fine but you see a statistically significant negative change in user interaction (usually meaning usage tanks) then you’ve likely shipped too many bugs. These bugs may be code defects or they may simply be basic usability but either way something is wrong. If users don’t like something they will give you less of what you want and that is their time.
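One way to check for a statistically significant negative change in user interaction is a one-sided two-proportion z-test comparing a control group with the users on the new code. The choice of test, the 5 percent significance level, and the function names are assumptions for this sketch.

```python
import math


def engagement_drop_z(control_active, control_total, flight_active, flight_total):
    """Two-proportion z statistic; negative means the flight engages less."""
    p_control = control_active / control_total
    p_flight = flight_active / flight_total
    pooled = (control_active + flight_active) / (control_total + flight_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_total + 1 / flight_total))
    if std_err == 0:
        return 0.0  # everyone (or no one) was active in both groups
    return (p_flight - p_control) / std_err


def is_significant_drop(z, critical=-1.645):
    """One-sided test at roughly the 5 percent level."""
    return z < critical
```

If 5,000 of 10,000 control users were active but only 4,700 of 10,000 flighted users were, the statistic comes out well below the critical value, which is the kind of signal that says something is wrong even when monitors and telemetry look fine.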
The real key to success in adopting an RFD approach is how the team leverages the data to mitigate risk, quickly respond to defects in the wild, and drive continuous improvement. One thing I look at with respect to a bug in the wild is an old service metric called Time to Detect (TtD). Quite simply, if TtD is high then there is a gap in monitoring and/or instrumentation. That gap is the most critical bug to fix, and it is why ongoing continuous improvement is important: it identifies these gaps before they become critical failures.
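A small sketch of tracking Time to Detect from an incident log. The incident records, their dates, and the 30 minute threshold are invented for illustration; the only idea taken from the text is that a high TtD points at a monitoring or instrumentation gap.

```python
from datetime import datetime
from statistics import median

# Hypothetical incident log: (when users were first affected,
# when monitoring or telemetry first flagged the problem).
INCIDENTS = [
    (datetime(2014, 6, 2, 10, 0), datetime(2014, 6, 2, 10, 5)),
    (datetime(2014, 6, 9, 12, 0), datetime(2014, 6, 9, 13, 30)),
    (datetime(2014, 6, 17, 8, 15), datetime(2014, 6, 17, 8, 35)),
]


def time_to_detect_minutes(incidents):
    """Return the Time to Detect of each incident, in minutes."""
    return [(detected - started).total_seconds() / 60 for started, detected in incidents]


def monitoring_gaps(incidents, threshold_minutes):
    """Return incidents whose TtD exceeds the threshold (likely monitoring gaps)."""
    return [
        (started, detected)
        for started, detected in incidents
        if (detected - started).total_seconds() / 60 > threshold_minutes
    ]


if __name__ == "__main__":
    print("median TtD (minutes):", median(time_to_detect_minutes(INCIDENTS)))
    print("incidents to review:", len(monitoring_gaps(INCIDENTS, 30)))
```

Each incident returned by `monitoring_gaps` is a candidate for a new monitor or extra instrumentation before the next release.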
MVQ really builds on other lean development practices. It is in a way Lean Testing. It’s an area I’ve been thinking on for a few years now as I have seen Big Data start to enable all sorts of new capabilities across many different industries and job roles.
Back in 2012 I gave a talk on software testing and the move to Big Data with my Dev Manager peer Reena Agarwal. In the talk we covered how the rise of Big Data was beginning to impact the techniques of software testing of online services. I hadn’t thought of EaaSy yet so I was confining my thinking to the online space.
During the Q&A I stirred up a bit of controversy by stating that I felt we were simply testing services far too much before releasing them to production. This thought was bubbling up in me because I’d been reading “The Lean Startup” and wondering how the concept of Minimum Viable Product (MVP) could be applied to my work at Microsoft.
Several attendees of the event were incensed by my suggestion to test less. It was a testing conference after all. I was even accused of suggesting we ship “crappy code.” Nothing could be further from the truth. MVQ is about shipping great software with the right level of quality and amazing features as quickly as possible at the lowest possible cost so that you can charge less and gain more users. In many ways MVQ is a lot like lean manufacturing but for software. I’m sorry but testing, especially over testing, adds cost and drives up prices.
MVQ can be applied to all software and device projects and does not require MVP. I want to recognize that this thinking on Lean QA was prompted by “The Lean Startup.” It should also be noted that many others have written about just enough testing, just in time testing, and test driven development (TDD). All of that work has influenced MVQ.
Ken aside – Satya Nadella was talking to us a few years back and he said that we “needed to be a learning organization.” In the Bing test org, this was often brought up as fail fast. The idea has its roots in industry but it became part of the lexicon of the original Live Search (pre-Bing name) test org from its earliest days and it helped drive a mind-shift change that is very powerful for any development or test org to embrace. I didn’t initially see the connection between fail fast and learning organization but now I would summarize it this way. When you can try fast with low friction, when you can discover fast through real data, and when you can take all of that and build up knowledge on how to succeed in small and big ways fast, you can become a learning organization. You learn fast!
The only thing is, you must accept that you will fail, a lot, along the way. You will have to learn what good failure is and how to learn and grow from smart failures. I wouldn’t get too hung up on the quality of the failures because if you fail fast you will not have much sunk cost into any one failure. That’s a key element to fail fast. You minimize your losses and move onto the next grand idea.
“Fail Fast” and its corollary “Learn Fast” are much deeper than even what I just summarized but that framing has helped ground me into “Learning Organization.”
It’s funny, I’ve posted and talked many times now about the EaaSy capabilities and MVQ and I’m always getting asked, now what exactly is MVQ and how do I do it?
Here let me just walk through the possible levels of testing:
Under Tested – that means your product is below acceptable levels of quality. You will know this when you try to get individuals to use your product and they stop using it.
Over Tested – At this end of the spectrum you will have massive test labs and massive suites of tests. You likely have a long stabilization phase and you have multiple week long test passes. Here’s the rub though. Of all of those tests, manual and automated, most of them find nothing wrong with the product. I would go so far as to submit that every time a test runs and doesn’t find a bug, that is wasted effort.
Figure 4: MVQ focuses on less up front testing and more rapid releases to production.
MVQ Limited Release – This is your beta, your friends and family, your tech enthusiasts release. We all know that we will allow more bugs in a limited release than in the final release. In the MVQ model we would apply the EaaSy capabilities to segment this limited release user base into smaller groups and leverage automated deployment and fast roll back to test even less than normal and push bits out to the users.
MVQ Final Release – Okay, here we must meet customer expectations with quality. We can’t miss the mark here or we will get negative reviews in social media and the product will fail. Even with that in mind, we don’t want to waste time and resources over testing. This is where the Rich Fast Data really helps us. Even on a V1 product, let’s say a new gaming console, you can ship with known bugs. It takes time to get that product into the channels and into the retail stores. If the device is connected then you still have time to find and fix bugs between the time it is released and users start to turn the device on. Have you recently purchased a device that made updates before you could really use it? Hmm, maybe they found bugs and fixed them.
MVQ = Minimum Viable Quality and that means that as a tester you look for every possible excuse to ship the product as quickly as possible with acceptable quality and reasonable risk.
Risk, that’s the key isn’t it? Reasonable risk. In the old days with CDs and the high cost of recall class bugs, reasonable risk meant pretty darn solid quality. Also with the lack of connected users there were long delays in hearing about a bug in the wild. For those reasons we had to rely upon massive suites of tests run in labs. Not now, not in today’s connected devices world.
What I learned from being in the services world is that you never actually ship a service. There is just the next feature, the next scenario, the next set of performance improvements, and the never ending list of bugs to fix. The concept of shipping comes from a time when products were developed over two and three year cycles and released on a CD. Shipping doesn’t apply in the modern world, whether it’s an app, a device, an operating system, a server product, financial software, a social network, or any other bit of technology. If it is connected it is updatable and if it is updatable, it’s never done shipping, at least until you are ready to retire it.
Ken aside – If you release a version of your product to a set of target users and you have to roll back three times in a row then your quality bar is probably too low. If say one out of ten daily releases has to be rolled back, then your quality bar may be just fine.
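The rule of thumb in this aside can be sketched as a check over recent release outcomes. The two cut-offs come from the aside itself (three rollbacks in a row, one rollback in ten releases); treating everything in between as inconclusive, and the function names, are assumptions of this sketch.

```python
def rollback_rate(outcomes):
    """Fraction of releases that were rolled back (True means rolled back)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def longest_rollback_streak(outcomes):
    """Length of the longest run of consecutive rollbacks."""
    longest = current = 0
    for rolled_back in outcomes:
        current = current + 1 if rolled_back else 0
        longest = max(longest, current)
    return longest


def quality_bar_signal(outcomes):
    """Apply the rule of thumb to a list of release outcomes."""
    if longest_rollback_streak(outcomes) >= 3:
        return "quality bar probably too low"
    if rollback_rate(outcomes) <= 0.1:
        return "quality bar may be fine"
    return "inconclusive"
```

Ten daily releases with a single rollback would come back as "quality bar may be fine", while any run of three consecutive rollbacks flags the bar as probably too low regardless of the overall rate.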
What is the advantage of taking an MVQ approach to software product delivery? The bottom line is that you start to get data about how the code is functioning in production with real users more quickly. The key is to set the minimum so that the feature set and the quality of those features are not so low that the product doesn’t get used. If users aren’t using the product, no matter what segment/pool they are in, then you don’t get data. If you get too little data, then you won’t discover and learn about the harder to find bugs because the code won’t be exercised.
Ken aside – I would rather miss a recall class bug that gets into production with minimal real end user impact than hold a release for an extra day to find and fix three bugs that would never actually manifest in production. While that statement may sound cavalier, it has quite a few qualifiers so don’t think I’m totally flipping the bit on quality.
The great thing about MVQ is that you can more quickly get into the final phase of real integration of your software, services, and devices and real usage by real users. When MVQ is implemented correctly with EaaSy, it is not only more efficient, but also less risky.
To get to the world of less testing, shipping faster and hitting just the right level of quality you need to implement the five key capabilities of EaaSy and then change your mind set to focus on Minimum Viable Quality. The best path to EaaSy that I have seen is to allow teams to ship their components on independent schedules. That would promote both the concept of componentization and to a lesser degree continuous delivery.
The reason this works is simple. Developers like to ship. If you let them ship faster and on an independent schedule, they will figure out how to maintain forward and backward compatibility. That is the key to componentization and the first foundational step toward making your product more like a service. The capabilities of user segmentation and flighting improve the control you will have on the levers of risk. If you can roll back bad code quickly, pushing a bug to a small set of users becomes reasonable risk.
The last piece is rich fast data. Some companies and even cloud platforms make this available to product developers natively. Even with the capability to have rich fast data in place, developers often under instrument their code. That is, until there is a catastrophic failure in the product and they do not have the ability to solve the issue quickly.
Yes, the path to MVQ almost always includes disaster. The reality is that all the top companies are moving in this direction. Compared with more traditional methods, MVQ allows for lower cost and for shipping more features to more users and delighting them.
Research and References:
The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses (book 2011 by Eric Ries)
The Ultimate Guide to Minimum Viable Products (blog 2013 by Vladimir Blagojevic): Mostly provides an overview of key approaches to MVP from Lean Startup
How To Create A Minimum Viable Product (article 2013 on Tech Crunch by Emre Sokullu): Article is mostly a list of open source technologies and SOA design ideas for launching an online MVP based upon Emre’s experiences launching Grou.ps
In Pursuit of Quality: Shifting the Tester mindset (blog 2014 by Brent Jensen): Wake up testers, “fixing correctness issues on a piece of code that no one is using is a waste of time & resources.”
Stop, if You Want to… (Blog 2014 by Alan Page): Testers should stop writing so much automation. The world of data is changing how we can approach testing and quality.
Testing in Production, Your Key to Engaging Customers (presentation 2011 by Seth Eliot at STP Con): Scenarios and Testing in Production produce high quality products and delighted customers.
Create Your Roadmap to Data Driven Quality (presentation 2014 by Seth Eliot at STP Con): Ditch the HiPPO (Highest Paid Person’s Opinion) and shift to data from real users to gain real insights on quality.
Good Enough Quality: Beyond the Buzzwords (IEEE paper 1997 by James Bach): Paper and article on reducing up front testing of product code.
Journey to Continuous Deployment (presentation 2009 by Nathan Dye): I couldn’t find the white paper that I originally read by Nathan but he covers all the key points in this presentation.