Michael Larsen's live blog of my ALM Forum talk, Your Path to Data-Driven Quality is perfect! You should check out his blog to get the scoop on the other talks from that great event. Here is his excerpt about my talk:
Now it's time for Seth Eliot and "Your Path to Data Driven Quality" and a roadmap towards how to use the data that you are gathering to help guide you to your ultimate destination. Seth wants to make the point that testing is measurement, and you can't measure if you don't have data (well, you can, but it won't really be worth much). Seth asks if we are HiPPO driven (meaning is our strategy defined buy the "Highest Paid Person's Opinion" or were we making decisions based on hard data. Engineering data can help a little bit (test results, bug counts, pass fail rates). They can give us a picture, but maybe not a complete one (in fact, not even close to a complete one). There's a lot of stuff we are leaving on the table. Seth says that leveraging production data (or "near production data") gives us a richer and more dynamic data set. Testers try to be creative, but we can't come close to the wacko randomness of the real world users that interact with our product. First step: Determine your questions. Use Goal Question Metrics. Start at the beginning and see what you ultimately want to do. Don't just get data and look for answers. Your data will taint the questions you ask if you don't ask the questions first. You may develop a confirmation bias if you look at data that may seem to point to a question you haven't asked. Instead, the data may give you a correlation to something, but it may not actually tell you anything important. Starting with the question helps to de-bias your expectations, and then it gives you guidance as to what the data actually tells you. Then: Design for production-data quality. There's two types of data we can access. Active and passive data can be used. active data could be test cases or synthetic data of a simulated user. Passive data is using real world data and real users interactions. Synthetic data is safer, but it's by definition incomplete. Passive data is more complete, but there's a danger to using it (compromising identification data, etc.). Staging the data acquisition lets us start with synthetics data (reminds me of my "Attack on Titan" account group that I have lovingly put together when I test Socialtext... yes, I have one. Don't judge me ;) ), to copying my actual account and sharing on our production site (much more rich data, but needs to be scrubbed of anything that could compromise individuals privacy... which in turn gets us back to synthetic data of sorts, but a richer set. Bulk up and repeat. Over time, we can go from having a small set of sample data to a much larger and beefier data-set, with lots more interesting data points. Then: Select Data sources. There's a number of ways to gather and accumulate data. We can export from user accounts, or we can actively aggregate user data and collect those details (reminds me of the days of NetFlow FlowCollection at Cisco). We need to be clear as to what we are gathering and the data handling privacy that goes with it. Anonymous data is typically safe, sensitive personally identifiable info requires protocols to gather, most likely scrub, or not touch with a ten foot pole. Will we be using Infrastructure data, app data. usage. account details, etc. Each area has its unique challenges. Plan accordingly. Then: Use the right data tools. What are you going to use to store this data. Databases are of course common, but for big data apps, we need something a little more robust (Hadoop is hip in this area). where do you store a Hadoop instance? Split it up into smaller chunks (note, splitting it makes it vulnerable, so we need to replicate it. Wow, big data gets bigger :) ).Using map reducing tools, we can crunch down to a smaller data set for analysis purposes. I'm going to take Seth's word for it, as Hadoop is not one of my strong suits, but I appreciated the 60 second guided tour :) ). Regardless of the data collection and storage, ultimately that data needs to be viewed, monitored, aggregated and analyzed. The tools that do that are wide and varied, but the goal is to drill down to the data that matters to you, and having the ability to interpret what you are seeing. Then: Get answers to your questions. Ultimately, we hope that we are able to get answers based on the real data we have gathered that will help us either support or dispute our hypothesis (back to the scientific method; testing is asking questions and then, based on the answers we receive, considering and proposing more interesting questions. Does our data show us interesting points to focus our attention? Do we know a bit more about user sentiment? Have we figured out where our peak traffic times are? If we have asked these questions, and gathered data that is appropriate for those questions, if we have been focused on aggregating the appropriate data and analyzing it, we should be able to say "yes, we have support for our hypothesis" or "no, this data refutes our hypothesis". Of course, that leads to even more questions, which means we go to... Lather. Rinse. Repeat. Hmmm, Mark Tomlinson just passed me a note with a statement that says "Computer Aided Exploratory Testing"? Hadn't considered it quite that way, but yes, this certainly fits the description. An intriguing prospect, and one I need to play with a bit more :).
Next stop is STPCon, I will be giving an augmented version of this presentation there.