Make Sense of the Test Results

Test automation per se is pretty common today. Few software companies still hire an army of manual testers (whether regular employees or contractors).

However, the results coming out of test automation are usually still pretty raw. A typical test report today tells you that X test cases failed and Y passed. For each failed test case, the report shows a brief error message (e.g. “AssertionFailed Timed Out waiting for status to be [Started]”) with a hyperlink to the full test log. That’s pretty much it. Beyond that point, from what I can see, people in many organizations spend a lot of time making sense of the test results.

They want to figure out things like:

Among the failed test cases, do all the failures share one cause, or does each have a different cause?

Which failures are new, and which are chronic failures or flaky tests?

For a new failure, can I quickly narrow it down to one or two suspicious recent check-ins?

For a chronic failure, is this failure the same kind as in the past, or is it the same test case failing for a different reason?

Is the failure already being tracked by a bug? Is someone already working on it?

Was the failure supposed to be fixed already? If so, the tracking bug should be reactivated, since the fix doesn’t seem to have worked.

Is the failure unique to my branch, or is it happening across the board? If the test fails in other branches at the same time, the failure is unlikely to be caused by changes in my branch and is more likely an environment issue.
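To make the first couple of questions concrete, here is a minimal sketch in Python of how a machine could start answering them. Everything in it is my own assumption for illustration, not an existing tool: the function names (signature, group_by_cause, classify), the idea of masking numbers in the error message to get a comparable "signature", and the crude rule that an always-failing test is chronic while a sometimes-failing test is flaky.

```python
import re
from collections import defaultdict


def signature(message):
    """Normalize an error message so that similar failures compare equal.

    Assumption: numbers and hex ids are noise (timeouts, ports, addresses).
    """
    masked = re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", message)
    return re.sub(r"\s+", " ", masked).strip().lower()


def group_by_cause(failures):
    """Group failed tests whose error messages share a signature.

    failures: dict mapping test name -> raw error message.
    Returns a dict mapping signature -> sorted list of test names.
    """
    groups = defaultdict(list)
    for test, message in failures.items():
        groups[signature(message)].append(test)
    return {sig: sorted(tests) for sig, tests in groups.items()}


def classify(history, failed_now):
    """Label the current result of one test given its recent history.

    history: past outcomes, oldest first, True meaning the run passed.
    This is a deliberately crude heuristic, not a statistical test.
    """
    if not failed_now:
        return "passing"
    if all(history):  # also true for an empty history
        return "new"
    if not any(history):
        return "chronic"
    return "flaky"


if __name__ == "__main__":
    failures = {
        "test_start": "Timed out after 30 s waiting for status [Started]",
        "test_restart": "Timed out after 45 s waiting for status [Started]",
        "test_login": "Connection refused",
    }
    for sig, tests in group_by_cause(failures).items():
        print(sig, "->", tests)
    print(classify([True, True, True], failed_now=True))
```

A real tool would need much better signatures (stack traces, log context) and would compare signatures across runs and branches, but even this much turns "X failed" into "X failed for N distinct reasons, of which M are new".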

Besides understanding the failures, engineers also care about the quality of the test automation itself:

Is the test pass taking longer to finish than before? If so, why? Is it because a) we have more test cases, b) the system under test is now slower, c) a dependency of the system under test is now slower, or d) we are reaching the capacity limit of the test environment, so things are getting queued or throttled?

How repeatable is the test automation? Which tests are the flakiest, and why?
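These two questions can also be sketched in a few lines of Python. Again, the names (flip_rate, flakiest, slowdown), the "flip rate" measure of flakiness, and the 1.2x threshold are assumptions I made up for illustration. The sketch only detects that a test pass got slower; it does not say which of the reasons a) through d) is to blame.

```python
from statistics import median


def flip_rate(outcomes):
    """Fraction of consecutive runs whose outcome changed (pass <-> fail).

    outcomes: list of booleans, oldest first. 0.0 means perfectly stable.
    """
    if len(outcomes) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    return flips / (len(outcomes) - 1)


def flakiest(history, top=3):
    """Rank tests by flip rate, highest first; ties broken by test name.

    history: dict mapping test name -> list of outcomes.
    """
    ranked = sorted(history, key=lambda t: (-flip_rate(history[t]), t))
    return [(t, flip_rate(history[t])) for t in ranked[:top]]


def slowdown(past_minutes, current_minutes, threshold=1.2):
    """Compare the current duration against the median of past durations.

    Returns (ratio, is_slower); the threshold is an arbitrary assumption.
    """
    ratio = current_minutes / median(past_minutes)
    return ratio, ratio > threshold


if __name__ == "__main__":
    history = {
        "test_a": [True, False, True],
        "test_b": [True, True, True],
        "test_c": [True, True, False],
    }
    print(flakiest(history, top=2))
    print(slowdown([30, 32, 31], 40))
```

Using the median rather than the mean keeps one unusually slow past run from hiding a real slowdown.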

Without help from good tools, this kind of analysis is laborious in many places today. There is no reason why we can’t use machines to perform it and put all the answers right in front of us soon after a test run finishes. That shouldn’t be too hard. Ideally, the machine would just tell us whether the build is good to ship or not. It’s very much like visiting a hospital for a general check-up. I don’t want to just get a pile of papers full of numbers, charts, etc., because I don’t know how to interpret them: is 42 good or bad for an HDL cholesterol level? If bad, how bad is it? I have premature contractions? What does that mean? In the end, I just want to be told that “you are doing fine, just need to lose some weight”.

——

p.s. This reminds me of a TED talk that I watched recently. The speaker, Kenneth Cukier, said:

Big data is going to steal our jobs. Big data and algorithms are going to challenge white collar, professional knowledge work in the 21st century in the same way that factory automation and the assembly line challenged blue collar labor in the 20th century. Think about a lab technician who is looking through a microscope at a cancer biopsy and determining whether it’s cancerous or not. The person went to university. The person buys property. He or she votes. He or she is a stakeholder in society. And that person’s job, as well as an entire fleet of professionals like that person, is going to find that their jobs are radically changed or actually completely eliminated.

Well, long before machines can tell cancerous cells from healthy ones and eliminate the lab technician’s job, we should be able to make machines help us make sense of test results.

One Comment

zhengziying.com November 23, 2015 at 7:36 AM

[…] we made the automated tests faster, we found that now the long pole became the time human spent to make sense of the test result. So we developed some algorithms and tools to help us: 1) differentiate whether a failure is a new […]
