Last reviewed | 2021-10-28 |
A flaky test is an unreliable test that occasionally fails but passes eventually if you retry it enough times.
In a test suite, flaky tests are inevitable, so our goal should be to limit their negative impact as soon as possible.
Current state | Assumptions |
---|---|
`master` success rate (with manual retrying of flaky tests) is between 88% and 92% for August/September/October 2021 | We don't know exactly what the success rate would be if we stopped retrying flaky tests, but based on this exploratory chart, it could go down by approximately 7% |
175 programmatically identified flaky tests and 211 `~"failure::flaky-test"` issues out of a total of 159,590 tests | This means we identified 0.1% of tests as being flaky, which is in line with the "RSpec Job Flaky Failure Probability". GitHub identified that 25% of their tests were flaky at some point; our reality is probably somewhere in between. |
Coverage is currently at 97.86% | Even if we removed the 175 flaky tests, we don't expect the coverage to go down meaningfully. |
"Average Retry Count" per pipeline is currently at 0.08. Given RSpec jobs' current average duration of 23 minutes, this results in an additional 0.08 * 23 = 1.84 minutes on average per pipeline, not including the idle time between the job failing and the time it is retried. Explanation provided by Albert. | Given we have approximately 11k MR pipelines per month, flaky tests are wasting 11,000 * 1.84 = 20,240 minutes per month = 337 engineer hours = 14 days. Given our private runners cost us $0.0845 / minute, flaky tests are wasting about $1,710 per month. |
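
The arithmetic behind the last row can be reproduced with a short script. This is only a sketch: the inputs are the figures quoted in the table, while the function and variable names are illustrative and not part of any existing tooling.

```python
# Inputs quoted in the table above; all names here are illustrative.
AVERAGE_RETRY_COUNT = 0.08       # "Average Retry Count" per pipeline
RSPEC_JOB_MINUTES = 23           # average RSpec job duration, in minutes
MR_PIPELINES_PER_MONTH = 11_000  # "approximately 11k MR pipelines per month"
RUNNER_COST_PER_MINUTE = 0.0845  # USD per minute on private runners


def monthly_flaky_waste(retry_count, job_minutes, pipelines, cost_per_minute):
    """Return (minutes per pipeline, minutes per month, hours, days, USD cost)."""
    minutes_per_pipeline = retry_count * job_minutes
    minutes_per_month = minutes_per_pipeline * pipelines
    hours = minutes_per_month / 60
    days = hours / 24
    cost = minutes_per_month * cost_per_minute
    return minutes_per_pipeline, minutes_per_month, hours, days, cost


per_pipeline, per_month, hours, days, cost = monthly_flaky_waste(
    AVERAGE_RETRY_COUNT,
    RSPEC_JOB_MINUTES,
    MR_PIPELINES_PER_MONTH,
    RUNNER_COST_PER_MINUTE,
)
print(
    f"{per_pipeline:.2f} min/pipeline, {per_month:,.0f} min/month, "
    f"{hours:.0f} h, {days:.0f} days, ${cost:,.0f}"
)
```

Like the table, this excludes the idle time between a job failing and being retried, so the result is best read as a lower bound.
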
When a flaky test fails in an MR, the author might follow this workflow:
Flaky tests negatively impact several teams and areas:
Impacted department/team | Impacted area | Impact description | Impact quantification |
---|---|---|---|
Development department | MR & deployment cycle time | Wasted time (by forcing people to look at the failures and retry them manually) | ~$26,000 of wasted time per month, based on 337 engineer hours at a $77 hourly rate for an engineer |
Infrastructure department | CI compute resources | Wasted money | At least $1,710 worth of wasted CI compute time per month |
Delivery team & Quality department | Deployment cycle time | Distraction from actual CI failures & regressions, leading to slower detection of those | TBD |
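
As a rough check on the quantified rows, the sketch below recomputes the wasted-time figure and adds it to the CI compute figure. The names are illustrative, and the sum is a derived illustration rather than a number stated in the table; it leaves out the row still marked TBD.

```python
# Figures taken from the impact table; all names here are illustrative.
ENGINEER_HOURS_PER_MONTH = 337  # hours lost to flaky-test retries
HOURLY_RATE_USD = 77            # hourly rate used for an engineer
CI_COMPUTE_USD = 1_710          # "at least" figure for wasted CI compute


def quantified_monthly_impact(hours, hourly_rate, ci_compute):
    """Return (wasted-time cost, wasted-time cost plus CI compute cost) in USD."""
    wasted_time = hours * hourly_rate
    return wasted_time, wasted_time + ci_compute


wasted_time, quantified_total = quantified_monthly_impact(
    ENGINEER_HOURS_PER_MONTH, HOURLY_RATE_USD, CI_COMPUTE_USD
)
print(f"wasted time: ${wasted_time:,}, quantified total: ${quantified_total:,}")
```

The wasted-time product, $25,949, is what the table rounds to ~$26,000.
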
`master` stability to a solid 95% success rate without manual action
`master` is broken or not and default action of retry