KPI Metrics
To measure the effectiveness and quality of our CI testing process, we need to establish the relevant key performance indicators, or KPIs. These KPIs not only help us evaluate any AI-based enhancements we make to the CI processes, but also pinpoint which specific areas need the most improvement and should therefore be given the most resources. The notebooks contained in this directory define, calculate, and save several KPIs that we believe are relevant to the various personas (developer, manager, etc.) involved in the CI process.
These notebooks are also available for viewing and running on the public JupyterHub instance provided via the Operate First initiative.
When addressing a KPI request, please follow Metric-template.ipynb, which defines a template for calculating metrics and uses number-of-flakes.ipynb as an example.
The following section describes what each of these KPIs represents and links to the notebooks that calculate them.
List of Available Metrics
Number of tests blocked: What is the total number of blocked tests on testgrid, i.e. where the value in the cell is “8”.
Blocked tests Percentage: What is the percentage of blocked tests on testgrid, i.e. where the value in the cell is “8”.
Number of tests timed out: What is the total number of timed out tests on testgrid, i.e. where the value in the cell is “9”.
Timed out tests Percentage: What is the percentage of timed out tests on testgrid, i.e. where the value in the cell is “9”.
Number of builds passed: What is the total number of builds that were passing, i.e. had the “Overall” cell labeled as passing on testgrid.
Percentage of builds passed: What is the percentage of builds that were passing, i.e. had the “Overall” cell labeled as passing on testgrid.
Number of builds failed: What is the total number of builds that were failing, i.e. had the “Overall” cell labeled as failing on testgrid.
Percentage of builds failed: What is the percentage of builds that were failing, i.e. had the “Overall” cell labeled as failing on testgrid.
Change in success and failure: What is the change in the number of successful and failed builds for all tests, relative to the total number of builds.
Correlated test failure sets per test: What is the number of tests highly correlated with a given test, i.e. with a correlation coefficient of 0.9 or above.
Average size of correlation set: What is the average size of correlated test sets as calculated in the above metric.
Flaky tests: What is the number of flaky tests, i.e. where the value of the cell is “13”, aggregated over each platform, grid, or tab for each day/week/month.
Flake Severity: What is the percentage of flakes for each test overall. This can also be seen as the severity level or overall flake rate of a test.
Percent of Tests Fixed: What is the percentage of tests that were failing in the previous run of the build, but are now passing. This metric can be aggregated over each platform, grid, or tab.
Mean Length of Failures: How many times was the build (test suite) run before a failing test started to pass.
Mean Time to Fix: How much time was taken before a failing test started to pass.
Consecutive Failure Rate: What percentage of failing tests stay failing for more than one build. That is, what percentage of failing tests are not “one-off” failures.
Pass to Fail Rate: What percentage of test runs resulted in a “pass” to “fail” transition.
Fail to Pass Rate: What percentage of test runs resulted in a “fail” to “pass” transition.
Total number of test cases: What is the total number of test runs (cells) on testgrid.
Number of Passing Tests: What is the total number of test runs that passed i.e. total number of green cells on testgrid.
Number of Failing Tests: What is the total number of test runs that failed i.e. total number of red cells on testgrid.
Percent of Passing Tests: What is the percent of test runs that passed i.e. percent of green cells on testgrid.
Percent of Failing Tests: What is the percent of test runs that failed i.e. percent of red cells on testgrid.
Average time to test: What is the average time taken to run builds and how does it change over time.
Probability to Fail: What is the probability that a given test will fail at a given moment in time.
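To make a few of the definitions above concrete, the following is a minimal sketch of how some of these metrics could be computed from a grid of cell status codes. It is not the code used in the linked notebooks. The codes 8 (blocked), 9 (timed out), and 13 (flaky) come from the descriptions above; the codes used here for passing (1) and failing (12) cells, the sample `grid`, and all function names are assumptions made for illustration. The sketch also assumes each row lists runs from oldest to newest, and that a failure streak only counts toward Mean Length of Failures when it is ended by a passing run.

```python
# Status codes named in the metric descriptions above.
BLOCKED, TIMED_OUT, FLAKY = 8, 9, 13
# Assumed codes for passing and failing cells (not stated in the text above).
PASS, FAIL = 1, 12

# Hypothetical grid: one row per test, one column per build, oldest build first.
grid = {
    "test_a": [1, 12, 12, 1, 1],
    "test_b": [1, 1, 13, 1, 8],
    "test_c": [12, 1, 9, 12, 1],
}


def count_status(grid, code):
    """Number of cells in the grid holding the given status code."""
    return sum(row.count(code) for row in grid.values())


def percent_status(grid, code):
    """Percentage of all cells holding the given status code."""
    total = sum(len(row) for row in grid.values())
    return 100.0 * count_status(grid, code) / total if total else 0.0


def transition_rate(grid, before, after):
    """Percentage of consecutive run pairs that go from `before` to `after`."""
    pairs = hits = 0
    for row in grid.values():
        for prev, cur in zip(row, row[1:]):
            pairs += 1
            hits += (prev == before and cur == after)
    return 100.0 * hits / pairs if pairs else 0.0


def flake_severity(grid):
    """Per-test percentage of runs flagged as flaky."""
    return {
        name: 100.0 * row.count(FLAKY) / len(row)
        for name, row in grid.items()
        if row
    }


def mean_length_of_failures(grid):
    """Average number of consecutive failing runs before a test passes again."""
    streaks = []
    for row in grid.values():
        run = 0
        for cell in row:
            if cell == FAIL:
                run += 1
            else:
                # Only a passing run closes a streak; other statuses discard it.
                if cell == PASS and run:
                    streaks.append(run)
                run = 0
    return sum(streaks) / len(streaks) if streaks else 0.0


if __name__ == "__main__":
    print("blocked cells:", count_status(grid, BLOCKED))
    print("percent timed out:", percent_status(grid, TIMED_OUT))
    print("pass-to-fail rate:", transition_rate(grid, PASS, FAIL))
    print("fail-to-pass rate:", transition_rate(grid, FAIL, PASS))
    print("flake severity:", flake_severity(grid))
    print("mean length of failures:", mean_length_of_failures(grid))
```

The same counting helpers cover the passing and failing test metrics by passing `PASS` or `FAIL` as the code. Aggregating per platform, grid, or tab would amount to applying these functions to the subset of rows belonging to each group.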