Percent of Failing Tests Fixed#
This notebook is an addition to the series of KPI notebooks in which we calculate key performance indicators for CI processes. In this notebook, we will calculate the KPI “Percent of failing tests fixed in each run/timestamp.” Essentially, we will determine the
percent of tests that were failing and are now fixed
For OpenShift managers, this information can help quantify the agility and efficiency of their team. If this number is high, it means they are able to quickly identify the root causes of the failing tests in the previous run and fix them. Conversely, if this number is low, it means only a small percent of previously failing tests get fixed in each new run, which in turn implies that their CI process is likely not as efficient as it could be.
Related issues: #149
import os
import gzip
import json
import datetime
import numpy as np
import pandas as pd
from ipynb.fs.defs.metric_template import decode_run_length
from ipynb.fs.defs.metric_template import testgrid_labelwise_encoding
from ipynb.fs.defs.metric_template import CephCommunication
from ipynb.fs.defs.metric_template import save_to_disk, read_from_disk
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
True
## Specify variables
METRIC_NAME = "pct_fixed_each_ts"
# Specify the path for the input grid data
INPUT_DATA_PATH = "../../../../data/raw/testgrid_183.json.gz"
# Specify the path for output metric data
OUTPUT_DATA_PATH = f"../../../../data/processed/metrics/{METRIC_NAME}"
## CEPH Bucket variables
## Create a .env file on your local machine with the correct configs
s3_endpoint_url = os.getenv("S3_ENDPOINT")
s3_access_key = os.getenv("S3_ACCESS_KEY")
s3_secret_key = os.getenv("S3_SECRET_KEY")
s3_bucket = os.getenv("S3_BUCKET")
s3_path = os.getenv("S3_PROJECT_KEY", "ai4ci/testgrid/metrics")
s3_input_data_path = "raw_data"
AUTOMATION = os.getenv("IN_AUTOMATION")
## Import data
timestamp = datetime.datetime.today()
if AUTOMATION:
    filename = f"testgrid_{timestamp.day}{timestamp.month}.json"
    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)
    s3_object = cc.s3_resource.Object(s3_bucket, f"{s3_input_data_path}/{filename}")
    file_content = s3_object.get()["Body"].read().decode("utf-8")
    testgrid_data = json.loads(file_content)
else:
    with gzip.open(INPUT_DATA_PATH, "rb") as read_file:
        testgrid_data = json.load(read_file)
Calculation#
To find fixed tests, we modified the testgrid_labelwise_encoding function: the loop is adapted to record “True” if a test was fixed in the current run and “False” otherwise. That is, instead of indicating “is_flake” or “is_pass,” it indicates “was failing before but is passing now,” i.e. “is_flip.”
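For intuition, here is a minimal sketch of that flip rule on a toy status sequence. It assumes only TestGrid's status encoding (1 = pass, 12 = fail) and that the decoded sequence runs from most recent to least recent, as in the real grids below:

# toy example: decoded statuses from most recent to least recent (1 = pass, 12 = fail)
statuses = [1, 12, 12, 1]

# a run counts as a fix if it passes now but failed in the next-older run
did_get_fixed = [
    statuses[i] == 1 and statuses[i + 1] == 12 for i in range(len(statuses) - 1)
]
did_get_fixed.append(False)  # the oldest run has nothing earlier to compare against

print(did_get_fixed)  # [True, False, False, False]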
# NOTE: this for loop is a modified version of the testgrid_labelwise_encoding function
percent_label_by_grid_csv = []

for tab in testgrid_data.keys():
    print(tab)
    for grid in testgrid_data[tab].keys():
        current_grid = testgrid_data[tab][grid]

        # get all timestamps for this grid (x-axis of grid)
        timestamps = [
            datetime.datetime.fromtimestamp(t // 1000)
            for t in current_grid["timestamps"]
        ]

        tests = []
        all_tests_did_get_fixed = []

        # NOTE: this list of dicts goes from most recent to least recent
        for i, current_test in enumerate(current_grid["grid"]):
            tests.append(current_test["name"])
            statuses_decoded = decode_run_length(current_grid["grid"][i]["statuses"])

            # in TestGrid's encoding, 1 = pass and 12 = fail; a test "got fixed"
            # at a run if it passes now but failed in the next-older run
            did_get_fixed = []
            for status_i in range(0, len(statuses_decoded) - 1):
                did_get_fixed.append(
                    statuses_decoded[status_i] == 1
                    and statuses_decoded[status_i + 1] == 12
                )

            # the least recent run cannot be "True", since there is no earlier
            # run to compare against
            did_get_fixed.append(False)

            # add results for all timestamps for current test
            all_tests_did_get_fixed.append(np.array(did_get_fixed))

        all_tests_did_get_fixed = [
            list(zip(timestamps, g)) for g in all_tests_did_get_fixed
        ]

        # add the test, tab and grid name to each entry
        # TODO: any ideas for avoiding this quad-loop
        for i, d in enumerate(all_tests_did_get_fixed):
            for j, k in enumerate(d):
                all_tests_did_get_fixed[i][j] = (k[0], tab, grid, tests[i], k[1])

        # accumulate the results
        percent_label_by_grid_csv.append(all_tests_did_get_fixed)

# the output above leaves us with a doubly nested list; flatten it
flat_list = [item for sublist in percent_label_by_grid_csv for item in sublist]
flatter_list = [item for sublist in flat_list for item in sublist]
"redhat-assisted-installer"
"redhat-openshift-informing"
"redhat-openshift-ocp-release-4.1-blocking"
"redhat-openshift-ocp-release-4.1-informing"
"redhat-openshift-ocp-release-4.2-blocking"
"redhat-openshift-ocp-release-4.2-informing"
"redhat-openshift-ocp-release-4.3-blocking"
"redhat-openshift-ocp-release-4.3-broken"
"redhat-openshift-ocp-release-4.3-informing"
"redhat-openshift-ocp-release-4.4-blocking"
"redhat-openshift-ocp-release-4.4-broken"
"redhat-openshift-ocp-release-4.4-informing"
"redhat-openshift-ocp-release-4.5-blocking"
"redhat-openshift-ocp-release-4.5-broken"
"redhat-openshift-ocp-release-4.5-informing"
"redhat-openshift-ocp-release-4.6-blocking"
"redhat-openshift-ocp-release-4.6-broken"
"redhat-openshift-ocp-release-4.6-informing"
"redhat-openshift-ocp-release-4.7-blocking"
"redhat-openshift-ocp-release-4.7-broken"
"redhat-openshift-ocp-release-4.7-informing"
"redhat-openshift-ocp-release-4.8-blocking"
"redhat-openshift-ocp-release-4.8-informing"
"redhat-openshift-ocp-release-4.9-blocking"
"redhat-openshift-ocp-release-4.9-informing"
"redhat-openshift-okd-release-4.3-informing"
"redhat-openshift-okd-release-4.4-informing"
"redhat-openshift-okd-release-4.5-blocking"
"redhat-openshift-okd-release-4.5-informing"
"redhat-openshift-okd-release-4.6-blocking"
"redhat-openshift-okd-release-4.6-informing"
"redhat-openshift-okd-release-4.7-blocking"
"redhat-openshift-okd-release-4.7-informing"
"redhat-openshift-okd-release-4.8-blocking"
"redhat-openshift-okd-release-4.8-informing"
"redhat-openshift-okd-release-4.9-informing"
"redhat-openshift-presubmit-master-gcp"
"redhat-osd"
"redhat-single-node"
flatter_list[0]
(datetime.datetime(2021, 3, 15, 23, 40, 20),
'"redhat-assisted-installer"',
'periodic-ci-openshift-release-master-nightly-4.6-e2e-metal-assisted',
'Overall',
False)
# this df indicates whether a test was fixed or not at a given timestamp (as compared to the previous one)
df_csv = pd.DataFrame(
flatter_list, columns=["timestamp", "tab", "grid", "test", "did_get_fixed"]
)
df_csv.head()
|   | timestamp | tab | grid | test | did_get_fixed |
|---|---|---|---|---|---|
| 0 | 2021-03-15 23:40:20 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | False |
| 1 | 2021-03-15 00:01:06 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | False |
| 2 | 2021-03-13 20:51:32 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | False |
| 3 | 2021-03-13 07:51:20 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | False |
| 4 | 2021-03-13 06:43:20 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | False |
# each element in this multiindexed series tells how many tests got fixed at each run/timestamp
num_fixed_per_ts = df_csv.groupby(["tab", "grid", "timestamp"]).did_get_fixed.sum()
num_fixed_per_ts
tab grid timestamp
"redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4.6-e2e-metal-assisted 2021-03-04 01:01:58 0
2021-03-04 04:21:57 0
2021-03-04 07:22:22 0
2021-03-04 08:47:55 0
2021-03-04 23:12:31 0
..
"redhat-single-node" periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso 2021-03-14 00:01:00 0
2021-03-15 00:01:09 0
2021-03-16 00:01:15 0
2021-03-17 00:00:40 0
2021-03-18 00:01:25 0
Name: did_get_fixed, Length: 37761, dtype: int64
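Summing a boolean column counts its True entries, which is what turns the per-test flags above into per-run counts. A hypothetical miniature:

# toy dataframe (hypothetical data): one grid, two runs, two tests per run
toy = pd.DataFrame(
    {
        "grid": ["g1", "g1", "g1", "g1"],
        "timestamp": ["t1", "t1", "t2", "t2"],
        "did_get_fixed": [True, False, True, True],
    }
)

# summing booleans counts the True values in each group
print(toy.groupby(["grid", "timestamp"]).did_get_fixed.sum())
# grid  timestamp
# g1    t1           1
#       t2           2
# Name: did_get_fixed, dtype: int64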
# label 12 corresponds to "fail" in TestGrid's status encoding
build_failures_list = testgrid_labelwise_encoding(testgrid_data, 12)
# this df indicates whether a test was failing or not at a given timestamp
failures_df = pd.DataFrame(
build_failures_list,
columns=["timestamp", "tab", "grid", "test", "test_duration", "failure"],
)
failures_df.head()
|   | timestamp | tab | grid | test | test_duration | failure |
|---|---|---|---|---|---|---|
| 0 | 2021-03-15 23:40:20 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | 80.283333 | False |
| 1 | 2021-03-15 00:01:06 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | 92.050000 | False |
| 2 | 2021-03-13 20:51:32 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | 80.983333 | False |
| 3 | 2021-03-13 07:51:20 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | 117.716667 | False |
| 4 | 2021-03-13 06:43:20 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | Overall | 108.633333 | False |
# each element in this multiindexed series tells how many tests failed at each run/timestamp
num_failures_per_ts = failures_df.groupby(["tab", "grid", "timestamp"]).failure.sum()
num_failures_per_ts
tab grid timestamp
"redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4.6-e2e-metal-assisted 2021-03-04 01:01:58 0
2021-03-04 04:21:57 0
2021-03-04 07:22:22 0
2021-03-04 08:47:55 0
2021-03-04 23:12:31 0
..
"redhat-single-node" periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso 2021-03-14 00:01:00 0
2021-03-15 00:01:09 0
2021-03-16 00:01:15 0
2021-03-17 00:00:40 0
2021-03-18 00:01:25 0
Name: failure, Length: 37761, dtype: int64
# dividing the above two series tells us what percent of failing tests got fixed at each timestamp
# (shift() pairs each run's fixed count with the previous run's failure count)
pct_fixed_per_ts = (num_fixed_per_ts / num_failures_per_ts.shift()).fillna(0)
pct_fixed_per_ts
tab grid timestamp
"redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4.6-e2e-metal-assisted 2021-03-04 01:01:58 0.0
2021-03-04 04:21:57 0.0
2021-03-04 07:22:22 0.0
2021-03-04 08:47:55 0.0
2021-03-04 23:12:31 0.0
...
"redhat-single-node" periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso 2021-03-14 00:01:00 0.0
2021-03-15 00:01:09 0.0
2021-03-16 00:01:15 0.0
2021-03-17 00:00:40 0.0
2021-03-18 00:01:25 0.0
Length: 37761, dtype: float64
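To see what the shift() is doing, here is a minimal sketch on hypothetical counts: each run's fixed count is divided by the number of tests that were failing at the previous run.

# toy series (hypothetical counts) for a single grid, ordered oldest to newest
num_fixed = pd.Series([0, 2, 1])     # tests fixed at each run
num_failures = pd.Series([4, 2, 1])  # tests failing at each run

# shift() moves each failure count down one row, so run i is divided by
# the failure count of run i - 1; the first run has no predecessor (NaN -> 0)
pct_fixed = (num_fixed / num_failures.shift()).fillna(0)
print(pct_fixed.tolist())  # [0.0, 0.5, 0.5]

One caveat worth noting: in the cell above, shift() runs over the entire multi-indexed series, so the first timestamp of each grid gets paired with the last failure count of the preceding grid. A num_failures_per_ts.groupby(level=["tab", "grid"]).shift() would confine the shift to each grid.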
# convert to df from multiindex series
pct_fixed_per_ts_df = pct_fixed_per_ts.reset_index().rename(columns={0: "pct_fixed"})
pct_fixed_per_ts_df
|   | tab | grid | timestamp | pct_fixed |
|---|---|---|---|---|
| 0 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 01:01:58 | 0.0 |
| 1 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 04:21:57 | 0.0 |
| 2 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 07:22:22 | 0.0 |
| 3 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 08:47:55 | 0.0 |
| 4 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 23:12:31 | 0.0 |
| ... | ... | ... | ... | ... |
| 37756 | "redhat-single-node" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-14 00:01:00 | 0.0 |
| 37757 | "redhat-single-node" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-15 00:01:09 | 0.0 |
| 37758 | "redhat-single-node" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-16 00:01:15 | 0.0 |
| 37759 | "redhat-single-node" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-17 00:00:40 | 0.0 |
| 37760 | "redhat-single-node" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-18 00:01:25 | 0.0 |
37761 rows × 4 columns
Save to Ceph or local#
Save the dataframe in parquet format to the Ceph bucket or to local storage.
save = pct_fixed_per_ts_df
if AUTOMATION:
    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)
    cc.upload_to_ceph(
        save,
        s3_path,
        f"{METRIC_NAME}/{METRIC_NAME}-{timestamp.year}-{timestamp.month}-{timestamp.day}.parquet",
    )
else:
    save_to_disk(
        save,
        OUTPUT_DATA_PATH,
        f"{METRIC_NAME}-{timestamp.year}-{timestamp.month}-{timestamp.day}.parquet",
    )
## Sanity check to see if the dataset is the same
if AUTOMATION:
    sanity_check = cc.read_from_ceph(
        s3_path,
        f"{METRIC_NAME}/{METRIC_NAME}-{timestamp.year}-{timestamp.month}-{timestamp.day}.parquet",
    ).head()
else:
    sanity_check = read_from_disk(
        OUTPUT_DATA_PATH,
        f"{METRIC_NAME}-{timestamp.year}-{timestamp.month}-{timestamp.day}.parquet",
    ).head()
sanity_check
|   | tab | grid | timestamp | pct_fixed |
|---|---|---|---|---|
| 0 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 01:01:58 | 0.0 |
| 1 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 04:21:57 | 0.0 |
| 2 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 07:22:22 | 0.0 |
| 3 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 08:47:55 | 0.0 |
| 4 | "redhat-assisted-installer" | periodic-ci-openshift-release-master-nightly-4... | 2021-03-04 23:12:31 | 0.0 |
Conclusion#
This notebook computed the percent of previously failing tests that got fixed at each run/timestamp. The dataframe saved on Ceph can be used to generate aggregated views and visualizations of the percent of fixed tests at each timestamp.
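As a small, hypothetical example of such an aggregation, the saved frame can be summarized per tab:

# hypothetical downstream view: mean percent of failing tests fixed, per tab
mean_pct_fixed_by_tab = (
    pct_fixed_per_ts_df.groupby("tab").pct_fixed.mean().sort_values(ascending=False)
)
print(mean_pct_fixed_by_tab.head())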