Prow Logs and GCS Data#
What do we have access to as data scientists when digging into the build artifacts?#
In this notebook we will demonstrate how to discover and interact with the data (logs) made available on GCS/origin-ci-test, as well as provide some simple EDA to help folks get started analyzing this data.
This notebook is divided into two sections:
Compare the different log files present throughout the archives and quantify how complete and comparable our log dataset is from build to build.
Download a sample dataset of the events and build logs to perform some light EDA.
Note: We will be collecting data from the “origin-ci-test” bucket on Google Cloud Storage. However, some out-of-notebook exploration has made it apparent that this is a massive amount of data, containing far more than just the OpenShift CI logs we are interested in here, so programmatically investigating the entire bucket is not advised. Instead, we recommend using the web UI to inspect which jobs are exposed and to identify what is of interest to your analysis before collecting data via the Google Cloud Storage API. Here we will rely on web-scraping the UI to explore what’s available to us, based on the jobs displayed on TestGrid.
Compare availability of log files across a build#
import json
import requests
import pandas as pd
from bs4 import BeautifulSoup
from google.cloud import storage
from pathlib import Path
from IPython.display import Markdown
Example to access a single set of Prow artifacts#
Let’s make sure we understand how this works, and focus on a single job first.
tab = '"redhat-openshift-ocp-release-4.6-informing"'
job = "periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-gcp-upgrade"
response = requests.get(
    f"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/{job}"
)
soup = BeautifulSoup(response.text, "html.parser")
# Each anchor is a build directory: strip the leading space and trailing "/"
# from the link text, and drop the first and last entries, which are not build IDs.
list_of_builds = [x.get_text()[1:-1] for x in soup.find_all("a")][1:-1]
response = requests.get(
f"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/{job}/{list_of_builds[1]}"
)
response.url
'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-gcp-upgrade/1383601003814719488'
soup = BeautifulSoup(response.text, "html.parser")
[x.get_text() for x in soup.find_all("a")]
[' ..',
' artifacts/',
' build-log.txt',
' finished.json',
' podinfo.json',
' prowjob.json',
' started.json']
Great, we can now programmatically access the archives. Now, let’s walk through all of the build archives for a single job and record what each one contains at the first level of its directory.
# Scrape the top-level directory listing for every build of this job
build_data = {}
for build in list_of_builds:
    response = requests.get(
        f"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/{job}/{build}"
    )
    soup = BeautifulSoup(response.text, "html.parser")
    artifacts = [x.get_text() for x in soup.find_all("a")]
    build_data[build] = artifacts
builds_info = pd.Series({k: len(v) for (k, v) in build_data.items()})
builds_info.value_counts()
7 536
dtype: int64
pd.Series(build_data).apply(" ".join).value_counts()
.. artifacts/ build-log.txt finished.json podinfo.json prowjob.json started.json 536
dtype: int64
cent1 = builds_info.value_counts() / len(builds_info)
cent1 = cent1.to_frame().reset_index()
cent1
| | index | 0 |
|---|---|---|
| 0 | 7 | 1.0 |
Markdown(
    "{}% of our records for this job appear to be complete and include the "
    "'artifacts/' subdirectory; let's dig in and see what they contain.".format(
        round(cent1.iloc[0, 1] * 100, 2)
    )
)
100.0% of our records for this job appear to be complete and include the 'artifacts/' subdirectory; let's dig in and see what they contain.
# Repeat the scrape one level down, inside each build's artifacts/ directory
build_data = {}
for build in list_of_builds:
    response = requests.get(
        f"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/{job}/{build}/artifacts"
    )
    soup = BeautifulSoup(response.text, "html.parser")
    artifacts = [x.get_text() for x in soup.find_all("a")]
    build_data[build] = artifacts
artifact_info = pd.Series({k: len(v) for (k, v) in build_data.items()})
artifact_info.value_counts()
8 528
7 4
3 2
6 1
5 1
dtype: int64
cent2 = artifact_info.value_counts() / len(artifact_info)
cent2 = cent2.to_frame().reset_index()
cent2
| | index | 0 |
|---|---|---|
| 0 | 8 | 0.985075 |
| 1 | 7 | 0.007463 |
| 2 | 3 | 0.003731 |
| 3 | 6 | 0.001866 |
| 4 | 5 | 0.001866 |
Markdown(
    "The above shows that about {}% of the artifacts directories "
    "have {} items, {}% have {} items, and so on "
    "(though this does not account for different combinations).".format(
        round(cent2.iloc[0, 1] * 100, 2),
        cent2.iloc[0, 0],
        round(cent2.iloc[1, 1] * 100, 2),
        cent2.iloc[1, 0],
    )
)
The above shows that about 98.51% of the artifacts directories have 8 items, 0.75% have 7 items, and so on (though this does not account for different combinations).
pd.Series(build_data).apply(" ".join).value_counts(normalize=True)
.. build-resources/ e2e-gcp-upgrade/ release/ ci-operator-step-graph.json ci-operator.log junit_operator.xml metadata.json 0.985075
.. build-resources/ release/ ci-operator-step-graph.json ci-operator.log junit_operator.xml metadata.json 0.007463
.. ci-operator.log junit_job.xml 0.003731
.. build-resources/ ci-operator-step-graph.json ci-operator.log junit_operator.xml metadata.json 0.001866
.. e2e-gcp-upgrade/ release/ ci-operator.log metadata.json 0.001866
dtype: float64
We can see from the results above that once we get down into the artifacts directory there is far less uniformity in the data available to us for analysis. And this is all within a single job! Moving forward, we will assume that this issue gets worse when comparing available artifacts across jobs, and we can dedicate a later notebook to proving out that assumption.
This heterogeneity of objects available for each build will make it difficult to use these sets of documents as a whole to compare CI behaviour. At this point, it makes sense to consider looking only at the same document (log) across jobs, where available.
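To help decide which document to standardize on, we can quantify how often each artifact appears across builds. Here is a minimal sketch that pivots the build_data dict scraped above into a builds-by-files presence matrix (illustrative only):
# Pivot the scraped listings into a builds x files boolean presence matrix,
# then report the fraction of builds in which each file appears.
presence = (
    pd.DataFrame(
        {build: {name.strip(): True for name in files} for build, files in build_data.items()}
    )
    .T.fillna(False)
    .astype(bool)
)
presence.mean().sort_values(ascending=False)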
Collect Data#
Build logs#
In the next section we are going to walk through accessing the build-log.txt
and the events.json
files, as they appear to be nearly universally available. We will download a small testing dataset, as well as show how to work directly with the data in memory.
Now that we know which logs we want to collect, it’s simpler to use the Google Cloud Storage API to access our data.
def connect_storage(bucket_name):
    """Creates an anonymous GCS client and returns it along with the named bucket."""
    storage_client = storage.Client.create_anonymous_client()
    bucket = storage_client.bucket(bucket_name)
    return {"bucket": bucket, "storage_client": storage_client}
def download_public_file(client, source_blob_name):
    """Downloads a public blob from the bucket."""
    blob = client["bucket"].blob(source_blob_name)
    if blob.exists(client["storage_client"]):
        text = blob.download_as_text()
    else:
        text = ""
    return text
bucket_connection = connect_storage("origin-ci-test")
# Read data into memory
build_log_data = {}
for build in list_of_builds:
    file = download_public_file(bucket_connection, f"logs/{job}/{build}/build-log.txt")
    build_log_data[build] = file
build_log_data[list(build_log_data.keys())[0]]
'\x1b[36mINFO\x1b[0m[2021-04-17T19:58:56Z] ci-operator version v20210416-d902916 \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:56Z] Loading configuration from https://config.ci.openshift.org for openshift/release@master [ci-4.6-upgrade-from-stable-4.5] \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:57Z] Resolved SHA missing for master in https://github.com/openshift/release (will prevent caching) \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:57Z] Using explicitly provided pull-spec for release initial (registry.ci.openshift.org/ocp/release:4.5.37) \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:57Z] Using explicitly provided pull-spec for release latest (registry.ci.openshift.org/ocp/release:4.6.0-0.ci-2021-04-17-163254) \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:57Z] Using namespace https://console.build02.ci.openshift.org/k8s/cluster/projects/ci-op-y3cf82zn \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:57Z] Running [input:origin-centos-8], [input:ocp-4.5-upi-installer], [release:initial], [release:latest], [images], e2e-gcp-upgrade \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:58Z] Tagging origin/centos:8 into pipeline:origin-centos-8. \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:58Z] Tagging ocp/4.5:upi-installer into pipeline:ocp-4.5-upi-installer. \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:59Z] Importing release image initial. \n\x1b[36mINFO\x1b[0m[2021-04-17T19:58:59Z] Importing release image latest. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:08Z] Imported release 4.5.37 created at 2021-03-31 06:05:56 +0000 UTC with 108 images to tag release:initial \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:16Z] Imported release 4.6.0-0.ci-2021-04-17-163254 created at 2021-04-17 16:33:19 +0000 UTC with 121 images to tag release:latest \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:16Z] Acquiring leases for test e2e-gcp-upgrade \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:16Z] Running multi-stage test e2e-gcp-upgrade \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:17Z] Running step e2e-gcp-upgrade-ipi-install-hosted-loki. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:27Z] Step e2e-gcp-upgrade-ipi-install-hosted-loki succeeded after 10.064135461s. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:27Z] Running step e2e-gcp-upgrade-ipi-conf. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:47Z] Step e2e-gcp-upgrade-ipi-conf succeeded after 20.05838057s. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:47Z] Running step e2e-gcp-upgrade-ipi-conf-gcp. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:57Z] Step e2e-gcp-upgrade-ipi-conf-gcp succeeded after 10.065333292s. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:01:57Z] Running step e2e-gcp-upgrade-ipi-install-monitoringpvc. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:02:07Z] Step e2e-gcp-upgrade-ipi-install-monitoringpvc succeeded after 10.071902479s. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:02:07Z] Running step e2e-gcp-upgrade-ipi-install-rbac. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:02:17Z] Step e2e-gcp-upgrade-ipi-install-rbac succeeded after 10.051469409s. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:02:17Z] Running step e2e-gcp-upgrade-ipi-install-install-stableinitial. 
\n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Logs for container test in pod e2e-gcp-upgrade-ipi-install-install-stableinitial: \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Installing from release registry.build02.ci.openshift.org/ci-op-y3cf82zn/release@sha256:037ed15c79bbd6bee6f0d524d91790d2b368cba945e7ae6d64c984dc0f162b82 \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Credentials loaded from environment variable \\"GOOGLE_CLOUD_KEYFILE_JSON\\", file \\"/var/run/secrets/ci.openshift.io/cluster-profile/gce.json\\"" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Consuming Install Config from target directory" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Will include manifests: \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_metrics_rb.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_creds.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_ds.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_cm.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_clusterrole.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_oauth_role.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_metrics_role.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_promtail_service.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_grafanacom_creds.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_sa.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_promtail_cookie_secret.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_role.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_cluster-monitoring-pvc.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_psp.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_01_ns.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_oauth_clusterrolebinding.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_metrics.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_rolebinding.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] /tmp/secret/manifest_clusterrolebinding.yml \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=warning msg="Found override for release image. Please be warned, this is not advised" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Credentials loaded from environment variable \\"GOOGLE_CLOUD_KEYFILE_JSON\\", file \\"/var/run/secrets/ci.openshift.io/cluster-profile/gce.json\\"" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Consuming Openshift Manifests from target directory" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Consuming Master Machines from target directory" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Consuming OpenShift Install (Manifests) from target directory" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Consuming Worker Machines from target directory" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Consuming Common Manifests from target directory" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Creating infrastructure resources..." \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Waiting up to 20m0s for the Kubernetes API at https://api.ci-op-y3cf82zn-5a3d5.origin-ci-int-gce.dev.openshift.com:6443..." 
\n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="API v1.18.3+cdb0358 up" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Waiting up to 40m0s for bootstrapping to complete..." \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=error msg="Cluster operator authentication Degraded is True with ConfigObservation_Error::IngressStateEndpoints_MissingEndpoints::RouteStatus_FailedHost::RouterCerts_NoRouterCertSecret: RouterCertsDegraded: secret/v4-0-config-system-router-certs -n openshift-authentication: could not be retrieved: secret \\"v4-0-config-system-router-certs\\" not found\\nConfigObservationDegraded: secret \\"v4-0-config-system-router-certs\\" not found\\nIngressStateEndpointsDegraded: No endpoints found for oauth-server\\nRouteStatusDegraded: route is not available at canonical host oauth-openshift.apps.ci-op-y3cf82zn-5a3d5.origin-ci-int-gce.dev.openshift.com: []" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator authentication Progressing is Unknown with NoData: " \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator authentication Available is Unknown with NoData: " \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator csi-snapshot-controller Progressing is True with _AsExpected: Progressing: Waiting for Deployment to deploy csi-snapshot-controller pods" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator csi-snapshot-controller Available is False with _AsExpected: Available: Waiting for Deployment to deploy csi-snapshot-controller pods" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator machine-config Progressing is True with : Working towards 4.5.37" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=error msg="Cluster operator machine-config Degraded is True with RenderConfigFailed: Unable to apply 4.5.37: openshift-config-managed/kube-cloud-config configmap is required on platform GCP but not found: configmap \\"kube-cloud-config\\" not found" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator machine-config Available is False with : Cluster not available for 4.5.37" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator openshift-apiserver Available is False with APIServices_Error: APIServicesAvailable: \\"apps.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"authorization.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"build.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"image.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"oauth.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"project.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"quota.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"route.openshift.io.v1\\" is not ready: 503 (the server is currently unable to 
handle the request)\\nAPIServicesAvailable: \\"security.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"template.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"user.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: daemonset/controller-manager: observed generation is 0, desired generation is 8.\\nProgressing: daemonset/controller-manager: number available is 0, desired number available > 1" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator openshift-controller-manager Available is False with _NoPodsAvailable: Available: no daemon pods available on any node." \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator operator-lifecycle-manager-packageserver Available is False with : " \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Cluster operator operator-lifecycle-manager-packageserver Progressing is True with : Working toward 0.15.1" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Pulling debug logs from the bootstrap machine" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=info msg="Bootstrap gather logs captured here \\"/tmp/installer/log-bundle-20210417204954.tar.gz\\"" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition" \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Setup phase finished, prepare env for next steps \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Copying log bundle... \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Removing REDACTED info from log... \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Copying required artifacts to shared dir \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] {"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"prow/entrypoint/run.go:80","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2021-04-17T20:50:28Z"} \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] error: failed to execute wrapped command: exit status 1 \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Step e2e-gcp-upgrade-ipi-install-install-stableinitial failed after 48m20.074374705s. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:50:37Z] Running step e2e-gcp-upgrade-gather-gcp-console. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:51:07Z] Step e2e-gcp-upgrade-gather-gcp-console succeeded after 30.058654549s. \n\x1b[36mINFO\x1b[0m[2021-04-17T20:51:07Z] Running step e2e-gcp-upgrade-gather-must-gather. \n\x1b[36mINFO\x1b[0m[2021-04-17T21:10:07Z] Step e2e-gcp-upgrade-gather-must-gather succeeded after 19m0.055587687s. \n\x1b[36mINFO\x1b[0m[2021-04-17T21:10:07Z] Running step e2e-gcp-upgrade-gather-extra. \n\x1b[36mINFO\x1b[0m[2021-04-17T21:11:07Z] Step e2e-gcp-upgrade-gather-extra succeeded after 1m0.064604654s. \n\x1b[36mINFO\x1b[0m[2021-04-17T21:11:07Z] Running step e2e-gcp-upgrade-gather-audit-logs. \n\x1b[36mINFO\x1b[0m[2021-04-17T21:11:57Z] Step e2e-gcp-upgrade-gather-audit-logs succeeded after 50.057797698s. \n\x1b[36mINFO\x1b[0m[2021-04-17T21:11:57Z] Running step e2e-gcp-upgrade-ipi-deprovision-deprovision. 
\n\x1b[36mINFO\x1b[0m[2021-04-17T21:14:47Z] Step e2e-gcp-upgrade-ipi-deprovision-deprovision succeeded after 2m50.060881266s. \n\x1b[36mINFO\x1b[0m[2021-04-17T21:14:47Z] Releasing leases for test e2e-gcp-upgrade \n\x1b[36mINFO\x1b[0m[2021-04-17T21:14:48Z] Ran for 1h15m51s \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] Some steps failed: \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] * could not run steps: step e2e-gcp-upgrade failed: "e2e-gcp-upgrade" pre steps failed: "e2e-gcp-upgrade" pod "e2e-gcp-upgrade-ipi-install-install-stableinitial" failed: the pod ci-op-y3cf82zn/e2e-gcp-upgrade-ipi-install-install-stableinitial failed after 48m15s (failed containers: test): ContainerFailed one or more containers exited \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] Container test exited with code 1, reason Error \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] --- \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] \\"quota.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"route.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"security.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"template.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)\\nAPIServicesAvailable: \\"user.openshift.io.v1\\" is not ready: 503 (the server is currently unable to handle the request)" \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] level=info msg="Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: daemonset/controller-manager: observed generation is 0, desired generation is 8.\\nProgressing: daemonset/controller-manager: number available is 0, desired number available > 1" \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] level=info msg="Cluster operator openshift-controller-manager Available is False with _NoPodsAvailable: Available: no daemon pods available on any node." \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] level=info msg="Cluster operator operator-lifecycle-manager-packageserver Available is False with : " \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] level=info msg="Cluster operator operator-lifecycle-manager-packageserver Progressing is True with : Working toward 0.15.1" \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] level=info msg="Pulling debug logs from the bootstrap machine" \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] level=info msg="Bootstrap gather logs captured here \\"/tmp/installer/log-bundle-20210417204954.tar.gz\\"" \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition" \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] Setup phase finished, prepare env for next steps \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] Copying log bundle... \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] Removing REDACTED info from log... 
\n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] Copying required artifacts to shared dir \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] {"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"prow/entrypoint/run.go:80","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2021-04-17T20:50:28Z"} \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] error: failed to execute wrapped command: exit status 1 \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] --- \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] Link to step on registry info site: https://steps.ci.openshift.org/reference/ipi-install-install-stableinitial \n\x1b[31mERRO\x1b[0m[2021-04-17T21:14:48Z] Link to job on registry info site: https://steps.ci.openshift.org/job?org=openshift&repo=release&branch=master&test=e2e-gcp-upgrade&variant=ci-4.6-upgrade-from-stable-4.5 \n'
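Note that the raw logs are colorized with ANSI escape sequences (the \x1b[36m fragments above). If those bytes would skew character counts or downstream parsing, they can be stripped with a small regex; a minimal sketch:
import re

# ANSI color/style codes look like "\x1b[36m" or "\x1b[0m"; remove them so
# only the log text itself remains.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text):
    """Remove ANSI terminal color codes from a log string."""
    return ANSI_ESCAPE.sub("", text)

clean_log_data = {build: strip_ansi(log) for build, log in build_log_data.items()}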
def get_counts(x):
    """
    Gets counts for chars, words, lines for a log.
    """
    if x:
        chars = len(x)
        words = len(x.split())
        lines = x.count("\n") + 1
        return chars, words, lines
    else:
        return 0, 0, 0
## Create a dataframe with char, word, and line counts for the logs
data = []
for key, value in build_log_data.items():
    chars, words, lines = get_counts(value)
    data.append([key, chars, words, lines])
df = pd.DataFrame(data=data, columns=["build_log_id", "chars", "words", "lines"])
df
| | build_log_id | chars | words | lines |
|---|---|---|---|---|
| 0 | 1383510269677801472 | 17065 | 1188 | 120 |
| 1 | 1383601003814719488 | 17063 | 1189 | 120 |
| 2 | 1383691728313126912 | 17063 | 1189 | 120 |
| 3 | 1383782446109036544 | 37601 | 2690 | 261 |
| 4 | 1383873157873537024 | 16784 | 1167 | 120 |
| ... | ... | ... | ... | ... |
| 531 | 1447844674805960704 | 20401 | 1485 | 223 |
| 532 | 1448025872249393152 | 20395 | 1480 | 222 |
| 533 | 1448207267424899072 | 20353 | 1477 | 223 |
| 534 | 1448388543654137856 | 20443 | 1484 | 222 |
| 535 | 1448569878565883904 | 20458 | 1486 | 222 |

536 rows × 4 columns
See the stats for chars, words, lines#
df["chars"].describe()
count 5.360000e+02
mean 3.516793e+04
std 2.711030e+05
min 2.120000e+03
25% 2.041650e+04
50% 2.052900e+04
75% 2.809525e+04
max 6.289343e+06
Name: chars, dtype: float64
df["words"].describe()
count 536.000000
mean 2382.583955
std 19301.541247
min 147.000000
25% 1482.000000
50% 1494.000000
75% 1648.000000
max 447981.000000
Name: words, dtype: float64
describe = df["lines"].describe()
describe = describe.to_frame()
describe
| | lines |
|---|---|
| count | 536.000000 |
| mean | 249.042910 |
| std | 815.830161 |
| min | 19.000000 |
| 25% | 215.000000 |
| 50% | 222.000000 |
| 75% | 222.000000 |
| max | 19034.000000 |
Markdown(
    "From the initial analysis above, we see that we have log files "
    "ranging from {} to {} lines, with a mean of ~{} lines. "
    "This suggests high variability. The next thing we could look "
    "at would be the similarity between log files, "
    "performing analysis, templating, and clustering. "
    "We will address those questions in an upcoming notebook.".format(
        round(describe.iloc[3, 0]),
        round(describe.iloc[7, 0]),
        round(describe.iloc[1, 0]),
    )
)
From the initial analysis above, we see that we have log files ranging from 19 to 19034 lines, with a mean of ~249 lines. This suggests high variability. The next thing we could look at would be the similarity between log files, performing analysis, templating, and clustering. We will address those questions in an upcoming notebook.
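As a quick, hedged teaser of that similarity question (illustrative only, not necessarily the approach the upcoming notebook will take), pairwise log similarity can be sketched with scikit-learn's TF-IDF vectorizer:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Vectorize a handful of logs and compute their pairwise cosine similarity;
# values near 1 indicate near-identical logs.
sample_logs = [log for log in build_log_data.values() if log][:10]
tfidf = TfidfVectorizer(max_features=5000).fit_transform(sample_logs)
cosine_similarity(tfidf)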
Events#
build_events_data = {}
for build in list_of_builds:
    file = download_public_file(
        bucket_connection, f"logs/{job}/{build}/artifacts/build-resources/events.json"
    )
    if file:
        build_events_data[build] = json.loads(file)
    else:
        build_events_data[build] = ""
## Percentage of builds that have the events.json file
count = 0
for key, value in build_events_data.items():
    if value:
        count += 1
percent = count * 100 / len(build_events_data)
print(percent)
99.44029850746269
# Analyzing the messages of a single build
single_build = sorted(build_events_data.keys())[0]
messages = [
    (i["metadata"]["uid"], i["message"])
    for i in build_events_data[single_build]["items"]
]
messages_df = pd.DataFrame(messages, columns=["UID", "message"])
messages_df
| | UID | message |
|---|---|---|
| 0 | c694dae6-d31b-469c-b9ca-e139d20ef644 | Running job periodic-ci-openshift-release-mast... |
| 1 | 8174d743-feb7-4446-84f5-def97617fd3c | Running job periodic-ci-openshift-release-mast... |
| 2 | e7545ab7-fa0a-473f-b719-4dbad5e01a65 | created SCC ranges |
| 3 | 8a49821c-3ef6-4f71-ae64-a3faad6d8365 | No matching pods found |
| 4 | b97c3b80-4e2f-4e5e-8e4a-5e890c3cc4fd | found no controller ref for pod "release-image... |
| ... | ... | ... |
| 219 | 6e002c16-f8cc-4390-82cb-a6e7f521541e | Created container release |
| 220 | e0592c0a-de8b-4ab8-9aee-5fe0384dd51f | Started container release |
| 221 | 946cf37e-9ce7-436a-8997-cd88a3ffeea8 | Container image "gcr.io/k8s-prow/sidecar:v2021... |
| 222 | 90fa202b-5ac5-4768-86ad-54767e6579d4 | Created container sidecar |
| 223 | 9d81659f-c5a9-4446-b86e-ddcc9e590dec | Started container sidecar |

224 rows × 2 columns
messages_df["message"].describe()
count 224
unique 89
top Created container sidecar
freq 13
Name: message, dtype: object
messages_df["message"].value_counts().reset_index()
| | index | message |
|---|---|---|
| 0 | Created container sidecar | 13 |
| 1 | Started container place-entrypoint | 13 |
| 2 | Started container sidecar | 13 |
| 3 | Container image "gcr.io/k8s-prow/sidecar:v2021... | 13 |
| 4 | Container image "gcr.io/k8s-prow/entrypoint:v2... | 13 |
| ... | ... | ... |
| 84 | Add eth0 [10.130.47.13/23] | 1 |
| 85 | Successfully assigned ci-op-y3cf82zn/e2e-gcp-u... | 1 |
| 86 | Successfully pulled image "image-registry.open... | 1 |
| 87 | Pulling image "image-registry.openshift-image-... | 1 |
| 88 | Successfully pulled image "image-registry.open... | 1 |

89 rows × 2 columns
Markdown(
    "In the build data, we saw that about {}% of builds have the events.json file."
    " We further analyzed all the events that happened for a particular build"
    " and found the frequencies of the messages. We can repeat the process for"
    " all the other builds to find the most common messages and perform further"
    " analysis.".format(round(percent, 2))
)
In the build data, we saw that about 99.44% of builds have the events.json file. We further analyzed all the events that happened for a particular build and found the frequencies of the messages. We can repeat the process for all the other builds to find the most common messages and perform further analysis.
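A minimal sketch of that repetition, aggregating event messages across every build that returned an events.json, might look like this:
# Aggregate event messages across all builds with an events.json
# and rank the most common messages for the whole job.
all_messages = [
    item.get("message", "")
    for events in build_events_data.values()
    if events  # skip builds where the file was missing
    for item in events.get("items", [])
]
pd.Series(all_messages).value_counts().head(10)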
Save sample data#
path = "../../../data/raw/gcs/build-logs/"
filename = "sample-build-logs.parquet"
dataset_base_path = Path(path)
dataset_base_path.mkdir(parents=True, exist_ok=True)
build_logs = pd.DataFrame.from_dict(build_log_data, orient="index", columns=["log"])
build_logs.to_parquet(f"{path}/{filename}")
path = "../../../data/raw/gcs/events/"
filename = "sample-events.json"
dataset_base_path = Path(path)
dataset_base_path.mkdir(parents=True, exist_ok=True)
with open(f"{path}/{filename}", "w") as file:
    json.dump(build_events_data, file)
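To pick the sample data back up in a later notebook, it can be reloaded as follows (the paths assume the same repository layout used above):
# Reload the saved samples in a downstream notebook.
saved_logs = pd.read_parquet("../../../data/raw/gcs/build-logs/sample-build-logs.parquet")
with open("../../../data/raw/gcs/events/sample-events.json") as f:
    saved_events = json.load(f)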
Conclusion#
In this notebook, we demonstrated how to programmatically access the OpenShift origin-ci-test archives on GCS, pull specific log types into our notebook for analysis, and save them for later use.