Time to Merge Prediction Inference Service

In the previous notebook, we explored some basic machine learning models for predicting the time to merge of a PR. We then deployed the model with the highest f1-score as a service using Seldon. The purpose of this notebook is to check whether this service is running as intended, and more specifically to ensure that its performance is what we expect it to be. To do that, we will use the test set from the aforementioned notebook as the query payload for the service, and then verify that the returned predictions match those obtained during training/testing locally.

import sys
import json
import os
import requests
from dotenv import load_dotenv, find_dotenv
import numpy as np

from sklearn.metrics import classification_report

metric_template_path = "../data-sources/TestGrid/metrics"
if metric_template_path not in sys.path:
    sys.path.insert(1, metric_template_path)

from ipynb.fs.defs.metric_template import (  # noqa: E402
    CephCommunication,
)

load_dotenv(find_dotenv())
True
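The CephCommunication helper imported above is defined in the metric_template notebook. For readers without that notebook handy, here is a minimal boto3-based sketch of roughly what its read_from_ceph method does (the internals here are an approximation, not the actual implementation):

import boto3
import pandas as pd
from io import BytesIO


class CephCommunicationSketch:
    """Hypothetical stand-in for the CephCommunication helper."""

    def __init__(self, endpoint_url, access_key, secret_key, bucket):
        # S3-compatible client pointed at the Ceph endpoint
        self.s3 = boto3.resource(
            "s3",
            endpoint_url=endpoint_url,
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key,
        )
        self.bucket = bucket

    def read_from_ceph(self, path, filename):
        # Fetch the parquet object and load it into a pandas dataframe
        obj = self.s3.Object(self.bucket, f"{path}/{filename}")
        return pd.read_parquet(BytesIO(obj.get()["Body"].read()))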
## CEPH Bucket variables
## Create a .env file on your local machine with the correct configs.
s3_endpoint_url = os.getenv("S3_ENDPOINT")
s3_access_key = os.getenv("S3_ACCESS_KEY")
s3_secret_key = os.getenv("S3_SECRET_KEY")
s3_bucket = os.getenv("S3_BUCKET")
s3_path = "github"
REMOTE = os.getenv("REMOTE")
INPUT_DATA_PATH = "../../../data/processed/github"
if REMOTE:
    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)
    X_test = cc.read_from_ceph(s3_path, "X_test.parquet")
    y_test = cc.read_from_ceph(s3_path, "y_test.parquet")

else:
    print(
        "The X_test.parquet and y_test.parquet files are not included in the ocp-ci-analysis github repo."
    )
    print(
        "Please set REMOTE=1 in the .env file and read this data from the S3 bucket instead."
    )
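If you have already generated these files locally by running the training notebook, a minimal sketch of reading them from INPUT_DATA_PATH instead (the local file layout is an assumption):

import pandas as pd

# Local fallback (sketch): read the test set saved by the training notebook
X_test = pd.read_parquet(f"{INPUT_DATA_PATH}/X_test.parquet")
y_test = pd.read_parquet(f"{INPUT_DATA_PATH}/y_test.parquet")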
X_test
size is_reviewer is_approver created_at_day created_at_month created_at_weekday created_at_hour change_in_.github change_in_docs change_in_pkg ... title_wordcount_fix title_wordcount_haproxy title_wordcount_oc title_wordcount_publishing title_wordcount_revert title_wordcount_router title_wordcount_sh title_wordcount_staging title_wordcount_support title_wordcount_travis
3599 3 True True 6 7 0 21 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
14562 4 True True 9 6 4 22 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
10102 0 False False 29 7 4 3 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
578 3 False False 16 12 1 13 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
9401 1 True True 17 6 4 5 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10438 1 False True 16 8 1 2 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6027 3 False False 23 11 0 16 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
11558 3 False False 25 10 1 8 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
18442 4 False False 5 2 0 10 0 0 0 ... 1 0 0 0 0 0 0 0 0 0
23485 0 False False 25 7 3 13 0 0 1 ... 0 0 0 0 0 0 0 0 0 0

2706 rows × 96 columns

y_test
ttm_class
3599 9
14562 5
10102 2
578 4
9401 6
... ...
10438 6
6027 8
11558 5
18442 4
23485 6

2706 rows × 1 columns

# endpoint from the Seldon deployment
base_url = "http://ttm-pipeline-opf-seldon.apps.zero.massopen.cloud/predict"
# convert the dataframe into a numpy array and then to a list (the format required by Seldon)
data = {"data": {"ndarray": X_test.to_numpy().tolist()}}

# create the query payload
json_data = json.dumps(data)
headers = {"content-Type": "application/json"}
# query our inference service
response = requests.post(base_url, data=json_data, headers=headers)
response
<Response [200]>
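Before parsing the payload, it is worth failing fast on an unsuccessful response rather than trying to decode an error body. A small sketch (not in the original notebook):

# Sketch: raise immediately if the service returned a non-2xx status
response.raise_for_status()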
# what are the names of the prediction classes
json_response = response.json()
json_response["data"]["names"]
['t:0', 't:1', 't:2', 't:3', 't:4', 't:5', 't:6', 't:7', 't:8', 't:9']
# probability estimates for each class for a sample PR
json_response["data"]["ndarray"][0]
[0.02, 0.03, 0.085, 0.165, 0.09, 0.155, 0.09, 0.135, 0.07, 0.16]
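As a quick sanity check, the service should return one probability per class for every PR in the test set, and each row of probabilities should sum to (approximately) 1. A sketch:

# Sanity check (sketch): one probability per class, summing to ~1 per PR
probs = np.array(json_response["data"]["ndarray"])
assert probs.shape == (len(X_test), 10)
assert np.allclose(probs.sum(axis=1), 1.0, atol=1e-3)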
# get predicted classes from probabilities for each PR
preds = np.argmax(json_response["data"]["ndarray"], axis=1)
# evaluate results
print(classification_report(y_test, preds))
              precision    recall  f1-score   support

           0       0.31      0.42      0.36       249
           1       0.14      0.10      0.12       217
           2       0.23      0.27      0.25       364
           3       0.15      0.17      0.16       240
           4       0.13      0.10      0.11       275
           5       0.14      0.10      0.12       236
           6       0.23      0.23      0.23       333
           7       0.16      0.14      0.15       270
           8       0.18      0.17      0.17       260
           9       0.23      0.28      0.25       262

    accuracy                           0.20      2706
   macro avg       0.19      0.20      0.19      2706
weighted avg       0.19      0.20      0.20      2706
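To confirm that the service is wrapping the very same model we trained, we could also compare its predictions against the locally saved model artifact. A sketch, assuming the trained model was serialized with joblib as "model.joblib" (the artifact name and location are assumptions):

from joblib import load

# Sketch: load the local model artifact and compare its predicted classes
# against the classes returned by the inference service ("model.joblib" is assumed)
model = load("model.joblib")
local_preds = model.predict(X_test)
print(f"Fraction of matching predictions: {(local_preds == preds).mean():.3f}")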

Conclusion

The evaluation scores in the classification report above match the ones we saw in the training notebook. Great, it looks like our inference service and model are working as expected, and are ready to predict the time to merge of incoming PRs!