Time to Merge Prediction Inference Service¶
In the previous notebook, we explored some basic machine learning models for predicting the time to merge of a PR, and deployed the model with the highest F1 score as a service using Seldon. The purpose of this notebook is to check whether that service is running as intended, and more specifically to ensure that the model's performance is what we expect it to be. To do so, we will use the test set from the aforementioned notebook as the query payload for the service, and then verify that the returned predictions match those obtained locally during training/testing.
import sys
import json
import os
import requests
from dotenv import load_dotenv, find_dotenv
import numpy as np
from sklearn.metrics import classification_report
metric_template_path = "../data-sources/TestGrid/metrics"
if metric_template_path not in sys.path:
    sys.path.insert(1, metric_template_path)
from ipynb.fs.defs.metric_template import (  # noqa: E402
    CephCommunication,
)
load_dotenv(find_dotenv())
True
## CEPH Bucket variables
## Create a .env file on your local machine with the correct configs
s3_endpoint_url = os.getenv("S3_ENDPOINT")
s3_access_key = os.getenv("S3_ACCESS_KEY")
s3_secret_key = os.getenv("S3_SECRET_KEY")
s3_bucket = os.getenv("S3_BUCKET")
s3_path = "github"
REMOTE = os.getenv("REMOTE")
INPUT_DATA_PATH = "../../../data/processed/github"
if REMOTE:
    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)
    X_test = cc.read_from_ceph(s3_path, "X_test.parquet")
    y_test = cc.read_from_ceph(s3_path, "y_test.parquet")
else:
    print(
        "The X_test.parquet and y_test.parquet files are not included in the ocp-ci-analysis github repo."
    )
    print(
        "Please set REMOTE=1 in the .env file and read this data from the S3 bucket instead."
    )
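If you happen to have local copies of these parquet files, a minimal fallback could be to read them with pandas from INPUT_DATA_PATH; note that the exact local file locations below are an assumption, not something this repo guarantees.

# hypothetical local fallback, assuming the parquet files exist under INPUT_DATA_PATH
import pandas as pd

X_test = pd.read_parquet(os.path.join(INPUT_DATA_PATH, "X_test.parquet"))
y_test = pd.read_parquet(os.path.join(INPUT_DATA_PATH, "y_test.parquet"))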
X_test
| | size | is_reviewer | is_approver | created_at_day | created_at_month | created_at_weekday | created_at_hour | change_in_.github | change_in_docs | change_in_pkg | ... | title_wordcount_fix | title_wordcount_haproxy | title_wordcount_oc | title_wordcount_publishing | title_wordcount_revert | title_wordcount_router | title_wordcount_sh | title_wordcount_staging | title_wordcount_support | title_wordcount_travis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3599 | 3 | True | True | 6 | 7 | 0 | 21 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14562 | 4 | True | True | 9 | 6 | 4 | 22 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10102 | 0 | False | False | 29 | 7 | 4 | 3 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 578 | 3 | False | False | 16 | 12 | 1 | 13 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9401 | 1 | True | True | 17 | 6 | 4 | 5 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10438 | 1 | False | True | 16 | 8 | 1 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6027 | 3 | False | False | 23 | 11 | 0 | 16 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11558 | 3 | False | False | 25 | 10 | 1 | 8 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 18442 | 4 | False | False | 5 | 2 | 0 | 10 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23485 | 0 | False | False | 25 | 7 | 3 | 13 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2706 rows × 96 columns
y_test
| | ttm_class |
|---|---|
| 3599 | 9 |
| 14562 | 5 |
| 10102 | 2 |
| 578 | 4 |
| 9401 | 6 |
| ... | ... |
| 10438 | 6 |
| 6027 | 8 |
| 11558 | 5 |
| 18442 | 4 |
| 23485 | 6 |
2706 rows × 1 columns
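As an optional sanity check, we can look at how the ttm_class labels are distributed in the test set; these counts should line up with the support column of the classification report further below.

# optional sanity check: class distribution of the test labels
# (these counts should match the "support" column in the report below)
y_test["ttm_class"].value_counts().sort_index()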
# endpoint from the seldon deployment
base_url = "http://ttm-pipeline-opf-seldon.apps.zero.massopen.cloud/predict"
# convert the dataframe into a numpy array and then to a list (required by Seldon)
data = {"data": {"ndarray": X_test.to_numpy().tolist()}}
# create the query payload
json_data = json.dumps(data)
headers = {"Content-Type": "application/json"}
# query our inference service
response = requests.post(base_url, data=json_data, headers=headers)
response
<Response [200]>
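Sending all 2706 rows in a single POST worked fine here, but for larger test sets the request could run into payload-size or timeout limits. A minimal batching sketch (a hypothetical helper, not part of the deployed service) might look like this:

# hypothetical helper: query the service in smaller batches to avoid
# payload-size or timeout limits on very large test sets
def predict_in_batches(url, features, batch_size=500):
    probabilities = []
    for start in range(0, len(features), batch_size):
        batch = features.iloc[start : start + batch_size]
        payload = {"data": {"ndarray": batch.to_numpy().tolist()}}
        resp = requests.post(url, json=payload)
        resp.raise_for_status()
        probabilities.extend(resp.json()["data"]["ndarray"])
    return probabilities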
# what are the names of the prediction classes
json_response = response.json()
json_response["data"]["names"]
['t:0', 't:1', 't:2', 't:3', 't:4', 't:5', 't:6', 't:7', 't:8', 't:9']
# probability estimates for each class for a sample PR
json_response["data"]["ndarray"][0]
[0.02, 0.03, 0.085, 0.165, 0.09, 0.155, 0.09, 0.135, 0.07, 0.16]
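Since these are class probabilities, each row returned by the service should sum to (approximately) 1; a quick optional check:

# optional sanity check: each row of class probabilities should sum to ~1
probs = np.asarray(json_response["data"]["ndarray"])
np.allclose(probs.sum(axis=1), 1.0)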
# get predicted classes from probabilities for each PR
preds = np.argmax(json_response["data"]["ndarray"], axis=1)
# evaluate results
print(classification_report(y_test, preds))
precision recall f1-score support
0 0.31 0.42 0.36 249
1 0.14 0.10 0.12 217
2 0.23 0.27 0.25 364
3 0.15 0.17 0.16 240
4 0.13 0.10 0.11 275
5 0.14 0.10 0.12 236
6 0.23 0.23 0.23 333
7 0.16 0.14 0.15 270
8 0.18 0.17 0.17 260
9 0.23 0.28 0.25 262
accuracy 0.20 2706
macro avg 0.19 0.20 0.19 2706
weighted avg 0.19 0.20 0.20 2706
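As one more optional cross-check, we can recompute the overall accuracy directly from the raw predictions; it should agree with the accuracy row in the report above.

# optional cross-check: recompute overall accuracy from the raw predictions
from sklearn.metrics import accuracy_score

accuracy_score(y_test, preds)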
Conclusion¶
The evaluation scores in the above classification report match the ones we saw in the training notebook. Great, it looks like our inference service and model are working as expected, and are ready to predict time to merge for incoming PRs!