{ "cells": [ { "cell_type": "markdown", "id": "3f5d1031-4a3c-4bd7-9089-4766917156f2", "metadata": {}, "source": [ "# Time to Merge Prediction Inference Service\n", "\n", "In the previous notebook, we explored some basic machine learning models for predicting the time to merge of a PR, and then deployed the model with the highest F1-score as a service using Seldon. The purpose of this notebook is to check that this service runs as intended and, more specifically, that the deployed model performs as expected. To do so, we send the test set from that notebook to the service as the query payload and verify that the returned predictions match those obtained when training and testing the model locally." ] }, { "cell_type": "code", "execution_count": 1, "id": "091c5463", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "import json\n", "import os\n", "import requests\n", "from dotenv import load_dotenv, find_dotenv\n", "import numpy as np\n", "\n", "from sklearn.metrics import classification_report\n", "\n", "metric_template_path = \"../data-sources/TestGrid/metrics\"\n", "if metric_template_path not in sys.path:\n", "    sys.path.insert(1, metric_template_path)\n", "\n", "from ipynb.fs.defs.metric_template import (  # noqa: E402\n", "    CephCommunication,\n", ")\n", "\n", "load_dotenv(find_dotenv())" ] }, { "cell_type": "code", "execution_count": 2, "id": "e4672acb", "metadata": {}, "outputs": [], "source": [ "## Ceph bucket variables\n", "## Create a .env file locally with the correct configs\n", "s3_endpoint_url = os.getenv(\"S3_ENDPOINT\")\n", "s3_access_key = os.getenv(\"S3_ACCESS_KEY\")\n", "s3_secret_key = os.getenv(\"S3_SECRET_KEY\")\n", "s3_bucket = os.getenv(\"S3_BUCKET\")\n", "s3_path = \"github\"\n", "REMOTE = os.getenv(\"REMOTE\")\n", "INPUT_DATA_PATH = \"../../../data/processed/github\"" ] }, {
"cell_type": "code", "execution_count": 31, "id": "29a28262", "metadata": {}, "outputs": [], "source": [ "if REMOTE:\n", "    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)\n", "    X_test = cc.read_from_ceph(s3_path, \"X_test.parquet\")\n", "    y_test = cc.read_from_ceph(s3_path, \"y_test.parquet\")\n", "else:\n", "    print(\n", "        \"The X_test.parquet and y_test.parquet files are not included in the ocp-ci-analysis GitHub repo.\"\n", "    )\n", "    print(\n", "        \"Please set REMOTE=1 in the .env file and read this data from the S3 bucket instead.\"\n", "    )" ] }, { "cell_type": "code", "execution_count": 32, "id": "c392d8d1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | size | is_reviewer | is_approver | created_at_day | created_at_month | created_at_weekday | created_at_hour | change_in_.github | change_in_docs | change_in_pkg | ... | title_wordcount_fix | title_wordcount_haproxy | title_wordcount_oc | title_wordcount_publishing | title_wordcount_revert | title_wordcount_router | title_wordcount_sh | title_wordcount_staging | title_wordcount_support | title_wordcount_travis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3599 | 3 | True | True | 6 | 7 | 0 | 21 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14562 | 4 | True | True | 9 | 6 | 4 | 22 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10102 | 0 | False | False | 29 | 7 | 4 | 3 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 578 | 3 | False | False | 16 | 12 | 1 | 13 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9401 | 1 | True | True | 17 | 6 | 4 | 5 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10438 | 1 | False | True | 16 | 8 | 1 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6027 | 3 | False | False | 23 | 11 | 0 | 16 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11558 | 3 | False | False | 25 | 10 | 1 | 8 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 18442 | 4 | False | False | 5 | 2 | 0 | 10 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23485 | 0 | False | False | 25 | 7 | 3 | 13 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

2706 rows × 96 columns
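Each of the 2706 rows above becomes one instance in the request sent to the service. As a minimal sketch, assuming the Seldon deployment uses the Seldon Core v1 protocol (a JSON body carrying the features under `data.ndarray`, with optional feature names under `data.names`), the rows could be serialized as below. The helper `to_seldon_payload` is hypothetical, and converting the boolean `is_reviewer`/`is_approver` flags to `0`/`1` is an assumption about what the deployed model accepts.

```python
import json
from typing import Any, Iterable, Sequence


def to_seldon_payload(names: Iterable[str], rows: Iterable[Sequence[Any]]) -> dict:
    """Hypothetical helper: build a Seldon Core v1 style request body from feature rows."""
    ndarray = [
        # Convert booleans explicitly so every value in the array is numeric (assumption)
        [int(value) if isinstance(value, bool) else value for value in row]
        for row in rows
    ]
    return {"data": {"names": list(names), "ndarray": ndarray}}


# Three of the columns for the rows with index 3599 and 10102 in the table above
names = ["size", "is_reviewer", "created_at_hour"]
rows = [[3, True, 21], [0, False, 3]]
payload = to_seldon_payload(names, rows)
print(json.dumps(payload))
# {"data": {"names": ["size", "is_reviewer", "created_at_hour"], "ndarray": [[3, 1, 21], [0, 0, 3]]}}
```

With the real test set, the call would presumably be `to_seldon_payload(X_test.columns, X_test.to_numpy().tolist())`; `tolist()` should hand back plain Python scalars that `json.dumps` can serialize, and the column order must match the order used during training.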
| | ttm_class |
|---|---|
| 3599 | 9 |
| 14562 | 5 |
| 10102 | 2 |
| 578 | 4 |
| 9401 | 6 |
| ... | ... |
| 10438 | 6 |
| 6027 | 8 |
| 11558 | 5 |
| 18442 | 4 |
| 23485 | 6 |

2706 rows × 1 columns
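The `ttm_class` column holds the labels that the service's predictions are scored against, and the notebook's goal is for those predictions to match the ones produced locally. As a minimal sketch, assuming the service answers in the Seldon Core v1 format with one row of class probabilities per instance under `data.ndarray`, and that column `i` of each row corresponds to class label `i` (true when the model was trained on contiguous integer labels starting at 0), the check could look like this. `predicted_classes` and `mismatched_rows` are hypothetical helpers, and the response body below is illustrative input, not real service output; if the model returns class labels directly, the argmax step is unnecessary.

```python
from typing import List, Sequence


def predicted_classes(response: dict) -> List[int]:
    """Hypothetical helper: most probable class per row of a Seldon v1 response body."""
    rows = response["data"]["ndarray"]
    # Assumes column i of each probability row corresponds to class label i
    return [max(range(len(probs)), key=probs.__getitem__) for probs in rows]


def mismatched_rows(service: Sequence[int], local: Sequence[int]) -> List[int]:
    """Positions where the service prediction differs from the local model's prediction."""
    if len(service) != len(local):
        raise ValueError(f"length mismatch: {len(service)} vs {len(local)}")
    return [i for i, (svc, loc) in enumerate(zip(service, local)) if svc != loc]


# Illustrative response body for two instances and three classes (not real service output)
response = {"data": {"ndarray": [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]}}
service_preds = predicted_classes(response)
print(service_preds)  # [1, 0]
print(mismatched_rows(service_preds, [1, 2]))  # [1]
```

On the real data, an empty result from `mismatched_rows(service_preds, local_preds)` (where `local_preds` is a hypothetical name for the local model's predictions on `X_test`) would indicate that the deployment behaves like the local model; `classification_report(y_test["ttm_class"], service_preds)` can then be compared with the report obtained in the training notebook.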