ML Clustering

Clustering is an unsupervised learning method that divides a population of data points into a number of groups, such that data points in the same group are more similar to each other than to data points in other groups. In other words, it organizes objects into groups based on their similarity and dissimilarity.

K-means clustering is a type of unsupervised learning used when you have unlabeled data (i.e., data without defined categories or groups). The goal of the algorithm is to find groups in the data, with the number of groups given by the variable K. Given that the features we cluster on here (image names and cluster_ids) are categorical, we perform the K-means clustering analysis using two different encoding techniques for the image_name feature, namely one-hot encoding and similarity encoding.
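Similarity encoding itself is not shown in this section. As a rough illustration of the idea, the pure-Python sketch below encodes each image name by how similar it is to a set of prototype categories, using the Jaccard similarity of character 3-grams. This is a simplification under stated assumptions: libraries such as dirty_cat's SimilarityEncoder use their own n-gram similarity and defaults, and the helper names here (ngrams, ngram_similarity, similarity_encode) are hypothetical.

```python
def ngrams(s, n=3):
    """Set of character n-grams of s, padded with a space at both ends."""
    s = f" {s} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity (0..1) between the n-gram sets of two strings."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as a vector of similarities to every prototype."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# prototype categories taken from image names that appear in the data
prototypes = ["openshift/ose-cli", "openshift/ose-cli-artifacts", "rhel8/redis-5"]
encoded = similarity_encode(["openshift/ose-cli", "rhscl/redis-32-rhel7"], prototypes)
```

Unlike one-hot encoding, where "openshift/ose-cli" and "openshift/ose-cli-artifacts" are as different as any two names, this kind of encoding gives them close vectors, and "rhscl/redis-32-rhel7" ends up nearer to "rhel8/redis-5" than to the OpenShift CLI images.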

In this approach, we use the KMeans clustering technique to group together image_repos that have a high probability of occurring together in the same cluster_id. In addition, we also use the telemetry information linked with each cluster_id.
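For reference, the core of K-means is Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. The NumPy sketch below implements that loop on made-up 2-D points. It is a simplified illustration (random initialisation, hypothetical function name kmeans), not scikit-learn's KMeans, which by default uses k-means++ initialisation.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # initialise the centroids with k distinct points drawn from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# two obvious groups of toy points
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans(X, k=2)
```

Because the result depends on the starting centroids, practical implementations usually use a smarter initialisation and/or several restarts and keep the solution with the lowest within-cluster sum of squares.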

Data collection and pre-processing

import io
import boto3
import pandas as pd
import numpy as np
import warnings
import os
import matplotlib.pyplot as plt
import plotly.express as px

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from tqdm import tqdm
from sklearn.preprocessing import StandardScaler

from dotenv import load_dotenv, find_dotenv
%matplotlib inline
load_dotenv(find_dotenv())
warnings.filterwarnings("ignore")

pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
# CEPH Bucket variables
s3_endpoint_url = os.getenv("S3_ENDPOINT")
s3_access_key = os.getenv("S3_ACCESS_KEY")
s3_secret_key = os.getenv("S3_SECRET_KEY")
s3_bucket = os.getenv("S3_BUCKET")

# s3 resource to communicate with storage
s3 = boto3.resource(
    "s3",
    endpoint_url=s3_endpoint_url,
    aws_access_key_id=s3_access_key,
    aws_secret_access_key=s3_secret_key,
)

# helper: download one parquet object from the bucket into an in-memory buffer
# and load it as a dataframe
def read_parquet_from_s3(key, bucket="DH-PLAYPEN"):
    buffer = io.BytesIO()
    s3.Object(bucket, key).download_fileobj(buffer)
    buffer.seek(0)  # rewind the buffer before reading it back
    return pd.read_parquet(buffer)


prefix = "ccx/fingerprinting"

# image layers dataset, its image name mapping, layer id mapping and telemetry
image_layers_df = read_parquet_from_s3(
    f"{prefix}/image_layers/date=2021-05-12/2021-05-12.parquet"
)
image_id_map = read_parquet_from_s3(f"{prefix}/image_layers/dataframe_image_id.parquet")
image_layer_idmap = read_parquet_from_s3(f"{prefix}/image_layers/df_image_layerid.parquet")
telemeter_image_data = read_parquet_from_s3(
    f"{prefix}/image_layers/telemeter_image_data.parquet"
)

# containers dataset, its image name mapping and telemetry
containers_df = read_parquet_from_s3(
    f"{prefix}/containers/date=2021-05-12/2021-05-12.parquet"
)
cont_id_map = read_parquet_from_s3(f"{prefix}/containers/df_cont_image_id.parquet")
telemeter_cont_data = read_parquet_from_s3(
    f"{prefix}/containers/telemeter_cont_data.parquet"
)

We now have the following seven dataframes:

  • Image layers dataset (image_layers_df)

  • Image name mapping for the SHAs in the image_id column of the image layers dataset (image_id_map)

  • Layer ID mapping for the SHAs of the layers of the corresponding image_id (image_layer_idmap)

  • Containers dataset (containers_df)

  • Image name mapping for the SHAs in the image_id column of the containers dataset (cont_id_map)

  • Telemetry information for the corresponding cluster_id in the image layers dataset (telemeter_image_data)

  • Telemetry information for the corresponding cluster_id in the containers dataset (telemeter_cont_data)

Image layer Dataset

image_layers_df.head(2)
cluster_id image_id layer_image_id layer_image_level first_command first_arg archive_path
0 00003d61-9db1-4757-9cd1-84df271daeb9 sha256:337c22cabe530213b14965f9ea69a92dbeb5104... sha256:9ebb302e1fb002fb643091710dac46f8258781d... 0 icTsn2s_EIax 2v1NneeWoS_9 archives/compressed/00/00003d61-9db1-4757-9cd1...
1 00003d61-9db1-4757-9cd1-84df271daeb9 sha256:337c22cabe530213b14965f9ea69a92dbeb5104... sha256:a74396a32e85c2feeedf76052ed3297859810c8... 1 icTsn2s_EIax 2v1NneeWoS_9 archives/compressed/00/00003d61-9db1-4757-9cd1...

Pyxis data for the image SHAs in the image_id column of the image layers dataset

image_id_map = image_id_map.reset_index()
image_id_map.head(2)
image_id License architecture build-date com.redhat.build-host com.redhat.component com.redhat.license_terms description distribution-scope io.k8s.description io.k8s.display-name io.openshift.build.commit.id io.openshift.build.commit.url io.openshift.build.source-location io.openshift.expose-services io.openshift.maintainer.component io.openshift.maintainer.product io.openshift.tags maintainer name release summary url vcs-ref vcs-type vendor version io.openshift.s2i.scripts-url io.s2i.scripts-url usage io.openshift.s2i.assemble-user authoritative-source-url io.fabric8.s2i.version.jolokia org.concrt.version org.jboss.product org.jboss.product.amq.version org.jboss.product.openjdk.version org.jboss.product.version com.redhat.deployments-dir com.redhat.dev-mode io.jenkins.version
0 sha256:337c22cabe530213b14965f9ea69a92dbeb5104... GPLv2+ x86_64 2021-04-30T00:31:42.349887 cpt-1004.osbs.prod.upshift.rdu2.redhat.com ose-cli-artifacts-container https://www.redhat.com/agreements OpenShift is a platform for developing, buildi... public OpenShift is a platform for developing, buildi... OpenShift Clients a765590e1b87b014b9d81f9ea534460d6dff73f2 https://github.com/openshift/oc/commit/a765590... https://github.com/openshift/oc oc OpenShift Container Platform openshift,cli Red Hat, Inc. openshift/ose-cli-artifacts 202104292348.p0 Provides the latest release of Red Hat Univers... https://access.redhat.com/containers/#/registr... 43f412b2932a5ecfe5ebedcab38cf2915cf03813 git Red Hat, Inc. v4.8.0 None None None None None None None None None None None None None None
1 sha256:3574d6c1fcc46e1ebd41b7b887b92035ea18213... GPLv2+ x86_64 2021-04-30T00:22:29.539443 cpt-1008.osbs.prod.upshift.rdu2.redhat.com openshift-enterprise-cli-container https://www.redhat.com/agreements OpenShift is a platform for developing, buildi... public OpenShift is a platform for developing, buildi... OpenShift Client a765590e1b87b014b9d81f9ea534460d6dff73f2 https://github.com/openshift/oc/commit/a765590... https://github.com/openshift/oc oc OpenShift Container Platform openshift,cli Red Hat, Inc. openshift/ose-cli 202104292348.p0 Provides the latest release of Red Hat Univers... https://access.redhat.com/containers/#/registr... d3ae20f4e2fac18ea671cf12636d16791146a460 git Red Hat, Inc. v4.8.0 None None None None None None None None None None None None None None

Telemetry dataset for the corresponding cluster_id from image_layers dataset

telemeter_image_data.rename(columns={"_id": "cluster_id"}, inplace=True)
telemeter_image_data.head(2)
cluster_id timestamp value_workload:cpu_usage_cores:sum value_workload:memory_usage_bytes:sum value_openshift:cpu_usage_cores:sum value_openshift:memory_usage_bytes:sum value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum
0 00003d61-9db1-4757-9cd1-84df271daeb9 1620860062 0.03230751556666692 389971968 1.9291393691359602 14560608256 1.9614468847026272 14950580224
1 00351e6e-53ce-465e-9493-cf0cd2367049 1620852056 0.030943035959259964 385421312 1.7056283926121354 14386823168 1.7365714285713953 14772244480

Containers Dataset

containers_df.head(2)
cluster_id namespace shape shape_instances image_id first_command first_arg init_container archive_path
0 00003d61-9db1-4757-9cd1-84df271daeb9 0LiT6ZNtbpYL sha256:3ecf29979b2722bf4a82a5e7a954e8685820720... 1 sha256:f46f210d6023bec16e68340b484a8881ce46d5e... None 47DEQpj8HBSa False archives/compressed/00/00003d61-9db1-4757-9cd1...
1 00003d61-9db1-4757-9cd1-84df271daeb9 0LiT6ZNtbpYL sha256:3ecf29979b2722bf4a82a5e7a954e8685820720... 1 sha256:edb9aaacf421c6dc45b20324e8699cec02f26bf... n9CdwzVF-cwZ RNOaw_AuQeIY False archives/compressed/00/00003d61-9db1-4757-9cd1...

Pyxis data for the image SHAs in the image_id column of the containers dataset

cont_id_map = cont_id_map.reset_index()
cont_id_map.head(2)
image_id License architecture build-date com.redhat.build-host com.redhat.component com.redhat.license_terms description distribution-scope io.k8s.description io.k8s.display-name io.openshift.build.commit.id io.openshift.build.commit.url io.openshift.build.source-location io.openshift.expose-services io.openshift.maintainer.component io.openshift.maintainer.product io.openshift.tags maintainer name release summary url vcs-ref vcs-type vendor version io.openshift.maintainer.subcomponent io.openshift.release.operator io.openshift.build.versions com.redhat.delivery.appregistry upstream-vcs-ref upstream-vcs-type upstream-version org.kubevirt.hco.csv-generator.v1 io.openshift.s2i.scripts-url io.s2i.scripts-url usage io.openshift.s2i.assemble-user display-name com.redhat.delivery.operator.bundle com.redhat.openshift.versions io.cekit.version operators.operatorframework.io.bundle.channel.default.v1 operators.operatorframework.io.bundle.channels.v1 operators.operatorframework.io.bundle.manifests.v1 operators.operatorframework.io.bundle.mediatype.v1 operators.operatorframework.io.bundle.metadata.v1 operators.operatorframework.io.bundle.package.v1 licenses CEPH_POINT_RELEASE GIT_BRANCH GIT_CLEAN GIT_COMMIT GIT_REPO RELEASE ocs.tags com.redhat.deployments-dir com.redhat.dev-mode com.redhat.dev-mode.port help operators.operatorframework.io.index.database.v1 authoritative-source-url license io.fabric8.s2i.version.jolokia io.fabric8.s2i.version.maven io.openshift.s2i.destination org.jboss.container.deployments-dir org.jboss.product org.jboss.product.eap.version org.jboss.product.openjdk.version org.jboss.product.sso.version org.jboss.product.version istio_version openshift_build operator_build run org.concrt.version org.jboss.product.amq.version io.openshift.build.commit.author io.openshift.build.commit.date io.openshift.build.commit.message io.openshift.build.commit.ref io.openshift.build.name io.openshift.build.namespace io.openshift.build.source-context-dir 
jenkins.build.number jenkins.tarball.url io.jenkins.version build-utility org.label-schema.vcs-ref org.label-schema.vcs-url org.label-schema.description org.label-schema.license org.label-schema.name org.label-schema.schema-version org.label-schema.vendor JAVA_VERSION com.ibm.events.commitid com.ibm.eventstreams.base-for-bedrock.icp-linux-amd64.commitid com.ibm.eventstreams.base-for-bedrock.icp-linux-amd64.job com.ibm.eventstreams.base-for-bedrock.icp-linux-amd64.license com.ibm.eventstreams.base-for-bedrock.icp-linux-amd64.maintainer com.ibm.eventstreams.base-for-bedrock.icp-linux-amd64.name com.ibm.eventstreams.base-for-bedrock.icp-linux-amd64.version com.ibm.eventstreams.openjdk-11-sdk-for-bedrock.icp-linux-amd64.commitid com.ibm.eventstreams.openjdk-11-sdk-for-bedrock.icp-linux-amd64.job com.ibm.eventstreams.openjdk-11-sdk-for-bedrock.icp-linux-amd64.license com.ibm.eventstreams.openjdk-11-sdk-for-bedrock.icp-linux-amd64.maintainer com.ibm.eventstreams.openjdk-11-sdk-for-bedrock.icp-linux-amd64.name com.ibm.eventstreams.openjdk-11-sdk-for-bedrock.icp-linux-amd64.version com.redhat.apb.runtime com.microsoft.product com.microsoft.version
0 sha256:f46f210d6023bec16e68340b484a8881ce46d5e... ASL 2.0 x86_64 2021-05-04T21:54:58.392948 cpt-1007.osbs.prod.upshift.rdu2.redhat.com kube-rbac-proxy-container https://www.redhat.com/agreements This is a proxy, that can perform Kubernetes R... public This is a proxy, that can perform Kubernetes R... kube-rbac-proxy 8d11a8fa9ce252cd25794c0d9280cbdc0c2affcb https://github.com/openshift/kube-rbac-proxy/c... https://github.com/openshift/kube-rbac-proxy Monitoring OpenShift Container Platform kubernetes OpenShift Monitoring Team <team-monitoring@red... openshift/ose-kube-rbac-proxy 202105042126.p0 https://access.redhat.com/containers/#/registr... 12ef9d3cc226f6bd4a898d4b23ffa1ec5d3d27f1 git Red Hat, Inc. v4.8.0 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None
1 sha256:edb9aaacf421c6dc45b20324e8699cec02f26bf... GPLv2+ x86_64 2021-05-04T22:05:48.018566 cpt-1006.osbs.prod.upshift.rdu2.redhat.com ose-multus-admission-controller-container https://www.redhat.com/agreements This is a component of OpenShift Container Pla... public This is a component of OpenShift Container Pla... Container Networking Plugins a7312f5e55e9f34cc8b20f6cbfe1af0f363ca1e6 https://github.com/openshift/multus-admission-... https://github.com/openshift/multus-admission-... Networking OpenShift Container Platform openshift Doug Smith <dosmith@redhat.com> openshift/ose-multus-admission-controller 202105042126.p0 Provides the latest release of Red Hat Univers... https://access.redhat.com/containers/#/registr... 00692865fc2dd0c845bb20c688dbf2cb7e239062 git Red Hat, Inc. v4.8.0 multus None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None

Telemetry dataset for the corresponding cluster_id from containers dataset

telemeter_cont_data.rename(columns={"_id": "cluster_id"}, inplace=True)
telemeter_cont_data.head(2)
cluster_id timestamp value_workload:cpu_usage_cores:sum value_workload:memory_usage_bytes:sum value_openshift:cpu_usage_cores:sum value_openshift:memory_usage_bytes:sum value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum
0 00003d61-9db1-4757-9cd1-84df271daeb9 1620860062 0.03230751556666692 389971968 1.9291393691359602 14560608256 1.9614468847026272 14950580224
1 00351e6e-53ce-465e-9493-cf0cd2367049 1620852056 0.030943035959259964 385421312 1.7056283926121354 14386823168 1.7365714285713953 14772244480

Using the mapping techniques from the issue, we were able to map most (but not all) of the image_ids in the image layers and containers datasets. The telemetry information was also extracted (issue). In the next steps, we merge the respective datasets on image_id and cluster_id.

In the next section, we keep only the image name from the mapped datasets.

image_id_map = image_id_map[["image_id", "name"]]
cont_id_map = cont_id_map[["image_id", "name"]]

For the image_ids that could not be mapped, the product name and summary were left blank, so these rows have a missing (NaN) name after the merge.
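One way to quantify how many image_ids were actually mapped is a pandas merge with indicator=True, which adds a _merge column marking each row as "both", "left_only" or "right_only". The sketch below uses small hypothetical frames standing in for image_layers_df and image_id_map; the SHAs and names are made up.

```python
import pandas as pd

# hypothetical toy stand-ins for image_layers_df and image_id_map
layers = pd.DataFrame({"cluster_id": ["c1", "c1", "c2"],
                       "image_id": ["sha256:a", "sha256:b", "sha256:c"]})
id_map = pd.DataFrame({"image_id": ["sha256:a", "sha256:c"],
                       "name": ["openshift/ose-cli", "rhel8/redis-5"]})

# indicator=True records where each row came from in a "_merge" column
checked = pd.merge(layers, id_map, on="image_id", how="left", indicator=True)

# share of layer rows whose image_id found a name, and the unmapped SHAs
coverage = (checked["_merge"] == "both").mean()
unmapped = checked.loc[checked["_merge"] == "left_only", "image_id"].unique()
```

Here two of the three rows are mapped, and sha256:b is reported as unmapped with a NaN name, which is exactly the kind of row dropped further below.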

In the next section, we merge each dataset with its image name mapping (on image_id) and with its telemetry data (on cluster_id).

Merging the datasets with respect to the image_id

Merged dataframe for image layers dataset

df_image1 = pd.merge(image_layers_df, image_id_map, on="image_id", how="outer")
df_image = pd.merge(df_image1, telemeter_image_data, on="cluster_id", how="outer")
df_image = df_image[
    [
        "cluster_id",
        "name",
        "value_cluster:cpu_usage_cores:sum",
        "value_cluster:memory_usage_bytes:sum",
    ]
]
df_image.head(2)
cluster_id name value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum
0 00003d61-9db1-4757-9cd1-84df271daeb9 openshift/ose-cli-artifacts 1.9614468847026272 14950580224
1 00003d61-9db1-4757-9cd1-84df271daeb9 openshift/ose-cli-artifacts 1.9614468847026272 14950580224

Merged dataframe for containers dataset

df_cont1 = pd.merge(containers_df, cont_id_map, on="image_id", how="outer")
# add the containers telemetry for each cluster_id
df_cont = pd.merge(df_cont1, telemeter_cont_data, on="cluster_id", how="outer")
df_cont = df_cont[
    [
        "cluster_id",
        "name",
        "value_cluster:cpu_usage_cores:sum",
        "value_cluster:memory_usage_bytes:sum",
    ]
]
df_cont.head(2)
cluster_id name value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum
0 00003d61-9db1-4757-9cd1-84df271daeb9 openshift/ose-kube-rbac-proxy 1.9614468847026272 14950580224
1 00003d61-9db1-4757-9cd1-84df271daeb9 openshift/ose-kube-rbac-proxy 1.9614468847026272 14950580224

Dropping the rows with NaN values for name

# keep only the rows whose image_id could be mapped to an image name
df_image = df_image[df_image.name.notnull()].reset_index(drop=True)
df_cont = df_cont[df_cont.name.notnull()].reset_index(drop=True)

# make sure the two telemetry columns are numeric (float)
value_cols = [
    "value_cluster:cpu_usage_cores:sum",
    "value_cluster:memory_usage_bytes:sum",
]
df_image[value_cols] = df_image[value_cols].astype(float)
df_cont[value_cols] = df_cont[value_cols].astype(float)

Next, we drop the rows with null telemetry metrics. The telemetry values are recorded per cluster_id, so if a value is null for one image name linked to a given cluster_id, it is also null for every other image name linked to the same cluster_id. Dropping these rows therefore removes whole cluster_ids rather than individual images.

df_image = df_image[df_image["value_cluster:cpu_usage_cores:sum"].notnull()]
df_image = df_image[df_image["value_cluster:memory_usage_bytes:sum"].notnull()]
df_cont = df_cont[df_cont["value_cluster:cpu_usage_cores:sum"].notnull()]
df_cont = df_cont[df_cont["value_cluster:memory_usage_bytes:sum"].notnull()]
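Because rows without a name and rows without telemetry are both dropped, the two outer merges followed by these filters should keep the same rows that inner merges would, as long as the join keys and the name/telemetry columns themselves contain no missing values. The toy check below illustrates this with hypothetical frames; the column cpu stands in for the telemetry columns.

```python
import pandas as pd

# hypothetical toy stand-ins for the layers, name-mapping and telemetry frames
layers = pd.DataFrame({"cluster_id": ["c1", "c2", "c3"],
                       "image_id": ["a", "b", "c"]})
names = pd.DataFrame({"image_id": ["a", "c", "d"],
                      "name": ["ose-cli", "redis-5", "unused"]})
telemetry = pd.DataFrame({"cluster_id": ["c1", "c2", "c4"],
                          "cpu": [1.9, 1.7, 0.5]})

# notebook-style: outer merges, then drop rows missing a name or a telemetry value
outer = pd.merge(layers, names, on="image_id", how="outer")
outer = pd.merge(outer, telemetry, on="cluster_id", how="outer")
outer = outer[outer["name"].notnull() & outer["cpu"].notnull()]

# the same result expressed directly as inner merges
inner = pd.merge(layers, names, on="image_id", how="inner")
inner = pd.merge(inner, telemetry, on="cluster_id", how="inner")
```

Only cluster c1 survives in both versions: c2 has no mapped name, c3 has no telemetry, and the extra rows for image d and cluster c4 are discarded either way.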

For image_layers dataset

for col in df_image.iloc[:, :-2].columns:
    print(f"Number of unique categories in {col} = {df_image[col].nunique()}\n")
Number of unique categories in cluster_id = 1728

Number of unique categories in name = 25

For container dataset

for col in df_cont.iloc[:, :-2].columns:
    print(f"Number of unique categories in {col} = {df_cont[col].nunique()}\n")
Number of unique categories in cluster_id = 1759

Number of unique categories in name = 350

Encoding categorical features

To encode the categorical feature, we use the one-hot encoding technique. For both the image layers dataset and the containers dataset, we cluster the cluster_ids based on the image names (mapped from image_id) linked to them and on their corresponding telemetry information.

One Hot Encoding

We have 25 unique names and 1728 unique cluster_ids in the image layers dataset, and 350 unique names and 1759 unique cluster_ids in the containers dataset. After one-hot encoding, we perform K-means clustering on both datasets.
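The encode-then-aggregate step used below can be seen on a tiny hypothetical frame (names and values are made up): one-hot encoding turns each row's image name into indicator columns, and summing those per cluster_id gives a vector of image-name counts for every cluster, while the telemetry is averaged per cluster_id.

```python
import pandas as pd

# hypothetical toy rows: one row per (cluster_id, image name) occurrence
df = pd.DataFrame({
    "cluster_id": ["c1", "c1", "c1", "c2"],
    "name": ["ose-cli", "ose-cli", "oauth-proxy", "redis-5"],
    "cpu": [1.9, 1.9, 1.9, 0.4],
})

# one-hot encode the name, then sum per cluster_id to get occurrence counts
counts = pd.get_dummies(df["name"]).groupby(df["cluster_id"]).sum()

# telemetry comes from a per-cluster table, so the mean recovers that value
cpu = df.groupby("cluster_id")["cpu"].mean()

# one row per cluster_id: name counts followed by the telemetry column
X = counts.join(cpu)
```

Cluster c1 becomes the row (oauth-proxy=1, ose-cli=2, redis-5=0, cpu=1.9), which is the same shape of feature vector that X_image and X_cont hold for the real data.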

Image_layer_dataset

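# one-hot encode the image name: one indicator column per unique name
dummy = pd.get_dummies(df_image.name)
new_image = pd.concat([df_image, dummy], axis=1)
new_image = new_image.drop(columns="name")
new_image = new_image.set_index("cluster_id")
# per cluster_id: count how often each image name occurs (sum of the indicators) ...
new_image1 = new_image.iloc[:, 2:].groupby(level=0).sum()
# ... and average the two telemetry columns
new_image2 = new_image.iloc[:, :2].groupby(level=0).mean()
X_image = pd.concat([new_image1, new_image2], axis=1)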
X_image.head(2)
3scale-amp2/apicast-gateway-rhel8 3scale-amp2/backend-rhel7 3scale-amp2/memcached-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 jboss-amq-6/amq63-openshift openshift/ose-cli openshift/ose-cli-artifacts openshift/ose-grafana openshift/ose-jenkins openshift/ose-must-gather openshift/ose-oauth-proxy openshift/ose-tests openshift/ose-tools rhel8/httpd-24 rhel8/mysql-80 rhel8/postgresql-10 rhel8/postgresql-12 rhel8/redis-5 rhscl/mongodb-36-rhel7 rhscl/mysql-57-rhel7 rhscl/postgresql-10-rhel7 rhscl/redis-32-rhel7 ubi8/dotnet-50 ubi8/ruby-27 value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum
cluster_id
00003d61-9db1-4757-9cd1-84df271daeb9 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1.961447 1.495058e+10
00351e6e-53ce-465e-9493-cf0cd2367049 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1.736571 1.477224e+10

We now standardize the two telemetry value columns using StandardScaler().

X_image.iloc[:, -2:] = StandardScaler().fit_transform(X_image.iloc[:, -2:])
X_image.head(2)
3scale-amp2/apicast-gateway-rhel8 3scale-amp2/backend-rhel7 3scale-amp2/memcached-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 jboss-amq-6/amq63-openshift openshift/ose-cli openshift/ose-cli-artifacts openshift/ose-grafana openshift/ose-jenkins openshift/ose-must-gather openshift/ose-oauth-proxy openshift/ose-tests openshift/ose-tools rhel8/httpd-24 rhel8/mysql-80 rhel8/postgresql-10 rhel8/postgresql-12 rhel8/redis-5 rhscl/mongodb-36-rhel7 rhscl/mysql-57-rhel7 rhscl/postgresql-10-rhel7 rhscl/redis-32-rhel7 ubi8/dotnet-50 ubi8/ruby-27 value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum
cluster_id
00003d61-9db1-4757-9cd1-84df271daeb9 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.321468 -0.266216
00351e6e-53ce-465e-9493-cf0cd2367049 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.367923 -0.270567
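StandardScaler rescales each column to zero mean and unit variance, z = (x - mean) / std, using the population standard deviation. Only the two telemetry columns are rescaled here; the name count columns keep their raw counts. This matters because K-means works with Euclidean distances, so an unscaled memory column in bytes (around 1e10) would otherwise dominate every distance. A minimal NumPy equivalent on hypothetical values:

```python
import numpy as np

# hypothetical telemetry-like columns on very different scales
# (CPU cores vs. memory bytes)
X = np.array([[1.9, 1.5e10],
              [1.7, 1.4e10],
              [0.5, 0.4e10]])

# z = (x - mean) / std, column by column; NumPy's .std() defaults to ddof=0,
# the same population standard deviation that StandardScaler uses
Z = (X - X.mean(axis=0)) / X.std(axis=0)
```

After scaling both columns have mean 0 and standard deviation 1, and the ordering of the values within each column is unchanged.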

The result has one row per cluster_id (1728 rows) and one count column for each of the 25 unique names, followed by the two standardized telemetry columns.

Encoding for name and cluster_id in container dataset

df_cont_name = df_cont
# one-hot encode the image name: one indicator column per unique name
dummy = pd.get_dummies(df_cont_name.name)
new_cont = pd.concat([df_cont_name, dummy], axis=1)
new_cont = new_cont.drop(columns="name")
new_cont = new_cont.set_index("cluster_id")
# per cluster_id: count each image name and average the two telemetry columns
new_cont1 = new_cont.iloc[:, 2:].groupby(level=0).sum()
new_cont2 = new_cont.iloc[:, :2].groupby(level=0).mean()
X_cont = pd.concat([new_cont1, new_cont2], axis=1)
X_cont.head(2)
3scale-amp2/3scale-rhel7-operator 3scale-amp2/apicast-gateway-rhel8 3scale-amp2/backend-rhel7 3scale-amp2/memcached-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 Calico Operator Cilium Cloud Native PostgreSQL Operator Elastic Cloud on Kubernetes F5 BIG-IP Controller Operator NGINX Ingress Operator NVIDIA GPU Operator Seldon Operator alertmanager amq7/amq-broker-rhel7-operator amq7/amq-streams-kafka-25-rhel7 amq7/amq-streams-kafka-26-rhel7 amq7/amq-streams-rhel7-operator amq7/amq-streams-rhel7-operator-metadata ansible-automation-platform/platform-resource-rhel7-operator ansible-tower cephcsi cert-manager cainjector cert-manager controller cert-manager webhook codeready-workspaces/devfileregistry-rhel8 codeready-workspaces/operator codeready-workspaces/pluginregistry-rhel8 codeready-workspaces/server-rhel8 collectd-exporter container-native-virtualization/bridge-marker container-native-virtualization/cluster-network-addons-operator container-native-virtualization/cnv-containernetworking-plugins container-native-virtualization/hostpath-provisioner-rhel8-operator container-native-virtualization/hyperconverged-cluster-operator container-native-virtualization/hyperconverged-cluster-webhook-rhel8 container-native-virtualization/kubemacpool container-native-virtualization/kubernetes-nmstate-handler-rhel8 container-native-virtualization/kubevirt-cpu-model-nfd-plugin container-native-virtualization/kubevirt-cpu-node-labeller container-native-virtualization/kubevirt-kvm-info-nfd-plugin container-native-virtualization/kubevirt-ssp-operator container-native-virtualization/node-maintenance-operator container-native-virtualization/ovs-cni-marker container-native-virtualization/ovs-cni-plugin container-native-virtualization/virt-api container-native-virtualization/virt-cdi-apiserver container-native-virtualization/virt-cdi-controller container-native-virtualization/virt-cdi-importer container-native-virtualization/virt-cdi-operator 
container-native-virtualization/virt-cdi-uploadproxy container-native-virtualization/virt-controller container-native-virtualization/virt-handler container-native-virtualization/virt-launcher container-native-virtualization/virt-operator container-native-virtualization/vm-import-controller-rhel8 container-native-virtualization/vm-import-operator-rhel8 costmanagement-metrics-operator distributed-tracing/jaeger-rhel8-operator grafana ibm common service webhook ibm-events-operator ibm-postgresql jboss-amq-6/amq63-openshift jboss-eap-7/eap73-rhel8-operator kube-state-metrics mcg-core mcg-operator mcr.microsoft.com/mssql/rhel8/server must-gather-service ocp-tools-4/odo-init-image ocs-operator ocs-registry openshift-compliance-content openshift-gitops-1-tech-preview/argocd-rhel8 openshift-gitops-1-tech-preview/gitops-rhel8 openshift-gitops-1-tech-preview/gitops-rhel8-operator openshift-gitops-1-tech-preview/kam-delivery-rhel8 openshift-gitops-1/gitops-rhel8 openshift-gitops-1/kam-delivery-rhel8 openshift-logging/cluster-logging-operator-bundle openshift-logging/cluster-logging-rhel8-operator openshift-logging/elasticsearch-operator-bundle openshift-logging/elasticsearch-proxy-rhel8 openshift-logging/elasticsearch-rhel8-operator openshift-logging/elasticsearch6-rhel8 openshift-logging/fluentd-rhel8 openshift-logging/kibana6-rhel8 openshift-pipelines-tech-preview/pipelines-controller-rhel8 openshift-pipelines-tech-preview/pipelines-operator-proxy-rhel8 openshift-pipelines-tech-preview/pipelines-rhel8-operator openshift-pipelines-tech-preview/pipelines-triggers-controller-rhel8 openshift-pipelines-tech-preview/pipelines-triggers-webhook-rhel8 openshift-pipelines-tech-preview/pipelines-webhook-rhel8 openshift-pipelines/pipelines-controller-rhel8 openshift-pipelines/pipelines-operator-proxy-rhel8 openshift-pipelines/pipelines-rhel8-operator openshift-pipelines/pipelines-triggers-controller-rhel8 openshift-pipelines/pipelines-triggers-core-interceptors-rhel8 
openshift-pipelines/pipelines-triggers-eventlistenersink-rhel8 openshift-pipelines/pipelines-triggers-webhook-rhel8 openshift-pipelines/pipelines-webhook-rhel8 openshift-sandboxed-containers-operator openshift-serverless-1-tech-preview/eventing-kafka-channel-controller-rhel8 openshift-serverless-1/eventing-in-memory-channel-controller-rhel8 openshift-serverless-1/eventing-in-memory-channel-dispatcher-rhel8 openshift-serverless-1/eventing-mtbroker-ingress-rhel8 openshift-serverless-1/eventing-mtchannel-broker-rhel8 openshift-serverless-1/eventing-mtping-rhel8 openshift-serverless-1/eventing-sugar-controller-rhel8 openshift-serverless-1/eventing-webhook-rhel8 openshift-serverless-1/ingress-rhel8-operator openshift-serverless-1/knative-rhel8-operator openshift-serverless-1/kourier-control-rhel8 openshift-serverless-1/serverless-rhel8-operator openshift-serverless-1/serving-activator-rhel8 openshift-serverless-1/serving-autoscaler-hpa-rhel8 openshift-serverless-1/serving-autoscaler-rhel8 openshift-serverless-1/serving-controller-rhel8 openshift-serverless-1/serving-domain-mapping-rhel8 openshift-serverless-1/serving-domain-mapping-webhook-rhel8 openshift-serverless-1/serving-queue-rhel8 openshift-service-mesh/galley-rhel8 openshift-service-mesh/grafana-rhel8 openshift-service-mesh/istio-cni-rhel8 openshift-service-mesh/istio-rhel8-operator openshift-service-mesh/kiali-rhel7 openshift-service-mesh/kiali-rhel8 openshift-service-mesh/kiali-rhel8-operator openshift-service-mesh/pilot-rhel8 openshift-service-mesh/prometheus-rhel8 openshift-service-mesh/proxyv2-rhel8 openshift/compliance-operator openshift/ose-aws-ebs-csi-driver openshift/ose-aws-ebs-csi-driver-operator openshift/ose-aws-machine-controllers openshift/ose-aws-pod-identity-webhook openshift/ose-azure-disk-csi-driver openshift/ose-azure-disk-csi-driver-operator openshift/ose-azure-machine-controllers openshift/ose-baremetal-machine-controllers openshift/ose-baremetal-operator openshift/ose-baremetal-runtimecfg 
openshift/ose-cli openshift/ose-cli-artifacts openshift/ose-cloud-credential-operator openshift/ose-cluster-authentication-operator openshift/ose-cluster-autoscaler openshift/ose-cluster-autoscaler-operator openshift/ose-cluster-baremetal-operator openshift/ose-cluster-config-operator openshift/ose-cluster-csi-snapshot-controller-operator openshift/ose-cluster-dns-operator openshift/ose-cluster-etcd-operator openshift/ose-cluster-image-registry-operator openshift/ose-cluster-ingress-operator openshift/ose-cluster-kube-apiserver-operator openshift/ose-cluster-kube-controller-manager-operator openshift/ose-cluster-kube-descheduler-operator openshift/ose-cluster-kube-scheduler-operator openshift/ose-cluster-kube-storage-version-migrator-operator openshift/ose-cluster-logging-operator openshift/ose-cluster-machine-approver openshift/ose-cluster-monitoring-operator openshift/ose-cluster-network-operator openshift/ose-cluster-nfd-operator openshift/ose-cluster-node-tuning-operator openshift/ose-cluster-openshift-apiserver-operator openshift/ose-cluster-openshift-controller-manager-operator openshift/ose-cluster-policy-controller openshift/ose-cluster-samples-operator openshift/ose-cluster-storage-operator openshift/ose-clusterresourceoverride-rhel8-operator openshift/ose-configmap-reloader openshift/ose-console openshift/ose-console-operator openshift/ose-container-networking-plugins openshift/ose-coredns openshift/ose-csi-driver-manila openshift/ose-csi-driver-manila-operator openshift/ose-csi-driver-nfs openshift/ose-csi-external-attacher openshift/ose-csi-external-provisioner openshift/ose-csi-external-resizer openshift/ose-csi-external-snapshotter openshift/ose-csi-livenessprobe openshift/ose-csi-node-driver-registrar openshift/ose-csi-snapshot-controller openshift/ose-csi-snapshot-validation-webhook openshift/ose-deployer openshift/ose-docker-builder openshift/ose-docker-registry openshift/ose-egress-router-cni openshift/ose-etcd 
openshift/ose-gcp-machine-controllers openshift/ose-gcp-pd-csi-driver openshift/ose-gcp-pd-csi-driver-operator openshift/ose-grafana openshift/ose-haproxy-router openshift/ose-hyperkube openshift/ose-insights-operator openshift/ose-installer openshift/ose-ironic openshift/ose-ironic-inspector openshift/ose-ironic-ipa-downloader openshift/ose-ironic-machine-os-downloader openshift/ose-ironic-static-ip-manager openshift/ose-jenkins openshift/ose-k8s-prometheus-adapter openshift/ose-keepalived-ipfailover openshift/ose-kube-rbac-proxy openshift/ose-kube-state-metrics openshift/ose-kube-storage-version-migrator openshift/ose-kuryr-cni openshift/ose-kuryr-controller openshift/ose-libvirt-machine-controllers openshift/ose-local-storage-diskmaker openshift/ose-local-storage-operator openshift/ose-local-storage-operator-bundle openshift/ose-local-storage-static-provisioner openshift/ose-logging-elasticsearch6 openshift/ose-machine-api-operator openshift/ose-machine-config-operator openshift/ose-mdns-publisher openshift/ose-multus-admission-controller openshift/ose-multus-cni openshift/ose-multus-route-override-cni openshift/ose-multus-whereabouts-ipam-cni openshift/ose-must-gather openshift/ose-network-metrics-daemon openshift/ose-oauth-apiserver openshift/ose-oauth-proxy openshift/ose-oauth-server openshift/ose-openshift-apiserver openshift/ose-openshift-controller-manager openshift/ose-openshift-state-metrics openshift/ose-openstack-cinder-csi-driver openshift/ose-openstack-cinder-csi-driver-operator openshift/ose-openstack-machine-controllers openshift/ose-operator-lifecycle-manager openshift/ose-operator-marketplace openshift/ose-operator-registry openshift/ose-ovirt-csi-driver openshift/ose-ovirt-csi-driver-operator openshift/ose-ovirt-machine-controllers openshift/ose-ovn-kubernetes openshift/ose-prom-label-proxy openshift/ose-prometheus openshift/ose-prometheus-alertmanager openshift/ose-prometheus-config-reloader openshift/ose-prometheus-node-exporter 
openshift/ose-prometheus-operator openshift/ose-ptp openshift/ose-ptp-operator openshift/ose-sdn openshift/ose-service-ca-operator openshift/ose-sriov-cni openshift/ose-sriov-dp-admission-controller openshift/ose-sriov-infiniband-cni openshift/ose-sriov-network-config-daemon openshift/ose-sriov-network-device-plugin openshift/ose-sriov-network-operator openshift/ose-sriov-network-webhook openshift/ose-telemeter openshift/ose-template-service-broker openshift/ose-template-service-broker-operator openshift/ose-tests openshift/ose-thanos openshift/ose-tools openshift/ose-vertical-pod-autoscaler-rhel8-operator openshift/ose-vsphere-problem-detector openshift4/performance-addon-rhel8-operator quay/quay-container-security-operator-container quay/quay-container-security-operator-rhel8 quay/quay-operator-rhel8 rh-sso-7/sso74-openj9-openshift-rhel8 rhacm2/application-ui-rhel8 rhacm2/cert-manager-controller-rhel8 rhacm2/cert-policy-controller-rhel8 rhacm2/clusterlifecycle-state-metrics-rhel8 rhacm2/config-policy-controller-rhel8 rhacm2/console-api-rhel8 rhacm2/console-header-rhel8 rhacm2/console-rhel8 rhacm2/discovery-rhel8-operator rhacm2/endpoint-component-rhel8-operator rhacm2/endpoint-rhel8-operator rhacm2/governance-policy-propagator-rhel8 rhacm2/governance-policy-spec-sync-rhel8 rhacm2/governance-policy-status-sync-rhel8 rhacm2/governance-policy-template-sync-rhel8 rhacm2/grafana rhacm2/grc-ui-api-rhel8 rhacm2/grc-ui-rhel8 rhacm2/iam-policy-controller-rhel8 rhacm2/insights-client-rhel8 rhacm2/klusterlet-addon-controller-rhel8 rhacm2/klusterlet-addon-lease-controller-rhel8 rhacm2/klusterlet-addon-operator-rhel8 rhacm2/managedcluster-import-controller-rhel8 rhacm2/multicloud-manager-rhel8 rhacm2/multicluster-observability-rhel8-operator rhacm2/multicluster-operators-application-rhel8 rhacm2/multicluster-operators-channel-rhel8 rhacm2/multicluster-operators-deployable-rhel8 rhacm2/multicluster-operators-placementrule-rhel8 rhacm2/multicluster-operators-subscription-rhel8 
rhacm2/multiclusterhub-repo-rhel8 rhacm2/multiclusterhub-rhel8 rhacm2/openshift-hive-rhel7 rhacm2/prometheus-alertmanager-rhel8 rhacm2/provider-credential-controller-rhel8 rhacm2/rcm-controller-rhel8 rhacm2/redisgraph-tls-rhel8 rhacm2/registration-rhel8 rhacm2/registration-rhel8-operator rhacm2/search-aggregator-rhel8 rhacm2/search-collector-rhel8 rhacm2/search-rhel8 rhacm2/search-ui-rhel8 rhacm2/submariner-addon-rhel8 rhacm2/thanos-receive-controller-rhel8 rhacm2/thanos-rhel7 rhacm2/work-rhel8 rhceph rhel7/couchbase-operator-admission rhel8/httpd-24 rhel8/mysql-80 rhel8/postgresql-10 rhel8/postgresql-12 rhel8/postgresql-96 rhel8/redis-5 rhmtc/openshift-migration-controller rhmtc/openshift-migration-operator rhmtc/openshift-migration-velero rhmtc/openshift-migration-velero-plugin-for-aws rhmtc/openshift-migration-velero-plugin-for-gcp rhmtc/openshift-migration-velero-plugin-for-microsoft-azure rhscl/mongodb-36-rhel7 rhscl/mysql-57-rhel7 rhscl/postgresql-10-rhel7 rhscl/postgresql-96-rhel7 rhscl/redis-32-rhel7 rook-ceph ubi8 ubi8/dotnet-50 ubi8/ruby-27 ubi8/ubi8-init volume-replication-operator value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum
cluster_id
00003d61-9db1-4757-9cd1-84df271daeb9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 2 6 3 0 3 1 0 1 1 3 0 2 1 1 1 2 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 5 0 0 0 1 1 5 1 0 0 0 0 0 0 0 1 0 24 1 1 0 0 0 0 0 0 0 0 1 4 0 1 2 1 2 0 1 2 5 1 2 1 1 0 0 0 3 1 0 0 0 0 0 3 1 1 3 2 1 0 0 2 2 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.961447 1.495058e+10
002663ad-bcf4-4c7c-9530-ecb351fe4001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.345714 2.524869e+10
X_cont.iloc[:, -2:] = StandardScaler().fit_transform(X_cont.iloc[:, -2:])
X_cont.head(2)
3scale-amp2/3scale-rhel7-operator 3scale-amp2/apicast-gateway-rhel8 3scale-amp2/backend-rhel7 3scale-amp2/memcached-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 Calico Operator Cilium Cloud Native PostgreSQL Operator Elastic Cloud on Kubernetes F5 BIG-IP Controller Operator NGINX Ingress Operator NVIDIA GPU Operator Seldon Operator alertmanager amq7/amq-broker-rhel7-operator amq7/amq-streams-kafka-25-rhel7 amq7/amq-streams-kafka-26-rhel7 amq7/amq-streams-rhel7-operator amq7/amq-streams-rhel7-operator-metadata ansible-automation-platform/platform-resource-rhel7-operator ansible-tower cephcsi cert-manager cainjector cert-manager controller cert-manager webhook codeready-workspaces/devfileregistry-rhel8 codeready-workspaces/operator codeready-workspaces/pluginregistry-rhel8 codeready-workspaces/server-rhel8 collectd-exporter container-native-virtualization/bridge-marker container-native-virtualization/cluster-network-addons-operator container-native-virtualization/cnv-containernetworking-plugins container-native-virtualization/hostpath-provisioner-rhel8-operator container-native-virtualization/hyperconverged-cluster-operator container-native-virtualization/hyperconverged-cluster-webhook-rhel8 container-native-virtualization/kubemacpool container-native-virtualization/kubernetes-nmstate-handler-rhel8 container-native-virtualization/kubevirt-cpu-model-nfd-plugin container-native-virtualization/kubevirt-cpu-node-labeller container-native-virtualization/kubevirt-kvm-info-nfd-plugin container-native-virtualization/kubevirt-ssp-operator container-native-virtualization/node-maintenance-operator container-native-virtualization/ovs-cni-marker container-native-virtualization/ovs-cni-plugin container-native-virtualization/virt-api container-native-virtualization/virt-cdi-apiserver container-native-virtualization/virt-cdi-controller container-native-virtualization/virt-cdi-importer container-native-virtualization/virt-cdi-operator 
container-native-virtualization/virt-cdi-uploadproxy container-native-virtualization/virt-controller container-native-virtualization/virt-handler container-native-virtualization/virt-launcher container-native-virtualization/virt-operator container-native-virtualization/vm-import-controller-rhel8 container-native-virtualization/vm-import-operator-rhel8 costmanagement-metrics-operator distributed-tracing/jaeger-rhel8-operator grafana ibm common service webhook ibm-events-operator ibm-postgresql jboss-amq-6/amq63-openshift jboss-eap-7/eap73-rhel8-operator kube-state-metrics mcg-core mcg-operator mcr.microsoft.com/mssql/rhel8/server must-gather-service ocp-tools-4/odo-init-image ocs-operator ocs-registry openshift-compliance-content openshift-gitops-1-tech-preview/argocd-rhel8 openshift-gitops-1-tech-preview/gitops-rhel8 openshift-gitops-1-tech-preview/gitops-rhel8-operator openshift-gitops-1-tech-preview/kam-delivery-rhel8 openshift-gitops-1/gitops-rhel8 openshift-gitops-1/kam-delivery-rhel8 openshift-logging/cluster-logging-operator-bundle openshift-logging/cluster-logging-rhel8-operator openshift-logging/elasticsearch-operator-bundle openshift-logging/elasticsearch-proxy-rhel8 openshift-logging/elasticsearch-rhel8-operator openshift-logging/elasticsearch6-rhel8 openshift-logging/fluentd-rhel8 openshift-logging/kibana6-rhel8 openshift-pipelines-tech-preview/pipelines-controller-rhel8 openshift-pipelines-tech-preview/pipelines-operator-proxy-rhel8 openshift-pipelines-tech-preview/pipelines-rhel8-operator openshift-pipelines-tech-preview/pipelines-triggers-controller-rhel8 openshift-pipelines-tech-preview/pipelines-triggers-webhook-rhel8 openshift-pipelines-tech-preview/pipelines-webhook-rhel8 openshift-pipelines/pipelines-controller-rhel8 openshift-pipelines/pipelines-operator-proxy-rhel8 openshift-pipelines/pipelines-rhel8-operator openshift-pipelines/pipelines-triggers-controller-rhel8 openshift-pipelines/pipelines-triggers-core-interceptors-rhel8 
openshift-pipelines/pipelines-triggers-eventlistenersink-rhel8 openshift-pipelines/pipelines-triggers-webhook-rhel8 openshift-pipelines/pipelines-webhook-rhel8 openshift-sandboxed-containers-operator openshift-serverless-1-tech-preview/eventing-kafka-channel-controller-rhel8 openshift-serverless-1/eventing-in-memory-channel-controller-rhel8 openshift-serverless-1/eventing-in-memory-channel-dispatcher-rhel8 openshift-serverless-1/eventing-mtbroker-ingress-rhel8 openshift-serverless-1/eventing-mtchannel-broker-rhel8 openshift-serverless-1/eventing-mtping-rhel8 openshift-serverless-1/eventing-sugar-controller-rhel8 openshift-serverless-1/eventing-webhook-rhel8 openshift-serverless-1/ingress-rhel8-operator openshift-serverless-1/knative-rhel8-operator openshift-serverless-1/kourier-control-rhel8 openshift-serverless-1/serverless-rhel8-operator openshift-serverless-1/serving-activator-rhel8 openshift-serverless-1/serving-autoscaler-hpa-rhel8 openshift-serverless-1/serving-autoscaler-rhel8 openshift-serverless-1/serving-controller-rhel8 openshift-serverless-1/serving-domain-mapping-rhel8 openshift-serverless-1/serving-domain-mapping-webhook-rhel8 openshift-serverless-1/serving-queue-rhel8 openshift-service-mesh/galley-rhel8 openshift-service-mesh/grafana-rhel8 openshift-service-mesh/istio-cni-rhel8 openshift-service-mesh/istio-rhel8-operator openshift-service-mesh/kiali-rhel7 openshift-service-mesh/kiali-rhel8 openshift-service-mesh/kiali-rhel8-operator openshift-service-mesh/pilot-rhel8 openshift-service-mesh/prometheus-rhel8 openshift-service-mesh/proxyv2-rhel8 openshift/compliance-operator openshift/ose-aws-ebs-csi-driver openshift/ose-aws-ebs-csi-driver-operator openshift/ose-aws-machine-controllers openshift/ose-aws-pod-identity-webhook openshift/ose-azure-disk-csi-driver openshift/ose-azure-disk-csi-driver-operator openshift/ose-azure-machine-controllers openshift/ose-baremetal-machine-controllers openshift/ose-baremetal-operator openshift/ose-baremetal-runtimecfg 
openshift/ose-cli openshift/ose-cli-artifacts openshift/ose-cloud-credential-operator openshift/ose-cluster-authentication-operator openshift/ose-cluster-autoscaler openshift/ose-cluster-autoscaler-operator openshift/ose-cluster-baremetal-operator openshift/ose-cluster-config-operator openshift/ose-cluster-csi-snapshot-controller-operator openshift/ose-cluster-dns-operator openshift/ose-cluster-etcd-operator openshift/ose-cluster-image-registry-operator openshift/ose-cluster-ingress-operator openshift/ose-cluster-kube-apiserver-operator openshift/ose-cluster-kube-controller-manager-operator openshift/ose-cluster-kube-descheduler-operator openshift/ose-cluster-kube-scheduler-operator openshift/ose-cluster-kube-storage-version-migrator-operator openshift/ose-cluster-logging-operator openshift/ose-cluster-machine-approver openshift/ose-cluster-monitoring-operator openshift/ose-cluster-network-operator openshift/ose-cluster-nfd-operator openshift/ose-cluster-node-tuning-operator openshift/ose-cluster-openshift-apiserver-operator openshift/ose-cluster-openshift-controller-manager-operator openshift/ose-cluster-policy-controller openshift/ose-cluster-samples-operator openshift/ose-cluster-storage-operator openshift/ose-clusterresourceoverride-rhel8-operator openshift/ose-configmap-reloader openshift/ose-console openshift/ose-console-operator openshift/ose-container-networking-plugins openshift/ose-coredns openshift/ose-csi-driver-manila openshift/ose-csi-driver-manila-operator openshift/ose-csi-driver-nfs openshift/ose-csi-external-attacher openshift/ose-csi-external-provisioner openshift/ose-csi-external-resizer openshift/ose-csi-external-snapshotter openshift/ose-csi-livenessprobe openshift/ose-csi-node-driver-registrar openshift/ose-csi-snapshot-controller openshift/ose-csi-snapshot-validation-webhook openshift/ose-deployer openshift/ose-docker-builder openshift/ose-docker-registry openshift/ose-egress-router-cni openshift/ose-etcd 
openshift/ose-gcp-machine-controllers openshift/ose-gcp-pd-csi-driver openshift/ose-gcp-pd-csi-driver-operator openshift/ose-grafana openshift/ose-haproxy-router openshift/ose-hyperkube openshift/ose-insights-operator openshift/ose-installer openshift/ose-ironic openshift/ose-ironic-inspector openshift/ose-ironic-ipa-downloader openshift/ose-ironic-machine-os-downloader openshift/ose-ironic-static-ip-manager openshift/ose-jenkins openshift/ose-k8s-prometheus-adapter openshift/ose-keepalived-ipfailover openshift/ose-kube-rbac-proxy openshift/ose-kube-state-metrics openshift/ose-kube-storage-version-migrator openshift/ose-kuryr-cni openshift/ose-kuryr-controller openshift/ose-libvirt-machine-controllers openshift/ose-local-storage-diskmaker openshift/ose-local-storage-operator openshift/ose-local-storage-operator-bundle openshift/ose-local-storage-static-provisioner openshift/ose-logging-elasticsearch6 openshift/ose-machine-api-operator openshift/ose-machine-config-operator openshift/ose-mdns-publisher openshift/ose-multus-admission-controller openshift/ose-multus-cni openshift/ose-multus-route-override-cni openshift/ose-multus-whereabouts-ipam-cni openshift/ose-must-gather openshift/ose-network-metrics-daemon openshift/ose-oauth-apiserver openshift/ose-oauth-proxy openshift/ose-oauth-server openshift/ose-openshift-apiserver openshift/ose-openshift-controller-manager openshift/ose-openshift-state-metrics openshift/ose-openstack-cinder-csi-driver openshift/ose-openstack-cinder-csi-driver-operator openshift/ose-openstack-machine-controllers openshift/ose-operator-lifecycle-manager openshift/ose-operator-marketplace openshift/ose-operator-registry openshift/ose-ovirt-csi-driver openshift/ose-ovirt-csi-driver-operator openshift/ose-ovirt-machine-controllers openshift/ose-ovn-kubernetes openshift/ose-prom-label-proxy openshift/ose-prometheus openshift/ose-prometheus-alertmanager openshift/ose-prometheus-config-reloader openshift/ose-prometheus-node-exporter 
openshift/ose-prometheus-operator openshift/ose-ptp openshift/ose-ptp-operator openshift/ose-sdn openshift/ose-service-ca-operator openshift/ose-sriov-cni openshift/ose-sriov-dp-admission-controller openshift/ose-sriov-infiniband-cni openshift/ose-sriov-network-config-daemon openshift/ose-sriov-network-device-plugin openshift/ose-sriov-network-operator openshift/ose-sriov-network-webhook openshift/ose-telemeter openshift/ose-template-service-broker openshift/ose-template-service-broker-operator openshift/ose-tests openshift/ose-thanos openshift/ose-tools openshift/ose-vertical-pod-autoscaler-rhel8-operator openshift/ose-vsphere-problem-detector openshift4/performance-addon-rhel8-operator quay/quay-container-security-operator-container quay/quay-container-security-operator-rhel8 quay/quay-operator-rhel8 rh-sso-7/sso74-openj9-openshift-rhel8 rhacm2/application-ui-rhel8 rhacm2/cert-manager-controller-rhel8 rhacm2/cert-policy-controller-rhel8 rhacm2/clusterlifecycle-state-metrics-rhel8 rhacm2/config-policy-controller-rhel8 rhacm2/console-api-rhel8 rhacm2/console-header-rhel8 rhacm2/console-rhel8 rhacm2/discovery-rhel8-operator rhacm2/endpoint-component-rhel8-operator rhacm2/endpoint-rhel8-operator rhacm2/governance-policy-propagator-rhel8 rhacm2/governance-policy-spec-sync-rhel8 rhacm2/governance-policy-status-sync-rhel8 rhacm2/governance-policy-template-sync-rhel8 rhacm2/grafana rhacm2/grc-ui-api-rhel8 rhacm2/grc-ui-rhel8 rhacm2/iam-policy-controller-rhel8 rhacm2/insights-client-rhel8 rhacm2/klusterlet-addon-controller-rhel8 rhacm2/klusterlet-addon-lease-controller-rhel8 rhacm2/klusterlet-addon-operator-rhel8 rhacm2/managedcluster-import-controller-rhel8 rhacm2/multicloud-manager-rhel8 rhacm2/multicluster-observability-rhel8-operator rhacm2/multicluster-operators-application-rhel8 rhacm2/multicluster-operators-channel-rhel8 rhacm2/multicluster-operators-deployable-rhel8 rhacm2/multicluster-operators-placementrule-rhel8 rhacm2/multicluster-operators-subscription-rhel8 
rhacm2/multiclusterhub-repo-rhel8 rhacm2/multiclusterhub-rhel8 rhacm2/openshift-hive-rhel7 rhacm2/prometheus-alertmanager-rhel8 rhacm2/provider-credential-controller-rhel8 rhacm2/rcm-controller-rhel8 rhacm2/redisgraph-tls-rhel8 rhacm2/registration-rhel8 rhacm2/registration-rhel8-operator rhacm2/search-aggregator-rhel8 rhacm2/search-collector-rhel8 rhacm2/search-rhel8 rhacm2/search-ui-rhel8 rhacm2/submariner-addon-rhel8 rhacm2/thanos-receive-controller-rhel8 rhacm2/thanos-rhel7 rhacm2/work-rhel8 rhceph rhel7/couchbase-operator-admission rhel8/httpd-24 rhel8/mysql-80 rhel8/postgresql-10 rhel8/postgresql-12 rhel8/postgresql-96 rhel8/redis-5 rhmtc/openshift-migration-controller rhmtc/openshift-migration-operator rhmtc/openshift-migration-velero rhmtc/openshift-migration-velero-plugin-for-aws rhmtc/openshift-migration-velero-plugin-for-gcp rhmtc/openshift-migration-velero-plugin-for-microsoft-azure rhscl/mongodb-36-rhel7 rhscl/mysql-57-rhel7 rhscl/postgresql-10-rhel7 rhscl/postgresql-96-rhel7 rhscl/redis-32-rhel7 rook-ceph ubi8 ubi8/dotnet-50 ubi8/ruby-27 ubi8/ubi8-init volume-replication-operator value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum
cluster_id
00003d61-9db1-4757-9cd1-84df271daeb9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 2 6 3 0 3 1 0 1 1 3 0 2 1 1 1 2 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 5 0 0 0 1 1 5 1 0 0 0 0 0 0 0 1 0 24 1 1 0 0 0 0 0 0 0 0 1 4 0 1 2 1 2 0 1 2 5 1 2 1 1 0 0 0 3 1 0 0 0 0 0 3 1 1 3 2 1 0 0 2 2 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.330175 -0.268450
002663ad-bcf4-4c7c-9530-ecb351fe4001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.164542 -0.014971
X_cont.shape
(1759, 352)

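X_cont has 1759 rows, one per unique cluster_id, and 352 columns: 350 unique image names plus the two telemetry features (CPU cores and memory bytes).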

Next, we perform PCA to project each dataset onto its first three principal components, so that we can visualize it in 3D.
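A 3D PCA scatter plot only tells us something if the three components retain a reasonable share of the variance. scikit-learn's PCA exposes this through explained_variance_ratio_. The sketch below uses hypothetical synthetic data as a stand-in for X_image or X_cont. Applying the same check to the real data would be PCA(n_components=3).fit(X_image).explained_variance_ratio_, which we have not run here.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for X_image / X_cont: 200 samples, 10 features,
# where most of the variance lives in the first three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, :3] *= 10  # inflate the scale of three features

pca = PCA(n_components=3)
projected = pca.fit_transform(X)

print(projected.shape)  # (200, 3)
print(pca.explained_variance_ratio_.round(3))
print(f"variance kept by 3 components: {pca.explained_variance_ratio_.sum():.1%}")
```

If the retained share is low, the 3D plots show only a small part of the structure, and apparent overlaps between groups may not reflect the full feature space.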

3D plot for image_layer_dataset

pca3 = PCA(n_components=3)
pc3 = pca3.fit_transform(X_image)

3D visualization of image_layers dataset

# Visualizing the results of the 3D PCA.
# fig.add_subplot(projection="3d") replaces plt.figure().gca(projection="3d"),
# which newer Matplotlib releases no longer accept.
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(projection="3d")
ax.scatter(
    xs=pc3[:, 0],
    ys=pc3[:, 1],
    zs=pc3[:, 2],
)
ax.set_xlabel("pca-one")
ax.set_ylabel("pca-two")
ax.set_zlabel("pca-three")
plt.title("PCA for image_layers dataset (3D)")
plt.show()
../_images/ML_clustering_part2_61_0.png

Interactive 3D plot for image layer dataset

fig = px.scatter_3d(pc3, x=pc3[:, 0], y=pc3[:, 1], z=pc3[:, 2], opacity=1)
fig.update_traces(marker=dict(size=2))
fig.show()

3D principal component analysis for container dataset

# Refit the 3-component PCA on the container features.
pcc3 = pca3.fit_transform(X_cont)
# Visualizing the results of the 3D PCA.
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(projection="3d")
ax.scatter(
    xs=pcc3[:, 0],
    ys=pcc3[:, 1],
    zs=pcc3[:, 2],
)
ax.set_xlabel("pca-one")
ax.set_ylabel("pca-two")
ax.set_zlabel("pca-three")
plt.title("PCA for containers dataset (3D)")
plt.show()
../_images/ML_clustering_part2_66_0.png

3D interactive visualization

fig = px.scatter_3d(pcc3, x=pcc3[:, 0], y=pcc3[:, 1], z=pcc3[:, 2], opacity=1)
fig.update_traces(marker=dict(size=2))
fig.show()

Now that we have visualized both the image_layer and container datasets in 3D, we apply KMeans to identify the clusters in each.

KMeans requires the number of clusters as an input, so before applying it we first estimate the optimal number of clusters for each dataset using the Elbow method.

The Elbow Method

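The elbow method is used to choose the number of clusters K for K-means clustering. It plots the inertia obtained for a range of K values. Inertia generally decreases as K grows, so we look for the "elbow": the K after which adding more clusters gives only a small further decrease.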
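Reading the elbow by eye is subjective. One way to make it repeatable is a simple geometric heuristic: normalize the inertia curve and pick the K whose point lies farthest below the straight line joining the first and last points. This heuristic is not what the notebook uses; the sketch below applies it to hypothetical synthetic data with three well-separated groups, and elbow_k is an illustrative helper, not a library function.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (a stand-in for X_image).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.5,
    random_state=0,
)

ks = np.arange(1, 10)  # K = 1..9, the same values as range(1, 10) in the notebook
inertia = np.array(
    [KMeans(n_clusters=k, n_init=10, random_state=2).fit(X).inertia_ for k in ks]
)


def elbow_k(ks, inertia):
    """Return the K whose normalized point lies farthest below the chord
    joining the first and last points of the inertia curve."""
    x = (ks - ks[0]) / (ks[-1] - ks[0])  # rescale K to [0, 1]
    y = (inertia - inertia[-1]) / (inertia[0] - inertia[-1])  # rescale inertia to [0, 1]
    chord = 1 - x  # straight line from (0, 1) to (1, 0)
    return int(ks[np.argmax(chord - y)])


print(elbow_k(ks, inertia))  # 3 for this synthetic data
```

On real data the curve is often smoother, so the heuristic's answer is best treated as a starting point to compare against the plot rather than as a final choice.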

Inertia is the within-cluster sum of squares criterion, a measure of how internally coherent the clusters are. It is calculated by measuring the distance between each data point and the centroid of its cluster, squaring this distance, and summing these squares over all points in all clusters. A good model is one with low inertia AND a low number of clusters (K).
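The definition above can be checked directly against scikit-learn: inertia is the sum over all points i of ||x_i - mu_c(i)||^2, where mu_c(i) is the centroid of the cluster that point i is assigned to. The sketch below recomputes it by hand on hypothetical synthetic data and compares the result with the fitted model's inertia_ attribute.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; any numeric feature matrix works the same way.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=2).fit(X)

# Look up, for every point, the centroid of the cluster it was assigned to,
# then sum the squared Euclidean distances over all points and all clusters.
assigned_centroids = km.cluster_centers_[km.labels_]
manual_inertia = np.sum((X - assigned_centroids) ** 2)

print(manual_inertia, km.inertia_)
print(np.isclose(manual_inertia, km.inertia_))  # True
```

Because inertia sums squared distances, features on a large scale dominate it. This is one reason the telemetry columns are standardized before clustering.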

For the image layer dataset, we try K from 1 to 9 (Python's range(1, 10) excludes the upper bound).

# How to find the best number of clusters?

no_of_clusters = range(1, 10)
inertia = []

for f in tqdm(no_of_clusters):
    kmeans = KMeans(n_clusters=f, random_state=2)
    kmeans = kmeans.fit(X_image)
    u = kmeans.inertia_
    inertia.append(u)
100%|██████████| 9/9 [00:08<00:00,  1.08it/s]

Inertia vs. number of clusters plot

fig, ax1 = plt.subplots(1, figsize=(16, 6))
xx = np.arange(len(no_of_clusters))
ax1.plot(xx, inertia)
ax1.set_xticks(xx)
ax1.set_xticklabels(no_of_clusters, rotation="horizontal")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia (SSE)")
plt.title("Inertia plot per K")
plt.show()
../_images/ML_clustering_part2_75_1.png

In the plot above, the elbow for the image layers dataset appears at K = 3, so we use 3 clusters when we apply the KMeans algorithm.

Next, we search for the optimal number of clusters for the container dataset.

# How to find the best number of clusters?

no_of_clusters = range(1, 10)
inertia = []

for f in tqdm(no_of_clusters):
    kmeans = KMeans(n_clusters=f, random_state=2)
    kmeans = kmeans.fit(X_cont)
    u = kmeans.inertia_
    inertia.append(u)
100%|██████████| 9/9 [00:18<00:00,  2.11s/it]
fig, ax1 = plt.subplots(1, figsize=(16, 6))
xx = np.arange(len(no_of_clusters))
ax1.plot(xx, inertia)
ax1.set_xticks(xx)
ax1.set_xticklabels(no_of_clusters, rotation="horizontal")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia (SSE)")
plt.title("Inertia plot per K")
plt.show()
../_images/ML_clustering_part2_79_1.png

Based on the elbow plot, we take K = 4 as the optimal number of clusters for the container dataset and use it when applying KMeans.

Implementing KMeans for image_layer_dataset

We set the number of clusters to 3 and apply the KMeans algorithm.

kmeans = KMeans(n_clusters=3, random_state=2)
kmeans = kmeans.fit(X_image)
clusters = kmeans.predict(X_image)
clusters
array([0, 0, 0, ..., 0, 0, 0], dtype=int32)
cluster_image_df = pd.concat([new_image1, new_image2], axis=1)
cluster_image_df["clusters"] = clusters
cluster_image_df.head()
3scale-amp2/apicast-gateway-rhel8 3scale-amp2/backend-rhel7 3scale-amp2/memcached-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 jboss-amq-6/amq63-openshift openshift/ose-cli openshift/ose-cli-artifacts openshift/ose-grafana openshift/ose-jenkins openshift/ose-must-gather openshift/ose-oauth-proxy openshift/ose-tests openshift/ose-tools rhel8/httpd-24 rhel8/mysql-80 rhel8/postgresql-10 rhel8/postgresql-12 rhel8/redis-5 rhscl/mongodb-36-rhel7 rhscl/mysql-57-rhel7 rhscl/postgresql-10-rhel7 rhscl/redis-32-rhel7 ubi8/dotnet-50 ubi8/ruby-27 value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum clusters
cluster_id
00003d61-9db1-4757-9cd1-84df271daeb9 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1.961447 1.495058e+10 0
00351e6e-53ce-465e-9493-cf0cd2367049 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1.736571 1.477224e+10 0
003ba133-e754-4d5a-bc57-675b386d1f05 0 0 0 0 0 0 5 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 2.755048 2.803095e+10 0
00479ead-b7fc-49c2-ae20-3990a9b3d08c 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1.698857 1.349778e+10 0
00748c32-15c3-4586-98fb-e5078ca9f3b8 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 7.574286 6.296607e+10 0

Visualizing the results of the 3D PCA

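# Project the same features KMeans was fit on (X_image). cluster_image_df holds the
# unscaled memory bytes (~1e10) and the cluster labels, which would dominate the PCA.
pca_i = PCA(n_components=3).fit_transform(X_image)
# Visualizing the results of the 3D PCA, colored by KMeans cluster.
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(projection="3d")
ax.scatter(
    xs=pca_i[:, 0], ys=pca_i[:, 1], zs=pca_i[:, 2], c=cluster_image_df["clusters"]
)
ax.set_xlabel("pca-one")
ax.set_ylabel("pca-two")
ax.set_zlabel("pca-three")
plt.title("3D Principal Component Analysis (PCA)")
plt.show()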
../_images/ML_clustering_part2_89_0.png

3D interactive visualization for image dataset

fig = px.scatter_3d(
    pca_i,
    x=pca_i[:, 0],
    y=pca_i[:, 1],
    z=pca_i[:, 2],
    color=cluster_image_df["clusters"],
    opacity=1,
)
fig.update_traces(marker=dict(size=2))
fig.show()

Listing out the groups of image_name in different clusters

cluster_image_df[cluster_image_df.clusters == 0].head()
3scale-amp2/apicast-gateway-rhel8 3scale-amp2/backend-rhel7 3scale-amp2/memcached-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 jboss-amq-6/amq63-openshift openshift/ose-cli openshift/ose-cli-artifacts openshift/ose-grafana openshift/ose-jenkins openshift/ose-must-gather openshift/ose-oauth-proxy openshift/ose-tests openshift/ose-tools rhel8/httpd-24 rhel8/mysql-80 rhel8/postgresql-10 rhel8/postgresql-12 rhel8/redis-5 rhscl/mongodb-36-rhel7 rhscl/mysql-57-rhel7 rhscl/postgresql-10-rhel7 rhscl/redis-32-rhel7 ubi8/dotnet-50 ubi8/ruby-27 value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum clusters
cluster_id
00003d61-9db1-4757-9cd1-84df271daeb9 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1.961447 1.495058e+10 0
00351e6e-53ce-465e-9493-cf0cd2367049 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1.736571 1.477224e+10 0
003ba133-e754-4d5a-bc57-675b386d1f05 0 0 0 0 0 0 5 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 2.755048 2.803095e+10 0
00479ead-b7fc-49c2-ae20-3990a9b3d08c 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1.698857 1.349778e+10 0
00748c32-15c3-4586-98fb-e5078ca9f3b8 0 0 0 0 0 0 5 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 7.574286 6.296607e+10 0
# For each KMeans cluster, rank image names by how far the cluster's mean
# count exceeds the global mean count across all cluster_ids.
image_cols = cluster_image_df.columns[:-3]  # drop the two telemetry columns and "clusters"
global_mean = cluster_image_df[image_cols].mean()

top_images = {}
for k in tqdm(range(3)):
    cluster_mean = cluster_image_df.loc[cluster_image_df.clusters == k, image_cols].mean()
    diff = (cluster_mean - global_mean).sort_values(ascending=False)
    top_images[f"cluster {k + 1}"] = list(diff.head(20).index)

# DataFrame.append was removed in pandas 2.0, so build the table in one step.
df1 = pd.DataFrame.from_dict(top_images, orient="index")
df1
100%|██████████| 3/3 [00:00<00:00, 198.42it/s]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cluster 1 rhscl/mongodb-36-rhel7 ubi8/ruby-27 rhel8/httpd-24 rhscl/postgresql-10-rhel7 openshift/ose-tests jboss-amq-6/amq63-openshift 3scale-amp2/backend-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 openshift/ose-grafana rhel8/postgresql-10 ubi8/dotnet-50 rhscl/redis-32-rhel7 rhscl/mysql-57-rhel7 rhel8/redis-5 3scale-amp2/memcached-rhel7 3scale-amp2/apicast-gateway-rhel8 openshift/ose-oauth-proxy openshift/ose-cli openshift/ose-cli-artifacts
cluster 2 openshift/ose-cli-artifacts openshift/ose-cli openshift/ose-oauth-proxy 3scale-amp2/apicast-gateway-rhel8 3scale-amp2/memcached-rhel7 rhel8/postgresql-10 ubi8/dotnet-50 rhscl/redis-32-rhel7 rhscl/mysql-57-rhel7 rhel8/redis-5 openshift/ose-grafana 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 jboss-amq-6/amq63-openshift 3scale-amp2/backend-rhel7 openshift/ose-tests rhel8/httpd-24 rhscl/postgresql-10-rhel7 ubi8/ruby-27 rhel8/mysql-80
cluster 3 rhel8/postgresql-12 rhel8/mysql-80 openshift/ose-jenkins openshift/ose-must-gather openshift/ose-tools openshift/ose-cli-artifacts openshift/ose-cli openshift/ose-oauth-proxy 3scale-amp2/apicast-gateway-rhel8 3scale-amp2/memcached-rhel7 rhscl/mysql-57-rhel7 ubi8/dotnet-50 rhel8/postgresql-10 rhel8/redis-5 rhscl/redis-32-rhel7 openshift/ose-grafana jboss-amq-6/amq63-openshift 3scale-amp2/backend-rhel7 3scale-amp2/zync-rhel7 3scale-amp2/system-rhel7
# Mean telemetry per KMeans cluster (replaces the DataFrame.append loop,
# which no longer works in pandas 2.0).
telemetry_cols = [
    "value_cluster:cpu_usage_cores:sum",
    "value_cluster:memory_usage_bytes:sum",
]
df = cluster_image_df.groupby("clusters")[telemetry_cols].mean()
df.index = [f"cluster {k + 1}" for k in df.index]
new_df = pd.concat([df, df1], axis=1)
new_df
value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cluster 1 3.118776 2.194704e+10 rhscl/mongodb-36-rhel7 ubi8/ruby-27 rhel8/httpd-24 rhscl/postgresql-10-rhel7 openshift/ose-tests jboss-amq-6/amq63-openshift 3scale-amp2/backend-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 openshift/ose-grafana rhel8/postgresql-10 ubi8/dotnet-50 rhscl/redis-32-rhel7 rhscl/mysql-57-rhel7 rhel8/redis-5 3scale-amp2/memcached-rhel7 3scale-amp2/apicast-gateway-rhel8 openshift/ose-oauth-proxy openshift/ose-cli openshift/ose-cli-artifacts
cluster 2 82.482280 8.133262e+11 openshift/ose-cli-artifacts openshift/ose-cli openshift/ose-oauth-proxy 3scale-amp2/apicast-gateway-rhel8 3scale-amp2/memcached-rhel7 rhel8/postgresql-10 ubi8/dotnet-50 rhscl/redis-32-rhel7 rhscl/mysql-57-rhel7 rhel8/redis-5 openshift/ose-grafana 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 jboss-amq-6/amq63-openshift 3scale-amp2/backend-rhel7 openshift/ose-tests rhel8/httpd-24 rhscl/postgresql-10-rhel7 ubi8/ruby-27 rhel8/mysql-80
cluster 3 10.394131 9.272735e+10 rhel8/postgresql-12 rhel8/mysql-80 openshift/ose-jenkins openshift/ose-must-gather openshift/ose-tools openshift/ose-cli-artifacts openshift/ose-cli openshift/ose-oauth-proxy 3scale-amp2/apicast-gateway-rhel8 3scale-amp2/memcached-rhel7 rhscl/mysql-57-rhel7 ubi8/dotnet-50 rhel8/postgresql-10 rhel8/redis-5 rhscl/redis-32-rhel7 openshift/ose-grafana jboss-amq-6/amq63-openshift 3scale-amp2/backend-rhel7 3scale-amp2/zync-rhel7 3scale-amp2/system-rhel7

There are three clusters for the image layer dataset. In the dataframe above, each row lists image repo names ranked by how much their mean count in that cluster exceeds their global mean, alongside the mean telemetry values of the cluster_ids assigned to that cluster.
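The ranking step can be seen on a toy example. The sketch below uses a small hypothetical DataFrame shaped like cluster_image_df (image-name count columns plus a clusters label, with the telemetry columns omitted) and applies the same "cluster mean minus global mean" ordering. Positive differences mark images that are more common in a cluster than overall.

```python
import pandas as pd

# Hypothetical toy data: per-cluster_id counts of three image names plus a
# KMeans label, shaped like cluster_image_df (telemetry columns omitted).
toy = pd.DataFrame(
    {
        "img_a": [5, 4, 0, 0],
        "img_b": [0, 1, 3, 4],
        "img_c": [1, 1, 1, 1],
        "clusters": [0, 0, 1, 1],
    }
)
image_cols = ["img_a", "img_b", "img_c"]

global_mean = toy[image_cols].mean()
rankings = {}
for k, group in toy.groupby("clusters"):
    # Sort images by how much more often they appear in this cluster than overall.
    diff = (group[image_cols].mean() - global_mean).sort_values(ascending=False)
    rankings[f"cluster {k + 1}"] = list(diff.index)

print(rankings)
# {'cluster 1': ['img_a', 'img_c', 'img_b'], 'cluster 2': ['img_b', 'img_c', 'img_a']}
```

Note that the image layer dataset has 25 image-name columns and the notebook keeps the top 20 per cluster, so most names appear in every row; the order of each row, not membership alone, is what distinguishes the clusters.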

Each group lists the image repos that are over-represented in, and therefore tend to occur together in, the cluster_ids assigned to that cluster. Cluster 1 has a mean CPU usage of about 3.12 cores and a mean memory usage of about 21.9 GB. Cluster 2 has a mean CPU usage of about 82.5 cores and a mean memory usage of about 813 GB. Cluster 3 has a mean CPU usage of about 10.39 cores and a mean memory usage of about 92.7 GB.
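The memory telemetry is reported in bytes, so the GB figures above are decimal gigabytes (1 GB = 10^9 bytes). The short sketch below converts the mean memory values copied from the new_df table into both decimal GB and binary GiB, to make the unit choice explicit; it performs only arithmetic on the values shown above.

```python
# Mean memory per image-layer cluster, copied from the new_df table above (bytes).
mean_memory_bytes = {
    "cluster 1": 2.194704e10,
    "cluster 2": 8.133262e11,
    "cluster 3": 9.272735e10,
}

# Decimal gigabytes (1 GB = 10**9 bytes) vs binary gibibytes (1 GiB = 2**30 bytes).
gb = {name: b / 1e9 for name, b in mean_memory_bytes.items()}
gib = {name: b / 2**30 for name, b in mean_memory_bytes.items()}

for name in mean_memory_bytes:
    print(f"{name}: {gb[name]:.1f} GB = {gib[name]:.1f} GiB")
```

This prints 21.9, 813.3 and 92.7 GB for the three clusters; the GiB values are roughly 7% smaller.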

Implementing KMeans for container dataset

We set the number of clusters to 4. Because of the large number of unique features (one column per image repo), fitting KMeans on this dataset is computationally intensive.
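The Conclusions mention the Elbow method for choosing k. A minimal sketch of that idea follows, assuming synthetic data with three well-separated blobs in place of `X_cont`: fit KMeans for a range of k and look for the point where the within-cluster sum of squares (`inertia_`) stops dropping sharply. Every candidate k needs a full fit, which is one reason this search gets expensive on a wide one-hot matrix.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the encoded data: 3 well-separated blobs in 5 dimensions
centers = rng.normal(scale=10.0, size=(3, 5))
X = np.vstack([c + rng.normal(size=(100, 5)) for c in centers])

# Within-cluster sum of squares (inertia_) for k = 1..8
inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=2).fit(X)
    inertias.append(km.inertia_)

for k, inertia in zip(range(1, 9), inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

On this synthetic data the inertia should fall steeply up to k=3 and flatten afterwards; on real telemetry-linked data the elbow is usually less distinct.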

# Fit KMeans with 4 clusters on the encoded container features and label each cluster_id
kmeans = KMeans(n_clusters=4, random_state=2)
kmeans = kmeans.fit(X_cont)
clusters = kmeans.predict(X_cont)
clusters
array([1, 2, 1, ..., 1, 1, 0], dtype=int32)
# Combine the per-cluster_id repo counts and telemetry columns, then attach the KMeans labels
cluster_cont_df = pd.concat([new_cont1, new_cont2], axis=1)
cluster_cont_df["clusters"] = clusters
cluster_cont_df.head()
3scale-amp2/3scale-rhel7-operator 3scale-amp2/apicast-gateway-rhel8 3scale-amp2/backend-rhel7 3scale-amp2/memcached-rhel7 3scale-amp2/system-rhel7 3scale-amp2/zync-rhel7 Calico Operator Cilium Cloud Native PostgreSQL Operator Elastic Cloud on Kubernetes F5 BIG-IP Controller Operator NGINX Ingress Operator NVIDIA GPU Operator Seldon Operator alertmanager amq7/amq-broker-rhel7-operator amq7/amq-streams-kafka-25-rhel7 amq7/amq-streams-kafka-26-rhel7 amq7/amq-streams-rhel7-operator amq7/amq-streams-rhel7-operator-metadata ansible-automation-platform/platform-resource-rhel7-operator ansible-tower cephcsi cert-manager cainjector cert-manager controller cert-manager webhook codeready-workspaces/devfileregistry-rhel8 codeready-workspaces/operator codeready-workspaces/pluginregistry-rhel8 codeready-workspaces/server-rhel8 collectd-exporter container-native-virtualization/bridge-marker container-native-virtualization/cluster-network-addons-operator container-native-virtualization/cnv-containernetworking-plugins container-native-virtualization/hostpath-provisioner-rhel8-operator container-native-virtualization/hyperconverged-cluster-operator container-native-virtualization/hyperconverged-cluster-webhook-rhel8 container-native-virtualization/kubemacpool container-native-virtualization/kubernetes-nmstate-handler-rhel8 container-native-virtualization/kubevirt-cpu-model-nfd-plugin container-native-virtualization/kubevirt-cpu-node-labeller container-native-virtualization/kubevirt-kvm-info-nfd-plugin container-native-virtualization/kubevirt-ssp-operator container-native-virtualization/node-maintenance-operator container-native-virtualization/ovs-cni-marker container-native-virtualization/ovs-cni-plugin container-native-virtualization/virt-api container-native-virtualization/virt-cdi-apiserver container-native-virtualization/virt-cdi-controller container-native-virtualization/virt-cdi-importer container-native-virtualization/virt-cdi-operator 
container-native-virtualization/virt-cdi-uploadproxy container-native-virtualization/virt-controller container-native-virtualization/virt-handler container-native-virtualization/virt-launcher container-native-virtualization/virt-operator container-native-virtualization/vm-import-controller-rhel8 container-native-virtualization/vm-import-operator-rhel8 costmanagement-metrics-operator distributed-tracing/jaeger-rhel8-operator grafana ibm common service webhook ibm-events-operator ibm-postgresql jboss-amq-6/amq63-openshift jboss-eap-7/eap73-rhel8-operator kube-state-metrics mcg-core mcg-operator mcr.microsoft.com/mssql/rhel8/server must-gather-service ocp-tools-4/odo-init-image ocs-operator ocs-registry openshift-compliance-content openshift-gitops-1-tech-preview/argocd-rhel8 openshift-gitops-1-tech-preview/gitops-rhel8 openshift-gitops-1-tech-preview/gitops-rhel8-operator openshift-gitops-1-tech-preview/kam-delivery-rhel8 openshift-gitops-1/gitops-rhel8 openshift-gitops-1/kam-delivery-rhel8 openshift-logging/cluster-logging-operator-bundle openshift-logging/cluster-logging-rhel8-operator openshift-logging/elasticsearch-operator-bundle openshift-logging/elasticsearch-proxy-rhel8 openshift-logging/elasticsearch-rhel8-operator openshift-logging/elasticsearch6-rhel8 openshift-logging/fluentd-rhel8 openshift-logging/kibana6-rhel8 openshift-pipelines-tech-preview/pipelines-controller-rhel8 openshift-pipelines-tech-preview/pipelines-operator-proxy-rhel8 openshift-pipelines-tech-preview/pipelines-rhel8-operator openshift-pipelines-tech-preview/pipelines-triggers-controller-rhel8 openshift-pipelines-tech-preview/pipelines-triggers-webhook-rhel8 openshift-pipelines-tech-preview/pipelines-webhook-rhel8 openshift-pipelines/pipelines-controller-rhel8 openshift-pipelines/pipelines-operator-proxy-rhel8 openshift-pipelines/pipelines-rhel8-operator openshift-pipelines/pipelines-triggers-controller-rhel8 openshift-pipelines/pipelines-triggers-core-interceptors-rhel8 
openshift-pipelines/pipelines-triggers-eventlistenersink-rhel8 openshift-pipelines/pipelines-triggers-webhook-rhel8 openshift-pipelines/pipelines-webhook-rhel8 openshift-sandboxed-containers-operator openshift-serverless-1-tech-preview/eventing-kafka-channel-controller-rhel8 openshift-serverless-1/eventing-in-memory-channel-controller-rhel8 openshift-serverless-1/eventing-in-memory-channel-dispatcher-rhel8 openshift-serverless-1/eventing-mtbroker-ingress-rhel8 openshift-serverless-1/eventing-mtchannel-broker-rhel8 openshift-serverless-1/eventing-mtping-rhel8 openshift-serverless-1/eventing-sugar-controller-rhel8 openshift-serverless-1/eventing-webhook-rhel8 openshift-serverless-1/ingress-rhel8-operator openshift-serverless-1/knative-rhel8-operator openshift-serverless-1/kourier-control-rhel8 openshift-serverless-1/serverless-rhel8-operator openshift-serverless-1/serving-activator-rhel8 openshift-serverless-1/serving-autoscaler-hpa-rhel8 openshift-serverless-1/serving-autoscaler-rhel8 openshift-serverless-1/serving-controller-rhel8 openshift-serverless-1/serving-domain-mapping-rhel8 openshift-serverless-1/serving-domain-mapping-webhook-rhel8 openshift-serverless-1/serving-queue-rhel8 openshift-service-mesh/galley-rhel8 openshift-service-mesh/grafana-rhel8 openshift-service-mesh/istio-cni-rhel8 openshift-service-mesh/istio-rhel8-operator openshift-service-mesh/kiali-rhel7 openshift-service-mesh/kiali-rhel8 openshift-service-mesh/kiali-rhel8-operator openshift-service-mesh/pilot-rhel8 openshift-service-mesh/prometheus-rhel8 openshift-service-mesh/proxyv2-rhel8 openshift/compliance-operator openshift/ose-aws-ebs-csi-driver openshift/ose-aws-ebs-csi-driver-operator openshift/ose-aws-machine-controllers openshift/ose-aws-pod-identity-webhook openshift/ose-azure-disk-csi-driver openshift/ose-azure-disk-csi-driver-operator openshift/ose-azure-machine-controllers openshift/ose-baremetal-machine-controllers openshift/ose-baremetal-operator openshift/ose-baremetal-runtimecfg 
openshift/ose-cli openshift/ose-cli-artifacts openshift/ose-cloud-credential-operator openshift/ose-cluster-authentication-operator openshift/ose-cluster-autoscaler openshift/ose-cluster-autoscaler-operator openshift/ose-cluster-baremetal-operator openshift/ose-cluster-config-operator openshift/ose-cluster-csi-snapshot-controller-operator openshift/ose-cluster-dns-operator openshift/ose-cluster-etcd-operator openshift/ose-cluster-image-registry-operator openshift/ose-cluster-ingress-operator openshift/ose-cluster-kube-apiserver-operator openshift/ose-cluster-kube-controller-manager-operator openshift/ose-cluster-kube-descheduler-operator openshift/ose-cluster-kube-scheduler-operator openshift/ose-cluster-kube-storage-version-migrator-operator openshift/ose-cluster-logging-operator openshift/ose-cluster-machine-approver openshift/ose-cluster-monitoring-operator openshift/ose-cluster-network-operator openshift/ose-cluster-nfd-operator openshift/ose-cluster-node-tuning-operator openshift/ose-cluster-openshift-apiserver-operator openshift/ose-cluster-openshift-controller-manager-operator openshift/ose-cluster-policy-controller openshift/ose-cluster-samples-operator openshift/ose-cluster-storage-operator openshift/ose-clusterresourceoverride-rhel8-operator openshift/ose-configmap-reloader openshift/ose-console openshift/ose-console-operator openshift/ose-container-networking-plugins openshift/ose-coredns openshift/ose-csi-driver-manila openshift/ose-csi-driver-manila-operator openshift/ose-csi-driver-nfs openshift/ose-csi-external-attacher openshift/ose-csi-external-provisioner openshift/ose-csi-external-resizer openshift/ose-csi-external-snapshotter openshift/ose-csi-livenessprobe openshift/ose-csi-node-driver-registrar openshift/ose-csi-snapshot-controller openshift/ose-csi-snapshot-validation-webhook openshift/ose-deployer openshift/ose-docker-builder openshift/ose-docker-registry openshift/ose-egress-router-cni openshift/ose-etcd 
openshift/ose-gcp-machine-controllers openshift/ose-gcp-pd-csi-driver openshift/ose-gcp-pd-csi-driver-operator openshift/ose-grafana openshift/ose-haproxy-router openshift/ose-hyperkube openshift/ose-insights-operator openshift/ose-installer openshift/ose-ironic openshift/ose-ironic-inspector openshift/ose-ironic-ipa-downloader openshift/ose-ironic-machine-os-downloader openshift/ose-ironic-static-ip-manager openshift/ose-jenkins openshift/ose-k8s-prometheus-adapter openshift/ose-keepalived-ipfailover openshift/ose-kube-rbac-proxy openshift/ose-kube-state-metrics openshift/ose-kube-storage-version-migrator openshift/ose-kuryr-cni openshift/ose-kuryr-controller openshift/ose-libvirt-machine-controllers openshift/ose-local-storage-diskmaker openshift/ose-local-storage-operator openshift/ose-local-storage-operator-bundle openshift/ose-local-storage-static-provisioner openshift/ose-logging-elasticsearch6 openshift/ose-machine-api-operator openshift/ose-machine-config-operator openshift/ose-mdns-publisher openshift/ose-multus-admission-controller openshift/ose-multus-cni openshift/ose-multus-route-override-cni openshift/ose-multus-whereabouts-ipam-cni openshift/ose-must-gather openshift/ose-network-metrics-daemon openshift/ose-oauth-apiserver openshift/ose-oauth-proxy openshift/ose-oauth-server openshift/ose-openshift-apiserver openshift/ose-openshift-controller-manager openshift/ose-openshift-state-metrics openshift/ose-openstack-cinder-csi-driver openshift/ose-openstack-cinder-csi-driver-operator openshift/ose-openstack-machine-controllers openshift/ose-operator-lifecycle-manager openshift/ose-operator-marketplace openshift/ose-operator-registry openshift/ose-ovirt-csi-driver openshift/ose-ovirt-csi-driver-operator openshift/ose-ovirt-machine-controllers openshift/ose-ovn-kubernetes openshift/ose-prom-label-proxy openshift/ose-prometheus openshift/ose-prometheus-alertmanager openshift/ose-prometheus-config-reloader openshift/ose-prometheus-node-exporter 
openshift/ose-prometheus-operator openshift/ose-ptp openshift/ose-ptp-operator openshift/ose-sdn openshift/ose-service-ca-operator openshift/ose-sriov-cni openshift/ose-sriov-dp-admission-controller openshift/ose-sriov-infiniband-cni openshift/ose-sriov-network-config-daemon openshift/ose-sriov-network-device-plugin openshift/ose-sriov-network-operator openshift/ose-sriov-network-webhook openshift/ose-telemeter openshift/ose-template-service-broker openshift/ose-template-service-broker-operator openshift/ose-tests openshift/ose-thanos openshift/ose-tools openshift/ose-vertical-pod-autoscaler-rhel8-operator openshift/ose-vsphere-problem-detector openshift4/performance-addon-rhel8-operator quay/quay-container-security-operator-container quay/quay-container-security-operator-rhel8 quay/quay-operator-rhel8 rh-sso-7/sso74-openj9-openshift-rhel8 rhacm2/application-ui-rhel8 rhacm2/cert-manager-controller-rhel8 rhacm2/cert-policy-controller-rhel8 rhacm2/clusterlifecycle-state-metrics-rhel8 rhacm2/config-policy-controller-rhel8 rhacm2/console-api-rhel8 rhacm2/console-header-rhel8 rhacm2/console-rhel8 rhacm2/discovery-rhel8-operator rhacm2/endpoint-component-rhel8-operator rhacm2/endpoint-rhel8-operator rhacm2/governance-policy-propagator-rhel8 rhacm2/governance-policy-spec-sync-rhel8 rhacm2/governance-policy-status-sync-rhel8 rhacm2/governance-policy-template-sync-rhel8 rhacm2/grafana rhacm2/grc-ui-api-rhel8 rhacm2/grc-ui-rhel8 rhacm2/iam-policy-controller-rhel8 rhacm2/insights-client-rhel8 rhacm2/klusterlet-addon-controller-rhel8 rhacm2/klusterlet-addon-lease-controller-rhel8 rhacm2/klusterlet-addon-operator-rhel8 rhacm2/managedcluster-import-controller-rhel8 rhacm2/multicloud-manager-rhel8 rhacm2/multicluster-observability-rhel8-operator rhacm2/multicluster-operators-application-rhel8 rhacm2/multicluster-operators-channel-rhel8 rhacm2/multicluster-operators-deployable-rhel8 rhacm2/multicluster-operators-placementrule-rhel8 rhacm2/multicluster-operators-subscription-rhel8 
rhacm2/multiclusterhub-repo-rhel8 rhacm2/multiclusterhub-rhel8 rhacm2/openshift-hive-rhel7 rhacm2/prometheus-alertmanager-rhel8 rhacm2/provider-credential-controller-rhel8 rhacm2/rcm-controller-rhel8 rhacm2/redisgraph-tls-rhel8 rhacm2/registration-rhel8 rhacm2/registration-rhel8-operator rhacm2/search-aggregator-rhel8 rhacm2/search-collector-rhel8 rhacm2/search-rhel8 rhacm2/search-ui-rhel8 rhacm2/submariner-addon-rhel8 rhacm2/thanos-receive-controller-rhel8 rhacm2/thanos-rhel7 rhacm2/work-rhel8 rhceph rhel7/couchbase-operator-admission rhel8/httpd-24 rhel8/mysql-80 rhel8/postgresql-10 rhel8/postgresql-12 rhel8/postgresql-96 rhel8/redis-5 rhmtc/openshift-migration-controller rhmtc/openshift-migration-operator rhmtc/openshift-migration-velero rhmtc/openshift-migration-velero-plugin-for-aws rhmtc/openshift-migration-velero-plugin-for-gcp rhmtc/openshift-migration-velero-plugin-for-microsoft-azure rhscl/mongodb-36-rhel7 rhscl/mysql-57-rhel7 rhscl/postgresql-10-rhel7 rhscl/postgresql-96-rhel7 rhscl/redis-32-rhel7 rook-ceph ubi8 ubi8/dotnet-50 ubi8/ruby-27 ubi8/ubi8-init volume-replication-operator value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum clusters
cluster_id
00003d61-9db1-4757-9cd1-84df271daeb9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 2 6 3 0 3 1 0 1 1 3 0 2 1 1 1 2 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 5 0 0 0 1 1 5 1 0 0 0 0 0 0 0 1 0 24 1 1 0 0 0 0 0 0 0 0 1 4 0 1 2 1 2 0 1 2 5 1 2 1 1 0 0 0 3 1 0 0 0 0 0 3 1 1 3 2 1 0 0 2 2 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.961447 1.495058e+10 1
002663ad-bcf4-4c7c-9530-ecb351fe4001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.345714 2.524869e+10 2
00351e6e-53ce-465e-9493-cf0cd2367049 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 2 6 3 0 3 1 0 1 1 3 0 2 1 1 1 2 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 5 0 0 0 1 1 5 1 0 0 0 0 0 0 0 1 0 24 1 1 0 0 0 0 0 0 0 0 1 4 0 1 2 1 2 0 1 2 5 1 2 1 1 0 0 0 3 1 0 0 0 0 0 3 1 1 3 2 1 0 0 2 2 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.736571 1.477224e+10 1
003ba133-e754-4d5a-bc57-675b386d1f05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 1 2 6 3 0 3 1 0 1 1 3 0 2 1 1 1 2 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 5 0 0 0 1 1 5 1 0 0 0 0 0 0 0 1 0 24 1 1 0 0 0 0 0 0 0 0 1 4 0 1 2 1 2 0 1 2 5 1 2 1 1 0 0 0 3 1 0 0 0 0 0 3 1 1 3 2 1 0 0 2 2 1 0 1 0 0 1 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 4 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.755048 2.803095e+10 1
00479ead-b7fc-49c2-ae20-3990a9b3d08c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 2 6 3 0 3 1 0 1 1 3 0 2 1 1 1 2 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 5 0 0 0 1 1 5 1 0 0 0 0 0 0 0 1 0 25 1 1 0 0 0 0 0 0 0 0 1 4 0 1 2 1 2 0 1 2 5 1 2 1 1 0 0 0 3 1 0 0 0 0 8 3 1 1 3 2 1 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.698857 1.349778e+10 1
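Each row of cluster_cont_df is a cluster_id, with one occurrence-count column per image repo, followed by the two telemetry columns and the clusters label. A count matrix of that shape can be sketched with `pandas.get_dummies` followed by a groupby sum. The construction of new_cont1 and new_cont2 is not shown in this section, so the `records` frame and its column names below are hypothetical.

```python
import pandas as pd

# Hypothetical long-format records: one row per container observed in a cluster_id
records = pd.DataFrame(
    {
        "cluster_id": ["c1", "c1", "c1", "c2", "c2"],
        "image_repo": [
            "ubi8/ruby-27",
            "rhel8/redis-5",
            "rhel8/redis-5",
            "ubi8/ruby-27",
            "openshift/ose-cli",
        ],
        "cpu_cores": [1.9, 1.9, 1.9, 4.3, 4.3],
    }
)

# One-hot encode image_repo, then sum per cluster_id to get occurrence counts
counts = pd.get_dummies(records["image_repo"], dtype=int).groupby(records["cluster_id"]).sum()

# Telemetry is constant within a cluster_id, so keep the first value per group
telemetry = records.groupby("cluster_id")[["cpu_cores"]].first()

# Align the two frames on cluster_id, as pd.concat(axis=1) does in the notebook
matrix = pd.concat([counts, telemetry], axis=1)
print(matrix)
```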

Visualizing the results of the 3D PCA

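from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the "3d" projection on older Matplotlib)

# Project all columns of cluster_cont_df onto 3 principal components (pca3 was created earlier)
pca_c = pca3.fit_transform(cluster_cont_df)

# Visualizing the results of the 3D PCA.
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(projection="3d")
plt.title("3D Principal Component Analysis (PCA)")
ax.scatter(
    xs=pca_c[:, 0], ys=pca_c[:, 1], zs=pca_c[:, 2], c=cluster_cont_df["clusters"]
)
ax.set_xlabel("pca-one")
ax.set_ylabel("pca-two")
ax.set_zlabel("pca-three")
plt.show()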
[Figure: 3D PCA of cluster_cont_df, points colored by KMeans cluster]
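Note that pca3 is fit on every column of cluster_cont_df, including value_cluster:memory_usage_bytes:sum (on the order of 1e10) and the clusters label itself. Because PCA follows variance, an unscaled byte column will likely dominate the first component. The sketch below uses synthetic data as a stand-in for the real matrix and compares the share of variance captured by the first component with and without `StandardScaler`; standardizing (and dropping the label column) is one possible adjustment, not what the notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: 20 small count columns plus one memory column in bytes
X = pd.DataFrame(
    rng.integers(0, 6, size=(200, 20)), columns=[f"repo_{i}" for i in range(20)]
)
X["memory_bytes"] = rng.normal(3e10, 1e10, size=200)

# PCA on raw values vs. on standardized (zero-mean, unit-variance) columns
raw = PCA(n_components=3).fit(X)
scaled = PCA(n_components=3).fit(StandardScaler().fit_transform(X))

print("unscaled PC1 share:", raw.explained_variance_ratio_[0])
print("scaled   PC1 share:", scaled.explained_variance_ratio_[0])
```

Without scaling, the first component is essentially the memory column; after scaling, the variance is spread across the count columns as well.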

3D interactive visualization for containers dataset

# Interactive version of the same projection; marker color encodes the KMeans cluster label
fig = px.scatter_3d(
    pca_c,
    x=pca_c[:, 0],
    y=pca_c[:, 1],
    z=pca_c[:, 2],
    color=cluster_cont_df["clusters"],
    opacity=1,
)
fig.update_traces(marker=dict(size=2))
fig.show()

Listing out the groups of image_name in different clusters

# For each KMeans cluster, list the 20 image repos whose mean count in the
# cluster exceeds the global mean by the largest margin
global_mean = cluster_cont_df.mean().sort_values(ascending=False)
rows = []

for i in tqdm(range(0, 4)):
    # Drop the two telemetry columns and the cluster label before averaging
    cluster_mean = cluster_cont_df[cluster_cont_df.clusters == i].iloc[:, :-3].mean()
    diff = cluster_mean - global_mean
    top = diff.sort_values(ascending=False).head(20)
    rows.append(top.index.tolist())

df1 = pd.DataFrame(rows, index=["cluster 1", "cluster 2", "cluster 3", "cluster 4"])

df1
100%|██████████| 4/4 [00:00<00:00, 167.04it/s]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cluster 1 openshift/ose-kube-rbac-proxy openshift/ose-machine-api-operator openshift/ose-baremetal-runtimecfg openshift/ose-ironic openshift/ose-ovn-kubernetes openshift/ose-csi-livenessprobe openshift/ose-cli openshift/ose-aws-ebs-csi-driver openshift/ose-csi-external-provisioner openshift/ose-ironic-inspector openshift/ose-ironic-ipa-downloader openshift/ose-ironic-machine-os-downloader openshift/ose-csi-node-driver-registrar openshift/ose-csi-external-attacher openshift/ose-csi-external-resizer openshift/ose-csi-external-snapshotter openshift/ose-haproxy-router openshift/ose-coredns openshift/ose-keepalived-ipfailover openshift/ose-aws-machine-controllers
cluster 2 rhacm2/work-rhel8 rhacm2/registration-rhel8 rhacm2/registration-rhel8-operator rhacm2/multicloud-manager-rhel8 openshift/ose-cluster-kube-apiserver-operator openshift/ose-thanos openshift/ose-hyperkube openshift/ose-etcd openshift/ose-oauth-proxy openshift/ose-sdn openshift/ose-machine-config-operator openshift/ose-cluster-network-operator openshift/ose-prom-label-proxy openshift/ose-cluster-kube-controller-manager-... openshift/ose-operator-lifecycle-manager openshift/ose-cluster-kube-scheduler-operator openshift/ose-prometheus-config-reloader openshift/ose-oauth-apiserver openshift/ose-service-ca-operator openshift/ose-cluster-ingress-operator
cluster 3 openshift/ose-local-storage-operator openshift/ose-local-storage-diskmaker openshift/ose-local-storage-operator-bundle container-native-virtualization/kubevirt-cpu-n... container-native-virtualization/virt-operator container-native-virtualization/virt-handler container-native-virtualization/virt-controller container-native-virtualization/kubernetes-nms... container-native-virtualization/kubemacpool container-native-virtualization/node-maintenan... ubi8/dotnet-50 Cilium openshift-serverless-1/serving-queue-rhel8 container-native-virtualization/cnv-containern... container-native-virtualization/kubevirt-cpu-m... container-native-virtualization/kubevirt-kvm-i... container-native-virtualization/vm-import-cont... container-native-virtualization/virt-cdi-uploa... container-native-virtualization/virt-cdi-operator container-native-virtualization/virt-api
cluster 4 rhceph openshift/ose-kube-rbac-proxy openshift/ose-baremetal-runtimecfg openshift/ose-csi-external-snapshotter openshift/ose-csi-node-driver-registrar openshift/ose-machine-api-operator openshift/ose-csi-driver-manila openshift/ose-csi-driver-nfs openshift/ose-openstack-cinder-csi-driver openshift/ose-csi-external-provisioner rook-ceph rhel8/postgresql-12 openshift/ose-local-storage-diskmaker openshift/ose-local-storage-operator openshift/ose-cli openshift/ose-mdns-publisher container-native-virtualization/virt-launcher cephcsi container-native-virtualization/node-maintenan... container-native-virtualization/bridge-marker
# Mean telemetry of the cluster_ids assigned to each KMeans cluster
telemetry_cols = [
    "value_cluster:cpu_usage_cores:sum",
    "value_cluster:memory_usage_bytes:sum",
]
rows = []
for i in tqdm(range(0, 4)):
    c = cluster_cont_df[cluster_cont_df.clusters == i]
    rows.append(c[telemetry_cols].mean())
df = pd.DataFrame(rows, index=["cluster 1", "cluster 2", "cluster 3", "cluster 4"])
100%|██████████| 4/4 [00:00<00:00, 503.22it/s]
new_df = pd.concat([df, df1], axis=1)
new_df
value_cluster:cpu_usage_cores:sum value_cluster:memory_usage_bytes:sum 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cluster 1 6.023516 4.486924e+10 openshift/ose-kube-rbac-proxy openshift/ose-machine-api-operator openshift/ose-baremetal-runtimecfg openshift/ose-ironic openshift/ose-ovn-kubernetes openshift/ose-csi-livenessprobe openshift/ose-cli openshift/ose-aws-ebs-csi-driver openshift/ose-csi-external-provisioner openshift/ose-ironic-inspector openshift/ose-ironic-ipa-downloader openshift/ose-ironic-machine-os-downloader openshift/ose-csi-node-driver-registrar openshift/ose-csi-external-attacher openshift/ose-csi-external-resizer openshift/ose-csi-external-snapshotter openshift/ose-haproxy-router openshift/ose-coredns openshift/ose-keepalived-ipfailover openshift/ose-aws-machine-controllers
cluster 2 2.331292 1.732900e+10 rhacm2/work-rhel8 rhacm2/registration-rhel8 rhacm2/registration-rhel8-operator rhacm2/multicloud-manager-rhel8 openshift/ose-cluster-kube-apiserver-operator openshift/ose-thanos openshift/ose-hyperkube openshift/ose-etcd openshift/ose-oauth-proxy openshift/ose-sdn openshift/ose-machine-config-operator openshift/ose-cluster-network-operator openshift/ose-prom-label-proxy openshift/ose-cluster-kube-controller-manager-... openshift/ose-operator-lifecycle-manager openshift/ose-cluster-kube-scheduler-operator openshift/ose-prometheus-config-reloader openshift/ose-oauth-apiserver openshift/ose-service-ca-operator openshift/ose-cluster-ingress-operator
cluster 3 6.023818 2.880826e+10 openshift/ose-local-storage-operator openshift/ose-local-storage-diskmaker openshift/ose-local-storage-operator-bundle container-native-virtualization/kubevirt-cpu-n... container-native-virtualization/virt-operator container-native-virtualization/virt-handler container-native-virtualization/virt-controller container-native-virtualization/kubernetes-nms... container-native-virtualization/kubemacpool container-native-virtualization/node-maintenan... ubi8/dotnet-50 Cilium openshift-serverless-1/serving-queue-rhel8 container-native-virtualization/cnv-containern... container-native-virtualization/kubevirt-cpu-m... container-native-virtualization/kubevirt-kvm-i... container-native-virtualization/vm-import-cont... container-native-virtualization/virt-cdi-uploa... container-native-virtualization/virt-cdi-operator container-native-virtualization/virt-api
cluster 4 9.784137 6.328147e+10 rhceph openshift/ose-kube-rbac-proxy openshift/ose-baremetal-runtimecfg openshift/ose-csi-external-snapshotter openshift/ose-csi-node-driver-registrar openshift/ose-machine-api-operator openshift/ose-csi-driver-manila openshift/ose-csi-driver-nfs openshift/ose-openstack-cinder-csi-driver openshift/ose-csi-external-provisioner rook-ceph rhel8/postgresql-12 openshift/ose-local-storage-diskmaker openshift/ose-local-storage-operator openshift/ose-cli openshift/ose-mdns-publisher container-native-virtualization/virt-launcher cephcsi container-native-virtualization/node-maintenan... container-native-virtualization/bridge-marker

There are four clusters for the containers dataset. In the dataframe above, the image repo names in each cluster are ranked by the difference between the cluster mean and the global mean of their counts, and the top 20 are listed. We also report the mean of the corresponding telemetry data for each cluster.

The first group, i.e., cluster 1, contains the image_repos that tend to occur together in the same cluster_id; its mean telemetry is 6.02 CPU cores and about 44.9 GB of memory. The second group has a mean cpu_usage of 2.33 cores and about 17.3 GB of memory. The third group has a mean cpu_usage of 6.02 cores and about 28.8 GB of memory, and the fourth group has a mean cpu_usage of 9.78 cores and about 63.3 GB of memory.
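The per-cluster telemetry means and the GB figures above can also be computed with a single groupby, dividing bytes by 1e9 to get decimal gigabytes. A minimal sketch on a made-up four-row frame that mimics the last three columns of cluster_cont_df (the values are invented for illustration):

```python
import pandas as pd

# Toy frame shaped like the last three columns of cluster_cont_df (values are made up)
df = pd.DataFrame(
    {
        "value_cluster:cpu_usage_cores:sum": [2.0, 4.0, 6.0, 10.0],
        "value_cluster:memory_usage_bytes:sum": [1.0e10, 3.0e10, 5.0e10, 7.0e10],
        "clusters": [0, 0, 1, 1],
    }
)

# Mean of each telemetry column per KMeans label
summary = df.groupby("clusters").mean()
# Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)
summary["memory_GB"] = summary["value_cluster:memory_usage_bytes:sum"] / 1e9
# Use the same 1-based row names as the notebook
summary.index = [f"cluster {i + 1}" for i in summary.index]
print(summary)
```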

Conclusions

In this notebook, we employed the KMeans clustering technique to group together image repos and computed their mean telemetry values per cluster. We began with two datasets (the image layer dataset and the containers dataset) containing the cluster_id, the corresponding image_repos, and telemetry data (i.e., cpu_cores and memory_bytes). The telemetry values are recorded per cluster_id, which means that all image repos within a given cluster_id share the same telemetry values. We then one-hot encoded the categorical image repo names and aggregated them into occurrence counts per cluster_id. Next, we used the Elbow method to find the optimal number of clusters and applied the KMeans algorithm to the one-hot encoded data.

As a result, we obtained 3 groups of image repos, with corresponding mean telemetry data, for the image layer dataset and 4 groups for the containers dataset. For each group, we list the image repos that are most over-represented in its cluster_ids relative to the global mean, i.e., repos that tend to occur together in the same cluster_id, along with the group's mean telemetry values.