susql-operator

Example: Aggregating a GPU Workload Running in an OpenShift AI Jupyter Notebook

The following instructions show, step by step, how to use SusQL to aggregate the energy consumed by a GPU workload running in a Jupyter notebook on OpenShift AI.

Prerequisites

The following are assumed to be installed and available.

Create a Jupyter Notebook

Any code that runs in a Jupyter notebook can be aggregated by SusQL. The following sample code exercises the GPU, which also makes it a good test case for verifying that the GPU is configured correctly.

pip install pycaret[full]

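import time

import torch

# Use the GPU when PyTorch can see one; otherwise fall back to the CPU.
use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

matrix_size = 16384

x = torch.randn(matrix_size, matrix_size)
y = torch.randn(matrix_size, matrix_size)

# Move both operands to the selected device before timing starts.
x_gpu = x.to(device)
y_gpu = y.to(device)
if use_cuda:
    # Only valid when CUDA is available; waits for the copies to finish.
    torch.cuda.synchronize()

for i in range(10):
    start = time.time()
    result_gpu = torch.matmul(x_gpu, y_gpu)
    if use_cuda:
        # CUDA kernels run asynchronously; wait so the timing covers the full multiplication.
        torch.cuda.synchronize()
    print("Run time using device", result_gpu.device, "is", "{:.7f}".format(time.time() - start))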

A Jupyter Notebook can be created and run through the following steps:

Attach a SusQL label to the Jupyter Notebook server:

Although the OpenShift Web Console can be used to set labels on existing workloads, this is also easy to do from the command line:

The following command removes the SusQL label from the Jupyter Notebook server pod, in case one is already defined.

$ oc label pod $(oc get po -n rhods-notebooks | grep jupyter | head -1 |  cut -f 1 -d" ") -n rhods-notebooks "susql.label/1-"
pod/jupyter-nb-kube-3aadmin-0 unlabeled

Next, this command sets the label susql.label/1 to openshiftaij on the Jupyter Notebook server pod running in the rhods-notebooks namespace.

$ oc label pod $(oc get po -n rhods-notebooks | grep jupyter | head -1 |  cut -f 1 -d" ") -n rhods-notebooks "susql.label/1=openshiftaij"
pod/jupyter-nb-kube-3aadmin-0 labeled

Finally, this command verifies that the label has been set:

$ oc describe pod $(oc get po -n rhods-notebooks | grep jupyter | head -1 |  cut -f 1 -d" ") -n rhods-notebooks | grep -i susql
                  susql.label/1=openshiftaij
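
The same check can be scripted. The following is a minimal sketch, assuming the pod's JSON (for example, the output of `oc get pod <pod-name> -n rhods-notebooks -o json`) has already been loaded into a Python dictionary. The function name `susql_labels` and the sample pod dictionary, including its `app` label, are hypothetical and used only for illustration.

```python
def susql_labels(pod: dict) -> dict:
    """Return only the labels whose key starts with 'susql.label/'."""
    labels = pod.get("metadata", {}).get("labels") or {}
    return {key: value for key, value in labels.items()
            if key.startswith("susql.label/")}


# Hypothetical pod object, trimmed to the fields this sketch reads.
sample_pod = {
    "metadata": {
        "name": "jupyter-nb-kube-3aadmin-0",
        "namespace": "rhods-notebooks",
        "labels": {
            "app": "jupyter-nb-kube-3aadmin",  # illustrative, not from a real cluster
            "susql.label/1": "openshiftaij",
        },
    }
}

print(susql_labels(sample_pod))
```

An empty result means no SusQL label is attached to the pod.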

Create the SusQL LabelGroup

First create a LabelGroup definition file called openshiftaij.yaml as follows:

---
apiVersion: susql.ibm.com/v1
kind: LabelGroup
metadata:
    name: openshiftaij
    namespace: rhods-notebooks
spec:
    labels:
        - openshiftaij
---

And apply the file:

$ oc apply -f openshiftaij.yaml
labelgroup.susql.ibm.com/openshiftaij created
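
When several LabelGroups are needed, the definition can be generated instead of written by hand. The sketch below is one possible approach, not part of SusQL itself: the helper name `label_group_manifest` is hypothetical, and it emits JSON rather than YAML because `oc apply -f` accepts both and JSON needs only the Python standard library. The field names mirror the openshiftaij.yaml file above.

```python
import json


def label_group_manifest(name: str, namespace: str, labels: list) -> dict:
    """Build a LabelGroup object with the same fields as openshiftaij.yaml."""
    return {
        "apiVersion": "susql.ibm.com/v1",
        "kind": "LabelGroup",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"labels": list(labels)},
    }


manifest = label_group_manifest("openshiftaij", "rhods-notebooks", ["openshiftaij"])

# Save this output to a file (for example openshiftaij.json) and pass it to `oc apply -f`.
print(json.dumps(manifest, indent=4))
```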

Visualize

If you have cloned the susql-operator GitHub repository, you can run the test/susqltop command to view the aggregated energy from the command line.

$ test/susqltop
NameSpace           LabelGroup                              Labels                                                    TotalEnergy (J)
rhods-notebooks     openshiftaij                            ["openshiftaij"]                                          17963.00
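
The TotalEnergy column is reported in joules. As a rough sketch, a row of this output can be parsed and converted to watt-hours (1 Wh = 3600 J). The function name `parse_susqltop_row` is hypothetical, and the parsing assumes the whitespace-separated column layout shown above, with the energy value always in the last column.

```python
JOULES_PER_WATT_HOUR = 3600.0


def parse_susqltop_row(line: str) -> dict:
    """Split one susqltop data row into its four columns."""
    # The first two columns contain no spaces; the last column is the energy.
    namespace, label_group, rest = line.split(None, 2)
    labels, energy = rest.rsplit(None, 1)
    joules = float(energy)
    return {
        "namespace": namespace,
        "label_group": label_group,
        "labels": labels.strip(),  # kept as the raw text printed by susqltop
        "total_energy_j": joules,
        "total_energy_wh": joules / JOULES_PER_WATT_HOUR,
    }


row = parse_susqltop_row(
    'rhods-notebooks     openshiftaij                            '
    '["openshiftaij"]                                          17963.00'
)
print(f"{row['label_group']}: {row['total_energy_j']:.2f} J = {row['total_energy_wh']:.2f} Wh")
```

For the sample row above, 17963 J corresponds to roughly 4.99 Wh.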