Run inference on AWS Inferentia
Now we can use the compiled model to run an inference workload on an AWS Inferentia node.
Create a pod for inference
Check the image that we'll run the inference on:
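The lab stores the image URI in an environment variable, which we can print (assuming `AIML_DL_INF_IMAGE` was set earlier in this lab, as it is referenced in the manifest below):

```bash
echo $AIML_DL_INF_IMAGE
```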
This is a different image than we used for training and has been optimized for inference.
Now we can deploy a Pod for inference. This is the manifest file for running the inference Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference
  namespace: aiml
  labels:
    role: inference
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: inf2.xlarge
  containers:
    - command:
        - sh
        - -c
        - sleep infinity
      image: ${AIML_DL_INF_IMAGE}
      name: inference
      resources:
        limits:
          aws.amazon.com/neuron: 1
  serviceAccountName: inference
```
For the inference Pod we've set the `nodeSelector` section to specify an inf2 instance type. In the `resources` `limits` section we again specify that we need a Neuron device to run this Pod.
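Assuming the manifest above is saved as `inference.yaml` (a hypothetical filename for this sketch), we can create the Pod like this:

```bash
kubectl apply -f inference.yaml
```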
Again Karpenter detects the pending Pod, which this time requires an inf2 instance that provides Neuron cores. Karpenter therefore launches an inf2 instance, which carries the Inferentia chip. You can again monitor the instance provisioning in the Karpenter controller logs.
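The exact command depends on how Karpenter was installed; a sketch assuming the conventional `karpenter` namespace and Helm chart labels:

```bash
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter
```

In the logs you should see a `launched nodeclaim` entry similar to this: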
```
...
{
  "level": "INFO",
  "time": "2024-09-19T18:53:34.266Z",
  "logger": "controller",
  "message": "launched nodeclaim",
  "commit": "6e9d95f",
  "controller": "nodeclaim.lifecycle",
  "controllerGroup": "karpenter.sh",
  "controllerKind": "NodeClaim",
  "NodeClaim": {
    "name": "aiml-v64vm"
  },
  "namespace": "",
  "name": "aiml-v64vm",
  "reconcileID": "7b5488c5-957a-4051-a657-44fb456ad99b",
  "provider-id": "aws:///us-west-2b/i-0078339b1c925584d",
  "instance-type": "inf2.xlarge",
  "zone": "us-west-2b",
  "capacity-type": "on-demand",
  "allocatable": {
    "aws.amazon.com/neuron": "1",
    "cpu": "3920m",
    "ephemeral-storage": "89Gi",
    "memory": "14162Mi",
    "pods": "58",
    "vpc.amazonaws.com/pod-eni": "18"
  }
}
...
```
The inference Pod should be scheduled on the node provisioned by Karpenter. Check that the Pod is in its ready state:
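A minimal sketch of this check, using the Pod name and namespace from the manifest above:

```bash
kubectl wait --for=condition=Ready pod/inference -n aiml --timeout=12m
```

The timeout is set generously because, as noted below, provisioning can take several minutes.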
It can take up to 12 minutes to provision the node, add it to the EKS cluster, and start the pod.
We can use the following command to get more details on the node that was provisioned for our Pod:
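One way to do this is to query the node by its instance-type label (this sketch assumes the `jq` CLI is available, which is an extra dependency):

```bash
kubectl get node -l node.kubernetes.io/instance-type=inf2.xlarge -o json | jq '.items[0].status.capacity'
```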
This output shows the capacity this node has:
```json
{
  "aws.amazon.com/neuron": "1",
  "aws.amazon.com/neuroncore": "2",
  "aws.amazon.com/neurondevice": "1",
  "cpu": "4",
  "ephemeral-storage": "104845292Ki",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "16009632Ki",
  "pods": "58",
  "vpc.amazonaws.com/pod-eni": "18"
}
```
We can see that this node has an `aws.amazon.com/neuron` capacity of 1. Karpenter provisioned this node for us because that's the number of Neuron devices the Pod requested.
Run an inference
This is the code that we will be using to run inference using a Neuron core on Inferentia:
```python
import os
import time
import torch
import torch_neuronx
import json
import numpy as np
from urllib import request
from torchvision import models, transforms, datasets

## Create an image directory containing a small kitten
os.makedirs("./torch_neuron_test/images", exist_ok=True)
request.urlretrieve(
    "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg",
    "./torch_neuron_test/images/kitten_small.jpg",
)

## Fetch labels to output the top classifications
request.urlretrieve(
    "https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json",
    "imagenet_class_index.json",
)
idx2label = []
with open("imagenet_class_index.json", "r") as read_file:
    class_idx = json.load(read_file)
    idx2label = [class_idx[str(k)][1] for k in range(len(class_idx))]

## Import a sample image and normalize it into a tensor
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
eval_dataset = datasets.ImageFolder(
    os.path.dirname("./torch_neuron_test/"),
    transforms.Compose(
        [
            transforms.Resize([224, 224]),
            transforms.ToTensor(),
            normalize,
        ]
    ),
)
image, _ = eval_dataset[0]
image = torch.tensor(image.numpy()[np.newaxis, ...])

## Load model
model_neuron = torch.jit.load("resnet50_neuron.pt")

## Predict
results = model_neuron(image)

# Get the top 5 results
top5_idx = results[0].sort()[1][-5:]

# Lookup and print the top 5 labels
top5_labels = [idx2label[idx] for idx in top5_idx]
print("Top 5 labels:\n {}".format(top5_labels))
```
This Python code performs the following tasks:
- It downloads and stores an image of a small kitten.
- It fetches the labels for classifying the image.
- It then imports this image and normalizes it into a tensor.
- It loads our previously created model.
- It runs the prediction on our small kitten image.
- It gets the top 5 results from the prediction and prints these to the command-line.
We copy this code to the Pod, download our previously uploaded model, and run the following commands:
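A sketch of these steps, with hypothetical names (`inference.py` for the script and `$AIML_NEURON_BUCKET_NAME` for the S3 bucket holding the compiled model are assumptions for this example):

```bash
# Copy the inference script into the running Pod
kubectl cp inference.py aiml/inference:/inference.py

# Download the previously compiled model from S3 into the Pod
kubectl exec -n aiml inference -- aws s3 cp s3://$AIML_NEURON_BUCKET_NAME/resnet50_neuron.pt ./resnet50_neuron.pt

# Run the inference
kubectl exec -n aiml inference -- python /inference.py
```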
```
Top 5 labels:
['tiger', 'lynx', 'tiger_cat', 'Egyptian_cat', 'tabby']
```
As output we get the top 5 labels back. We are running the inference on an image of a small kitten against a pre-trained ResNet-50 model, so these results are expected. As a possible next step to improve performance, we could create our own dataset of images and train our own model for our specific use case, which could improve the prediction results.
This concludes this lab on using AWS Inferentia with Amazon EKS.