Provisioning Node Pools for LLM Workloads

In this lab, we'll use Karpenter to provision the AWS Trainium (trn1) nodes needed to run the Mistral-7B chatbot workload. As a node autoscaler, Karpenter creates the compute capacity required to run machine learning workloads, selecting instance types that match each pod's requirements.

tip

To learn more about Karpenter, check out the Karpenter module in this workshop.

Karpenter has already been installed in our EKS Cluster and runs as a deployment:

~$kubectl get deployment -n kube-system
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
...
karpenter   2/2     2            2           11m

Since the Ray cluster creates head and worker pods with different specifications that target different EC2 instance families, we'll create two separate node pools to handle their workload demands.

Here's the first Karpenter NodePool, which will provision x86 CPU instances for the head pod:

~/environment/eks-workshop/modules/aiml/chatbot/nodepool/nodepool-x86.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: x86-cpu-karpenter
spec:
  template:
    metadata:
      labels:
        type: karpenter
        instanceType: mixed-x86
        provisionerType: Karpenter
        workload: rayhead
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      requirements:
        - key: "karpenter.k8s.aws/instance-family"
          operator: In
          values: ["c5", "m5", "r5"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: x86-cpu-karpenter
  limits:
    cpu: "256"
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: x86-cpu-karpenter
spec:
  amiFamily: AL2
  amiSelectorTerms:
    - alias: al2@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 200Gi
        volumeType: gp3
  detailedMonitoring: true
  role: ${KARPENTER_NODE_ROLE}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/eks-workshop: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop

We're asking the NodePool to start all new nodes with the Kubernetes label type: karpenter, which allows us to specifically target Karpenter nodes with pods for demonstration purposes. Because Karpenter autoscales multiple node pools, additional labels such as instanceType: mixed-x86 indicate that nodes launched by this NodePool belong to the x86-cpu-karpenter pool.
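
For example, a pod could target nodes from this pool by selecting on those labels. The sketch below is for illustration only and is not applied in the lab; the pod name and image are placeholders, and the actual Ray head pod spec comes from the RayCluster manifest later on.

apiVersion: v1
kind: Pod
metadata:
  name: x86-pool-demo # hypothetical pod, for illustration only
spec:
  nodeSelector:
    type: karpenter
    instanceType: mixed-x86
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:1.36 # placeholder image
      command: ["sleep", "infinity"]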

The NodePool CRD supports defining node properties like instance type and zone. In this example, we're setting karpenter.sh/capacity-type to initially limit Karpenter to provisioning On-Demand instances, and karpenter.k8s.aws/instance-family to limit provisioning to a subset of instance families. You can learn which other properties are available in the Karpenter documentation. Compared to the previous lab, this NodePool carries more specifications reflecting the constraints of the head pod, such as restricting nodes to the c5, m5, and r5 instance families.
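
If you needed to constrain nodes further, additional well-known requirements can be appended to the same requirements list. The snippet below is a sketch only; the zone values are assumptions that should be adjusted for your region, and this change is not applied in the lab.

      requirements:
        # ...existing requirements...
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-west-2a", "us-west-2b"] # example zones, not part of this lab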

A NodePool can define a limit on the amount of CPU and memory it manages. Once this limit is reached, Karpenter will not provision additional capacity for that NodePool, placing a cap on the total compute.
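
After nodes have been launched, you can compare the capacity a NodePool currently manages against this limit. This is an optional check; the status.resources field is populated by recent Karpenter releases and will be empty until the pool has provisioned at least one node.

~$kubectl get nodepool x86-cpu-karpenter -o jsonpath='{.status.resources}'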

The second NodePool will provision Ray workers on trn1.2xlarge instances:

~/environment/eks-workshop/modules/aiml/chatbot/nodepool/nodepool-trn1.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: trainium-trn1
spec:
  template:
    metadata:
      labels:
        instanceType: trn1.2xlarge
        provisionerType: Karpenter
        neuron.amazonaws.com/neuron-device: "true"
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      taints:
        - key: aws.amazon.com/neuron
          value: "true"
          effect: "NoSchedule"
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["trn1.2xlarge"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: trainium-trn1
  limits:
    aws.amazon.com/neuron: 2
    cpu: 16
    memory: 64Gi
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: trainium-trn1
spec:
  amiFamily: AL2
  amiSelectorTerms:
    - alias: al2@latest
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 500Gi
        volumeType: gp3
  role: ${KARPENTER_NODE_ROLE}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/eks-workshop: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop
    karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    aws-neuron: "true"

We're asking the NodePool to start all new nodes with the Kubernetes label provisionerType: Karpenter, which allows us to specifically target Karpenter nodes with pods for demonstration purposes. Because Karpenter autoscales multiple node pools, additional labels such as instanceType: trn1.2xlarge indicate that nodes launched by this NodePool belong to the trainium-trn1 pool.
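
Once a worker node has been launched later in the lab, these labels make the Karpenter-provisioned capacity easy to find. Running this now returns nothing, since no matching nodes exist yet.

~$kubectl get nodes -l provisionerType=Karpenter,instanceType=trn1.2xlarge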

The NodePool CRD supports defining node properties like instance type and zone. In this example, we're setting karpenter.sh/capacity-type to initially limit Karpenter to provisioning On-Demand instances, and node.kubernetes.io/instance-type to restrict provisioning to a single instance type. You can learn which other properties are available in the Karpenter documentation. In this case, the specifications match the requirements of the Ray workers, which will run on trn1.2xlarge instances.
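
If you want to see the capacity behind this instance type, the AWS CLI can describe it. This is an optional aside and not part of the lab steps.

~$aws ec2 describe-instance-types --instance-types trn1.2xlarge \
  --query 'InstanceTypes[0].{vCPUs:VCpuInfo.DefaultVCpus,MemoryMiB:MemoryInfo.SizeInMiB}'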

A taint is a property of a node that repels pods which do not explicitly tolerate it. Its counterpart on the pod side is a toleration; taints and tolerations work together to ensure that pods are only scheduled onto appropriate nodes. You can learn more about taints and tolerations in the Kubernetes documentation.
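
Here's a minimal sketch of how a worker pod would tolerate this taint and request a Neuron device; it is for illustration only. The pod name and image are placeholders, the aws.amazon.com/neuron resource assumes the Neuron device plugin is installed, and the real worker pod spec comes from the RayCluster manifest later in the lab.

apiVersion: v1
kind: Pod
metadata:
  name: neuron-toleration-demo # hypothetical pod, for illustration only
spec:
  nodeSelector:
    instanceType: trn1.2xlarge
  tolerations:
    - key: aws.amazon.com/neuron
      operator: Exists
      effect: NoSchedule
  containers:
    - name: worker
      image: public.ecr.aws/docker/library/busybox:1.36 # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          aws.amazon.com/neuron: 1 # one Trainium device, exposed by the Neuron device plugin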

A NodePool can define a limit on the amount of CPU, memory, and accelerator resources it manages. Once this limit is reached, Karpenter will not provision additional capacity for that NodePool, placing a cap on the total compute. Here the limits of 2 Neuron devices, 16 vCPUs, and 64Gi of memory cap the pool at roughly two trn1.2xlarge nodes, since each provides 8 vCPUs, 32 GiB of memory, and one Trainium device.

Together, these two node pools allow Karpenter to provision the right nodes for the Ray cluster and handle its workload demands.

Apply the NodePool and EC2NodeClass manifests for both pools:

~$kubectl kustomize ~/environment/eks-workshop/modules/aiml/chatbot/nodepool \
| envsubst | kubectl apply -f-
ec2nodeclass.karpenter.k8s.aws/trainium-trn1 created
ec2nodeclass.karpenter.k8s.aws/x86-cpu-karpenter created
nodepool.karpenter.sh/trainium-trn1 created
nodepool.karpenter.sh/x86-cpu-karpenter created

Once properly deployed, check for the node pools:

~$kubectl get nodepool
NAME                NODECLASS
trainium-trn1       trainium-trn1 
x86-cpu-karpenter   x86-cpu-karpenter

As seen from the above output, both NodePools have been created, allowing Karpenter to provision new nodes into the appropriate pool as needed.
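
No new nodes are launched yet; Karpenter only creates capacity when pending pods require it. Later in the lab, once the Ray cluster is deployed, you can watch that capacity arrive with commands like the following, which should return no resources at this point.

~$kubectl get nodeclaims
~$kubectl get nodes -l provisionerType=Karpenter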