Provisioning Node Pools for LLM Workloads
In this lab, we'll use Karpenter to provision the Trainium (trn1) nodes necessary for handling the Mistral-7B chatbot workload. As a node autoscaler, Karpenter provisions the compute capacity required to run machine learning workloads and scales it efficiently as demand changes.
To learn more about Karpenter, check out the Karpenter module in this workshop.
Karpenter has already been installed in our EKS Cluster and runs as a deployment:
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
...
karpenter   2/2     2            2           11m
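Output like the above can be reproduced with a command along these lines (the karpenter namespace is an assumption here and may differ in your installation):

kubectl get deployment -n karpenter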
Since the Ray Cluster creates head and worker pods with different specifications that target different EC2 instance families, we'll create two separate node pools to handle the workload demands.
Here's the first Karpenter NodePool, which will provision one Head Pod on x86 CPU instances:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: x86-cpu-karpenter
spec:
  template:
    metadata:
      labels:
        type: karpenter
        instanceType: mixed-x86
        provisionerType: Karpenter
        workload: rayhead
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      requirements:
        - key: "karpenter.k8s.aws/instance-family"
          operator: In
          values: ["c5", "m5", "r5"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: x86-cpu-karpenter
  limits:
    cpu: "256"
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: x86-cpu-karpenter
spec:
  amiFamily: AL2
  amiSelectorTerms:
    - alias: al2@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 200Gi
        volumeType: gp3
  detailedMonitoring: true
  role: ${KARPENTER_NODE_ROLE}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/eks-workshop: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop
We're asking the NodePool to start all new nodes with a Kubernetes label type: karpenter, which will allow us to specifically target Karpenter nodes with pods for demonstration purposes. Since Karpenter autoscales nodes from more than one pool, additional labels such as instanceType: mixed-x86 are added to indicate that a node belongs to the x86-cpu-karpenter pool.
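As a minimal sketch of how these labels get used (the pod name and image below are placeholders, not part of this lab), a pod can target the pool through a nodeSelector on them; a Pending pod like this is what would trigger Karpenter to launch a matching node:

apiVersion: v1
kind: Pod
metadata:
  name: nodepool-demo # hypothetical name, for illustration only
spec:
  nodeSelector:
    type: karpenter
    instanceType: mixed-x86
  containers:
    - name: app
      image: busybox:1.36 # placeholder image
      command: ["sleep", "infinity"]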
The NodePool CRD supports defining node properties like instance type and zone. In this example, we're setting karpenter.sh/capacity-type to initially limit Karpenter to provisioning On-Demand instances, as well as karpenter.k8s.aws/instance-family to limit provisioning to a subset of instance families. You can learn which other properties are available here. Compared to the previous lab, there are more specifications defining the unique constraints of the Head Pod, such as restricting the instance family to r5, m5, and c5 nodes.
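Once Karpenter launches nodes for this pool, you can check which instance and capacity types it picked; a command like the following is one way to do that (it returns nothing until a pod actually requests this capacity):

kubectl get nodes -l type=karpenter -L node.kubernetes.io/instance-type,karpenter.sh/capacity-type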
A NodePool can define a limit on the amount of CPU and memory managed by it. Once this limit is reached, Karpenter will not provision additional capacity associated with that particular NodePool, providing a cap on the total compute.
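In Karpenter v1 the NodePool status reports the resources it currently manages, which is what gets measured against these limits; a command along these lines (the status field path is an assumption here) shows consumption to compare against the cpu: "256" cap above:

kubectl get nodepool x86-cpu-karpenter -o jsonpath='{.status.resources}'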
This second NodePool will provision Ray Workers on trn1.2xlarge instances:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: trainium-trn1
spec:
  template:
    metadata:
      labels:
        instanceType: trn1.2xlarge
        provisionerType: Karpenter
        neuron.amazonaws.com/neuron-device: "true"
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      taints:
        - key: aws.amazon.com/neuron
          value: "true"
          effect: "NoSchedule"
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["trn1.2xlarge"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: trainium-trn1
  limits:
    aws.amazon.com/neuron: 2
    cpu: 16
    memory: 64Gi
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: trainium-trn1
spec:
  amiFamily: AL2
  amiSelectorTerms:
    - alias: al2@latest
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 500Gi
        volumeType: gp3
  role: ${KARPENTER_NODE_ROLE}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/eks-workshop: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop
    karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    aws-neuron: "true"
We're asking this NodePool to start all new nodes with a Kubernetes label provisionerType: Karpenter, which will allow us to specifically target Karpenter nodes with pods for demonstration purposes. Since Karpenter autoscales nodes from more than one pool, additional labels such as instanceType: trn1.2xlarge are added to indicate that a node belongs to the trainium-trn1 pool.
The NodePool CRD supports defining node properties like instance type and zone. In this example, we're setting karpenter.sh/capacity-type to initially limit Karpenter to provisioning On-Demand instances, as well as node.kubernetes.io/instance-type to limit provisioning to a specific instance type. You can learn which other properties are available here. In this case, the specifications match the requirements of the Ray Workers, which will run on trn1.2xlarge instances.
A Taint defines a set of properties that allows a node to repel a set of pods. Taints work together with their counterpart, Tolerations, which are applied to pods: only pods that tolerate a node's taints can be scheduled onto it. You can learn more about taints and tolerations in this resource.
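As a minimal sketch (the pod name and image are placeholders, and it assumes the Neuron device plugin is installed to expose the aws.amazon.com/neuron resource), a worker pod headed for these nodes needs a toleration matching that taint and a request for the Neuron device:

apiVersion: v1
kind: Pod
metadata:
  name: neuron-demo # hypothetical name, for illustration only
spec:
  nodeSelector:
    instanceType: trn1.2xlarge
  tolerations:
    - key: "aws.amazon.com/neuron"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: app
      image: busybox:1.36 # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          aws.amazon.com/neuron: 1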
As with the first pool, the NodePool defines limits on the capacity it manages; here the cap covers Neuron devices as well as CPU and memory. Once those limits are reached, Karpenter will not provision additional capacity associated with this NodePool, providing a cap on the total compute.
Together, these two node pools allow Karpenter to provision the right nodes for the Ray Cluster's head and worker pods and handle its workload demands.
Apply the NodePool and EC2NodeClass manifests for both pools:
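Assuming the two manifests above have been saved to local files (the file names here are illustrative), the ${KARPENTER_NODE_ROLE} and ${EKS_CLUSTER_NAME} placeholders need to be substituted with your environment's values before applying, for example with envsubst:

envsubst < x86-cpu-karpenter.yaml | kubectl apply -f -
envsubst < trainium-trn1.yaml | kubectl apply -f -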
ec2nodeclass.karpenter.k8s.aws/trainium-trn1 created
ec2nodeclass.karpenter.k8s.aws/x86-cpu-karpenter created
nodepool.karpenter.sh/trainium-trn1 created
nodepool.karpenter.sh/x86-cpu-karpenter created
Once properly deployed, check for the node pools:
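For example:

kubectl get nodepools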
NAME                NODECLASS
trainium-trn1       trainium-trn1
x86-cpu-karpenter   x86-cpu-karpenter
As seen from the above command, both node pools have been created, allowing Karpenter to launch new nodes into them as workload demand requires.
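Once the Ray Cluster's head and worker pods are created and go Pending, you can watch Karpenter satisfy them from these pools with a command such as the following; each NodeClaim corresponds to an EC2 instance Karpenter has launched for a NodePool:

kubectl get nodeclaims -w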