Large Language Models with Ray Serve

Before you start

Prepare your environment for this section:

~$ prepare-environment aiml/chatbot

This will make the following changes to your lab environment:

Installs Karpenter in the Amazon EKS cluster
Creates an IAM Role for the Pods to use

You can view the Terraform that applies these changes here.

Mistral 7B, a 7.3B-parameter model, is one of the most powerful language models for its size to date. It combines capabilities such as text generation and completion, information extraction, data analysis, API interaction, and complex reasoning with practical efficiency.

This section focuses on how to deploy LLMs efficiently on Amazon EKS.

To deploy and scale LLMs, this lab uses AWS Trainium through the Trn1 instance family, for example trn1.2xlarge. The chatbot inference workload uses Ray Serve to build an online inference API and streamline model deployment, and a Gradio UI to access the Mistral-7B chatbot.
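
To get a feel for why a single-accelerator instance such as trn1.2xlarge can be a reasonable fit for a 7.3B-parameter model, the rough estimate below may help. It is only a sketch under stated assumptions: weights stored at 2 bytes per parameter (bf16 or fp16), and 32 GB of accelerator memory on the instance, a figure to confirm against the AWS instance documentation. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real headroom will be smaller.

```python
# Back-of-the-envelope estimate: do the Mistral 7B weights fit in accelerator memory?

PARAMS = 7.3e9              # parameter count stated in the text
BYTES_PER_PARAM = 2         # assumption: bf16/fp16 weights
ACCELERATOR_MEMORY_GB = 32  # assumption: accelerator memory on trn1.2xlarge


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Return the memory needed for the model weights alone, in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9


weights_gb = weight_memory_gb(PARAMS, BYTES_PER_PARAM)
headroom_gb = ACCELERATOR_MEMORY_GB - weights_gb

print(f"weights: {weights_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
```

Under these assumptions the weights take about 14.6 GB, leaving roughly half of the assumed accelerator memory for everything else the inference runtime needs.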