Training Llama-2 Model on AWS Trainium
This post walks through running multi-node training jobs on AWS Trainium accelerators in Amazon EKS. Specifically, you will pretrain Llama-2-7b on 4 Amazon EC2 trn1.32xlarge instances using a subset of the RedPajama dataset.
Selecting the Right Llama-2 Model Size
Choosing the appropriate model size of Llama-2 depends on your specific requirements. The largest model might not always be necessary for optimal performance. It’s crucial to consider factors like computational resources, response times, and cost efficiency. Make an informed decision by assessing the needs and limitations of your application thoroughly.
Performance Boost
While Llama-2 can achieve high-performance inference on GPUs, Neuron accelerators are purpose-built for machine learning workloads and provide hardware acceleration that can significantly improve Llama-2’s inference speeds. This translates to faster response times and improved user experiences when deploying Llama-2 on Trn1 instances.
Solution Architecture
Deploying the Solution
Prerequisites
Before we begin, ensure the following prerequisites are in place so that the deployment goes smoothly.
You will need an EC2 instance with at least 100 GB of storage and the following tools installed on it:
- AWS CLI
- kubectl
- Git
- Docker
- Terraform
- Python, pip, jq, unzip
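A quick way to confirm the prerequisite list above is to check that each tool resolves on your PATH. The following is a minimal sketch: the helper name missing_tools is hypothetical, and the binary names (aws for the AWS CLI, python3 and pip for Python) are assumptions about how the tools are installed on your instance.

```shell
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of the data-on-eks repo.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Binary names are assumptions based on the prerequisite list.
missing_tools aws kubectl git docker terraform python3 pip jq unzip
```

If the command prints nothing, every tool was found. Any name it prints still needs to be installed (or is installed under a different binary name).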
Clone the Data on EKS repository
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/ai-ml/trainium-inferentia
By default, the MPI operator is not installed because its variable is set to false. Run the export commands below to enable the MPI operator and FSx for Lustre, and to set the region and the size of the trn1.32xlarge node group.
export TF_VAR_enable_mpi_operator=true
export TF_VAR_enable_fsx_for_lustre=true
export TF_VAR_region=us-west-2
export TF_VAR_trn1_32xl_min_size=4
export TF_VAR_trn1_32xl_desired_size=4
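Terraform reads an environment variable named TF_VAR_<name> as the input variable <name>, but only if it is exported in the same shell that runs the install script. As a sketch, the hypothetical helper below (missing_tf_vars is not part of the repository) lists any of the variables above that are not currently exported.

```shell
# Print each expected TF_VAR_* variable that is not exported in this shell.
# missing_tf_vars is a hypothetical helper for illustration only.
missing_tf_vars() {
  for name in "$@"; do
    env | grep -q "^TF_VAR_${name}=" || printf 'TF_VAR_%s\n' "$name"
  done
}

# Variable names are taken from the export commands above.
missing_tf_vars enable_mpi_operator enable_fsx_for_lustre region \
  trn1_32xl_min_size trn1_32xl_desired_size
```

No output means all five variables are visible to child processes such as the install script.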
Run the install script to provision an EKS cluster with all the add-ons needed for the solution.
./install.sh
Verify the resources
Verify the Amazon EKS Cluster
aws eks --region us-west-2 describe-cluster --name trainium-inferentia
# Creates k8s config file to authenticate with EKS
aws eks --region us-west-2 update-kubeconfig --name trainium-inferentia
kubectl get nodes # Output shows the EKS Managed Node group nodes
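Because the training job expects 4 trn1.32xlarge nodes, it can help to count how many of them are actually Ready. The sketch below assumes the nodes carry the well-known node.kubernetes.io/instance-type label; ready_node_count is a hypothetical helper name.

```shell
# Count nodes of the given instance type whose STATUS column is exactly "Ready".
# Assumes the nodes carry the node.kubernetes.io/instance-type label.
ready_node_count() {
  kubectl get nodes --no-headers -l "node.kubernetes.io/instance-type=$1" \
    | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example (run against your cluster):
#   ready_node_count trn1.32xlarge
```

A result lower than 4 suggests the node group is still scaling up or that some nodes have not joined the cluster yet.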
Distributed training
Once the EKS cluster is deployed, you can proceed with building the neuronx-nemo-megatron container image and pushing it to ECR.
Build the neuronx-nemo-megatron container image
Navigate to the examples/llama2 directory:
cd examples/llama2/
Run the 1-llama2-neuronx-pretrain-build-image.sh script to build the neuronx-nemo-megatron container image and push it to ECR. When prompted for a region, enter the region in which you launched your EKS cluster above.
./1-llama2-neuronx-pretrain-build-image.sh
Launch and connect to a CLI pod
In this step we need access to the shared FSx storage. To copy files to this storage, we’ll first launch and connect to a CLI pod running the neuronx-nemo-megatron docker image that you created above. Run the following script to launch the CLI pod:
./2-launch-cmd-shell-pod.sh
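Next, run the following command and wait until the CLI pod reaches the ‘Running’ state:
kubectl get pod -w
Once the CLI pod is running, connect to it:
kubectl exec -it cli-cmd-shell -- /bin/bash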
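As an alternative to watching the pod list by hand, kubectl wait can block until the pod reports the Ready condition. This is a sketch only: wait_for_pod is a hypothetical wrapper, and the 10 minute default timeout is an arbitrary assumption.

```shell
# Block until the named pod reports the Ready condition, or fail on timeout.
# wait_for_pod is a hypothetical helper; the default timeout is an assumption.
wait_for_pod() {  # usage: wait_for_pod POD_NAME [TIMEOUT]
  kubectl wait --for=condition=Ready "pod/$1" --timeout="${2:-10m}"
}

# Example (run against your cluster):
#   wait_for_pod cli-cmd-shell && kubectl exec -it cli-cmd-shell -- /bin/bash
```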
Download the Llama tokenizer and Redpajama dataset to FSx
From within the CLI pod, we’ll download the Llama tokenizer files. These files are protected by Meta’s Llama license, so you will need to run the huggingface-cli login command to log in to Hugging Face using your access token. The access token is found under Settings → Access Tokens on the Hugging Face website.
huggingface-cli login
Paste your access token at the login prompt and press ENTER.
Next, download the Llama-2-7b tokenizer files to /shared/llama7b_tokenizer by running the following Python code:
python3 <
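Next, while still connected to the CLI pod, use git to download the RedPajama-Data-1T-Sample dataset to FSx:
cd /shared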
git clone https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample \
data/RedPajama-Data-1T-Sample
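Large files in Hugging Face repositories are typically stored with Git LFS. If git-lfs is not set up in the pod, a clone can leave small pointer files in place of the real .jsonl data. The sketch below assumes this dataset is stored that way and checks for the LFS pointer header; both helper names are hypothetical.

```shell
# Succeed if FILE begins with the Git LFS pointer header instead of real data.
is_lfs_pointer() {
  head -n 1 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print every .jsonl file in DIR that is still an LFS pointer.
list_lfs_pointers() {
  for f in "$1"/*.jsonl; do
    [ -f "$f" ] || continue
    if is_lfs_pointer "$f"; then printf '%s\n' "$f"; fi
  done
}

# Example (inside the CLI pod):
#   list_lfs_pointers /shared/data/RedPajama-Data-1T-Sample
```

If any files are listed, the actual data has not been downloaded yet, and the later tokenization step would process pointer text rather than the dataset.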
Tokenize the dataset
Tokenize the dataset using the preprocessing script included with neuronx-nemo-megatron. This preprocessing step will take ~60 minutes to run on a trn1.32xl instance.
cd /shared
# Clone the neuronx-nemo-megatron repo, which includes the required scripts
git clone https://github.com/aws-neuron/neuronx-nemo-megatron.git
# Combine the separate redpajama files to a single jsonl file
cat /shared/data/RedPajama-Data-1T-Sample/*.jsonl > /shared/redpajama_sample.jsonl
# Run preprocessing script using llama tokenizer
python3 neuronx-nemo-megatron/nemo/scripts/nlp_language_modeling/preprocess_data_for_megatron.py \
--input=/shared/redpajama_sample.jsonl \
--json-keys=text \
--tokenizer-library=huggingface \
--tokenizer-type=/shared/llama7b_tokenizer \
--dataset-impl=mmap \
--output-prefix=/shared/data/redpajama_sample \
--append-eod \
--need-pad-id \
--workers=32
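After preprocessing finishes, it is worth confirming that the tokenized dataset exists before moving on. The sketch below assumes the mmap dataset format writes a .bin and an .idx file under the redpajama_sample_text_document prefix that the training script references later; dataset_ready is a hypothetical helper.

```shell
# Succeed only if both PREFIX.bin and PREFIX.idx exist and are non-empty.
# The .bin/.idx file extensions are an assumption about the mmap dataset format.
dataset_ready() {
  [ -s "$1.bin" ] && [ -s "$1.idx" ]
}

# Example (inside the CLI pod):
#   dataset_ready /shared/data/redpajama_sample_text_document && echo "dataset found"
```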
Modify dataset and tokenizer paths in the training script
Note: When we later launch our training jobs in EKS, the training pods will run the training script from within the neuronx-nemo-megatron/nemo/examples directory on FSx. This is convenient because it lets you modify your training script directly on FSx without rebuilding the neuronx-nemo-megatron container for every change.
Modify the test_llama.sh script (/shared/neuronx-nemo-megatron/nemo/examples/nlp/language_modeling/test_llama.sh) to update the following two lines. These lines tell the training pod workers where to find the Llama tokenizer and the dataset on the FSx filesystem.
Run:
sed -i 's#^\(: ${TOKENIZER_PATH=\).*#\1/shared/llama7b_tokenizer}#' /shared/neuronx-nemo-megatron/nemo/examples/nlp/language_modeling/test_llama.sh
sed -i 's#^\(: ${DATASET_PATH=\).*#\1/shared/data/redpajama_sample_text_document}#' /shared/neuronx-nemo-megatron/nemo/examples/nlp/language_modeling/test_llama.sh
Before changes:
: ${TOKENIZER_PATH=$HOME/llamav2_weights/7b-hf}
: ${DATASET_PATH=$HOME/examples_datasets/llama_7b/book.jsonl-processed_text_document}
After changes:
: ${TOKENIZER_PATH=/shared/llama7b_tokenizer}
: ${DATASET_PATH=/shared/data/redpajama_sample_text_document}
If you edit the file in nano instead, save your changes by pressing CTRL-X, then y, then ENTER.
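To see what the sed substitution does before touching the real script, you can apply the same expression without -i so the result is printed instead of written. This is a sketch: preview_default is a hypothetical helper, and it assumes the new value contains no #, & or backslash characters, which would interfere with the sed expression.

```shell
# Print FILE with the default for VAR replaced by VALUE, leaving FILE untouched.
# Uses the same sed expression as the tutorial, minus the in-place flag.
preview_default() {  # usage: preview_default VAR VALUE FILE
  sed 's#^\(: ${'"$1"'=\).*#\1'"$2"'}#' "$3"
}

# Try it on a scratch file that contains the original default line.
scratch=$(mktemp)
printf '%s\n' ': ${TOKENIZER_PATH=$HOME/llamav2_weights/7b-hf}' > "$scratch"
preview_default TOKENIZER_PATH /shared/llama7b_tokenizer "$scratch"
# prints: : ${TOKENIZER_PATH=/shared/llama7b_tokenizer}
rm -f "$scratch"
```

Lines that do not start with the matching default assignment pass through unchanged, so the preview can also be run against the full test_llama.sh and compared with the original using diff.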
When you are finished, type exit or press CTRL-d to exit the CLI pod.
If you no longer need the CLI pod you can remove it by running:
kubectl delete pod cli-cmd-shell
Before launching the jobs, check that the MPI operator is running:
kubectl get all -n mpi-operator
Run pre-compilation job
Run the pre-compilation script
./3-llama2-neuronx-mpi-compile.sh
Then periodically run the following command and wait until the compile job shows ‘Completed’:
kubectl get pods | grep compile
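The manual check above can be wrapped in a small polling loop. The sketch below treats the job as finished once any pod whose listing contains "compile" has the status Completed; compile_done is a hypothetical helper, and the 30 second interval is an arbitrary choice.

```shell
# Succeed if any pod whose line mentions "compile" has STATUS "Completed".
compile_done() {
  kubectl get pods --no-headers \
    | awk '/compile/ && $3 == "Completed" { found = 1 } END { exit !found }'
}

# Example: poll every 30 seconds until the compile job completes.
#   until compile_done; do sleep 30; done
```

This loop does not detect failures, so if the job ends in an error state it will keep polling. Check the pod list by hand if it runs much longer than expected.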
Run training job
When pre-compilation is complete, launch the pre-training job on 4 trn1.32xl nodes by running the following script:
./4-llama2-neuronx-mpi-train.sh
View training job output
To monitor the training job output, first find the name of the launcher pod associated with your training job:
kubectl get pods | grep launcher
Once the launcher pod is ‘Running’, determine its UID. Replace test-mpi-train-launcher-xxx with the name of your launcher pod:
kubectl get pod test-mpi-train-launcher-xxx -o json | jq -r ".metadata.uid"
Use the UID to locate the training log. Replace UID in the following command with the value returned above:
kubectl exec -it test-mpi-train-worker-0 -- tail -f /shared/nemo_experiments/UID/0/log
Press CTRL-C to quit the tail command.
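The UID lookup and the log path can be combined so you do not have to copy the value by hand. This sketch uses kubectl's jsonpath output in place of jq and builds the same /shared/nemo_experiments/UID/0/log path used above; train_log_path is a hypothetical helper.

```shell
# Print the training log path for the given launcher pod.
# Builds /shared/nemo_experiments/<uid>/0/log from the pod's metadata.uid.
train_log_path() {  # usage: train_log_path LAUNCHER_POD_NAME
  uid=$(kubectl get pod "$1" -o jsonpath='{.metadata.uid}')
  printf '/shared/nemo_experiments/%s/0/log\n' "$uid"
}

# Example (replace the launcher pod name with your own):
#   kubectl exec -it test-mpi-train-worker-0 -- \
#     tail -f "$(train_log_path test-mpi-train-launcher-xxx)"
```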
Monitor Trainium accelerator utilization
To monitor Trainium accelerator utilization you can use the neuron-top command. Neuron-top is a console-based tool for monitoring Neuron and system-related performance metrics on trn1/inf2/inf1 instances. You can launch neuron-top on one of the worker pods as follows:
kubectl exec -it test-mpi-train-worker-0 -- /bin/bash -l neuron-top
View training job metrics in TensorBoard
TensorBoard is a web-based visualization tool that is commonly used to monitor and explore training jobs. It lets you track training metrics and compare them across different training runs. TensorBoard logs are available in the /shared/nemo_experiments/ directory on the FSx for Lustre filesystem. Run the following script to create a TensorBoard deployment so you can visualize your Llama-2 training job progress:
./5-deploy-tensorboard.sh
Stopping the training job
To stop your training job and remove the launcher/worker pods, run the following command:
kubectl delete mpijob test-mpi-train
Then run the following command to confirm that the launcher/worker pods have been removed:
kubectl get pods
Cleaning up
To remove the resources created using this solution, run the cleanup script:
cd data-on-eks/ai-ml/trainium-inferentia
./cleanup.sh