Fine-Tuning Llama 2 for Specific Tasks



Fine-tuning is the process of customizing a pre-trained Large Language Model (LLM) so that it performs better on specific tasks. For Llama 2, this means adjusting the pre-trained model's parameters on a task-specific dataset, which makes it possible to adapt the model to a wide variety of tasks.

This chapter covers the concept of transfer learning and common fine-tuning techniques, along with examples of how to fine-tune Llama 2 for different tasks.

Understanding Transfer Learning

Transfer learning is a machine learning technique in which a model pre-trained on a large corpus is adapted to a related task using a much smaller dataset. Instead of training a model from scratch, which is computationally expensive and time-consuming, it builds on the knowledge the model has already gained during pre-training.

Take Llama 2, for instance: it is pre-trained on a large amount of text data. With transfer learning, we can fine-tune it on much smaller datasets for downstream NLP tasks such as sentiment analysis, text classification, or question answering.
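
As a minimal sketch of this idea (assuming you have access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint on the Hugging Face Hub), reusing the pre-trained weights is simply a matter of loading them and attaching a new task head:

from transformers import LlamaForSequenceClassification, LlamaTokenizer

# Start from the pre-trained Llama 2 weights instead of training from scratch
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    num_labels=2  # a new, randomly initialized classification head for the downstream task
)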

Key Transfer Learning Benefits

  • Time Saver − Fine-tuning takes far less time than training a model from scratch.
  • Improved Generalization − Pre-trained models have learned universal language patterns that are useful across a range of natural language processing applications.
  • Data Efficiency − Fine-tuning lets the model perform well even on smaller datasets.

Fine-Tuning Techniques

Fine-tuning Llama 2, like any other large language model, means adjusting the model's parameters for a specific task. There are several techniques for doing this:

Full Model Fine-Tuning

This updates the parameters of every layer of the model. It is the most computationally expensive approach, but it typically gives the best task-specific performance.

from transformers import LlamaForSequenceClassification, LlamaTokenizer, Trainer, TrainingArguments
from datasets import load_dataset

# Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama has no padding token by default

# Load dataset
dataset = load_dataset("imdb")

# Preprocess dataset
def preprocess_function(examples):
    # Cap the sequence length so "max_length" padding has an explicit target
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Set up training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01
)

model = LlamaForSequenceClassification.from_pretrained("meta-llama/Llama-2-7b-chat-hf", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id  # let the model know which token is padding

# Trainer Initialization
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"]
)

# Fine-tune the model
trainer.train()

Output

Epoch 1/3
Training Loss: 0.1345, Evaluation Loss: 0.1523
Epoch 2/3
Training Loss: 0.0821, Evaluation Loss: 0.1042
Epoch 3/3
Training Loss: 0.0468, Evaluation Loss: 0.0879
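
Once training finishes, it is usually worth saving the fine-tuned weights and tokenizer for later use; the output path below is illustrative:

# Save the fine-tuned model and tokenizer for later reuse
trainer.save_model("./llama2-imdb-classifier")
tokenizer.save_pretrained("./llama2-imdb-classifier")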

Layer-Freezing

Only the last layers of the model (for example, the classification head) are trained, while the earlier layers are frozen. This is mainly used to save memory and training time, and it works best when the target task is close to the pre-training data.

# Freeze all parameters of the base model
for param in model.base_model.parameters():
    param.requires_grad = False

# Now fine-tune only the classification head (model.score in LlamaForSequenceClassification)
trainer.train()
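
If the task needs a bit more capacity, you can also unfreeze just the top few decoder blocks while keeping the rest of the base model frozen. This is a minimal sketch, assuming the LlamaForSequenceClassification model from the example above, whose base model exposes its decoder blocks as model.base_model.layers:

# Unfreeze the last two decoder blocks in addition to the classification head
# (Llama-2-7B has 32 decoder blocks; the choice of two here is illustrative)
for block in model.base_model.layers[-2:]:
    for param in block.parameters():
        param.requires_grad = True

trainer.train()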

Learning Rate Tuning

Another technique is to adjust the learning rate used for fine-tuning. A low learning rate is usually preferable because it minimizes the disturbance to the knowledge the model acquired during pre-training.

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,  # low learning rate for gentle fine-tuning
    num_train_epochs=3,
    evaluation_strategy="epoch"
)
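
In practice, learning rates roughly in the range of 1e-5 to 5e-5 are common starting points for fine-tuning large language models; noticeably larger values tend to overwrite the pre-trained knowledge (catastrophic forgetting).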

Prompt-Based Fine-Tuning

This approach uses carefully crafted prompts to steer the model toward a specific task without updating its weights. It is especially useful for zero-shot and few-shot learning.
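
For illustration, the following minimal sketch uses the transformers text-generation pipeline with a hand-written few-shot prompt; the prompt text and labels are made up for this example, and no model weights are updated:

from transformers import pipeline

# Steer the pre-trained chat model toward sentiment classification purely through the prompt
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

prompt = (
    "Classify the sentiment of each review as Positive or Negative.\n"
    "Review: The movie was a delight from start to finish. Sentiment: Positive\n"
    "Review: I walked out halfway through. Sentiment: Negative\n"
    "Review: The plot was clever and the acting was superb. Sentiment:"
)

result = generator(prompt, max_new_tokens=5, do_sample=False)
print(result[0]["generated_text"])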

Examples of Fine-Tuning for Different Tasks

Let's take some real-life examples of fine-tuning the Llama 2 model −

1. Fine-Tuning for Sentiment Analysis

In broad terms, sentiment analysis classifies text into categories such as positive, negative, or neutral. Fine-tuning makes Llama 2 better at recognizing the sentiment behind different kinds of text input.

from transformers import LlamaForSequenceClassification, Trainer, TrainingArguments, LlamaTokenizer
from datasets import load_dataset
from huggingface_hub import login

access_token_read = "<Enter token>"

# Authenticate with the Hugging Face Hub
login(token=access_token_read)

# Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama has no padding token by default

# Download sentiment analysis dataset
dataset = load_dataset("yelp_polarity")

# Preprocess dataset
def preprocess_function(examples):
    # Cap the sequence length so "max_length" padding has an explicit target
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Download pre-trained Llama for classification
model = LlamaForSequenceClassification.from_pretrained("meta-llama/Llama-2-7b-chat-hf", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id  # let the model know which token is padding

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # evaluate at the end of each epoch
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"]
)

# Fine-tune model for sentiment analysis
trainer.train()

Output

Epoch 1/3
Training Loss: 0.2954, Evaluation Loss: 0.3121
Epoch 2/3
Training Loss: 0.1786, Evaluation Loss: 0.2245
Epoch 3/3
Training Loss: 0.1024, Evaluation Loss: 0.1893
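
Once training completes, a quick sanity check might look like the following sketch, which reuses the model and tokenizer objects from the example above (in yelp_polarity, label 1 corresponds to positive reviews; the sample text is illustrative):

import torch

# Classify a new review with the fine-tuned model
text = "The food was fantastic and the service was quick."
inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()
print("Positive" if prediction == 1 else "Negative")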

2. Fine-Tuning for Question Answering

Fine-tuning also helps the model generate short, relevant answers to a question based on a given text passage.

from transformers import LlamaForQuestionAnswering, Trainer, TrainingArguments, LlamaTokenizer
from datasets import load_dataset
from huggingface_hub import login

access_token_read = "<Enter token>"

# Authenticate with the Hugging Face Hub
login(token=access_token_read)

# Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama has no padding token by default

# Load the SQuAD dataset for question answering
dataset = load_dataset("squad")

# Preprocess dataset (simplified: this only tokenizes the question-context pairs;
# training also needs answer-span labels, as shown in the sketch after the output below)
def preprocess_function(examples):
    return tokenizer(
        examples['question'],
        examples['context'],
        truncation=True,
        padding="max_length",
        max_length=512
    )

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Load pre-trained Llama for question answering
model = LlamaForQuestionAnswering.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # evaluate at the end of each epoch
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"]
)

# Fine-tune model on question answering
trainer.train()

Output

Epoch 1/3
Training Loss: 1.8234, Eval. Loss: 1.5243
Epoch 2/3
Training Loss: 1.3451, Eval. Loss: 1.2212
Epoch 3/3
Training Loss: 1.0152, Eval. Loss: 1.0435
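
As noted in the code above, the simplified preprocess_function only tokenizes the question-context pairs; to compute a training loss, LlamaForQuestionAnswering also needs token-level start_positions and end_positions for each answer span. The sketch below follows the standard Hugging Face question-answering preprocessing recipe and assumes the fast tokenizer (LlamaTokenizerFast), which provides the offset mappings the span computation relies on; the resulting tokenized_dataset would then be passed to the Trainer exactly as above:

from transformers import LlamaTokenizerFast

# A fast tokenizer is required for offset mappings
tokenizer = LlamaTokenizerFast.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token

def preprocess_with_spans(examples):
    inputs = tokenizer(
        examples["question"],
        examples["context"],
        truncation="only_second",   # truncate only the context, never the question
        padding="max_length",
        max_length=512,
        return_offsets_mapping=True,
    )
    offset_mapping = inputs.pop("offset_mapping")
    start_positions, end_positions = [], []

    for i, offsets in enumerate(offset_mapping):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # Locate the token range that belongs to the context
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while idx < len(sequence_ids) and sequence_ids[idx] == 1:
            idx += 1
        context_end = idx - 1

        # If the answer was truncated away, label the example (0, 0)
        if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Otherwise map the character span to token positions
            idx = context_start
            while idx <= context_end and offsets[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)
            idx = context_end
            while idx >= context_start and offsets[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs

tokenized_dataset = dataset.map(preprocess_with_spans, batched=True, remove_columns=dataset["train"].column_names)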

3. Fine-Tuning for Text Generation

Llama can be fine-tuned to enhance its text-generation capability, which can be used in applications such as story generation, dialog systems, or even creative writing.

from transformers import LlamaForCausalLM, Trainer, TrainingArguments, LlamaTokenizer
from datasets import load_dataset
from huggingface_hub import login

access_token_read = "<Enter token>"

login(token=access_token_read)

# Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama has no padding token by default

# Load dataset for text generation
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Preprocess dataset
def preprocess_function(examples):
    inputs = tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)
    # For causal language modeling the model predicts its own input, so the labels are the input ids
    # (for simplicity, padding positions are not masked out of the loss here)
    inputs["labels"] = inputs["input_ids"].copy()
    return inputs

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Load the pre-trained Llama model for causal language modeling
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # evaluate at the end of each epoch
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
)

# Fine-tune the model for text generation
trainer.train()

Output

Epoch 1/3
Training Loss: 2.9854, Eval Loss: 2.6452
Epoch 2/3
Training Loss: 2.5423, Eval Loss: 2.4321
Epoch 3/3
Training Loss: 2.2356, Eval Loss: 2.1987
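
After training, you can sample from the fine-tuned model to inspect its output; the prompt below is illustrative:

# Sample a continuation from the fine-tuned model
prompt = "The history of artificial intelligence began"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))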

Summing Up

Fine-tuning Llama 2 for a particular task, whether sentiment analysis, question answering, or text generation, showcases the power of transfer learning. Starting from a large pre-trained model, fine-tuning tailors it to specific use cases with relatively little data and computation. The techniques and examples in this chapter show how versatile Llama 2 is and provide hands-on steps that can be adapted to many different NLP challenges.
