Stanford Alpaca
#

Introduction
#

Stanford Alpaca GitHub repository

  • Stanford Alpaca is an instruction-following LLaMA model.
  • The repo aims to build and share an instruction-following LLaMA model. It contains:
    • The 52K instruction-following data used for fine-tuning the model.
    • The code for generating the data.
    • The code for fine-tuning the model.
    • The code for recovering Alpaca-7B weights from our released weight diff.

Overview
#

  • The current “Alpaca 7B model” is fine-tuned from a “7B LLaMA” model on 52K instruction-following data generated by the techniques in the Self-Instruct paper.
  • The Alpaca 7B model behaves similarly to the text-davinci-003 model on the Self-Instruct instruction-following evaluation suite.
  • Alpaca is still under development, and there are many limitations that have to be addressed.
  • Alpaca is not yet fine-tuned to be safe and harmless.
  • The initial release contains the data generation procedure, dataset, and training recipe.
  • Model weights can be released if the creators of LLaMA give permission.
  • A live demo is available to help readers better understand the capabilities and limits of Alpaca.
  • Based on the following papers:
    • LLaMA: Open and Efficient Foundation Language Models. Hugo2023
    • Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong2022
  • Data Release
    • alpaca_data.json contains the 52K instruction-following examples used for fine-tuning the Alpaca model. This JSON file is a list of dictionaries; each dictionary contains the following fields: instruction, input, and output (the answer generated by text-davinci-003).
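
As a rough illustration of the record layout described above, the sketch below parses a small in-memory sample shaped like alpaca_data.json and renders each record as a single prompt string. The sample records, the `format_record` helper, and the prompt layout are assumptions made for illustration; they are not taken from the repository. Treating an empty input as "no extra context" is likewise an assumption.

```python
import json

# Two made-up records in the same shape as alpaca_data.json (not real dataset rows).
SAMPLE_JSON = """
[
  {"instruction": "Translate the word to French.", "input": "cat", "output": "chat"},
  {"instruction": "Name a primary color.", "input": "", "output": "Red"}
]
"""

REQUIRED_FIELDS = ("instruction", "input", "output")


def format_record(record: dict) -> str:
    """Render one record as a prompt; the layout here is a hypothetical template."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    parts = [f"Instruction: {record['instruction']}"]
    # Assumption: an empty input means the task needs no extra context, so skip it.
    if record["input"]:
        parts.append(f"Input: {record['input']}")
    parts.append(f"Response: {record['output']}")
    return "\n".join(parts)


records = json.loads(SAMPLE_JSON)
prompts = [format_record(r) for r in records]
print(prompts[0])
```

To run the same check on the real file, replace `json.loads(SAMPLE_JSON)` with `json.load(open("alpaca_data.json"))`.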

High-level Activities of the Alpaca Project
#

High-level activities carried out by the Stanford Alpaca team, and the project output:

  1. Data Generation: The team used OpenAI’s text-davinci-003, a GPT-3.5-series model, to generate a dataset of 52,000 instruction-response pairs. Following the Self-Instruct approach, the model was prompted to produce new instructions together with their corresponding responses.

  2. Fine-Tuning: They used this generated dataset to fine-tune Meta’s LLaMA model, making it better at following instructions.

  3. Evaluation: The fine-tuned Alpaca model was then evaluated for its ability to follow instructions effectively, comparing its performance to more advanced models.

Output: The project produced Alpaca, a fine-tuned version of the 7B LLaMA model that follows instructions well despite its relatively small size.

Capabilities
#

This model can perform the following tasks.

Data Generation
#

Output
#

  • This process produced an instruction-following dataset with 52K examples obtained at a much lower cost (less than $500).
  • The 52K generated examples are much more diverse than the data released by Self-Instruct.

Process
#

  • Built on the data generation pipeline from Self-Instruct and made the following modifications:
    • Used text-davinci-003 to generate the instruction data instead of davinci.
    • Wrote a new prompt (prompt.txt) that explicitly gave the requirement of instruction generation to text-davinci-003.
    • Adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation.
    • Simplified the data generation pipeline by discarding the difference between classification and non-classification instructions.
    • Only generated a single instance for each instruction.
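
To make the batch-decoding idea concrete, here is a minimal sketch that asks for several instructions in one prompt and splits a numbered completion back into individual items. The prompt wording, the `build_batch_prompt` and `split_numbered_items` helpers, and the sample completion are all hypothetical. The actual prompt lives in prompt.txt, and no API call is made here.

```python
import re


def build_batch_prompt(seed_instructions: list[str], num_to_generate: int = 20) -> str:
    """Build one prompt that requests many new instructions at once (hypothetical wording)."""
    lines = [f"Come up with {num_to_generate} diverse task instructions.", "Examples:"]
    lines += [f"- {s}" for s in seed_instructions]
    lines.append(f"New instructions, numbered 1 to {num_to_generate}:")
    return "\n".join(lines)


def split_numbered_items(completion: str) -> list[str]:
    """Split a completion of the form '1. ...' / '2. ...' into separate instructions."""
    items = re.findall(r"^\s*\d+\.\s*(.+?)\s*$", completion, flags=re.MULTILINE)
    return [item for item in items if item]


prompt = build_batch_prompt(["Summarize the paragraph.", "Sort the list of numbers."])

# A made-up completion standing in for the model's reply.
fake_completion = (
    "1. Rewrite the sentence in passive voice.\n"
    "2. List three uses for a paperclip.\n"
    "3. Explain what a hash map is."
)
instructions = split_numbered_items(fake_completion)
print(len(instructions), "instructions parsed from one request")
```

Requesting many items per call means the shared part of the prompt is paid for once per batch rather than once per instruction, which is where the cost saving comes from.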

Fine-tuning
#

The team fine-tuned the models using standard Hugging Face training code. LLaMA-7B and LLaMA-13B were fine-tuned with the following hyperparameters:

| Hyperparameter | LLaMA-7B | LLaMA-13B |
|----------------|----------|-----------|
| Batch size     | 128      | 128       |
| Learning rate  | 2e-5     | 1e-5      |
| Epochs         | 3        | 5         |
| Max length     | 512      | 512       |
| Weight decay   | 0        | 0         |

Fine-tuning Dependencies and LLaMA Installation
#

To reproduce the fine-tuned model, first install the requirements:

pip install -r requirements.txt

Command to finetune
#

  • Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 4 A100 80G GPUs in FSDP full_shard mode.
  • We were able to reproduce a model of similar quality as the one we hosted in our demo with the following command using Python 3.10.
  • Replace <your_random_port> with a port of your own, <your_path_to_hf_converted_llama_ckpt_and_tokenizer> with the path to your converted checkpoint and tokenizer (following instructions in the PR), and <your_output_dir> with where you want to store your outputs.
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True
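
The flags above can be related to the batch size of 128 in the hyperparameter table with some quick arithmetic. The sketch below is a back-of-the-envelope check. It assumes the effective batch size is GPUs × per-device batch × accumulation steps, and it takes the dataset size as a round 52,000, so the exact step count reported by the trainer may differ slightly.

```python
import math

num_gpus = 4                      # --nproc_per_node
per_device_batch = 4              # --per_device_train_batch_size
grad_accum_steps = 8              # --gradient_accumulation_steps
num_epochs = 3                    # --num_train_epochs
warmup_ratio = 0.03               # --warmup_ratio
dataset_size = 52_000             # rounded figure; an assumption for this estimate

# Samples consumed per optimizer step across all GPUs.
effective_batch = num_gpus * per_device_batch * grad_accum_steps
steps_per_epoch = math.ceil(dataset_size / effective_batch)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"effective batch size: {effective_batch}")
print(f"approx. optimizer steps: {total_steps} (warmup ~{warmup_steps})")
```

When changing the number of GPUs, adjusting `--gradient_accumulation_steps` so that the product stays at 128 keeps the run comparable to the table.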

The same script also works for OPT fine-tuning. Here’s an example for fine-tuning OPT-6.7B:

torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path "facebook/opt-6.7b" \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
    --tf32 True

Addressing OOM
#

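  • Naively, fine-tuning a 7B model requires about 7 x 4 x 4 = 112 GB of VRAM: 7 billion parameters, 4 bytes per 32-bit value, and presumably 4 copies of each parameter (the weights, the gradients, and the two Adam optimizer states).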

  • Commands given above enable parameter sharding, so no redundant model copy is stored on any GPU. If you’d like to further reduce the memory footprint, here are some options:

  • Turn on CPU offload for FSDP with --fsdp "full_shard auto_wrap offload". This saves VRAM at the cost of longer runtime.

  • DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP with offload. Here’s an example to use DeepSpeed stage-3 with 4 GPUs with both parameter and optimizer offload:

pip install deepspeed
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --deepspeed "./configs/default_offload_opt_param.json" \
    --tf32 True

The DeepSpeed library also provides some helpful functions to estimate memory usage.

  • LoRA fine-tunes low-rank slices of the query, key, and value embedding heads. This can reduce the total memory footprint from 112 GB to about 7 x 4 = 28 GB.
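
The memory figures in this section can be reproduced with a short calculation. The sketch below is only an estimate under stated assumptions: full fine-tuning is taken to keep four 32-bit copies per parameter (weights, gradients, and two Adam moment buffers), while the LoRA figure counts a single 32-bit copy of the frozen weights. The 4096 x 4096 layer and rank 8 in the second half are hypothetical values chosen to show how few values a low-rank update trains.

```python
def training_memory_gb(params_billion: float, bytes_per_param: int, copies: int) -> float:
    """Rough memory estimate in GB: parameters x bytes per value x number of copies."""
    return params_billion * bytes_per_param * copies


# Full fine-tuning: assumed 4 copies (weights, gradients, two Adam moment buffers).
full_gb = training_memory_gb(7, bytes_per_param=4, copies=4)

# LoRA estimate from the text: one 32-bit copy of the frozen weights dominates.
lora_gb = training_memory_gb(7, bytes_per_param=4, copies=1)


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable values in a rank-r update B @ A for a d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


# Hypothetical layer: a 4096 x 4096 projection with a rank-8 update.
dense = 4096 * 4096
low_rank = lora_params(4096, 4096, rank=8)
print(f"full: {full_gb} GB, LoRA: {lora_gb} GB")
print(f"low-rank update trains {low_rank / dense:.2%} of the dense layer's values")
```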

Recovering Alpaca Weights
#

To recover the Alpaca-7B weights from the released weight diff:

  1. Convert Meta’s released weights into Hugging Face format. Follow this guide: https://huggingface.co/docs/transformers/main/model_doc/llama
  2. Make sure you have cloned the released weight diff onto your local machine. The weight diff is located at: https://huggingface.co/tatsu-lab/alpaca-7b/tree/main
  3. Run the recovery function with the correct paths, e.g.:

python weight_diff.py recover --path_raw <path_to_step_1_dir> --path_diff <path_to_step_2_dir> --path_tuned <path_to_store_recovered_weights>

Once step 3 completes, you should have a directory with the recovered weights, from which you can load the model as follows:

import transformers
alpaca_model = transformers.AutoModelForCausalLM.from_pretrained("<path_to_store_recovered_weights>")
alpaca_tokenizer = transformers.AutoTokenizer.from_pretrained("<path_to_store_recovered_weights>")
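
Conceptually, recovering weights from a diff amounts to adding the released difference back onto the base weights, parameter by parameter. The toy sketch below illustrates that idea with plain Python lists. The parameter names and values are made up, and it is an assumption that the released diff is a simple element-wise difference. This is not the code in weight_diff.py.

```python
def make_diff(tuned: dict, raw: dict) -> dict:
    """Element-wise tuned - raw for every named parameter."""
    return {name: [t - r for t, r in zip(tuned[name], raw[name])] for name in tuned}


def recover(raw: dict, diff: dict) -> dict:
    """Element-wise raw + diff, which undoes make_diff."""
    return {name: [r + d for r, d in zip(raw[name], diff[name])] for name in diff}


# Toy "checkpoints" with hypothetical parameter names and integer values.
raw_weights = {"layer0.weight": [1, 2, 3], "layer0.bias": [0, 0, 1]}
tuned_weights = {"layer0.weight": [2, 2, 5], "layer0.bias": [1, 0, 1]}

diff = make_diff(tuned_weights, raw_weights)
recovered = recover(raw_weights, diff)
print(recovered == tuned_weights)
```

Distributing only the diff means the fine-tuned model cannot be reconstructed without access to the original LLaMA weights.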
