A
Aider logoAiderTeach a small open model your task on a single GPU

Fine-Tune an Open Model on One GPU (QLoRA)

setuproll@setuproll
84.0Overall score

A budget fine-tuning pipeline that adapts an open model to your domain with QLoRA, fitting training onto a single consumer GPU and exporting straight to a GGUF you can run in Ollama. For builders who have a few thousand labeled examples and want a specialized model without renting a cluster.

84.0Score
980Votes
5Components

Install this build

Export
terminal
pip install unsloth && python train.py --model llama-3.1-8b --4bit

Components

Model

  • Llama 3.1 8B
  • Qwen3 8B
  • Gemma 3 12B

Stack

  • Unsloth
  • TRL
  • PEFT
  • bitsandbytes
  • Weights & Biases

Hardware

  • 1x RTX 4090 24GB
  • 16GB works for 8B with 4-bit

Export

  • llama.cpp GGUF convert
  • Ollama Modelfile

How it works

  • Format your examples as instruction or chat JSONL
  • Unsloth loads the base model in 4-bit and trains LoRA adapters
  • Track loss and eval samples in Weights & Biases as it runs
  • Merge, convert to GGUF, and serve the tuned model in Ollama

Summary

A budget fine-tuning pipeline that adapts an open model to your domain with QLoRA, fitting training onto a single consumer GPU and exporting straight to a GGUF you can run in Ollama. For builders who have a few thousand labeled examples and want a specialized model without renting a cluster.

84.0 score 980 votes

0 Reviews

Your rating
Sign in to post

Loading discussion...