LoRA vs. Full Parameter Fine-Tuning
When should you use LoRA vs. full fine-tuning?
LoRA (and PEFT more generally) and full fine-tuning are both ways to adapt LLMs to a task. When and why should we pick one over the other? First, let's understand the differences.
Full fine-tuning
- Update all parameters of the model.
- Requires lots of GPU memory, long training runs, and careful optimization (see the rough memory estimate after this list).
- Produces a standalone, full-size copy of the model for each task.
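To make the memory point concrete, here is a rough back-of-envelope sketch for a hypothetical 7B-parameter model trained with Adam in mixed precision. The ~16 bytes per trainable parameter figure is a commonly used estimate for weights, gradients, and optimizer state; activation memory is ignored, and the model size and LoRA fraction are illustrative assumptions.

```python
# Rough, illustrative memory estimate: full fine-tuning vs. LoRA training state.
# Assumes mixed-precision Adam at ~16 bytes per trainable parameter
# (fp16 weights + fp16 grads + fp32 master weights + fp32 Adam moments).
# Activation memory is ignored; numbers are order-of-magnitude only.

params = 7e9                   # total parameters in the base model (assumed)
bytes_per_trainable = 16       # common estimate for mixed-precision Adam

full_ft_gb = params * bytes_per_trainable / 1e9
print(f"Full fine-tuning, training state: ~{full_ft_gb:.0f} GB")         # ~112 GB

# LoRA: the base weights stay frozen in fp16 (~2 bytes/param); only the
# low-rank adapters carry gradients and optimizer state.
lora_fraction = 0.005          # assume ~0.5% of parameters are trainable
lora_gb = (params * 2 + params * lora_fraction * bytes_per_trainable) / 1e9
print(f"LoRA, frozen base + adapter training state: ~{lora_gb:.0f} GB")  # ~15 GB
```

Even before counting activations, the full fine-tuning state does not fit on a single 80 GB accelerator, which is roughly why full runs usually need multi-GPU sharding while a LoRA run often fits on one card.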
LoRA (Low-Rank Adaptation)
- Freeze the base model; inject trainable low-rank matrices into attention layers.
- Train far fewer parameters (often under 1% of the total); see the configuration sketch after this list.
- Can load/swap adapters on the fly.
- Nearly identical inference cost to the base model.
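A minimal sketch of what this looks like with the Hugging Face peft library; the model name, rank, and target modules below are illustrative choices, not recommendations:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model as usual; get_peft_model will freeze it.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()
# prints roughly: trainable params: ~4.2M || all params: ~6.7B || trainable%: ~0.06
```

Only the injected low-rank matrices receive gradients; everything else stays frozen, which is where the sub-1% trainable figure comes from.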
These differences matter because they affect:
- Cost (GPU hours, memory)
- Speed (training + inference latency)
- Flexibility (can you swap adapters per task? see the swapping sketch after this list)
- Performance (final accuracy or quality)
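The flexibility point is easiest to see in code. Here is a sketch of per-task adapter swapping with peft, where the adapter paths and names are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model

# Attach one task adapter, then load another onto the same frozen base.
model = PeftModel.from_pretrained(base, "adapters/sql-translation", adapter_name="sql")
model.load_adapter("adapters/sentiment", adapter_name="sentiment")

model.set_adapter("sql")        # requests now route through the SQL adapter
# ... serve SQL-translation traffic ...
model.set_adapter("sentiment")  # switch tasks without reloading the multi-GB base weights
```

Because each adapter is tiny compared to the base weights, keeping several on disk and switching between them per task is cheap.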
Full-parameter fine-tuning is a good fit when:
- The task requires high accuracy and deep, task-specific understanding (e.g., legal or financial document analysis)
- The domain involves specialized vocabulary or complex subject matter, such as medicine, law, or finance
- You need comprehensive adaptation to the new data
- You want the model to forget pretraining quirks (e.g., toxicity, bias)
- You plan to ship a single high-quality model, not many adapter variants of it
LoRA is a good fit when:
- You have limited compute resources (fewer or smaller GPUs)
- You're adapting a foundation model to a narrow task (e.g., sentiment classification, SQL translation); a merging sketch for this case follows this list
- The task is fairly generic and an existing LLM already performs well on it
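If a narrow-task adapter ends up being the only one you need, it can be folded back into the base weights so deployment looks like a single ordinary model. A sketch with peft, the paths again being placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model
model = PeftModel.from_pretrained(base, "adapters/sql-translation")      # placeholder path

# Fold the low-rank update into the base weights; the result is a plain
# transformers model with no adapter machinery at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("llama2-7b-sql-merged")
```

The merged checkpoint can then be served like any other fine-tuned model.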