LoRA vs. Full Parameter Fine-Tuning

When should you use LoRA, and when full parameter fine-tuning?

LoRA (and parameter-efficient fine-tuning, or PEFT, more broadly) and full parameter fine-tuning are both ways to adapt LLMs to new tasks. When and why should we pick one over the other? First, let's understand the differences.

Full fine-tuning

  • Update all parameters of the model.
  • Requires lots of GPU memory, long training runs, and careful optimization.
  • Produces a complete, standalone model checkpoint per task.
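
In practice, "update all parameters" means the optimizer tracks state for every weight in the model, which is where much of the memory cost comes from. A minimal PyTorch sketch of this setup (using gpt2 as a small stand-in for a larger model):

```python
import torch
from transformers import AutoModelForCausalLM

# "gpt2" stands in here for whatever larger model you are tuning
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Full fine-tuning: every parameter receives gradients, and AdamW
# keeps two extra state tensors per parameter, so optimizer state
# alone costs roughly 2x the weight memory (before activations).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

n_params = sum(p.numel() for p in model.parameters())
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable: {n_trainable:,} / {n_params:,} (100%)")
```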

LoRA (Low-Rank Adaptation)

  • Freeze the base model; inject trainable low-rank matrices alongside the existing weights, typically in the attention projections.
  • Train far fewer parameters (often <1%).
  • Can load/swap adapters on the fly.
  • Nearly identical inference cost to the base model.

These differences matter because they affect:

  • Cost (GPU hours, memory)
  • Speed (training and inference latency)
  • Flexibility (can you swap adapters per task?)
  • Performance (final accuracy or quality)
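
To see where the "often under 1%" figure above comes from: a rank-r adapter on a d-by-k weight matrix trains r*(d+k) parameters instead of d*k, so at r=8 a 4096x4096 projection trains roughly 0.4% of the layer. A minimal sketch using Hugging Face's peft library (gpt2 again as a small stand-in; the rank and target modules are illustrative choices):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

# Freeze the base weights and inject rank-8 update matrices into the
# attention projections. GPT-2 fuses q/k/v into "c_attn" (a Conv1D,
# hence fan_in_fan_out); LLaMA-style models use ["q_proj", "v_proj"].
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # rank of the low-rank factors
    lora_alpha=16,     # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],
    fan_in_fan_out=True,
)
model = get_peft_model(base, config)

# Typically reports well under 1% trainable parameters.
model.print_trainable_parameters()
```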

Full parameter fine-tuning is good for:

  • Tasks requiring high accuracy and deep task-specific understanding (e.g., legal or financial document analysis).
  • Specialized vocabulary or complex subject matter, such as medicine, law, or finance.
  • Comprehensive adaptation to the new data, beyond a light behavioral shift.
  • Cases where you want the model to unlearn pretraining quirks (e.g., toxicity, bias).
  • Shipping a single high-quality model rather than many per-task variants (adapters).

LoRA is good for:

  • Resource-constrained settings (limited GPU memory or budget).
  • Adapting a foundation model to a narrow task (e.g., sentiment classification, SQL translation); several such tasks can share one base model, as sketched below.
  • Tasks generic enough that the base LLM already performs well and only needs a light adaptation.
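
When you do have several narrow tasks, the flexibility point from earlier becomes concrete: one base model can host multiple adapters and switch between them at request time. A sketch with peft, where the adapter paths are hypothetical placeholders for checkpoints you have already trained:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for a larger model

# Hypothetical local adapter checkpoints, one per task.
model = PeftModel.from_pretrained(base, "adapters/sentiment", adapter_name="sentiment")
model.load_adapter("adapters/sql", adapter_name="sql")

model.set_adapter("sentiment")  # route sentiment-classification requests
# ... run inference ...
model.set_adapter("sql")        # swap tasks without reloading the base model
```

Because each adapter holds only the low-rank matrices, it is typically a small fraction of the base model's size, so many adapters can share a single copy of the base weights in memory.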