"SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs"
The podcast below on this paper was generated with Google's Illuminate.
https://arxiv.org/abs/2502.02909
The challenge is enabling LLMs to learn new tasks continually without losing old knowledge (catastrophic forgetting), especially under resource constraints.
This paper proposes SPARC, a subspace-guided prompt-tuning framework that lets LLMs learn continually by adapting prompts in a low-dimensional space derived from PCA.
-----
📌 SPARC's strength is in efficient continual learning. It cleverly uses PCA to identify task subspaces. Orthogonal prompt initialization then ensures minimal interference. This approach significantly reduces parameter updates, making it highly scalable.
📌 PCA-based subspace identification is key. It allows SPARC to adapt prompts in a low-dimensional space. This focuses learning on task-relevant features. The cosine similarity metric effectively guides prompt reuse or orthogonalization.
📌 By freezing most LLM parameters, SPARC preserves pre-trained knowledge. Fine-tuning only soft prompts achieves strong performance. This highlights the effectiveness of targeted, subspace-aware adaptation for continual learning in LLMs.
----------
Methods Explored in this Paper 🔧:
→ SPARC uses prompt tuning. It adapts the LLM to new tasks by prepending small, trainable vectors called soft prompts to the input, while the main LLM parameters stay frozen (see the first sketch after this list).
→ Principal Component Analysis (PCA) is used to find the important features of each task's data, compressing it into a lower-dimensional subspace that captures the most task-relevant information.
→ SPARC checks if new tasks are similar to old ones using cosine similarity. Cosine similarity measures the overlap between task subspaces. If tasks are similar, SPARC reuses existing prompts. This saves computation and helps transfer knowledge.
→ For dissimilar tasks, SPARC initializes new prompts in orthogonal subspaces. Because these subspaces do not overlap with earlier ones, learning a new task does not interfere with old tasks, which prevents forgetting (see the second sketch after this list).
→ SPARC is parameter-efficient. It only trains the soft prompts, which are a tiny fraction of the LLM's parameters. This makes it scalable and resource-friendly. SPARC can also integrate with LoRA for further efficiency.
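
A minimal sketch of the prompt-tuning setup, in PyTorch. The toy backbone, the prompt length of 20, and the embedding size of 768 are illustrative assumptions, not the paper's exact configuration; the point is that only the soft-prompt matrix receives gradients.

```python
# Minimal sketch of soft-prompt tuning with a frozen backbone (PyTorch).
# The toy backbone and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class SoftPromptedLM(nn.Module):
    def __init__(self, backbone, prompt_len=20, embed_dim=768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False              # freeze every backbone weight
        # the only trainable parameters: prompt_len soft-prompt vectors
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds):
        # prepend the soft prompt to every sequence in the batch
        prompt = self.soft_prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return self.backbone(torch.cat([prompt, input_embeds], dim=1))

# toy stand-in for a pre-trained transformer that consumes embeddings directly
layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
model = SoftPromptedLM(nn.TransformerEncoder(layer, num_layers=2))

x = torch.randn(4, 16, 768)                      # (batch, seq_len, embed_dim)
out = model(x)                                   # -> (4, 16 + 20, 768)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)                                 # 15,360 = 20 * 768, the soft prompt only
```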
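
A second sketch covers the subspace logic: per-task PCA, a cosine-similarity overlap check, and the reuse-or-orthogonalize decision. The overlap metric (mean cosine of the principal angles) and the 0.7 threshold are assumptions for illustration; the paper's exact similarity computation and threshold may differ.

```python
# Sketch of the subspace logic: PCA per task, similarity check, reuse vs. orthogonal init.
# The overlap metric and the 0.7 threshold are illustrative assumptions.
import numpy as np

def task_subspace(embeddings, k=8):
    """Top-k PCA basis of a task's data, returned as a (dim, k) matrix."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def subspace_similarity(U_old, U_new):
    """Mean cosine of the principal angles between two subspaces (1 = identical, ~0 = orthogonal)."""
    return float(np.linalg.svd(U_old.T @ U_new, compute_uv=False).mean())

def init_prompt(U_old, U_new, old_prompt, prompt_len=20, threshold=0.7):
    """Reuse the old prompt for similar tasks; otherwise start a new prompt
    in the orthogonal complement of the old task's subspace."""
    if subspace_similarity(U_old, U_new) >= threshold:
        return old_prompt.copy()                       # similar task: reuse and refine
    dim = U_old.shape[0]
    rand = np.random.randn(prompt_len, dim) * 0.02
    # project out everything lying in the old subspace -> minimal interference
    return rand - rand @ U_old @ U_old.T

# toy usage with random "task embeddings"
old_emb, new_emb = np.random.randn(500, 64), np.random.randn(500, 64)
U_old, U_new = task_subspace(old_emb), task_subspace(new_emb)
old_prompt = np.random.randn(20, 64) * 0.02
new_prompt = init_prompt(U_old, U_new, old_prompt)
print(round(subspace_similarity(U_old, U_new), 3))     # random data: well below the threshold
```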
-----
Key Insights 💡:
→ Subspace-guided prompt tuning effectively mitigates catastrophic forgetting in LLMs during continual learning.
→ Reusing prompts for similar tasks and creating orthogonal prompts for dissimilar tasks balances knowledge retention and adaptation.
→ PCA helps in identifying task-relevant features and creating efficient, low-dimensional prompts.
→ SPARC is highly parameter-efficient, requiring fine-tuning of only a small percentage of model parameters. This makes continual learning more practical for large models.
-----
Results 📊:
→ SPARC achieves no forgetting in task-incremental learning.
→ In domain-incremental learning, the average forgetting ratio is below 5%.
→ SPARC retains over 97% of prior knowledge.
→ SPARC fine-tunes only 0.04% of the model's parameters. When combined with LoRA, it fine-tunes only 1% of parameters while improving accuracy.
→ SPARC consistently outperforms baseline fine-tuning methods in knowledge retention and transfer.