A new fine-tuning technique aims to solve “catastrophic forgetting,” a limitation that often complicates repeated model updates in enterprise deployments.
Researchers at MIT, the Improbable AI Lab, and ETH Zurich have introduced a fine-tuning method designed to let models learn new tasks while preserving previously acquired capabilities.
To prevent degrading existing capabilities, many organizations isolate new tasks into separate fine-tuned models or adapters. That fragmentation increases costs and adds governance complexity, requiring teams to continually retest models to avoid regression.
The new technique, called self-distillation fine-tuning (SDFT), is designed to address that tradeoff.
The researchers said that SDFT “leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills.”
They added that it consistently outperforms supervised fine-tuning (SFT) “across skill learning and knowledge acquisition tasks,” achieving higher new-task accuracy “while substantially reducing catastrophic forgetting.”
In experiments, the researchers found the method enables models to accumulate new skills sequentially while preserving performance on prior tasks, a capability that could simplify how enterprises update and specialize production models over time.
The need and the solution
Despite rapid advances in foundation models, most enterprise AI systems remain static once deployed. Prompting and retrieval can adjust behavior at inference time, but the model’s parameters do not change to internalize new skills or knowledge.
As a result, each new fine-tuning cycle risks catastrophic forgetting, where gains on a new task degrade performance on earlier ones.
“To enable the next generation of foundation models, we must solve the problem of continual learning: enabling AI systems to keep learning and improving over time, similar to how humans accumulate knowledge and refine skills throughout their lives,” the researchers noted.
Reinforcement learning offers a way to train on data generated by the model’s own policy, which reduces forgetting. However, it typically requires explicit reward functions, which are difficult to specify for many tasks.
SDFT offers an alternative. Instead of inferring a reward function, it uses the model’s in-context learning ability to generate on-policy learning signals from demonstrations.
During training, the same model plays two roles. A teacher version is conditioned on both the query and expert examples. A student version sees only the query, reflecting real-world deployment. The student updates its parameters to align with the teacher’s predictions on its own generated outputs.
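That teacher-student loop can be illustrated with a short sketch. The Python snippet below (using PyTorch and Hugging Face Transformers) shows one hypothetical on-policy self-distillation step, not the researchers’ actual implementation: the model name, prompts, token-level KL loss, and hyperparameters are placeholders chosen for illustration.

```python
# Minimal sketch of one on-policy self-distillation step (illustrative only).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; SDFT assumes a base model with strong in-context learning
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

demonstration = "Q: What is 2 + 2?\nA: 4\n"   # expert example seen only by the teacher
query = "Q: What is 3 + 5?\nA:"               # what the deployed student actually sees

# 1) The student (query only) generates an on-policy rollout.
student_prompt = tokenizer(query, return_tensors="pt")
with torch.no_grad():
    rollout = model.generate(
        **student_prompt, max_new_tokens=8, do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
completion = rollout[:, student_prompt["input_ids"].shape[1]:]
c_len = completion.shape[1]

# 2) The teacher is the same model conditioned on demonstration + query;
#    it scores the student's completion token by token (no gradients).
teacher_prompt = tokenizer(demonstration + query, return_tensors="pt")
teacher_ids = torch.cat([teacher_prompt["input_ids"], completion], dim=1)
with torch.no_grad():
    teacher_logits = model(teacher_ids).logits[:, -c_len - 1:-1, :]

# 3) The student scores the same completion without the demonstration and is
#    updated to match the teacher's token distribution via a KL distillation loss.
student_ids = torch.cat([student_prompt["input_ids"], completion], dim=1)
student_logits = model(student_ids).logits[:, -c_len - 1:-1, :]

loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In a real training run this step would loop over batches of queries and demonstrations, and the actual SDFT objective and sampling details may differ; the sketch only conveys the core idea that the training signal comes from the model’s own rollouts rather than from static labels.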
“In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations,” the researchers said.
Challenges to overcome
SDFT appears practical because it removes the need to maintain “model zoos” of separate adapters or fine-tuned variants, according to Lian Jye Su, chief analyst at Omdia.
Whether that translates into commercial deployment, however, remains to be seen, as several challenges persist.
For instance, SDFT requires significantly more training time and roughly 2.5 times the computing power of standard SFT. It also depends on sufficiently capable base models with strong in-context learning ability.
Sanchit Vir Gogia, chief analyst at Greyhound Research, also warned that SDFT does not eliminate the need for regression infrastructure. Because the model learns from its own generated rollouts, enterprises must ensure reproducibility through strict version control and artifact logging.
“Consolidation shifts operational complexity from model count to governance depth,” Gogia said.
According to Su, those costs can be offset by what SDFT avoids: catastrophic forgetting of key context and the complex reward functions that reinforcement learning requires. Still, it may be a while before the technique reaches enterprises. “SDFT will most likely be experimented with first for internal developer tools and general assistants where the risk of a ‘self-taught error’ will be lower than in regulated domains like financial or medical decision-making,” said Faisal Kawoosa, founder and lead analyst at Techarc.