
Information source:
https://www.marktechpost.com/2025/09/08/a-new-mit-study-shows-reinforcement-learning-minimizes-catastrophic-forgetting-compared-to-supervised-fine-tuning/
A fundamental challenge facing artificial intelligence systems may finally have an answer. A recent MIT study finds that reinforcement learning (RL) can significantly reduce "catastrophic forgetting" in large language models and robotic systems, a result that could redefine the continual-learning capabilities of AI systems. The study proposes, for the first time, a mathematical rule for quantifying the degree of forgetting, and it reveals how different training methods fundamentally affect an AI system's ability to retain knowledge.
Catastrophic forgetting is a core pain point in deploying current AI systems: when a model learns a new task, the skills and knowledge it had previously mastered are often lost, which severely limits the practicality and scalability of AI systems. Comparing the two mainstream training methods, reinforcement learning and supervised fine-tuning (SFT), the research team found an unexpected result: although both methods achieve similar performance on new tasks, reinforcement learning retains far more of the original knowledge.
The researchers propose a concise and powerful "empirical forgetting law," which states that the degree of forgetting is proportional to the KL divergence between the fine-tuned policy and the original base policy. This finding not only provides a tool for quantifying an AI system's ability to retain knowledge but, more importantly, points the way toward designing better training algorithms.
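Schematically, and under the interpretation that the divergence is the forward KL between the base policy and the fine-tuned policy evaluated on the new-task inputs (an assumption based on the forward-KL metric discussed later in this article, not a quotation of the paper's exact formula), the law can be written as:

```latex
% Hedged sketch of the empirical forgetting law; the exact constant and the
% distribution over which the KL is evaluated follow the interpretation
% described above, not a verbatim formula from the paper.
\[
  \text{Forgetting} \;\propto\; D_{\mathrm{KL}}\!\left(\pi_0 \,\|\, \pi\right)
  \;=\; \mathbb{E}_{x \sim \mathcal{D}_{\text{new}}}
  \left[ \sum_{y} \pi_0(y \mid x)\,\log \frac{\pi_0(y \mid x)}{\pi(y \mid x)} \right],
\]
where $\pi_0$ is the base model, $\pi$ is the fine-tuned model, and
$\mathcal{D}_{\text{new}}$ is the new-task input distribution.
```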
Experimental verification spans multiple AI fields

To verify this theory, the research team designed comprehensive experiments covering natural language processing, robotic control, and computer vision. In the large language model experiments, the researchers used Qwen 2.5 3B-Instruct as the base model and fine-tuned it on tasks such as mathematical reasoning, scientific question answering, and tool use. The results show that models trained with reinforcement learning master the new skills while maintaining stable performance on standard benchmarks such as HellaSwag, MMLU, TruthfulQA, and HumanEval.
In contrast, although the supervised fine-tuned model also performs well on the new tasks, its performance on the original benchmarks declines significantly. This contrast clearly demonstrates the fundamental difference in knowledge retention between the two training methods.
Robot control experiments further confirm the generality of this finding. The research team used the OpenVLA-7B model and trained it on object pick-and-place tasks in the SimplerEnv environment. The results show that the robot system trained with reinforcement learning retains good general manipulation capabilities after learning the new skills, whereas the supervised fine-tuned system's general capabilities degrade even as its new-task performance improves.
To understand the mechanism behind this phenomenon more deeply, the research team designed a simplified experimental environment called ParityMNIST. In this controlled setting, the researchers could precisely measure and analyze how different training methods affect model behavior. The results not only reproduce the phenomena observed in complex systems but, more importantly, verify the predictive relationship between KL divergence and the degree of forgetting, providing a solid experimental basis for the theoretical analysis.
In-depth analysis of the theoretical mechanism

The research team's theoretical analysis reveals the deeper mechanism behind this phenomenon. The on-policy update scheme used by reinforcement learning is naturally conservative: during training, the model samples from its own generated outputs and adjusts its behavior gradually through the reward signal. This update scheme naturally confines learning to a region close to the base model's distribution, thereby reducing damage to the original knowledge.
Supervised fine-tuning adopts a completely different optimization strategy. It optimizes toward fixed target labels, which may lie far from the base model's output distribution. Although this "forced pull" toward the targets can quickly meet the performance requirements of the new task, it often comes at the expense of the original knowledge.
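As an illustration of this difference (a minimal, self-contained sketch with a toy categorical policy, not the study's training code), the following contrasts a supervised update toward a fixed target label with an on-policy REINFORCE-style update that only reweights the model's own samples by reward:

```python
# Toy illustration (not the paper's code): a contrast between supervised
# fine-tuning toward a fixed label and an on-policy REINFORCE-style update
# on a small categorical policy, measuring how far each drifts from the base.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

n_actions = 5
base_logits = rng.normal(size=n_actions)
base_probs = softmax(base_logits)

target = 3           # the "new task" label / rewarded action
lr, steps = 0.5, 50

# --- Supervised fine-tuning: push probability mass onto the fixed target ---
sft_logits = base_logits.copy()
for _ in range(steps):
    p = softmax(sft_logits)
    grad = p.copy()
    grad[target] -= 1.0            # gradient of cross-entropy w.r.t. logits
    sft_logits -= lr * grad

# --- On-policy RL: sample from the current policy, reward the target action ---
rl_logits = base_logits.copy()
for _ in range(steps):
    p = softmax(rl_logits)
    a = rng.choice(n_actions, p=p)
    reward = 1.0 if a == target else 0.0
    grad = -reward * (np.eye(n_actions)[a] - p)   # REINFORCE gradient (ascent form)
    rl_logits -= lr * grad

print("KL(base || SFT policy):", kl(base_probs, softmax(sft_logits)))
print("KL(base || RL policy): ", kl(base_probs, softmax(rl_logits)))
```

Because the RL update only moves the policy where its own samples earn reward, it tends to drift less from the base distribution than the supervised update, which pulls directly toward the fixed label regardless of where the base model's mass lies.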
Theoretical analysis further shows that policy gradient algorithms have the mathematical property of converging to KL-minimal solutions. This property explains, at a theoretical level, why reinforcement learning can minimize interference with the original knowledge while still reaching the target performance. The research team summarizes this finding as "RL's Razor," emphasizing reinforcement learning's natural advantage in knowledge preservation.
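Stated schematically (a paraphrase of the claim as described here, with the precise conditions and KL direction simplified, not the paper's exact theorem), the razor says that among policies that solve the new task equally well, on-policy training tends to converge to the one closest in KL to the base policy:

```latex
% Schematic statement of "RL's Razor" as described in the text;
% the precise assumptions and KL direction are simplified here.
\[
  \pi_{\mathrm{RL}} \;\approx\;
  \operatorname*{arg\,min}_{\pi \,:\, J_{\text{new}}(\pi) \,=\, J_{\text{new}}^{*}}
  D_{\mathrm{KL}}\!\left(\pi_0 \,\|\, \pi\right),
\]
where $J_{\text{new}}$ is the new-task objective and $\pi_0$ is the base policy.
```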
The researchers also tested a variety of alternative explanations, including changes in weight space, changes in hidden representations, update sparsity, and other distributional metrics such as reverse KL divergence, total variation distance, and L2 distance. However, none of these indicators matched the predictive accuracy of forward KL divergence, further confirming that distributional proximity is the key factor determining the degree of forgetting.
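For reference (a minimal sketch of the candidate metrics named above, computed between two discrete probability vectors; this is not the study's evaluation code), the compared distributional quantities can be written as:

```python
# Standard definitions of the distributional metrics mentioned above,
# for a base distribution p and a fine-tuned distribution q (toy example).
import numpy as np

def forward_kl(p, q):
    """KL(p || q): expectation taken under the base distribution p."""
    return float(np.sum(p * np.log(p / q)))

def reverse_kl(p, q):
    """KL(q || p): expectation taken under the fine-tuned distribution q."""
    return float(np.sum(q * np.log(q / p)))

def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * float(np.sum(np.abs(p - q)))

def l2_distance(p, q):
    """Euclidean distance between the probability vectors."""
    return float(np.linalg.norm(p - q))

p = np.array([0.70, 0.20, 0.10])   # base model's distribution on some input
q = np.array([0.50, 0.30, 0.20])   # fine-tuned model's distribution
for name, fn in [("forward KL", forward_kl), ("reverse KL", reverse_kl),
                 ("total variation", total_variation), ("L2", l2_distance)]:
    print(f"{name}: {fn(p, q):.4f}")
```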
Far-reaching impact on the development of AI
The impact of this study goes far beyond its theoretical contribution. First, it prompts a rethinking of how AI systems are evaluated. Traditional evaluation methods focus mainly on a model's performance on a specific task while ignoring its ability to retain knowledge. The results suggest that evaluations should consider both new-task performance and KL conservatism, providing a scientific basis for a more comprehensive AI evaluation framework.
From the perspective of practical applications, this finding provides important guidance for AI systems that must be deployed over the long term. In practice, AI systems often need to keep learning new skills and knowledge while maintaining their existing abilities. Although traditional supervised fine-tuning is highly efficient to train, its limitations in continual-learning scenarios have become an important factor restricting the practicality of AI systems.
The hybrid training approach proposed by the research team offers a new way to address this problem. By combining the efficiency of supervised fine-tuning with the knowledge-retention properties of reinforcement learning, and by explicitly optimizing under a KL divergence constraint, better trade-offs between new-task performance and knowledge retention become possible. This approach opens a new technical path toward AI systems capable of lifelong learning.
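One common way to express such a KL-constrained objective (a hedged sketch of a KL-regularized cross-entropy loss, assuming Hugging Face-style causal language models that expose `.logits`; the study's actual hybrid procedure may differ) adds a penalty that keeps the fine-tuned model's token distribution close to a frozen copy of the base model:

```python
# Hedged sketch of a KL-regularized supervised fine-tuning loss:
# cross-entropy on the new-task labels plus a penalty on the KL divergence
# to a frozen base model. Not the study's actual algorithm.
import torch
import torch.nn.functional as F

def kl_regularized_sft_loss(model, base_model, input_ids, labels, beta=0.1):
    """Cross-entropy on new-task labels + beta * KL(base || fine-tuned)."""
    logits = model(input_ids).logits                  # (batch, seq, vocab)
    with torch.no_grad():
        base_logits = base_model(input_ids).logits    # frozen base model

    # Standard next-token cross-entropy against the fixed labels.
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100
    )

    # Token-level KL between the base distribution and the current distribution.
    log_p = F.log_softmax(logits, dim=-1)             # fine-tuned log-probs
    base_log_p = F.log_softmax(base_logits, dim=-1)   # base log-probs
    # F.kl_div(input, target, log_target=True) computes KL(target || input),
    # i.e. KL(base || fine-tuned) here.
    kl = F.kl_div(log_p, base_log_p, log_target=True, reduction="batchmean")

    return ce + beta * kl
```

Here `beta` (a hypothetical hyperparameter, not a value from the paper) trades off new-task fit against staying close to the base model.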
In specific application areas, this finding has direct practical value. For large language models, the ability to learn domain-specific knowledge while maintaining general capabilities would greatly enhance their value in professional applications. For robotic systems, the ability to master new manipulation skills while retaining basic operating capabilities is a key requirement for real-world intelligence.
The future direction of technological development
This study not only provides a new theoretical framework and practical guidance for the continual-learning ability of AI systems, but also opens up further research directions worth exploring: how to improve learning efficiency while preserving knowledge, how to design more refined KL-constraint mechanisms, and how to extend these principles to more complex multimodal and multitask learning scenarios are all important topics for future research.
From an engineering perspective, turning these theoretical findings into practical training algorithms and tools is also an important challenge. The KL divergence measurement proposed by the research team provides a tool for quantifying forgetting, but further engineering work is still needed to monitor and control this indicator effectively during actual training.
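As a sketch of what such monitoring could look like in practice (a hypothetical helper written for illustration, not an existing tool from the study), one can track a running estimate of the model's KL divergence from the base model on held-out new-task prompts and flag drift past a chosen budget:

```python
# Hypothetical monitoring helper (not an existing tool from the study):
# tracks an exponential moving average of per-batch KL(base || current)
# and reports when it exceeds a practitioner-chosen budget.
class KLDriftMonitor:
    def __init__(self, budget, momentum=0.99):
        self.budget = budget        # maximum tolerated KL to the base model
        self.momentum = momentum    # smoothing factor for the running average
        self.running_kl = 0.0
        self.initialized = False

    def update(self, batch_kl):
        """Record one per-batch KL estimate; return True if the budget is exceeded."""
        if not self.initialized:
            self.running_kl = batch_kl
            self.initialized = True
        else:
            self.running_kl = (
                self.momentum * self.running_kl + (1 - self.momentum) * batch_kl
            )
        return self.running_kl > self.budget

# Schematic usage inside a training loop:
# monitor = KLDriftMonitor(budget=0.05)
# if monitor.update(batch_kl_estimate):
#     print("Warning: KL drift from base model exceeds budget.")
```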
More broadly, this study underscores the value of fundamental theoretical research to the development of AI technology. By deeply understanding how AI systems learn and forget, we can not only improve existing training methods but also provide a scientific basis for designing the next generation of AI systems. As AI systems are applied in ever more fields, the demands on their continual-learning ability will only grow, and this research provides an important theoretical foundation and technical direction for meeting those demands.