MIT: Cracking AI's memory problem: how reinforcement learning avoids "learning the new while forgetting the old"

2025-09-10 19:28:02 HKT

Information source:
https://www.marktechpost.com/2025/09/08/a-new-mit-study-shows-reinforcement-learning-minimizes-catastrophic-forgetting-compared-to-supervised-fine-tuning/

A fundamental challenge facing artificial intelligence systems may finally have an answer. New research from MIT finds that reinforcement learning can significantly reduce "catastrophic forgetting" in large language models and robotic systems, a result that could reshape how AI systems learn continuously. The study proposes, for the first time, a mathematical rule for quantifying the degree of forgetting and reveals how strongly the choice of training method affects an AI system's ability to retain what it has learned.

Catastrophic forgetting is a core pain point in deploying today's AI systems: when a model learns a new task, the skills and knowledge it had previously mastered are often lost, severely limiting the practicality and scalability of AI systems. By comparing the two mainstream training approaches, reinforcement learning and supervised fine-tuning, the research team found an unexpected result: although both methods can achieve similar performance on new tasks, reinforcement learning preserves the original knowledge far better.

The researchers propose a concise yet powerful "empirical forgetting law", which states that the degree of forgetting is proportional to the KL divergence between the fine-tuned model and the base model. This finding not only provides a tool for quantifying how well AI systems retain knowledge; more importantly, it points the way toward designing better training algorithms.
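In notation chosen here for illustration (the article describes the law only in words), the relationship can be written as follows, with the base policy, the fine-tuned policy, and the inputs of the new task as the three ingredients:

```latex
% Illustrative statement of the empirical forgetting law described above.
% \pi_0 is the base policy, \pi the fine-tuned policy; the expectation is over
% inputs x from the new task. The direction follows the article's "forward KL"
% phrasing and is an assumption of this rendering.
\[
  \text{Forgetting}(\pi) \;\propto\;
  \mathrm{KL}\!\left(\pi_0 \,\|\, \pi\right)
  \;=\;
  \mathbb{E}_{x}\!\left[\, \sum_{y} \pi_0(y \mid x)\,
    \log \frac{\pi_0(y \mid x)}{\pi(y \mid x)} \right]
\]
```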

Experimental verification spans multiple AI fields

To verify this theory, the research team designed comprehensive experiments covering natural language processing, robotic control, and computer vision. In the large language model experiments, the researchers used Qwen 2.5 3B-Instruct as the base model and fine-tuned it on tasks such as mathematical reasoning, scientific question answering, and tool use. The results show that models trained with reinforcement learning mastered the new skills while maintaining stable performance on standard benchmarks such as HellaSwag, MMLU, TruthfulQA, and HumanEval.

In contrast, although the supervised fine-tuned models also performed well on the new tasks, their performance on the original benchmarks declined significantly. This contrast clearly demonstrates the fundamental difference in knowledge retention between the two training methods.

Robot control experiments further confirm the generality of this finding. The research team used the OpenVLA-7B model to train pick-and-place tasks in the SimplerEnv environment. The results show that the robot system trained with reinforcement learning retained good general manipulation ability after learning new skills, whereas the system trained with supervised fine-tuning improved on the new task at the cost of degraded general capability.

To understand the mechanism behind this phenomenon more deeply, the research team designed a simplified experimental environment called ParityMNIST. In this controlled setup, the researchers could precisely measure and analyze how different training methods affect model behavior. The results not only reproduce the phenomena observed in complex systems; more importantly, they verify the predictive relationship between KL divergence and the degree of forgetting, providing a solid experimental basis for the theoretical analysis.
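As a rough illustration of what such a measurement can look like in practice, the sketch below computes the forward KL divergence between a base model and a fine-tuned model on new-task inputs. It assumes a simple classifier-style interface (models mapping an input batch to logits); the function and variable names are illustrative and are not taken from the study's code.

```python
# A minimal sketch (not the study's code) of measuring forward KL divergence
# KL(pi_0 || pi) between a frozen base model and a fine-tuned model on inputs
# drawn from the new task -- the quantity the article says predicts forgetting.
import torch
import torch.nn.functional as F

@torch.no_grad()
def forward_kl_on_new_task(base_model, tuned_model, new_task_batches):
    """Average KL(base || tuned) over batches of new-task inputs.

    Assumes both models map a batch of inputs to logits of shape
    (batch_size, num_classes); adapt as needed for sequence models.
    """
    kls = []
    for x in new_task_batches:
        log_p0 = F.log_softmax(base_model(x), dim=-1)   # log pi_0(y | x)
        log_p = F.log_softmax(tuned_model(x), dim=-1)   # log pi(y | x)
        # KL(pi_0 || pi) = sum_y pi_0(y|x) * (log pi_0(y|x) - log pi(y|x))
        kl = F.kl_div(log_p, log_p0, log_target=True, reduction="batchmean")
        kls.append(kl)
    return torch.stack(kls).mean()
```

Per the empirical law described above, a smaller value of this quantity would predict less forgetting on previously learned tasks.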

In-depth analysis of the theoretical mechanism

The research team's theoretical analysis reveals the deeper mechanism behind this phenomenon. The on-policy update scheme used by reinforcement learning is naturally conservative: during training, the model samples from its own generated outputs and gradually adjusts its behavior based on the reward signal. This keeps the learning process close to the base model's distribution and thereby reduces damage to the original knowledge.

Supervised fine-tuning adopts a completely different optimization strategy: it optimizes toward fixed target labels, which may lie far from the base model's output distribution. Although this "forced pull" can quickly meet the performance requirements of the new task, it often comes at the expense of the original knowledge.
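The contrast between the two update rules can be made concrete with a minimal sketch, assuming a simple classifier-style model; the function names, optimizer handling, and reward interface are illustrative rather than the study's actual training code.

```python
# Illustrative contrast between supervised fine-tuning and an on-policy
# policy-gradient update for a single training step (simplified sketch).
import torch
import torch.nn.functional as F

def sft_step(model, optimizer, x, target_label):
    """Supervised fine-tuning: pull the model toward fixed target labels,
    however far they are from the model's current output distribution."""
    loss = F.cross_entropy(model(x), target_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def on_policy_rl_step(model, optimizer, x, reward_fn):
    """On-policy policy gradient: sample from the model's own distribution and
    reinforce sampled actions by their reward, so updates stay anchored to
    behavior the model already produces."""
    dist = torch.distributions.Categorical(logits=model(x))
    action = dist.sample()              # sampled from the current policy itself
    reward = reward_fn(x, action)       # scalar or per-sample reward signal
    loss = -(dist.log_prob(action) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```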

Theoretical analysis further shows that policy gradient algorithms have the mathematical property of converging to the KL-minimal solution among policies that solve the new task. This property explains, at a theoretical level, why reinforcement learning can minimize interference with the original knowledge while maintaining performance. The research team summarizes this finding as "RL's Razor", emphasizing reinforcement learning's natural advantage in knowledge preservation.
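In the same illustrative notation as above, this "razor" property can be summarized as follows, where the set of reward-optimal policies on the new task is an assumed symbol of this rendering:

```latex
% Illustrative summary of "RL's Razor": among the policies \Pi^* that attain
% optimal reward on the new task, on-policy policy-gradient training converges
% toward the one that is KL-closest to the base policy \pi_0.
\[
  \pi_{\mathrm{RL}} \;\approx\;
  \arg\min_{\pi \in \Pi^{*}} \; \mathrm{KL}\!\left(\pi_0 \,\|\, \pi\right)
\]
```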

The researchers also tested a variety of alternative explanations, including changes in weight space, changes in hidden representations, update sparsity, and other distribution metrics such as reverse KL divergence, total variation distance, and L2 distance. However, none of these indicators matched the predictive accuracy of forward KL divergence, further confirming that distributional proximity is the key factor determining the degree of forgetting.
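For reference, the sketch below shows how such candidate distances are typically computed for two categorical distributions; the setup is deliberately simplified relative to the study's full-model measurements, and the names are illustrative.

```python
# Illustrative computation of the candidate predictors mentioned above for two
# probability vectors p0 (base model) and p (fine-tuned model) over the same
# set of outcomes. This is a generic sketch, not the study's evaluation code.
import torch

def candidate_metrics(p0: torch.Tensor, p: torch.Tensor) -> dict:
    eps = 1e-12  # numerical guard against log(0)
    return {
        "forward_kl": torch.sum(p0 * (torch.log(p0 + eps) - torch.log(p + eps))),
        "reverse_kl": torch.sum(p * (torch.log(p + eps) - torch.log(p0 + eps))),
        "total_variation": 0.5 * torch.sum(torch.abs(p0 - p)),
        "l2_distance": torch.linalg.vector_norm(p0 - p),
    }
```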

Far-reaching impact on the development of AI

The impact of this study goes well beyond its theoretical contribution. First, it prompts a rethink of how AI systems are evaluated. Traditional evaluation focuses mainly on a model's performance on a specific task while ignoring knowledge retention. The results suggest that both new-task performance and KL conservatism should be considered when evaluating AI systems, providing a scientific basis for a more comprehensive evaluation framework.

From a practical standpoint, this discovery offers important guidance for developing AI systems intended for long-term deployment. In practice, AI systems often need to keep learning new skills and knowledge while retaining their existing abilities. Although traditional supervised fine-tuning is efficient to train, its limitations in continual learning scenarios have become an important factor restricting the practicality of AI systems.

The hybrid training method proposed by the research team offers a new way to address this problem: by combining the efficiency of supervised fine-tuning with the knowledge retention of reinforcement learning, and by explicitly optimizing under KL divergence constraints, better trade-offs between new-task performance and retention become possible. This approach opens a new technical path toward AI systems capable of lifelong learning.
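A minimal sketch of such a hybrid objective is shown below, assuming a classifier-style model and a frozen copy of the base model; the penalty coefficient beta and the overall structure are illustrative choices, not the team's published recipe.

```python
# Illustrative KL-regularized supervised fine-tuning loss: the task loss plus a
# forward-KL penalty that discourages drifting away from the frozen base model.
import torch
import torch.nn.functional as F

def kl_regularized_sft_loss(model, base_model, x, target_label, beta=0.1):
    logits = model(x)                         # current (trainable) model
    with torch.no_grad():
        base_logits = base_model(x)           # frozen base model as the anchor
    task_loss = F.cross_entropy(logits, target_label)
    # KL(base || current), matching the forward-KL predictor discussed above
    kl_penalty = F.kl_div(
        F.log_softmax(logits, dim=-1),
        F.log_softmax(base_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return task_loss + beta * kl_penalty
```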

In specific application areas, this discovery has direct practical value. For large language models, the ability to learn domain-specific knowledge while maintaining general capabilities will greatly enhance their value in professional applications. For robot systems, mastering new manipulation skills while retaining basic operating capabilities is a key requirement for genuine intelligence.

The future direction of technological development

This study provides a new theoretical framework and practical guidance for continual learning in AI systems, and it also opens further research directions worth exploring. How to improve learning efficiency while preserving knowledge, how to design more refined KL constraint mechanisms, and how to extend these principles to more complex multimodal and multitask learning scenarios are all important topics for future research.

From an engineering perspective, translating these theoretical findings into practical training algorithms and tools is also an important technical challenge. The KL divergence measurement proposed by the research team provides a tool for quantifying forgetting, but further engineering work is still needed to effectively monitor and control this indicator during actual training.

More broadly, this study underscores the value of fundamental theoretical research for the development of AI technology. By deeply understanding the mechanisms of learning and forgetting in AI systems, we can not only improve existing training methods but also provide a scientific basis for designing next-generation AI systems. As AI systems are applied in more fields, the demands on their continual learning ability will keep rising, and this research provides an important theoretical basis and technical direction for meeting those demands.
