Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
Best AI Papers Explained - Podcast by Enoch H. Kang
This episode discusses a research paper on improving **Large Language Model (LLM) performance on tasks requiring long-term conversational memory**. The authors address limitations in existing evaluation methods with a new framework that automatically generates **long, coherent conversations of up to 10 million tokens**, and with **BEAM**, a benchmark of 100 dialogues and 2,000 probing questions designed to test ten distinct memory abilities, including contradiction resolution and temporal reasoning. To enhance LLMs, the authors propose **LIGHT**, a human-cognition-inspired framework that integrates three complementary memory systems: episodic memory, working memory, and a scratchpad for salient facts. Experiments show that even state-of-the-art LLMs degrade as dialogues lengthen, while LIGHT **consistently improves performance** across a range of models.
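The three-part memory design can be illustrated with a minimal sketch. This is not the paper's actual implementation; the class, method names, and the overwrite-based handling of contradictory facts are all illustrative assumptions about how episodic memory, a bounded working memory, and a scratchpad might be combined:

```python
# Hedged sketch of a three-component conversational memory, loosely
# inspired by the LIGHT framework described above. All names here are
# illustrative assumptions, not the paper's API.

from collections import deque


class ConversationalMemory:
    """Combines episodic memory, bounded working memory, and a scratchpad."""

    def __init__(self, working_size: int = 4):
        self.episodic = []                         # full history of past turns
        self.working = deque(maxlen=working_size)  # most recent turns only
        self.scratchpad = {}                       # salient facts keyed by topic

    def observe(self, turn: str) -> None:
        """Record a dialogue turn in both episodic and working memory."""
        self.episodic.append(turn)
        self.working.append(turn)

    def note_fact(self, key: str, fact: str) -> None:
        """Store a salient fact; later facts overwrite earlier ones,
        a crude stand-in for contradiction resolution."""
        self.scratchpad[key] = fact

    def context(self) -> str:
        """Assemble the context a model would condition on."""
        facts = "; ".join(f"{k}: {v}" for k, v in self.scratchpad.items())
        recent = " | ".join(self.working)
        return f"facts[{facts}] recent[{recent}]"


mem = ConversationalMemory(working_size=2)
mem.observe("User: I live in Oslo.")
mem.note_fact("home", "Oslo")
mem.observe("User: Actually, I moved to Bergen.")
mem.note_fact("home", "Bergen")
mem.observe("User: What's the weather like at home?")
print(mem.context())
```

Here the old turn about Oslo falls out of the working window, and the scratchpad keeps only the latest value for "home", so the assembled context reflects the corrected fact rather than the contradicted one.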
