The amazing paper mentioned by Google's tech guru Jeff Dean: Titans. Here is a simple explanation.

Titans gives AI "true memory": like a human, it remembers the important things, forgets the unimportant ones, and keeps learning and memorizing as it is used.

Three impressive features:

1. It solves AI's "goldfish memory" problem.
- Transformers are like a star student: they remember everything clearly but can't hold much in their head at once (only a few thousand words of context).
- Traditional RNNs are like compulsive compressors: they cram everything into one small box and end up remembering almost nothing.
- Titans' solution:
  - Short-term memory: an attention mechanism that precisely processes the content currently in view.
  - Long-term memory: a neural network "brain" that encodes important information into its own parameters.
  - Persistent memory: stores knowledge about the task itself.
Like the human brain, the three kinds of memory each have their own job.

2. It can judge what is worth remembering.
The core innovation draws on the human memory system: surprising events are easier to remember, so the model defines a "surprise" metric.
Reading the news, for example:
- "The weather is nice today" → not surprising, no need to remember it.
- "Life discovered on Mars" → very surprising, write it down immediately.
- Follow-up reports → less surprising by now, but still worth remembering because they relate to an earlier major event.
How Titans does this (a minimal code sketch appears at the end of this section):
- Current surprise: how different is this input from what the model has seen before?
- Historical surprise: has anything significant happened recently?
- Adaptive forgetting: how long should this memory be kept?

3. It learns as it is used and gets smarter the more it is used.
Traditional models are frozen once trained; at test time they can only "recall," not "learn." Titans' memory module keeps updating during testing, adjusting its memory in real time as it sees new content.

How striking were the experimental results?

Extremely long-context comprehension (the needle-in-a-haystack task): find one key piece of information in a 16,000-word document.
- Titans: 96%+ accuracy.
- The strongest competitor, Mamba2: 5.4% (basically guessing).

BABILong's extremely hard reasoning task: reasoning over documents up to a million words long.
- Titans, with fewer than 1/70th the parameters, beat Llama 3.1 70B and even surpassed GPT-4.

It also performs well on routine tasks:
- Language modeling: better than Transformers and all linear RNNs.
- Time-series forecasting: leading results across 7 datasets.
- Gene (DNA) sequence modeling: state-of-the-art (SOTA) results.

Why can't other models do this?
- The Transformer's dilemma: to remember a million words, its memory blows up and the computation becomes infeasible, so it can only look through a fixed-length window.
- The linear RNN's problem: it compresses all of history into a single vector or matrix, like summarizing a whole book into one sentence. Too much information is lost, there is no forgetting mechanism, and over time the "brain" gets muddled.

Titans' advantages:
- Deep memory: a multi-layer neural network as memory is far more expressive than a single matrix.
- Momentum mechanism: look not only at the present but also at recent trends.
- Forgetting gate: forget what should be forgotten, keep what should be kept.
- Parallel training: complex, but not slow.
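To make points 2 and 3 concrete, here is a minimal sketch of a surprise-driven, test-time memory update in the spirit of the paper. It is not the official implementation: the class and method names (`NeuralMemory`, `read`, `write`) are invented for this example, and the learning rate, momentum, and forget gate are fixed scalars here, whereas the paper makes them input-dependent.

```python
import torch
import torch.nn as nn


class NeuralMemory(nn.Module):
    """Long-term memory: a small MLP whose parameters are updated at test time."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        # Deep memory: the MLP's weights are the memory store.
        self.memory = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )
        self.W_k = nn.Linear(dim, dim, bias=False)  # key projection
        self.W_v = nn.Linear(dim, dim, bias=False)  # value projection
        self.W_q = nn.Linear(dim, dim, bias=False)  # query projection (for retrieval)
        # "Past surprise": one momentum buffer per memory parameter.
        self.S = [torch.zeros_like(p) for p in self.memory.parameters()]

    @torch.no_grad()
    def read(self, x: torch.Tensor) -> torch.Tensor:
        # Retrieval is a plain forward pass; it does not change the memory.
        return self.memory(self.W_q(x))

    def write(self, x: torch.Tensor, lr: float = 0.1, eta: float = 0.9, alpha: float = 0.01):
        # "Current surprise": gradient of the associative loss ||M(k) - v||^2.
        # lr / eta (momentum) / alpha (forget gate) are fixed here for simplicity;
        # the paper predicts them from the input at every step.
        k, v = self.W_k(x), self.W_v(x)
        loss = (self.memory(k) - v).pow(2).mean()
        grads = torch.autograd.grad(loss, list(self.memory.parameters()))
        with torch.no_grad():
            for p, g, s in zip(self.memory.parameters(), grads, self.S):
                s.mul_(eta).add_(g, alpha=-lr)  # past surprise (momentum) minus scaled current surprise
                p.mul_(1.0 - alpha).add_(s)     # forgetting gate, then write the update


# Illustrative test-time loop: retrieve with the memory, then memorize the new content.
mem = NeuralMemory(dim=64)
for _ in range(10):
    chunk = torch.randn(8, 64)   # stand-in for incoming token embeddings
    retrieved = mem.read(chunk)  # use long-term memory for the current step
    mem.write(chunk)             # update the memory ("learning at test time")
```

The `write` step is literally one step of gradient descent with momentum and decay applied to the memory's parameters, which is what "still learning during testing" means here.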
Technical ingenuity: turning "learning" into "memory." The memory module essentially performs gradient descent, but it does so during testing, which makes it equivalent to a "meta-learner." Many existing methods are unified as special cases (illustrated in the sketch at the end of this post):
- Mamba's forget gate? A special case of Titans.
- DeltaNet's delta rule? A simplified version of Titans.
- TTT-style test-time training? Titans adds momentum and forgetting on top.

Why is this work important?
It opens a new line of thinking: instead of simply "making the model bigger" or "optimizing attention," it rethinks the architecture from the perspective of a memory system. It also addresses real pain points: long-document analysis, long-video understanding, and continual-learning scenarios.

One last analogy:
- Transformer = photographic memory: it remembers everything it sees, but can only look at a small part at a time.
- Traditional RNN = note-taking: it summarizes everything into a few sentences, but loses the details.
- Titans = a human brain:
  - Short-term memory: processes the current information.
  - Long-term memory: stores important experiences.
  - Meta-memory: knows how to learn, and forgets the unimportant things.

What makes it strong?
1. It remembers more: it scales to 2-million-token contexts, where other models would have collapsed long ago.
2. It remembers more accurately: it knows what is important and what should be forgotten.
3. It gets smarter the more it is used: it is still learning at test time.
4. The theory holds up: there are mathematical proofs as well as experiments.
5. The experiments are very impressive: results at or near state-of-the-art (SOTA) across tasks.

That's really awesome!
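To illustrate the unification claim above, here is a hedged sketch that replaces the deep memory with a single matrix M. The function name and hyperparameters are invented for the example, and the listed special cases are conceptual correspondences rather than exact derivations from each paper.

```python
import torch


def linear_memory_step(M, S, k, v, lr=0.1, eta=0.9, alpha=0.01):
    """One test-time update of a matrix-valued memory M with momentum buffer S."""
    # Surprise: the gradient of ||M k - v||^2 with respect to M is 2 (M k - v) k^T.
    grad = 2.0 * torch.outer(M @ k - v, k)
    S = eta * S - lr * grad    # past surprise (momentum) plus current surprise
    M = (1.0 - alpha) * M + S  # forgetting gate, then write
    return M, S


# Tiny usage example with random keys/values.
d = 16
M, S = torch.zeros(d, d), torch.zeros(d, d)
k, v = torch.randn(d), torch.randn(d)
M, S = linear_memory_step(M, S, k, v)

# Conceptual special cases:
# - eta = 0, alpha = 0:  M <- M - lr * (M k - v) k^T, a delta-rule update in the
#   spirit of DeltaNet.
# - lr = 0, alpha > 0:   M <- (1 - alpha) * M, a pure gated decay reminiscent of
#   Mamba-style forget gates.
# - eta = 0, alpha = 0 with a deep (MLP) memory instead of a matrix: close to
#   TTT-style test-time training; Titans adds the momentum and forgetting terms.
```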