
Building @dair_ai • Prev: Meta AI, Elastic, PhD • New cohort: dair-ai.thinkific.com/courses/claude-code-for-everyone-2…
Great paper on Agentic Memory.

LLM agents need both long-term and short-term memory to handle complex tasks. However, the default approach today treats these as separate components, each with its own heuristics, controllers, and optimization strategies.

But memory isn't two independent systems. It's one cognitive process that decides what to store, retrieve, summarize, and forget.

This new research introduces AgeMem, a unified framework that integrates long-term and short-term memory management directly into the agent's policy through tool-based actions.

Instead of relying on trigger-based rules or auxiliary memory managers, the agent learns when and how to invoke memory operations: ADD, UPDATE, DELETE for long-term storage, and RETRIEVE, SUMMARY, FILTER for context management.

It uses a three-stage progressive RL strategy. First, the model learns long-term memory storage. Then it masters short-term context management. Finally, it coordinates both under full task settings.

To handle the fragmented experiences from memory operations, they design a step-wise GRPO (Group Relative Policy Optimization) that transforms cross-stage dependencies into learnable signals.

The results across five long-horizon benchmarks:
> On Qwen2.5-7B, AgeMem achieves 41.96 average score compared to 37.14 for Mem0, a 13% improvement.
> On Qwen3-4B, the gap widens: 54.31 vs 44.70. Adding long-term memory alone provides +10-14% gains.
> Adding RL training adds another +6%.
> The full unified system with both memory types achieves up to +21.7% improvement over no-memory baselines.

The unified memory management through learnable tool-based actions outperforms fragmented heuristic pipelines, enabling agents to adaptively decide what to remember and forget based on task demands.

Paper: https://arxiv.org/abs/2601.01885

Learn to build effective AI agents in our academy: https://dair-ai.thinkific.com
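To make the idea of memory operations as tool calls more concrete, here is a minimal Python sketch. It is not the paper's implementation: only the six operation names (ADD, UPDATE, DELETE, RETRIEVE, SUMMARY, FILTER) come from the tweet above. The `MemoryAgentState` class, the function signatures, the keyword-match retrieval, and the truncation-based summary are all assumptions made for illustration. In AgeMem the policy learns when to call each tool; here the calls are hard-coded.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryAgentState:
    """Hypothetical container: a long-term store plus a short-term context."""
    long_term: dict[str, str] = field(default_factory=dict)
    context: list[str] = field(default_factory=list)


# --- Long-term storage tools (names from the tweet, bodies assumed) ---
def add(state: MemoryAgentState, key: str, text: str) -> None:
    state.long_term[key] = text


def update(state: MemoryAgentState, key: str, text: str) -> None:
    if key not in state.long_term:
        raise KeyError(key)
    state.long_term[key] = text


def delete(state: MemoryAgentState, key: str) -> None:
    state.long_term.pop(key, None)


# --- Context management tools (names from the tweet, bodies assumed) ---
def retrieve(state: MemoryAgentState, query: str) -> list[str]:
    """Copy long-term entries that mention the query into the context."""
    hits = [t for t in state.long_term.values() if query.lower() in t.lower()]
    state.context.extend(hits)
    return hits


def summary(state: MemoryAgentState, max_chars: int = 80) -> str:
    """Collapse the context into one truncated line (stand-in for an LLM summary)."""
    state.context = [" | ".join(state.context)[:max_chars]]
    return state.context[0]


def filter_context(state: MemoryAgentState, keyword: str) -> list[str]:
    """Drop context entries that do not mention the keyword."""
    state.context = [c for c in state.context if keyword.lower() in c.lower()]
    return state.context


TOOLS = {
    "ADD": add, "UPDATE": update, "DELETE": delete,
    "RETRIEVE": retrieve, "SUMMARY": summary, "FILTER": filter_context,
}


def apply_action(state: MemoryAgentState, name: str, **kwargs):
    """Dispatch one tool call, as an agent policy would emit it."""
    return TOOLS[name](state, **kwargs)


# Hard-coded action sequence standing in for a learned policy.
state = MemoryAgentState()
apply_action(state, "ADD", key="user_lang", text="User prefers Python")
apply_action(state, "ADD", key="deadline", text="Report due Friday")
apply_action(state, "UPDATE", key="deadline", text="Report due Monday")
hits = apply_action(state, "RETRIEVE", query="report")
apply_action(state, "DELETE", key="user_lang")
```

The point of the single `apply_action` dispatcher is that long-term and short-term operations share one action space, which is the unification the tweet describes.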
This is happening mad fast! I started to realize this when moving all my workflows to Claude Code Skills. Painful at first, but then suddenly moving at speeds never imaginable. I hear more companies are embracing skills, which accelerates things even more. Good read!
I tried Codex on ChatGPT today. Claude Code is just irreplaceable to me at this point. And with this new Skills feature, the edge it gives is just too good to pass on. I am sure Codex will get better. Will keep trying future iterations. What’s your experience?
I understand where Karpathy is coming from. Honestly, the sparsity and rapid progress don't bother me at all. I try not to make it a race. It's wide open now, and creative solutions and workflows can come from anywhere and anyone. And this is not just happening in coding, it's also happening in research and lots of knowledge-intensive domains. You spend a couple of hours on Claude Code, and you quickly realize how much more capable you are than you thought you were. That's what keeps me going. It's also a good opportunity to go deeper into areas you would otherwise not have the time for. Domain expertise is a force multiplier. I would encourage people to keep experimenting and sharing notes. Spend at least 2 hours a day playing around with tools like Claude Code. Try to build systems that compound over time. Always be thinking about how to inject the best context for the agents. Context engineering is where the game is intensifying, and literally anyone can contribute to it. We are all trying to figure it out. Just keep an open mind. Tight-knit communities are more important than ever. But most importantly, build, build, and build.
Claude Code can now run agents asynchronously. Huge for productivity. You can run many subagents in the background to explore your codebase. Work continues uninterrupted. When subagents complete tasks, they wake up/report to the main agent. Workflows feel faster already!
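The background-subagent pattern described above can be pictured with a small asyncio sketch. This only illustrates the general idea (launch workers, keep going, collect reports as they finish). It does not show how Claude Code implements it, and the `subagent` and `main_agent` functions and the task names are hypothetical.

```python
import asyncio


async def subagent(name: str, delay: float) -> str:
    """Hypothetical subagent: sleeps to simulate exploring part of a codebase."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def main_agent() -> list[str]:
    # Launch subagents in the background; create_task returns immediately.
    tasks = [
        asyncio.create_task(subagent("explore-tests", 0.02)),
        asyncio.create_task(subagent("explore-api", 0.01)),
    ]
    # The main agent is not blocked and can carry on with its own work.
    log = ["main: working while subagents run in the background"]
    # Collect each report as soon as its subagent finishes.
    for finished in asyncio.as_completed(tasks):
        log.append(await finished)
    return log


log = asyncio.run(main_agent())
```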
Brilliant post on using coding agents. The workflow described here is as close as it gets to my own. From creating rules and skills to optimizing workflows, testing, and more.
I love this figure from Anthropic's new talk on "Skills > Agents". Here are my notes: The more skills you build, the more useful Claude Code gets. And it makes perfect sense. Procedural knowledge and continuous learning for the win! Skills are essentially the way you make Claude Code more knowledgeable over time. This is why I argued that Skills is a good name for this functionality. Claude Code acquires new capabilities from domain experts (they are the ones building skills). Claude Code can evolve the skills as needed and forget the ones it doesn't need anymore. It's a collaborative effort, which can easily be expanded to entire teams, communities, and orgs (via plugins). Skills are particularly useful for workflows where information and requirements constantly change. Finance, code, science, and human-in-the-loop workflows are all great use cases for Skills. You can build new Skills using the built-in skill creation tool, so you are always building new skills with all the best practices. Or you can do what I did, which is build my own skill creator to build custom skills catered to the work I do. It's just another level of customization that Skills enables. The flexibility of Skills enables future capabilities to be easily integrated everywhere. Competitors don't have anything remotely close to this type of ecosystem. Anthropic engineers' deep understanding of the importance of better context management tools and agent harnesses is something to admire. Very bullish on Claude Code.