Claude Code 2.1.0 is officially out! Run claude update to get it.

We shipped:
- Shift+enter for newlines, w/ zero setup
- Add hooks directly to agents & skills frontmatter
- Skills: forked context, hot reload, custom agent support, invoke with /
- Agents no longer stop when you deny a tool use
- Configure the model to respond in your language (e.g. Japanese, Spanish)
- Wildcard support for tool permissions: e.g. Bash(*-h*)
- /teleport your session to http://claude.ai/code
- Overall: 1096 commits

https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md

If you haven't tried Claude Code yet: https://code.claude.com/docs/en/setup

Lmk what you think!
The spread between how one-person dev teams are building software is fascinating:
1. Multiple agents, shipping at inference speed, not reading the code (but very involved in designing it) - some
2. Heavy use of AI IDEs and a single AI agent - many
3. Mostly in the IDE - fewer
opus 4.5 is the model for sync work
gpt 5.2 is the one for async
BREAKING: Google Research just dropped the textbook killer. It's called "Learn Your Way" and it uses LearnLM to transform any PDF into 5 personalized learning formats. Students using it scored 78% vs 67% on retention tests. The education revolution is here.
My 2026 AI predictions podcast with @reidhoffman (LinkedIn cofounder, Microsoft board member, and former OpenAI board member):

Reid's spiciest predictions:
- If you're not recording every single meeting and using agents to amplify your work process, it's going to feel like using a horse and buggy vs. a car.
- AI becomes the scapegoat for everything: electricity prices, eggs, jobs. Most blame will be wrong, but some real impacts will also start hitting. This will make the discourse uglier.
- No major AI player will have a major stumble. It will continue being a close horse race. But OpenAI will learn how to play catch-up instead of always playing with a lead.
- 10x to 100x more people will have their computer doing work for them while they're out doing other things; agents break out beyond coding.
- Apple continues to be behind in AI and the gap will be "stunning."

Dan's spiciest predictions:
- Programming trifurcates into three skills: traditional engineering + AI, vibe coding, and a new third thing, agentic engineering (think highly technical engineer with 4 Claude Code tabs open at once, never looking at code).
- OpenAI realizes it is missing the most valuable coding market because it's stuck in the innovator's dilemma: caught between serving traditional engineers + AI, or agentic engineers.
- Creation becomes the new addiction as the dopamine hit of making things with Claude Code and other tools starts to spread.

AI commandments that are most likely to be broken this year:
- Interpretability: We'll allow models to communicate with each other in non-human-readable formats, and it will work. (Reid)
- Alignment: We'll realize that more disagreeable AIs that form their own opinions are quite useful as autonomy increases. This will be more likely as orchestrators get better: the orchestrator can deal with the pain-in-the-ass model, instead of the user. (Dan)

Reid's pick for most underrated AI category in 2026: Biology. There's a chance we find a "move 37" in bio this year.

Watch below! Timestamps:
- Introduction: 00:00:52
- The future of work is an entrepreneurial mindset: 00:02:20
- Creation is addictive (and that's okay): 00:05:22
- Why discourse around AI might get uglier this year: 00:09:22
- AI agents will break out of coding in 2026: 00:17:03
- What makes Anthropic's Opus 4.5 such a good model: 00:24:18
- Who will win the agentic coding race: 00:28:46
- Why enterprise AI will finally land this year: 00:36:13
- How Reid defines AGI: 00:43:16
- The most underrated category to watch in AI right now: 00:55:33
ICYMI - Claude Code in the Claude Desktop app!

Benefits:
→ Visual session management instead of terminal tabs
→ Parallel sessions via git worktrees
→ Run locally or in the cloud
→ One-click to open in VS Code or CLI

Same Claude Code. Better ergonomics.
This is the new emdash. A dead giveaway of ChatGPT writing. That’s not efficient writing. That’s laziness.
After rewatching Home Alone, I couldn’t stop wondering: how plausible is the oversleep that leaves Kevin behind? So I wrote a tiny paper and ran the numbers. Merry Christmas!
Drops everything to read
Demystifying evals for AI agents
everyone on my timeline is "ralph-pilled" right now. but if you've ever let an ai coding session run for 40–60 minutes, you've felt this: it starts repeating itself, undoing its own fixes, and confidently going in circles.

most explanations of ralph are either:
• terminal priestcraft, or
• "it's just a loop lol" (true, but missing the point)

@GeoffreyHuntley coined and popularized the technique and wrote the canonical post. start there if you want the origin story and the philosophical framing. this is the 5-minute, copy-paste, no-mysticism version.

## 1) ralph is not "an agent that remembers forever"

ralph is the opposite: it's an agent that forgets on purpose. in geoff's purest description, ralph is literally a bash loop that keeps starting over with fresh context: same task, new brain each iteration (a minimal sketch of such a loop is at the end of this post). the "memory" is not the chat. it's the filesystem + git. if it's not written to a file, it doesn't exist.

## 2) the only insight that matters: context pollution

every ai coding session has a context window (working memory). stuff goes in:
• files it read
• commands it ran
• outputs it produced
• wrong turns it took
• half-baked plans it hallucinated at 2:13am

here's the cursed part: you can keep adding, but you can't delete. failures accumulate like plaque. eventually you hit the familiar symptom cluster:
• repeating itself
• "fixing" the same bug in slightly different ways
• confidently undoing its own previous fix
• circular reasoning, but with commit rights

that's context pollution. once you're there, "try harder" doesn't work. adding more instructions doesn't help. more tokens don't help. more patience doesn't help. once the ball is in the gutter, adding spin doesn't save it. ralph doesn't try to clean the memory. it throws it away and starts fresh.

## 3) if you rotate constantly, how do you make progress?

you externalize state. the trick is simple: progress persists. failures don't.

context (bad for state):
• dies with the convo
• can't be edited, only appended to
• polluted by dead ends
• "memory" can drift

files + git (good for state):
• persists forever
• only what you choose to write
• can be patched / rolled back
• git doesn't hallucinate

each fresh agent starts clean, then reconstructs reality from files.

## 4) the anchor file (source of truth)

every ralph setup needs a single source-of-truth file that survives rotations and tells a brand-new agent what reality currently looks like. in my cursor implementation, that file is ralph_task.md. the rest of the state lives in .ralph/; the other files and their purpose in running ralph correctly:
• guardrails.md: learned constraints ("signs")
• progress.md: what's done / what's next
• errors.log: what blew up
• activity.log: tool usage + token tracking

the loop reads these every iteration. fresh context. persistent state. the loop is not the technique. state hygiene is the technique. format doesn't matter. invariants do.

## 5) why the claude code plugin approach is accidentally anti-ralph

let me be explicit: the claude code plugin approach is accidentally anti-ralph. it keeps pounding the model in a single session until context rots. the session grows until it inevitably falls apart, with no real visibility into context health and no deliberate rotation. claude code treats context rot as an accident. ralph treats it as a certainty, and solves it by starting fresh sessions before pollution builds up: deliberate rotation, not accidental compaction.
ralph assumes pollution is coming and rotates deliberately before it happens. instead of repeating the same mistakes over and over, ralph records failures as guardrails so they don't recur. and while claude code locks you into a single model, the ralph technique should be flexible enough to let you use the right model for the job as conditions change.

## 6) why i built a cursor port (model selection matters)

i built this because cursor lets you extend the agent loop like a real system (scripts, parsers, signals), and because model choice matters in practice. different models fail in different ways. ralph lets you exploit that instead of being stuck with one failure mode. cursor makes it trivial to swap models per iteration. different brains for different failure modes. this is deeply under-discussed compared to "one agent to rule them all."

practical guidance:
• starting a new project → opus (architecture matters)
• stuck on something weird → codex

i'm getting better results on some workloads with gpt-codex models than opus 4.5. vibes? tokenization? inductive bias? the gods? idk. but it's repeatable. and yes, i've used this to port very large repos (tens of thousands of loc) to typescript without it faceplanting every 10 minutes. that's the whole point: long-running implementation work where humans become the bottleneck.

## 7) the architecture (cursor version)

(if you don't care about plumbing, you can skip this section. the only point is that vibes get turned into signals.)

key features:
• accurate, practical token tracking (a proxy, not tokenizer theology)
• gutter detection (same command fails repeatedly, file thrashing)
• real-time monitoring via logs
• interactive model selection

none of this is magic. it's just turning "it's losing it" into mechanics.

## 8) quick start (3 commands, no incense)

repo: https://github.com/agrimsingh/ralph-wiggum-cursor

1) install (this creates .cursor/ralph-scripts/ and initializes .ralph/)
2) write the anchor file
3) run ralph

optional: watch it like it's a fish tank.

## 9) guardrails: how ralph stops repeating the same dumb mistake

ralph will do something stupid. the win condition is not "no mistakes." the win condition is: the same mistake never happens twice. when something breaks, the agent adds a sign to .ralph/guardrails.md. guardrails are append-only. mistakes evaporate. lessons accumulate. the next iteration reads guardrails first. cheap. brutal. effective. it's basically kaizen, but for a golden retriever with a soldering iron.

## 10) "isn't this just slop?"

saw this tweet earlier. fair concern. there are two modes of development:
- exploration: figuring out what to build, experimenting, making architectural decisions
- implementation: building the thing you've already designed

ralph is for #2. if you're exploring, use interactive mode. be deeply involved. make creative decisions. but once you know what you're building - a rest api with these endpoints, a cli with these commands, tests for these functions - that's implementation. that's ralph territory.

"but won't it produce slop?" only if you let it. ralph has:
• checkboxes (explicit success criteria)
• tests (code must pass)
• types (errors get caught)
• guardrails (failures don't repeat)
• git review (you still review everything)

ralph with proper feedback loops produces more consistent code than a tired developer at 2am.

"why wouldn't i want to be involved?" you ARE involved. your role just changes:
• you define what "done" means
• you add constraints when things go wrong
• you review outcomes, not keystrokes
• you decide when to intervene

think of it as steering, not rowing.

## 11) when NOT to use ralph

ralph is for implementation, not exploration. use ralph when the specs are crisp, success is machine-verifiable (tests, types, lint), and the work is bulk execution like crud, migrations, refactors, or porting. it shines when you can clearly define "done" and express it as checkboxes, then let the loop grind through implementation without losing the plot.

don't use ralph when you're still deciding what to build, when taste and judgment matter more than correctness, or when you can't cleanly define what "done" even means. if the real work is thinking, exploring, or making creative decisions, looping is the wrong tool - that's interactive territory. if you can't write checkboxes, you're not ready to loop. you're ready to think.

## 12) the one-liner takeaway

ralph works because it treats ai like a volatile process, not a reliable collaborator. your progress should persist. your failures should evaporate. everything else - loops, scripts, signals - is just furniture around that idea.
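appendix: the loop in code. this is a minimal, hypothetical sketch of the idea from sections 1, 3 and 4 (fresh context every iteration, durable state in files + git). it is not the actual ralph-wiggum-cursor scripts and not geoff's original - "agent-cli" and its "--prompt" flag are placeholders for whatever non-interactive coding agent you run.

```bash
#!/usr/bin/env bash
# hypothetical ralph-style loop: every iteration gets a brand-new agent context.
# the only "memory" is the filesystem + git. "agent-cli" is a placeholder, not a real tool.
set -euo pipefail

mkdir -p .ralph
touch .ralph/guardrails.md .ralph/progress.md .ralph/errors.log .ralph/activity.log

while true; do
  # rebuild the prompt from durable state only: anchor file + guardrails + progress.
  prompt="$(cat ralph_task.md .ralph/guardrails.md .ralph/progress.md)"

  # fresh brain each run: nothing from the previous iteration is carried over.
  if ! agent-cli --prompt "$prompt" >>.ralph/activity.log 2>>.ralph/errors.log; then
    echo "$(date -u +%FT%TZ) iteration failed, see errors.log" >>.ralph/progress.md
  fi

  # progress persists, failures evaporate: commit whatever landed on disk.
  git add -A
  git commit -m "ralph iteration $(date -u +%FT%TZ)" >/dev/null 2>&1 || true

  # stop once the anchor file has no unchecked checkboxes left.
  grep -q '\[ \]' ralph_task.md || break
done
```

the real versions add the plumbing from section 7 (token tracking, gutter detection, per-iteration model selection); the loop itself stays this dumb on purpose.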
everyone is locked in right now. new year, new me, new energy. but after the motivation fades, reality kicks in. so you're locked in... now what?

this is where most people lose it. not because they're lazy, but because no one teaches you what to actually do with your day. this is not about grinding 24/7 or turning into a robot, just small habits that help you stop wasting time and start seeing progress.

Start your day offline
most people are fond of picking their phone up first to doomscroll ct, or check discord or telegram for what happened while they were sleeping. don't let that be your first action when you wake up in the morning. if you're a man/woman of faith, make a small prayer, and allow your brain 20-30 mins of quiet time before you let the world inside. that quiet time is where you gain clarity for the day. but if you wake up and it's social media content, you'll mostly react to what you see and not use it to build yourself.

Decide on one thing that actually matters for each day
you don't need to do 10 things per day to call it a productive day. you only need to prioritize one thing and do it to the T 👌🏾. you maximize your time each day when you write down a goal you're trying to achieve and work only on just that.

Do the hard thing first
our brain is wired to always avoid the hard stuff and go for the easier ones. that task you're avoiding is the one you need to do first. once you do it, you'll feel a huge burden has been taken off your shoulders. if you push it to later, it follows you all day like a crying child.

Limit Your Inputs
you do not need to know everything happening on ct today. check ct less if you don't have any business on here, unless you're a creator that needs to be in the algorithm. doomscrolling all day without any meaningful interaction is just wasting time. lock in is about focus, not awareness.

Produce Something Every Day
i have made it a habit that i must make one meaningful post per day. it is not much, but it's honest work. make it a habit that something must leave you daily, be it a post, a line of code, a design, a document, or anything. if nothing comes out of you, nothing compounds. consumption is neutral; production moves the needle.

Live a good lifestyle
living a good lifestyle is very underrated in a space that prioritizes grinding 24/7. it's not healthy sitting at your desk all day and bending your back like a 96-year-old grandpa 👴. take a walk, stretch a bit, or go to the gym and lift weights. your mind works better when your body is in good shape and not tired.

Review each day at night
before you sleep, ask these three simple questions: what did i do well? what did i waste time on? what should i do better tomorrow? this is not self-hate or anything, just getting feedback. the same way you were assessed back in college, you need to do this daily to ensure you stay on track.

SLEEEPPPPP
you're not Optimus Prime; your body gets tired and weak when you don't rest. rest is also part of the system. you cannot lock in on 4-5 hours of sleep per day, you're lowkey killing yourself. go to bed on time and wake up early to start your day.

TL;dr
lock in is not about doing everything. it's about doing the right things every day. use your 24 hours well. do the boring bits. stay consistent. that's how real momentum is built. if you stick with it long enough, people will call you lucky.
I am not sure if other developers feel like this. But I feel kinda depressed. Like everyone else, I have been using Claude code (for a while, it’s not a recent thing lol). And it’s incredible. I have never found coding more fun. The stuff you can do and the speed you can do it at now. Is absolutely insane. And I’m using it to ship a lot. And solve customer problems faster. So all around it’s a win. But at the same time. The skill I spent 10,000s of hours getting good at. Programming. The thing I spent most of my life getting good at. Is becoming a full commodity extremely quickly. As much fun as it is. And as much as I like using the tools. There’s something disheartening about the thing you spent most of your life getting good at. Now being mostly useless.
I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to use Claude Code: we intentionally build it in a way that you can use it, customize it, and hack it however you like. Each person on the Claude Code team uses it very differently. So, here goes.
I'm not joking and this isn't funny. We have been trying to build distributed agent orchestrators at Google since last year. There are various options, not everyone is aligned... I gave Claude Code a description of the problem, it generated what we built last year in an hour.
We need a shorthand way of saying: "An AI did the work, but I vouch for the result" Saying "I did it" feels slightly sketchy, but saying "Claude did it" feels like avoiding responsibility
A few days ago I shared a life calendar I built: your entire life, shown as weeks on your iPhone lock screen. A lot of people asked for it, so here it is: https://thelifecalendar.com I also added a yearly view to visualize the progress of the current year. Happy New Year
Weft. I wanted a personal board where agents could pick up some of my daily tasks. Think Trello, but agents do your tasks for you. Agents can connect to Gmail, Docs, GitHub, or any MCP server. All tasks have access to a Sandbox with Claude Code. Built on @CloudflareDev
If you have multiple interests, don't let anyone convince you that you should narrow your focus. You may be confused for a while, but if you stick it out, you will blow past everyone else.