We doomscroll, you upskill.
Finding signal on X is harder than ever. We curate high-value insights on AI, Startups, and Product so you can focus on what matters.
95 tweets
The @ilyasut episode

0:00:00 – Explaining model jaggedness
0:09:39 – Emotions and value functions
0:18:49 – What are we scaling?
0:25:13 – Why humans generalize better than models
0:35:45 – Straight-shotting superintelligence
0:46:47 – SSI’s model will learn from deployment
0:55:07 – Alignment
1:18:13 – “We are squarely an age of research company”
1:29:23 – Self-play and multi-agent
1:32:42 – Research taste

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!
We're releasing a visual agent & workflow builder

Fully open source
Built on http://useworkflow.dev
Outputs "use workflow" code
Supports AI "text to workflow"
Powered by @aisdk & AI Elements
Sample integrations (@resend, @linear, @slack)

Clone & ship your own product, or embed AI workflow building capabilities into existing ones.

Demo: http://workflow-builder.dev
Deploy: http://vercel.com/templates/ai/workflow-builder…
JUST IN: A report shows OpenAI needs to raise at least $207 billion by 2030 to stay in business, and even then it would still be losing money.
guys what gpt-5 model should i use in cursor

gpt-5.1 codex
gpt-5.1 codex mini
gpt 5.1 codex high
gpt 5.1 codex fast
gpt 5.1 codex high fast
gpt 5.1 codex low
gpt 5.1 codex low fast
gpt 5.1 fast
gpt 5.1 high
gpt 5.1 high fast
gpt 5.1 low
gpt 5.1 low fast
gpt 5 codex high
gpt 5 codex fast
gpt 5 codex high
gpt 5 codex high fast
gpt-5.1
gpt 5 codex
gpt-5
gpt 5 fast
gpt 5 medium
gpt 5 medium fast
gpt 5 high
gpt 5 high fast
gpt 5 low
gpt 5 low fast
gpt 5.1 codex mini high
gpt 5.1 codex mini low
gpt-5-mini
gpt-5-nano
gpt-5-pro

thanks in advance
AI Agents will fade away, like microservices did. Painful to scale, and difficult to deploy. Eventually, you will see them hidden behind a wall of well-engineered solutions. Hype doesn't survive complexity.
my current ai stack (where 90% of my work happens)

1) Opus 4.5 via Claude Code: landing pages, copy, websites, search optimization, data analysis, tools… w/ skills + subagents
2) Gemini 3 Pro & GPT 5.1 for research. I call these “advisory agents”; I use them either on the web or in Cursor
3) Nano Banana for creative assets of all kinds
4) Claude Desktop for some writing / content stuff
5) MCPs: perplexity & firecrawl

I don’t really build node-based workflows, just vibe across the stack and build my own tools when I want to automate something
Holy shit. I’ve used ChatGPT every day for 3 years. Just spent 2 hours on Gemini 3. I’m not going back. The leap is insane — reasoning, speed, images, video… everything is sharper and faster. It feels like the world just changed, again.
How Google Finally Leapfrogged Rivals With New Gemini Rollout
Sharing an interesting recent conversation on AI's impact on the economy. AI has been compared to various historical precedents: electricity, the industrial revolution, etc. I think the strongest analogy is that of AI as a new computing paradigm (Software 2.0), because both are fundamentally about the automation of digital information processing.

If you were to forecast the impact of computing on the job market in the ~1980s, the most predictive feature of a task/job you'd look at is to what extent its algorithm is fixed, i.e. are you just mechanically transforming information according to rote, easy-to-specify rules (e.g. typing, bookkeeping, human calculators, etc.)? Back then, this was the class of programs that the computing capability of that era allowed us to write (by hand, manually).

With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward functions), and we search the program space via gradient descent to find neural networks that work well against that objective. This is my Software 2.0 blog post from a while ago.

In this new programming paradigm, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It's about the extent to which an AI can "practice" something. The environment has to be resettable (you can start a new attempt), efficient (a lot of attempts can be made), and rewardable (there is some automated process to reward any specific attempt that was made).

The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out of the neural net magic of generalization, fingers crossed, or via weaker means like imitation. This is what's driving the "jagged" frontier of progress in LLMs. Tasks that are verifiable progress rapidly, possibly even beyond the ability of top experts (e.g. math, code, amount of time spent watching videos, anything that looks like puzzles with correct answers), while many others lag by comparison (creative, strategic, tasks that combine real-world knowledge, state, context and common sense).

Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify.
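To make the verifiable-means-optimizable point concrete, here is a minimal toy sketch. Everything in it is an illustrative stand-in, not anything from the post: a two-weight linear "program" stands in for a neural net, and hill-climbing stands in for gradient descent. The point is the shape of the loop: a task that is resettable, efficient, and rewardable can be optimized against directly, instead of hand-writing the program.

```python
# Toy Software 2.0 loop: specify an objective, then search program space
# against it. Illustrative only; hill-climbing stands in for gradient descent.
import random

def attempt(w):
    # Resettable: every call is a fresh, independent problem instance.
    a, b = random.random(), random.random()
    pred = w[0] * a + w[1] * b          # the current candidate "program"
    # Rewardable: an automated score for this attempt, no human needed.
    return -abs(pred - (a + b))

def score(w, n=200):
    # Efficient: attempts are cheap, so the program can "practice" a lot.
    return sum(attempt(w) for _ in range(n)) / n

w = [0.0, 0.0]
for _ in range(2000):
    candidate = [wi + random.gauss(0, 0.1) for wi in w]
    if score(candidate) > score(w):      # keep whichever program scores better
        w = candidate

print([round(wi, 2) for wi in w])        # converges near [1.0, 1.0]
```

A non-verifiable task breaks this loop at the reward line: with no automated score for an attempt, you are back to imitation, or to hoping generalization carries you.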
Absolutely insane stat. Opus 4.5 outperformed EVERY SINGLE HUMAN CANDIDATE EVER in Anthropic's notoriously difficult take-home exam for prospective performance engineering candidates.
A number of people are talking about the implications of AI for schools. I spoke about some of my thoughts to a school board earlier; some highlights:

1. You will never be able to detect the use of AI in homework. Full stop. All "detectors" of AI imo don't really work, can be defeated in various ways, and are in principle doomed to fail. You have to assume that any work done outside the classroom has used AI.

2. Therefore, the majority of grading has to shift to in-class work (instead of at-home assignments), in settings where teachers can physically monitor students. The students remain motivated to learn how to solve problems without AI because they know they will be evaluated without it in class later.

3. We want students to be able to use AI, it is here to stay and it is extremely powerful, but we also don't want students to be naked in the world without it. Using the calculator as an example of a historically disruptive technology: school teaches you all the basic math & arithmetic so that you can in principle do it by hand, even if calculators are pervasive and greatly speed up work in practical settings. In addition, you understand what it's doing for you, so should it give you a wrong answer (e.g. you mistyped the "prompt"), you can notice it, gut check it, verify it some other way, etc. The verification ability is especially important in the case of AI, which is presently a lot more fallible in a great variety of ways than calculators.

4. A lot of the evaluation settings remain at the teacher's discretion and involve a creative design space of no tools, cheatsheets, open book, provided AI responses, direct internet/AI access, etc.

TLDR: the goal is that students are proficient in the use of AI but can also exist without it, and imo the only way to get there is to flip classes around and move the majority of testing to in-class settings.
I’m starting to get into a habit of reading everything (blogs, articles, book chapters, …) with LLMs. Usually pass 1 is manual, pass 2 is “explain/summarize”, and pass 3 is Q&A. I usually end up with a better/deeper understanding than if I had just moved on. This is growing into one of my top use cases.

On the flip side, if you’re a writer trying to explain/communicate something, we may increasingly see less of a mindset of “I’m writing this for another human” and more of “I’m writing this for an LLM”. Because once an LLM “gets it”, it can then target, personalize and serve the idea to its user.
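The three-pass loop is simple enough to wire up yourself. A minimal sketch, assuming the OpenAI Python client and an OPENAI_API_KEY in the environment; the model name, prompts, and file path are all illustrative, and any chat LLM would do:

```python
# Minimal sketch of the three-pass reading habit described above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(text: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        messages=[
            {"role": "system", "content": "You are a careful reading companion."},
            {"role": "user", "content": f"{prompt}\n\n---\n\n{text}"},
        ],
    )
    return resp.choices[0].message.content

chapter = open("chapter.txt").read()  # pass 1: read this yourself first
# Pass 2: explain/summarize.
print(ask(chapter, "Explain this piece and summarize its key arguments."))
# Pass 3: Q&A on whatever you didn't fully follow.
print(ask(chapter, "What is the weakest claim here, and what evidence would change it?"))
```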
Gemini Nano Banana Pro can solve exam questions *in* the exam page image. With doodles, diagrams, all that. ChatGPT thinks these solutions are all correct, except that Se_2P_2 should be "diselenium diphosphide", plus one spelling mistake (should be "thiocyanic acid", not "thoicyanic") :O
>be Andrej Karpathy
>studied computer science from Toronto to Stanford, specializing in deep learning
>became Tesla’s director of AI in his early 30s, leading the Autopilot vision team
>helped build the foundations of OpenAI as one of its earliest researchers
>teaches millions through free lectures, notebooks, and open-source work
>keeps his life simple, quiet, and focused on learning
>steps away from big titles when he feels the need to reset
>builds small AI projects for fun, shares them openly
>lives calmly, thinking deeply, working on what he believes matters most

Has Karpathy quietly optimized life in a way most people never figure out?
claude opus 4.5 is finally here, so we tested it against gemini 3 to see which one is better at vibe coding

we took a new startup idea and went from idea to landing page to prototype to ad creative in one sitting. you’ll also see how @boringmarketer uses claude skills to get the most out of claude opus 4.5 (and how you can do the same).

if you’re wondering whether opus 4.5 is worth using, this 1hr tutorial will help you decide and learn how to get the most from it.