We doomscroll, you upskill.
Finding signal on X is harder than ever. We curate high-value insights on AI, Startups, and Product so you can focus on what matters.
35 tweets
Some of my favorite books I finished this year Nonfiction - antimimetics - tiny experiments - inner compass - conscious accomplishment - yoga of work - breakneck - here after - things become other things - money together - thinking about leaving - why greatness can’t be planned Fiction - stoner (weird) - husk - how we disappeared - house of doors
Sharing an interesting recent conversation on AI's impact on the economy. AI has been compared to various historical precedents: electricity, the industrial revolution, etc. I think the strongest analogy is that of AI as a new computing paradigm (Software 2.0), because both are fundamentally about the automation of digital information processing. If you were to forecast the impact of computing on the job market in the ~1980s, the most predictive feature of a task/job you'd look at is to what extent its algorithm is fixed, i.e. are you just mechanically transforming information according to rote, easy-to-specify rules (e.g. typing, bookkeeping, human calculators, etc.)? Back then, this was the class of programs that the computing capability of that era allowed us to write (by hand, manually). With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward functions), and we search the program space via gradient descent to find neural networks that work well against that objective. This is my Software 2.0 blog post from a while ago. In this new programming paradigm, the most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It's about the extent to which an AI can "practice" something. The environment has to be resettable (you can start a new attempt), efficient (a lot of attempts can be made), and rewardable (there is some automated process to reward any specific attempt that was made). The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out of the neural net's magic of generalization, fingers crossed, or via weaker means like imitation. This is what's driving the "jagged" frontier of progress in LLMs.
Tasks that are verifiable progress rapidly, including possibly beyond the ability of top experts (e.g. math, code, anything that looks like puzzles with correct answers), while many others lag by comparison (creative and strategic work, and tasks that combine real-world knowledge, state, context, and common sense). Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify.
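The "specify an objective, then search parameter space by gradient descent" loop described above can be sketched in a few lines. This is a toy illustration, not anything from the original post: the "program" is a single parameter w, the verifiable objective is mean squared error against made-up examples, and the optimizer never sees the underlying rule, only the score.

```python
def objective(w, data):
    """Verifiable objective: mean squared error of the 'program' y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    """Analytic gradient of the objective with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

# Ground-truth 'task': examples generated by y = 3x. The optimizer is only
# given the examples and the score, never the rule itself.
data = [(x, 3 * x) for x in range(1, 6)]

w = 0.0  # initial guess for the program's single parameter
for step in range(200):
    w -= 0.01 * gradient(w, data)  # one gradient descent step

print(round(w, 3))  # converges toward 3.0
```

The point of the toy: nothing here depends on knowing the rule y = 3x in advance, only on being able to score attempts automatically — which is exactly what "verifiable" buys you.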
One of my favorite lessons I’ve learnt from working with smart people: Action produces information. If you’re unsure of what to do, just do anything, even if it’s the wrong thing. This will give you information about what you should actually be doing. Sounds simple on the surface - the hard part is making it part of your everyday working process.
New workflow: 1. Open a voice memo app 2. Brain dump your thoughts on a topic; record it 3. Transcribe it 4. Import transcript into NotebookLM 5. Get it to generate a slide deck using Nano Banana Pro 6. See your rambling thoughts visualized into structured & beautiful slides and feel smart
The best are always learning. Read like crazy. Think alone. Keep a journal. Write stuff down the moment you see it. Review regularly. Memorize the big ideas to fluency. Attack your best ideas. And never get high on your own supply. You don't have to be gifted. You do have to be deliberate.
I’m non-technical but want to deeply understand AI. @karpathy's “Intro to LLMs” is the best resource I’ve found so far. Here are my biggest takeaways and questions from his 60-minute talk: 1. A large language model is “just two files.” Under the hood, an LLM like LLaMA‑2‑70B is literally (1) a giant parameters file (the learned weights) and (2) a small run file (code that implements the neural net and feeds data through it). Question: If the architecture code is tiny and public, what actual moat is left besides the weights? 2. Open‑weights vs closed models. LLaMA‑2 is open‑weights: architecture + weights + paper are public. GPT‑4, Claude, etc. are closed: you get an API/web UI but not the actual model. Question: For a company, when is “renting” a closed model strategically worse than owning an open‑weights model? 3. Training vs inference: training is the hard, expensive part. Running the model (inference) is cheap; getting the weights (training) is a major industrial process. Question: Where is the greatest axis of innovation in front of us to lower the cost of training significantly? 4. Pre‑training compresses ~10 TB of internet text. LLaMA‑2‑70B is trained on roughly 10 TB of scraped internet text, compressed into 140 GB of parameters—a roughly ~100× lossy compression of “internet knowledge.” Question: Given that we’ve run out of knowledge on the internet to pre-train models on, is new data going to be the limiting factor on model improvement moving forward? 5. Training scale: ~6,000 GPUs × 12 days ≈ ~$2M for LLaMA‑2‑70B. That’s already described as “rookie numbers” compared to modern frontier models, which are ~10× bigger in data/compute and cost tens to hundreds of millions. Question: How far are we from “more compute” no longer being a competitive advantage? 6. Frontier models just scale this up by another ~10×. State‑of‑the‑art models (e.g. GPT‑5) simply dial up parameters, data, and compute by large factors relative to LLaMA‑2‑70B. 
Question: How much of GPT‑5‑style capability is just more scale vs genuinely new algorithms? 7. Core objective of an LLM: predict the next word in a sequence. LLMs are trained to take a sequence like “the cat sat on the” and predict the probability distribution over the next word (“mat” with ~97%, etc.). Question: The beauty and the curse of LLMs is that they are probabilistic. How can we create the right constraints such that people trust LLMs in enterprise settings? 8. Architecture is known: the Transformer. We know all the math and wiring (layers, attention, etc.); that part is transparent and simple relative to behavior. Question: If the architecture is commoditized, where exactly do you build sustainable differentiation? And how much more shelf life is there on the Transformer before a new architecture takes over? 9. Parameters are a black box. Billions of weights cooperate to solve next‑word prediction, but we don’t really know “what each one does”—only how to adjust them to lower loss. Rabbit hole: Read about mechanistic interpretability work. 10. Treat LLMs as empirical artifacts, not engineered machines. They’re less like cars (fully understood mechanisms) and more like organisms we poke, test, benchmark, and characterize behaviorally. Rabbit hole: Understand the current process for evals & if/what limitations exist in today’s eval tools. 11. Pre‑training vs. fine-tuning. Pre-training favors quantity over quality; fine-tuning flips that: maybe ~100k really good dialogs matter more than another terabyte of web junk. Question: How much incremental performance can fine-tuning and RLHF drive for models? Is it a fraction of what pre-training does for performance or is it more meaningful than that? 12. Knowledge vs behavior. Pre-training loads the model with world knowledge; fine-tuning teaches it to be helpful, harmless, and to respond in Q&A format. Rabbit hole: I’d love to deeply understand how exactly a model is fine-tuned from beginning to end. 13. 
Reinforcement learning from human feedback (RLHF) via comparisons. It’s often easier for labelers to rank several options vs. write the best one from scratch; RLHF uses these rankings to further improve the model. Question: When exactly does it make sense to fine-tune a model vs. use RLHF & does the answer depend on the domain of knowledge the model will be used for? 14. Closed vs open models. Closed models are stronger but opaque; open‑weights models are weaker but hackable, fine‑tunable, and deployable on your own infra. Question: As companies deploy agents, what is the most important consideration to make as they think about their AI tech stack? 15. Scaling laws: performance is a smooth, predictable function of model size and data. Given parameters (N) and data (D), you can predict next‑token accuracy with surprising reliability, and the curve hasn’t obviously saturated yet. Question: If capabilities keep scaling smoothly, what non‑technical bottlenecks (data rights, energy, chips, regulation) become the real limiters? 16. GPU and data “gold rush” is driven by scaling-law confidence. Since everyone believes “more compute → better model,” there’s a race to grab GPUs, data, and money. Question: Let’s assume scaling laws no longer scale. Who is most screwed when the music stops? 17. LLMs as tool-using agents, not just text predictors. Modern LLMs don’t just “think in text”; they orchestrate tools. Given a natural-language task, the model decides to (1) browse the web, (2) call a calculator or write Python to compute ratios and extrapolations, (3) generate plots with matplotlib, and (4) even hand off to an image model (like DALL·E) to create visuals. The intelligence is increasingly in the coordination layer: the LLM becomes a kind of “foreman” that plans, calls tools, checks outputs, and weaves everything back into a coherent answer. 18. How do LLMs know when to make a tool call? “It emits special words, e.g. |BROWSER|. 
It captures the output that follows, sends it off to a tool, comes back with the result and continues the generation. How does the LLM know to emit these special words? Finetuning datasets teach it how and when to browse, by example.” 19. System 1 vs System 2 thinking applied to LLMs. Concept popularized in Thinking, Fast and Slow. System 1 = fast, instinctive; System 2 = slower, deliberate, tree‑searchy reasoning. Right now LLMs mostly operate in System 1 mode: same “chunk time” per token. Rabbit hole: Explore how the “chain‑of‑thought” method works & what limitations still exist in System 2 thinking for LLMs. 20. Desired future: trade time for accuracy. This was before the first reasoning model (OpenAI o1) came out. At the time, Karpathy talked about this idea of wanting to be able to say: “Here’s a hard problem, take 30 minutes,” and get a more accurate answer than a quick reply; back then, the models couldn’t do that in a principled way. 21. Model self‑improvement example: AlphaGo’s two stages. AlphaGo first imitates human Go games, then surpasses humans via self‑play and a simple, cheap reward signal (did you win?). Question: What’s the best way to improve models in domains where there isn’t a simple reward function, like creative writing or design? 22. Retrieval‑augmented generation (RAG) as “local browsing.” Instead of searching the internet, the model searches your uploaded files and pulls snippets into its context before answering. Question: Where does RAG break down in production? 23. Think of LLMs as the kernel process of a new operating system. This process coordinates resources including tools, memory, and I/O for problem-solving. 
A future LLM will: - read/generate text - have more knowledge than any single human about all subjects - browse the internet - use existing software infrastructure - see and generate images and video - hear and speak and generate music - think for a long time using System 2 - “self-improve” in domains with a reward function - be customized and fine-tuned - communicate with other LLMs Rabbit hole: Draw out the LLM OS and explain it to someone. This will show how well you understand the technology. 24. The LLM OS is reminiscent of today’s operating systems. The finite context window is like working memory; browsing/RAG are like paging data in from disk or the internet; there is a rapidly growing closed vs. open ecosystem; managing what’s in context is a core challenge. Rabbit hole: Explore techniques for working across many context windows & longer-running tasks. 25. New computing stack → new security problems. Just as OSes created new attack surfaces (malware, exploits), LLM‑centric stacks create their own families of attacks: jailbreaks, adversarial prompting, adversarial suffixes, and prompt injection. Question: Security for AI systems seems orders of magnitude harder than traditional software because the # of edge cases feels infinite. Is this assumption right or wrong? 26. LLMs are a new computing paradigm with huge promise and serious challenges. They compress internet‑scale knowledge, act as operating‑system‑like kernels, orchestrate tools and modalities, and open up both transformative products and novel security risks. Question: What is the most nascent part of the LLM OS that needs to be built up in order to accelerate diffusion of the technology? Link to the full “Intro to LLMs” video below
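The special-word protocol from item 18 is easy to sketch. Only the |BROWSER| marker comes from the talk; the closing |/BROWSER| marker, the stub tool, and the parsing below are invented for illustration (real harnesses use their own token schemes):

```python
def fake_browser(query: str) -> str:
    """Stand-in for a real web-search tool; just echoes the query."""
    return f"[search results for: {query}]"

def run_with_tools(generated: str) -> str:
    """Replace each |BROWSER| query |/BROWSER| span with the tool's output,
    mimicking the harness capturing the model's tool call, running it, and
    splicing the result back in so generation can continue."""
    out = []
    rest = generated
    while "|BROWSER|" in rest:
        before, _, tail = rest.partition("|BROWSER|")
        query, _, rest = tail.partition("|/BROWSER|")
        out.append(before)
        out.append(fake_browser(query.strip()))
    out.append(rest)
    return "".join(out)

text = "Spain's population is |BROWSER| spain population 2024 |/BROWSER| per the web."
print(run_with_tools(text))
# -> Spain's population is [search results for: spain population 2024] per the web.
```

The key idea is that the model itself only ever emits text; the harness watches for the special words, does the side effect, and feeds the result back, which is exactly the loop described in the quote above.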
I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs. Usually pass 1 is manual, then pass 2 is “explain/summarize”, pass 3 is Q&A. I usually end up with a better/deeper understanding than if I had just moved on. This is growing to be among my top use cases. On the flip side, if you’re a writer trying to explain/communicate something, we may increasingly see less of a mindset of “I’m writing this for another human” and more “I’m writing this for an LLM”. Because once an LLM “gets it”, it can then target, personalize and serve the idea to its user.
I love this figure from Anthropic's new talk on "Skills > Agents". Here are my notes: The more skills you build, the more useful Claude Code gets. And it makes perfect sense. Procedural knowledge and continuous learning for the win! Skills are essentially the way you make Claude Code more knowledgeable over time. This is why I had argued that Skills is a good name for this functionality. Claude Code acquires new capabilities from domain experts (they are the ones building skills). Claude Code can evolve the skills as needed and forget the ones it doesn't need anymore. It's a collaborative effort, which can easily be expanded to entire teams, communities, and orgs (via plugins). Skills are particularly useful for workflows where information and requirements constantly change. Finance, code, science, and human-in-the-loop workflows are all great use cases for Skills. You can build new Skills using the built-in skill creation tool, so you are always building new skills with all the best practices. Or you can do what I did, which is build my own skill creator to build custom skills catered to the work I do. Just more levels of customization that Skills also enables. Skills' flexibility enables future capabilities to be easily integrated everywhere. Competitors don't have anything remotely close to this type of ecosystem. Anthropic's engineers' deep understanding of the importance of better context management tools and agent harnesses is something to admire. Very bullish on Claude Code.
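For readers who haven't seen one: a skill is roughly a folder containing a SKILL.md file with a short YAML frontmatter (name and description) followed by plain-language instructions. The sketch below assumes that convention; the skill name and its contents are entirely hypothetical, made up to show the shape:

```markdown
---
name: meeting-notes-formatter
description: Turns raw meeting transcripts into structured notes with
  decisions, action items, and owners. Use when the user pastes a transcript.
---

# Meeting Notes Formatter

1. Identify participants and the meeting's purpose from the transcript.
2. Extract decisions and action items; attach an owner to each item.
3. Output sections in this order: Summary, Decisions, Action Items.
```

The description in the frontmatter is what lets the agent decide when to load the skill, which is why it spells out the trigger condition rather than just naming the skill.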
Gemini Nano Banana Pro can solve exam questions *in* the exam page image. With doodles, diagrams, all that. ChatGPT thinks these solutions are all correct except Se_2P_2 should be "diselenium diphosphide" and a spelling mistake (should be "thiocyanic acid" not "thoicyanic") :O
Claude Code course by @AnthropicAI. It's FREE, check it out if you haven't yet. Here's the link to the course: https://anthropic.skilljar.com/claude-code-in-action…
A number of people are talking about the implications of AI for schools. I spoke about some of my thoughts to a school board earlier, some highlights: 1. You will never be able to detect the use of AI in homework. Full stop. All "detectors" of AI imo don't really work, can be defeated in various ways, and are in principle doomed to fail. You have to assume that any work done outside the classroom has used AI. 2. Therefore, the majority of grading has to shift to in-class work (instead of at-home assignments), in settings where teachers can physically monitor students. The students remain motivated to learn how to solve problems without AI because they know they will be evaluated without it in class later. 3. We want students to be able to use AI, it is here to stay and it is extremely powerful, but we also don't want students to be naked in the world without it. Using the calculator as an example of a historically disruptive technology, school teaches you how to do all the basic math & arithmetic so that you can in principle do it by hand, even if calculators are pervasive and greatly speed up work in practical settings. In addition, you understand what it's doing for you, so should it give you a wrong answer (e.g. you mistyped the "prompt"), you should be able to notice it, gut check it, verify it in some other way, etc. The verification ability is especially important in the case of AI, which is presently a lot more fallible in a great variety of ways compared to calculators. 4. A lot of the evaluation settings remain at the teacher's discretion and involve a creative design space of no tools, cheatsheets, open book, provided AI responses, direct internet/AI access, etc. TLDR the goal is that the students are proficient in the use of AI, but can also exist without it, and imo the only way to get there is to flip classes around and move the majority of testing to in-class settings.
Next up… Slide Decks! Turn your sources into a detailed deck for reading OR a set of presentation-ready slides. They are fully customizable, so you can tailor them to any audience, level, and style. Officially rolling out to Pro users now (free users in the coming weeks)!

>be Andrej Karpathy >studied computer science from Toronto to Stanford, specializing in deep learning >became Tesla’s director of AI in his early 30s, leading the Autopilot vision team >helped build the foundations of OpenAI as one of its earliest researchers >teaches millions through free lectures, notebooks, and open-source work >keeps his life simple, quiet, and focused on learning >steps away from big titles when he feels the need to reset >builds small AI projects for fun, shares them openly >lives calmly, thinking deeply, working on what he believes matters most Has Karpathy quietly optimized life in a way most people never figure out?