Ben Tossell

dad 3 under 3 // droid dealer @FactoryAI // dev tool investor + writer bensbites.com // founder makerpad (acq by zapier)


bensbites.com

Joined July 2009

487 Following

180,100 Followers


Ben Tossell @bentossell

agent request timeline 2024: we want memory 2025: we want to select what memories 2026: remember every single thing i've ever told you

Ben Tossell @bentossell

i built a file-explorer... for local+remote machines, in one. with all the clawdbot stuff flying around i find it hard to 'see' my files on my remote machines. so i built a combined file explorer for any machine you have access to. a bit buggy but works pretty well so far. open-source.


Ben Tossell @bentossell

introducing agent-loops + ui viewer i gave droid+gpt5.2 codex dannys tweet https://x.com/dannypostma/status/2011683…… asked to reverse engineer it then rebuilt matts loop system (gh issue → pr) https://x.com/mattpocockuk/status/201150…… hooked them together so you can run loops by creating issues locally, on gh or in your own ui. (+ stole @badlogicgames's session generator) repos: - agent-loop https://github.com/bentossell/agent-loop… - loop ui https://github.com/bentossell/ralph-loop……


Ben Tossell @bentossell

redesigned my personal site http://bentossell.com (open source on gh bentossell/bentossell)


Ben Tossell @bentossell

what can i use to embed a mini browser inside a browser based web app (that can properly browse the web)

Ben Tossell @bentossell

agents are the software market from now on build something agents choose cli/api first

Ben Tossell @bentossell

I've spent 3 billion tokens in four months. Every single one through a terminal, watching an agent write code I couldn't write myself. You may class me as a 'vibe-coder'. But I think that term overlooks any kind of skill involved in the work itself. Much like 'no-code' did circa 2019 (when I started my no-code education company later acquired by Zapier). I don't read the code. But I read the agent output religiously. And in doing so, I'm picking up a ton of knowledge around how code works, how projects work, where things fail, where they succeed. That's my version of learning to program. The new technical class. ## What I've actually shipped A few things I've actually shipped in these last few months: **Personal Site.** I revamped my personal site and made it look like a terminal CLI tool and it was so much better than my previous attempt at the start of this year. **Feed.** I built a simple social tracker for mentions of Factory on Twitter, posts from our subreddit, and GitHub issues. It's open-source and I've gotten 100+ stars on it with several folks cloning for themselves. **Factory Wrapped.** I built the first version of our 'wrapped' product. Showed it to the team and they loved it, so they wanted to bake it into the actual product itself, which is now live. Adding new guides, rearranging things. This wouldn't technically feel like coding, but to me it is. It's still the same process. **Custom CLIs.** I've created a few CLIs—like a Pylon CLI which then has been picked up by the team to help with customer support queries. I built a CLI to help users with adding tokens to their accounts. Plus a Linear and Gmail CLI. **A crypto tracker.** I invested in a co that accurately predicts positive, negative or neutral signals in dynamic data (financial, weather, fitness, protein folding). So I built a tracker that automatically opens and closes short/long positions based on the predictions - kinda like a mini-hedge fund. 
**Droidmas.** Twelve days, twelve experiments or games that touched the different themes people are talking about on Twitter—memory, context management, vibe coding, things of that nature. **An AI-directed video demo system.** Effectively, I give it a prompt to create a video. It opens up ghostty, runs the commands, can open other windows like a browser, records the screen. Acts as its own director, producer and editor. The agent itself is watching what's happening during the recording and can respond as and when things happen. If there's an issue or a bug or it needs to wait for a response, it will do that. I used this to create a video that was posted by OpenAI. **A Telegram bot powered by Droid Exec** so I could have my local repos synced on a VPS and just chat to my repos as a chatbot. I try to as closely mimic the CLI experience but from a messaging app (I dislike Telegram but couldn't be bothered with the arduous Whatsapp for Business setup). And about 50 other things I'm not mentioning or have been left to die. ## How I actually work I use a CLI exclusively. Terminal over web interfaces, always. It's just more capable as a general agent, and I get to see it work. I may have an idea for something, or a pain, or there's an issue with something that I feel like could be solved with code (basically everything these days). So I'll just spin up a new project in Droid (Factory's CLI). I generally just talk to the model a couple of times to start feeding in context about what I'm trying to do, then I'll switch into spec mode to start getting a plan going on what I wanna build. In spec mode I'll basically question a bunch of things. Like I don't understand what this is, or why would we need that over this, can't we do it this way? I'll link docs and GitHub repos for the agent to explore. Then I let Opus 4.5 with autonomy high just rip. I'll watch the stream, see what's happening, and when there are any errors. 
I may jump in to question it or guide it down a different path. I start the server, test it, give feedback and iterate. So I kind of build ahead of myself first. I try and just build the thing. And then all of the gaps and all of the issues that I run into are the opportunities for me to learn. Is that a thing that is part of the system that I've seen across other repos that I should build up a sort of templated system to handle? Should this go into an agents.md that actually follows me around and does the same thing on all of the other repos I'm going to be working on? ## My agents.md setup I've been spending more time trying to figure out the best agents.md setup for myself because this is effectively like the instruction manual. I've got a repos folder locally—that's where all my coded projects go. In that repos folder is an agents.md that says to explicitly set up each new repo with what to do and not to do, how to do things with GitHub, how to commit, all that kind of stuff. And whether it should use my work GitHub account or my personal GitHub account. Running tests. End-to-end tests is one of these things I never really paid attention to previously. But now I'm really keen to have end-to-end tests on everything. Given my current knowledge and capability, when I'm building things and testing them, there often might be silly bugs that I just should have caught or tested had there been tests in the first place. And I often look at others' agents.md files to see what I can borrow for my own. I'm constantly trying to improve my doc to make each and every new working session smoother. ## Coding on the go I'm also making sure that I install the Droid GitHub app on every repo that I create. So when I'm deploying to GitHub, I make sure I'm submitting pull requests so I can have Droid review it—and I can tag Droid to make fixes itself with a custom prompt. I can trigger it from issues or from pull requests. 
It lets me code from my phone, and add new things when I'm out and about. That in combination with my Telegram bot makes it really easy for me to do things when I'm not at my desk. I also use Slack with my agent. I create a new channel for each repo and just fire off things as and when. I often spin up new channels for new ideas. Slack's a great 1-person product (+ agent(s)). ## What I've been learning **Bash commands.** It really clicked for me when I'd been running the changelog process for a while—it's the same process over and over. I finally understood the 'workflow'. So I got droid to create the slash command flow and it's the first slash command that I actually have used properly, which runs a number of bash commands and also prompts the model to do certain things like reading through GitHub diffs, checking what is behind a feature flag and what's not, putting things into the right sections of new features, bug fixes, that kind of thing. From there I started getting more into bash + cli's. I've stopped using MCPs—I use the CLI versions of most things over MCPs. Yes, because MCPs take up context but mostly I feel like it's simpler - I usually only need a few of the tools an MCP would include. So with Supabase, Vercel and Github, I'm always using the CLI's over the MCP's. I often build my own CLIs for things. For example, I built my own Linear CLI so I could query my own issues and run everything from the terminal instead of going to the desktop or web interface. **VPS.** I abstractly knew what it was—it's like another computer that is on all the time somewhere else. But until I truly needed one I didn't really know what I needed to do there, and there's still a lot I need to learn. But effectively, now when I'm running the crypto tracker, I have a ton of data that's being pulled every single minute and I need that to always stay on. 
I also use the VPS when using my Droid Telegram bot and use something called SyncThing to sync my local repos to my VPS so that my repos are always up to date and they're in the same state as I left it. So I can just pick it up on the go. **Skills.** I've tried to use them a bit more. I've been using them not only just as knowledge, but also with bash commands + CLIs. I've got a Gmail CLI that I can pull into any projects, it's portable, it lives at my root directory. So anytime I need Gmail in my system—I've got a Gmail triage system that I use—it just uses the CLI. ## The new programmable layer of abstraction Not to be like everyone else on Twitter when they see Andrej Karpathy tweeting something, but this really rang true to me: **there's a new programmable layer of abstraction to master.** When it was the no-code days, the abstraction layer that I was mastering was drag and drop tools like Webflow, Zapier, and Airtable—stitching them together and making it feel like real software (until you hit a limit). But now instead of me thinking I've got to learn to write code from scratch in order to be able to do all of this, what I need to learn is actually **how to work with an AI agent.** How can I prompt it well? How can I make sure it's got the right context? And also how can it help me understand what we're doing, how do the pieces work together, how can I improve my own system over time? Including all of the things like agents, subagents, prompts, context, memory, skills, hooks, etc. ## Learning from others I read people like Peter Steinberger who is an *actual* programmer and is shipping a ton like crazy. And seeing in his posts almost the simplicity of his system, where he just talks to the model, lets it do its thing, doesn't really worry about extra slash commands, subagents, hooks, skills(although he's coming round to skills) - this just gives me permission and confidence that I don't need some ultra complex system. 
Looking at Twitter you see a lot of people really optimising or potentially over-optimising their own system. That can feel daunting for folks like me, but also that's what I think some of the beauty of this is: it's a completely customizable system, so you can make it work for you however you'd like it to work. You can have a plan mode that you create with a custom slash command that runs for twenty minutes like Kieran does, or you can just talk to the model like Peter does. Another thing while following other engineers is seeing their open source software, cloning it, using it myself, trying to improve it, or just taking parts of it and making that my own. Like Peter's recent summarize YouTube for example, I just took it, removed the Chrome extension part, kept it as the CLI, and now I can just talk to that anywhere I want to. And like Mario, reading things like his MCP post where he talks about CLIs over MCPs, gave me the nudge to dive in more to bash and CLIs. ## The learning process I'm not building things for tens of thousands of people to use in production. So there are going to be bugs, there are gonna be issues, and I run into them plenty. And it's just a reminder that this is a gap in your knowledge, not in the capability that you have now. My role is identifying the gaps or finding those gaps and thinking: how do I make sure this never happens again? Or how do I make sure I understand this part of the system enough that if it's gonna happen again, I'll catch it. Even the simplest things from when I first started using agents to code—like, why can't I use GitHub Pages when I've got dynamic data and I want multiple users to be able to use something? That's a very, very simple thing that programmers know. But it was something I just learned because I was building something, I was trying to build something different than the tools allowed me. So then I said, okay, so what do we need to do? Like all you need to do is just ask the model. 
The model knows everything that you don't. You can just keep asking it. **It's your ever patient, over-your-shoulder, expert programmer.** You can add in your agents.md "I am not a programmer, you need to explain things very simply for me." You can just tweak it exactly how you want to. ## Contributing to real products I've even contributed improvements to our own product—some simple things, but improvements nonetheless. There's a team of engineers at Factory that are extremely experienced and good at what they do, and I'm learning a lot by just watching them, looking at their PRs. We have internal lunch and learns where people say "this is how I scope new product features", "here's how I bug fix", things like that, which have been really helpful. So this whole thing is just a really big learning experience for me, and I'm really enjoying learning "to code", or, learning to work with code. ## Why this is different I've tried to learn to code many times in my life, and every time it was type in these characters, hit enter, and do you see hello world? It was kind of do this, then that, then this happens. And maybe it would have been helpful for me to learn all that, but I just still think that's so different to what it is today. For me to be able to build the things I've built now, if I'd taken that other path, I would have had to code for many months, many years to get to a point where I could feel like I could write the code myself. So instead I'm coming at it from a point of view of I understand systems thinking for projects built with code. I accidentally learned that when I was running my last company with no-code education. You're still learning that okay, Webflow is the front end, Zapier is the API routes, the connective tissue, the data flows, and Airtable is your database. So I learned the systems of that previously, and I think that's helping me today understand some of those pieces. There is so much you can learn. 
And often I'll see something that someone posts on Twitter and I'm like, I have no idea what that is or what I can do with it, but I'll bet you I can play around with it. **No piece of software feels unattainable.** I can just git clone it and say, what the hell does this thing do? Okay, I've been thinking about this—is this thing gonna do anything related to what I thought? And it's just all exploration. It's so much fun. ## Asking the "silly" questions There have been countless times where I think about silly questions—to me or silly questions that other programmers would never ask—that I have the permission to ask, because there's no one watching me and no one shooting me down for being stupid or saying the wrong thing. Like, why do we use all these frameworks, these different types of frameworks? Because they are abstractions for humans writing code. So why—if an LLM is super smart—why couldn't it just be simpler code written, less dependencies, less potential surface areas for bugs? Is that a silly thought or a good thought? And I can learn that it might not be a silly thought. But okay, yes, there are these many projects that the model has been trained on, which is why often things will be built in certain frameworks. So it's just building up this understanding of the code world, the engineering world that I didn't deserve to be in, but I'm absolutely part of now. ## Beyond "vibe coding" Yes, you can call it vibe coding, but I think vibe coding misses the point. I'm trying to actually learn the systems. I'm trying to really understand what is going on, how can I improve, how can I be a new age programmer, what is this new technical class? That's what I think is the most interesting thing here. I can't categorically call myself non-technical but I also can't call myself a programmer. Nor would I want to. 
**I'm part of this new technical class and I don't know what it's called.** But I think vibe coding gives a negative connotation to it, much like no-code gave a negative connotation to that group. ## It feels like a game Some people have likened this new way of programming to a game. Factorio is the one that people talk about. I've never played it. I'm not much of a gamer. But this whole paradigm feels like a real game to me, and the output is I'm building stuff that I want to build. A ton of things just don't end up anywhere on GitHub. They don't end up live. They are just mere explorations of parts of a system or a topic. Others end up published and other people use it - I had a CTO fork my personal site and use it for himself! Big boss stuff (for me!) If someone posts "oh, I built this React grab tool for example". Okay, cool, can I build my own? Like why? This one looks really good. Well, just because I want to. I can just explore things for the sake of exploring things. **Every idea you've ever had can be exercised, can be explored, and it doesn't need to be good.** And you'll learn along the way. ## Permission to throw things away Previously, if I'd learnt to code to build a really crappy version of something I was thinking of, like a big idea that I had, and then no one wanted it, I'd be too emotionally invested in that idea to be able to just throw it away. With no-code, I could effectively build a version of that big idea in an hour, a couple of hours, a weekend. And if no one liked it, no one wanted to pay for it, it was rubbish, then I could just throw it away. It wasn't that much of my time or my energy into something that ultimately wasn't going to be something good for someone else. And I feel like the same is true today. We're gonna see an explosion of software. Many of it won't be good, but lots of it is already great. There are expert programmers who are shipping things like absolute crazy that are all good projects. 
So we're just gonna have this absolute plethora of coded projects out there that you can use, clone, tweak, remix. It'll take a lot less time than if you had to learn to code or if you're reading the files or you're writing the files or anything like that. It's just a lot quicker. The feedback loop is quicker. The process is quicker. You can just do anything at any time and just consistently keep churning out stuff. ## Fail forward The way to learn about code is to build ahead of your capability and fail forward. I feel like everyone who is not technical today who wants to be in this world, who wants to do stuff like this, can absolutely do it. They just need some permission to do that. To play around. You must think of it like play. Sign up to a CLI agent like Droid. Say you want to build a personal website. Say you want to build a little RSS feed tracker, a little to-do list, a little workout app. Whatever you want to do, you just spin it up, start working on it. Every little hiccup, bug, or issue you run into—question it. Okay, why did this come up? Why did you hit those errors? You know you don't know how to code, so you shouldn't get bogged down with bugs - expert programmers hit bugs all the time. And you can take it to other places. You can go to ChatGPT or Claude and give it to different models for different perspectives. You're always gonna have all of the choice up there and all the different variations. ## Just pick one There are just so many different tools, so many different options. Ultimately, just pick one and just stick with it. Just learn that system. They all look fairly similar. They all work similarly. Obviously, I use Droid because I work at Factory. But also they get the best output of any models. 
(yay for model agnostic) Ultimately, what I want and what I need from a tool is: **is this one gonna help me get the furthest I can in the least amount of time with the least amount of trouble?** The more I have to do with using the tools themselves, the harder it is. Things like IDEs—I've tried a bunch. I used to use one in particular for a long time. It's just got so much extra stuff that I just don't need or care about. I just want to talk to a model, have code written. If I need to inspect some markdown files, I can now use what I've just recently discovered is a file manager in the terminal. So I can just look through that, or I can open it up in Zed, which is what I use now, just to view markdown files, edit them. If it's a changelog, for example, I want to tweak something briefly, go back to the CLI, and then just let it rip from there. And any tool or feature I think I'm missing, I'll have a crack at building it myself - like a terminal file viewer. *This whole thing is just a really big learning experience for me, and I'm really enjoying it. Build, fail forward, and keep shipping.*

Ben Tossell @bentossell

terminals are the UI of the future (+past)

Ben Tossell @bentossell

opus 4.6 one shot refactored 5k lines of CSS into tailwind minor notes from 5.3-codex

Ben Tossell @bentossell

This is the kind of foundational knowledge that I think you need to work with today's AI coding agents, if you're not that technical. ## Local vs Remote Firstly, you have to understand what the hell local and remote mean. Local just means it's on your computer. Remote means it's in the cloud. If you're working on a Google Doc with no internet access on your phone, you're working on it locally to that device (that machine). It only lives there. But when you connect to the internet and you save it to your Google Drive, it has then got a remote destination. You can go on your computer then and access the same document that you were editing on your phone. That is the difference between local and remote. What's only on your computer vs in the cloud. ## Terminals When you're using a coding agent, you may be using it in your terminal. A terminal is just an app that can run commands on your computer. But the tricky thing is that you need to know the exact commands that it needs to run. It can list, edit, move, and create files, all things like that. With coding agents, you no longer need to know those commands. Except maybe 'cd' - which means change directory (folder). So you'd say 'cd /Users/bentossell/project/bensbites/' and then start your agent IN that folder. ## Directories & Paths Computers have got their own kind of language which is a little bit difficult to understand at first, but when you see enough of it, you start recognising it. A directory is a folder. So when you're creating a directory, you're just creating a folder. That's all it is. When you're trying to find a specific file or folder, it's got a sort of destination URL. If you're on a website, website.com/product/category, that full URL is the URL path. With your file system, you have the same thing. It's a path to your file. So you can say, it's in my home directory, which is your /Users/your_username (as I showed above with cd). Then ~/projects/bensbites/ would be the path to that folder or file.
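A rough Python sketch of how a path like that resolves. The folder name is the example from the text and probably doesn't exist on your machine, and the helper name `resolve` is made up for illustration.

```python
from pathlib import Path


def resolve(path_text: str) -> Path:
    """Turn a path that may start with ~ into a full path under the home directory."""
    return Path(path_text).expanduser()


if __name__ == "__main__":
    project = resolve("~/projects/bensbites")
    print(project)           # the full path, starting from your home directory
    print(project.parent)    # the folder that contains it
    print(project.is_dir())  # True only if that folder actually exists here
```

A path that already starts at the top of the file system (like /Users/your_username/projects) comes back unchanged.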
The ~/ (called a tilde) is a shortcut for your home directory (a replacement for /Users/your_username/). So if you're in your home directory, you can just say ~/projects/bensbites/ and it will know where to go. ## What Coding Agents Are So now when you hear about what a coding agent is, it's effectively like talking to ChatGPT or Claude. It's an LLM, it's an AI system that can respond to you, but it also has access to the tools that your computer has access to as we mentioned above. BTW: I posted this on my cookbook site where i share my sessions and i have explainers for terms/tools that you may find hard to remember. ## System Prompts Coding agents come with their own system prompts. So Claude Code has its own system prompt, Factory's Droid has one, Codex has one, and so on. Some of these companies actually publish their system prompt so you can view it. Other companies do not, but there are accounts on Twitter that actually go and sort of trick the system to expose its own system prompt. So you can always go and look at them. Effectively, what they will say is, "You are an expert software engineer. These are the things you can do. These are the things you shouldn't do. This is the company that created you. Your knowledge cutoff is this date, but it is currently this date." (because LLMs have knowledge only up to a certain date). Coding agents can use web search tools, so you can get up-to-date information. They also might include a reference to their own documentation. So if you're asking, "Okay, how do I set this thing up in my coding agent?" It can look at its own documentation to guide you (or do it for you). ## Markdown Files & agents.md Another thing that is very common and talked about are markdown files (.md). A markdown file is just a file format for plain text with some formatting (headers, lists, etc). If you use PowerPoint then you'll know that when you export a presentation, it'll be in a PPTX format. 
If you export one from Keynote on Mac, it'll be a .key file. If you use Excel, it'll be a .xlsx file, but if you use Google Sheets, it'll be a .csv or something else. These markdown files allow you to give your agent your own instructions beyond the system prompt. Agents look for an AGENTS.md file (except Claude Code, which looks for CLAUDE.md). In that file, you could have a prompt that says, "I'm not very technical at all, so anytime you're talking about code, please explain exactly what it is at all times," or, "Explain it to me like I'm five," and you'll see the difference in output between not having that instruction and having it. Developers often put a lot of different things in here, like, "This is my tech stack. These are the key files to look at. These are the commands you need to use to run this particular project." And it's all hierarchical, where the closest markdown file to what you're working on will be read first before the preceding one... When editing `Button.tsx`, the agent reads the closest AGENTS.md first (components/), then works its way up to the root (home). ## Skills And now there are things called skills. You can think about skills as reusable workflows, so something you use over and over again. Skills are their own instruction files. Say you work with a lot of projects and you want to work in a particular format. For example, you're working on newsletter content, you've got your /project/newsletter/, and you have a skill that is for writing newsletter headlines. So you'll give it your newsletter content and say, "Write a headline based on this content," and in your skill.md, you have instructions that say, "Never use emoji, always use lowercase, make sure it's maximum four words. Try and pick out the most important topic from the newsletter content," and things of that nature. Skills can also be instructions and tools. So it could be the same instructions with a tool that generates an image based on the content.
For actual code projects you'd likely have a skill for frontend design (what users see). I use one called agent-browser which has instructions for the agent on how to use the agent-browser tool (so an agent can look at chrome, click, scroll, read, type, etc). ## Frontmatter & Progressive Disclosure The differences between skills and agents.md are very small, actually. The skill basically just gets called by the agent as and when it needs it instead of always being added to the system prompt. A couple of terms worth mentioning are progressive disclosure and frontmatter. Frontmatter is effectively like metadata. If you were writing a blog post and you've got your title and your tagline, you're writing the metadata, the SEO data, to say, this is what a search term should hit. When the user types these words, it should hit like the tagline and description. So like you would put, if you're talking about AI, you would have AI in there somewhere. Frontmatter is that for these documents. So you just have `name: Newsletter-content-writer` and `description: Use this skill when asked to write any newsletter content.` The agent itself prepopulates all of those titles and descriptions. It can look at that and say, "Okay, I've taken in my system prompt. That's the first thing that I know, and then I've also taken in the agents.md file to know any instructions that the user has specified. And then I can also see they've got these ten skills that I have access to. So I now know that if they ask me anything about newsletter content, I will pick up these skills and use them proactively." And that leads us to what progressive disclosure is. It's just the 'loads them itself' part. You don't have to specify, "Use this skill to do this task." Although, I actually do that a lot because I just wanna make sure that it is definitely using that skill. ## How It All Connects Everything you're doing is giving extra context for the agent, and we'll get into context later.
Extra context, extra instructions, extra abilities to the agent that works for you in a particular project or across all of your projects. You can have specific instructions and skills per project or for all projects. ## Dot Files & Where Skills Live Where do your skills live? In your filesystem some files are hidden by default. You can view them by pressing Command+Shift+. (on Mac). These files are called dot files. They look like this: .factory, .claude, .codex, etc. They have your configuration, session information, settings and skills. ~/.factory/skills/ is where they'll be. And there's often an agents.md in there. So you should make sure that that is where your key instructions are. ## Commands Custom slash commands are for when you want to run a workflow quickly. So you could have /newsletter, and it would pull the transcript from your Transcripts folder. It would format it using a specific skill. It would rewrite it. Your agent can set them up for you or you can create them manually in ~/.factory/commands/. ## Bash Touching on the tools briefly. When you start working with an agent, you'll see your computer typing out a bunch of commands. This is basically what you'd be doing if you were really technical and lived in a terminal all day (without an agent). The main tool here is called Bash. Agents love it because they can just do stuff: run commands, look things up, move files around, chain a bunch of operations together. ## Context Windows & Tokens AI models have a memory limit. It's called the context window. And it's measured in tokens. A token is roughly 4 characters or 3/4 of a word. So if a model has a 100k context window, it can hold about 100,000 tokens in its memory at once. The agent can only see what fits in that window. So if you've got a massive codebase with hundreds of files, it can't just load all of them at once. It has to be smart about what it looks at.
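To make the numbers concrete, here's a rough Python sketch that applies the "about 4 characters per token" rule of thumb from above. It's only an estimate: real tokenizers count differently, and the function names and the 100k default are assumptions for illustration, not any tool's actual API.

```python
def estimate_tokens(text: str) -> int:
    """Estimate tokens with the ~4 characters per token rule of thumb (rounded up)."""
    return (len(text) + 3) // 4


def fits_in_window(texts: list[str], window_tokens: int = 100_000) -> tuple[int, bool]:
    """Add up the estimates for several texts and check them against a window size."""
    total = sum(estimate_tokens(t) for t in texts)
    return total, total <= window_tokens


if __name__ == "__main__":
    system_prompt = "You are an expert software engineer. " * 50
    agents_md = "Explain things very simply for me. " * 20
    total, ok = fits_in_window([system_prompt, agents_md])
    print(f"about {total} tokens used, fits in window: {ok}")
```

Everything you load (system prompt, agents.md, files, conversation) counts toward the same total, which is why a focused agents.md leaves more room for the actual work.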
It'll read the files it needs, do some work, maybe forget some stuff to make room for other stuff. Or if the tool has a big system prompt, and you've given it a massive agents.md file - that all takes up space. The more you cram in there, the worse it gets at focusing on what matters. (Although Factory's Droid compaction is phenomenal - compaction/compression is the agent summarising the context and trying to keep the important bits). When you're working with agents, you want to give them the right context. It's almost the entire job really. That's why your agents.md should be focused. That's why skills are separate files that only get loaded when needed. You're managing the agent's attention. And if the agent seems to forget something you told it earlier in a long conversation, that's why. It fell out of the context window. You might need to remind it or start a fresh session. # Now let's talk about coding projects. ## Git & Version Control Git is probably the most important thing to understand if you're going to be doing any kind of real work with coding agents. Git is version control - saving your work, but you can go back to any previous save at any time. When you're working on a project and you make a change, you commit it. A commit is just a snapshot. It's like saying, "Okay, this is what the project looks like right now, and I'm happy with it, so I'm going to save this checkpoint." And you write a little message, like "added the new header" or "fixed that bug," so future you knows what that change was. And then you've got branches. A branch is like a parallel version of your project. You might have your main branch, which is the real version, the live thing, and then you create a new branch to try something out. Like, "I want to experiment with this new design", you branch off, do your work, and if it works, you merge it back into main (bring the new work back onto the main work to incorporate it).
If it doesn't, you just delete that branch and nothing's changed.

Local and remote comes back here. When you're working on your computer, your commits are local to your machine. They only exist on your computer. When you push, you're sending those commits to the remote, which is usually GitHub, somewhere in the cloud. And when you pull, you're bringing down changes that someone else pushed, or that you pushed from a different machine. So push is upload, pull is download, basically.

And a repo, a repository, is just a project folder that has git set up. That's all it is. It's got a hidden .git folder inside it that tracks all of these commits and branches. So when someone says "clone this repo," they're saying download this project from GitHub to your machine so you can work on it locally.

The coding agent will do all of this for you. It'll commit, it'll push, it'll create branches. It's good to understand what it's doing as it may ask you "Do you want me to commit this?" and you'll know what that means.

If (when) something goes wrong, git is how you undo it.

## Environment Variables & API Keys

These are basically secrets that your code needs to work but you don't want to write directly into your code. So if you're connecting to Stripe or OpenAI or whatever service, they give you an API key (effectively a unique password). You don't want to just put that password in your code because if you push that to GitHub, now everyone can see your password. Bad.

So what you do is you create a file called .env, and you put your secrets in there. Like, OPENAI_API_KEY=sk-whatever. And then your code reads from that file instead of having the actual key written in it. The .env file stays on your machine, you never push it. There's usually a .gitignore file that tells git, "Hey, don't ever upload the .env file," so it stays safe.

Environment variables aren't just for API keys. You might have different settings for when you're developing versus when it's live.
Like, your database URL might be different locally than in production. So you'd have different .env files for different environments.

When you're setting up a new project and it asks you to "add your API keys to .env," that's what it means. Create that file, put your keys in there, and the code will pick them up. The coding agent can help you set this up, but it'll ask you for the actual keys because obviously it doesn't know your passwords. Open the file directly and add your API keys - don't just paste them to the agent (although I've done this a lot).

## Dependencies & Package Managers

When you're building stuff, you're not writing every piece of code from scratch. You're using code that other people have already written. These are called dependencies, packages, or libraries, kind of used interchangeably. If you want to connect to a database, there's many packages for that. If you want to send emails, there's packages for that. You just install them and use them.

And the thing that manages all of these is called a package manager. If you're working with JavaScript, which is what most web stuff uses, you've got npm, which stands for Node Package Manager. There's also Bun, which is a newer, faster alternative that's getting popular. Does the same thing, just quicker. If you're working with Python, you've got pip. There's others, but those are the main two you'll see.

So when you clone a repo and you want to run it, the first thing an agent will do is npm install or pip install. It looks at a file in the project, package.json for JavaScript or requirements.txt for Python, and it downloads all the packages that project needs. They go into a folder called node_modules for JavaScript, and you'll notice that folder is massive. Like, thousands of files. That's normal. Don't worry about it. Don't touch it. It's just all the packages you need.

The coding agent does this automatically most of the time. When it needs a new package, it'll install it.
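
If you're curious what "looks at a file in the project" means for package.json, here's a rough Python sketch of just that first step: reading the file and listing which packages the project asks for. The package.json contents and the list_dependencies helper are made up for illustration. A real npm install also resolves versions and downloads everything into node_modules, which this doesn't attempt.

```python
import json

# Hypothetical package.json contents, invented for this example.
PACKAGE_JSON = """
{
  "name": "my-landing-page",
  "scripts": {"dev": "next dev", "build": "next build"},
  "dependencies": {"next": "^14.0.0", "react": "^18.2.0"},
  "devDependencies": {"typescript": "^5.0.0"}
}
"""


def list_dependencies(package_json_text: str) -> dict[str, str]:
    """Return every package name and version range the project declares."""
    manifest = json.loads(package_json_text)
    declared: dict[str, str] = {}
    # Runtime packages and development-only packages live under separate keys.
    for section in ("dependencies", "devDependencies"):
        declared.update(manifest.get(section, {}))
    return declared


deps = list_dependencies(PACKAGE_JSON)
for name, version in sorted(deps.items()):
    print(f"{name} {version}")
```

Three declared packages can easily turn into thousands of files once their own dependencies are pulled in, which is why node_modules gets so big.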
## Running Projects Locally

So once you've got your project set up and your dependencies installed, you need to actually run the thing. And this is where you'll experience localhost.

Localhost is just your computer pretending to be a web server. Your agent will start the server and you go to localhost, and it'll show you your project running on your machine.

You'll see localhost:3000 or localhost:8080 or some number. That number is the port. It's like different channels. So you could have one project running on 3000 and another on 3001 at the same time. They don't interfere with each other because they're on different ports.

To run a project, you usually do something like npm run dev or npm start. It depends on how the project is set up. There'll usually be a README file that tells you, or the coding agent will figure it out. And when you run it, the terminal will show you the URL, like "ready on localhost:3000," and you just open that in your browser.

It's running, it's live, but only on your machine. No one else can see it, you can't share that link with anyone. It's local. You're developing, you're testing, you're making sure it works before you deploy it to the actual internet where everyone can see it.

When you make changes to your code while the dev server is running, most of the time it'll automatically refresh. You don't have to restart it. That's called hot reloading. It just picks up the changes.

## Deployment

You've built something locally and it works. Now you want to put it on the actual internet so other people can see it. You have to deploy it.

There's a few popular platforms for this. Vercel is probably the most common for anything built with React or Next.js. Netlify is similar, really good for static sites and simple apps. Cloudflare has Pages and Workers which are great for anything that needs to be fast everywhere.

All of these have their own CLI tools which makes it easier for the agent to deploy for you.
You can say "deploy this to Vercel" and the agent will run the Vercel CLI, push your code, and give you a live URL. These tools also have MCP servers (MCP stands for Model Context Protocol), which let the agent interact with these services even more directly (similar to CLIs). It can check your deployments, see logs, manage environment variables, all that.

The one thing to remember with deployment is environment variables. Your .env file doesn't get deployed, that's the point. So any secrets your app needs, you have to add them in the platform's dashboard. Vercel has environment variables in project settings, Cloudflare has the same, they all do. If your app works locally but breaks when deployed, check your environment variables first.

## Reading Errors

So when something breaks, and things will break, the terminal is going to show you an error. It'll usually be red text, looks scary, very intimidating. But it's actually trying to help you.

The key thing to look for is the error message itself, which is usually at the top or the bottom of all that red text. It'll say something like "cannot find module" or "undefined is not a function" or "connection refused."

And then there's usually a stack trace, which is all those file paths and line numbers. It's showing you where the error happened. Like, "the problem is in this file, on line 47." So you can go look at that line and see what's wrong.

The good news is you don't need to understand most of this yourself. You copy the error, you paste it to the coding agent, and you say "fix this." The agent is really good at reading errors because it's seen millions of them. It'll go, "Oh, this means you forgot to install that dependency," or "You've got a typo here." And it'll fix it for you.

Over time, you'll start recognizing common ones. Like, "ENOENT" just means file not found. "ECONNREFUSED" means it couldn't connect to something, maybe your database isn't running. You'll pick up the patterns.
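
If you want to see where those codes come from, here's a small Python sketch. ENOENT and ECONNREFUSED are standard operating system error codes, so Python exposes them too through its errno module. The explain helper and the plain-English table are my own illustration, not part of any tool, and the second error is constructed by hand rather than coming from a real failed connection.

```python
import errno
import os
import tempfile

# A tiny, hand-written lookup table (illustrative, far from complete).
EXPLANATIONS = {
    errno.ENOENT: "file or folder not found",
    errno.ECONNREFUSED: "couldn't connect, maybe the server or database isn't running",
}


def explain(error: OSError) -> str:
    """Turn an operating system error into 'CODE: plain-English meaning'."""
    code = errno.errorcode.get(error.errno, "UNKNOWN")
    meaning = EXPLANATIONS.get(error.errno, "not in this small table")
    return f"{code}: {meaning}"


# Trigger a real ENOENT by opening a file that doesn't exist.
with tempfile.TemporaryDirectory() as folder:
    try:
        open(os.path.join(folder, "missing.txt"))
    except OSError as err:
        file_error = explain(err)

# Build a connection-refused error by hand instead of making a network call.
connection_error = explain(OSError(errno.ECONNREFUSED, "Connection refused"))

print(file_error)
print(connection_error)
```

You won't need to write anything like this yourself. The point is that the scary codes are a fixed vocabulary, which is why both you and the agent can learn to recognise them.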
## Chrome DevTools & Console Errors

There's another source of errors that's really useful, and that's your browser's developer tools. Chrome DevTools, specifically. If something's not working on your website, like a button doesn't do anything or data isn't loading, the answer is usually in the console.

You open DevTools by right-clicking anywhere on the page and clicking Inspect, or pressing Command+Option+I on Mac or Ctrl+Shift+I on Windows. Then you click the Console tab. And you'll see any errors that JavaScript is throwing, any failed network requests, all sorts of useful stuff.

The red text in there is what you want to pay attention to. It might say something like "Failed to fetch" or "Uncaught TypeError" or "404 Not Found." That's your clue. You copy that, you paste it to the agent, and you say "I'm seeing this in the console when I click the submit button." That's way more useful than just "it doesn't work."

And there's a Network tab too, which shows you all the requests your page is making. If you're trying to load data from an API and it's not working, you can see in the Network tab whether the request is even being sent, what it's sending, and what the server responded with. Super useful for debugging. You can screenshot that or copy the error details and give them to the agent.

This is another use-case for agent-browser - the agent can look at the site for you, inspect the console and read the error messages and pass them back to the agent's context to fix.

## Errors, Bug Fixing & Tests

You'll hit errors and bugs all the time. Actual developers do too.

The wrong way to handle them is what most people do at first. Something breaks, you go "fix this," the agent tries something, it doesn't work, you go "still broken," it tries something else, still doesn't work, and you're going round and round in circles. You can burn through hours like this (I have).

Before you say "fix this," give the agent context. What were you trying to do?
What did you expect to happen? What actually happened? Paste the errors.

You should ask the agent to investigate before it fixes. Like, "Don't fix it yet. First, tell me what you think is wrong and why." Because if it just starts changing stuff without understanding the problem, it might fix the symptom but not the cause. And then you've got a different bug next week.

Tests are huge for this. A test is just code that checks if your other code works. So you might have a test that says, "When I click this button, this thing should happen." And you can run all your tests to make sure nothing's broken. The agent can write tests for you/itself.

A good workflow is: bug happens, you ask the agent to write a test that reproduces the bug first. So now you've got a failing test. Then your agent fixes the bug, and the test passes. Now you know it's actually fixed, and if that bug ever comes back, the test will catch it. This is called test-driven development, kind of.

Also, logs. You can ask the agent to add logging so you can see what's happening. Like, print statements that say "got to this point" or "this variable is this value." Helps you and the agent figure out where things go wrong.

The best agents have tools for this now. They'll run the code, see the error, inspect variables, check what's happening at each step.

## Before You Go In Circles

I mentioned not going round and round trying to fix something. But everyone does it. So what I find helpful to remind myself is:

1. Commit your work. Before you start chasing a bug, make sure everything you've done so far is saved. Git commit. That way, if you make things worse, you can always go back.
2. Consider starting a fresh session. Sometimes the agent has gone down a rabbit hole on one feature and it's got confused context from all the failed attempts.
3. Check your setup. Is your .env file actually there? Are all the keys correct?
Did you maybe set up the environment variables locally but forget to add them to your hosting platform like Vercel or GitHub secrets?
4. Think about updating your context. Your agents.md, your skills, whatever instructions you've given. If you keep running into the same type of problem, maybe the agent needs better guidance.

Say the agent keeps trying to use a tool that doesn't work, or it keeps formatting something wrong, or it keeps forgetting that your project has a specific setup. That's your cue to add that information to your agents.md. "We use pnpm instead of npm." "Always run migrations before testing." "The API endpoint is this, not that."

It's tricky if you're not technical. How do you know what to add? When the agent finally does solve it, ask it. "What was the problem? What should I add to my instructions so we don't hit this again?" It can diagnose the problem and suggest an instruction to help avoid it.

## The Agent Loop

Agents are basically a loop. They plan, they act, they observe, and then they repeat.

You give it a task. "Build me a landing page."

First, it plans. It thinks, okay, I need to create these files, I need to install these dependencies, I need to set up this structure.

Then it acts. It starts creating files, writing code, running commands.

Then it observes. It looks at what happened. Did it work? Is there an error? Does the page actually load?

And based on what it observes, it either moves on to the next thing or it goes back and fixes what broke. And this loop keeps going until the task is done or it gets stuck.

When it gets stuck, that's when it asks you for help. "I tried this but it didn't work, what should I do?" And you give it more context or a different direction, and it continues the loop.

## Autonomy Modes

Different agents have different levels of autonomy. Some will ask you before they do anything. "I'm about to create this file, is that okay?" And you have to approve every single action. Safer but slow.
Others will just make changes, create files, run commands, all without asking. Much faster, but you have to trust it. And sometimes it'll go down a wrong path and you've got to reel it back in.

Most agents let you configure this. Like, "Ask me before deleting files but go ahead and create new ones." Or you might have a mode for exploring where it asks a lot, and a mode for execution where it just does the work.

I like to let it run and just watch. I can see what it's doing in real time. If it starts doing something weird, I'll jump in. But I'm not approving every single thing. You'll start getting more comfortable with more autonomy the more you use it.

That's the foundational stuff. You don't need to memorise it all. Start building something, let the agent guide you, and you'll pick it up as you go. The best way to learn is by doing.

What'd I miss?

Ben Tossell @bentossell

2025 was the year of the builder particularly for becoming a technical builder

A great time to be a builder

Ben Tossell @bentossell

many sub-agents in one session is hard to follow why not just multiple windows with instances looking at the same folder? then use a 'conflict specialist' or similar to review and fix