Understanding Clawdbot Architecture and Implementation
I took a look inside Clawdbot (aka Moltbot): its architecture and how it handles agent execution, tool use, the browser, and more. There are many lessons here for AI engineers. Learning how Clawd works under the hood gives you a better understanding of the system and its capabilities, and most importantly, what it's GOOD at and BAD at. This started as personal curiosity about how Clawd handles its memory and how reliable it is. In this article I'll go through how Clawd works at a surface level.

# What Clawd is TECHNICALLY

As everybody knows, Clawd is a personal assistant you can run locally or through model APIs and access as easily as from your phone. But what is it really? At its core, Clawdbot is a TypeScript CLI application. It's not Python, not Next.js, not a web app. It's a process that:

> runs on your machine and exposes a gateway server to handle all channel connections (Telegram, WhatsApp, Slack, etc.)
> makes calls to LLM APIs (Anthropic, OpenAI, local, etc.)
> executes tools locally,
> and does whatever you want on your computer.

# The Architecture

To explain the architecture simply, here's what happens when you prompt Clawd on a messenger, from your message all the way to the output:

## 1. Channel Adapter

A channel adapter takes your message and processes it (normalizes it, extracts attachments). Different messengers and input streams have their own dedicated adapters.

## 2. Gateway Server

The gateway server, which is the task/session coordinator, takes your message and passes it to the right session. This is the heart of Clawd. It handles multiple overlapping requests. To serialize operations, Clawd uses a lane-based command queue: each session has its own dedicated lane, and low-risk parallelizable tasks (like cron jobs) can run in parallel lanes. This is in contrast to async/await spaghetti. Over-parallelization hurts reliability and brings a huge swarm of debugging nightmares.

> Default to Serial, go for Parallel explicitly

If you've worked with agents, you've already realized this to some extent. It's also the insight from Cognition's "Don't Build Multi-Agents" blog post. A naive async setup per agent will leave you with a dump of interleaved garbage: logs will be unreadable, and if agents share state, race conditions become a constant fear you have to account for during development. A lane is an abstraction over queues where serialization is the default architecture instead of an afterthought. As a developer, you write your code normally, and the queue handles the race conditions for you. The mental model shifts from "what do I need to lock?" to "what's safe to parallelize?"

## 3. Agent Runner

This is where the actual AI comes in. It figures out which model to use, picks the API key (if none work, it marks the profile as in cooldown and tries the next), and falls back to a different model if the primary one fails. The agent runner assembles the system prompt dynamically from the available tools, skills, and memory, and then adds the session history (from a `.jsonl` file). The result is passed to the context window guard, which checks whether there is enough context space. If the context is almost full, it either compacts the session (summarizes the context) or fails gracefully.

## 4. LLM API Call

The LLM call itself streams responses and sits behind an abstraction over different providers. It can also request extended thinking if the model supports it.

## 5. Agentic Loop

If the LLM returns a tool-call response, Clawd executes the tool locally and adds the results to the conversation. This is repeated until the LLM responds with final text or hits the max number of turns (default ~20). This is also where the magic happens: Computer Use, which I'll get to.
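To make the loop concrete, here's a minimal sketch of such an agentic loop in TypeScript. This is not Clawdbot's actual code: the `callModel` and `runTool` helpers and the message types are hypothetical stand-ins for the provider abstraction and the local tool executor described above.

```typescript
// Minimal agentic-loop sketch. `callModel` and `runTool` are hypothetical
// stand-ins for the provider abstraction and the local tool executor.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ModelReply =
  | { kind: "text"; text: string }
  | { kind: "tool_calls"; calls: ToolCall[] };
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolCallId: string; content: string };

declare function callModel(history: Message[]): Promise<ModelReply>;
declare function runTool(call: ToolCall): Promise<string>;

const MAX_TURNS = 20; // matches the ~20-turn default mentioned above

async function runAgentTurn(history: Message[], userInput: string): Promise<string> {
  history.push({ role: "user", content: userInput });

  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const reply = await callModel(history);

    // Final text: we're done, hand it back to the channel adapter.
    if (reply.kind === "text") {
      history.push({ role: "assistant", content: reply.text });
      return reply.text;
    }

    // Tool calls: execute locally, append the results, and loop again.
    // (A real implementation would also record the assistant's tool-call message.)
    for (const call of reply.calls) {
      const result = await runTool(call);
      history.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }

  return "Stopped: hit the maximum number of tool-use turns.";
}
```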
## 6. Response Path

Pretty standard. Responses get back to you through the channel. The session is also persisted in a basic JSONL file, with each line a JSON object holding the user message, tool calls, results, responses, etc. This is how Clawd remembers (session-based memory).

That covers the basic architecture. Now let's jump to some of the more critical components.

# How Clawd Remembers

Without a proper memory system, an AI assistant is about as good as a goldfish. Clawd handles this through two systems:

- Session transcripts in JSONL, as mentioned.
- Memory files as markdown in `MEMORY.md` or the `memory/` folder.

For searching, it uses a hybrid of vector search and keyword matching, which captures the best of both worlds: searching for "authentication bug" finds both documents mentioning "auth issues" (semantic match) and documents containing the exact phrase (keyword match). SQLite is used for the vector search, and FTS5, which is also a SQLite extension, for the keyword search. The embedding provider is configurable. It also benefits from smart syncing: a file watcher triggers a sync of the index whenever the memory files change. (A minimal sketch of this hybrid approach follows at the end of this section.)

The markdown itself is generated by the agent using the standard file `write` tool. There's no special memory-write API; the agent simply writes to `memory/*.md`. Once a new conversation starts, a hook grabs the previous conversation and writes a summary in markdown.

Clawd's memory system is surprisingly simple and very similar to what we have implemented in @CamelAIOrg as workflow memories. No merging of memories, no monthly/weekly memory compaction. This simplicity can be an advantage or a pitfall depending on your perspective, but I'm always in favor of explainable simplicity over complex spaghetti. The memory persists forever and old memories carry basically equal weight, so we can say there's no forgetting curve.
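Here's a minimal sketch of what keyword-plus-vector retrieval over SQLite with FTS5 can look like, using the same building blocks mentioned above. This is not Clawdbot's actual implementation: the `better-sqlite3` dependency, the table layout, the `embed()` helper (standing in for the configurable embedding provider), and the naive score fusion are all illustrative choices of mine.

```typescript
// Hybrid memory search sketch: FTS5 keyword match + brute-force vector
// similarity over the same SQLite file. Illustrative only.
import Database from "better-sqlite3";

// Stand-in for the configurable embedding provider (hypothetical).
declare function embed(text: string): Promise<number[]>;

const db = new Database("memory-index.db");
db.exec(`
  CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(path, content);
  CREATE TABLE IF NOT EXISTS chunks_vec (
    path TEXT, content TEXT, embedding TEXT  -- embedding stored as JSON
  );
`);

async function indexChunk(path: string, content: string): Promise<void> {
  db.prepare("INSERT INTO chunks_fts (path, content) VALUES (?, ?)").run(path, content);
  const vec = JSON.stringify(await embed(content));
  db.prepare("INSERT INTO chunks_vec (path, content, embedding) VALUES (?, ?, ?)")
    .run(path, content, vec);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function searchMemory(query: string, k = 5): Promise<string[]> {
  // Keyword side: FTS5's built-in bm25() ranks matches (lower is better).
  const keywordHits = db
    .prepare(
      `SELECT content, bm25(chunks_fts) AS score
       FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY score LIMIT ?`,
    )
    .all(query, k) as { content: string }[];

  // Vector side: embed the query, then brute-force cosine similarity.
  const q = await embed(query);
  const vectorHits = (db.prepare("SELECT content, embedding FROM chunks_vec").all() as
    { content: string; embedding: string }[])
    .map((r) => ({ content: r.content, score: cosine(q, JSON.parse(r.embedding)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);

  // Naive fusion: union of both result sets, deduplicated.
  return [...new Set([...keywordHits, ...vectorHits].map((r) => r.content))];
}
```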
# Clawd's Claws: How it uses your Computer

This is one of Clawd's moats: you give it a computer and let it use it. So how does it use the computer? Basically the way you'd expect. Clawd gives the agent significant computer access, at your own risk. It uses an exec tool to run shell commands:

• in a sandbox: the default, where commands run in a Docker container
• directly on the host machine
• on remote devices

Aside from that, Clawd also has filesystem tools (read, write, edit), a browser tool, which is Playwright-based with semantic snapshots, and process management (the process tool) for long-running background commands, killing processes, etc.

## The Safety (or Lack Thereof?)

Similar to Claude Code, there is an allowlist for commands, and the user is prompted to approve everything else (allow once, allow always, or deny). Safe commands (such as jq, grep, cut, sort, uniq, head, tail, tr, wc) are pre-approved, and dangerous shell constructs are blocked by default. The safety model is very similar to what Claude Code has in place: the idea is to give as much autonomy as the user allows.

# Browser: Semantic Snapshots

The browser tool does not primarily use screenshots. It uses semantic snapshots instead: a text-based representation of the page's accessibility tree (ARIA). So instead of pixels, the agent sees a structured text outline of the page's elements (see the sketch after this section). This brings significant advantages. As you may have guessed, browsing websites is not necessarily a visual task. While a screenshot can weigh 5 MB, a semantic snapshot is typically under 50 KB, at a fraction of the token cost of an image.
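As an illustration, here's a small Playwright sketch that captures a semantic (ARIA) snapshot next to a full-page screenshot and compares their sizes. This is not Clawdbot's browser tool; it only assumes a recent Playwright release (1.49+), which exposes `locator.ariaSnapshot()`, and the sample output in the comment is an approximation of that format.

```typescript
// Compare a semantic (ARIA) snapshot with a full-page screenshot.
// Assumes Playwright >= 1.49 for locator.ariaSnapshot().
import { chromium } from "playwright";

async function main(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");

  // Text outline of the accessibility tree, roughly of the form:
  //   - heading "Example Domain" [level=1]
  //   - paragraph: This domain is for use in illustrative examples...
  //   - link "More information..."
  const snapshot = await page.locator("body").ariaSnapshot();

  const screenshot = await page.screenshot({ fullPage: true });

  console.log(`aria snapshot: ${Buffer.byteLength(snapshot, "utf8")} bytes`);
  console.log(`screenshot:    ${screenshot.byteLength} bytes`);

  await browser.close();
}

main().catch(console.error);
```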