Explainable AI: Unlock How Small Models Actually Work

We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach helps us begin to close that gap.

Understanding neural networks through sparse circuits

Topics

interpretable AI, model interpretability, neural network transparency, language model explainability, AI alignment, mechanistic interpretability, model transparency
