Anthropic employees Sholto Douglas and Trenton Bricken recently did an interview with Dwarkesh Patel; parts of it covered the circuit tracing insights.
This type of work is really important, in my opinion. Open-sourcing it lets academics and other researchers do this kind of interpretability research on a more level playing field.
Same here! Then I immediately thought, "I wish people would stop misusing words," followed by, "I guess I think I’m in charge of words now." Then, idly: "I’m starting to resemble that 'Old Man Yells at Cloud' meme."
Yeah, I actually have a decades-old two-layer board that I need to reproduce and I would love to be able to feed images of it into some sort of tool and have it generate a schematic (or at least a netlist) automatically.
You and me both. The reverse engineering tools are out there even if most of the search results are AI slop that recommends common layout tools. If I really needed the work done though I'd just pay one of the overseas services and clean up from there.
rob-olmos|9 months ago
https://www.dwarkesh.com/p/sholto-trenton-2 -- search the transcript for "circuit" for the quick bits.
Eg, "If you look at the circuit, you can see that it's not actually doing any of the math, it's paying attention to that you think the answer's four and then it's reasoning backwards about how it can manipulate the intermediate computation to give you an answer of four."
https://transformer-circuits.pub/
Tostino|9 months ago
I think the more people looking at this the better. I have a feeling there will be some breakthroughs in identifying important circuits and being able to make more efficient model architectures that are bootstrapped from some identified primitives.
sanex|9 months ago
https://open.spotify.com/episode/3H46XEWBlUeTY1c1mHolqh?si=L...
jexp|9 months ago
Have fun
https://gist.github.com/jexp/8d991d1e543c5a576a3f1ee70132ce7...
ofou|9 months ago
[1]: https://transformer-circuits.pub/2021/garcon/index.html
e_ameisen|9 months ago
This is a new tool which relies on existing introspection libraries like TransformerLens (which is similar in spirit to Garcon) to build an attribution graph. This graph displays intermediate computational steps the model took to sample a token.
For more details on the method, see this paper: https://transformer-circuits.pub/2025/attribution-graphs/met....
For examples of using it to study Gemma 2, check out the linked notebooks: https://github.com/safety-research/circuit-tracer/blob/main/...
We also document some findings on Claude 3.5 Haiku here: https://transformer-circuits.pub/2025/attribution-graphs/bio...
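To give a rough feel for what an attribution graph is, here is a toy sketch on a tiny, purely linear two-layer network. This is not the circuit-tracer API and not the method from the paper, which is considerably more involved; every name here (`build_attribution_graph`, `in0`, `hid0`, and so on) is made up for illustration. The only idea it shows is that each edge weight is the source node's activation times the weight connecting it to the target, so the edges into the output sum to the logit.

```python
# Toy attribution graph for a tiny linear network (illustrative only;
# this is NOT the circuit-tracer API, and all names here are hypothetical).

def build_attribution_graph(x, W1, w2):
    """Return (edges, logit) for logit = w2 . (W1 @ x).

    Each edge is (source, target, weight), where weight is the source
    node's activation times the linear weight connecting it to the target.
    """
    # Hidden activations: one dot product per row of W1.
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in W1]

    edges = []
    # Input -> hidden edges.
    for j, row in enumerate(W1):
        for i, (w, xi) in enumerate(zip(row, x)):
            edges.append((f"in{i}", f"hid{j}", xi * w))
    # Hidden -> output edges.
    for j, (h, w) in enumerate(zip(hidden, w2)):
        edges.append((f"hid{j}", "logit", h * w))

    logit = sum(h * w for h, w in zip(hidden, w2))
    return edges, logit


x = [1.0, 2.0]
W1 = [[0.5, -1.0], [2.0, 0.25]]
w2 = [1.0, -0.5]

edges, logit = build_attribution_graph(x, W1, w2)

# List edges from strongest to weakest influence.
for src, dst, weight in sorted(edges, key=lambda e: -abs(e[2])):
    print(f"{src} -> {dst}: {weight:+.2f}")
```

In this linear toy the decomposition is exact. Real models have nonlinearities and attention, which is why the actual tool needs the machinery described in the paper linked above.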
Eduard|9 months ago
dvh|9 months ago
1wheel|9 months ago
https://www.neuronpedia.org/gemma-2-2b/graph?slug=pcb-tracin...
Henchman21|9 months ago
Funny things, thoughts.
AdamH12113|9 months ago
buescher|9 months ago
Archit3ch|9 months ago
tacker2000|9 months ago
mrheosuper|9 months ago
asadm|9 months ago
forgotpwagain|9 months ago
qtwhat|9 months ago