
ashepp | 1 year ago

I've been reading about CAST (Causal Analysis based on Systems Theory) and noticed some interesting parallels with mechanistic interpretability work. Rather than searching for root causes, CAST provides frameworks for analyzing how system components interact and why they "believe" their decisions are correct - which seems relevant to understanding neural networks.

I'm curious if anyone has tried applying formal safety engineering frameworks to neural net analysis. The methods for tracing complex causal chains and system-level behaviors in CAST seem potentially useful, but I'd love to hear from people who understand both fields better than I do. Is this a meaningful connection or am I pattern-matching too aggressively?

triclops200 | 1 year ago

I do AI/ML research for a living (my degrees are in theoretical CS and AI/ML, and my [unfinished] PhD work was in computational creativity [essentially AGI]). I also do SRE work for a living.

And yeah, that's a useful way of characterizing some of the behaviors of some kinds of neural networks. There's a point at which the distinction between a belief and a "frequency (or probability-amplitude) state filter" becomes less apparent, though; that's more of a function-of-medium vs. function-of-system distinction.

However, systems like these can often become mediums themselves for more complex systems. Additionally, a system which has "closed the loop" by understanding the medium and the system as coupled into a "self", separate from the environment, together with a direction/goal, is a pretty decent, if imprecise, definition of a strange loop. Contradiction resolution between internal component beliefs gives a possible (imo, highly probable) mechanistic explanation for the phenomenon of free energy minimization in such systems. External contradiction resolution extends it to active inference.
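
To make the free-energy link a bit more concrete, here's a toy sketch (purely illustrative, with made-up names like mu/obs/prior, assuming Gaussian beliefs and unit variances): an internal belief is nudged by gradient descent until the contradiction between its prediction and the observation, and between itself and its prior, is minimized. That's the internal contradiction-resolution story in miniature; letting the system act to change obs instead of only updating mu would be the active-inference extension.

    # Toy predictive-coding sketch: a scalar "belief" mu is updated by
    # gradient descent on a Gaussian free energy, i.e. by resolving the
    # contradiction between what it predicts and what is observed.
    # All names and the unit-variance generative model are illustrative
    # assumptions, not anything claimed in the thread.

    def free_energy(mu, obs, prior):
        # Gaussian free energy up to a constant: observation prediction
        # error plus divergence of the belief from its prior.
        return 0.5 * (obs - mu) ** 2 + 0.5 * (mu - prior) ** 2

    def update_belief(mu, obs, prior, lr=0.1, steps=50):
        # Descend d(free energy)/d(mu): internal contradiction resolution.
        for _ in range(steps):
            grad = (mu - obs) + (mu - prior)
            mu -= lr * grad
        return mu

    if __name__ == "__main__":
        prior, obs = 0.0, 2.0
        mu = update_belief(prior, obs, prior)
        print(f"belief settles at {mu:.3f}, "
              f"free energy {free_energy(mu, obs, prior):.3f}")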