top | item 43286161

Differentiable Logic Cellular Automata

469 points | eyvindn | 1 year ago | google-research.github.io

90 comments

[+] bob1029|1 year ago|reply
This is very interesting. I've been chasing novel universal Turing machine substrates. Collecting them like Pokémon for genetic programming experiments. I've played around with CAs before - rule 30/110/etc. - but this is a much more compelling take. I never thought to model the kernel like a digital logic circuit.

The constraints of boolean logic, gates and circuits seem to create an interesting grain to build the fitness landscape with. The resulting parameters can be directly transformed to hardware implementations or passed through additional phases of optimization and then compiled into trivial programs. This seems better than dealing with magic floating points in the billion parameter black boxes.

[+] fnordpiglet|1 year ago|reply
Yeah this paper feels profoundly important to me. The ability to differentiate automata means you can do backward propagating optimization on Boolean circuit designs to learn complex discrete system behaviors. That’s phenomenal.
[+] pizza|1 year ago|reply
check out difflogic. differentiable neural net logic circuits that can be compiled to cuda or c code. their prototypical demo is an mnist classifier that can run at > 1M images/sec on cpu!
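For the curious, here's a toy sketch of the core trick (my own illustration, not the actual difflogic API): each learned gate holds a softmax distribution over continuous relaxations of the 16 two-input Boolean functions, which agree with the hard gates at 0/1 inputs, so the whole circuit is differentiable.

```python
import numpy as np

# Toy sketch of a differentiable logic gate (not the real difflogic API).
# Continuous relaxations of all 16 two-input Boolean functions; at binary
# inputs (0 or 1) each expression equals the corresponding hard gate.
def soft_gates(a, b):
    return np.array([
        0 * a,                     # FALSE
        a * b,                     # AND
        a - a * b,                 # A AND NOT B
        a,                         # A
        b - a * b,                 # NOT A AND B
        b,                         # B
        a + b - 2 * a * b,         # XOR
        a + b - a * b,             # OR
        1 - (a + b - a * b),       # NOR
        1 - (a + b - 2 * a * b),   # XNOR
        1 - b,                     # NOT B
        1 - b + a * b,             # B IMPLIES A
        1 - a,                     # NOT A
        1 - a + a * b,             # A IMPLIES B
        1 - a * b,                 # NAND
        1 + 0 * a,                 # TRUE
    ])

# A learned gate mixes the 16 relaxations with softmax weights; training
# pushes the weights toward a single hard gate.
def gate_output(logits, a, b):
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ soft_gates(a, b)
```

Once training has concentrated a gate's weights, it can be snapped to the argmax gate and compiled straight to hardware or C, which is where the inference speed comes from.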
[+] throwaway13337|1 year ago|reply
This is exciting.

Michael Levin, for me, best posed the question of how animal cells can act cooperatively without a hierarchy. He has some biological experiments showing, for example, that eye cells in a frog embryo will move to where the eye should go even if you pull them away. The question I don't think he could really answer was 'how do the cells know when to stop?'

Understanding non-hierarchical organization is key to understanding how society works, too, and to solving the various prisoner's dilemmas at various scales in our self-organizing world.

It's also about understanding bare complexity and modeling it.

This is the first time I've seen the ability to model this stuff.

So many directions to go from here. Just wow.

[+] fc417fc802|1 year ago|reply
> The question I don't think he could really answer was 'how do the cells know when to stop?'

I'm likely missing something obvious but I'll ask anyway out of curiosity. How is this not handled by the well understood chemical gradient mechanisms covered in introductory texts on this topic? Essentially cells orient themselves within multiple overlapping chemical gradients. Those gradients are constructed iteratively, exhibiting increasingly complex spatial behavior at each iteration.
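For illustration, here's a toy of that textbook picture (my own schematic numbers, nothing biological about the constants): a morphogen held at a source diffuses and decays, settling into a monotone gradient whose concentration levels cells could threshold to decide position, i.e. where to stop.

```python
import numpy as np

# Schematic 1D morphogen gradient: diffusion plus first-order decay,
# with concentration clamped to 1.0 at a source cell. The steady state
# is a monotonically decaying profile that cells can read out.
def steady_gradient(n=50, D=1.0, k=0.05, steps=5000, dt=0.1):
    c = np.zeros(n)
    c[0] = 1.0
    for _ in range(steps):
        lap = np.zeros(n)
        lap[1:-1] = c[2:] + c[:-2] - 2 * c[1:-1]  # interior Laplacian
        c += dt * (D * lap - k * c)
        c[0] = 1.0  # source held fixed each step
    return c

c = steady_gradient()
```

Overlapping several such gradients from different sources gives each cell a coordinate system, which is the standard story; the open question in Levin's framing is how robust, goal-directed behavior emerges on top of it.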

[+] EMIRELADERO|1 year ago|reply
I've been thinking a lot about "intelligence" lately, and I feel like we're at a decisive point in figuring out (or at least greatly advancing our understanding of) how it "works". It seems to me that intelligence is an emergent natural behavior, not much different from classical Newtonian mechanics or electricity. It all seems to boil down to simple rules in the end.

What if everything non-discrete about the brain is just "infrastructure"? Just supporting the fundamentally simple yet important core processes that do the actual work? What if it all boils down to logic gates and electrical signals, all the way down?

Interesting times ahead.

[+] ekez|1 year ago|reply
There’s something compelling about these, especially w.r.t. their ability to generalize. But what is the vision here? What might these be able to do in the future? Or even philosophically speaking, what do these teach us about the world? We know a 1D cellular automaton is Turing equivalent, so, at least from one perspective, NCA/these aren’t terribly surprising.
[+] data-ottawa|1 year ago|reply
Potentially it would be useful if you could build a grid from satellite images and simulate wildfire spread, pollution spread, or similar problems.
[+] achille|1 year ago|reply
these are going to be the dominant lifeforms on earth exceeding bacteria, plants and humans in terms of energy consumption

cellular automata that interact with their environment, ones that interact with low-level systems and high-level institutions. to some approximation we humans are just individual cells interacting in these networks. the future of intelligence ain't llms, but systems of automata with metabolic aspects. automata that co-evolve, consume energy and produce value. ones that compete, ones that model each other.

we're not being replaced, we're just participants in a transformation where boundaries between technological and cellular systems blur and eventually dissolve. i'm very thankful to be here to witness it

see: https://x.com/zzznah/status/1803712504910020687

[+] emmelaich|1 year ago|reply
The self-healing properties suggest biological evolution to me.
[+] calebm|1 year ago|reply
I love playing around with cellular automata for doing art. It's amazing what kind of patterns can emerge (example: https://gods.art/math_videos/hex_func27l_21.html). I may have to try to play with these DLCA.
[+] j_bum|1 year ago|reply
Lovely! Thanks for sharing. Would these patterns keep generating indefinitely?
[+] SomeHacker44|1 year ago|reply
Reminds me of the old movie Andromeda Strain.
[+] mempko|1 year ago|reply
There are a lot of cool ideas here. Maybe a small observation, but the computation is stateful: each cell has a memory and a perception of its environment. Compare this to, say, modern NNs, which are stateless. Has there been any work on stateful LLMs, for instance?
[+] marmakoide|1 year ago|reply
Self-plug here, but very related => Robustness and the Halting Problem for Multicellular Artificial Ontogeny (2011)

Cellular automata where the update rule is a perceptron coupled with an isotropic diffusion. The weights of the neural network are optimized so that the cellular automata can draw a picture, with self-healing (i.e. rebuilding the picture when perturbed).

Back then, auto-differentiation was not as accessible as it is now, so the weights were optimized with an Evolution Strategy. Of course, using gradient descent is likely to be way better.
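For readers who haven't seen one, here's a minimal sketch of the kind of Evolution Strategy meant here — a generic (1+λ) ES; the quadratic loss below is a stand-in of my own, not the paper's picture-drawing objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal (1+lambda) Evolution Strategy: perturb the current weights with
# Gaussian noise, evaluate all offspring, and keep the best one if it
# improves on the parent. In the 2011 setup the loss would score how well
# the perceptron-driven CA redraws the target picture; here it's a
# stand-in quadratic.
def evolve(loss, dim, generations=200, popsize=32, sigma=0.1):
    w = np.zeros(dim)
    best_loss = loss(w)
    for _ in range(generations):
        offspring = w + sigma * rng.standard_normal((popsize, dim))
        losses = [loss(o) for o in offspring]
        i = int(np.argmin(losses))
        if losses[i] < best_loss:           # elitist selection
            w, best_loss = offspring[i], losses[i]
    return w

target = rng.standard_normal(8)             # stand-in "target picture"
best = evolve(lambda w: float(np.sum((w - target) ** 2)), dim=8)
```

The appeal of an ES is that the loss can be a black box (run the CA, compare pictures), which is exactly why it predated the differentiable formulation.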

[+] JFuzz|1 year ago|reply
This is wild. Long-time lurker here, avid modeling and simulation user - I feel like there’s some serious potential here to help provide more insight into “emergent behavior” in complex agent-based models. I’d love to see this applied to models like a predator/prey model, and other “simple” models that generate complex “emergent” outcomes, but on massive scales… I’m definitely keeping tabs on this work!
[+] emmelaich|1 year ago|reply
The resulting checkerboard pattern is the opposite (the NOT) of the target pattern, but this is not remarked upon. Is it too unimportant to mention, or did I miss something?
[+] eyvindn|1 year ago|reply
thanks for catching this, the figure for the target was inverted when exporting for publication, corrected now.
[+] itishappy|1 year ago|reply
They're learning features, not the exact image (that's why it's so good at self healing). It should be invariant to shifts.
[+] mikewarot|1 year ago|reply
If I understand the article correctly, this research shows that you can compress some 2D image into a circuit design that, when replicated exactly many times in a grid, will spontaneously output the desired image.

I'm interested in a nearby but dissimilar project, almost its reciprocal, wherein you generate a logic design that is NOT uniform, where every cell is independent, to allow for general-purpose computing. It seems we could take this work and use it to evolve a design that could be put into an FPGA, and get far better utilization than existing programming methods allow, at the cost of huge amounts of compute to do the training.

[+] akomtu|1 year ago|reply
This is groundbreaking, and it's not about checkerboards or lizards. The Navier-Stokes differential equation governing fluid motion is an update rule that predicts the next state of any given point given its neighborhood and a few previous states. The key insight is that all the complexity of fluid and air motion, the formation of clouds and the motion of flames, is governed by a simple law: an equation that's uniformly and simultaneously applied to each point in space, needing only the immediate neighborhood of that point and its immediate history. Discovering such an equation by looking at real-life samples is what's called science.

It should be possible in principle to apply this DLCA model to a video recording of smoke, constrained to a thin 2D layer for simplicity, and let the model derive the Navier-Stokes equation. When you take this one step further and consider that the update rule itself may be changing according to another update rule, we get into some interesting territory. This might be why in our brain neurons are connected to a neighborhood of a thousand neurons instead of just ten or so.
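The locality being described can be shown in miniature with a finite-difference diffusion step — a far simpler PDE than Navier-Stokes, but with the same "next state from the immediate neighborhood" shape:

```python
import numpy as np

# Each point's next value depends only on its 4 nearest neighbors --
# the same locality the comment ascribes to the Navier-Stokes update.
def diffuse_step(u, alpha=0.1):
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
           np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
    return u + alpha * lap          # explicit Euler, periodic boundaries

u = np.zeros((16, 16))
u[8, 8] = 1.0                       # point source
for _ in range(50):
    u = diffuse_step(u)
```

A discretized PDE like this is literally a real-valued cellular automaton, which is what makes "learn the update rule from video" plausible in principle.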

Following tradition, Google execs are going to dismiss this discovery as irrelevant to the ads business, and a couple of years later, when DLCA has turned the world upside down, they'll try to take credit, saying it was their employees who made the discovery.

[+] Cladode|1 year ago|reply
Continuous relaxation of boolean algebra is an old idea with much literature. Circuit synthesis is a really well-researched field, with an annual conference and competition [1]. Google won the competition 2 years ago. I wonder if you have tried your learner against the IWLS competition data sets. That would calibrate the performance of your approach. If not, why not?

[1] https://www.iwls.org/iwls2025/

[+] srcreigh|1 year ago|reply
The Conway's Game of Life example isn't so impressive. The network isn't really reverse-engineering rules; it's being trained on data that is equivalent to the rules. It's sort of like teaching addition by giving it 400 data-point triplets (a, b, c) with 1 <= a, b <= 20 and c = a + b.
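Concretely, that's data like this (my own toy generator, not the paper's pipeline): (state, next state) pairs produced directly by Conway's rules, so the supervision is just the rule table sampled many times.

```python
import numpy as np

# One synchronous Game of Life step on a toroidal grid: a cell is alive
# next step if it has exactly 3 live neighbors, or is alive with 2.
def life_step(grid):
    n = sum(np.roll(np.roll(grid, i, 0), j, 1)
            for i in (-1, 0, 1) for j in (-1, 0, 1)
            if (i, j) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

# Training pairs of the kind the comment describes: random states plus
# their rule-generated successors.
rng = np.random.default_rng(0)
pairs = [(g, life_step(g))
         for g in (rng.integers(0, 2, (8, 8), dtype=np.uint8)
                   for _ in range(100))]
```

Since each cell sees only a 3x3 neighborhood, a modest sample of random states already covers the full 2^9-entry rule table many times over, which is why the task is more of a sanity check than a discovery.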
[+] andrewflnr|1 year ago|reply
It wasn't meant to be much more than a sanity check, as I read it anyway.
[+] justinnk|1 year ago|reply
This is very interesting! I think an exciting direction would be to arrive at minimal circuits that are to some extent comprehensible by humans. This might not be possible for every system, but surely the rules of Conway's GoL can be expressed in fewer than 350 logic gates per cell?

This also reminds me of using Hopfield networks to store images. Seems like Hopfield networks are a special case of this where the activation function of each cell is a simple sum, but I’m not sure. Another difference is that Hopfield networks are fully connected, so the neighborhood is the entire world, i.e., they are local in time but not local in space. Maybe someone can clarify this further?
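A minimal Hopfield sketch of the comparison being drawn (my own toy, one stored pattern): Hebbian weights, and an update that is the sign of a weighted sum over *all* cells, i.e. the "neighborhood" really is the entire network.

```python
import numpy as np

# Hebbian storage: W accumulates outer products of +/-1 patterns.
def store(patterns):
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)        # no self-connections
    return W

# Fully connected recall: every unit updates from a weighted sum over
# the whole state -- local in time, global in space.
def recall(W, x, steps=10):
    x = np.asarray(x, dtype=float)
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1.0             # break ties deterministically
    return x

pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = store([pattern])
noisy = pattern.copy()
noisy[0] *= -1                      # corrupt one bit
```

So the analogy holds partway: both settle into attractors that restore a stored image, but a Hopfield update is a fixed sum-and-threshold over everything, while the DLCA update is a learned circuit over a local neighborhood.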

[+] juxtaposicion|1 year ago|reply
It’s interesting to see how differentiable logic/binary circuits can be made cheap at inference time.

But what about the theoretical expressiveness of logic circuits vs baselines like MLPs? (And then of course compared to CNNs and other kernels.) Are logic circuits roughly equivalent in terms of memory and compute being used? For my use case, I don’t care about making inference cheaper (eg the benefit logical circuits brings). But I do care about the recursion in space and time (the benefit from CAs). Would your experiments work if you still had a CA, but used dumb MLPs?

[+] scarmig|1 year ago|reply
Well, with all 16 logic gates available, they can express all Boolean circuits (you could get that even with NAND or NOR gates, of course, if you are working with arbitrary as opposed to fixed connectivity). And so you could have a 32 bit output vector which could be taken as a float (and you could create any circuit that computes any bitwise representation of a real).

As for efficiency, it would depend on the problem. If you're trying to learn XOR, a differentiable logic gate network can learn it with a single unit with 16 parameters (actually, 4, but the implementation here uses 16). If you're trying to learn a linear regression, a dumb MLP would very likely be more efficient.
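The XOR case can be sketched end-to-end (my own toy, not the paper's code): one unit holding 16 logits, a softmax over the truth tables of the 16 two-input gates, trained by plain gradient descent until the mass lands on XOR (gate index 6 when the truth table is read as a 4-bit number).

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0., 1., 1., 0.])                      # XOR targets

# Truth tables of the 16 two-input gates: gate g outputs bit (2a+b) of g.
idx = (2 * X[:, 0] + X[:, 1]).astype(int)
tables = np.stack([(g >> idx) & 1 for g in range(16)], axis=1).astype(float)

# One differentiable logic unit: softmax over the 16 gates, MSE loss,
# hand-derived gradients through the softmax.
logits = np.zeros(16)
lr = 0.5
for _ in range(5000):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    pred = tables @ p                               # expected gate output
    g_pred = 2 * (pred - y) / len(y)                # dMSE/dpred
    g_p = tables.T @ g_pred
    logits -= lr * p * (g_p - p @ g_p)              # softmax Jacobian
```

After training, snapping to the argmax gate gives an exact XOR circuit with zero inference-time arithmetic beyond the gate itself.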

[+] gwern|1 year ago|reply
A MLP must be compilable to some arrangement of logic gates, so you could always try a tack like initializing everything as randomly-wired/connected MLPs, and perhaps doing some pretraining, before compiling to the logic gate version and training the logic gates directly. Or take the MLP random initialization, and imitate its distributions as your logic gate distribution for initialization.
[+] max_|1 year ago|reply
So this does not need large training data sets like traditional models?

The lizard and the Game of Life examples seem to illustrate that you only need one data point to create or "reverse-engineer" an algorithm that "generates" something equal to the data point.

How is this different from using a neural network and then over fitting it?

Maybe the difference is that instead of learning trained weights, the cellular automaton learns a combination of logic gates (a circuit).

So do the underlying problems with overfitting a neural network (a model being unable to generalise) still hold for this "logic cellular automata"?

[+] alex_abt|1 year ago|reply
> Imagine trying to reverse-engineer the complex, often unexpected patterns and behaviors that emerge from simple rules. This challenge has inspired researchers and enthusiasts that work with cellular automata for decades.

Can someone shed some light on what makes this a problem worth investigating for decades, if at all?

[+] BriggyDwiggs42|1 year ago|reply
https://writings.stephenwolfram.com/2024/08/whats-really-goi...

One example is that Stephen Wolfram argues, I think compellingly, that machine learning "hitches on to" chaotic systems defined by simple rules and rides them for a certain number of steps in order to produce complex behaviors. If this is true, being able to easily go in the reverse direction could give us lots of insight into ML.

[+] achille|1 year ago|reply
yes, think of it this way: why is it that bathing the Earth in 10^55 Boltzmann constants makes it seemingly emit a Tesla?

can we construct a warm winter garment without having to manually pick open cotton poppies?

if we place energy in the right location, can we have slime mold do computation for us?

how do we organize matter and energy in order to watch a funny cat video?

[+] jderick|1 year ago|reply
Could this be used to train an LLM? It seems the hidden states could be used to learn how to store history.
[+] sim04ful|1 year ago|reply
This is a very interesting paper. Question though: since the cells' gates are updated using a "global" gradient descent, it seems the training isn't truly parallel.

Is there any promise of a strictly local weight-adjustment method?