top | item 26697892

Predictive coding has been unified with backpropagation

311 points | cabalamat | 5 years ago | lesswrong.com | reply

85 comments

[+] cs702|5 years ago|reply
EDIT: Before you read my comment below, please see https://news.ycombinator.com/item?id=26702815 and https://openreview.net/forum?id=PdauS7wZBfC for a different view.

--

If the results hold, they seem significant enough to me that I'd go so far as to say the authors of the paper would end up getting an important award at some point, not just for unifying the fields of biological and artificial intelligence, but also for making it trivial to train models in a fully distributed manner, with all learning done locally -- if the results hold.

Here's the paper: "Predictive Coding Approximates Backprop along Arbitrary Computation Graphs"

https://arxiv.org/abs/2006.04182

I'm making my way through it right now.
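
The scheme the paper describes can be sketched in a few lines of NumPy. This is a toy illustration under the paper's "fixed prediction assumption" (predictions frozen at their feedforward values during relaxation), not the authors' code; the layer sizes, seed, and step size below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.tanh
df = lambda a: 1.0 - np.tanh(a) ** 2

# Toy MLP (layer sizes arbitrary); loss L = 0.5 * ||output - y||^2
sizes = [3, 4, 2]
W = [rng.normal(0.0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=(sizes[0], 1))
y = rng.normal(size=(sizes[-1], 1))

# Feedforward pass
vs = [x]
for Wl in W:
    vs.append(f(Wl @ vs[-1]))

# Reference: ordinary backprop gradients dL/dW_l
delta = vs[-1] - y                          # dL/d(output)
bp = [None] * len(W)
for l in reversed(range(len(W))):
    d_pre = delta * df(W[l] @ vs[l])        # dL/d(pre-activation)
    bp[l] = d_pre @ vs[l].T
    delta = W[l].T @ d_pre

# Predictive coding: clamp the output layer to the target and let the
# value nodes relax to minimize the energy F = sum_l 0.5*||v_l - mu_l||^2,
# with the predictions mu_l (and f') frozen at their feedforward values.
mu = [None] + [f(W[l] @ vs[l]) for l in range(len(W))]
dfa = [df(W[l] @ vs[l]) for l in range(len(W))]
v = [vv.copy() for vv in vs]
v[-1] = y.copy()                            # clamp output to target
for _ in range(200):                        # paper: ~100-200 iterations
    e = [np.zeros_like(v[0])] + [v[l] - mu[l] for l in range(1, len(v))]
    for l in range(1, len(v) - 1):          # relax hidden layers only
        v[l] += 0.2 * (-e[l] + W[l].T @ (e[l + 1] * dfa[l]))

# Each weight update is purely local: the error one layer up times the
# presynaptic activity. At equilibrium it equals -dL/dW_l, i.e. the
# backprop descent direction.
e = [np.zeros_like(v[0])] + [v[l] - mu[l] for l in range(1, len(v))]
pc = [(e[l + 1] * dfa[l]) @ vs[l].T for l in range(len(W))]

for g_bp, g_pc in zip(bp, pc):
    print(np.linalg.norm(g_bp + g_pc) / np.linalg.norm(g_bp))  # ~0
```

The point of the sketch is that each layer's update uses only quantities available at that layer, yet it recovers the backprop gradient.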

[+] babel_|5 years ago|reply
Interesting follow up reading:

"Relaxing the Constraints on Predictive Coding Models" (https://arxiv.org/abs/2010.01047), from the same authors. Looks at ways to remove neurological implausibility from PCM and achieve comparable results. Sadly they only do MNIST in this one, and are not as ambitious in testing on multiple architectures and problems/datasets, but the results are still very interesting and it covers some of the important theoretical and biological concerns.

"Predictive Coding Can Do Exact Backpropagation on Convolutional and Recurrent Neural Networks" (https://arxiv.org/abs/2103.03725), from different authors. Uses an alternative formulation that always converges to the backprop result within a fixed number of iterations, rather than converging approximately "in practice" within 100-200 iterations. Not only is this a stronger guarantee, it means they achieve inference speeds within spitting distance of backprop, levelling the playing field. (Edit: also noted by eutropia)

It'd be interesting to see what a combination of these two could do, and at this point I feel like a logical next step would be to provide a setting in popular ML libraries such that backprop can be swapped for PCM. Being able to verify this research just by adding a single extra line for the PCM version, and perhaps replicating state-of-the-art architectures, would be quite valuable.

[+] eutropia|5 years ago|reply
Here's a more recent paper (March, 2021) which cites the above paper: https://arxiv.org/abs/2103.04689 "Predictive Coding Can Do Exact Backpropagation on Any Neural Network"
[+] abraxas|5 years ago|reply
I’m going to personally flog any researcher who titles their next paper “Predictive Coding Is All You Need”. You’ve been warned.
[+] andyxor|5 years ago|reply
The thing is, about every week there is a paper published with groundbreaking claims, with this question in particular being very popular: trying to unify neuroscience and deep learning in some way, in search of the computational foundations of AI. Mostly this is driven by the success of DL in certain industrial applications.

Unfortunately most of these papers are heavy on theory but light on empirical evidence. If we follow the path of natural sciences, theory has to agree with evidence. Otherwise it's just another theory unconstrained by reality, or worse, pseudo-science.

[+] nl|5 years ago|reply
I don’t think anyone familiar with the field is in any way surprised by these results.

The breakthrough seems really limited to showing it holds for arbitrary graphs. We already knew this was practically true anyway.

[+] JackFr|5 years ago|reply
My background is as an interested amateur, but

> also for making it trivial to train models in a fully distributed manner, with all learning done locally

seems like a really huge development.

At the same time I remain pretty skeptical of claims of unifying the fields of biological and artificial intelligence. I think the recent tremendous successes in AI & ML lead to an unjustified overconfidence that we are close to understanding the way biological systems must work.

[+] klmadfejno|5 years ago|reply
I'm trying to imagine how that works. Imagine you've got a neural net. One node identifies the number of feet. One node identifies the number of wings. One node identifies color. This feeds into a layer that tries to predict what animal it is.

With backprop, you can sort of assume that given enough scale your algo will identify these important features. With local learning, wouldn't you get a tendency to identify the easily identifiable features many times? Is there a need for a sort of middleman, like a one-armed-bandit kind of thing, that makes a decision to spawn and despawn child nodes to explore the space more?

[+] nmca|5 years ago|reply
Interesting discussion on the ICLR openreview, resulting in a reject:

https://openreview.net/forum?id=PdauS7wZBfC

[+] marmaduke|5 years ago|reply
The review is great, it contains all the interesting points and counterpoints, in a much more succinct format than the article itself.
[+] justicezyx|5 years ago|reply
Another well received paper [1], but I want to point out that ICLR should really have an industry track.

The type of research in [1] (exhaustive analytic study on various parameters on RL training), is clearly beyond typical academia environment, probably also beyond normal industry labs. Note the paper was from Google Brain.

The study consumes a lot of people's time, and computing time. It's no doubt very useful and valuable. But I dont think they should be judged by the same group of reviewers with the other work from normal universities.

[1] https://openreview.net/forum?id=nIAxjsniDzg

[+] justicezyx|5 years ago|reply
Copied from this URL, the final review comment, which 1) summarizes the other reviews and 2) describes the rationale for rejection:

```
This paper extends recent work (Whittington & Bogacz, 2017, Neural computation, 29(5), 1229-1262) by showing that predictive coding (Rao & Ballard, 1999, Nature neuroscience 2(1), 79-87) as an implementation of backpropagation can be extended to arbitrary network structures. Specifically, the original paper by Whittington & Bogacz (2017) demonstrated that for MLPs, predictive coding converges to backpropagation using local learning rules. These results were important/interesting as predictive coding has been shown to match a number of experimental results in neuroscience and locality is an important feature of biologically plausible learning algorithms.

The reviews were mixed. Three out of four reviews were above threshold for acceptance, but two of those were just above. Meanwhile, the fourth review gave a score of clear reject. There was general agreement that the paper was interesting and technically valid. But, the central criticisms of the paper were:

Lack of biological plausibility: The reviewers pointed to a few biologically implausible components to this work. For example, the algorithm uses local learning rules in the same sense that backpropagation does, i.e., if we assume that there exist feedback pathways with symmetric weights to feedforward pathways then the algorithm is local. Similarly, it is assumed that there are paired error neurons, which is biologically questionable.

Speed of convergence: The reviewers noted that this model requires many more iterations to converge on the correct errors, and questioned the utility of a model that involves this much additional computational overhead.

The authors included some new text regarding biological plausibility and speed of convergence. They also included some new results to address some of the other concerns. However, there is still a core concern about the importance of this work relative to the original Whittington & Bogacz (2017) paper. It is nice to see those original results extended to arbitrary graphs, but is that enough of a major contribution for acceptance at ICLR? Given that there are still major issues related to (1) in the model, it is not clear that this extension to arbitrary graphs is a major contribution for neuroscience. And, given the issues related to (2) above, it is not clear that this contribution is important for ML. Altogether, given these considerations, and the high bar for acceptance at ICLR, a "reject" decision was recommended. However, the AC notes that this was a borderline case.
```

The core reason is that the proposed model lacks biological plausibility. Or, setting that weakness aside, the model is computationally more intensive.

I HAVE NOT read the paper, but the review seems mostly based on "feeling"; i.e., the reviewers feel that this work is not above the bar. Note that I am not criticizing the reviewers here: in my past reviewing career (maybe 100+ papers, which I did until 6 years ago), most submissions were junk. For the ones that were truly good work, which checked all the boxes (new result, hard problem, solid validation), it was easy to accept.

A few other papers all seemed to fall into the "feeling" category: everything looked right, but it was always borderline, and the review results could vary substantially based on the reviewers' own backgrounds.

[+] blueyes|5 years ago|reply
I'm glad people are talking about this, and the similarity between predictive coding and the action of biological neurons is interesting. But we shouldn't fetishize predictive coding. There's a wider discussion going on, and several theories as to how backpropagation might work in the brain.

https://www.cell.com/trends/cognitive-sciences/fulltext/S136...

https://www.nature.com/articles/s41583-020-0277-3

[+] klmadfejno|5 years ago|reply
> Predictive coding is the idea that BNNs generate a mental model of their environment and then transmit only the information that deviates from this model. Predictive coding considers error and surprise to be the same thing. Hebbian theory is a specific mathematical formulation of predictive coding.

This is an excellent, concise explanation. It sounds intuitive as something that could work. Would love to try and dabble with this. Any resources?
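
The "transmit only the deviation" idea can be illustrated with a toy sender/receiver pair that share a trivial predictive model (here, "the next sample equals the previous one"; the signal and model are made up). Only the prediction errors, the "surprise", cross the wire:

```python
import numpy as np

# Sender and receiver share the same trivial model of the signal:
# "the next sample equals the previous one" (starting from 0).
signal = np.array([5.0, 5.0, 5.0, 6.0, 8.0, 8.0, 8.0, 7.0])

# Sender: transmit only the prediction errors (the "surprise")
predictions = np.concatenate(([0.0], signal[:-1]))
residuals = signal - predictions

# Receiver: run the same model and add the transmitted surprise back in
recon = np.zeros_like(signal)
prev = 0.0
for i, r in enumerate(residuals):
    recon[i] = prev + r
    prev = recon[i]

print(residuals)                   # zero wherever the model was right
print(np.allclose(recon, signal))  # True: reconstruction is lossless
```

On predictable stretches the residual stream is all zeros, which is the sense in which only surprise gets transmitted.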

[+] hctaw|5 years ago|reply
I don't know enough about biology or ML to know if what I'm posting below is totally wrong, but here goes.

"Backprop" == "Feedback" of a non-linear dynamical system. Feedback is a mathematical description of the behavior of systems, not a literal one.

I don't know if BNNs are incapable of backprop any more than an RLC filter is incapable of "feedback": analyzing its ODE tells you that there's a feedback path (which is what, physically? The return path for charge?).

So what makes BNNs incapable of feedback? Are they mechanically and electrically insulated from each other? How do they share information, and what is the return path?

Other than that I wish more unification was done on ML algorithms and dynamical systems, just in general. There's too much crossover to ignore.

[+] khawkins|5 years ago|reply
> Other than that I wish more unification was done on ML algorithms and dynamical systems, just in general. There's too much crossover to ignore.

Check out this work, "Deep relaxation: partial differential equations for optimizing deep neural networks" by Pratik Chaudhari, Adam Oberman, Stanley Osher, Stefano Soatto & Guillaume Carlier.

https://link.springer.com/article/10.1007/s40687-018-0148-y

[+] andyxor|5 years ago|reply
The back-prop learning algorithm requires information non-local to the synapse to be propagated from the output of the network backwards, to affect neurons deep in the network.

There is simply no evidence for this global feedback loop, or global error correction, or delta rule training in neurophysiological data collected in the last 80 years of intensive research. [1]

As for "why": biological learning is primarily shaped by evolution, driven by energy-expenditure constraints and survival of the most efficient adaptation engines. One can speculate that iterative optimization akin to the one run by GPUs in ANNs is way too energy-inefficient to be sustainable in a living organism.

A good discussion of the biological constraints on learning (from a CompSci perspective) can be found in Leslie Valiant's book [2]. Prof. Valiant is the author of PAC [3], one of the few theoretically sound models of modern ML, so he's worth listening to.

[1] https://news.ycombinator.com/item?id=26700536

[2] https://www.amazon.com/Circuits-Mind-Leslie-G-Valiant/dp/019...

[3] https://en.wikipedia.org/wiki/Probably_approximately_correct...

[+] nerdponx|5 years ago|reply
The article says this:

> The backpropagation algorithm requires information to flow forward and backward along the network. But biological neurons are one-directional. An action potential goes from the cell body down the axon to the axon terminals to another cell's dendrites. An axon potential never travels backward from a cell's terminals to its body.

The point of the research here is that backpropagation turns out not to be necessary to fit a neural network, and that it can be approximated with predictive coding, which does not require end-to-end backwards information flow.

[+] jdonaldson|5 years ago|reply
Yeah, I don't like this title. Coding for backprop is worth getting excited about, but please don't assume it supersedes all forms of "predictive coding". Plenty of predictive learning techniques do just fine without it, including our own brains.

In keeping with the No-Free-Lunch theorem, it's also highly desirable in general to have a variety of approaches at hand for solving certain predictive coding problems. Yes, this makes ML (as a field) cumbersome, but it also prevents us from painting ourselves into a corner.

[+] nerdponx|5 years ago|reply
Is this "coding for backprop", or "coding for the same results as backprop"?
[+] adamnemecek|5 years ago|reply
I think that this sort of forward-backward thing is a very general idea. There's a one-to-many relationship called the adjoint, and a many-to-one relationship called the norm.

I wrote something about this here https://github.com/adamnemecek/adjoint

[+] selimthegrim|5 years ago|reply
What were you going to say about Young tableaux?
[+] xzvf|5 years ago|reply
At scale, Evolutionary Strategies (ES) are a very good approximation of the gradient as well. I don't recommend jumping to conclusions and unifications just yet.
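
For reference, the basic ES gradient estimate is easy to demo on a toy objective (my choice, not from the thread), where it can be checked against the analytic gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective f(x) = -||x||^2, whose true gradient is -2x.
def f(X):                          # batched: one row per candidate
    return -np.sum(X ** 2, axis=-1)

x = rng.normal(size=5)
sigma, n = 0.05, 20000             # noise scale, population size

# Antithetic ES estimator:
#   g ~= sum_i (f(x + sigma*e_i) - f(x - sigma*e_i)) * e_i / (2*sigma*n)
eps = rng.normal(size=(n, x.size))
diffs = f(x + sigma * eps) - f(x - sigma * eps)
g_es = (eps * diffs[:, None]).sum(axis=0) / (2 * sigma * n)

g_true = -2 * x
rel = np.linalg.norm(g_es - g_true) / np.linalg.norm(g_true)
print(rel)   # small: at this population size the estimate tracks the gradient
```

This only requires function evaluations (no backward pass at all), which is the sense in which ES approximates the gradient "at scale".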
[+] jnwatson|5 years ago|reply
The author's point is that predictive coding is a plausible mechanism by which biological neurons work. ES are not.

ANNs have deviated widely from their biological inspiration, most notably in the way that information flows, since backpropagation requires two-way flow and biological axons are one-directional.

If predictive coding and backpropagation are shown to have similar power, then there's a rough idea that the way that ANNs work isn't too far from how brains work (with lots and lots of caveats).

[+] Animats|5 years ago|reply
Is this approach "more local", in the sense that you could build hardware where local units got work done with less communication? That would have potential. It's feasible to build ICs with a few million simple compute units if they don't have to talk to each other or to memory much. GPUs are a few hundred or a few thousand parallel units that talk to memory a lot.
[+] ilaksh|5 years ago|reply
Does anyone know of a simple code example that demonstrates the original predictive coding concept from 1999? Ideally applied to some type of simple image/video problem.

I thought I saw a Matlab explanation of that '99 paper but have not found it again.

[+] 0lmer|5 years ago|reply
But is predictive coding perceived as a valid theory of cortical neuron function? There was a paper from 2017 drawing similar conclusions about backprop approximation with Spike-Timing-Dependent Plasticity: https://arxiv.org/abs/1711.04214 It looks more grounded in current models of neuronal function. Nevertheless, it has changed nothing in the field of deep learning since then.
[+] jwmullally|5 years ago|reply
Some general background on STDP for the thread:

Biological neurons don't just emit constant 0...1 float values; they communicate using time-sensitive bursts of voltage known as "spike trains". Spiking Neural Networks (SNNs) are a closer approximation of natural networks than typical ML ANNs. [0] gives a quick overview.

Spike-Timing-Dependent Plasticity is a local learning rule experimentally observed in biological neurons. It's a form of Hebbian learning, aka "neurons that fire together wire together."

Summary from [1]. The top graph gives a clear picture of how the rule works.

> With STDP, repeated presynaptic spike arrival a few milliseconds before postsynaptic action potentials leads in many synapse types to Long-Term Potentiation (LTP) of the synapses, whereas repeated spike arrival after postsynaptic spikes leads to Long-Term Depression (LTD) of the same synapse.

---

[0]: https://towardsdatascience.com/deep-learning-versus-biologic...

[1]: http://www.scholarpedia.org/article/Spike-timing_dependent_p...
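
The window in that summary can be written down directly (the amplitudes and time constants below are illustrative guesses, not values from [1]):

```python
import numpy as np

# STDP weight change as a function of dt = t_post - t_pre (ms).
A_plus, A_minus = 0.01, 0.012      # LTP / LTD amplitudes (illustrative)
tau_plus, tau_minus = 20.0, 20.0   # decay time constants in ms

def stdp_dw(dt_ms):
    """Pre before post (dt > 0) -> potentiation; post before pre -> depression."""
    if dt_ms >= 0:
        return A_plus * np.exp(-dt_ms / tau_plus)    # LTP branch
    return -A_minus * np.exp(dt_ms / tau_minus)      # LTD branch

print(stdp_dw(5.0) > 0)    # pre fires 5 ms before post: synapse strengthens
print(stdp_dw(-5.0) < 0)   # post fires 5 ms before pre: synapse weakens
```

Note the rule is entirely local: the update depends only on the relative spike times at one synapse, with the effect decaying as the spikes move further apart.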

[+] andyxor|5 years ago|reply
As long as the model requires a delta rule, or 'teacher signal'-based error correction, it is not biologically plausible.
[+] phreeza|5 years ago|reply
This was already shown for MLPs some years ago, and it is not really that surprising that it applies to many other architectures. Note that while learning can take place locally, it does still require an upward and downward stream of information flow, which is not supported by the neuroanatomy in all cases. So while it is an interesting avenue of research, I don't think it's anywhere near as revolutionary as this blog post makes it out to be.
[+] fouric|5 years ago|reply
> Predictive coding is the idea that BNNs generate a mental model of their environment and then transmit only the information that deviates from this model. Predictive coding considers error and surprise to be the same thing.

This reminds me of a Slate Star Codex article on Friston[1].

[1] https://slatestarcodex.com/2018/03/04/god-help-us-lets-try-t...