hashta | 5 months ago

One caveat that’s easy to miss: the "simple" model here didn’t just learn folding from raw experimental structures. Most of its training data comes from AlphaFold-style predictions: millions of protein structures that were themselves generated by large, heavily engineered MSA-based models.

It’s not like we can throw away all the inductive biases and MSA machinery; someone upstream still had to build and run those models to create the training corpus.

aDyslecticCrow|5 months ago

What I take away is the simplicity and scaling behavior. The ML field often sees an increase in model complexity to reach higher scores, and then a breakthrough where a simple model performs on par with the most complex. That such a "simple" architecture works this well on its own means we can potentially add the complexity back to reach further. Can we add back MSA now? Where will that take us?

My rough understanding of the field is that a "rough" generative model makes a bunch of decent guesses, and more formal "verifiers" ensure they abide by the laws of physics and geometry. The AI reduces the unfathomably large search space so the expensive simulation doesn't waste so much work on dead ends. If the guessing network improves, the whole process speeds up.
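That guess-then-verify loop can be sketched roughly as below. This is a toy illustration of the general pattern, not any real folding pipeline's API; all function names and scores are hypothetical stand-ins.

```python
import random

def propose_candidates(sequence, n=1000):
    # Stand-in for a fast learned model: returns (candidate, confidence) pairs.
    # A real model would emit structures; here we just emit labeled dummies.
    return [(f"{sequence}-cand{i}", random.random()) for i in range(n)]

def physics_verify(candidate):
    # Stand-in for an expensive physics/geometry check (simulation, clash
    # detection, etc.). Here it is a coin flip for illustration only.
    return random.random() > 0.5

def fold(sequence, n_candidates=1000, top_k=10):
    candidates = propose_candidates(sequence, n_candidates)
    # The learned model prunes the search space: keep only the top_k guesses...
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    # ...so the costly verifier runs top_k times instead of n_candidates times.
    return [c for c, _ in shortlist if physics_verify(c)]

print(fold("MKTAYIAKQR", n_candidates=100, top_k=5))
```

The point is the asymmetry: a better proposer shrinks the shortlist the verifier has to grind through, which is where the speedup comes from.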

- I'm recalling the increasingly complex transfer functions in recurrent networks,

- The deep pre-processing chains before skip connections.

- The complex normalization schemes before ReLU.

- The convoluted multi-objective GAN networks before diffusion.

- The complex multi-pass models before fully convolutional networks.

So basically, I'm very excited by this. Not because this itself is an optimal architecture, but precisely because it isn't!

nextos|5 months ago

> Can we add back MSA now?

Using MSAs might be a local optimum. ESM showed good performance on some protein problems without MSAs. MSAs offer a nice inductive bias and better average performance. However, the cost is doing poorly on proteins where MSAs are not accurate. These include B and T cell receptors, which are clinically very relevant.

Isomorphic Labs, Oxford, MRC, and others have started the OpenBind Consortium (https://openbind.uk) to generate large-scale structure and affinity data. I believe that once more data is available, MSAs will be less relevant as model inputs. They are "too linear".

godelski|5 months ago

Is this so unusual? Almost everything that is simple was once considered complex. That's the thing about emergence, you have to go through all the complexities first to find the generalized and simpler formulations. It should be obvious that things in nature run off of relatively simple rulesets, but it's like looking at a Game of Life and trying to reverse engineer those rules AND the starting parameters. Anyone telling you such a task is easy is full of themselves. But then again, who seriously believes that P=NP?

hashta|5 months ago

To people outside the field, the title/abstract can make it sound like folding is just inherently simple now, but this model wouldn't exist without the large synthetic dataset produced by the more complex AF. The "simple" architecture still uses the complex model indirectly, through distillation. We didn't really extract new tricks that let us design a simpler model from scratch; we shifted the complexity from the model space into the data space (think GPT-5 => GPT-5-mini: there's no GPT-5-mini without GPT-5).
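The model-space-to-data-space shift is easy to show with a toy example (purely illustrative, and unrelated to how AF or this model were actually trained): a complex "teacher" function labels a large synthetic dataset, and a much simpler "student" is then fit to those labels rather than to raw data.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    # Stand-in for an expensive, heavily engineered model.
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

# 1. Run the teacher to generate a large synthetic training corpus.
x_train = rng.uniform(-1, 1, size=5000)
y_train = teacher(x_train)

# 2. Fit a far simpler student (plain polynomial regression) to the
#    teacher's outputs. The student never sees "real" data, only
#    distilled labels, yet inherits the teacher's behavior.
coeffs = np.polyfit(x_train, y_train, deg=9)
student = np.poly1d(coeffs)

x_test = np.linspace(-1, 1, 200)
err = np.max(np.abs(student(x_test) - teacher(x_test)))
print(f"max |student - teacher| on [-1, 1]: {err:.3f}")
```

The student looks "simple", but its competence was paid for upstream: without the teacher there is no training corpus, which is exactly the caveat about this paper's dataset.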

slashdave|5 months ago

> It should be obvious that things in nature run off of relatively simple rulesets

Only if you are willing to call a billion years of evolutionary selection a "simple ruleset"

mapmeld|5 months ago

And AlphaFold was validated against experimental observations of folded proteins obtained with X-ray crystallography

slashdave|5 months ago

Correct. For those who might not follow: the MSA is used to generalize from known PDB structures to new sequences. If you train on AlphaFold2 results, those results already include that generalization, so your model no longer needs that capability (it can rely on rote memorization). This simple conclusion seems to have escaped the authors.