top | item 40133782

Generative A.I. arrives in the gene editing world of CRISPR

90 points | msmanek | 1 year ago | nytimes.com | reply

56 comments

[+] miraculixx|1 year ago|reply
Reading their blog post I wonder if an LLM is really the best way to do this. If I got it right, they used the LLM to enumerate potential protein DNA sequences. Does that really need an LLM? Enumeration is not novel, nor are LLMs particularly good at it. If you want to computationally parallelize the search in a large enumeration space it would be much easier to simply, well, do that instead of taking a detour via a statistical parrot.

In a nutshell this sounds more like a case of "we wanted something with AI in the title".

[+] mxwsn|1 year ago|reply
It's not an English LLM, but a "protein" language model, where tokens represent amino acids or nucleotides. Learning a transformer language model on such data simply learns a distribution over sequences of tokens. It's a fine approach conceptually that in many ways is the "right" way or most elegant method, and not a stretch at all.
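(A toy sketch of that idea, not the authors' actual model: amino acids become the token alphabet, and "training" just means estimating a next-token distribution. A real protein LM uses a transformer; the hypothetical bigram model below only illustrates the shared objective. Sequences and names here are made up for illustration.)

```python
from collections import Counter, defaultdict

# The 20 standard amino acids, one letter each; a "protein language model"
# tokenizes sequences over this alphabet instead of English words.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
token_id = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def tokenize(seq):
    """Map a protein sequence to integer token ids."""
    return [token_id[aa] for aa in seq]

def bigram_model(sequences):
    """Estimate P(next residue | current residue) from a corpus.
    A transformer learns a far richer conditional distribution, but the
    training objective - next-token prediction - is the same."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for cur, c in counts.items()}

# Toy corpus (hypothetical fragments, not real proteins).
model = bigram_model(["MKVLA", "MKVIA", "MKLLA"])
print(tokenize("MKV"))   # integer token ids for M, K, V
print(model["K"])        # distribution over residues following K
```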
[+] dekhn|1 year ago|reply
I don't have a direct answer to your question. My guess is that LLMs are too limited to make truly great solutions in biology but sequential modelling is a key component that will not be replaced any time soon. For example, transformers were key to AlphaFold's success, but they still needed many other steps to make accurate predictions.

I worked on a predecessor to LLMs - HMMs for protein modelling. They were, and still are for most people, the best way to model protein sequences. It's usually done as prediction, rather than generation (i.e., you use the model to classify an unknown sequence into a known category, rather than asking the model to generate new instances of a category). HMMs for proteins are a bit stuffy, and they model local changes well, but struggle with the long-range interactions that LLMs seem to excel at (for example, an HMM will do a good job of letting you stuff a few more residues into a protein in a localized region such as a hinge, but is not so great at modelling groups of residues that are located far apart in sequence space but close in protein space).
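(To illustrate the "classification, not generation" use: score a sequence under a family's HMM with the forward algorithm and pick the best-scoring family. The two-state model below is entirely hypothetical - a crude core/loop toggle over a reduced H/P alphabet, not a real profile HMM.)

```python
import math

def forward_loglik(obs, states, start, trans, emit):
    """Forward algorithm: log P(obs | model), summing over all state paths.
    Classification = score a sequence under each family's HMM, keep the best."""
    alpha = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    for sym in obs[1:]:
        alpha = {
            j: math.log(sum(math.exp(alpha[i]) * trans[i][j] for i in states))
               + math.log(emit[j][sym])
            for j in states
        }
    return math.log(sum(math.exp(alpha[s]) for s in states))

# Hypothetical two-state model over a reduced alphabet
# (H = hydrophobic residue, P = polar residue).
states = ["core", "loop"]
start = {"core": 0.6, "loop": 0.4}
trans = {"core": {"core": 0.8, "loop": 0.2},
         "loop": {"core": 0.3, "loop": 0.7}}
emit = {"core": {"H": 0.9, "P": 0.1},
        "loop": {"H": 0.2, "P": 0.8}}

print(forward_loglik("HHPH", states, start, trans, emit))
```

The Markov property is also why the long-range weakness above exists: each transition depends only on the previous state, so couplings between residues that are distant in the sequence never enter the model.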

One detail of the bitter lesson is, imho, that statistical parrots are better than they "should" be, probably for the same reason that mathematics is unexpectedly proficient in modelling physics: to some degree, the models recapitulate the true latent space of the underlying system well enough to generalize outside the original observation space.

[+] changoplatanero|1 year ago|reply
I think your intuition is off here. The number of sequences to enumerate is much greater than the number of atoms in the universe. You need a smart way to enumerate these and that's what the LLM is for. The statistical parrot is not a detour, it's a shortcut.
[+] meowkit|1 year ago|reply
The real power of LLMs is they can model anything as a “language” given the right sequence training data.

Warning: the following is my opinion.

In the same way that MLP “neurons” are universal approximators, it seems that LLMs are universal mappers.

They have the potential to help us organize and translate the immense quantity of data being generated by modern methods in all respective disciplines. We might create a model that translates English to protein synthesis, and vice versa, which would be pretty useful given my lay understanding of biochem.

To your point - this probably is NOT the best way to do this in an objective sense. But to my mind we are hitting upper limits as finite beings and need things like this, which utilize native language constructs, to move forward.

[+] bglazer|1 year ago|reply
First, the search space is way too large for brute force enumeration. We're talking like 10^300 combinations. Also, the hard part isn't just listing amino acid sequences, it's finding ones that do what you want them to. The only way to figure that out is by testing them, which is difficult and expensive. So you need an algorithm that is good at only listing sequences that are likely to work. That's precisely what LLMs are good at: finding patterns and sequences that are correlated in a useful way.
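(A back-of-the-envelope check of that number, assuming the common rough estimate of ~10^80 atoms in the observable universe: with a 20-letter amino acid alphabet, even a modest ~230-residue protein has on the order of 10^300 possible sequences.)

```python
import math

ALPHABET = 20              # standard amino acids
ATOMS_IN_UNIVERSE = 1e80   # rough common estimate

def log10_search_space(length):
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(ALPHABET)

# ~230 residues is already ~10^299 sequences - enumeration is hopeless,
# and even a 62-residue peptide outnumbers the atoms in the universe.
print(log10_search_space(230))   # ≈ 299.2
print(log10_search_space(62))    # ≈ 80.7
```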
[+] swamp40|1 year ago|reply
Well hopefully it's trained on genetic DNA sequences and not Reddit threads. If so, it should do pretty well predicting the next sequence given previous sequences. There are probably all sorts of undiscovered patterns.
[+] luckman212|1 year ago|reply
To be fair, having AI in the title landed it on the front page of HN, so...
[+] swamp40|1 year ago|reply
Imagine an AI learning from photos/videos of a person and their DNA sequence? And also a list of diseases, health records, etc. Then asking it for predictions while giving it feedback afterwards so it can tune itself.

You could even guarantee privacy. That would be some really useful data.

[+] paxys|1 year ago|reply
You mean exactly what 23andMe tried to do, and failed miserably at.
[+] vessenes|1 year ago|reply
I like this a lot; you could have a multimodal setup with a DNA transformer, an image transformer and an LLM. Extremely fundable startup.
[+] salynchnew|1 year ago|reply
The real endgame here isn't to just enumerate and then patent those sequences, right?
[+] VikingCoder|1 year ago|reply
Captain Trips

One of the four horsemen of the AI apocalypse.

[+] a-r-t|1 year ago|reply
So the six finger hands were just a foreshadowing?
[+] pointlessone|1 year ago|reply
What can possibly go wrong if we let ChatGPT edit our DNA?
[+] arcticfox|1 year ago|reply
This model has nothing to do with ChatGPT other than transformers. And as someone who could desperately use some advances in gene editing, this lowbrow dismissal is frustrating.
[+] Mindless2112|1 year ago|reply
Have you seen how generative AI thinks hands work? Just a few edits and reality can catch up.
[+] genghisjahn|1 year ago|reply
Have you seen DNA? Mistakes and dupes and hallucinations all over the place. Ever since Sherlock Crick and Doctor Watson started meddling with it.
[+] m3kw9|1 year ago|reply
And how does ChatGPT edit your DNA?
[+] sharpshadow|1 year ago|reply
I still consider biological life as the best ‘robot’ because it can create more of itself.

As long as robots are incapable of recreation I don’t see the threat.

One could say all machines today are infertile.

[+] jprete|1 year ago|reply
What about computer viruses?
[+] m3kw9|1 year ago|reply

[deleted]

[+] JangoSteve|1 year ago|reply
Do you mean for AI to do the entire job of researching, creating, testing, manufacturing, and distributing a cure, or just for AI to be involved? And do you mean completely eradicating a disease, or just producing a cure for it? And do you mean an outright cure, or also a treatment or vaccine? If the latter in all cases, here's an example:

https://www.technologyreview.com/2022/08/26/1058743/i-was-th...

If you mean the former in any case, it'll probably be a while, if ever in our lifetimes.

[+] cjk2|1 year ago|reply
It cured my investment interest in it.