I guess this is deliberately written to be cryptic, as it's an advertisement. Terms are used before they are introduced, but maybe I'm not in the target group. What's called prediction would technically be inference. Hand-wavy explanations of machine learning used to differentiate chemical-analysis data seem to be the sell here.
Agree that this is not written with the rigor of a journal paper. Our intention was to communicate the simple wins we've had employing deep learning, compared to the tools of the prior generation. XCMS is the most widely used library: http://www.bioconductor.org/packages/release/bioc/html/xcms..... It requires very painful parameter tuning. Internally, we had also built our own custom targeted analysis. In the targeted pipeline, we had to pre-specify "acetaminophen, shikimate, chorismate...". After building this deep learning workflow, we have switched over to it exclusively: no chemicals pre-specified, no parameter tuning. With about 185 engineered yeast strains needing analysis, each with replicates, feeding conditions, and controls, these simplifications have been helpful.
We are getting easy wins over microbial data. Human data is noisier and we're testing over that now. More later.
If you have microbial data, or have used XCMS in the past and would like to compare, happy to chat. Email me at saurabh@20n.
It might be the lack of detail in the piece, but it's unclear to me why this isn't a hammer to kill a fly; that is, why wouldn't a much simpler peak-finding algorithm be appropriate here? What is the NN doing that's more than just peak finding over many molecules? Is there some interdependency that I am missing, or is this just signal processing over millions of independent traces?
As far as I understand, NN is used here to find patterns which discriminate between sample cohorts (healthy vs disease). Peak finding gives you a list of peaks, but it doesn't tell you which of them discriminate between cohorts.
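To make that distinction concrete, here is a minimal sketch on a synthetic trace (the trace, thresholds, and peak positions are all illustrative): peak finding hands you locations and heights, and nothing in that step says which peaks separate healthy from disease samples.

```python
# Peak finding alone vs. the cohort-discrimination step that follows it.
import numpy as np
from scipy.signal import find_peaks

# Synthetic chromatogram-like trace: two Gaussian peaks on a flat baseline.
t = np.linspace(0, 10, 1000)
trace = (1.5 * np.exp(-((t - 3.0) ** 2) / 0.02)
         + 0.8 * np.exp(-((t - 7.0) ** 2) / 0.02))

# Step 1: peak finding gives locations and heights, nothing more.
peaks, props = find_peaks(trace, height=0.5, prominence=0.3)
print(t[peaks], props["peak_heights"])

# Step 2 (the part peak finding does not do): deciding which peaks
# discriminate cohorts needs labeled samples, e.g. a per-peak statistic
# such as a difference in mean intensity between healthy and disease.
```

The NN-based approach effectively folds step 2 (and some of step 1's parameter tuning) into a single learned model.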
I find this very interesting. As a related topic, would it be possible to use deep learning to classify samples based on the quantities of pre-identified chemicals? If so, how would this work, roughly? Does anybody have any ideas? Traditionally people use linear discriminant analysis, PCA, PLS, etc. I can't really wrap my head around the use of multiple neural network layers for such problems.
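For a rough picture of how either approach looks in practice, here is a hedged sketch on synthetic data, where the feature columns stand in for measured chemical quantities; it runs the traditional LDA baseline mentioned above next to a small multi-layer network. The data, sizes, and hyperparameters are all illustrative, not anything from the article.

```python
# Classifying samples from quantities of pre-identified chemicals:
# LDA baseline vs. a small neural network, on synthetic data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, n_chem = 200, 12                      # samples x pre-identified chemicals
y = rng.integers(0, 2, size=n)           # cohort label (e.g. healthy/disease)
X = rng.normal(size=(n, n_chem))         # "measured concentrations"
X[:, 0] += 3.0 * y                       # one chemical shifts with cohort

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
print(lda.score(X_te, y_te), mlp.score(X_te, y_te))
```

On a linearly separable toy problem like this, LDA and the network perform comparably; the extra layers only start to pay off when the cohorts differ through nonlinear combinations of chemicals.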
One possibility is as an extension of the untargeted analysis: run the analysis over different kinds of samples. The output for each sample is the list of major peaks (and intensities). Use this as the "image" to train a (shallow) network.
You might even get away without specifying pre-identified chemicals. Adding that list would only help.
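A sketch of that idea, with assumed shapes: each sample's untargeted output is taken to be a list of (position, intensity) peaks, which get binned onto a fixed grid to form the "image"; a shallow network is then trained on cohort labels. The grid size, peak format, and synthetic data are illustrative assumptions, not the workflow from the article.

```python
# Peak lists -> fixed-length vectors -> shallow network, on synthetic data.
import numpy as np
from sklearn.neural_network import MLPClassifier

N_BINS = 100          # resolution of the fixed grid (assumption)

def peaks_to_vector(peaks, lo=0.0, hi=1000.0, n_bins=N_BINS):
    """Bin (position, intensity) peaks onto a fixed grid; sum intensities."""
    vec = np.zeros(n_bins)
    for pos, inten in peaks:
        idx = min(int((pos - lo) / (hi - lo) * n_bins), n_bins - 1)
        vec[idx] += inten
    return vec

# Synthetic cohorts: class 1 carries an extra strong peak near position 420.
rng = np.random.default_rng(1)
samples, labels = [], []
for label in (0, 1):
    for _ in range(60):
        peaks = [(rng.uniform(0, 1000), rng.uniform(0.1, 1.0))
                 for _ in range(20)]
        if label == 1:
            peaks.append((420.0 + rng.normal(0, 2.0), 5.0))
        samples.append(peaks_to_vector(peaks))
        labels.append(label)

X, y = np.array(samples), np.array(labels)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(X[::2], y[::2])
print(clf.score(X[1::2], y[1::2]))      # held-out accuracy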
Who's working on ChemStructure2Vec? Could the Word2Vec approach be used to predict novel structures with functions in-between desired sets of known chemicals?
From what I've seen, most use molecular fingerprints or cheminformatic descriptors such as those RDKit provides. Google and Vijay Pande's group at Stanford had a recent publication on Molecular Graph Convolutions. I believe a lot of interesting research will come out in the area of molecular features over the next few years.
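To illustrate the fingerprint idea without pulling in RDKit, here is a toy stand-in: hash local substructures (crudely approximated here by SMILES character n-grams) into a fixed-width bit vector, then compare molecules by Tanimoto similarity. Real work would use RDKit's Morgan/ECFP fingerprints or learned graph features instead.

```python
# Toy hashed "fingerprint" from SMILES n-grams; a crude proxy for the
# substructure-based fingerprints RDKit computes properly.
import zlib

def ngram_fingerprint(smiles, n_bits=64, n_max=3):
    """Hash all character n-grams (n = 1..n_max) into a fixed bit vector."""
    bits = [0] * n_bits
    for n in range(1, n_max + 1):
        for i in range(len(smiles) - n + 1):
            bits[zlib.crc32(smiles[i:i + n].encode()) % n_bits] = 1
    return bits

fp_ethanol = ngram_fingerprint("CCO")
fp_acetone = ngram_fingerprint("CC(=O)C")

# Tanimoto similarity between bit vectors drives most fingerprint-based ML.
inter = sum(a & b for a, b in zip(fp_ethanol, fp_acetone))
union = sum(a | b for a, b in zip(fp_ethanol, fp_acetone))
print(inter / union)
```

Methods like graph convolutions replace this fixed hashing with features learned directly from the molecular graph.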
http://research.google.com/pubs/pub45548.html