top | item 30341016

praccu | 4 years ago

Shameless self promotion: I wrote one of the more cited papers in the field [0], back in 2016.

A key challenge: very few labs have enough data.

Something I view as a key insight: a lot of labs are doing absurdly labor intensive exploratory synthesis without clear hypotheses guiding their work. One of our more useful tasks turned out to be interactively helping scientists refine their experiments before running them.

Another was helping scientists develop hypotheses for _why_ reactions were occurring, because they hadn't been able to build principled models of which properties were predictive of reaction formation.

Going all the way to synthesis is nice, but there's a lot of lower hanging fruit involved in making scientists more effective.

[0] https://www.nature.com/articles/nature17439

entee|4 years ago

This is true. Getting datasets with the necessary quality and scale for molecular ML is hard and uncommon. Experimental design is also a huge value add, especially given the enormous search space (estimates suggest there are more possible drug-like structures than there are stars in the universe). The challenge is figuring out how to do computational work in a tight marriage with the lab work to support and rapidly explore the hypotheses generated by the computational predictions. Getting compute and lab to mesh productively is hard. Teams and projects have to be designed to do so from the start to derive maximum benefit.

Also shameless plug: I started a company to do just that, anchored to generating custom million-to-billion point datasets and using ML to interpret and design new experiments at scale.

probably_wrong|4 years ago

> A key challenge: very few labs have enough data.

It's also getting harder, not easier, to obtain.

I'm working right now on a retrosynthesis project. Our external data provider is raising prices while removing functionality, and no one bats an eye. At the same time, our own data is considered a business secret and therefore impossible to share.

As someone who does NLP research where the code, data and papers are typically free, this drives me insane.

cinntaile|4 years ago

Are you using NLP to guide what molecules are probably worthwhile to try and synthesize?

czbond|4 years ago

Question: How are labs doing the exploratory work without a clear hypothesis? Are they essentially doing some version of brute force?

hashimotonomora|4 years ago

Experienced chemists can look at molecule diagrams and have an intuition about their activity and similarity to other known molecules. It's like most of science and math: most discoveries begin with intuition and are demonstrated rigorously afterwards. I believe Poincaré said something to this effect.

kortex|4 years ago

The brain is incredibly good at pattern matching without necessarily being able to articulate why it came to a decision. Organic chemistry has these kinds of relations in spades. Take crystallization, for example. You can kinda brute-force it; there are only a few dozen realistic solvents to try, but that's just single-solvent systems. Then there are binary and ternary solvent systems. Then there are heating/cooling profiles, antisolvent addition, all kinds of things. Hundreds or thousands of possible experiments.
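The combinatorial blowup is easy to see with a quick count. A minimal sketch (the solvent list and the number of cooling profiles are illustrative assumptions, not a curated screening panel):

```python
from itertools import combinations

# Hypothetical panel of ~20 common crystallization solvents
# (names illustrative only, not a recommended screen).
solvents = [
    "water", "methanol", "ethanol", "isopropanol", "acetone",
    "acetonitrile", "ethyl acetate", "THF", "DCM", "toluene",
    "heptane", "DMSO", "DMF", "MTBE", "dioxane", "chloroform",
    "1-butanol", "MEK", "nitromethane", "anisole",
]

def n_solvent_systems(solvents, max_components=3):
    """Count single, binary, and ternary solvent systems."""
    return sum(
        len(list(combinations(solvents, k)))
        for k in range(1, max_components + 1)
    )

systems = n_solvent_systems(solvents)   # 20 + 190 + 1140 = 1350
cooling_profiles = 4                    # arbitrary illustrative number
experiments = systems * cooling_profiles
print(systems, experiments)             # 1350 5400
```

Even before varying ratios within each mixture, concentration, or seeding, a 20-solvent panel already yields over a thousand candidate systems.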

You might just decide that a compound "needs" isopropanol/acetone, plus a bit of water, because something vaguely similar you encountered years ago crystallized well. You often start with some educated guesses and refine based on what you see.

But there's often no clear hypothesis, no single physical law the system obeys.

kilroy123|4 years ago

I'm trying to get a startup off the ground that tackles this.

Would love to chat more with you about this.

malux85|4 years ago

Me too, also tech nomad. I'll email you

formerly_proven|4 years ago

> Something I view as a key insight: a lot of labs are doing absurdly labor intensive exploratory synthesis without clear hypotheses guiding their work.

This lets you stumble over unknown unknowns. Taylor et al. discovered high-speed steel by ignoring the common wisdom and running a huge number of trials, arriving at a material and treatment protocol that improved on the then-state-of-the-art tool steels by an order of magnitude or more. The treatment mechanism was only understood 50-60 years later.