What's worrisome is that the graphical abstract shows a nucleophilic substitution at the neopentyl position, a reaction every chemist knows won't proceed rapidly. And it takes place in dimethyl ether as the solvent, which every chemist knows is a gas.
It looks a bit like what you see in bad teaching materials: chemistry that is almost correct but won't work well for some reason we are not telling the kids about. Please alleviate my concerns! :-)
I am interested as to why you chose to focus on reaction prediction. As you acknowledge in the introduction, the acquisition of this skill is a routine part of graduate education in synthetic chemistry.
On the other hand, the key difficulty in synthetic chemistry, and the one that occupies the majority of a chemist's time, is the identification of the correct reagent(s), the correct solvent, and the correct time, temperature, and concentration such that the desired reaction proceeds in a convenient amount of time and with the correct chemo- and regio-selectivity, that the reaction conditions are tolerated by the rest of the molecule, and that the product can be easily isolated from the reaction byproducts.
In my opinion, as long as these problems remain, being able to turn retrosynthetic analysis over to a machine provides little benefit.
I took three semesters of orgo in undergrad, and if it taught me anything, it's that there are exceptions to nearly every rule. There are so many complicated molecular orbital interactions, requiring years of study. And even then there are always things that break these rules in unexpected ways or produce several products.
How do you overcome this? Can you predict yield percentages of each product? What about chirality?
Can your system design synthesis pathways? Can it optimize for final product yield? How does it handle the thermodynamics and kinetics of reactions?
In any case, cool project. It's a very difficult domain.
First of all, interesting and fascinating work! It is nice to see some chemistry over here.
I'm a grad student in computational chemistry. I am fascinated by the idea that our imagination, or limits of our chemical intuition, is the limiting factor for all kinds of cool advances. Through that I have recently been studying machine learning and I am interested in using it for catalyst optimization and design.
What is your opinion on the state of computationally assisted inverse design of molecules and the role of machine learning in it? The problem is a bit more open-ended compared to reaction optimization, but I could imagine that after proper formulation of the design guidelines the computer could help a lot.
I'm just an old geezer programmer, but my daughter is studying e-tox at UC Davis. I have discussed with her how important computer code is becoming in the ability to recreate results in studies.
This is a good example of providing that kind of transparency.
"Reaction prediction remains one of the major challenges for organic chemistry and is a prerequisite for efficient synthetic planning. It is desirable to develop algorithms that, like humans, “learn” from being exposed to examples of the application of the rules of organic chemistry. We explore the use of neural networks for predicting reaction types, using a new reaction fingerprinting method. We combine this predictor with SMARTS transformations to build a system which, given a set of reagents and reactants, predicts the likely products. We test this method on problems from a popular organic chemistry textbook."
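To make the "reaction fingerprint" idea in the abstract a bit more concrete, here is a minimal sketch of turning a set of reactants and reagents into one fixed-length vector that a classifier could take as input. This is not the paper's method: hashed character n-grams over SMILES strings stand in for real molecular fingerprints, and the names `hashed_fingerprint` and `reaction_fingerprint`, the 64-bit size, and the example molecules are all assumptions made for illustration.

```python
import hashlib
from typing import List


def hashed_fingerprint(smiles: str, n_bits: int = 64, ngram: int = 3) -> List[int]:
    """Toy bit vector: hash each character n-gram of a SMILES string to one bit.

    Assumption: character n-grams are only a stand-in for a real
    substructure-based molecular fingerprint.
    """
    bits = [0] * n_bits
    # Pad very short strings so they still yield at least one n-gram.
    padded = smiles if len(smiles) >= ngram else smiles.ljust(ngram, "_")
    for i in range(len(padded) - ngram + 1):
        chunk = padded[i:i + ngram]
        # hashlib gives a hash that is stable across runs (built-in hash() is not).
        digest = hashlib.sha256(chunk.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def reaction_fingerprint(reactants: List[str], reagents: List[str],
                         n_bits: int = 64) -> List[int]:
    """Concatenate an OR-combined reactant vector with an OR-combined reagent vector."""
    def combine(molecules: List[str]) -> List[int]:
        combined = [0] * n_bits
        for smi in molecules:
            for j, bit in enumerate(hashed_fingerprint(smi, n_bits)):
                combined[j] |= bit
        return combined

    return combine(reactants) + combine(reagents)


# Hypothetical example: bromoethane and hydroxide as reactants, no reagent listed.
fp = reaction_fingerprint(["CCBr", "[OH-]"], [])
print(len(fp), sum(fp))
```

The resulting vector would be the input to whatever model predicts the reaction type; a real system would replace the n-gram hashing with a chemistry-aware fingerprint from a cheminformatics toolkit.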
Very interesting, great concept! The paper is on my to-read list!
I am only afraid that the datasets you have used might not be of sufficient quality for a neural network application. There are old recipes from when the state of the art in chemistry was at an earlier stage, e.g. before the discovery of specific mechanisms, molecule classes, analytics and general concepts. Also, as mentioned in this thread, there are aspects of the synthetic chemist's work and experience that might not be taken into consideration in this approach.
Yes, the datasets were the real limiting factor for this project, and as you point out, reactions depend on many more things than their reagents. Hopefully this work will inspire someone to build a better dataset of reaction setups and outcomes!
I'm not a chemist, and didn't read the paper, but would it be helpful if the neural network had additional inputs coming from e.g. a (simplified) Schrödinger equation solver?
Orbital energies, or other solutions from the Schrödinger equation, would probably help the prediction if they were included as inputs. If you were to do this, you'd have to be a little careful about the cost of doing the quantum mechanics calculation on a whole data set of reactants in your reaction database, but it could be feasible with a cheap method.
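As a rough illustration of what a "cheap method" could look like, here is a sketch that uses simple Hückel theory for linear polyenes, where the pi-orbital energies have a closed form and cost almost nothing to compute. Hückel theory is my choice for the example, not something from the paper, and the function names and the idea of using the HOMO-LUMO gap as the extra input feature are assumptions.

```python
import math
from typing import List


def huckel_chain_energies(n_carbons: int) -> List[float]:
    """Hückel pi-orbital energies of a linear polyene, in units of beta (alpha = 0).

    Closed form: x_k = 2 * cos(k * pi / (n + 1)) for k = 1..n.
    Because beta is negative, a larger x_k means a lower-energy orbital,
    so the returned list runs from most stable to least stable.
    """
    return [2.0 * math.cos(k * math.pi / (n_carbons + 1))
            for k in range(1, n_carbons + 1)]


def homo_lumo_gap(n_carbons: int) -> float:
    """HOMO-LUMO gap in units of |beta|, assuming one pi electron per carbon."""
    if n_carbons < 2 or n_carbons % 2 != 0:
        raise ValueError("this sketch only handles even chain lengths >= 2")
    x = huckel_chain_energies(n_carbons)
    homo = x[n_carbons // 2 - 1]  # highest doubly occupied orbital
    lumo = x[n_carbons // 2]      # lowest unoccupied orbital
    return homo - lumo


# Hypothetical feature table: gap for ethylene, butadiene, hexatriene.
features = {n: homo_lumo_gap(n) for n in (2, 4, 6)}
print(features)
```

A number like this could be appended to the other inputs for each reactant. Anything beyond toy conjugated chains would need a real semi-empirical or DFT calculation, which is where the cost concern comes in.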
Theoretically, yes, if we knew all of the physics laws accurately enough. However, there are a lot of behaviors that are difficult to predict in the aggregate. That's one of the reasons we end up with different specializations: quantum mechanics, atomic physics, molecular chemistry, organic chemistry, etc.
Another example would be protein folding. Even within the same "level" of chemistry, predicting the three-dimensional structure of a protein molecule based purely on the chemistry we understand and the protein sequence is a hard problem. We're getting better at it, but it's still hard.
duvenaud|9 years ago
This paper is just a first step - what we'd really like to use this for is designing recipes for synthesizing new molecules.
I would also be remiss if I didn't link to a closely-related paper from another group that came out at the same time: http://pubs.acs.org/doi/abs/10.1021/ci5006614
HarryHirsch|9 years ago
subnaught|9 years ago
echelon|9 years ago
peaceb|9 years ago
andrew-lucker|9 years ago
davidwihl|9 years ago
kenrick95|9 years ago
Roboprog|9 years ago
iopuy|9 years ago
wuschel|9 years ago
duvenaud|9 years ago
amelius|9 years ago
jnwei|9 years ago
jamez1|9 years ago
What actually makes it hard to predict a chemical reaction? Can't we empirically deduce them from quantum mechanics?
grzm|9 years ago
Houshalter|9 years ago