pleonasticity | 1 year ago
My biggest problem with this application of AI is that it tries to approximate DFT, which is itself an unreliable approximation. The claim is that it lets you amortize expensive DFT across a search of the space, but it's also true that, especially for inorganic materials, training sets do not appear to promote strong generalization. So you embark on an expensive task only to wind up back at unreliable DFT. I think perhaps the better goal would be to make DFT itself better, and I have seen impressive albeit computationally expensive approaches to that, e.g. FermiNet by DeepMind.
denhaus|1 year ago
However, with the alternative of using, for example, experimental data, differences in synthesis procedures, measurement parameters, sample impurities, and even experimental apparatus mean that training datasets of even modest size are insanely heterogeneous. So models are either trained to predict differences between materials that are really due to experimental discrepancies, trained on very small datasets, or given a slew of post-hoc physics-based adjustments to get reasonable numbers.
Higher-order computational methods (including simply more intensive, non-high-throughput DFT) are accurate but expensive, as you know. Some of them have systematic error in the way DFT does, and depend heavily on the user's choice of (many!) parameters. Charged-defect calculations are one example of this. Finding large (>10^4) training sets computed with consistent parameters is difficult. "ML" for these kinds of calculations usually consists of calculating a hundred (or ten) crystals within a narrow chemical system, doing a linear regression on one variable (e.g., valence of the cation on some site), and getting numbers within ±10% of a "true" number.
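To make that concrete, here is a minimal sketch of the kind of one-variable "ML" described above: a linear regression over a handful of crystals in a narrow chemical system. All numbers are invented for illustration; nothing here is real materials data.

```python
import numpy as np

# Hypothetical DFT results: cation valence vs. computed defect
# formation energy (eV) for a few crystals in one chemical system.
valence = np.array([1, 2, 2, 3, 3, 4, 4, 5], dtype=float)
formation_energy = np.array([2.1, 2.9, 3.1, 3.8, 4.0, 4.9, 5.1, 5.8])

# Ordinary least squares on the single descriptor.
slope, intercept = np.polyfit(valence, formation_energy, 1)
predicted = slope * valence + intercept

# The typical headline claim: predictions within ~10% of the "true" numbers.
relative_error = np.abs(predicted - formation_energy) / formation_energy
print(f"fit: E_f = {slope:.2f} * valence + {intercept:.2f}")
print(f"max relative error: {relative_error.max():.1%}")
```

The point is less the regression itself than how little data and how few descriptors are typically behind such claims.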
GGA/meta-GGA DFT, on the other hand, can be applied at sufficient fidelity to get real(ish) numbers in a homogeneous way across huge numbers of crystals. So you are correct: in many cases you are predicting an approximate number for a property. But if we know the approximate number is wrong due to systematic error (and in some situations we can), we can apply corrections or higher-order methods to get the right(ish) answer. Moreover, it's highly dependent on which property you're interested in. Some properties, like band gap, can be off by a lot. Others, like formation energy, can be calculated pretty accurately even with run-of-the-mill GGA DFT. Elastic moduli are generally OK.
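The "apply corrections for systematic error" idea can be as simple as a fitted linear map. GGA famously underestimates band gaps in a roughly systematic way, so one common trick is to fit corrected = a * raw + b against materials with known experimental gaps. The data and coefficients below are invented purely to show the workflow, not real reference values.

```python
import numpy as np

# Hypothetical pairs of GGA-computed vs. experimental band gaps (eV).
gga_gap = np.array([0.6, 1.1, 2.0, 3.4])
expt_gap = np.array([1.1, 1.8, 3.1, 5.0])

# Fit the linear correction once, on materials with known experiments...
a, b = np.polyfit(gga_gap, expt_gap, 1)

def corrected_gap(raw_gap: float) -> float:
    """Apply the fitted linear correction to a raw GGA band gap (eV)."""
    return a * raw_gap + b

# ...then apply it to a new GGA calculation.
print(f"raw 1.5 eV -> corrected {corrected_gap(1.5):.2f} eV")
```

Real correction schemes (e.g. per-element energy corrections in the large materials databases) are more elaborate, but the structure is the same: exploit the fact that the error is systematic rather than random.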
In summary, approximating DFT with ML is just the least messy way to get real-ish answers across a large number of materials. Of course, there is a point at which low-fidelity DFT calculations are (1) so cheap and (2) so generally inaccurate that having an ML model approximate them is pointless. Most large materials databases now use good-enough DFT, so the numbers they contain are not pointless for ML to learn from.
In the future, I think models trained on large numbers of DFT calculations will have to be adapted to narrow sets of higher-fidelity calculations via fine-tuning, much like you can fine-tune a generalized LLM to do specific things. That might be where ML can actually bring real value to materials design.
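A toy version of that fine-tuning workflow: pretrain on many cheap (GGA-level) labels, then learn a small correction from a few high-fidelity points (so-called delta learning). The "models" here are just polynomial fits and the data is simulated, so this only illustrates the shape of the idea, not any real method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: "pretrain" on a large set of cheap DFT-level labels.
x_cheap = rng.uniform(0, 5, size=500)            # some scalar descriptor
y_cheap = 2.0 * x_cheap + 1.0 + rng.normal(0, 0.1, size=500)
base = np.polyfit(x_cheap, y_cheap, 1)           # base model coefficients

# Step 2: "fine-tune" on a few high-fidelity points via delta learning.
# Assume the expensive method differs from cheap DFT by a constant shift.
x_hifi = np.array([0.5, 2.0, 4.0])
y_hifi = 2.0 * x_hifi + 1.0 + 0.8                # hypothetical hi-fi labels
residual = y_hifi - np.polyval(base, x_hifi)
delta = np.polyfit(x_hifi, residual, 0)          # fit just an offset

def predict(x):
    """Base model from cheap data plus the fine-tuned correction."""
    return np.polyval(base, x) + np.polyval(delta, x)
```

The large cheap dataset fixes the overall trend; the three expensive points only have to teach the model the offset between the two levels of theory, which is a much easier learning problem.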
Also, it’s worth considering that synthesizing novel materials can be insanely difficult. So 1 in 4 is not bad in my opinion.