> A generalist generative-AI system such as ChatGPT ... is simply data-hungry. To apply such a generative-AI system to chemistry, hundreds of thousands — or possibly even millions — of data points would be needed.
> A more chemistry-focused AI approach trains the system on the structures and properties of molecules. ... Such AI systems fed with 5,000–10,000 data points can already beat conventional computational approaches to answering chemical questions[4]. The problem is that, in many cases, even 5,000 data points is far more than are currently available.
The latter is the general idea behind Julia's SciML: use the scientific knowledge base we already have to augment training intelligently and reduce the hunger for data. The paper they link to uses one particular way of integrating that knowledge, but Julia's way of doing things - ML in the same language as the scientific code and its types, plus the composability that comes from the type hierarchy and multiple dispatch - would likely make it much easier to explore other ways of integrating data and scientific knowledge, and to figure out which ones are most fruitful. Maybe the current approach will hit a roadblock and the Julia ecosystem will catch up and show us new ways forward, or maybe we'll just brute-force our way to more and more data and chalk this one up to the "bitter lesson" as well.
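To make the idea concrete, here's a minimal toy sketch (in Python with NumPy, not the linked paper's actual method and not SciML's API) of how a known physical law can substitute for missing data: a flexible cubic surrogate is fit to only two observations, with an assumed law dy/dt = -0.5y enforced at extra collocation points that cost no data at all. All names and the specific law are illustrative assumptions.

```python
import numpy as np

# Toy illustration of knowledge-informed fitting (illustrative only):
# surrogate y(t) ~ c0 + c1*t + c2*t^2 + c3*t^3, trained from just two
# data points plus a known law dy/dt = -0.5*y (true solution: exp(-0.5*t)).

def features(t):
    # Basis functions of the cubic surrogate evaluated at t.
    return np.array([1.0, t, t**2, t**3])

def dfeatures(t):
    # Time derivatives of the same basis functions.
    return np.array([0.0, 1.0, 2 * t, 3 * t**2])

# Only two observed data points - far too few to pin down a cubic alone.
t_data = np.array([0.0, 2.0])
y_data = np.exp(-0.5 * t_data)

# Collocation points where the law dy/dt + 0.5*y = 0 is enforced; these
# encode prior scientific knowledge rather than measurements.
t_phys = np.linspace(0.0, 2.0, 20)

# Stack data-fit rows and physics-residual rows into one linear
# least-squares problem over the surrogate coefficients.
A = np.vstack([features(t) for t in t_data] +
              [dfeatures(t) + 0.5 * features(t) for t in t_phys])
b = np.concatenate([y_data, np.zeros(len(t_phys))])

coef, *_ = np.linalg.lstsq(A, b, rcond=None)

# Predict at an unobserved point; the law fills the gap between the
# two measurements, so this lands near exp(-0.5) ~ 0.607.
y_mid = features(1.0) @ coef
print(y_mid)
```

Real SciML goes much further (neural networks embedded inside differential equations, automatic differentiation through solvers), but the trade is the same: every constraint supplied by known science is a data point you don't have to collect.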
sundarurfriend|2 years ago
cookieperson|2 years ago