epups | 1 year ago
I think his central point is fair and interesting. The train/test split is apparently legitimate: they used structures released before 2021 for training and the rest for testing. However, there was no real check for duplicates, and the success rate might be inflated by a bunch of "me too", low-hanging-fruit structures that are only slight variations on what we already know.
However, I'm not sure I agree with his skepticism. LLMs suffer from exactly the same problem (getting one to write a Snake game in any language is trivial, but it is almost certainly regurgitating), yet they can be useful as well. I mean, if for various reasons people are publishing very similar structures, there's certainly value in speeding up or considerably reducing that work.
dekhn | 1 year ago
AF3 stands as one of the greatest achievements in machine learning/structural biology we've yet seen.
They do remove duplicates by sequence similarity (filtered PDB).
Please assume the DM folks really do know what they are doing.