Is there any potential for something like this to be added to GPT-4's training data set? Being far from an expert, I assume the figures and any non-text data could pose a problem, but it also seems like providing it with a huge, high-quality source of scientific reasoning could lead to some pretty amazing and revolutionary results, like GPT inspiring research directions or even coauthoring. Does anyone who knows more about this have any thoughts? Have I just drunk the GPT Kool-Aid, or is there some potential for it to revolutionize science quite soon?
My money is on the next iteration of GPT being multi-modal (see image-gpt), so this kind of thing would fit right in. I don't think this would lead to the kind of scientific revolution you are thinking of, given GPT's tendency to confabulate things instead of basing itself on facts. That may be entering Kool-Aid territory :)
Could be interesting, and maybe useful, but the distinctive thing about scientific papers is that they tend to be reports on things that happened in the real world. Might be useful on something self-referential like math or philosophy, though.
The value of content tagging (e.g., PhysML) and keyword tagging (e.g., ScienceWise) is apparent in aggregate, for example when searching. That benefit goes to consumers, while the burden currently falls on the content creators.
I don't know of any incentives within academia or grant processes that would motivate content authors to tag their content, with the exception of the creators of the tagging systems. That means bulk analysis (whether using a grammar or a natural language machine learning method) is key.
Citation graphs (which are mostly what's been done previously [0]) pale in comparison to text analysis. The possibility of enabling complex searches would be a big leap forward in science.
There have been efforts to tag keywords in arXiv [0] and to identify sections of articles [1]. A conference on Mathematical Knowledge Management [2] was held last week; some participants have already been analyzing arXiv.
Hopefully integration with Kaggle expands the number of teams taking advantage of the knowledge in the corpus.
This is really cool. I worked on the AI Index Report and had to bug karpathy to get his copy of arxiv papers to do analysis. (He's been slowly collecting papers for arxiv sanity for years). Getting them all from the arxiv API would have taken months.
This will enable tons of useful stats gathering about the fields represented on arxiv. Hopefully it will lead to new scientific insights as well!
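As a rough illustration of the kind of stats gathering meant here, this is a minimal Python sketch that counts papers per primary category. It assumes the metadata comes as JSON lines, one record per paper, with a space-separated "categories" string whose first entry is the primary category; that layout and the sample records are assumptions for illustration, not something confirmed in this thread.

```python
import json
from collections import Counter
from typing import Iterable


def count_primary_categories(lines: Iterable[str]) -> Counter:
    """Count papers by primary category from JSON-lines metadata.

    Assumes each record has a space-separated "categories" string
    (an assumed field name) with the primary category listed first.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        categories = record.get("categories", "").split()
        if categories:
            counts[categories[0]] += 1
    return counts


# Hypothetical sample records, standing in for a real metadata file.
sample_lines = [
    '{"id": "hypothetical-1", "categories": "hep-th gr-qc"}',
    '{"id": "hypothetical-2", "categories": "cs.LG stat.ML"}',
    '{"id": "hypothetical-3", "categories": "cs.LG"}',
]

category_counts = count_primary_categories(sample_lines)
for category, n in category_counts.most_common():
    print(category, n)
```

With a real file you would pass the open file handle instead of sample_lines, so the records are streamed one line at a time rather than loaded into memory all at once.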
Very cool, I really look forward to seeing science move forward along these paths. All the same, I have to post the obligatory (and recent) xkcd rebuttal: https://xkcd.com/2341/
Wow, that's the perfect XKCD for my work right now. I'm integrating ML into my very traditional field of science and it seems like the most fitting place is on the "boring" problems.
[+] [-] techbio|5 years ago|reply
I took a look at this potential a couple of years ago but on PubMed:
https://techbio.org/b-tracing-psych-signals-lit.php
[+] [-] gabcoh|5 years ago|reply
[+] [-] phreeza|5 years ago|reply
[+] [-] woah|5 years ago|reply
[+] [-] jessriedel|5 years ago|reply
[+] [-] mellosouls|5 years ago|reply
[+] [-] ajflores1604|5 years ago|reply
[+] [-] physicsgraph|5 years ago|reply
[0] https://physicsderivationgraph.blogspot.com/2020/05/literatu...
[+] [-] physicsgraph|5 years ago|reply
[0] http://sciencewise.info/
[1] https://github.com/OMdoc/OMDoc/wiki/PhysML
[2] https://cicm-conference.org/2020/cicm.php
[+] [-] iandanforth|5 years ago|reply
[+] [-] newman8r|5 years ago|reply
https://arxiv.org/help/bulk_data
[+] [-] etaioinshrdlu|5 years ago|reply
[+] [-] vansul|5 years ago|reply
[+] [-] bryanrasmussen|5 years ago|reply
On edit: obviously it also sounds quite a bit like you might need a research team and 5 years to do it right: https://xkcd.com/1425/
[+] [-] canjobear|5 years ago|reply
[+] [-] djaque|5 years ago|reply