lukev | 1 year ago
I can see this being useful if the content is generated on demand and then discarded. Publishing AI-generated material is, generally speaking, a horrible idea and does nobody any good (at least until accuracy levels get much, much better).
Even if they do it well and truthfully (which they don't), current LLMs can only summarize, digest, and restate. There is no non-transient value add. LLMs may have a place to help query, but there is no reason to publish LLM regurgitations alongside the ground truth used to generate them.
CuriouslyC | 1 year ago
I think bootstrapping documentation with LLM output is a great practice. It's a wiki; people can update it from a baseline, just as long as they can see what was LLM-generated, so they know it shouldn't be taken as absolute truth.
The hardest part of good documentation is getting started. Once there are docs in place, it's usually much easier to revise and correct them than it would have been to write them correctly by hand the first time. Think of it as automating a rough draft.
petercooper | 1 year ago
> current LLMs can only summarize, digest, and restate. There is no non-transient value add.
Though, at a stretch, Wikipedia itself could be considered to be based around summarizing, digesting, and restating/citing things said elsewhere, given its policy of verifiability: "Even if you are sure something is true, it must have been previously published in a reliable source before you can add it." LLMs aren't well known for their citation skills, to be fair. :-)
mistrial9 | 1 year ago
Excellent virtue signalling here -- however, commercial publishers, competitive attorneys, advertising sales teams, and others are literally falling over themselves in an avalanche of doing exactly what you (politely) advise against.
This moment reminds me very much of the late 1990s, when it was common knowledge that claim-jumping a domain name was very rude and not advisable, or of the common knowledge among intellectuals that "ads will ruin the Internet". Yes, polite people did not build companies to claim-jump domain-name registrations or push annoying, repetitive ads on the Internet...
but..
xbmcuser | 1 year ago
I would love for something like this to be attached to LibGen, where it could read the millions of scientific papers there. In my opinion, human knowledge today is more than what a group of people can handle, let alone an individual. There is a lot of domain-specific knowledge that would translate to, and be useful in, other domains, but unless a human with expertise in both domains happens to see it, it will never get ported or assimilated into the second domain.
elicksaur | 1 year ago
If it doesn’t have value for being saved and published, why would it have value for the person viewing it ephemerally?
observationist | 1 year ago
This is categorically untrue. Publishing material generated like this is going to be generally better than human-generated content. It takes less time, can be systematically tested and made rigorous, and you can specifically avoid the pitfalls of bias and prejudice.
A system like this is multilayered, with prompts going through the whole problem-solving process: considering the information presented, assuring quality and factuality, and assigning the necessary citations and documentation for claims.
Accuracy isn't the problem; the way in which AI is used creates the problem. ChatGPT and most chat-based models are single-pass, query/response-type interactions. Sometimes you get a second pass from a moderation system, reviewing output to filter out offensive or illegal material. Without additional testing and prompt engineering, you're going to run into hallucinations, inefficient formulations, random "technically correct but not very useful" generations, and so forth. Raw ChatGPT content shouldn't be published without significant editing and the same quality-review process any human-written text should go through.
What Storm accomplishes is an algorithmic, methodical series of problem-solving steps, each of which can be tested, verified, and validated. The result is synthesized in a particular way, intended as a factual reference article. Presumably you could insert debiasing steps and checks for narrative or political statements, ensure attribution and citation for quotations, and rephrase anything generated by the AI as a neutral, academic statement of fact with no stylistic or artistic flourishes.
This is significantly different from the almost superficial interactions you get with chatbots, unless you specifically engineer your prompts and cycle through similar problem-solving methods.
Tasks like this are well within the value-add domain of current AI capabilities.
Compared to the absolute trash of SEO-optimized blog posts, the agenda-driven, ulterior-motive-laden rants and rambles on social media, and the "I'm oh-so-cleverly influencing the narrative" articles posted to Wikipedia by humans, content like this is a clear winner on quality, in my opinion.
AI isn't at the point where it's going to spit out well-grounded novel answers to things like "what's the cure for cancer?", but it can absolutely produce a principled and legible explanation of a phenomenon or collection of facts about a thing.
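A minimal sketch of the kind of staged draft/verify/revise pipeline described above - hypothetical code, with llm() standing in for whatever model API is used, not STORM's actual implementation:

    # Hypothetical staged pipeline; llm() is a stand-in, not STORM's real code.
    def llm(prompt: str) -> str:
        raise NotImplementedError("wire this to a model API")

    def draft(topic: str, sources: list[str]) -> str:
        # Stage 1: draft strictly from the provided source excerpts.
        return llm(f"Using only these sources, write a neutral, factual "
                   f"reference article on {topic!r}:\n" + "\n\n".join(sources))

    def verify(article: str, sources: list[str]) -> str:
        # Stage 2: a separate pass lists claims the sources don't support.
        return llm("List every claim in this article NOT directly supported "
                   "by the sources.\nArticle:\n" + article +
                   "\nSources:\n" + "\n\n".join(sources))

    def revise(article: str, problems: str) -> str:
        # Stage 3: remove or reword the flagged claims, keep a neutral tone.
        return llm("Revise the article to fix these problems, keeping a "
                   "neutral academic tone.\nProblems:\n" + problems +
                   "\nArticle:\n" + article)

    def pipeline(topic: str, sources: list[str]) -> str:
        article = draft(topic, sources)
        return revise(article, verify(article, sources))

Each stage is a separate, testable prompt, which is the point: failures can be caught and measured per step instead of hoping a single chat turn gets everything right.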
tiptup300 | 1 year ago
visarga | 1 year ago
> current LLMs can only summarize, digest, and restate. There is no non-transient value add.
No, you're wrong. LLMs create new experiences after deployment, either by assisting humans or by solving tasks they can validate, such as code or game play. Any deployed LLM gets embedded in a larger system - a chat room, a code-running environment, a game, a simulation, a robot, or a company - and it can learn from iterative tasks, because each iteration carries some kind of real-world feedback.
Besides that, LLMs trivially learn new concepts and even new skills from a short explanation or demonstration; they can be pulled out of their training distribution and collect experiences doing new things. If OpenAI has 100M users and they consume 10K tokens per user per month, that makes for 1 trillion tokens of human-AI interaction per month, rich with new experiences and feedback.
In the text modality, LLMs have consumed most of the high-quality human text; that is why all SOTA models are roughly on par - they trained on the same data. The easy period is over: AI has caught up with the stock of human language data. From now on, models need to create experiences of their own, because learning from your own mistakes is much faster. The more they get used, the more feedback and new information they collect. The environment is the teacher; not everything is written in books.
And all that text - the trillions of tokens they are going to speak to us - in turn contributes to scientific discoveries and progress, and percolates back into the next training set. LLMs have a massive impact at the language level, and by extension on the physical world and culture. They have already influenced language and the arts.
LLMs can create new experiences, learn new skills, and have a significant impact through widespread deployment and interaction. There is "value add" if you look at the grand picture.
pstorm | 1 year ago
I looked into this to see where it was getting new information and, as far as I can tell, it searches Wikipedia exclusively. Useful, for sure, but not exactly what I was expecting based on the title.
pksebben | 1 year ago
There are Wikipedias in other languages. Maybe this framework could be adapted to translate the search terms, fetch multilingual sources, translate them back, and use those as comparisons.
I've found a lot of things through similar by-hand techniques that would be difficult to discover via English-language search. I'd be curious to see how much difference there is between accounts across language barriers.
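A rough sketch of that adaptation, using the real MediaWiki search endpoint; the translate() step is a placeholder for whatever translation service you would actually plug in:

    import requests

    def translate(text: str, target_lang: str) -> str:
        return text  # placeholder - plug a translation API in here

    def search_wikipedia(term: str, lang: str, limit: int = 5) -> list[str]:
        # MediaWiki search API, e.g. https://de.wikipedia.org/w/api.php
        resp = requests.get(
            f"https://{lang}.wikipedia.org/w/api.php",
            params={"action": "query", "list": "search", "srsearch": term,
                    "srlimit": limit, "format": "json"})
        resp.raise_for_status()
        return [hit["title"] for hit in resp.json()["query"]["search"]]

    def cross_language_hits(term: str, langs: list[str]) -> dict[str, list[str]]:
        # Translate the query into each language, search that edition,
        # and return the top titles for side-by-side comparison.
        return {lang: search_wikipedia(translate(term, lang), lang)
                for lang in langs}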
Lerc | 1 year ago
As a base for researching the idea, Wikipedia seems like a decent data source.
For broader implementation you would want to develop the approach further. The idea of sampling other-language Wikipedias, mentioned in a sibling comment, seems like a decent next step.
Extending it to bring in wider sources would be another step. I doubt it would be infallible, but it would be really interesting to see how it compares to humans performing the same task - especially if there were an additional ability to verify written articles and make corrections.
manishsharan | 1 year ago
_akhe | 1 year ago
As long as the LLM moderator deems it safe discourse, let the best idea win! I'd love a debate between two highly accurate and context-aware LLMs - if such a thing existed.
Otherwise it would be like reading HN or Reddit debates, where two egomaniacs who are both wrong continually straw-man each other with statements peppered with lies and parroted disinfo. Ain't got time for that.
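A toy sketch of that two-debaters-plus-moderator setup, again with a hypothetical llm() helper rather than any real debate product:

    def llm(prompt: str) -> str:
        raise NotImplementedError("wire this to a model API")

    def debate(topic: str, rounds: int = 3) -> str:
        transcript = f"Debate topic: {topic}\n"
        for rnd in range(1, rounds + 1):
            for side in ("PRO", "CON"):
                turn = llm(f"You argue the {side} side. Respond to the "
                           f"transcript so far:\n{transcript}")
                # The moderator screens each turn before it enters the record.
                verdict = llm("Moderator: is this turn on-topic, civil, and "
                              f"free of fabricated claims? OK or REJECT:\n{turn}")
                if verdict.strip().upper().startswith("OK"):
                    transcript += f"\n{side} (round {rnd}): {turn}\n"
        return llm(f"Moderator: judge which side argued better:\n{transcript}")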
samgriesemer | 1 year ago
> While the system cannot produce publication-ready articles that often require a significant number of edits, experienced Wikipedia editors have found it helpful in their pre-writing stage.
So it can't produce articles that require many edits? Meaning it can produce publication-ready articles that don't need lots of edits? Or it can't produce publication-ready articles, and the articles produced require lots of edits? I can't make sense of this statement.
unknown | 1 year ago
[deleted]
adr1an | 1 year ago
agilob | 1 year ago
An AI assistant app that mixes AI features with traditional personal productivity. The AI can work in the background to answer multiple chats, handle tasks, and stream/feed entries.
https://old.reddit.com/r/LocalLLaMA/comments/1b8uvpw/does_fr...
I don't know how well this works (the demo is broken on mobile), but I like the idea.
brap | 1 year ago
Imagine an infinite wiki where articles are generated on the fly (from reputable sources, with links), including links to other articles (which are also generated), and so on.
I actually like this sort of interface more than chat.
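A sketch of how such an interface might be wired up - Flask chosen for brevity, llm() hypothetical; every /wiki/<topic> link resolves to another page generated on first visit:

    from flask import Flask

    app = Flask(__name__)
    cache: dict[str, str] = {}

    def llm(prompt: str) -> str:
        raise NotImplementedError("wire this to a model API")

    @app.route("/wiki/<topic>")
    def article(topic: str):
        if topic not in cache:
            # Ask for HTML whose internal links point back into /wiki/...,
            # so each linked article is itself generated on demand.
            cache[topic] = llm(
                f"Write a short, sourced encyclopedia article on {topic!r} "
                "as HTML. Link related concepts as <a href='/wiki/CONCEPT'>.")
        return cache[topic]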
rrr_oh_man | 1 year ago
skywhopper | 1 year ago
From my experiments, this thing is pretty bad. It mixes up things that have similar names, it pulls in entirely unrelated concepts, the articles it generates are mind-numbingly repetitive and verbose (though notably with slightly different "facts" each time things are restated), its citations are often completely unrelated to the topic at hand, and facts are cited to references that don't back them up.
I mean, the spelling and syntax of the sentences are mostly correct, just like any LLM content. But there's ultimately still no coherence to the output.
barbarr | 1 year ago
I guess this is a good thing for increasing coverage of neglected areas. But given how cleverly LLMs can hide hallucinations, I feel like at least a few different auditor bots should also sign off on edits to ensure everything is correct.
pksebben | 1 year ago
1 - https://arxiv.org/abs/2402.05120
_akhe | 1 year ago
What's the point of a tool that helps you research a topic if said tool has to approve your topic first? It refused to research my topic because it deemed the topic sensitive.
ranyume | 1 year ago
cess11 | 1 year ago
What do the authors call what they're doing? Magic?
LeoPanthera | 1 year ago
I saved a full snapshot of Wikipedia (and Stack Overflow) in the weeks before ChatGPT launched, and every day I'm more glad that I did. They will become the Low Background Steel of text.
barbarr | 1 year ago
The thing is that the Wiki mods will need to be more diligent about uncited claims. I also see two massive opportunities here. First, agents could check a cited source and verify whether it backs up what's said to a reasonable degree. Second, material found only in other-language Wikipedias could either be incorporated into the English one or help seed new articles. Believe it or not, LLMs can't generate English answers for things answered only in Russian (or any other language) in the training data.
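The first of those opportunities could look something like this sketch: fetch the cited URL and ask a model (hypothetical llm() helper) whether it backs the claim:

    import requests

    def llm(prompt: str) -> str:
        raise NotImplementedError("wire this to a model API")

    def citation_supported(claim: str, source_url: str) -> bool:
        # Truncate the fetched source so the prompt stays a manageable size.
        source_text = requests.get(source_url, timeout=30).text[:20_000]
        verdict = llm("Does the source text below back up this claim to a "
                      "reasonable degree? Answer only YES or NO.\n"
                      f"Claim: {claim}\nSource:\n{source_text}")
        return verdict.strip().upper().startswith("YES")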
pksebben | 1 year ago
There's a concurrent explosion of "veracity" analysis tools - it'll be fun to run those against Wikipedia a year from now, and against your data.
Incidentally, are you interested in mirroring your dataset and making it more robust? I'm sure I've got a few TB of storage lying around somewhere...
jakderrida | 1 year ago
You know that Wikipedia keeps revisions of all articles. I'm sure you could put together a script to fetch a copy of each page as it stood at a certain point in time.
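Something like this sketch, using the real MediaWiki revisions API to pull the newest revision at or before a cutoff timestamp:

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def page_as_of(title: str, timestamp: str) -> str:
        # rvdir=older with rvstart returns revisions at or before the cutoff;
        # rvlimit=1 keeps just the newest of those.
        resp = requests.get(API, params={
            "action": "query", "prop": "revisions", "titles": title,
            "rvlimit": 1, "rvdir": "older", "rvstart": timestamp,
            "rvprop": "content", "rvslots": "main", "format": "json"})
        resp.raise_for_status()
        page = next(iter(resp.json()["query"]["pages"].values()))
        return page["revisions"][0]["slots"]["main"]["*"]

    # e.g. the article as it stood just before ChatGPT launched:
    # page_as_of("Low-background steel", "2022-11-29T00:00:00Z")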
tiptup300 | 1 year ago
This is important, as it collects and reports its references. a) It's the correct paradigm for using LLMs. b) Through human interactions, it can learn from its mistakes.
schainks | 1 year ago
jankovicsandras | 1 year ago
There's a small ironically funny typo in the first line: knolwedge
wwarner | 1 year ago
spxneo | 1 year ago
Wreaking havoc on the digital Akashic records.
zingelshuher | 1 year ago
jankovicsandras | 1 year ago
Logans_Run | 1 year ago
Oh dear lord... the subheading states: "Storm - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models."
Good luck with this storm, wikis the world over. Just a thought, but maybe someone should ask an org like the Internet Archive to snapshot Wikipedia ASAP and label it Pre-Storm and After-Storm.
achrono | 1 year ago
LLM mediocrity is just a reflection of human mediocrity, and my bet is on the average LLM getting much better, much faster than the average human doing the same.
unknown | 1 year ago
[deleted]
tossandthrow | 1 year ago
What if that is not the case? What if the quality of this type of content actually increases?
whitehexagon | 1 year ago
Hmm, something about this title containing the word "research" disturbs me. I associate that word with rigorous scientific methods that lead to fact-based knowledge, or maybe some new hypothesis - not an LLM hallucinating sources, references, quotes, and all the other garbage they spit out when challenged on a point of fact. Horrifying to think people might turn to these tools for factual information.
madeofpalk | 1 year ago
This anthropomorphism really bothers me. These tools are useful for what they're good for, but I really dislike the agency people keep trying to give them.
devmor | 1 year ago
Yes, I came to the comments to say the same thing. The LLM is not doing research - it is aggregating data associated with terms and reorganizing text based on what previous responses to similar prompts looked like.
At the most generous level of scrutiny, the only part that could be related to research is the aggregation of sources - but that is only a precursor to research, and it is likely too generalized to be as accurate as a specialist preparing data for actual research.
mistermann | 1 year ago
The presence of the word "scientific" in this statement disturbs me.
echo8899 | 1 year ago
aaron695 | 1 year ago
[deleted]