Storm: LLM system that researches a topic and generates full-length wiki article

[+] lukev|1 year ago|reply

I can see this being useful iif the content is generated on demand and then discarded.

Publishing AI generated material is generally speaking a horrible idea and does nobody any good (at least until accuracy levels get much much better.)

Even if they do it well and truthfully (which they don't) current LLMs can only summarize, digest, and restate. There is no non-transient value add. LLMs may have a place to help query, but there is no reason to publish LLM regurgitations alongside the ground truth used to generate them.

[+] CuriouslyC|1 year ago|reply

I think bootstrapping documentation with LLM output is a great practice. It's a wiki, people can update it from a baseline, just as long as they can see what was LLM generated to know that it shouldn't be taken as absolute truth.

The hardest part of good documentation is getting started. Once there are docs in place it's usually much easier to revise and correct than it would have been to write correctly by hand the first time. Think of it like automating a rough draft.

[+] petercooper|1 year ago|reply

current LLMs can only summarize, digest, and restate. There is no non-transient value add.

Though, at a stretch, Wikipedia itself could be considered based around summarization, digesting, and restating/citing things said elsewhere, given its policy of verifiability: "Even if you are sure something is true, it must have been previously published in a reliable source before you can add it." Now, LLMs aren't well known for their citation skills, to be fair.. :-)

[+] mistrial9|1 year ago|reply

excellent virtue signalling here -- however, commercial publishers, competitive attorneys, advertising sales and others are literally falling over themselves in an avalanche of doing exactly this, that you advise against (politely).

This moment reminds me very much of the late 1990s when it was common knowledge that "claim jumping a domain name" is very rude and not advisable, or the common knowledge among intellectuals that "ads will ruin the Internet" .. yes, polite people did not make companies to claim jump domain name registration, or push annoying and repetitive ads on the Internet..

but..

[+] xbmcuser|1 year ago|reply

I would love for something like this to be attached to LibGen where it reads the millions of scientific papers. As in my opinion human knowledge today is more than what a group of people can handle let alone individuals. Their is lot of domain specific knowledge that would translate and be used in other domains but unless by chance a human with speciality in both domains sees it will not get ported or assimilated in the 2nd domain.

[+] elicksaur|1 year ago|reply

How could it be true that the content generated would have value only if it is not persisted?

If it doesn’t have value for being saved and published, why would it have value for the person viewing it ephemerally?

[+] observationist|1 year ago|reply

This is categorically untrue. Publishing material generated like this is going to be generally better than human generated content. It takes less time, can be systematically tested and rigorous, and you can specifically avoid the pitfalls of bias and prejudice.

A system like this is multilayered, with prompts going through the whole problem solving process, considering the information presented, assuring quality and factuality, assigning the necessary citations and documentation for claims.

Accuracy isn't a problem. The way in which AI is used creates the problem - ChatGPT and most chat based models are single pass, query/response type interactions with models. Sometimes you get a second pass with a moderation system, doing a review to ensure offensive or illegal things get filtered out. Without any additional testing and prompt engineering, you're going to run into hallucinations, inefficient formulations, random "technically correct but not very useful" generations, and so forth. Raw ChatGPT content shouldn't be published without significant editing and going through the same quality review process any human written text should go through.

What Storm accomplishes is an algorithmic and methodical series of problem solving steps, each of which can be tested and verified and validated. This is synthesized in a particular way, intended as a factual reference article. Presumably you could insert debiasing and checks for narrative or political statements, ensuring attribution and citation occur for quotations, and rephrasing anything generated by the AI as a neutral, academic statement of fact with no stylistic and artistic features.

This is significantly different from the almost superficial interactions you get with chatbots, unless you specifically engineer your prompts and cycle through similar problem solving methods.

Tasks like this are well within the value add domain of current AI capabilities.

Compared to the absolute trash of SEO optimized blog posts, the agenda driven, ulterior laden rants and rambles in social media, and the "I'm oh-so-cleverly influencing the narrative" articles posted to Wikipedia by humans, content like this is a clear winner in quality, in my opinion.

AI isn't at the point where it's going to spit out well grounded novel answers to things like "what's the cure for cancer?" but it can absolutely produce a principled and legible explanation of a phenomenon or collection of facts about a thing.

[+] tiptup300|1 year ago|reply

are llms able to look at a list of categories, read content and then determine which of the categories apply?

[+] visarga|1 year ago|reply

> current LLMs can only summarize, digest, and restate. There is no non-transient value add.

No, you're wrong. LLMs create new experiences after deployment, either by assisting humans, or by solving tasks they can validate, such as code or game play. In fact any deployed LLM gets to be embedded in a larger system - a chat room, a code running environment, a game, a simulation, a robot or inside a company - it can learn from iterative tasks because each following iteration carries some kind of real world feedback.

Besides that, LLMs trivially learn new concepts and even new skills with a short explanation or demonstration, they can be pulled out of their training distribution and collect experiences doing new things. If OpenAI has 100M users and they consume 10K tokens/user/month, that makes for 1 trillion tokens of human-AI interaction rich with new experiences and feedback.

In the text modality LLMs have consumed most of the high quality human text, that is why all SOTA models are roughly on par, they trained on the same data. That means easy time is over, AI has caught up with all human language data. But from now on AI models need to create experiences of their own, because learning from your own mistakes is much faster. The more they get used, the more feedback and new information they collect. The environment is the teacher, not everything is written in books.

And all that text - the trillions of tokens they are going to speak to us - in turn contributes to scientific discoveries and progress, and percolate back into the next training set. LLMs have massive impact at language level on people so by extension on the physical world and culture. They have already influenced language and the arts.

LLMs can create new experiences, learn new skills, and have a significant impact through widespread deployment and interaction. There is "value add" if you look at the grand picture.

[+] pstorm|1 year ago|reply

I looked into this to see where it was getting new information, and as far as I can tell, it is searching wikipedia exclusively. Useful for sure, but not exactly what I was expecting based on the title.

[+] pksebben|1 year ago|reply

That gives me an idea.

There are wikipedias in other languages - Maybe this framework could be adapted to translate the search terms, fetch mulitlingual sources, translate them back, and use those as comparisons.

I've found a lot of stuff out through similar by-hand techniques that would be difficult to discover on english search. I'd be curious to see how much differential there is between accounts across language barriers.

[+] Lerc|1 year ago|reply

As a base for researching the idea, Wikipedia seems like a decent data source.

For broader implementation you would want to develop the approach further. The idea of sampling other-language Wikipedia mentioned in a sibling comment seems to be a decent next step.

Extending it to bringing in from wider sources would be another step. I doubt it would be infallible but it would be really interesting to see how it compares to humans performing the same task. Especially if there were a additional ability to verify written articles and make corrections.

[+] manishsharan|1 year ago|reply

At what point will it be just LLM Bots arguing with Other LLM Bots on Wikepedia edits ?

[+] _akhe|1 year ago|reply

As long as the LLM Moderator deems it safe discourse let the best idea win! I'd love a debate between 2 highly-accurate and context-aware LLMs - if such a thing existed.

Otherwise it would be like reading HN or Reddit debates where 2 egomaniacs who are both wrong continually straw man each other with statements peppered with lies and parroted disinfo, aint got time for that.

[+] samgriesemer|1 year ago|reply

Small thing, but the blurb on the README says

> While the system cannot produce publication-ready articles that often require a significant number of edits, experienced Wikipedia editors have found it helpful in their pre-writing stage.

So it can't produce articles that require many edits? Meaning it can produce publication-ready articles that don't need lots of edits? Or it can't produce publication-ready articles, and the articles produced require lots of edits? I can't make sense of this statement.

[+] unknown|1 year ago|reply

[deleted]

[+] adr1an|1 year ago|reply

It gives you a draft that you should keep working on. For example, fact checking.

[+] agilob|1 year ago|reply

Nucleo AI Alpha

An AI assistant app that mixes AI features with traditional personal productivity. The AI can work in the background to answer multiple chats, handle tasks, and stream/feed entries.

https://old.reddit.com/r/LocalLLaMA/comments/1b8uvpw/does_fr...

[+] brap|1 year ago|reply

I don’t know how well this works (demo is broken on mobile), but I like the idea.

Imagine an infinite wiki where articles are generated on the fly (from reputable sources - with links), including links to other articles (which are also generated) etc.

I actually like this sort of interface more than chat.

[+] rrr_oh_man|1 year ago|reply

Check out https://github.com/MxDkl/AutoWiki (there are project with similar names doing stuff like this)

[+] skywhopper|1 year ago|reply

From my experiments, this thing is pretty bad. It mixes up things that have similar names, it pulls in entirely unrelated concepts, the articles it generates are mind-numbingly repetitive and verbose (although notably with slightly different "facts" each time things are restated), its citations are often completely unrelated to the topic at hand, and facts are cited by references that don't back them up.

I mean, the spelling and syntax of the sentences is mostly correct, just like any LLM content. But there's ultimately still no coherence to the output.

[+] barbarr|1 year ago|reply

I guess this is a good thing for increasing coverage of neglected areas. But given how cleverly LLMs can hide hallucinations, I feel like at least a few different auditor bots should also sign off on edits to ensure everything is correct.

[+] pksebben|1 year ago|reply

This method has actually been proven effective at increasing reliability / decreasing hallucinations [1]

1 - https://arxiv.org/abs/2402.05120

[+] _akhe|1 year ago|reply

This would be useful for RAG when a Wiki doesn't exist. findOrCreate

[+] ranyume|1 year ago|reply

What's the point of a tool that helps you research a topic if said tool has to approve your topic first? It refused to research my topic because it was sensitive.

[+] cess11|1 year ago|reply

Kinda weird to promote automated reordering and rephrasing of information as research.

What do the authors call what they're doing? Magic?

[+] LeoPanthera|1 year ago|reply

I saved a full snapshot of Wikipedia (and Stack Overflow) in the weeks before ChatGPT launched, and every day I'm more glad that I did. They will become the Low Background Steel of text.

[+] barbarr|1 year ago|reply

Good analogy! There's good reason to believe that web archives "uncontaminated" by LLM output will have some unique value in the future (if not now).

[+] pksebben|1 year ago|reply

That's gonna be a lot of fun to play with in a year or so.

There's a concurrent explosion of 'veracity' analysis - it'll be fun to run those against wikipedia a year from now and your data.

Incidentally, are you interested in mirroring your dataset and making it more robust? I'm sure I've got a few TB of storage lying around somewhere...

[+] jakderrida|1 year ago|reply

The thing is that the Wiki mods will need to be more diligent with uncited things. I also see 2 massive opportunities here. First is that they can have agents check the cited source and verify whether the source backs up what's said to a reasonable degree. Second opportunity is fitting in things only found in other language Wikis that either be incorporated into the english one or help introduce new articles. Believe it or not, LLMs can't generate english answers for things answered only in Russian (or any language) in the training data.

[+] tiptup300|1 year ago|reply

You know that wikipedia keeps revisions on all articles. I'm sure you could put together a script to make a copy any time of each page from a certain point of time.

[+] schainks|1 year ago|reply

How do you browse this snapshot? I'm interested in this solution, too

[+] jankovicsandras|1 year ago|reply

This looks cool!

There's a small ironically funny typo in the first line: knolwedge

[+] wwarner|1 year ago|reply

this is important as it collects and reports its references. a) it’s the correct paradigm for using llms. b) through human interactions, it can learn from its mistakes.

[+] spxneo|1 year ago|reply

I hope somebody took a snapshot of the entire internet before 2020, that is our only defence against knowledge laundry.

Wreaking havoc on the digital Akashic records.

[+] zingelshuher|1 year ago|reply

Expect sh*t load of AI hallucinations. As if Wiki isn't bad enough with BS some intentionally posting.

[+] jankovicsandras|1 year ago|reply

One

[+] Logans_Run|1 year ago|reply

Oh dear lord .... sub heading states - Storm - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

Good luck with this storm, wiki's the world over. Just a thought but ... maybe someone should ask an org like the Internet Archive to snap-shot Wikipedia asap and label it Pre-Storm and After-Storm

[+] achrono|1 year ago|reply

LLM mediocrity is just a reflection of human mediocrity, and my bet is on the average LLM to get way better much faster than the average human doing the same.

[+] unknown|1 year ago|reply

[deleted]

[+] tossandthrow|1 year ago|reply

there is this sentiment of Ai induced deterioration and pollution.

what if that is not the case? what if the quality of this type of content actually increases?

[+] whitehexagon|1 year ago|reply

Hmm something about this title containing the word 'research' disturbs me. I associate that word with rigorous scientific methods that leads to fact based knowledge or maybe some new hypothesis, not some LLM hallucinating sources, references, quotes and all the other garbage they spit out when challenged over a point of fact. Horrifying to think peeps might turn towards these tools for factual information.

[+] madeofpalk|1 year ago|reply

This anthropomorphism really bothers me. These tools are useful for what they’re good for, but I really dislike the agency people keep trying to give to them.

[+] devmor|1 year ago|reply

Yes, I came to the comments to say the same thing. The LLM is not doing research - it is aggregating data associated with terms and reorganizing text based on what previous responses to a similar prompt would look like.

At the most generous level of scrutiny, the only part that could be related to research would be the aggregation of sources - but that is only a precursor to research and likely is too generalized to be as accurate as a specialist preparing data for actual research.

[+] mistermann|1 year ago|reply

> Hmm something about this title containing the word 'research' disturbs me. I associate that word with rigorous scientific methods...

97 comments