jka | 3 years ago
There's an architecture diagram[1] alongside the source code, and my summary would be:
- The system has in-house web indexes built from Common Crawl[2] data
- The system receives snippets of text from Wikipedia and determines whether a citation already exists and, if so, whether it is valid
- If no valid citation exists, the system queries those indexes to find relevant URLs (a rough sketch of this flow follows the list)
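For concreteness, here's a minimal Python sketch of that flow. The Claim type and the citation_supports / query_index helpers are hypothetical stand-ins, not the project's actual API; the real system relies on learned retrieval over its Common Crawl indexes rather than these placeholders.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Claim:
        text: str                    # Wikipedia snippet that needs a supporting source
        citation_url: Optional[str]  # URL of the existing citation, if any

    def citation_supports(url: str, text: str) -> bool:
        """Hypothetical verifier: does the page at `url` actually support `text`?"""
        return False  # stand-in; the real system uses a learned verification model

    def query_index(text: str, k: int = 5) -> List[str]:
        """Hypothetical retriever: return up to `k` candidate URLs from the web index."""
        return []  # stand-in; the real system searches indexes built from Common Crawl

    def suggest_citations(claim: Claim) -> List[str]:
        # Keep the existing citation if it checks out...
        if claim.citation_url and citation_supports(claim.citation_url, claim.text):
            return [claim.citation_url]
        # ...otherwise query the index for replacement candidates.
        return query_index(claim.text)

    if __name__ == "__main__":
        claim = Claim(text="The Eiffel Tower opened in 1889.", citation_url=None)
        print(suggest_citations(claim))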
It'd be interesting to learn how this approach fares compared to pasting the relevant paragraphs of text into search engines and excluding site:wikipedia.org from the results.
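To be concrete about that baseline: I mean issuing the paragraph text as a query with a site-exclusion operator, roughly as below. The endpoint is a placeholder, not any particular search API.

    from urllib.parse import urlencode

    def baseline_query_url(paragraph: str) -> str:
        # Exclude Wikipedia itself so the engine can't just return the source article.
        query = f'{paragraph} -site:wikipedia.org'
        return "https://search.example/search?" + urlencode({"q": query})

    print(baseline_query_url("The Eiffel Tower opened in 1889."))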
Something about feedback loops and data quality makes me wary that heavy use of automated systems like this would gradually degrade content quality (each updated copy being an imperfect translation of, or reference to, an existing one).
[1] - https://github.com/facebookresearch/side/tree/a595fb09c85233...
[2] - https://commoncrawl.org/