pkamb | 1 year ago

Is anyone building a public domain repository / AI training ground for old newspapers? Anything before 1930 has no restrictions. Newspapers.com has pretty good content but the interface and search are extremely lacking. Google News was abandoned a decade ago. This seems like something where AI could really help, for once. Not in training chatbots or whatever but actually just providing great search for articles in books, newspapers, and magazines.

bikeshaving|1 year ago

There’s also a fascinating proposal I read somewhere where you create a training set with a knowledge cutoff of 1900 or 1930 and see if the resulting AI could predict the future or independently discover later scientific breakthroughs.

themarbz|1 year ago

I'm imagining a model trained on pre-1930 data that only speaks "old-timey English"...

adonovan|1 year ago

> How do I make an HTML view of a SQL database?

Well old chap, you'll need a shoeshine box full of vacuum tubes and some brass flanges...

TomatoCo|1 year ago

I'm imagining a text-to-speech model that only speaks in the Transatlantic Accent

tombert|1 year ago

I think that idea is capital. I'm really hoping that chatbots start using 23-skidoo in conversations.

mrweasel|1 year ago

The interface and search could probably be solved without the use of AI. It seems like mostly an OCR problem. Both Elasticsearch and Sphinx are already really good, and I'm sure that there are other open source or commercial search engines available. Or hire ex-Google engineers; Google doesn't seem interested in search anymore.

pkamb|1 year ago

The same newswire column was often printed nearly identically in 100+ newspapers, but with slightly different headlines and content. OCR also breaks when words sit physically next to each other but belong to separate stories. The Newspapers.com search has fine OCR but is difficult and time-consuming to use because of those issues. Seems like something "AI" could solve easily.
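
For what it's worth, the duplicate-newswire part can be roughed out without any model at all. Below is a minimal sketch, assuming the OCR text has already been split into individual articles (which is the harder problem mentioned above and is not addressed here). It groups near-identical stories by Jaccard similarity over word shingles. The function names, the 3-word shingle size, and the 0.5 threshold are all illustrative assumptions, not anything Newspapers.com or any other archive is known to use.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles from lowercased text, ignoring punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_near_duplicates(articles, threshold=0.5, k=3):
    """Greedily group articles whose shingle sets resemble a group's first member.

    Returns a list of groups, each a list of indices into `articles`.
    The threshold is a hypothetical starting point and would need tuning
    against real OCR output.
    """
    groups = []  # each entry: (shingle set of the first member, [member indices])
    for idx, text in enumerate(articles):
        current = shingles(text, k)
        for representative, members in groups:
            if jaccard(current, representative) >= threshold:
                members.append(idx)
                break
        else:
            groups.append((current, [idx]))
    return [members for _, members in groups]
```

Search results could then show one entry per group instead of 100+ copies of the same wire story. The pairwise comparison here is quadratic in the worst case, so a real archive would likely need something like MinHash with locality-sensitive hashing, and noisy OCR may call for character-level rather than word-level shingles.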