One important item was a custom OCR for the magazine scans, particularly those with text on a crazy background. This was discussed in a Patreon call and might also be discussed in the podcast, but I haven't listened all the way through yet. Another important distinction is that it is actually being run and curated as a library (the backend is preservica.com), not a grab bag like Archive.org can end up being, so the data should be more consistently correct.
Tiktaalik|1 year ago
That actually would be a topic of particular interest to this community.
Some of the layouts of these enthusiast magazines are so chaotic (looking at you, Hardcore Gamefan) that the current technology for parsing text from scans wasn't good enough, and they had to develop their own.