Wouldn't it be easier to cut off at pre-2020-ish and ask it to create GPT's transformer architecture? 1900 is so long ago that I doubt most documents are of good quality, if they've been digitised at all. Most likely they're just low-quality scanned images of inconsistent, half-illegible typewritten documents, transcribed with OCR at best.
kccqzy|1 year ago
throwup238|1 year ago
cellis|1 year ago