Semi-OT (similar language): The national archives in Sweden and Finland published a model for OCR:ing handwritten Swedish text from the 1600s to the 1800s with what seems to me like a very high level of accuracy given the source material (4% character error rate).
They have also published online a fairly large volume of texts OCR:ed with this model (IIRC birth/death notices from church records). As a beginner genealogist it's been fun to follow.
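For anyone unfamiliar with the metric: character error rate is usually the edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. A minimal sketch, assuming the standard Levenshtein definition (how the archives actually evaluated their model isn't stated here, and the function names are mine):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute (or keep on match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up example line, not taken from the published records:
# a single misread digit in a 19-character line.
cer = character_error_rate("född den 3 maj 1742", "född den 8 maj 1742")
```

One wrong character in that 19-character line already gives a CER of about 5%, so a 4% rate means roughly one character in 25 is off.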
> Preserving historical and cultural heritage: Organizations and nonprofits that are custodians of heritage have been using Mistral OCR to digitize historical documents and artifacts, ensuring their preservation and making them accessible to a broader audience.
For this task, general models will always perform poorly. My company trains custom gen AI models for document understanding. We recently trained a VLM for the German government to recognize documents written in old German handwriting, and it performed with exceptionally high accuracy.
lysace|1 year ago
https://readcoop.eu/model/the-swedish-lion-i/
https://www.transkribus.org/success-story/creating-the-swedi...
https://huggingface.co/Riksarkivet
Thaxll|1 year ago
riquito|1 year ago
butovchenkoy|11 months ago
rvnx|1 year ago
thadt|1 year ago
butovchenkoy|11 months ago
anothermathbozo|1 year ago