Hey badmonster,
Yeah, Tesseract was a quick way to get a first version, so that's what I'm using for now. From what I can tell it should handle multiple languages well, but it's not very good at handwriting. This is something I'd like to improve in the future.
There's one other main difference in how I'm handling PDFs. I'm not using a PDF client for viewing. Instead, I convert every page into a full-size image and a thumbnail and use those in the page display. That gives me more control over the user interface, and I think it's a nicer experience, especially on mobile when you want to zoom in.
The data model did take a few iterations to get something that felt right. I landed on everything being a field. When you create a page in a space with no templates, it will automatically make a default template with one field: a markdown block. Everything being a field means that images and PDFs are also fields, and I can search everything within one table.
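As a rough illustration of the "everything is a field" idea, here is a minimal sketch using SQLite. It is not the actual schema behind the app: the `fields` table, its columns, and the `build_demo_db` and `search` helpers are all hypothetical names made up for this example. It assumes markdown bodies and OCR'd text from images and PDFs are stored in the same text column, which is what would let one query cover every field type.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one table holding every field type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE fields (
            id INTEGER PRIMARY KEY,
            page_id INTEGER NOT NULL,
            kind TEXT NOT NULL,       -- hypothetical kinds: 'markdown', 'image', 'pdf'
            text_content TEXT         -- markdown body, or text extracted by OCR
        )
        """
    )
    rows = [
        (1, "markdown", "Meeting notes about the invoice"),
        (1, "pdf", "Invoice 2024 total due"),
        (2, "image", "Whiteboard sketch of the roadmap"),
    ]
    conn.executemany(
        "INSERT INTO fields (page_id, kind, text_content) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def search(conn, term):
    """Return (page_id, kind) for every field whose text contains the term."""
    cur = conn.execute(
        "SELECT page_id, kind FROM fields WHERE text_content LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return cur.fetchall()
```

Calling `search(build_demo_db(), "invoice")` matches both the markdown field and the PDF field on page 1, because both live in the same table. A real implementation would more likely use a full-text index than `LIKE`, but the single-table shape is the point here.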
peterwoodman|2 months ago
If you want to see what I mean on a demo page: https://www.panto.app/share/3fb92532-0383-400c-ac4d-384370c6...