Ask HN: How to transcribe 1000s of handwritten notes
My handwriting is not great!
None of the off-the-shelf solutions come even close to recognizing my handwriting.
Can you think of anything better than just opening every single file and manually transcribing it?
I have been thinking about training a model to first divide the images into lines of text. Then, it will be easier to transcribe, and automatically those transcriptions will be associated with areas of the image, in case I figure out a good handwriting model.
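A minimal sketch of the line-splitting step, assuming the page has already been reduced to a grid of 0 (paper) and 1 (ink). This is a classical projection-profile baseline, not the trained model mentioned above; `split_into_lines` and its parameters are hypothetical names. Each returned row range can be stored next to its transcription, so the text stays tied to an area of the image.

```python
def split_into_lines(page, ink_threshold=1, min_height=2):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    `page` is a list of rows; each row is a list of 0 (paper) / 1 (ink).
    A row counts as part of a line when it holds at least `ink_threshold`
    ink pixels; bands shorter than `min_height` rows are treated as specks.
    """
    bands = []
    start = None
    for y, row in enumerate(page):
        has_ink = sum(row) >= ink_threshold
        if has_ink and start is None:
            start = y                      # first inked row of a new band
        elif not has_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))   # band ended on the previous row
            start = None
    # A band that runs to the bottom edge of the page is closed here.
    if start is not None and len(page) - start >= min_height:
        bands.append((start, len(page)))
    return bands
```

Slanted or overlapping lines will break this, which is where a learned segmenter would earn its keep.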
[+] [-] throwaway211|1 year ago|reply
If a note's a minute, 1000 notes are around 16 hours of reading. Scale the time up or down depending on whether a note takes more or less than a minute to read. Add a note reference to the start of each recording, like a zettelkasten, so the scanned file, recording and text cross-reference.
If assessing other solutions, that gives an upper bound on what any of them should cost.
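The estimate above as a quick calculation; `hours_to_read` is a hypothetical helper and the 2.5-minute figure is only an illustration.

```python
def hours_to_read(notes, minutes_per_note=1.0):
    """Total reading time in hours for `notes` notes."""
    return notes * minutes_per_note / 60

print(round(hours_to_read(1000), 1))       # 16.7
print(round(hours_to_read(1000, 2.5), 1))  # 41.7
```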
[+] [-] mvkel|1 year ago|reply
Any techie will desperately try to come up with a tech solution to this problem.
A few months of development later, you might have something that yields trustworthy output.
But 16 hours? No tech solution will be done faster than that.
Don't build a factory for a one-off.
[+] [-] dSebastien|1 year ago|reply
Documented here: https://notes.dsebastien.net/30+Areas/33+Permanent+notes/33....
[+] [-] bambax|1 year ago|reply
[+] [-] smarm52|1 year ago|reply
https://zapier.com/blog/best-text-dictation-software/#window...
https://otter.ai/
(Haven't actually tried Otter, but it gets a LOT of good reviews.)
[+] [-] BetterWhisper|1 year ago|reply
I can recommend https://www.videototextai.com/ for transcribing huge amounts of audio. (Disclaimer, I am the founder of VideoToTextAI)
[+] [-] ujkiolp|1 year ago|reply
* after STT, there is objectively less info in the storage format
* OP cannot take advantage of rapidly advancing OCR tech on the storage
* inevitably OP might end up saving the originals “just in case”- rendering this entire process useless
[+] [-] bcx|1 year ago|reply
In the days before good software transcription, I saved a ton of time in grad school by splitting up interviews and using Mechanical Turk or Upwork (can’t remember which one) to transcribe 1-minute snippets, and then took another pass.
[+] [-] bckr|1 year ago|reply
[+] [-] giantg2|1 year ago|reply
[+] [-] canadaduane|1 year ago|reply
note: I have no relation to MacWhisper, just a happy customer.
[+] [-] GianFabien|1 year ago|reply
[+] [-] kqr|1 year ago|reply
For knowledge to be reusable, it needs to be actively maintained, curated, summarised, integrated. It takes work so one shouldn't bother at all if one doesn't expect to want to refer to it later.
[+] [-] bckr|1 year ago|reply
I’m interested in these journals for autobiographical / psychiatric reasons. Therefore, indeed, the more recent information is more valuable, but not with such a steep drop-off.
The oldest 10% might contain 5% of the value.
[+] [-] simonw|1 year ago|reply
[+] [-] TheMiddleMan|1 year ago|reply
https://cloud.google.com/vision/docs/handwriting
I threw together a basic UI with the transcribed text in an editable area next to the image, where I would make any corrections, as it wasn't 100% accurate.
[+] [-] bckr|1 year ago|reply
[+] [-] daemonologist|1 year ago|reply
[+] [-] scovetta|1 year ago|reply
[+] [-] chrisjj|1 year ago|reply
[+] [-] jhayward|1 year ago|reply
[+] [-] pcherna|1 year ago|reply
[+] [-] user_agent|1 year ago|reply
[+] [-] disqard|1 year ago|reply
Sadly, it is no longer software I would recommend:
https://news.ycombinator.com/item?id=36609641
[+] [-] EasyMark|1 year ago|reply
[+] [-] wriggler|1 year ago|reply
It is designed to do exactly what you are looking for, and has been used very successfully by many others for that same purpose (I’m the founder).
It is not as cheap per page as Google Document AI, for example, but it does tend to be much more accurate for handwriting, so usually ends up cheaper when editing time is factored in.
If you find it does work well with your handwriting, please get in touch and I can try to fit the pricing to your use case.
[+] [-] menomatter|1 year ago|reply
update: I tried it and it works to some degree, and a lot better than ChatGPT.
[+] [-] bckr|1 year ago|reply
I don’t see a way to fine tune on here, though. Is that right or am I missing it?
[+] [-] kwanbix|1 year ago|reply
[+] [-] dougdimmadome|1 year ago|reply
I found an app online (I won't even name it) which promised incredibly accurate handwriting transcription. Signed up and found it was true, but they were just sending images directly to ChatGPT and returning the result and then charging a fee on top.
I started working on an open source version. It took me only a few hours and I'm sure anyone else could pull it together. I used ChatGPT example code to connect to the API and send an image with a prompt along the lines of "please transcribe the text in this image and return only that, nothing else". Even with that instruction it still sometimes prefaces with "sure! I can do that.", which I think is the AI equivalent of Homer Simpson writing "ok" in the "please leave this section blank" part of the form. Anyhoo, I had a basic job queue written: pull in images in order of file creation date, fire them off, and append the text to a text file after. There was some cleanup of the file required (weird line breaks) but it saved me days of typing.
You still need a chatGPT API key for it but it does take a good bit of the work out.
At the moment I'm investigating using a free local model. LLaVA is just as accurate but takes longer than sending it to ChatGPT, but if you were worried about burning credits it would be the way to go.
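A rough sketch of a job queue like the one described, not the commenter's actual code. The API call is deliberately left out: `transcribe(path, prompt)` is a hypothetical callable you would supply (hosted model or local one). Modification time stands in for creation date, since true creation time is not portable across platforms, and the preface pattern is a guess at the kind of chatty opener mentioned.

```python
import os
import re

PROMPT = ("Please transcribe the text in this image and return only that, "
          "nothing else.")

# Matches a chatty first line such as "Sure! I can do that." (assumed wording).
_PREFACE = re.compile(r"^\s*(sure|certainly|of course)\b[^\n]*\n+", re.IGNORECASE)

def strip_preface(reply):
    """Drop a leading 'Sure! ...' line if the model added one."""
    return _PREFACE.sub("", reply, count=1).strip()

def images_oldest_first(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Image paths in `folder`, sorted by modification time, oldest first."""
    paths = [os.path.join(folder, name) for name in os.listdir(folder)
             if name.lower().endswith(extensions)]
    return sorted(paths, key=os.path.getmtime)

def run_queue(folder, out_path, transcribe):
    """Send each image to `transcribe` and append the result to one text file."""
    with open(out_path, "a", encoding="utf-8") as out:
        for path in images_oldest_first(folder):
            text = strip_preface(transcribe(path, PROMPT))
            out.write(f"## {os.path.basename(path)}\n{text}\n\n")
```

One caveat: a page whose own first line starts with "Sure" would lose that line, so the preface filter is worth spot-checking against the originals.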
[+] [-] tmaly|1 year ago|reply
I put special stop words like highlight/return so then I can post process and ensure the markdown formatting looks good.
[+] [-] ProllyInfamous|1 year ago|reply
[+] [-] imvetri|1 year ago|reply
My notes will have instructions to reach the black mass state; a computer image scanner will try to learn my handwriting, take the notes as instructions, connect dots etc.
The design of this system is cryptic and challenging, because a side effect of creating a computational program is circling thoughts for me, and it's hard for me to convert them into action.
Taking that as an inspiration, this program is a circling program, which means, it will constantly spiral upwards in a value that is definitive to its actions in the past.
All my notes have information, points, or ideas about this fictional concept. I burned the notes which were repetitive and kept the rest.
When I did that, It created more head space for me. The headspace helped me solve problems and left more room for learning.
[+] [-] bckr|1 year ago|reply
> When I did that, It created more head space for me.
This is essentially the idea behind Getting Things Done and Building a Second Brain
As I said to another commenter, I’ve been able to separate out the project notes and ideas from my autobiographical diaries. These latter I want to keep and read.
Thanks for your interesting comment and good luck building your system!
[+] [-] wilabroard|1 year ago|reply
(Likely under the hood Mathpix has done exactly what you're proposing, with image segmentation, text/image/math classification, then transcription.)
I've been using an Apple Shortcuts automation that turns my handwritten PDFs into notes in Obsidian, with the transcription up top and the PDF embedded below. Could pretty easily be adapted to turn a library of PDFs into a folder of Obsidian markdown notes. Here's a writeup: https://riddle.press/a-marriage-between-handwritten-notes-an...
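A sketch of the "library of PDFs into a folder of Obsidian notes" adaptation, written in Python rather than Shortcuts. It assumes the PDFs already sit inside the vault so that Obsidian's `![[file]]` embed resolves, and `transcribe(pdf_path)` is a hypothetical callable standing in for whatever OCR service is used.

```python
from pathlib import Path

def note_for_pdf(pdf_name, transcription):
    """Markdown body: transcription up top, the PDF embedded below."""
    return f"{transcription.strip()}\n\n![[{pdf_name}]]\n"

def build_vault_notes(pdf_dir, vault_dir, transcribe):
    """Write one .md note per PDF in `pdf_dir`; returns the note paths."""
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        note = Path(vault_dir) / f"{pdf.stem}.md"
        note.write_text(note_for_pdf(pdf.name, transcribe(pdf)), encoding="utf-8")
        written.append(note)
    return written
```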
[+] [-] praving5|1 year ago|reply
[+] [-] bckr|1 year ago|reply
Take my journals, and run a relatively simple word separation algorithm over them.
Shuffle up those words and pay to have them annotated.
Reconstruct the dataset from there.
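A minimal sketch of the shuffle-and-reconstruct bookkeeping, assuming the word separation has already produced crops grouped by page and line. Function names are hypothetical, and the crops can be any objects (image arrays, file paths). Annotators only see an opaque task id and a crop; the key that maps ids back to positions stays private.

```python
import random

def make_annotation_batch(pages, seed=0):
    """pages: list of pages, each a list of lines, each a list of word crops.

    Returns (tasks, key): `tasks` is a shuffled list of (task_id, crop) to
    send out; `key` maps task_id -> (page, line, word) and is kept private.
    """
    items = [((p, ln, w), crop)
             for p, page in enumerate(pages)
             for ln, line in enumerate(page)
             for w, crop in enumerate(line)]
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    key = {task_id: items[i][0] for task_id, i in enumerate(order)}
    tasks = [(task_id, items[i][1]) for task_id, i in enumerate(order)]
    return tasks, key

def reconstruct(labels, key):
    """labels: {task_id: text}. Returns pages -> lines of joined text."""
    placed = sorted((key[t], text) for t, text in labels.items())
    pages = {}
    for (p, ln, _w), text in placed:
        pages.setdefault(p, {}).setdefault(ln, []).append(text)
    return [[" ".join(words) for _, words in sorted(lines.items())]
            for _, lines in sorted(pages.items())]
```

The reconstructed (crop, text) pairs are also exactly the labelled data a handwriting model would need.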
[+] [-] freddealmeida|1 year ago|reply
Works with English and Japanese. Sadly I'm no longer with the team there but the work is solid. Try it out.
[+] [-] mariocesar|1 year ago|reply
If you're into dreaming up cool solutions, you could try using smart pens or tablets to write stuff and then teach a model to recognize your handwriting. But for now, it's just a dream.
[+] [-] ant6n|1 year ago|reply
You have to think about what your goal is. Handwritten notes can be perfectly digitized into handwritten notes. What do you need the ocr for? Publishing? … transcribe what you need, or better, rewrite.
Searching? As you scan, make a basic index so that you can refer to the notes. Organize the folders properly with your notes, use a useful naming scheme.
[+] [-] constantinum|1 year ago|reply
Try LLMwhisperer[1] pdf extraction API. You are only one "curl" command away from extracting your handwritten text.
The best thing is it preserves the layout of your notes, which means it can keep tables as tables and lists as lists.
Check this screen grab for extracting handwritten notes > https://imgur.com/fXk0tcR
[1]: https://llmwhisperer.unstract.com/ [2]: Try it with your document here > https://pg.llmwhisperer.unstract.com/
[edited] added links
[+] [-] sjhaba|1 year ago|reply
[+] [-] tcsenpai|1 year ago|reply
It will take time but you will have a pretty tailored solution.
Also of course: first of all, try to process the images so that they are only black and white (not greyscale, actual B/W pictures).
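One common way to get true black and white is Otsu's method, which picks the threshold that best separates dark ink from light paper. The sketch below is pure Python over a grid of 0..255 grey values, purely to show the idea; in practice an image library would do this faster. Function names are hypothetical.

```python
def otsu_threshold(pixels):
    """pixels: iterable of ints 0..255. Values <= the result count as dark."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their grey values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor); larger is better.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    """gray: list of rows of 0..255 values. Returns rows of 1 (ink) / 0 (paper)."""
    t = otsu_threshold(v for row in gray for v in row)
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

A single global threshold assumes fairly even lighting; scans with shadows or yellowed paper may need a locally adaptive threshold instead.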
[+] [-] canucker2016|1 year ago|reply
Take scans of your journal pages, split the jpegs/pics into word fragments, display a couple of fragments to captcha clients, generate completed journal entries when the consensus gets reasonably high for each word fragment.
Not sure how captcha services start from scratch - probably ask around/check with google search.
Privacy goes out the door, but you should be able to show disjointed word fragments so no one could reconstruct enough of a single journal entry to expose your more personal info unless they were very determined. Or maybe split the scans into individual letter fragments instead?
Then monetize this for other people in the same situation...
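The consensus step could look something like this sketch: each word fragment collects answers from several people, and it only counts as settled once enough of them agree. The thresholds (`min_votes`, `min_share`) and the `[?]` placeholder are assumptions for illustration, not anything a real captcha service is known to use.

```python
from collections import Counter

def consensus(answers, min_votes=3, min_share=0.7):
    """Agreed text for one fragment, or None if it is not settled yet."""
    cleaned = [a.strip() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None                      # keep showing this fragment
    text, count = Counter(cleaned).most_common(1)[0]
    return text if count / len(cleaned) >= min_share else None

def assemble(fragment_answers, placeholder="[?]"):
    """fragment_answers: one list of answers per fragment, in reading order."""
    return " ".join(consensus(a) or placeholder for a in fragment_answers)
```

Fragments that never reach consensus are the ones worth transcribing by hand.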
[+] [-] bckr|1 year ago|reply
And if anyone decides to do this, let me know!
Privacy is one of the reasons I would pay for a service like this, rather than pay a person to (try to) do it.
These journals contain a lot of psychiatric-level information about me, which is both what makes it valuable and sensitive.
[+] [-] Suppafly|1 year ago|reply
That's basically what Amazon Mechanical Turk is, without the captcha bit.