Ask HN: How to transcribe 1000s of handwritten notes

[+] throwaway211|1 year ago|reply

Can you read them? Speech to text perhaps. That can also be done locally.

If a note's a minute, 1000 notes are around 16 hours of reading. Scale time needed depending on if it takes less or more than a minute to read. Add a note reference to the start of each recording, like a zettelkasten, so the scanned file, recording and text cross-reference.

If assessing other solutions, that's at least an upper bound on the cost of any other solution.
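The estimate above is simple enough to check in a few lines. This is only a back-of-envelope sketch: `reading_hours` is a hypothetical helper name, and the per-note times are illustrative guesses, not measurements.

```python
def reading_hours(note_count: int, minutes_per_note: float) -> float:
    """Total time, in hours, to read note_count notes aloud."""
    return note_count * minutes_per_note / 60


# Try a few per-note reading times to see how the total scales.
for minutes in (0.5, 1, 2):
    print(f"{minutes} min/note -> {reading_hours(1000, minutes):.1f} hours")
```

At one minute per note this comes out just under 17 hours, in line with the "around 16 hours" figure.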

[+] mvkel|1 year ago|reply

This is the best answer.

Any techie will desperately try to come up with a tech solution to this problem.

A few months of development later, you might have something that yields trustworthy output.

But 16 hours? No tech solution will be done faster than that.

Don't build a factory for a one-off.

[+] dSebastien|1 year ago|reply

You made my day. It's obviously an awesome approach!

Documented here: https://notes.dsebastien.net/30+Areas/33+Permanent+notes/33....

[+] bambax|1 year ago|reply

Great solution. And, if the notes don't contain confidential information, you could totally hire someone on Fiverr to read them for you. Or on Mechanical Turk, have the same notes be read more than once by different people, so you can compare and more easily find errors in transcription later.
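One way to compare two people's transcriptions of the same note is a word-level diff. The sketch below uses Python's standard `difflib`; the function name `disagreements` and the sample sentences are made up for illustration.

```python
import difflib


def disagreements(first: str, second: str) -> list[tuple[str, str]]:
    """Return (first_words, second_words) pairs where two transcripts differ."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep the differing word runs from each side for manual review.
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


reader_one = "met with Dr Smith on Tuesday about the results"
reader_two = "met with Dr Smyth on Tuesday about results"
for left, right in disagreements(reader_one, reader_two):
    print(f"{left!r} vs {right!r}")
```

Only the spans that come back need a human look at the original page; everything the two readers agree on can be accepted as-is.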

[+] smarm52|1 year ago|reply

Some good transcription solutions:

https://zapier.com/blog/best-text-dictation-software/#window...

https://otter.ai/

(Haven't actually tried Otter, but it gets a LOT of good reviews.)

[+] BetterWhisper|1 year ago|reply

Reading the notes aloud is a really good solution without having to spend a ton of time on trying to OCR handwriting.

I can recommend https://www.videototextai.com/ for transcribing huge amounts of audio. (Disclaimer, I am the founder of VideoToTextAI)

[+] ujkiolp|1 year ago|reply

Bad solution simply because of information loss!

* after STT, there is objectively less info in the storage format

* OP cannot take advantage of rapidly advancing OCR tech on the storage

* inevitably OP might end up saving the originals “just in case”, rendering this entire process useless

[+] bcx|1 year ago|reply

Additionally, you could hire other people to read them, dividing the task into whatever manageable chunks or even having multiple people read the same parts for agreement.

In the days before good software transcription, I saved a ton of time in grad school by splitting up interviews and using Mechanical Turk or Upwork (can’t remember which one) to transcribe 1-minute snippets, and then took another pass.

[+] bckr|1 year ago|reply

Great recommendation, thank you. I have considered this and it’s definitely the simplest way to achieve what I want.

[+] giantg2|1 year ago|reply

I have a similar issue but reading them won't work because the person who wrote them passed away. Is there another solution that could transcribe this sort of thing (maybe the original use case would have been for historical texts)?

[+] canadaduane|1 year ago|reply

Using MacWhisper (or another similar whisper.cpp app or utility), you could do it all on-device for free or a one-time fee, too.

note: I have no relation to MacWhisper, just a happy customer.

[+] GianFabien|1 year ago|reply

I have about 5000 pages of research notes. I have found that the quality and usefulness of the material varies greatly. Much of the older material is of little relevance with the passing of time. As futile as it may seem, I'm finding that re-reading and summarizing rather than straight transcribing is effective. I'm refreshing my memory of what I did discover and only typing up what is relevant now. Fortunately I'm a fast touch typist, so I can stare at the handwritten page and type, only glancing at the screen after a paragraph or two. Two things I find useful to retain are the dates of the original materials and bibliographic references.

[+] kqr|1 year ago|reply

This is important, I think. Allen Ward expands on the topic of "reusable knowledge" standing in contrast to putting notes in a binder and sticking the binder in a cabinet somewhere and then pretending you have stored knowledge.

For knowledge to be reusable, it needs to be actively maintained, curated, summarised, integrated. It takes work so one shouldn't bother at all if one doesn't expect to want to refer to it later.

[+] bckr|1 year ago|reply

I think this is the best approach for project-related information. It’s essentially the Second Brain approach.

I’m interested in these journals for autobiographical / psychiatric reasons. Therefore, indeed, the more recent information is more valuable, but not with such a steep drop-off.

The oldest 10% might contain 5% of the value.

[+] simonw|1 year ago|reply

I'll throw in another vote for AWS Textract. I've had great results with it on 19th century handwriting: https://simonwillison.net/2022/Aug/25/sfms-archive/

[+] TheMiddleMan|1 year ago|reply

I've found decent success with Google's Cloud Vision API for transcribing cursive writing on the backs of 1000s of family photos.

https://cloud.google.com/vision/docs/handwriting

I threw together a basic UI with the transcribed text in an editable area next to the image, where I would make any corrections, as it wasn't 100% perfect.

[+] bckr|1 year ago|reply

Thanks! I did try the vision demo in the console. One problem might be that my handwriting is idiosyncratic / there may be more training data available for historical handwriting styles?

[+] daemonologist|1 year ago|reply

Yeah OCR remains an area where the open source solutions can't quite compete on quality with what the cloud providers offer. I've found that (unless you have a cost-prohibitive number of documents to process) if there are complex layouts, handwriting, etc. it's worth going to Google or AWS.

[+] scovetta|1 year ago|reply

Take photos of them, or cut the binding and scan them all, and then feed the work out to mechanical turk?

[+] chrisjj|1 year ago|reply

Or avoid cutting by using a book scanner e.g. https://www.amazon.co.uk/CZUR-Professional-Document-Auto-Fla...

[+] jhayward|1 year ago|reply

This is likely the fastest and cheapest option. Pennies per page. Double- or triple-assign them when they show signs of large differences between expected grammar, word choice, or spelling patterns.

[+] pcherna|1 year ago|reply

I have a hundred or so pages of handwritten letters in Hungarian, but got useless results from AWS Textract and from Transkribus. However, I also have about the same number of pages (written by the same person) that I have already gotten hand-transcribed into Hungarian. How might I approach using the already-transcribed stuff to train some kind of AI model or text-recognition model to work on the rest?

[+] user_agent|1 year ago|reply

A hint that might help at least partially: nowadays for managing digital and handwritten notes I use Joplin, but before that I was an avid Evernote user. Having a paid plan active gives you access to Evernote's OCR function on their backend. I had a lot of handwritten notes uploaded as attachments to Evernote, and I remember that despite my handwriting being awful their software was able to parse it and allow me to, among other things, perform quite advanced searches on my handwritten notes. I'm not sure if there's a way to make Evernote's OCR backend work for you in scenarios more flexible than what it's been built for, but I wanted to mention that there's this unique OCR tech that I think does a far better job than any standalone OCR software I tried (for my handwriting style, which I consider awful). It might be worth researching further.

[+] disqard|1 year ago|reply

I used to use Evernote for a while, and like you, was a fan of its handwriting OCR.

Sadly, it is no longer software I would recommend:

https://news.ycombinator.com/item?id=36609641

[+] EasyMark|1 year ago|reply

Be careful about Evernote, they got bought out by a somewhat questionable company that has a history of buying up companies and basically not improving them like the old owners.

[+] wriggler|1 year ago|reply

Have you tried https://www.handwritingOCR.com?

It is designed to do exactly what you are looking for, and has been used very successfully by many others for that same purpose (I’m the founder).

It is not as cheap per page as Google Document AI, for example, but it does tend to be much more accurate for handwriting, so usually ends up cheaper when editing time is factored in.

If you find it does work well with your handwriting, please get in touch and I can try to fit the pricing to your use case.

[+] menomatter|1 year ago|reply

Does it work for Arabic and Hebrew? I am trying to teach myself how to fine-tune a model and thought doing this with my own Arabic notes could be a fun project. Not sure where to start though.

update: I tried it and it works to some degree and a lot better than ChatGPT.

[+] bckr|1 year ago|reply

Will try the free trial this week, thank you.

I don’t see a way to fine tune on here, though. Is that right or am I missing it?

[+] kwanbix|1 year ago|reply

Sounds super cool, but why "per month" and not some "per page" pricing?

[+] dougdimmadome|1 year ago|reply

I was in a similar situation last month. Not quite 1000s of pages but close to 100. Just enough to make typing them out seem like too much work.

I found an app online (I won't even name it) which promised incredibly accurate handwriting transcription. Signed up and found it was true, but they were just sending images directly to ChatGPT and returning the result and then charging a fee on top.

I started working on an open source version. It took me only a few hours and I'm sure anyone else could pull it together. I used ChatGPT example code to connect to the API and send an image with a prompt along the lines of "please transcribe the text in this image and return only that, nothing else". Even with that instruction it still sometimes prefaces the result with "sure! I can do that.", which I think is the AI equivalent of Homer Simpson writing "ok" in the "please leave this section blank" part of the form. Anyhoo, I had a basic job queue written: pull in images in order of file creation date, fire them off, and append the text to a text file afterwards. There was some cleanup of the file required (weird line breaks) but it saved me days of typing.

You still need a chatGPT API key for it but it does take a good bit of the work out.

At the moment I'm investigating using a free local model. LLaVA is just as accurate but takes longer than sending it to ChatGPT. But if you were worried about burning credits it would be the way to go.
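For anyone who wants to rebuild the job queue described above, here is a rough sketch, not the commenter's actual code. The `transcribe` argument is a hypothetical callable standing in for whatever model call you use (hosted API or local model). The list of chatty openers is a guess, and the preamble check is a heuristic that could drop a real first line starting with one of those words. File modification time is used as a portable stand-in for creation date.

```python
from pathlib import Path
from typing import Callable

# Assumed chatty openers; extend this after looking at real responses.
PREAMBLES = ("sure", "certainly", "here is", "here's")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def strip_preamble(text: str) -> str:
    """Drop a leading chatty line such as 'Sure! I can do that.'"""
    lines = text.strip().splitlines()
    if len(lines) > 1 and lines[0].lower().startswith(PREAMBLES):
        lines = lines[1:]
    return "\n".join(lines).strip()


def transcribe_folder(folder: Path, out_file: Path,
                      transcribe: Callable[[Path], str]) -> int:
    """Transcribe images oldest-first, appending each result to out_file."""
    images = sorted(
        (p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES),
        key=lambda p: p.stat().st_mtime,  # modification time, not true creation time
    )
    with out_file.open("a", encoding="utf-8") as out:
        for image in images:
            text = strip_preamble(transcribe(image))
            # A heading per image keeps the text traceable to its scan.
            out.write(f"## {image.name}\n{text}\n\n")
    return len(images)
```

Passing the model call in as a function means the same loop works whether the images go to a paid API or to a local model.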

[+] tmaly|1 year ago|reply

I record myself reading my handwritten notes, then I just upload the mp3 of the recording to MS 365 to transcribe.

I put special stop words like highlight/return so then I can post process and ensure the markdown formatting looks good.
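The post-processing step could look something like the sketch below. The marker meanings are assumptions, since the comment does not spell them out: here a spoken "return" starts a new paragraph and a pair of spoken "highlight" words wraps the text between them in bold. In practice you would pick marker words that never occur in the notes themselves.

```python
import re


def apply_spoken_markers(transcript: str) -> str:
    """Turn spoken marker words into markdown (marker meanings are assumed)."""
    paragraphs = []
    # A spoken "return" (optionally followed by punctuation) ends a paragraph.
    for raw in re.split(r"\breturn\b[,.]?", transcript, flags=re.IGNORECASE):
        parts = re.split(r"\bhighlight\b[,.]?", raw, flags=re.IGNORECASE)
        # Odd-numbered parts sit between a pair of "highlight" markers.
        pieces = [
            f"**{part.strip()}**" if i % 2 == 1 and part.strip() else part.strip()
            for i, part in enumerate(parts)
        ]
        paragraph = " ".join(piece for piece in pieces if piece)
        if paragraph:
            paragraphs.append(paragraph)
    return "\n\n".join(paragraphs)


print(apply_spoken_markers(
    "buy milk return highlight call Anna highlight before noon"
))
```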

[+] ProllyInfamous|1 year ago|reply

Whisper.app will do this locally on Apple Silicon, FYI.

[+] imvetri|1 year ago|reply

I have 3 years of paper notes that I wanted to use to experiment with building a black mass program. A black mass program is a concept which will yield a black mass in the computer, capable of building conceptual cool tech like automating your daily work, self experimentation, self learning etc.

My notes will have instructions to reach the black mass state; a computer image scanner will try to learn my handwriting, take the notes as instructions, connect dots etc.

The design of this system is cryptic and challenging, because a side effect of creating a computational program is that my thoughts start circling, and it's hard for me to convert them into action.

Taking that as an inspiration, this program is a circling program, which means, it will constantly spiral upwards in a value that is definitive to its actions in the past.

All my notes have information or points or ideas about this fictional concept. I burned the notes which were repetitive, kept the rest.

When I did that, It created more head space for me. The headspace, helped to solve problems and have more space for more learnings.

[+] bckr|1 year ago|reply

> I burned the notes which were repetitive, kept the rest.

> When I did that, It created more head space for me.

This is essentially the idea behind Getting Things Done and Building a Second Brain.

As I said to another commenter, I’ve been able to separate out the project notes and ideas from my autobiographical diaries. These latter I want to keep and read.

Thanks for your interesting comment and good luck building your system!

[+] wilabroard|1 year ago|reply

For anyone whose handwritten notes have equations or pictures, Mathpix is stellar. Their APIs can take PDFs as input and return markdown with LaTeX and embedded images. The handwriting recognition is pretty good on my cursive -- good enough anyway that a plain old LLM like Llama 3 can fix the typos.

(Likely under the hood Mathpix has done exactly what you're proposing, with image segmentation, text/image/math classification, then transcription.)

I've been using an Apple Shortcuts automation that turns my handwritten PDFs into notes in Obsidian, with the transcription up top and the PDF embedded below. Could pretty easily be adapted to turn a library of PDFs into a folder of Obsidian markdown notes. Here's a writeup: https://riddle.press/a-marriage-between-handwritten-notes-an...

[+] praving5|1 year ago|reply

If those notes are really worthy and meaningful to you, then hire someone to type them out for you. If there is something that money can buy, then save your time!

[+] bckr|1 year ago|reply

You’re right, and thanks to another one of the commenters, I have an idea for how I could do this.

Take my journals, and run a relatively simple word separation algorithm over them.

Shuffle up those words and pay to have them annotated.

Reconstruct the dataset from there.
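The shuffle-and-reconstruct bookkeeping might be sketched like this. It assumes the word separation step has already produced one image crop per word, each with an ID in reading order; the ID format (`p001_w001`) and both function names are hypothetical, and the segmentation itself is not shown.

```python
import random


def make_annotation_batch(word_ids: list[str], seed: int = 0) -> list[str]:
    """Return the word-crop IDs in shuffled order, so annotators lack context."""
    batch = list(word_ids)  # copy, so the original reading order is preserved
    random.Random(seed).shuffle(batch)
    return batch


def reconstruct(word_ids: list[str], labels: dict[str, str]) -> str:
    """Reassemble text in reading order; unlabeled crops are marked [?]."""
    return " ".join(labels.get(word_id, "[?]") for word_id in word_ids)


reading_order = ["p001_w001", "p001_w002", "p001_w003"]
to_annotate = make_annotation_batch(reading_order)
# Labels come back keyed by ID, in whatever order annotators finished them.
labels = {"p001_w003": "home", "p001_w001": "went"}
print(reconstruct(reading_order, labels))
```

Because the reading order never leaves your machine, annotators only ever see isolated words.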

[+] freddealmeida|1 year ago|reply

I built this firm a decade ago. https://www.cogent.co.jp/en/

Works with English and Japanese. Sadly I'm no longer with the team there but the work is solid. Try it out.

[+] mariocesar|1 year ago|reply

It seems like using speech-to-text is a faster alternative. You can also consider outsourcing the work. I know abbyy.com offers a service for this. Even though you may not be their target market, they have services for implementing hybrid machine learning and data entry solutions.

If you're into dreaming up cool solutions, you could try using smart pens or tablets to write stuff and then teach a model to recognize your handwriting. But for now, it's just a dream.

[+] ant6n|1 year ago|reply

Scan into pdf and organize them, keep as PDF.

You have to think about what your goal is. Handwritten notes can be perfectly digitized into handwritten notes. What do you need the OCR for? Publishing? … transcribe what you need, or better, rewrite.

Searching? As you scan, make a basic index so that you can refer to the notes. Organize the folders properly with your notes, use a useful naming scheme.

[+] constantinum|1 year ago|reply

I'm unsure how recognisable your handwriting is, but the following tech understood mine.

Try LLMwhisperer[1] pdf extraction API. You are only one "curl" command away from extracting your handwritten text.

The best thing is it preserves the layout of your notes, which means it can keep tables as tables and lists as lists.

Check this screen grab for extracting handwritten notes > https://imgur.com/fXk0tcR

[1]: https://llmwhisperer.unstract.com/

[2]: Try it with your document here > https://pg.llmwhisperer.unstract.com/

[edited] added links

[+] sjhaba|1 year ago|reply

Have you tried chatgpt? 10k image requests should be pretty cheap

[+] tcsenpai|1 year ago|reply

Theoretical solution: train a model on your handwriting. There should be plenty of easy (relatively) to use apps and frameworks for that.

It will take time but you will have a pretty tailored solution.

Also of course: first of all try to process the images so that they are only black and white (not greyscale, actual B/W pictures).
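One common way to get a true black-and-white image is to pick a global threshold with Otsu's method, which is a choice made here for illustration rather than something the comment specifies. The sketch is pure Python on a nested list of 0-255 grey values so that it runs anywhere; for real scans an image library would be far faster.

```python
def otsu_threshold(pixels: list[list[int]]) -> int:
    """Pick the 0-255 threshold that maximizes between-class variance."""
    histogram = [0] * 256
    for row in pixels:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))
    best_threshold, best_variance = 0, -1.0
    below_count = below_sum = 0
    for level, count in enumerate(histogram):
        below_count += count
        below_sum += level * count
        above_count = total - below_count
        if below_count == 0 or above_count == 0:
            continue  # one class is empty, so there is no split to score
        mean_below = below_sum / below_count
        mean_above = (total_sum - below_sum) / above_count
        variance = below_count * above_count * (mean_below - mean_above) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(pixels: list[list[int]]) -> list[list[int]]:
    """Map every pixel to pure black (0) or pure white (255)."""
    threshold = otsu_threshold(pixels)
    return [[255 if value > threshold else 0 for value in row] for row in pixels]


page = [[30, 40, 200], [210, 35, 220]]  # tiny stand-in for a scanned page
print(binarize(page))
```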

[+] canucker2016|1 year ago|reply

How about creating a crowdsourced captcha service?

Take scans of your journal pages, split the jpegs/pics into word fragments, display a couple of fragments to captcha clients, generate completed journal entries when the consensus gets reasonably high for each word fragment.
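The consensus step for a single word fragment could be as simple as a majority vote with a minimum number of answers. This is a sketch only; the function name and both thresholds (`min_votes`, `min_share`) are arbitrary assumptions to tune.

```python
from collections import Counter
from typing import Optional


def consensus(answers: list[str], min_votes: int = 3,
              min_share: float = 0.7) -> Optional[str]:
    """Return the agreed label for one fragment, or None if not yet settled."""
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough answers collected yet
    label, votes = Counter(normalized).most_common(1)[0]
    # Accept only when the top answer holds a large enough share of the votes.
    return label if votes / len(normalized) >= min_share else None


print(consensus(["the", "The", "the "]))
print(consensus(["cat", "cut", "cat"]))
```

Fragments that return `None` simply stay in the pool and keep being shown until enough people agree.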

Not sure how captcha services start from scratch - probably ask around/check with google search.

Privacy goes out the door, but you should be able to show disjointed word fragments so no one could reconstruct enough of a single journal entry to expose your more personal info unless they were very determined. Or maybe split the scans into individual letter fragments instead?

Then monetize this for other people in the same situation...

[+] bckr|1 year ago|reply

I love this idea. It’s way overengineered for this problem, and I already have a startup that requires my complete attention, but thank you for writing this out.

And if anyone decides to do this, let me know!

Privacy is one of the reasons I would pay for a service like this, rather than pay a person to (try to) do it.

These journals contain a lot of psychiatric-level information about me, which is what makes them both valuable and sensitive.

[+] Suppafly|1 year ago|reply

>Then monetize this for other people in the same situation...

That's basically what Amazon Mechanical Turk is, without the captcha bit.

142 comments