taylorlapeyre | 6 months ago

It once again completely fails on an extremely simple test: look at a screenshot of sheet music, and tell me what the notes are. Producing a MIDI file for it (unsurprisingly) was far beyond its capabilities.

https://chatgpt.com/share/68954c9e-2f70-8000-99b9-b4abd69d1a...

This is not anywhere remotely close to general intelligence.

adrianh | 6 months ago

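Interpreting sheet music images is very complex, and I’m not surprised general-purpose LLMs totally fail at it. It’s orders of magnitude harder than text OCR because notation is two-dimensional: vertical position encodes pitch, horizontal position encodes time, and symbols modify each other across both axes.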

For much better results, use a custom trained model like the one at Soundslice: https://www.soundslice.com/sheet-music-scanner/