loubbrad | 1 year ago
There have been several successful models for multi-track music transcription; see Google's MT3 project (https://research.google/pubs/mt3-multi-task-multitrack-music...). For piano transcription specifically, accuracy is nearly flawless at this point, even on very low-quality audio:
https://github.com/EleutherAI/aria-amt
Full disclosure: I am the author of the above repo.
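For readers unfamiliar with the output format: a piano transcription model emits MIDI note events, where each pitch is an integer note number. The mapping between note numbers and frequencies is the standard equal-temperament formula (A4 = note 69 = 440 Hz); a minimal sketch, with helper names of my own:

```python
import math

def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to its frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)

def hz_to_midi(freq: float) -> int:
    """Round a frequency in Hz to the nearest MIDI note number."""
    return round(69 + 12 * math.log2(freq / 440.0))

# Middle C (C4) is MIDI note 60:
print(round(midi_to_hz(60), 2))  # 261.63 Hz
print(hz_to_midi(261.63))        # 60
```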
Earw0rm | 1 year ago
loubbrad | 1 year ago
bravura | 1 year ago
https://replicate.com/turian/multi-task-music-transcription
I ported their Colab to Replicate so I could run it more easily.
The MIDI output is... puzzling?
Even with simple stems, I found the output unusable for some tracks: the MIDI and audio were not well aligned, and there were timing issues. On other audio it seemed to work fine.
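One common source of apparent misalignment is a tempo/resolution mismatch when converting between wall-clock onset times and MIDI ticks. As an illustration of why the drift grows over the course of a track (my own sketch, not the model's actual pipeline):

```python
def seconds_to_ticks(seconds: float, bpm: float, ppq: int = 480) -> int:
    """Convert a wall-clock time to MIDI ticks at the given tempo.

    ppq (pulses per quarter note) is the MIDI file's time resolution.
    """
    ticks_per_second = ppq * bpm / 60.0
    return round(seconds * ticks_per_second)

def ticks_to_seconds(ticks: int, bpm: float, ppq: int = 480) -> float:
    """Inverse conversion: MIDI ticks back to seconds at the given tempo."""
    return ticks * 60.0 / (ppq * bpm)

# An onset at 2.0 s encoded at 120 BPM but decoded at 100 BPM
# lands at 2.4 s -- and the error grows linearly with time:
ticks = seconds_to_ticks(2.0, bpm=120)
print(ticks_to_seconds(ticks, bpm=100))  # 2.4
```

If the player assumes a tempo or PPQ different from the one the events were written against, every onset shifts proportionally, which looks exactly like the timing issues described above.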
loubbrad | 1 year ago
Luckily for me, audio-to-sequence approaches do work very well for piano, which turns out to be an amazing way of getting expressive MIDI data for training generative models.
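For context, sequence models consume MIDI as a token stream rather than raw note events. A minimal sketch of that idea (my own illustration with made-up token names, not any particular repo's tokenizer): time is quantized into TIME_SHIFT tokens, so expressive micro-timing survives rather than being snapped to a fixed grid.

```python
from typing import List, Tuple

# Each note event: (onset_seconds, midi_pitch, velocity)
Note = Tuple[float, int, int]

def tokenize(notes: List[Note], time_step: float = 0.01) -> List[str]:
    """Flatten note events into a token sequence for a sequence model.

    Gaps between onsets become discrete TIME_SHIFT tokens (in units of
    time_step seconds), preserving expressive timing.
    """
    tokens: List[str] = []
    prev_onset = 0.0
    for onset, pitch, velocity in sorted(notes):
        shift = round((onset - prev_onset) / time_step)
        if shift > 0:
            tokens.append(f"TIME_SHIFT_{shift}")
        tokens.append(f"VELOCITY_{velocity}")
        tokens.append(f"NOTE_ON_{pitch}")
        prev_onset = onset
    return tokens

print(tokenize([(0.0, 60, 80), (0.52, 64, 72)]))
# ['VELOCITY_80', 'NOTE_ON_60', 'TIME_SHIFT_52', 'VELOCITY_72', 'NOTE_ON_64']
```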
air217 | 1 year ago
WiSaGaN | 1 year ago
loubbrad | 1 year ago
https://magenta.tensorflow.org/datasets/maestro
Most current research involves refining deep-learning-based approaches to this task. When I worked on this problem earlier this year, I was interested in adding robustness to these models by training a sort of musical awareness into them. You can see a good example of this in this tweet:
https://x.com/loubbrad/status/1794747652191777049
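As a toy illustration of the kind of musical prior such a model can learn (a post-hoc sanity check of my own invention, not the method shown in the tweet): without any musical awareness, a transcription model fed noisy audio will happily emit notes a physical piano cannot produce.

```python
from typing import List, Tuple

PIANO_LOW, PIANO_HIGH = 21, 108  # A0 to C8: the 88-key piano range

def implausible_notes(notes: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Return (onset_seconds, pitch) events outside the physical piano range.

    These are exactly the hallucinations that low-quality audio tends to
    induce in a model with no built-in musical prior.
    """
    return [(t, p) for t, p in notes if not PIANO_LOW <= p <= PIANO_HIGH]

print(implausible_notes([(0.0, 60), (0.5, 110), (1.0, 15)]))
# [(0.5, 110), (1.0, 15)]
```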