Poor schemas, poor cataloguing: why music tagging sucks

heikkilevanto|3 years ago

Cataloguing classical music has always been a headache, even for experienced librarians. There are multiple recordings (some conductors record the same piece several times, with the same or different orchestras and soloists), arrangements for different instruments (maybe in a different key), and parts of a piece known under alternative names (Bach's Air on the G String is just one movement from one of his orchestral suites, and he never called it that anyway). Old composers often borrowed bits or rearranged whole pieces (Mozart did his own version of Handel's Messiah), and there are various historical ways to name and number pieces (concerto #1 in B from opus 6).

powersnail|3 years ago

Tangentially, name eliding is also quite difficult for classical pieces: there are so many parts that eliding is almost always needed, but there's no standard order, so it's impossible to know which segment is the most informative. There's the composer's name, the piece's name, the opus number, the piece's alias, the movement's number, the movement's name, the performer, and sometimes the transcriber's name if it is transcribed.

Elide the end, and you might get:

- Sonata No.[...].mp3

Elide the beginning, and you might get:

- [...] by Julia Fischer.mp3

Elide the middle, and:

- Anne-Soph[...]minor.mp3

I really wish music apps would offer a multi-line, non-elided display of piece titles.
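A minimal sketch of the three strategies, assuming the app cuts the filename stem to a fixed character width and always keeps the extension. The `elide` function and the sample title are hypothetical; the point is that none of the three positions can know which segment carries the useful information.

```python
import os

MARKER = "[...]"

def elide(filename: str, width: int, where: str = "end") -> str:
    """Shorten filename to exactly `width` characters, keeping the extension."""
    if len(filename) <= width:
        return filename
    stem, ext = os.path.splitext(filename)
    budget = width - len(ext) - len(MARKER)  # characters of the stem we can keep
    if budget <= 0:
        raise ValueError("width too small")
    if where == "end":      # keep the beginning: composer and piece survive
        return stem[:budget] + MARKER + ext
    if where == "start":    # keep the end: usually the performer survives
        return MARKER + stem[len(stem) - budget:] + ext
    if where == "middle":   # keep a bit of both ends
        head = budget // 2 + budget % 2
        tail = budget - head
        return stem[:head] + MARKER + stem[len(stem) - tail:] + ext
    raise ValueError(f"unknown position: {where}")
```

For example, `elide(title, 30, "start")` on a long "Composer - Piece - Movement - Performer.mp3" name keeps only the performer and the tail of the movement name.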

boppo1|3 years ago

I've wanted to 'get into' classical, but this has made it difficult. I just settle for whatever the local station plays and forget the name afterwards.

unsignedchar|3 years ago

There are also multiple issues of the same recording, sometimes from multiple remasters, or from new labels reissuing historic recordings (like Furtwängler’s pre-1945 recordings). These often differ significantly because of the different types or generations of noise-reduction techniques used.

someguydave|3 years ago

Yes, and the other problem with classical music is that if you split your movements into separate files hardly any reasonable music player for iOS will keep them together in order.

lukaslalinsky|3 years ago

The problem of music metadata is the relations between things are too complicated. It's not a simple tree. It's a very convoluted graph.

If you make the schema right, it's too complicated and people will not use it, because they just don't care that much.

If you simplify the schema, it's more likely people will use it. However, if it's too simple (artist/album/title/year), you end up with many, many inconsistencies and duplicates.

Finding something in between is nearly impossible.

With MusicBrainz, we've tried to design a strict schema that works for most things, but then you need to find people to actually enter data in that schema.

Wikipedia is on the other end of the spectrum, everything is free-form and some structure is slowly emerging from that, but it's far from universal.

Structured metadata is just hard for people to manage. Unless they are geeks and they really really care.
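To make the "graph, not a tree" point concrete, here is a toy model that borrows MusicBrainz's entity names (work, recording, release). The fields are heavily simplified assumptions, not MusicBrainz's actual schema. One work has many recordings, and one recording can appear on many releases, which a flat artist/album/title/year row can't express without duplication.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Work:        # the composition itself, independent of any performance
    title: str
    composer: str

@dataclass(frozen=True)
class Recording:   # one captured performance of a work
    work: Work
    performers: tuple[str, ...]
    year: int

@dataclass
class Release:     # a product (LP, remaster, compilation) bundling recordings as tracks
    title: str
    label: str
    tracks: list[Recording] = field(default_factory=list)

def releases_with_work(releases: list[Release], work: Work) -> list[str]:
    """Titles of every release containing any recording of `work`."""
    return [r.title for r in releases if any(t.work == work for t in r.tracks)]
```

The same recording can sit on an original LP and on a remaster, while a different recording of the same work shows up on a compilation; `releases_with_work` finds all three only because the work is a separate node rather than a repeated title string.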

woolion|3 years ago

Is there a diagrammatic (UML-like) description of the different modelling approaches in use, so that you could pick a standard one for your use case? It would also help to know which conversions from one model to another could be automated, and what information would be lost without manual intervention.

pcthrowaway|3 years ago

Does MusicBrainz model its data in a way that's compatible with Wikidata?

I feel like Wikidata's support for music tagging is pretty robust and flexible, albeit it's hard to get people to enter data for it.

PopAlongKid|3 years ago

This thread doesn't seem complete without a reference to Discogs.

www.discogs.com

They have implemented a database with a pretty good representation of all the music releases (realizations, in other words), as submitted by volunteers (much like Wikipedia). How many of the questions posed here have already been resolved in a reasonable way by Discogs?

Then there was the late great Catraxx database, using much of the same relational structure as Discogs, but based on the MS Access engine. It can be used to write tags to audio files.

russelg|3 years ago

MusicBrainz definitely solves this issue better than Discogs.

Personally I think the main draw of Discogs is the marketplace, even with the level of scalping it has.

mixmastamyk|3 years ago

I thought MusicBrainz had the most complete music DB these days. It’s also a bit complex due to all the releases it tracks.

an_aparallel|3 years ago

It's even more incomplete without a reference to: https://www.mp3tag.de/en/

I use Mp3tag connected to Discogs (via auth tokens).

It's incredible: I feed it files, and it spits out glorious tags and folder names (perfectly formatted using regex).

The only annoying thing, IMO, is getting catalogue IDs: using Discogs + Mp3tag to fill in catalogue numbers in bulk/intelligently.
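For readers who haven't used it, the tags-to-folder-name step can be approximated in plain Python. This is not Mp3tag's own format-string syntax; the `folder_name` helper, the tag keys, and the "Artist - Year - Album [CatNo]" layout are assumptions for illustration.

```python
import re

# Characters Windows forbids in path components; replacing them keeps names portable.
ILLEGAL = re.compile(r'[<>:"/\\|?*]')

def folder_name(tags: dict[str, str]) -> str:
    """Build 'Artist - Year - Album [CatNo]' from tag values, sanitized for filesystems."""
    name = " - ".join(tags[k].strip() for k in ("artist", "year", "album"))
    if tags.get("catalog"):                  # catalogue number is optional
        name += f" [{tags['catalog'].strip()}]"
    name = ILLEGAL.sub("_", name)            # replace forbidden characters
    name = re.sub(r"\s+", " ", name)         # collapse runs of whitespace
    return name.rstrip(". ")                 # Windows also rejects trailing dots/spaces
```

The catalogue number is the part that has to come from somewhere like Discogs; when it's missing, the helper simply leaves the brackets out.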

bromuro|3 years ago

Discogs is wonderful. I've learned a lot about music from it, and I've used it to tag my MP3s for at least 10 years!

tunesmith|3 years ago

I've had this problem to an even more frustrating degree when trying to catalog my own rehearsal recordings. I'm the composer, so at least that part is clear. But the "album artists" vary from rehearsal to rehearsal, I have many recordings of the same version of a song, differentiated by rehearsal date, and I also have different versions of songs, for instance from when I add or drop an extra verse or extend a bridge. A lot of these recordings are in iTunes, but now I'm petrified to turn on iCloud Library sync because of the various data-loss scenarios that are still out there.

Believe it or not, I had my entire process nailed when Bento existed. Then they discontinued it, and I tried to cobble something together with FileMaker, but it was always an awkward fit.

I'm not aware of any open source or commercial offering for this sort of thing. I've come close a few times to just investing the hundred hours or so to roll my own web-based thing on a private server.
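The structure described here (a song with several versions, each version recorded at several rehearsals with a changing line-up) maps onto a small hierarchy. A hypothetical sketch of what such a roll-your-own tool might store; the `Version`/`Take` classes, their fields, and the `takes_of` query are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Version:            # e.g. the "extra verse" variant of a song
    song: str
    label: str

@dataclass(frozen=True)
class Take:               # one rehearsal recording of one version
    version: Version
    recorded: date
    players: frozenset[str]   # the line-up changes from rehearsal to rehearsal
    path: str

def takes_of(takes: list[Take], song: str, label: Optional[str] = None) -> list[Take]:
    """All takes of a song (optionally of one version only), oldest first."""
    hits = [t for t in takes
            if t.version.song == song and (label is None or t.version.label == label)]
    return sorted(hits, key=lambda t: t.recorded)
```

Keeping the line-up on each take rather than on an "album" sidesteps the varying album-artist problem entirely.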

jasonjayr|3 years ago

Is the field named 'Album Artists' or 'Album Artist'? I noticed that Navidrome groups compilation albums by 'Album Artist'.
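A convention many players follow is to group by the album-artist tag and fall back to the track artist when it's missing. The snippet below is a hypothetical illustration of that convention, not Navidrome's actual code; the tag keys (`albumartist`, `artist`, `album`, `title`) are assumed.

```python
from collections import defaultdict

def group_albums(tracks: list[dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Group track titles by (album artist, album), falling back to the track artist."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for t in tracks:
        owner = t.get("albumartist") or t["artist"]   # fallback when tag is missing/empty
        groups[(owner, t["album"])].append(t["title"])
    return dict(groups)
```

Without the album-artist tag, one compilation splits into a separate "album" per track artist, which is why the exact field name matters.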

jschveibinz|3 years ago

I have always been frustrated by recommender algorithms for music. The classifications for music, whether they result from volunteer classification or from some type of mixture analysis, just don’t seem to match very well with how one “thinks” about music classification.

For example, I like jazz. I happen to enjoy listening to Toots Thielmann (a jazz harmonica player). There aren’t many well-known jazz harmonica musicians, so a recommender system always gives me a bunch of harmonica players from other genres. This is not at all a good recommendation, since the style of music and the style of playing in other genres are completely different (and of no interest to me).

There needs to be a way to use “user input” to weight the recommender algorithms better, sort of like a search engine with “+” (more of this) and “-” (less of that).

Then maybe recommender algorithms will learn and get better.
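One simple way to picture the "+"/"-" idea: score each candidate by its similarity to the seed artist, then add user-supplied weights for the tags it carries. Everything here (the tag sets, Jaccard similarity as the base score, the weight values) is an illustrative assumption, not how any particular service works.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def rank(seed: set[str], candidates: dict[str, set[str]],
         feedback: dict[str, float]) -> list[str]:
    """Order candidates by tag similarity to the seed plus the user's +/- tag weights."""
    def score(tags: set[str]) -> float:
        return jaccard(seed, tags) + sum(feedback.get(t, 0.0) for t in tags)
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
```

With a seed of `{"jazz", "harmonica"}`, a blues harmonica player and a jazz guitarist tie on similarity alone; a "+jazz" weight breaks the tie in the direction the listener actually wants.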

Freak_NL|3 years ago

Toots Thielemans, not Thielmann (Belgian, not German).

thewebcount|3 years ago

I have to wonder if the problem exists partly because most people don’t care. As a musician myself, I care a little bit. But as a user, I haven’t updated metadata on a music file (other than the ones I create from my own music) in probably 2 decades. What I get from the music services I download or stream from seems decent enough. I’m not a collector of rare or obscure songs or versions of songs. I mostly stream music, and have an older catalog from my CDs (which are now in my attic). I don’t have time to hand-curate the tags or databases or whatever for my music, TV shows, movies, etc. What the services currently offer mostly works OK for me.

JohnFen|3 years ago

Some care. I care a whole lot, because when I play music, I'm deliberately selecting specific music for a specific reason, so I want to be able to find it in my collection.

But I don't use music streaming services at all -- I run my own media server and stream from that. So what I started doing years ago is to ignore any existing metadata (even song and album titles) and enter it all myself. I've developed a system that meets my needs.

But it also means that every time I buy a new album, I'll be spending 15 minutes or so entering all the metadata for it. That sucks. I'd really prefer it if the online music databases got their act together and tagged everything at least in a consistent manner.

None of the fields in the databases can be trusted completely, but the worst offender of all is "Genre" and similar.

Stealthisbook|3 years ago

Librarians really care about it, since they have to find the exact thing a patron is looking for. Most people also care occasionally. Everybody has the moment when they want to find a specific track, maybe from a live album that featured that one drummer they like. A service that serves everybody has to fulfill thousands of those individual once-in-a-decade requests every day.

CrypticShift|3 years ago

I'm for dissociating the metadata from the files. I have a custom (text) DB of artists and albums, enhanced with metadata from online services (Last.fm, RYM, Spotify). I'm more interested in flexibility (filters, notes...) than in "exactness" (versions...).
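A sketch of the "metadata lives outside the files" approach: a plain tab-separated text file of artists and albums with free-form notes, queried with simple filters. The column layout, the sample rows, and the `load`/`where` helpers are assumptions; the actual format of this DB isn't described.

```python
import csv
import io

# Hypothetical sidecar DB: one row per album, first line holds the column names.
SAMPLE = """artist\talbum\tyear\tnotes
Artist A\tFirst LP\t1971\tlive drummer is great
Artist B\tDebut\t1984\t
Artist A\tSecond LP\t1975\tremaster sounds thin
"""

def load(text: str) -> list[dict[str, str]]:
    """Parse the tab-separated DB into one dict per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def where(rows: list[dict[str, str]], **conds: str) -> list[dict[str, str]]:
    """Keep rows whose columns equal the given values, e.g. where(rows, artist='Artist A')."""
    return [r for r in rows if all(r.get(k) == v for k, v in conds.items())]
```

Because the file is just text, adding a new column or a note never touches the audio files, and ad-hoc filters are one list comprehension away (for example, every row whose notes mention "drummer").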

cateye|3 years ago

I think there are two interacting problems here. First, there needs to be a nomenclature. This is a weird semantic problem, since music is quite abstract in that sense. The schema is just a small subproblem of this.

Second, how do you systematically tag the music based on the defined nomenclature? Without an automated system, tagging will be subjective and error-prone. This problem could perhaps be tackled in the long term with machine learning.

http://musicontology.com/

amgutier|3 years ago

Roon handles metadata and relationships between entities the least bad of anything I’ve tried. It’ll do things like link covers and live performances of the same song together, or let you group multiple issues of the same album and switch between them.

It’s not cheap, and it takes some effort to fix some bad source data, but I’ve found it very rewarding and get a ton of enjoyment exploring my library now.

https://roonlabs.com/

irrational|3 years ago

> Individual tracks have no relation to each other

I added a bunch of songs to my favorites in Spotify and then hit the Enhance button, which is supposed to suggest other songs I might like. Many of the suggested songs were duplicates of ones I already had, but from different albums. It wasn’t able to recognize that these were the exact same song appearing on various albums by the same artist.

dmitriid|3 years ago

Disclaimer: I work at Spotify, but not at music ingestion and classification.

I can give you a very high level overview of why it's the case.

Simply put, they are different songs, and it's nearly impossible to recognize that they are the same song (at least not efficiently across the entire catalog) due to any, or even all, of these factors:

- they are on different albums. The albums might not be attributed to the artist, or might be compilations attributed to many other artists

- they may come from different sources and copyright holders

- the metadata may be wrong, or just different (metadata is supplied by copyright holders, and it's often ... weird)

- they may very well be considered different songs by most catalogs (a Japanese bootleg version that is 3 seconds longer is different from the European Best Of release etc.)

- everything is the same except some of the people on the record (e.g. arranged by a different person, so attribution and royalties come into play)

- ids, hashes, lengths, musical structure, or whatever internal systems use to identify, match, combine, and display music may all be different, or different enough to be classified as different songs

- there might not be enough music classified in the genres you listen to to present you with a large enhanced playlist. This is an issue for most non-Western music, because Western music is largely understood, catalogued and matched for most more-or-less popular genres. And even there, it probably mostly applies to US and British music. You're looking for African jazz? Good luck. Internally it's likely just one big lump of music dumped by music providers.
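To see why matching is brittle, consider the naive approach: compare normalized artist and title strings plus a duration tolerance. This is a hypothetical sketch, not Spotify's pipeline; the normalization rules and the 2-second tolerance are arbitrary assumptions, and the 3-seconds-longer bootleg from the list above already falls outside them.

```python
import re
import unicodedata

def normalize(s: str) -> str:
    """Lowercase, strip accents, drop bracketed qualifiers like '(Remastered 2011)'."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    s = re.sub(r"[\(\[].*?[\)\]]", "", s.lower())
    return re.sub(r"[^a-z0-9]+", " ", s).strip()

def same_recording(a: dict, b: dict, tolerance_s: float = 2.0) -> bool:
    """Heuristic match on normalized artist + title and a duration tolerance in seconds."""
    return (normalize(a["artist"]) == normalize(b["artist"])
            and normalize(a["title"]) == normalize(b["title"])
            and abs(a["duration"] - b["duration"]) <= tolerance_s)
```

Loosen the tolerance and distinct live takes start merging; tighten it and remasters split apart. None of this helps when the artist credit itself differs (a compilation, a different arranger), which is several of the factors listed above.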

wintermutestwin|3 years ago

>streaming is useful for discovery

Only if you want your service provider to do the discovery for you.

I have Apple Music, and I don't need or want its discovery features, as I am quite proficient at doing my own discovery. This means that 3 of the 5 icons on the bottom row of its iOS interface are a total waste of incredibly valuable UI space for me.

msla|3 years ago

I'm fine with "service providers" essentially similar to FM radio stations (in addition to actual radio stations, FM or otherwise) doing some discovery for me, because the good people at SomaFM and WFMU and KFGM and KBGA know more about their music than I do, and their brains aren't driven by an algorithm aiming at getting me more of what I've already heard. This doesn't even have to be human-curated, as long as the algorithm behind the selection is not tailored to me.

ThrowawayTestr|3 years ago

I think you should give recommendation engines a chance. I've discovered so many new artists and genres through Spotify.

bob1029|3 years ago

I really think a stupid-simple labeling scheme is the most ideal path. You can get by with only 3 tables and 6 columns if you do this.

Sure, you'd have to have labels like "Release Year 1999", "Track #2", etc., but this path can actually be very elegant and desirable at query time.

A few generic columns added to the label/tag table would allow workarounds to most of the edge cases. For example, if you added an "IsSequential" column, then labels like "Track #1", "Track #2" can be interpreted as such.

I think the real dragon is trying to build a schema that directly represents the business of music. There are so many genres, cultures, varieties, etc., that you would go insane before you had everything properly covered.
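A minimal sketch of that three-table layout using Python's built-in `sqlite3`, assuming one reading of "6 columns": two per table (items, labels, and the item-label join). The table names, the sample labels, the `tag`/`with_all` helpers, and the rule that sets the optional `is_sequential` flag (a seventh column, standing in for "IsSequential") are all illustrative assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE item  (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE label (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                    is_sequential INTEGER DEFAULT 0);  -- the optional extra column
CREATE TABLE item_label (item_id INTEGER REFERENCES item(id),
                         label_id INTEGER REFERENCES label(id));
""")

def tag(path: str, *labels: str) -> None:
    """Attach labels to a file, creating the item and labels if they don't exist yet."""
    db.execute("INSERT OR IGNORE INTO item (path) VALUES (?)", (path,))
    for name in labels:
        # Hypothetical rule: labels like "Track #2" are flagged as sequential.
        db.execute("INSERT OR IGNORE INTO label (name, is_sequential) VALUES (?, ?)",
                   (name, int(name.startswith("Track #"))))
        db.execute("""INSERT INTO item_label
                      SELECT item.id, label.id FROM item, label
                      WHERE item.path = ? AND label.name = ?""", (path, name))

def with_all(*labels: str) -> list[str]:
    """Paths carrying every one of the given labels."""
    marks = ",".join("?" * len(labels))
    rows = db.execute(f"""SELECT item.path FROM item
        JOIN item_label ON item_label.item_id = item.id
        JOIN label ON label.id = item_label.label_id
        WHERE label.name IN ({marks})
        GROUP BY item.id HAVING COUNT(DISTINCT label.name) = ?
        ORDER BY item.path""", (*labels, len(labels)))
    return [r[0] for r in rows]
```

Every query has the same shape ("items with all of these labels"), which is the query-time elegance the comment describes; the cost is that meaning such as "1999 is a year" lives in label text, which is exactly the translation objection raised in the reply below.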

369548684892826|3 years ago

Seems ok for personal tagging but for a general schema that can be used internationally, like the kind of thing the link is talking about, you can’t really have words that need translation mixed in with the tag data.

dmitriid|3 years ago

Because music is as messy as humans, and whatever we catalogue is arbitrary at best.

See Every Noise: https://everynoise.com/

Is modern chamber music different from modern classical, and why? How different are Polish free jazz and free improvisation? Canadian black metal and Norwegian black metal and Dutch black metal?

To some, there's no difference. To others, there's a world of difference.

Here's more on the difficulties: https://everynoise.com/EverynoiseIntro.pdf

photochemsyn|3 years ago

With all the buzz over machine learning and 'the classification problem', it might be interesting to run something like a waveform classifier trained on large music collections (i.e., on the audio files themselves), at least as a means of discovering new music one likes. It probably wouldn't solve the metadata problem described in the article, though; that seems more like an archival issue.
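The discovery half of that idea usually comes down to nearest-neighbour search over feature vectors computed from the audio. A sketch assuming the vectors already exist: the numbers in the test are made up, standing in for whatever a trained model would output, and plain cosine similarity is one common choice, not the method of any specific site.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two non-zero feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(query: str, library: dict[str, list[float]], k: int = 2) -> list[str]:
    """The k tracks whose vectors point most nearly in the same direction as the query's."""
    others = [name for name in library if name != query]
    return sorted(others, key=lambda n: cosine(library[query], library[n]),
                  reverse=True)[:k]
```

Whether the neighbours merely sound alike or are musically related depends entirely on what the features encode, not on the search step.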

Spivak|3 years ago

I think you’re describing https://maroofy.com/

But the bigger issue is that while it’s pretty good at finding songs that sound similar it’s not great at finding songs that are musically similar.

But nonetheless, I really appreciate a novel recommendation algorithm that’s not based on popularity. I’ve gotten good recs from this site for artists with fewer than 10 monthly listeners, which is super cool. I’ve never been so underground.

youssefabdelm|3 years ago

I'm currently in the middle of creating a user-contributed site that I think will benefit from "schemas" (or just organization / "cleanliness" in general)... as a COMPLETE beginner who's very concerned about this issue (I really don't want my data to turn into a mess)...

1. what resources would you recommend?

2. Is there a "gold standard"?

The article already mentioned some great terms to look up and explore.

WirelessGigabit|3 years ago

This reminds me of an article I read a while back of weird mp3 tags.

One example was Rammstein's untitled album: it literally doesn't have a title. How do you tag that?

(if anyone has the link to the article, I didn't add it to my Wallabag collection...)

squeaky-clean|3 years ago

Not the article, but another funny one: on the first album by the band '68, the track names are single letters that spell out "REGRETNOT." So tracks 1 and 4 are both just named "R".

Track number is part of the metadata, so it shouldn't have been an issue. But the music app I was using on Android at the time would crash whenever it got to a repeated letter (i.e., the second R, E, or T).

If you look up the album on streaming services, they use the naming format "TRACK 1 R, TRACK 2 E" and so on.
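One plausible way such a bug arises (a guess; the app's code isn't known) is keying tracks by title. In this hypothetical reconstruction, a dict keyed by title silently collapses the nine tracks into six, while keying by (disc number, track number), which should be unique within a release, keeps them all.

```python
TITLES = list("REGRETNOT")   # track n is titled with the n-th letter

def by_title(titles: list[str]) -> dict[str, int]:
    """Buggy: a later track overwrites an earlier one that shares its title."""
    return {title: n for n, title in enumerate(titles, start=1)}

def by_position(titles: list[str], disc: int = 1) -> dict[tuple[int, int], str]:
    """Key by (disc number, track number) instead, so repeated titles coexist."""
    return {(disc, n): title for n, title in enumerate(titles, start=1)}
```

In the title-keyed version, "R" ends up pointing only at track 4, so any lookup for track 1 by title lands on the wrong entry, the kind of inconsistency that can surface as a crash.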

dfan|3 years ago

You are probably thinking of https://dustri.org/b/horrible-edge-cases-to-consider-when-de....

sammalloy|3 years ago

Wikipedia has most of this data already available, so all music software has to do is use it.

ssl232|3 years ago

Or better, Wikidata: https://en.wikipedia.org/wiki/Wikidata.

commotionfever|3 years ago

or https://musicbrainz.org/

gfody|3 years ago

this summarizes a lot of thoughts I've had about my music collection. I'd love to have a relational schema for it but you can keep normalizing forever, especially once you try to normalize away "genre"
