top | item 43723020

I analyzed chord progressions in 680k songs

308 points| jnord | 10 months ago |cantgetmuchhigher.com

147 comments

[+] huimang|10 months ago|reply
Using absolute chord analysis instead of relative chords (i.e. roman numeral analysis) doesn't make sense. As others have noted, the original dataset is flawed because the structure of a song is critical; you cannot omit repeating chords. Programmers/analysts should take more care to understand music theory, or the underlying field at hand, before compiling datasets or doing analysis.

"Most common chord" is mildly interesting, but not really that useful. The most common key, and the most commonly used chords relative to that key (i.e. with roman numeral analysis), would be much more useful and interesting. This would help paint a clearer distinction between e.g. country and jazz, rather than "jazz uses Bb major more". Anyone with general instrument knowledge could have surmised that anyway, since Bb and Eb instruments are much more prevalent in jazz.
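A minimal sketch of what roman numeral analysis looks like in code, assuming the key of each song is already known, the key is major, and chord symbols use the compact "Em7"/"Cmaj7" style. The function names are hypothetical, not taken from the article or the dataset:

```python
# Pitch-class numbers (0-11) for note names; enharmonic spellings share a number.
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}

# Semitones above the tonic for each degree of the major scale.
MAJOR_DEGREES = {0: "I", 2: "II", 4: "III", 5: "IV", 7: "V", 9: "VI", 11: "VII"}


def split_chord(symbol):
    """Split a chord symbol such as 'F#m7' into root ('F#') and suffix ('m7')."""
    root_len = 2 if len(symbol) > 1 and symbol[1] in "#b" else 1
    return symbol[:root_len], symbol[root_len:]


def to_roman(symbol, key):
    """Label a chord relative to a major key; chromatic roots are marked with '?'."""
    root, suffix = split_chord(symbol)
    interval = (NOTE_TO_PC[root] - NOTE_TO_PC[key]) % 12
    numeral = MAJOR_DEGREES.get(interval)
    if numeral is None:
        return "?" + symbol
    # A leading 'm' (but not 'maj') marks a minor chord: write the numeral lowercase.
    if suffix.startswith("m") and not suffix.startswith("maj"):
        return numeral.lower() + suffix[1:]
    return numeral + suffix


# The same progression in two keys collapses to one relative pattern.
print([to_roman(c, "G") for c in ["G", "C", "D", "Em"]])     # ['I', 'IV', 'V', 'vi']
print([to_roman(c, "Bb") for c in ["Bb", "Eb", "F", "Gm"]])  # ['I', 'IV', 'V', 'vi']
```

Counting these labels instead of raw chord names is what would stop a key preference (Bb for horns, G for guitar) from masquerading as a harmonic preference.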

"If you’re sitting down to write a song, throw a 7th chord in. The ghost of a jazz great will smile on you."

7ths don't belong to jazz only, and the average songwriter isn't making data-driven decisions on how to settle on the chord structure for their song.

[+] pfisherman|10 months ago|reply
Agreed that chord numbers and progressions are the analysis that should have been done. For example, blues is mostly defined by a 1-4-5 progression, and the ol' 2-5-1 is pretty ubiquitous across time and genre.

Also, I think the disappearance of 7th chords (major, minor, or dominant) is vastly overstated. Keep in mind that these come from guitar tabs, which likely ignore chord inversions, voicings, and substitutions to simplify notation. For example, a B minor triad can be substituted for a Gmaj7.

Bm triad = B,D,F#

Gmaj7 = G,B,D,F#

Or, if you want to be fancy, a Bb/Gm can work as either Bbmaj7 or C7 depending on where you put it in a progression.
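To see the Bm-for-Gmaj7 substitution in terms of notes, each chord can be modeled as a set of pitch classes (0 = C ... 11 = B). This is a sketch; the small quality table is an illustrative assumption, not a complete chord dictionary:

```python
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}

# Semitone offsets above the root for a few chord qualities (illustrative subset).
QUALITIES = {
    "": (0, 4, 7),          # major triad
    "m": (0, 3, 7),         # minor triad
    "7": (0, 4, 7, 10),     # dominant 7th
    "maj7": (0, 4, 7, 11),  # major 7th
    "m7": (0, 3, 7, 10),    # minor 7th
}


def pitch_classes(root, quality=""):
    """Set of pitch classes sounded by a chord, ignoring voicing and octave."""
    base = NOTE_TO_PC[root]
    return frozenset((base + step) % 12 for step in QUALITIES[quality])


bm = pitch_classes("B", "m")        # B, D, F#
gmaj7 = pitch_classes("G", "maj7")  # G, B, D, F#
print(bm <= gmaj7)                  # True: every note of Bm is in Gmaj7
print(sorted(gmaj7 - bm))           # [7]: only the G root is missing
```

If a bass player supplies the G underneath, the band sounds a Gmaj7 while the tab only ever says Bm, which is one way 7th chords can vanish from tab-based counts.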

[+] seertaak|10 months ago|reply
Agree completely. I assume OP means major or minor 7th chords; they can't possibly mean dominant 7ths, because... does there even exist a single blues song that doesn't have that chord?

And let's say you take maj7 chords - "you and me song", "you are so beautiful", "sing sang sung", "1975" - just off the top of my head. Pretty much any pop song which is melancholic sounding.

For min7, choose virtually any Santana song.

Even if you said maj9 or min9, it still wouldn't be remotely true. OTOH, with 13th chords... I think you'd have to reach to find a non-jazz occurrence. And they happen in jazz all the time.

[+] CuriouslyC|10 months ago|reply
I think most musicians know that I-IV-V-I is the zero-thought default for an in-key chord progression; it's so overused you don't need fancy analysis to figure it out.

For me, I'm more interested in the intervals and voicing pairs, because those tell you something deeper about the music that you don't get from the chord progression.

[+] peanut-walrus|10 months ago|reply
Wouldn't using relative chords simply show that 99% of songs use the I chord? :)
[+] apercu|10 months ago|reply
To further this, my trio plays down a half step because we’re older now and it’s easier to sing in a lower register. This is pretty common for a lot of over-40 artists as well.

Also, as you know, blues has dominant 7ths all over.

[+] golergka|10 months ago|reply
> average songwriter isn't making data-driven decisions on how to settle on the chord structure for their song

Depends on what you call data-driven. A songwriter most likely knows that a lot of fifth chords give power-metal vibes, and that diminished and out-of-key chords do summon those ghosts of jazz.

[+] uoaei|10 months ago|reply
Your critique of music analysis bears a remarkable similarity to linguists' critique of LLMs. "Language/thought is more than sequences of tokens" will still be true no matter how much data we throw at the problem to smooth the rough edges.
[+] sysrestartusr|10 months ago|reply
> and the average songwriter isn't making data-driven decisions on how to settle on the chord structure for their song

aren't decisions like that implicit in the source of learning/inspiration? it's not data-driven on the surface of the writer's awareness, and maybe not data-driven in the statistical sense, but "intuitively", "that which sounds good successively" is based on what one has heard so far within the context of the song ... so it's one hundred percent data-driven, just not data that one has consciously quantified.

IMO: average songwriters, musicians, and producers are at the top precisely because they hit that big fat belly of the bell curve/Gaussian distribution ... I'd say you have it backwards... there's much more experimentation and less data-stuff going on to the left and right of the average

[+] randomNumber7|10 months ago|reply
Why is there currently so much low-quality, low-IQ content on HN that gets upvoted?
[+] memset|10 months ago|reply
The way this analysis and the original dataset were created makes no sense. This is, in part, not the author's fault, since the original data [1, 2] is flawed.

First, the original data was constructed like this: "...The next step was to format the raw HTML files into the full chord progression of each song, collapsing repeating identical chords into a single chord (’A G G A’ became ’A G A’)..."

Already this makes no sense - the fact that a chord is repeated isn't some sort of typo (though maybe it is on UltimateGuitar). For example, a blues might have a progression C7 F7 C7 C7 - the fact that C7 is repeated is part of the blues form. See song 225 from the dataset, which is a blues:

A7 D7 A7 D7 A7 E7 D7 A7

Should really be:

A7 D7 A7 A7 D7 D7 A7 A7 E7 D7 A7 A7

With these omissions, it's a lot harder to understand the underlying harmony of these songs.
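The preprocessing quoted above is essentially a run-length collapse. A sketch (assuming, hypothetically, one chord token per bar, which UG chord sheets do not guarantee; `collapse_repeats` and `run_lengths` are illustrative names, not the dataset's code) shows that it reproduces song 225's collapsed sequence exactly, and that keeping the run lengths would have preserved the form:

```python
from itertools import groupby


def collapse_repeats(chords):
    """Merge runs of identical consecutive chords ('A G G A' -> 'A G A')."""
    return [chord for chord, _ in groupby(chords)]


def run_lengths(chords):
    """Like collapse_repeats, but keep how many times each chord was repeated."""
    return [(chord, len(list(run))) for chord, run in groupby(chords)]


# Song 225's 12-bar form as written out above, assuming one chord per bar.
full_blues = "A7 D7 A7 A7 D7 D7 A7 A7 E7 D7 A7 A7".split()
print(" ".join(collapse_repeats(full_blues)))  # A7 D7 A7 D7 A7 E7 D7 A7
print(run_lengths(full_blues)[:3])             # [('A7', 1), ('D7', 1), ('A7', 2)]
```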

The second problem is that we don't really analyze songs so much by the chords themselves, but the relationships between chords. A next step would be to convert each song from chords to roman numerals so we can understand common patterns of how songs are constructed. Maybe a weekend project.

[1] https://arxiv.org/pdf/2410.22046
[2] https://huggingface.co/datasets/ailsntua/Chordonomicon/blob/...

[+] zenogantner|10 months ago|reply
The problem with collapsed repeated chords doesn't come only from the data processing: most Ultimate Guitar songs are written down entirely ignoring how often a chord is repeated. The classic "lyrics plus chords" format is incomplete and requires the player to somewhat know the structure of the song anyway. The write-up usually just hints at where, relative to the lyrics, the chord changes.
[+] b800h|10 months ago|reply
I agree with you to some extent, but I'm also alive to the problem of how you achieve what you're talking about when chords can change at any point in a bar.
[+] volemo|10 months ago|reply
Could you explain the Roman numerals part?
[+] vthommeret|10 months ago|reply
If you're interested in more relative chord progression analysis, check out Hooktheory (I'm not affiliated, but I love their two books/apps):

https://www.hooktheory.com/theorytab/index

It's "just" 32K songs, but you can see the top chord progressions:

https://www.hooktheory.com/theorytab/common-chord-progressio...

And see which songs follow any chord progression you choose (either absolute or relative chords):

https://www.hooktheory.com/trends

[+] ronyeh|10 months ago|reply
I’m a huge fan of Hooktheory, and have bought all their books and products. Thumbs up!
[+] ben7799|10 months ago|reply
As others have said this is interesting but use of Ultimate Guitar is flawed as the tabs/scores are so bad on that site, very often not even being close to the real chords.

On top of being simplified, tons and tons of songs get rewritten with a capo so people can just play G-C-D shapes. If your analysis doesn't look for "Capo" and then transpose all the chords, you end up overrepresenting the key of G and its chords. Very often 7th chords, sus chords, etc. also get transcribed down to major and minor chords due to the beginner focus. Interestingly, he doesn't include 6th chords as their own category.

To be fair there are tons of songs that do actually use those chords, so they may still end up coming out as the most popular.
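A sketch of the capo correction, assuming the capo fret has already been parsed from the tab's header (`apply_capo` is a hypothetical helper; slash-chord bass notes are ignored and output spellings are fixed rather than key-aware). The sounding chord is the shape's root raised one semitone per fret, and a negative value covers transposing a whole song down, e.g. a half step:

```python
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}
PC_TO_NOTE = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def split_chord(symbol):
    """Split a chord symbol such as 'Bbm7' into root ('Bb') and suffix ('m7')."""
    root_len = 2 if len(symbol) > 1 and symbol[1] in "#b" else 1
    return symbol[:root_len], symbol[root_len:]


def apply_capo(symbol, capo_fret):
    """Sounding chord for a shape played with a capo on capo_fret (negative = transpose down)."""
    root, suffix = split_chord(symbol)
    sounding = (NOTE_TO_PC[root] + capo_fret) % 12
    return PC_TO_NOTE[sounding] + suffix


shapes = ["G", "C", "D", "Em"]  # the classic open-position shapes
print([apply_capo(c, 3) for c in shapes])  # ['Bb', 'Eb', 'F', 'Gm']
print(apply_capo("E", -1))                 # Eb: a song played down a half step
```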

I have a grandfathered-in lifetime membership to UG that I only had to pay for once. It was cheap, so worth it, but I really find the site kind of icky, as they are mostly monetizing crowdsourced, low-quality work and it's very often wrong. And they nerfed their iPad app recently, which is really annoying.

[+] strunz|10 months ago|reply
The fact that the data showed only 6% of Metal songs having power chords should've told him to throw the data out the window. UG has terrible tabs/charts.
[+] YZF|10 months ago|reply
It's really not that bad. It's a mix. There are also many versions for most songs and often comments with corrections.

Learning songs by ear is probably a useful skill that people don't develop because of all the other sources of information... but this probably helps more people play, which is good.

[+] dyauspitr|10 months ago|reply
They usually have quite a few versions, but the most accurate one is typically the one with far and away the most positive reviews/upvotes.
[+] kjkjadksj|10 months ago|reply
The Power Tab and Guitar Pro tabs are a big step up over the text-based stuff on UG. You can play them back as MIDI and see they are usually perfect.
[+] cjohnson318|10 months ago|reply
Listing the "most frequent chord" is a weird analysis. I'm more interested in the "most frequent key", or a transition matrix from one key to another, e.g., if I'm in F, what's the chance I go a fifth up to C, or a fourth up to Bb? Just telling me G is a popular chord doesn't do much.
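Such a transition matrix can be estimated from bigram counts. A minimal sketch, assuming each song has already been reduced to a sequence of labels (keys or chords); the sequences below are made-up toy data, not from the dataset:

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next label | current label) from consecutive pairs in each sequence."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs


# Made-up toy sequences of keys, for illustration only.
toy_songs = [["F", "C", "F"], ["F", "Bb", "F"], ["F", "C", "G"]]
probs = transition_probabilities(toy_songs)
print(probs["F"])  # C about 0.667, Bb about 0.333
```

Each row sums to 1, so `probs["F"]["C"]` reads directly as "the chance I go from F up a fifth to C" in this toy corpus.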
[+] alexjplant|10 months ago|reply
Interesting analysis. Some observations:

- Ultimate Guitar isn't exactly known for the sterling quality of its transcriptions. Teenage me submitted at least a few tabs that were clearly incorrect but still got 4- and 5-star ratings. Amateur guitarists are also infamously bad at figuring out voicings and extensions, so something like a 9 might end up as a maj7 or just a triad. Adult me checks Songsterr first, then uses his ear to figure out what's _really_ going on when he runs across incorrect parts in the tablature.

- Some genres of music like downtuned metal are largely monophonic and instead rely on quick melodic movement or drone-y background guitars to imply harmony. This data set doesn't seem to account for this.

- There's no way that power chords only account for single-digit percentages of chords in rock, metal, and punk. There are albums that have been certified Platinum that are 90% power chords (technically power intervals, I suppose).

[+] hirvi74|10 months ago|reply
I find the analysis interesting as a hobby project, but I'd be careful extrapolating too much from it. 680k is quite the sample size, but my issue lies with the myopic selection of one instrument and the issues that arise from the platform of Ultimate Guitar.

1. I am curious: how many of the 680k songs are unique? It is rather uncommon for massively successful songs to have only one version of tabs out in the wild, so I am curious how many individual songs were counted multiple times.

2. This analysis only looks at guitar tabs or instrumentations that were transcribed for guitar. Chords can be made with more than one instrument, so that missing 7th could actually be played by another instrument not included in the tabs.

3. As music progressed from the pre-jazz era to modern times, it became more common for people to play an instrument, like piano or guitar, while singing at the same time. Obviously there are exceptions to everything, but guitar parts are often simplified for practical reasons when the guitarist is also singing.

4. Music has also become more accessible as time progressed. It would be hard for an average person to learn the organ or hurdy-gurdy without access to one. It's much easier to acquire and learn piano when it can be a 4-inch-thick plastic keyboard on a stand.

5. People tend to have a warped concept of the history of music. Pachelbel's Canon in D is by no means a complex song and has stood the test of time. Music throughout time has also served different purposes. Hell, go back to Ancient Greece, Gregorian chants, and Medieval music. Those various time periods were not generally full of complexity either. I would argue such times were generally less complex than modern music.

[+] iambateman|10 months ago|reply
I think Ultimate Guitar has a lot to do with this.

Sure, G is probably the most popular chord, but there are a _lot_ of chord sheets that are wrong or incomplete. If someone were to play many of these songs as charted on UG it would sound unrecognizable.

Kind of invalidates the analysis IMHO

[+] dehrmann|10 months ago|reply
> Pachelbel's Canon in D is by no means a complex song and has stood the test of time

It was actually mostly forgotten until the 1960s.

https://en.wikipedia.org/wiki/Pachelbel%27s_Canon#Rediscover...

Can anyone find a version without Paillard's changes? Knowing the history, I suspect they have more to do with the song's popularity than the original composition.

[+] alexjplant|10 months ago|reply
> People tend to have a warped concept of the history of music. Pachelbel's Canon in D is by no means a complex song and has stood the test of time. Music through out time has also served different purposes. Hell, go back to Ancient Greece, Gregorian chants, and Medieval music. Those various time periods were not generally fully of complexity either. I would argue such times were generally less complex than modern music.

True facts. The fifties and sixties were replete with simple, disposable pop music. "Yummy Yummy Yummy" topped the charts in the late 60s and has, what, three chords in it? What about "Sugar, Sugar" or the Monkees? Staff songwriters and session cats cranked this stuff out by the ton back in the day but people still love to take potshots at modern pop music for being inferior to the oldies in this regard.

[+] otabdeveloper4|10 months ago|reply
> Music has also become more accessible as time progressed.

Hell no. Before recorded music, literally everyone was a musician in one way or another. Music was an activity you did while bored. (Today music is not an activity, it's a product to consume.)

They had simple woodwinds and percussive instruments. People weren't playing the church organ while waiting for the cows to come home.

[+] divbzero|10 months ago|reply
Isn’t OP analyzing frequencies of individual chords, not chord progressions?

Analyzing individual chords involves counting the frequency of each chord (such as G, C, or D).

Analyzing chord progressions would involve counting the frequency of chord pairs (such as D—A or C—G), chord triplets (such as D—A—Bm or C—G—Am), or longer sequences of chords. For an alternative look at the data, you could also normalize chord progressions across key signatures for your analysis (D—A or C—G would both normalize as I—V, D—A—Bm or C—G—Am would both normalize as I—V—vi).
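A sketch of both ideas together, restricted for simplicity to plain major/minor triads whose roots lie in a major key (anything else raises `KeyError`); `numeral` and `ngrams` are illustrative names:

```python
from collections import Counter

NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}
DEGREES = {0: "I", 2: "II", 4: "III", 5: "IV", 7: "V", 9: "VI", 11: "VII"}


def numeral(chord, key):
    """Roman numeral of a plain major ('A') or minor ('Bm') triad in a major key."""
    minor = chord.endswith("m")
    root = chord[:-1] if minor else chord
    name = DEGREES[(NOTE_TO_PC[root] - NOTE_TO_PC[key]) % 12]
    return name.lower() if minor else name


def ngrams(seq, n):
    """All runs of n consecutive items, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


# The two example progressions from the comment above, each with its key.
songs = [("D", ["D", "A", "Bm", "G"]), ("C", ["C", "G", "Am", "F"])]
trigram_counts = Counter()
for key, chords in songs:
    trigram_counts.update(ngrams([numeral(c, key) for c in chords], 3))
print(trigram_counts[("I", "V", "vi")])  # 2: both songs share I-V-vi once normalized
```

Without the normalization step, D-A-Bm and C-G-Am would be counted as two unrelated trigrams with one occurrence each.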

[+] jancsika|10 months ago|reply
> An “interval” is a combination of two notes.

Minor nitpick: it's a "dyad" that is a combination of two notes.

An "interval" is the difference between two (or more) pitches. And just as you'd measure the space between your eyebrows using a ruler, you'd measure the interval between middle C and concert A using your ears.

The bonus, however, is that our listening apparatus is already quantized to octaves: if you hear a pitch against a second pitch that's double/quadruple/etc. the frequency of the first, your ear marks this interval as special. Most of you have likely already used this fact to your advantage, perhaps unwittingly, when someone begins singing "Happy Birthday" outside your normal singing range. (Though most renditions of "Happy Birthday" lend credence to Morpheus' lesson from The Matrix that there's a difference between knowing the path and walking it.) :)
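Measured in equal-tempered semitones, the interval between two frequencies is 12 * log2(f2 / f1), so every doubling of frequency is exactly 12 semitones. A small illustration, assuming A4 = 440 Hz equal temperament (middle C is then about 261.63 Hz); the helper names are illustrative:

```python
import math


def semitones_between(f1_hz, f2_hz):
    """Interval in equal-tempered semitones; every doubling of frequency adds 12."""
    return 12 * math.log2(f2_hz / f1_hz)


def same_pitch_class(f1_hz, f2_hz):
    """True when two pitches are a whole number of octaves apart (to the nearest semitone)."""
    return round(semitones_between(f1_hz, f2_hz)) % 12 == 0


middle_c, concert_a = 261.63, 440.0  # assumes A4 = 440 Hz equal temperament
print(round(semitones_between(middle_c, concert_a), 2))  # 9.0, a major sixth
print(same_pitch_class(220.0, 880.0))                    # True: two octaves apart
```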

[+] cole-k|10 months ago|reply
It's an admittedly smaller dataset, but Hook Theory has an analysis that allows you to search by chords (including relative) and look at trends:

https://www.hooktheory.com/theorytab

https://www.hooktheory.com/trends

It's a weird coincidence to see this post since I only occasionally remember about Hook Theory and binge it, but I remembered earlier this week.

Many of you have probably heard the Axis of Awesome four chords song (if not, look it up, it's great), but it's fun doing the same thing with other songs.

Like, did you know that you can sing the chorus of Numb by Linkin Park over the chorus of...

* I Hate Everything About You by Three Days Grace

* Immortals by Fall Out Boy

* Cheap Thrills by Sia (swung Numb lol)

(+ the bridge of The Rock Show by Blink-182)

Numb has a pretty common chord progression so I could pick songs with the exact same chords, but there are also some oddly specific finds like this video game (?) song that inexplicably has the same relative chord progression as Hotel California https://www.hooktheory.com/theorytab/view/zun/reincarnation#...

---

I am often surprised how a seemingly simple chord progression has only one result, even when I search by relative chords and ignore extensions and inversions, e.g. https://www.hooktheory.com/theorytab/chord-search/results?ke...

However, when you put that query into the normal search box, it does match a lot more songs, showing that there is an i III _ VII trend; it's just that i III vi VII is strange (which I guess makes sense). Perhaps my lack of music theory makes it harder to normalize my queries, but it's also possible that (1) there isn't enough data or (2) there is inconsistency in how people annotate the pieces (some songs will have II II II II, for example, following the rhythm, whereas others will have just a single II).
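A sketch of why annotation style matters for this kind of search: a pattern with a `_` wildcard only matches a per-beat annotation (II II II II style) after repeated chords are collapsed. The matching logic and the example song are hypothetical illustrations, not how Hooktheory's search actually works:

```python
from itertools import groupby

WILDCARD = "_"  # matches any single chord


def matches_at(progression, pattern, start):
    """True if pattern lines up with progression beginning at index start."""
    return all(p == WILDCARD or p == progression[start + i]
               for i, p in enumerate(pattern))


def contains_pattern(progression, pattern, collapse=True):
    """Search for pattern anywhere in progression, optionally merging repeated chords first."""
    if collapse:
        progression = [chord for chord, _ in groupby(progression)]
    last_start = len(progression) - len(pattern)
    return any(matches_at(progression, pattern, s) for s in range(last_start + 1))


# Hypothetical per-beat annotation of a minor-key loop.
song = ["i", "i", "III", "III", "VI", "VI", "VII", "VII"]
print(contains_pattern(song, ["i", "III", "_", "VII"]))                  # True
print(contains_pattern(song, ["i", "III", "_", "VII"], collapse=False))  # False
print(contains_pattern(song, ["i", "III", "vi", "VII"]))                 # False
```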

[+] parpfish|10 months ago|reply
Hook theory: It doesn't matter what I say, so long as I sing with inflection
[+] throwaway0665|10 months ago|reply
Does this take into account capo position? A G is easy to play, so authors might use a G shape with a capo to play a Bb, for example, to avoid barre chords. Likewise, authors will choose simpler chord substitutes to make songs easier to play.

It's the same with lead sheets / Real Book-style music books. Performing musicians need to reproduce music quickly, so only the triad will be written down, even if the musician ends up playing some other extensions.

The data is heavily biased towards simplicity. You can make conclusions about the data - but not music as a whole.
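A sketch of the kind of simplification being described, assuming compact chord symbols. `reduce_to_triad` is hypothetical and deliberately lossy (sus and power chords are flattened to major triads too), which is exactly why counts built on such data undercount extensions:

```python
def split_chord(symbol):
    """Split a chord symbol such as 'Bbm7' into root ('Bb') and suffix ('m7')."""
    root_len = 2 if len(symbol) > 1 and symbol[1] in "#b" else 1
    return symbol[:root_len], symbol[root_len:]


def reduce_to_triad(symbol):
    """Collapse a chord symbol to a bare triad, discarding 7ths, 6ths and extensions."""
    root, suffix = split_chord(symbol)
    if suffix.startswith("maj"):              # Cmaj7, Cmaj9 -> C
        return root
    if suffix.startswith(("m7b5", "dim")):    # half-diminished / diminished -> dim
        return root + "dim"
    if suffix.startswith("m"):                # Dm7, Dm9 -> Dm
        return root + "m"
    if suffix.startswith(("aug", "+")):
        return root + "aug"
    return root  # 6, 7, 9, 13, sus and 5 chords all become a plain major triad


written = ["Cmaj7", "C6", "C7", "Dm9", "Bm7b5", "Gsus4"]
print([reduce_to_triad(c) for c in written])  # ['C', 'C', 'C', 'Dm', 'Bdim', 'G']
```

Three harmonically distinct C chords end up as the same "C", so a corpus built this way cannot say much about whether 7ths are disappearing.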

[+] mastazi|10 months ago|reply
I'm surprised that according to the article, in jazz, some chords like D and A, which are mostly found in sharp keys, are more common than chords like Bb and Eb, which are usually found in flat keys.

I remember once creating a dataset based on 50 random tunes from the Real Book and sharp keys were less than 20% of the total (based on the key signature at the start of the score) so that graph in the article doesn't seem right.

Maybe the discrepancy is because modern jazz fusion tunes are underrepresented in the Real Book, and those are usually more guitar-oriented, so it's perhaps more likely that the musician would pick a sharp key like D or A. As opposed to straight-ahead jazz, where people try to accommodate sax/trumpet/trombone etc.

Or maybe it's because chords like D or A can be dominants in minor keys that are flat keys, e.g. D in the key of Gmin or A in the key of Dmin. - EDIT I just realised that dominants are listed separately so this is not the case.
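Counting sharp versus flat keys comes down to mapping each key to its signature on the circle of fifths. A sketch, assuming keys are written as "Bb" (major) or "Gm" (minor); the helper names and the sample list are illustrative, not the commenter's actual Real Book data:

```python
# Number of sharps (+) or flats (-) in each major key signature.
MAJOR_KEY_SIGNATURES = {
    "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5, "F#": 6, "C#": 7,
    "F": -1, "Bb": -2, "Eb": -3, "Ab": -4, "Db": -5, "Gb": -6, "Cb": -7,
}

# Each minor key shares its signature with its relative major (a minor third up).
RELATIVE_MAJOR = {
    "A": "C", "E": "G", "B": "D", "F#": "A", "C#": "E", "G#": "B", "D#": "F#", "A#": "C#",
    "D": "F", "G": "Bb", "C": "Eb", "F": "Ab", "Bb": "Db", "Eb": "Gb", "Ab": "Cb",
}


def accidentals(key):
    """Signed accidental count for a key written like 'Bb' (major) or 'Gm' (minor)."""
    if key.endswith("m"):
        return MAJOR_KEY_SIGNATURES[RELATIVE_MAJOR[key[:-1]]]
    return MAJOR_KEY_SIGNATURES[key]


def sharp_key_share(keys):
    """Fraction of keys whose signature contains sharps."""
    return sum(1 for k in keys if accidentals(k) > 0) / len(keys)


sample = ["Bb", "F", "Eb", "Cm", "D"]  # illustrative list only
print(sharp_key_share(sample))  # 0.2
```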

One more thing: according to the article, major triads make up more than 50% of chords in jazz... what? That's certainly wrong, most major chords in jazz are usually maj7th or 6th even when they don't have upper extensions. I think that what they actually meant is "major chords that are not dominants". But they used the label "major triad" instead.

[+] duped|10 months ago|reply
I think large scale automated harmonic analysis is a worthy endeavor for the purposes of musicological research that could even be applied to pedagogy (deceptively hard problem: identify what piece(s) of music to teach to achieve specific goals for students).

But you really need good (and preferably ethical) sources of data to do that, and UltimateGuitar ain't it. You also probably want to engage with some music theorists to normalize the data to give you better analysis and ask better questions than "what is the most common chord."

From this analysis I don't think "is music getting simpler" can be answered, and I think the trends are interesting questions to investigate for musicologists but this data set and analysis are too flawed to answer them.

[+] anigbrowl|10 months ago|reply
One thing that jumped out at me was the data point suggesting there are very few power chords in electronic music. In fact, they're ubiquitous, because it's easy to make a power chord with a single note by tuning oscillators a 5th apart. Any synth with 2 or more oscillators comes with a bunch of 5th patches (or patch sheets if it's all analog). It's one of the first synthesis techniques people learn to make thick-sounding patches.
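The arithmetic behind a fifths patch, as a sketch: a second oscillator a perfect fifth up runs at 3/2 the frequency (just intonation) or 2^(7/12), about 1.4983, times it (seven equal-tempered semitones). Which one a given synth produces depends on how it is tuned; the function names are illustrative:

```python
import math


def fifth_above(freq_hz, just=False):
    """Frequency a perfect fifth above freq_hz (3:2 if just, else 7 equal-tempered semitones)."""
    return freq_hz * (1.5 if just else 2 ** (7 / 12))


def cents(ratio):
    """Size of a frequency ratio in cents (1200 per octave)."""
    return 1200 * math.log2(ratio)


root = 110.0  # A2
print(round(fifth_above(root), 2))     # 164.81 Hz, an equal-tempered E3
print(fifth_above(root, just=True))    # 165.0 Hz
print(round(cents(1.5) - cents(2 ** (7 / 12)), 2))  # about 1.96: the just fifth is slightly wider
```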

Also the whole idea of doing the analysis based on absolute rather than relative notes makes little sense to me as a musician, though perhaps that's because I didn't start with guitar or a tuned instrument like a trumpet.

[+] notfed|10 months ago|reply
This seems to be an analysis of chords used, not chord progressions?
[+] teleforce|10 months ago|reply
Fun fact: you can use the circle of fifths as a reference or cheat sheet for good chord progressions [1]:

"Chord progressions also often move between chords whose roots are related by perfect fifth, making the circle of fifths useful in illustrating the "harmonic distance" between chords."

It'll be very interesting to analyse the available song data to find progressions that follow the circle of fifths.
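One way to start on that analysis, as a hedged sketch: place the twelve pitch classes in circle-of-fifths order and measure how often consecutive chord roots are neighbours on the circle. This assumes chords have already been reduced to their roots; the helper names are hypothetical:

```python
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}

# Walking up by perfect fifths (7 semitones) visits all 12 pitch classes:
# C, G, D, A, E, B, F#, C#, G#, D#, A#, F.
CIRCLE = [(7 * i) % 12 for i in range(12)]
POSITION = {pc: i for i, pc in enumerate(CIRCLE)}


def fifths_distance(root_a, root_b):
    """Steps around the circle of fifths between two roots (0 to 6)."""
    d = abs(POSITION[NOTE_TO_PC[root_a]] - POSITION[NOTE_TO_PC[root_b]])
    return min(d, 12 - d)


def fifth_motion_share(roots):
    """Fraction of consecutive root movements that go up or down a fifth."""
    moves = list(zip(roots, roots[1:]))
    if not moves:
        return 0.0
    return sum(1 for a, b in moves if fifths_distance(a, b) == 1) / len(moves)


print(fifth_motion_share(["D", "G", "C"]))       # ii-V-I in C: 1.0
print(fifth_motion_share(["C", "G", "A", "F"]))  # I-V-vi-IV roots: one move in three
```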

By cross-referencing patterns with the circle of fifths, we might just end up with the LLM equivalent of a data-driven composer that's capable of generating harmonically pleasing, genre-aware, even hit-song chord progressions.

[1] Circle of fifths:

https://en.wikipedia.org/wiki/Circle_of_fifths

[+] zzo38computer|10 months ago|reply
I do not see the mention of what chord progressions are used. They did mention what chords (according to only the notes, not according to the key) are common, though.

I would expect that a full analysis should write the roman numerals (so you will have to know what key it is in as well), and might also consider such things as non-chord tones, modulation to other keys, etc. (However, this is not as simple as just putting them into the computer and writing a SQL query or whatever.)

What I had seen on television and what I had read is that the I-V-vi-IV chord progression is common in modern music. (There is also i-VI-III-VII, which is the same progression viewed from the relative minor key; that is obvious once you realize it.)
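The relationship is easy to check by spelling both progressions out. A sketch, assuming natural-minor scale degrees for the minor key and uppercase/lowercase numerals for major/minor triads (diminished chords are not handled; `realize` is an illustrative name):

```python
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}
PC_TO_NOTE = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Semitones above the tonic for each scale degree (natural minor for minor keys).
MAJOR_STEPS = {"I": 0, "II": 2, "III": 4, "IV": 5, "V": 7, "VI": 9, "VII": 11}
MINOR_STEPS = {"I": 0, "II": 2, "III": 3, "IV": 5, "V": 7, "VI": 8, "VII": 10}


def realize(numerals, tonic, minor=False):
    """Turn roman numerals into chord names; lowercase numerals become minor triads."""
    steps = MINOR_STEPS if minor else MAJOR_STEPS
    chords = []
    for numeral in numerals:
        pc = (NOTE_TO_PC[tonic] + steps[numeral.upper()]) % 12
        chords.append(PC_TO_NOTE[pc] + ("m" if numeral.islower() else ""))
    return chords


pop = realize(["I", "V", "vi", "IV"], "C")                      # ['C', 'G', 'Am', 'F']
relative = realize(["i", "VI", "III", "VII"], "A", minor=True)  # ['Am', 'F', 'C', 'G']
print(relative == pop[2:] + pop[:2])  # True: same loop, started on the vi chord
```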

[+] notquitethere|10 months ago|reply
Notwithstanding how these are read (e.g. C can be Cmaj7 or another substitution that is understood by the musician but not always written or transcribed accurately), this analysis is akin to counting the number of times each number from 1 through 12 occurs in a financial statement across all annual reports. Or taking all the fiction books in the library and counting the frequencies of characters translated to ASCII codes. These are chords, not chord progressions. Progressions would be i-iv-v, ii-v-i, and the prevalence of these. An interesting start, but the main meat of this analysis is unexplored.