I recreated Shazam’s algorithm with Go

pjs_|1 year ago

Shazam's technology came in part out of CCRMA, which is a very cool and special place on the Stanford campus, with deep connections to early computer history.

I think it is very interesting that so many of the early applications of computer technology have to do with audio. John Bardeen's music box, the transistor's first commercial application in hearing aids, the HP garage in Palo Alto originally building audio oscillators, the iPhone evolving from the iPod, the internet built on copper laid to carry analog telephone calls, Bell Labs (ping!): the list goes on.

A friend of mine has the hypothesis that maybe human beings end up figuring out how to do kHz stuff before they go on to do MHz/GHz stuff. Not a perfect explanation but kind of attractive...

crngefest|1 year ago

IMHO it’s because audio is “easy” to manipulate electronically.

You can transform any audio signal into an electronic signal relatively easily; for graphics, there is so much more complexity involved just in making them visible.

A speaker that translates electronic signal into sound waves is a super simple contraption at its core.

Edit: Also, audio is striking: it has a profound effect on every human (except the deaf, of course). If I wanted to demonstrate the power of electronics/computers, I would choose audio as well.

crowdstriker|1 year ago

You're reaching.

halfmatthalfcat|1 year ago

FYI - If this is a true reproduction of Shazam, it’s under patent by Apple through at least March 2025[1].

[1] https://patents.google.com/patent/US7627477

Someone|1 year ago

“An Industrial-Strength Audio Search Algorithm”, the paper where Shazam describes their algorithm (https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf) doesn’t have a clear publication date, but https://www.researchgate.net/publication/220723446_An_Indust... indicates it is from 2003.

That patent was filed in the US on 2004-10-21.

IANAL, but to me, that’s a point against that patent in the USA.

Thaxll|1 year ago

I remember a popular HN post from 10 years ago that was pulled (or whose source was pulled) because Shazam threatened legal action over the disclosure of the algorithm. I think it's actually the Google Drive PDF capture from OP's article.

csmpltn|1 year ago

It's:

(1) deriving a simple fingerprint from the FFT of the audio signal

(2) simple indexing

(3) simple similarity search

You need the signatures of all music on earth for this to work though ;)

maeln|1 year ago

So not enforceable anywhere else than in the U.S.

acedTrex|1 year ago

So you are saying to clone the repo now

amelius|1 year ago

Figure 1 looks interesting since it has both a time and frequency axis, when usually signals have either a time __or__ frequency axis.

Now I'm curious how the Fourier (?) transform of a signal at a __single__ given timepoint is even defined ...

throwaway2562|1 year ago

Soon!

mathfailure|1 year ago

[deleted]

shoggouth|1 year ago

I was under the impression that patents aren’t enforceable on open source projects. I assume I am wrong?

VierScar|1 year ago

I dunno how software patents work, but I was under the impression that unless you basically copy-paste their code, the courts wouldn't consider it patent infringement, since you can't patent the function, only the specific thing itself, which for software is the exact code. But if I'm misunderstanding something, please correct me.

yazmeya|1 year ago

I enjoyed this talk at the DAFx17 conference by Avery Wang, co-founder of Shazam. It goes a little into the theory behind the algorithm, and looks at some of the more practical issues (background noise, etc.): https://www.youtube.com/watch?v=YVTnj3OIhwI

DevX101|1 year ago

Adding this to the watch list. Reading this paper was one of the first times I had a 'wow' moment about computing algorithms.

vegabook|1 year ago

Recently I've found Shazam less accurate; somehow SoundHound is giving me better results. On Shazam I'm getting a lot of results from Asian musical traditions, which would be great if they weren't the wrong song. Maybe they need to improve the algo now that they've increased the range of music they select from? It seems there are a lot more hash table collisions now[1].

  [1] https://github.com/cgzirim/not-shazam?tab=readme-ov-file#resources--card_file_box

cglan|1 year ago

SoundHound has always been better than Shazam. It can even pick up people singing and extremely quiet songs.

lucgommans|1 year ago

I compared Shazam's, SoundHound's, and BeatFind's recognition libraries in August 2021 (I also tried MusixMatch, but it apparently crashed on startup). I don't think I published it anywhere; these are my raw notes, found among saved chat messages. The format makes a lot more sense now that I'm putting it into a wider window than a chat screen, so I recommend using a wide browser (94 characters per line should do it). Eyeing the song choices, it looks like I tried to cover different genres and artist types (ccMixter/YouTube celebrities, to indie, to established), but a larger sample size would obviously have been better. Still, I hope it's one step up from adding another random opinion!

The conclusion appears to be that BeatFind and Shazam know the most songs, but are also somewhat complementary and all of the services had at least one song they uniquely recognised.

---

    Fun facts:
     * Night Driver (W) said "1 Shazams". I think I was the first person to ever Shazam that. Some of the most obscure things had hundreds, often thousands of shazams!
     * You know where they are taking the hobbits but none of the services do!

    ========

    - ABC = found the song
    - # = number of attempts
    - f = exceptionally fast matching (when it did match, might not be first attempt)
    - ~ = knew one of the songs

    BeatFind:     2B  C     1E    2G 2H  1I  2Jf 2Lf 4M 2Nf 2Of 1Pf 1Rf ~S 2T  1Uf 1Vf 1W 1Xf Y Z
    SoundHound: A 1B        1E 2F 1G 2H  2If     1L         2O  3P  2R     1Tf 2U
    Shazam:       1Bf    1D 1E 1F 1G 1Hf 1I  2J  1L  1M 4N  1O  1Pf 1Rf    1Tf 1Uf 1V  2W 1X  Y Z
    MusixMatch: crashes on startup, presumably it realizes it won't be able to show me ads

    missingno
    Shazam:     A C K Q S
    SoundHound: C D F J K M/N Q S V W X Y Z
    BeatFind:   A D F K Q

    non-universal finds (repeated letter = unique = counts double; slash means same artist so should be counted as one)
    Shazam:     DD F J M/N V X Y Z
    SoundHound: AA F
    BeatFind:   CC J M/N S V X Y Z

    A: Levan Polkka Epic Orchestral Cover version
    B: Pokémon red/blue soundtrack
    C: Mayhem (various songs, it seems either they have all or they have none)
    D: Art Now ft. Snowflake
    E: Hero's Choice
    F: Three Days Grace - Scared
    G: Syrian - Supernova
    H: The Explosion - Here I Am
    I: The Von Bondies - C'mon C'mon
    J: Frank Klepacki - Scouting (C&C TibSun)
    K: Conspiracy - Chaos Theory (demoscene)
    L: Cheshyre - Madness6 (remix) (Newgrounds ID 77998)
    M: Dimrain47 - Twilight Techno
    N: Dimrain47 - Cloud Control
    O: DragonForce - My Spirit Will Go On
    P: Yuki Kajiura - The First Town (SAO)
    Q: THEY'RE TAKING THE HOBBITS TO ISENGARD! THE HOBBITS- THE HOBBITS- TO ISENGARD! TO ISENGARD!
    R: Faithless - Insomnia
    S: Age of Empires 1 soundtrack
    T: Moulin Rouge - El Tango De Roxanne
    U: Van Canto - Master of Puppets
    V: Slack Bird - Jouni
    W: Floppytronic - Night Driver
    X: EgoSalad / Kitboga - Breathe in
    Y: Floppy Drive music: top 4 hits on yt: sweet dreams, imperial march, ghostbusters, beat it. Only ghostbusters was known to any
    Z: Obsidian Shell - Orphanage

---

Note that I did not test introducing noise (like people talking over the music) or filtering (like hearing the music through a wall).

Cieric|1 year ago

While the project does look nice to use and modify, I'm not sure I personally would have posted it yet.

- The instructions seem not to be the best to get it up and running (e.g. "cd not-shazam" and just a few lines later "cd not-shazam/client")

- MongoDB is needed, but information on how to hook it up and use it is absent (I would make the DB swappable and provide something less intrusive like SQLite)

- If replacing MongoDB is not possible, I would provide a Dockerfile and a Docker Compose file to allow easy startup and testing.

- The client npm install has 8 critical vulnerabilities, these might not actually matter but it makes me hesitant to continue testing

- You might not care about the patent or the copyright, but I would still change the name at the very least. GitHub itself is located in the US and will remove the project if it receives a DMCA notice.

- Last, and this might not be as important, I would add a way to add songs from WAV files. Not everything I'd want to test this with is on Spotify or YouTube.

I'm not saying this to discourage you or anything, I just think the project needs that little extra bit of polish. Minor things will cause people to discredit or ignore a project. If I get around to it I might make a PR for the project. I want to experiment with audio matching outside of the music space, and your project seems like it'll be the easiest to modify.

Edit: Formatting
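One way to act on the swappable-DB suggestion is to put storage behind a small Go interface, so MongoDB, SQLite or a plain in-memory map can be plugged in. This is a hypothetical sketch: `FingerprintStore`, `Occurrence` and `memStore` are made-up names, not the project's actual types, and only the in-memory backend is shown.

```go
package main

import (
	"fmt"
	"sync"
)

// Occurrence is a hypothetical record of where a fingerprint hash occurred.
type Occurrence struct {
	SongID   string
	AnchorMS uint32 // anchor time, in milliseconds into the song
}

// FingerprintStore is a hypothetical storage interface; each backend
// (MongoDB, SQLite, in-memory) would implement these three methods.
type FingerprintStore interface {
	Store(hash uint32, o Occurrence) error
	Lookup(hashes []uint32) (map[uint32][]Occurrence, error)
	Close() error
}

// memStore keeps everything in a map: handy for tests and quick demos.
type memStore struct {
	mu   sync.RWMutex
	data map[uint32][]Occurrence
}

func newMemStore() *memStore { return &memStore{data: make(map[uint32][]Occurrence)} }

func (m *memStore) Store(hash uint32, o Occurrence) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[hash] = append(m.data[hash], o)
	return nil
}

// Lookup returns only the hashes that were found, copying each slice so
// callers can't mutate the store's internal state.
func (m *memStore) Lookup(hashes []uint32) (map[uint32][]Occurrence, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make(map[uint32][]Occurrence)
	for _, h := range hashes {
		if os, ok := m.data[h]; ok {
			out[h] = append([]Occurrence(nil), os...)
		}
	}
	return out, nil
}

func (m *memStore) Close() error { return nil }

func main() {
	var db FingerprintStore = newMemStore()
	defer db.Close()
	_ = db.Store(42, Occurrence{"song-1", 1500})
	_ = db.Store(42, Occurrence{"song-2", 300})
	got, _ := db.Lookup([]uint32{42, 7})
	fmt.Println(len(got[42]), len(got[7])) // 2 0
}
```

A SQLite backend could implement the same three methods through `database/sql`, and the matching code would then only ever see `FingerprintStore`, so tests can run with no external services at all.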

ccgzirim|1 year ago

Thank you for the time you took to provide such detailed feedback. I really appreciate your honest input. You've raised some valid points that I hadn't really considered.

I agree that the project could definitely use some polishing. I'll prioritize improving the setup instructions and look into adding a file-based DB for flexibility, as well as resolving the npm vulnerabilities. Adding support for directly fingerprinting wav files is a great idea and something I'll prioritize, too.

Regarding the project name, I understand the potential legal implications and will definitely change it. I'd appreciate any suggestions you might have.

I'm excited about the possibility of your contributions. Please, feel free to open a PR whenever you're ready.

Thanks again for your feedback!

riiii|1 year ago

You post it on HN to get invaluable comments like yours. Good writeup!

unknown|1 year ago

[deleted]

renierbotha|1 year ago

Haven't crawled through the repo (yet) but quick question - where does the data that is being searched over come from? Are you loading a library or searching some large library acquired from somewhere else?

ccgzirim|1 year ago

The data comes from a database of fingerprints connected to the server. These fingerprints are created whenever songs are added.

strongly-typed|1 year ago

This is really cool. I’ve been itching to try building this exact kind of thing as part of my bucket list.

ccgzirim|1 year ago

Thanks. I'm glad it inspires you! It'd be awesome to see you take it on. You can clone it and develop it further.

bravura|1 year ago

It would be quite nice if there were a community-based way of sharing fingerprints.

gradientsrneat|1 year ago

I'd love to see this too, not just for audio but also for picture and video clips.

IIRC BitTorrent uses a DHT, but the hashes are of the entire content. That's not so useful for, say, finding the original version of a poorly attributed derivative work.

TinEye is sometimes good for finding the original version of an image.

johnneville|1 year ago

I think musicbrainz supports this https://musicbrainz.org/doc/AcoustID

KomoD|1 year ago

If you insert Spotify songs, wouldn't it make more sense to output Spotify songs too?

ccgzirim|1 year ago

It would actually. But Spotify doesn't allow direct downloads so I had to find the songs on YouTube and download them from there.

blackeyeblitzar|1 year ago

I’ve heard that Google phones have a built-in music recognition feature that is the best implementation of this stuff. Does anyone know what their approach was? Apart from that, I’ve always felt SoundHound was better than Shazam.

lucb1e|1 year ago

IIRC there was some small algorithm, or perhaps even a piece of hardware, that would trigger when music is playing so that the phone isn't active all the time. From there, I guess they could use any old detection algorithm; for me, the magic was in this super-energy-efficient part of the chain, though I never read up on the details (if they ever provided any).

jokoon|1 year ago

This is useless unless you have all the songs on earth

Algorithms don't matter, only data matters.

nwsm|1 year ago

Here we have an open-source algorithm that is useful to anyone with data. It doesn't have to be music

0cf8612b2e1e|1 year ago

Although, would be curious how good you could get to isolating to a single artist. If you had say one exemplar fingerprint per artist, could an out of dataset fingerprint from their discography cluster to that artist? Obviously not for artists who transitioned musical styles.

Or is the algorithm more feature hash than a clusterable feature vector?

lucb1e|1 year ago

That's like saying the Hutter prize is useless for anyone who doesn't want highly compressed versions of Wikipedia. The underlying code or algorithm is still interesting to study, use, and remix.

38|1 year ago

[deleted]

subpub47|1 year ago

[deleted]

ascorbic|1 year ago

This is cool, but you urgently need to change the name.

ccgzirim|1 year ago

A. SeekSound vs B. SoundSeek vs C. SoundScout

Which would be a better alternative?

montag|1 year ago

Why does OP urgently need to change the name?

dmichulke|1 year ago

Notch of Sam?

Not jus' AM?

DandyDev|1 year ago

Isn’t the whole point of Shazam that you don’t know the song and want to find it? If you don’t know the song, how can you provide a Spotify link?

zild3d|1 year ago

This is a demo of the algorithm, not a full app or hosted service with a pre-populated database. The Spotify link is for fingerprinting a song and adding it to the database.

paxys|1 year ago

The idea is that you add every Spotify song in the database, and then run your match against them.

unknown|1 year ago

[deleted]

scoot|1 year ago

Shazam is historically interesting, but Google's "hum to search" algorithm is far superior, and even that has been in production for nearly four years.

anticristi|1 year ago

I wonder how long until someone will simply smoosh a billion songs into a "large song model" and make all signal processing knowledge irrelevant.

immibis|1 year ago

You mean Suno?

subpub47|1 year ago

[deleted]

subpub47|1 year ago

[deleted]

johnneville|1 year ago

I'd love a way to use local files instead of spotify/youtube to create the set of fingerprints that is searched.

ccgzirim|1 year ago

I've added that to my to-do list and plan to implement it this weekend.

unknown|1 year ago

[deleted]

euroderf|1 year ago

Run it as a daemon that displays every song in a UI notification?

theabhinavdas|1 year ago

You deserve reddit gold for this idea

hactually|1 year ago

really decent and nicely done Golang! I'll pull and play with it tomorrow!

ccgzirim|1 year ago

Thanks! I appreciate the compliment on my Golang; this is actually my first full-fledged project in the language, haha. Feel free to reach out if you have any issues running it.

Philip-J-Fry|1 year ago

I think you've leaked your developer key here... https://github.com/cgzirim/not-shazam/blob/main/spotify/yout...

zadokshi|1 year ago

Does this mean he could accidentally get a $1 million credit card bill from google from someone using his key without his permission? (I don’t know how it works with google.)

ccgzirim|1 year ago

Oops... Thank you. I've disabled it.

unknown|1 year ago

[deleted]

jkdmyrs|1 year ago

[deleted]

jena2244|1 year ago

[deleted]

wmichelin|1 year ago

Hardcoded sleeps for some reason, nice /s

https://github.com/cgzirim/not-shazam/blob/888070f3434acbc0a...

rvnx|1 year ago

It's to avoid getting the IP/account banned by Spotify and to go easier on their servers: you have to wait between two requests when downloading songs.

lucb1e|1 year ago

I also use sleep a lot in my code when interfacing with third-party services (multiprocess usually so it's not blocking things, though I'd also totally see myself using a callback pattern or so if the caller can handle those). When it's more than an ad-hoc piece of code, it generally measures how long ago the previous request finished to determine how long to sleep if the next call is made within the cool-off period. If you're not doing that... please don't interface with my server

msie|1 year ago

I enjoyed reading the Go source. As opposed to the time I had to read some Ruby code.

117 comments