Echoprint: Open-source music fingerprinting

[+] a3_nm|14 years ago|reply

I'm a bit confused by the database license. Why do you want people to contribute additional data specifically back to you, rather than requiring them to release it under a compatible license which would allow you to incorporate it if you wish? This is substantially different from usual copyleft licenses.

As an example, notice that Wikipedia does not require people to send modified articles back to the Wikimedia Foundation, or to allow them to use the data as they see fit (this is clause D. 3. d. in your license). They just require people to contribute under a copyleft license, and they can thus incorporate derivative versions published elsewhere if they want. This is nice because it ensures that the Wikipedia content can be useful even if the Wikimedia Foundation disappears.

Anyway, awesome work, congrats!

[+] jws|14 years ago|reply

Why do you want people to contribute additional data specifically back to you…

One possibility is so that the database can be created on the backs of the users, then the database owner can slam the door and turn it proprietary, like the CDDB did.

Their stated goals don't lean that way, but I don't see a clause that either permits or forbids the database owner this action. Lawyers will be required to figure that out. I suspect there is an implicit ability for the grantor to terminate the license, but that is what lawyers are for.

(I did notice that the termination effects reference clauses that don't exist.)

[+] chl|14 years ago|reply

Given how threat-, or, let's say, notification-happy Landmark was just a while ago [1], does anyone have an idea regarding the patent situation? Is this implementation different enough to be considered (reasonably) "safe"?

[1] http://www.redcode.nl/blog/2010/07/patent-infringement/

[+] brianwhitman|14 years ago|reply

We invented everything about Echoprint from scratch, working with some awesome scientists and audio guys. I'm not a lawyer and won't comment on legal stuff here though.

[+] mark-r|14 years ago|reply

No question I'd talk to a lawyer before trying to integrate this service into my own product.

[+] megamark16|14 years ago|reply

This is really amazing, and I can't wait to see all of the possibilities it opens up now that people can create their own databases. I'm tempted to set up an app to fingerprint and dedupe all of the music spread out throughout the network here at work.

[+] VMG|14 years ago|reply

This. Also proper tagging for once.

[+] brianwhitman|14 years ago|reply

if you have any questions, let me know. we're very excited about this!

[+] Aissen|14 years ago|reply

About your data dumps: you're about to get hammered, so please share them via torrent! This would be much better for the thousands of people wanting to bootstrap.

Also, I understand JSON is very easy to use, etc. But those big dumps cry out for a binary format. Or at least add zlib/lzma compression so people don't waste bandwidth on uuencoded binary data in JSON.
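To illustrate the compression suggestion, here is a minimal Python sketch that compresses a JSON dump with the standard-library zlib and lzma modules. The record layout (track_id, code) and the sample values are made up for illustration and are not the actual Echoprint dump schema. The sample data is also highly repetitive, so the sizes it prints say nothing about the savings on real dumps.

```python
import json
import lzma
import zlib

# Hypothetical records standing in for a fingerprint dump; the field names
# and values are illustrative assumptions, not the real dump format.
records = [
    {"track_id": f"TR{i:06d}", "code": "eJxVkEsOwyAMRK" * 20}
    for i in range(1000)
]

# Serialize once, then compress the same bytes with both codecs.
raw = json.dumps(records).encode("utf-8")
zlib_bytes = zlib.compress(raw, 9)   # 9 = highest zlib compression level
lzma_bytes = lzma.compress(raw)      # default .xz container and preset

# Both codecs are lossless: decompressing gives back the original bytes.
restored = json.loads(lzma.decompress(lzma_bytes).decode("utf-8"))

print("raw bytes: ", len(raw))
print("zlib bytes:", len(zlib_bytes))
print("lzma bytes:", len(lzma_bytes))
```

A consumer would decompress and call json.loads as usual, so the dump stays JSON underneath and existing import scripts only need one extra step.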

[+] jbk|14 years ago|reply

Where do we go and ask questions for setting up a mirror for us (VLC/VideoLAN)? Is there an IRC channel?

[+] samps|14 years ago|reply

What's the relationship (if any) or distinction between Echoprint and Chromaprint/Acoustid? http://acoustid.org/

[+] osdf|14 years ago|reply

I'm curious about the hashing algorithm. I read about the planned whitepaper, but some preliminary info would be cool (e.g. refs to academic papers that you build on).

[+] OoTheNigerian|14 years ago|reply

Awesome stuff.

We are working on something that will benefit both parties (us and you guys) immensely. Is there a way to contact you? There is no info on your HN profile.

[+] denimboy|14 years ago|reply

I think the fingerprinting part is similar to pHash: http://www.phash.org/ but echoprint is more focused on music and they are building a database of fingerprints.

I think pHash also has functions for fingerprinting music but might not be as precise since pHash is not strictly focused on music.

[+] JonnieCache|14 years ago|reply

Echonest are some cool people. Their earlier APIs enabled the illustrious, although sadly now defunct, http://www.donkdj.com, which was done by a classmate of mine as a project for a Generative Creativity course we did at uni.

Looks like their research has taken them a lot further!

[+] natch|14 years ago|reply

I'd love to see a project where the data is MIT licensed too, not just the code.

[+] regomodo|14 years ago|reply

A very interesting project. I've whipped up a little test program (https://github.com/regomodo/handy_scripts/blob/master/echopr...) in Python and found either the codegen or echonest to be a little buggy. Daft Punk fingerprints come back with some very unusual results: http://pastebin.com/8Tfvd0SZ

[+] brianwhitman|14 years ago|reply

can you file an issue on echoprint-codegen or write us at the google group so we don't lose that? That definitely shouldn't be happening, there's an issue somewhere for sure.

[+] caf|14 years ago|reply

Why aren't the fingerprints in the database covered by the recording copyright on the song that they were derived from?

[+] starwed|14 years ago|reply

Because it's less like a copy, and more like a name?

In any case, even if this was technically a copy, I can't imagine this possibly failing the fair use test.

[+] JonnieCache|14 years ago|reply

If the fingerprint data of a track is covered then by that logic its SHA1 hash etc. would also be covered, and I can't see that line of argument going anywhere.
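To make the SHA1 comparison concrete, here is a small Python sketch using the standard-library hashlib module. The byte strings are stand-ins for audio data, not real recordings. It shows that a hash is a short fixed-size value computed from the content, and that flipping a single bit of the input gives a different digest. That exact-match behavior is where the analogy is loose: an acoustic fingerprint is generally meant to still match after re-encoding, while SHA1 is not.

```python
import hashlib

# Stand-in for the bytes of an audio file (an assumption for illustration).
original = bytes(range(256)) * 64

# A copy that differs from the original in exactly one bit.
altered = bytearray(original)
altered[100] ^= 1
altered = bytes(altered)

digest_original = hashlib.sha1(original).hexdigest()
digest_altered = hashlib.sha1(altered).hexdigest()

# The digest is always 40 hex characters, regardless of input size.
print(len(original), "input bytes ->", len(digest_original), "hex characters")
print("same digest after one-bit change:", digest_original == digest_altered)
```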

[+] roel_v|14 years ago|reply

Maybe they are; to the best of my knowledge there's no definite answer that they aren't. I find it unintuitive, but I do think that rationally there's a case to be made that they are, in fact, 'derived works'.

[+] highace|14 years ago|reply

Woah. This is going to be massive. The number of things that could potentially be built on top of this is scaring me.

[+] paulnelligan|14 years ago|reply

How is this different from Shazam?

[+] Stuk|14 years ago|reply

Because it's open source.

[+] yesbabyyes|14 years ago|reply

http://www.shazam.com/tc

[+] stevenp|14 years ago|reply

This is huge. I've been mulling some ideas for a while that would require music fingerprinting, but I've always been too overwhelmed by the available options, from a licensing and implementation standpoint. I can't wait to play with this!! :D

[+] jbrennan|14 years ago|reply

Looks incredible! One note though, the page seems to partially break for me in Safari on the Mac (the sidebar overlaps onto the content as I scroll horizontally).

But the tech looks incredible. Good work for releasing this!

[+] brianwhitman|14 years ago|reply

oops! as you can tell, we're pretty good with music data and not so much on the web site design. i'll try to fix it :)

[+] brianwhitman|14 years ago|reply

for more on the whys, here is the EN blog post: http://blog.echonest.com/post/6824753703/announcing-echoprin...

[+] paisible|14 years ago|reply

Holy balls you guys are awesome for releasing this.
