Ask HN: How to legally obtain sports data for commercial use?
I've often thought of building sports-related apps (esp. pertaining to fantasy sports), but I've always struggled with how to legally obtain the necessary data (scheduling, statistics, player images, team logos, etc.) so that I can pursue it as a commercial venture. An obvious solution is to simply scrape the info, but I'd assume you'd get shut down or blocked rather quickly. Yahoo offers a Fantasy Sports API, but it's to be used for non-commercial purposes only.
Can anyone shed light on where/how to obtain current and past sports data that is available for commercial use? (I'm most interested in NFL, NBA, MLB, NHL data)
Thanks!
[+] [-] mattmaroon|15 years ago|reply
At Draftmix we used a competitor of theirs called PA Sports Ticker, which Stats bought shortly after we shut Draftmix down. We had previously used a cheaper one called XML Team, but we realized quickly that we had gotten what we had paid for as the feeds were often updated very late or contained errors. They're fine enough for getting started (and probably the easiest to implement, since you pull the data on demand rather than having them post it to you) especially if you don't require live stats. Live stats cost more and are harder to implement. You could get post-game stats and schedule data for a few grand a year back then from XML Team, live stats for a few times that, but I don't know what Stats buying their primary competitor has done to prices. I can't imagine it's made them get cheaper.
There's a new one called Sports Direct. I don't have any experience with them, but our former salesman from PA works there. I'd be happy to put you in contact if you'd like, just email me. He's a good salesman at least.
For player images and team logos you need to set up licenses. Logos come from the league (NFL, MLB), player images from the players' unions (NFLPA, MLBPA). This is very costly. The actual images themselves can be provided by Stats and other sources, but you can't use them without paying the license (though Stats may have worked out a deal that lets them include that in the package).
The Fantasy Sports Trade Association (fsta.org) is the best place to find service providers for the industry. Anyone worth anything is a member.
[+] [-] mattmaroon|15 years ago|reply
[+] [-] iheartmemcache|15 years ago|reply
[1] http://stackoverflow.com/questions/57106/anyone-know-of-an-n...
[+] [-] weixiyen|15 years ago|reply
[+] [-] mccutchen|15 years ago|reply
I've always assumed that there was some commercial data source out there that would provide all of this information in a nice, structured format for some kind of fee, but I have yet to find it.
One nice thing about major sites moving to "live" scoreboards is that you can often find nicely structured data sources behind them. For instance, here's the NFL's live score feed, in JSON:
http://www.nfl.com/liveupdate/scores/scores.json
(Unfortunately, it's empty as I write this because there are no games going on right now. Here's an example taken late on a Sunday or on Monday morning: http://gist.github.com/626612)
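For anyone wanting to consume a feed like this, here is a minimal Python sketch of turning the JSON into plain score rows. The field names (`home`, `away`, `abbr`, `score`, `T`, `qtr`), the game ID, and the sample payload are all assumptions for illustration, not a documented schema, so check them against a real response before relying on them.

```python
import json

# Hypothetical payload: the shape, field names, and teams are assumptions,
# not a documented schema or a real result.
SAMPLE = """
{
  "0000000001": {
    "home": {"abbr": "AAA", "score": {"T": 31}},
    "away": {"abbr": "BBB", "score": {"T": 17}},
    "qtr": "Final"
  }
}
"""

def summarize_scores(raw):
    """Return one (away, away_pts, home, home_pts, status) tuple per game."""
    # The feed may be empty when no games are in progress.
    games = json.loads(raw) if raw.strip() else {}
    rows = []
    for game_id in sorted(games):
        game = games[game_id]
        home, away = game.get("home", {}), game.get("away", {})
        rows.append((
            away.get("abbr"),
            (away.get("score") or {}).get("T"),
            home.get("abbr"),
            (home.get("score") or {}).get("T"),
            game.get("qtr"),
        ))
    return rows

for row in summarize_scores(SAMPLE):
    print(row)
```

Using `.get()` throughout means a missing field shows up as `None` rather than crashing, which seems prudent for an undocumented feed that can change without notice.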
Another, related question is how to get good gambling information (point spreads, totals, etc.) for the same use case. I think this might be easier, as I've come across various sports book sites in the past that offer subscription services.
On Yahoo's NFL odds page, it says their data source is OddsShark (http://www.oddsshark.com/) whose home page advertises
I got in touch with them, but never received a response...
[+] [-] davidedicillo|15 years ago|reply
[+] [-] dustym|15 years ago|reply
First off, you are going to have to deal with a rep.
STATS is the big name in the business and they feed, at least partially, many stat resellers from whom you might be able to get cheaper rates. From there I'd say you should find a cheap service or a mechanism (scraping, etc.) that gives you just enough data to work with and start building against it. Look at XML Team for competitive pricing. If you get to the point where your app is past prototype, you should then investigate buying into the full service.
Depending on the day and the feed, wrangling sports data is awesome or horrible or both.
On the subject of scraping, I'm not sure what the legalities are. Obviously you are probably violating the TOS of any site you are visiting if you grab the data, but at the same time, strikes, balls and fouls are facts of the game.
Images and logos are sometimes provided by sports data brokers.
Take a look at http://www.stats.com/ and http://www.xmlteam.com/
[+] [-] dougb|15 years ago|reply
I've been to many baseball games where I've seen people keeping score on paper while watching the games.
[+] [-] jat850|15 years ago|reply
Without permission I don't think it would be fair for me to provide you a direct contact, but they did offer all of the data our site required, in useable formats.
Our initial site only dealt with the NBA as they provided the best avenue for use of their logos and player names.
Feel free to contact me more directly if you want a bit more info.
Best of luck!
[+] [-] cloudkj|15 years ago|reply
I guess there might be some restrictions on who gets access to the official raw data for various games, depending on the sports league. If the costs for getting that data are high, then the only way to circumvent that would be to collect them yourself. Even then, I don't know if the leagues would come at you hard for gathering data and using team names or player names...
[+] [-] kreek|15 years ago|reply
[+] [-] smackfu|15 years ago|reply
[+] [-] mikerhoads|15 years ago|reply
I don't really know the exact pricing structure but I don't imagine it is cheap.
[+] [-] retree|15 years ago|reply
They enforce this strongly, outsourcing it to a company who only does this sort of thing.
[1] http://www.epltalk.com/2010-11-premier-league-opening-day-fi...
[+] [-] gcaprio|15 years ago|reply
http://www.cfbreference.com
There's about 5 years of data that we've culled from the NCAA about CFB. We're adding more every week and will soon go back in time for historical data.
But our twist is that the site will be upgraded to be a completely consumable site: full REST API support, dynamic URL data generation and more. We're adding new stuff every day. So you can get the data your way in JSON, RDF, XML & HTML depending on your Accept header, query string parameter and even URL parameters.
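As a rough illustration of how that kind of format selection could work, here is a minimal Python sketch. It is not our actual implementation: the `format` query parameter name, the example paths, and the precedence order (URL suffix, then query string, then Accept header, then HTML) are all assumptions made for the example.

```python
from urllib.parse import parse_qs

# Standard media types mapped to the four formats mentioned above.
MEDIA_TYPES = {
    "application/json": "json",
    "application/rdf+xml": "rdf",
    "application/xml": "xml",
    "text/html": "html",
}
FORMATS = set(MEDIA_TYPES.values())

def from_accept(header):
    """Pick the supported format with the highest q-value, or None."""
    best, best_q = None, 0.0
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media in MEDIA_TYPES and q > best_q:
            best, best_q = MEDIA_TYPES[media], q
    return best

def choose_format(path, query="", accept=""):
    """Assumed precedence: URL suffix, then ?format=, then Accept, then HTML."""
    suffix = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if suffix in FORMATS:
        return suffix
    requested = parse_qs(query).get("format", [""])[0].lower()
    if requested in FORMATS:
        return requested
    return from_accept(accept) or "html"

print(choose_format("/teams/1.json"))
print(choose_format("/teams/1", query="format=xml"))
print(choose_format("/teams/1", accept="text/html;q=0.5, application/json"))
```

Putting the URL suffix first means a link pasted into a browser or a script always returns the same representation, regardless of what the client sends in its headers.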
We are going to try and build apps on top of this data, but data sites are and will remain FREE. We want to encourage community participation and contributions. That means free for anyone, anywhere, even if you yourself don't contribute data.
We're also going to add scoring / charting apps for mobile phones so that you can chart your own games and, if you'd like, contribute the data back to us.
We're not 100% there yet, but I'll post here when we are. We'd love feedback from the entire HN community, not only on the sports data aspect but on the technical implementation. After all, if it's not easy to use & powerful, we're not doing a good enough job.
[+] [-] sga|15 years ago|reply
[+] [-] joeycfan|15 years ago|reply
[deleted]
[+] [-] ironblunt|15 years ago|reply
We also looked at XML Team and I found their prices to be completely reasonable. They have a per-document pricing structure, which allows you to control your costs to a much greater extent.
We also spoke with Stats Inc and found them to be pretty unreasonable in terms of dealing with startups and for home projects.
Hit me up if you want any more data or if Benchcoach can help with the data on the baseball side. We're looking at expanding it to football and basketball this year so we've been speaking with XML team about that.
[+] [-] weixiyen|15 years ago|reply
Getting accurate real-time data is not a hard problem. The real problem is coming up with the money to do it the legal way.
[+] [-] luffy|15 years ago|reply
I have a hard time figuring out what the difference is between having a human read a web page with sports scores on it and then enter those scores into your application vs. having a scraper grab those scores automatically. In most cases, these source web pages will be publicly available without requiring any agreement to a terms of service contract.
Scraping a site and using the actual HTML in your application would be a copyright violation, definitely. Sometimes a particular format can even be patented. So I'd definitely stay away from actually scraping out an entire table and inserting that into your app.
But as far as the scores/facts - those are not subject to copyright. So what is the particular legal issue if you are scraping and only getting non-copyrightable facts from a publicly available web page? I'm genuinely curious to know.
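To make the technical side of that distinction concrete, here is a minimal Python sketch that reads a scoreboard table and keeps only the bare facts (teams and points), discarding the markup and layout entirely. The HTML snippet, the four-column layout, and the team names are hypothetical, and the sketch says nothing about what is legally permitted; it only shows what "facts, not HTML" might look like in code.

```python
from html.parser import HTMLParser

# Hypothetical scoreboard markup; real pages will differ.
PAGE = """
<table class="scores">
  <tr><td>AAA</td><td>3</td><td>BBB</td><td>1</td></tr>
  <tr><td>CCC</td><td>0</td><td>DDD</td><td>2</td></tr>
</table>
"""

class CellCollector(HTMLParser):
    """Collect the text of each <td>, grouped by <tr>; markup is discarded."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def extract_scores(html):
    """Return plain dicts of facts; assumes rows of away, pts, home, pts."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [
        {"away": a, "away_pts": int(ap), "home": h, "home_pts": int(hp)}
        for a, ap, h, hp in (r for r in parser.rows if len(r) == 4)
    ]

for game in extract_scores(PAGE):
    print(game)
```

Nothing from the source page's presentation (tags, classes, column order as displayed) survives in the output; the application would render these values in its own format.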
[+] [-] sga|15 years ago|reply
[+] [-] hakan|15 years ago|reply
Playerfilter (http://www.playerfilter.com) is built on top of an API that we are looking to expose to the public (you can see it being used in the URL hash). API support isn't live yet, but we are working with beta testers. Basically, we return data for players, seasons and games over any time period since 1970. Please check it out and drop us a line if you'd be interested in more details.
[+] [-] terra_t|15 years ago|reply
I'd seriously considered a sports-related project based on open data and I was still concerned that I could get into legal trouble, so I sorta merged the project into something much bigger, in which the sports content would be barely noticeable.
[+] [-] stevederico|15 years ago|reply
[+] [-] shafqat|15 years ago|reply
[+] [-] briandu|15 years ago|reply
[deleted]