Open Football Data

157 points| vinhnx | 12 years ago |openfootball.github.io | reply

66 comments

[+] keithxm23|12 years ago|reply
This is impressive for the amount of work put into formatting the data and making it easy to use in different ways.

For details and advanced analytics though, this one is much better: https://github.com/soccermetrics/soccermetrics-client-py

[+] toyg|12 years ago|reply
Holy fullbacks, Batman! I didn't know about SoccerMetrics. The Python client seems to be just a wrapper for their REST api: http://soccermetrics.github.io/fmrd-summary-api/started.html

Lots of numbers to crunch there!

This is all very nice, but it would be nicer if there were some sort of cheap software that amateur teams could use to gather and then analyze their own data. There's a massive market out there for this sort of thing; the football world is very conservative and tends to move slowly.

[+] sourc3|12 years ago|reply
You, my friend, are the best. As a huge soccer fan and a developer, getting this sort of data is really hard unless you shell out hundreds of dollars a month.

Already thinking about the apps that will use this! Thank you.

[+] sourc3|12 years ago|reply
And kudos for calling it football :)
[+] philtar|12 years ago|reply
Last I checked it was thousands of dollars a month.

Something to the tune of $25k a year. Is anyone actually paying for this now who can provide pricing?

[+] isaacremuant|12 years ago|reply
Agreed. Having this for the world cup will be awesome to try toying around with a couple of games or apps.

The more the better!

[+] cabbeer|12 years ago|reply
Does anyone know if this is available for (American) football?
[+] rpedela|12 years ago|reply
A very cool project, but I have one question/issue.

The data format seems to be a custom text format, though I could be wrong about that. Would it be possible to use TSV or CSV instead? That would be far more useful, since it could be imported directly into relational databases, Excel, etc.

[+] ddispaltro|12 years ago|reply
Is there an open database for horse racing?
[+] phillc73|12 years ago|reply
That depends on your definition of open.

The short answer is no. I've searched long and hard, high and low, for free (beer) horse racing databases for UK/IRE and Australia. To a lesser extent I've searched for HK, FR and GER data. I'm yet to find anything that is comprehensive and no cost.

There are a couple that I do use for UK/IRE racing, which cost in the region of £35-£45 per month for access. Betwise/Smartform provides an historical database in MySQL, and daily race card/results updates. UKHorseRacing.co.uk provides CSV files with historical race data, their ratings and race results. I take these CSV files, combine them into a SQLite database and interrogate it with R.
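The CSV-to-SQLite step described above might look roughly like the following in Python (the comment itself uses R for the querying). This is only a sketch: the function name `load_csvs`, the table name `races` and the assumption that every file shares the same header row are hypothetical, not taken from any of the providers mentioned.

```python
import csv
import sqlite3
from pathlib import Path


def load_csvs(csv_dir, db_path, table="races"):
    """Load every *.csv file in csv_dir into one SQLite table.

    Assumes (hypothetically) that all files share the same header row.
    Returns the number of rows inserted.
    """
    conn = sqlite3.connect(db_path)
    total = 0
    created = False
    try:
        for path in sorted(Path(csv_dir).glob("*.csv")):
            with open(path, newline="", encoding="utf-8") as fh:
                reader = csv.reader(fh)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not created:
                    # Untyped columns: SQLite stores the values as text.
                    cols = ", ".join(f'"{name}"' for name in header)
                    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
                    created = True
                marks = ", ".join("?" for _ in header)
                # Drop malformed rows whose width differs from the header.
                rows = [row for row in reader if len(row) == len(header)]
                conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
                total += len(rows)
        conn.commit()
    finally:
        conn.close()
    return total
```

Once loaded, the database file can be opened from R, Python or the `sqlite3` command-line shell and queried with ordinary SQL.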

A slightly longer answer is, sort of. The Betfair API is currently open access for non-commercial and low volume use (as far as I'm aware). This will allow you to retrieve basic racing data - the cards before the race with horse name, jockey, barrier etc. and the race results post-race, including the Betfair Starting Price. After interrogating the API, you'll obviously need to compile the data into your own database. A bit of work, but feasible. Betfair has a developer programme and there are API bindings available in a number of different languages. I use R (an R package developed by Betwise, mentioned above), but I know Python is available. One caveat to mention is that Betfair are upgrading their API, so this will obviously have an impact on existing programs using the old one.

If anyone else has additional information or could point me in the direction of something else "free" I'd appreciate it as well.

[+] chevreuil|12 years ago|reply
The data format bothers me. Why not use a standard one like JSON?
[+] shirkey|12 years ago|reply
Agreed -- looking at the player data[1], IMO the format type is unrecognizable:

  ## GK / Goalkeepers

  Kawashima|Eiji Kawashima,   20 Mar 1983
  Nishikawa|Shusaku Nishikawa,   18 Jun 1986
  Gonda|Shūichi Gonda,   3 Mar 1989

  ## DF / Defenders

  Inoha|Masahiko Inoha,   28 Aug 1985
  G. Sakai|Gōtoku Sakai,   14 Mar 1991
  Nagatomo|Yuto Nagatomo,   12 Sep 1986
  Uchida|Atsuto Uchida,   27 Mar 1988
  Konno|Yasuyuki Konno,   25 Jan 1983
  Kurihara|Yuzo Kurihara,   18 Sep 1983
  H. Sakai|Hiroki Sakai,   12 Apr 1990
  Yoshida|Maya Yoshida,   24 Aug 1988
  Masato Morishige,      21 May 1987   ## Japan F.C. Tokyo

Comments are marked with a double hash; the key field is either the player's last name or, occasionally, first initial, space, last name; and there are three different delimiters: a pipe, then a comma, then a tab. Choosing either a consistently delimited format or a more verbose JSON/YAML structure with clear metadata would seem to be a better approach.

[1] https://github.com/openfootball/players/blob/master/asia/jp-...
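For what it's worth, the excerpt above can be turned into structured records with a few lines of Python. This is a sketch based only on the lines quoted in this comment, not on any openfootball specification: the function name `parse_players`, the output keys, and the reading of `##` lines as position headings and trailing `##` text as a note are all assumptions.

```python
import json
from datetime import date

# English month abbreviations as they appear in the excerpt.
MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_players(text):
    """Parse squad lines like 'Kawashima|Eiji Kawashima,   20 Mar 1983'."""
    players, position = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Assumed: a full-line comment such as "## GK / Goalkeepers"
            # names the position for the players that follow.
            position = line.lstrip("#").split("/")[0].strip()
            continue
        body, _, note = line.partition("##")   # trailing comment, if any
        names, _, born = body.partition(",")   # names, then date of birth
        short, sep, full = names.partition("|")
        if not sep:                            # no short name given
            short, full = None, names
        day, month, year = born.split()
        players.append({
            "position": position,
            "short_name": short.strip() if short else None,
            "name": full.strip(),
            "born": date(int(year), MONTHS[month], int(day)).isoformat(),
            "note": note.strip() or None,
        })
    return players


if __name__ == "__main__":
    sample = "## GK / Goalkeepers\n\nGonda|Shūichi Gonda,   3 Mar 1989\n"
    print(json.dumps(parse_players(sample), ensure_ascii=False, indent=2))
```

A parser like this has to guess at the rules, which is rather the point being made: a consistently delimited or self-describing format would not need the guesswork.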

[+] nathancahill|12 years ago|reply
The size of JSON files is huge compared to delimited data. Languages like Python make it equally easy to consume delimited data and JSON, so it shouldn't matter much.
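A quick way to check the size difference on your own data is to serialize the same records both ways and compare byte counts. The sketch below uses only the standard library; the helper name `encoded_sizes` and the record layout are made up for illustration, and the actual ratio will depend on field names and values.

```python
import csv
import io
import json


def encoded_sizes(records):
    """Return (csv_bytes, json_bytes) for a list of flat dicts.

    Assumes every record has the same keys as the first one.
    """
    fields = list(records[0])
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()      # CSV states the field names once
    writer.writerows(records)
    csv_bytes = len(buf.getvalue().encode("utf-8"))
    # JSON repeats every key in every record.
    json_bytes = len(json.dumps(records, ensure_ascii=False).encode("utf-8"))
    return csv_bytes, json_bytes


if __name__ == "__main__":
    rows = [{"name": f"Player {i}", "born": "1983-03-20"} for i in range(1000)]
    csv_size, json_size = encoded_sizes(rows)
    print(f"CSV: {csv_size} bytes, JSON: {json_size} bytes")
```

Either way, reading the data back is a one-liner with `csv.DictReader` or `json.load`.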
[+] abeisgreat|12 years ago|reply
I'm curious if this data is actually public domain. Where are they sourcing it from? Are they legally allowed to redistribute? Etc.
[+] bronson|12 years ago|reply
Why wouldn't they? It's just raw facts, presented in their own minimal style.
[+] MisterBastahrd|12 years ago|reply
Yeah, me too. The official statistics of sports leagues are rarely ever in the public domain. Official being the key word here.
[+] ntietz|12 years ago|reply
This is really cool! Does anyone know if there are similar datasets for other sports out there? Even less clean datasets, as long as they have permissive licensing to allow sanitization and republication.
[+] cwyers|12 years ago|reply
The gold standard for freely-available sports data is baseball, with the Retrosheet project:

http://retrosheet.org/

The license on the data is a pretty permissive one, simply requiring attribution of the data to the Retrosheet project. Software to process Retrosheet files is available, under the GPL:

http://chadwick.sourceforge.net/doc/index.html

[+] isuraed|12 years ago|reply
I believe ESPN has NBA play by play data.
[+] redshirtrob|12 years ago|reply
This looks cool. I see Gold Cup and NA Champion's League repos. Is there a plan to add MLS data? I know some people who would be super excited to get baseball-reference.com level data for MLS.
[+] ngoel36|12 years ago|reply
Is there anywhere to get real-time play-by-play data?
[+] packetslave|12 years ago|reply
There are several, and you'll pay a lot of money for them.
[+] maaaats|12 years ago|reply
Betradar / Sportradar has it.
[+] fiatjaf|12 years ago|reply
I don't know where this came from, but I like open formats. Where does the data come from?
[+] dalek2point3|12 years ago|reply
it's a shame that this is not being done under the Wikidata framework. those guys have been thinking about databases like this for a while, and can be reliably trusted to at least keep it up for a reasonable amount of time.
[+] veganarchocap|12 years ago|reply
Where are Derby County's stats?! Just kidding, this looks great!
[+] ins429|12 years ago|reply
awesome, exactly what I need
[+] rurabe|12 years ago|reply
perfect timing. thanks!