You, my friend, are the best. As a huge soccer fan and a developer, I find getting this sort of data really hard unless you shell out hundreds of dollars a month.
A very cool project, but I have one question/issue.
The data format seems to be a custom text format, though admittedly I could be wrong about that. Would it be possible to use TSV or CSV instead? That would be far more useful, since it could be imported directly into relational databases, Excel, etc.
Agreed -- looking at the player data[1], IMO the format type is unrecognizable:
## GK / Goalkeepers
Kawashima|Eiji Kawashima, 20 Mar 1983
Nishikawa|Shusaku Nishikawa, 18 Jun 1986
Gonda|Shūichi Gonda, 3 Mar 1989
## DF / Defenders
Inoha|Masahiko Inoha, 28 Aug 1985
G. Sakai|Gōtoku Sakai, 14 Mar 1991
Nagatomo|Yuto Nagatomo, 12 Sep 1986
Uchida|Atsuto Uchida, 27 Mar 1988
Konno|Yasuyuki Konno, 25 Jan 1983
Kurihara|Yuzo Kurihara, 18 Sep 1983
H. Sakai|Hiroki Sakai, 12 Apr 1990
Yoshida|Maya Yoshida, 24 Aug 1988
Masato Morishige, 21 May 1987 ## Japan F.C. Tokyo
Comments are marked with a double hash; the key field is either the player's last name or, occasionally, first initial, space, last name; and there are three different delimiters: pipe, then comma, then tab. Choosing either a consistently delimited format or a more verbose JSON/YAML structure with clear metadata would seem to be a better approach.
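To make the point concrete, here is a minimal Python sketch that reads lines shaped like the sample quoted above and writes them out as CSV. The field layout is inferred only from the lines shown in this thread, not from any format specification, and the names `parse_squad` and `to_csv` are hypothetical helpers invented for illustration.

```python
import csv
import io
from datetime import datetime

# A few lines copied from the sample quoted in the thread.
SAMPLE = """\
## GK / Goalkeepers
Kawashima|Eiji Kawashima, 20 Mar 1983
Gonda|Shūichi Gonda, 3 Mar 1989
## DF / Defenders
G. Sakai|Gōtoku Sakai, 14 Mar 1991
Masato Morishige, 21 May 1987 ## Japan F.C. Tokyo
"""

FIELDS = ["section", "key", "name", "born", "note"]


def parse_squad(text):
    """Parse squad lines into dicts (layout assumed from the sample only)."""
    rows = []
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            # A line that starts with '##' is treated as a section heading.
            section = line.lstrip("#").strip()
            continue
        # Anything after '##' later in the line is treated as a trailing note.
        body, _, note = line.partition("##")
        key, sep, rest = body.partition("|")
        if not sep:
            # No pipe on this line, so there is no short key.
            key, rest = "", body
        # The last comma separates the full name from the birth date.
        name, _, born = rest.rpartition(",")
        born_date = datetime.strptime(born.strip(), "%d %b %Y").date()
        rows.append({
            "section": section,
            "key": key.strip(),
            "name": name.strip(),
            "born": born_date.isoformat(),
            "note": note.strip(),
        })
    return rows


def to_csv(rows):
    """Serialize the parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_squad(SAMPLE)
print(to_csv(rows))
```

Month abbreviations are parsed with `%b`, which assumes an English (or default C) locale.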
The size of JSON files is huge compared to delimited data. Languages like Python make it equally easy to consume delimited data and JSON, so it shouldn't matter much.
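As a rough illustration of both claims, the following Python sketch serializes the same records as CSV and as JSON, compares the byte counts, and reads each back with the standard library. The records are synthetic placeholders generated for the comparison, not real player data, and the actual ratio will depend on the field names and values involved.

```python
import csv
import io
import json

# Synthetic placeholder records (an assumption for illustration only).
records = [
    {"key": f"Player{i}", "name": f"First{i} Player{i}", "born": "1990-01-01"}
    for i in range(1000)
]


def as_csv(rows):
    """Write rows as CSV text; field names appear once, in the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["key", "name", "born"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def as_json(rows):
    """Write rows as a JSON array; field names repeat in every object."""
    return json.dumps(rows)


csv_text = as_csv(records)
json_text = as_json(records)

csv_bytes = len(csv_text.encode("utf-8"))
json_bytes = len(json_text.encode("utf-8"))
print(f"CSV:  {csv_bytes} bytes")
print(f"JSON: {json_bytes} bytes")

# Reading either format back takes one call from the standard library.
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)
```

JSON repeats every field name in every record, which is where most of the extra size comes from; compression narrows the gap considerably.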
This is really cool! Does anyone know if there are similar datasets for other sports out there? Even less clean datasets, as long as they have permissive licensing to allow sanitation and republication.
This looks cool. I see Gold Cup and NA Champions League repos. Is there a plan to add MLS data? I know some people who would be super excited to get baseball-reference.com level data for MLS.
It's a shame that this is not being done under the Wikidata framework. Those guys have been thinking about databases like this for a while, and can be reliably trusted to at least keep it up for a reasonable amount of time.
keithxm23|12 years ago
For details and advanced analytics though, this one is much better: https://github.com/soccermetrics/soccermetrics-client-py
toyg|12 years ago
Lots of numbers to crunch there!
This is all very nice, but it would be nicer if there were some sort of cheap software that amateur teams could use to gather and then analyze their own data. There's a massive market out there for this sort of thing; the football world is very conservative and tends to move slowly.
sourc3|12 years ago
Already thinking about the apps that will use this! Thank you.
sourc3|12 years ago
philtar|12 years ago
Something to the tune of $25k a year. Is anyone actually paying for this now who can provide pricing?
isaacremuant|12 years ago
The more the better!
cabbeer|12 years ago
dirtestbird|12 years ago
cwyers|12 years ago
https://github.com/opensport/american-football.db
The best public football repository I am aware of is this, though:
http://www.advancedfootballanalytics.com/2010/04/play-by-pla...
rpedela|12 years ago
fiatjaf|12 years ago
m0skit0|12 years ago
ddispaltro|12 years ago
phillc73|12 years ago
The short answer is no. I've searched long and hard, high and low, for free (beer) horse racing databases for UK/IRE and Australia. To a lesser extent I've searched for HK, FR and GER data. I'm yet to find anything that is comprehensive and no cost.
There are a couple that I do use for UK/IRE racing, which cost in the region of £35-£45 per month for access. Betwise/Smartform provides an historical database in MySQL, and daily race card/results updates. UKHorseRacing.co.uk provides CSV files with historical race data, their ratings and race results. I take these CSV files, combine them into a SQLite database and interrogate it with R.
A slightly longer answer is: sort of. The Betfair API is currently open access for non-commercial and low volume use (as far as I'm aware). This will allow you to retrieve basic racing data: the cards before the race with horse name, jockey, barrier etc., and the race results post-race, including the Betfair Starting Price. After interrogating the API, you'll obviously need to compile the data into your own database. A bit of work, but feasible. Betfair has a developer programme and there are API bindings available in a number of different languages. I use R (via an R package developed by Betwise, mentioned above), but I know Python is available. One caveat to mention is that Betfair are upgrading their API, so this will obviously have an impact on existing programs using the old one.
If anyone else has additional information or could point me in the direction of something else "free" I'd appreciate it as well.
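For anyone wanting to try the "combine the CSV files into a SQLite database" step, here is a minimal sketch of the idea. It uses Python rather than R, and the column names (`race_date`, `horse`, `position`), the file names, the rows, and the `load_csv_files` helper are all made up for illustration; they are not the layout of any provider's files. It also assumes every file shares the same header row.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_files(conn, csv_paths, table="results"):
    """Append every CSV file to one SQLite table (same header assumed)."""
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            columns = reader.fieldnames
            if not columns:
                continue  # skip empty files
            column_sql = ", ".join(f'"{name}"' for name in columns)
            placeholders = ", ".join("?" for _ in columns)
            # Untyped columns: values are stored as the text read from the CSV.
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
            conn.executemany(
                f'INSERT INTO "{table}" ({column_sql}) VALUES ({placeholders})',
                ([row[name] for name in columns] for row in reader),
            )
    conn.commit()


# Demo with two tiny placeholder files (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "2013-01.csv"
    second = Path(tmp) / "2013-02.csv"
    first.write_text(
        "race_date,horse,position\n"
        "2013-01-05,Horse A,1\n"
        "2013-01-05,Horse B,2\n",
        encoding="utf-8",
    )
    second.write_text(
        "race_date,horse,position\n"
        "2013-02-09,Horse C,1\n",
        encoding="utf-8",
    )
    conn = sqlite3.connect(":memory:")
    load_csv_files(conn, [first, second])

# Values are text, so the comparison uses the string '1'.
winners = [
    row[0]
    for row in conn.execute(
        "SELECT horse FROM results WHERE position = '1' ORDER BY race_date"
    )
]
print(winners)
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` would give a database file that R or any other SQLite client can open afterwards.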
pessimizer|12 years ago
http://thorotrends.com/news-and-views/50-blog/117-release-th...
http://www.anddownthestretchtheycome.com/2012/1/14/2706205/t...
http://www.paulickreport.com/news/ray-s-paddock/free-our-sta...
chevreuil|12 years ago
shirkey|12 years ago
[1] https://github.com/openfootball/players/blob/master/asia/jp-...
wambotron|12 years ago
nathancahill|12 years ago
fatihpense|12 years ago
iamwithnail|12 years ago
Kickdex is also pretty awesome, they use the Opta data to produce real time indices for teams and players.
abeisgreat|12 years ago
bronson|12 years ago
MisterBastahrd|12 years ago
yitchelle|12 years ago
https://cfp.linuxwochen.at/en/lww2013/public/events/61
llimllib|12 years ago
ntietz|12 years ago
cwyers|12 years ago
http://retrosheet.org/
The license on the data is a pretty permissive one, simply requiring attribution of the data to the Retrosheet project. Software to process Retrosheet files is available, under the GPL:
http://chadwick.sourceforge.net/doc/index.html
isuraed|12 years ago
redshirtrob|12 years ago
ngoel36|12 years ago
packetslave|12 years ago
maaaats|12 years ago
fiatjaf|12 years ago
dalek2point3|12 years ago
veganarchocap|12 years ago
ins429|12 years ago
rurabe|12 years ago