The people running the Cricsheet website got in touch via Twitter to tell me how much cricket data is sitting on their site. In a sense there’s tons of stuff there: I’ve already grabbed the details of 197 Test matches played since 2009, in YAML format, which opens easily in a text editor such as TextWrangler. The files record details of every delivery in each match.
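As a rough illustration of what per-delivery data makes possible, here is a minimal sketch that tallies runs off the bat for each batsman in one match. It assumes the YAML file has already been parsed into Python dictionaries (for example with PyYAML’s `yaml.safe_load`), and it assumes a layout where `innings` is a list of one-key dicts, each holding a `deliveries` list of `{over.ball: details}` entries. That layout is my assumption about the files, and the sample match below is invented, not real Cricsheet data.

```python
from collections import Counter


def runs_by_batsman(match):
    """Tally runs off the bat per batsman for one parsed match.

    Assumed (not confirmed) layout:
      match["innings"] -> list of one-key dicts, e.g. {"1st innings": {...}}
      each innings -> {"deliveries": [{over.ball: {"batsman": ..., "runs": {...}}}]}
    """
    totals = Counter()
    for innings in match["innings"]:
        for innings_data in innings.values():
            for delivery in innings_data["deliveries"]:
                # Each delivery is a one-key dict keyed by over.ball, e.g. 0.1
                for details in delivery.values():
                    totals[details["batsman"]] += details["runs"]["batsman"]
    return totals


# Hypothetical, hand-made fragment in the assumed shape (names are placeholders)
sample_match = {
    "innings": [
        {"1st innings": {
            "team": "Team A",
            "deliveries": [
                {0.1: {"batsman": "Batter A", "bowler": "Bowler X",
                       "runs": {"batsman": 4, "extras": 0, "total": 4}}},
                {0.2: {"batsman": "Batter A", "bowler": "Bowler X",
                       "runs": {"batsman": 0, "extras": 1, "total": 1}}},
                {0.3: {"batsman": "Batter B", "bowler": "Bowler X",
                       "runs": {"batsman": 2, "extras": 0, "total": 2}}},
            ],
        }},
    ],
}

print(runs_by_batsman(sample_match))
```

Extras are kept separate from the batsman’s runs here, so a leg bye or wide doesn’t get credited to whoever was on strike.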
So I think I’ll be able to pull out a lot of the surface-level stuff, and potentially some nice WOWY (with-or-without-you) analysis, without a ton of work. However, because the data records who batted and bowled each ball and what it produced, but not where the ball went or what kind of delivery it was, I won’t be able to evaluate things like ‘Pietersen scores more runs backward of square once he’s past 20’ or ‘Swann tends to bowl a quicker ball with the fourth delivery of an over’. Who knows, though: by the time I’ve exhausted this data there may be more detailed stuff available. One can dream.
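The basic WOWY idea is to compare how a team does in matches where a given player appears against matches where he doesn’t. A minimal sketch, assuming a simplified, hypothetical match record of the form `{"teams": {team: [players]}, "winner": team_or_None}`. Building that record from the real files (for instance from the batsman and bowler names in the deliveries) is left out, and draws simply count as non-wins.

```python
def wowy(matches, team, player):
    """Return (win rate with player, win rate without player) for a team.

    Each match is a hypothetical simplified record:
      {"teams": {team_name: [player, ...], ...}, "winner": team_name or None}
    A rate is None when there are no matches in that bucket.
    """
    with_player = [0, 0]     # [wins, matches played]
    without_player = [0, 0]
    for m in matches:
        if team not in m["teams"]:
            continue  # team didn't play in this match
        bucket = with_player if player in m["teams"][team] else without_player
        bucket[1] += 1
        if m["winner"] == team:
            bucket[0] += 1

    def rate(wins, played):
        return wins / played if played else None

    return rate(*with_player), rate(*without_player)


# Invented example data, purely to show the calculation
sample_matches = [
    {"teams": {"England": ["P1", "P2"], "Australia": ["Q1"]}, "winner": "England"},
    {"teams": {"England": ["P1", "P3"], "Australia": ["Q1"]}, "winner": "England"},
    {"teams": {"England": ["P2", "P3"], "Australia": ["Q1"]}, "winner": "Australia"},
    {"teams": {"England": ["P2", "P3"], "India": ["R1"]}, "winner": "England"},
]

print(wowy(sample_matches, "England", "P1"))
```

With only 197 matches the buckets get small quickly, so any difference between the two rates would need treating with a fair bit of caution.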
I have thought about throwing all of these into Excel, but I think there’s too much data for that to be feasible. Once I’m comfortable with the most sensible way to work with this stuff in a text editor, I’ll probably put a tutorial up here.