Following on from the last post, where I detailed the methods of dismissal in test matches, today I’m going to keep it simple and show the distribution of runs off the bat during test matches.
One note to go with this post – it contains all balls bowled, including extras. Wides, byes, and leg byes count as ‘no runs’. A no-ball with no further runs scored off the bat also counts as ‘no runs’, while a no-ball off which the batsman scores runs is counted as the number of runs scored off the bat. Not exactly ideal, but no-balls are proving difficult to separate thus far.
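The counting rule above can be sketched in a few lines of Python. This is only an illustration, not the method actually used for the post: it assumes each delivery is a dictionary with a `runs` mapping that holds a `batsman` figure (the layout I understand the Cricsheet YAML files to use), and the sample deliveries are hand-written for the example rather than taken from a real match.

```python
from collections import Counter

# Hypothetical deliveries, written by hand for illustration only.
# Assumed layout: a "runs" mapping with "batsman", "extras" and "total".
sample_deliveries = [
    {"runs": {"batsman": 0, "extras": 0, "total": 0}},                              # dot ball
    {"runs": {"batsman": 1, "extras": 0, "total": 1}},                              # single
    {"runs": {"batsman": 0, "extras": 1, "total": 1}, "extras": {"wides": 1}},      # wide
    {"runs": {"batsman": 4, "extras": 1, "total": 5}, "extras": {"noballs": 1}},    # four off a no-ball
    {"runs": {"batsman": 0, "extras": 4, "total": 4}, "extras": {"legbyes": 4}},    # four leg byes
]


def runs_off_bat(delivery):
    """Runs credited to the bat for one delivery.

    Wides, byes and leg byes carry no batsman runs, so they come out as 0.
    A no-ball comes out as whatever the batsman hit off it (0 if nothing).
    """
    return delivery["runs"]["batsman"]


def tally(deliveries):
    """Count how many deliveries produced 0, 1, 2, ... runs off the bat."""
    return Counter(runs_off_bat(d) for d in deliveries)


distribution = tally(sample_deliveries)
```

Every delivery is counted, extras included, which is why the wide and the leg byes end up in the ‘no runs’ bucket while the no-ball is counted as a four.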
Firstly, here’s the distribution of runs off the bat by deliveries:
No runs off the bat is by far the most likely outcome – I don’t think anyone would have struggled to guess that.
Next up are singles – again, few people are going to be surprised here.
I actually expected 2’s and 4’s to be relatively closely matched but that’s not really the case.
There’s a six scored about every 40 overs – I didn’t expect them to be as common as that.
Let’s quickly look at this another way – a shot that scores two runs occurs only about one quarter as often as a shot that results in a single, but it’s worth twice as many runs, so let’s look at the distribution of runs scored per scoring shot.
So 3’s and 6’s are relatively rare occurrences (one or the other happens on 1.3% of deliveries), but they actually provide almost 10% of the runs scored off the bat.
Boundaries contribute more than 50% of the runs scored off the bat in tests.
An average innings of 300 runs off the bat would consist of roughly 86 singles, 22 2’s, 5 3’s, 35 4’s, and 2 6’s, and take 94 overs.
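As a quick sanity check, the rounded shot counts from that average innings can be turned back into run shares. This is a rough sketch using only the rounded figures quoted in the post, so the total comes to 297 rather than exactly 300, and the percentages are approximate.

```python
# Rounded shot counts for the "average" 300-run innings quoted above.
shots = {1: 86, 2: 22, 3: 5, 4: 35, 6: 2}
overs = 94

# Runs contributed by each type of scoring shot.
runs_by_shot = {value: value * count for value, count in shots.items()}
total_runs = sum(runs_by_shot.values())

# Share of runs off the bat from boundaries (4's and 6's).
boundary_share = (runs_by_shot[4] + runs_by_shot[6]) / total_runs

# Share of runs from the rare shots (3's and 6's).
threes_sixes_share = (runs_by_shot[3] + runs_by_shot[6]) / total_runs

# How often a 3 or a 6 turns up, per delivery (6 balls per over,
# ignoring the extra deliveries that wides and no-balls add).
balls = overs * 6
threes_sixes_frequency = (shots[3] + shots[6]) / balls

print(f"total runs: {total_runs}")
print(f"boundaries: {boundary_share:.1%} of runs")
print(f"3's and 6's: {threes_sixes_share:.1%} of runs, "
      f"{threes_sixes_frequency:.1%} of deliveries")
```

The rounded counts give boundaries a little over half of the runs, and 3’s and 6’s roughly 9% of the runs from a little over 1% of deliveries, which lines up with the figures above.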
For the next while I want to start really simple – I think this kind of post is really handy to have out in the public domain, and also a nice reference to come back to later.
Of the 3,703 catches, 123 were caught and bowled.
Of the 187 run outs, 46 came after a team had already completed at least one run. As I discussed with Omar Chaudhuri on Twitter yesterday, that seems incredibly wasteful.
In a test match where 40 wickets fell we’d expect ~38 of the dismissals to be caught/LBW/bowled.
The sample here comprises 197 test matches, with an average of 31.3 wickets falling per match.
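The figures quoted above are enough for a little back-of-the-envelope arithmetic. The sketch below uses only those numbers; the total wicket count is an estimate, because it is rebuilt from the rounded 31.3 average rather than read from the data.

```python
# Figures quoted in the post.
matches = 197
avg_wickets_per_match = 31.3
catches = 3703
caught_and_bowled = 123
run_outs = 187
run_outs_after_a_run = 46

# Approximate total, since the per-match average is rounded.
total_wickets = matches * avg_wickets_per_match

# Shares within each dismissal type.
caught_and_bowled_share = caught_and_bowled / catches
run_outs_after_a_run_share = run_outs_after_a_run / run_outs

# Run outs as a share of all wickets, scaled to a 40-wicket match.
run_out_share = run_outs / total_wickets
run_outs_per_40_wickets = 40 * run_out_share

print(f"estimated wickets in sample: {total_wickets:.0f}")
print(f"caught and bowled: {caught_and_bowled_share:.1%} of catches")
print(f"run outs after a completed run: {run_outs_after_a_run_share:.1%} of run outs")
print(f"run outs per 40 wickets: {run_outs_per_40_wickets:.1f}")
```

On these numbers a 40-wicket match would see a little over one run out, which fits with caught, LBW and bowled accounting for roughly 38 of the 40.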
The people running the Cricsheet website got in touch via Twitter to tell me about the amount of cricket data sitting on their website. In a sense there’s tons of stuff there – I’ve easily grabbed the details from 197 test matches played since 2009 in YAML format, which opens easily in a text editor such as TextWrangler. There are details in this format for every delivery in the match:
So I think I’m going to be able to pull out a lot of the surface stuff, and potentially some nice WOWY (with-or-without-you) analysis without a ton of work. However, I’m not going to be able to evaluate or analyse things like ‘Pietersen scores more runs backward of square once he’s past 20’, or ‘Swann tends to bowl a quicker ball the fourth delivery of an over’. Who knows though, by the time I’ve exhausted this data there may be more detailed stuff available. One can dream.
I have thought about throwing all of these into Excel, but I think there’s too much data for that to be feasible. When I get comfortable with the most sensible way to play with this stuff using a text editor, I’ll probably throw a tutorial up here.