# F1 Monaco Grand Prix Stats

A while back I thought it might be fun to start looking at the statistical nature of Formula 1 races. As a data scientist I always enjoy having a play with numbers, and I thought I could easily apply some simple models to the data with a bit of Python scripting.

I initially had some “fun” getting the JSON data into a format I could process in Python. Things got easier once I realised that the data structure was a bit more complicated than I had expected. Python has nice tools for dealing with web output in JSON format (using the libraries simplejson and urllib2 was the trick – I’ll write something about this later).
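For anyone curious what that JSON wrangling looks like, here is a minimal sketch. Everything specific in it is an assumption for illustration: the payload layout, the field names (`laps`, `timings`, `driver`, `time`), the helper names and the lap times are made up and are not the actual feed I used. It also uses the standard-library `json` module from Python 3 rather than simplejson and urllib2, and parses a string instead of fetching anything from the web.

```python
import json
from collections import defaultdict

# Hypothetical payload: the layout, field names and times are invented
# for illustration and do not come from the real timing feed.
SAMPLE = """
{"laps": [
  {"number": 1, "timings": [
    {"driver": "driver_a", "time": "1:25.116"},
    {"driver": "driver_b", "time": "1:26.203"}]},
  {"number": 2, "timings": [
    {"driver": "driver_a", "time": "1:20.540"},
    {"driver": "driver_b", "time": "1:20.911"}]}
]}
"""

def to_seconds(lap_time):
    """Convert an 'M:SS.mmm' string into seconds as a float."""
    minutes, seconds = lap_time.split(":")
    return int(minutes) * 60 + float(seconds)

def laps_by_driver(payload):
    """Flatten the nested lap/timing structure into {driver: [seconds, ...]}."""
    data = json.loads(payload)
    times = defaultdict(list)
    for lap in data["laps"]:
        for timing in lap["timings"]:
            times[timing["driver"]].append(to_seconds(timing["time"]))
    return dict(times)

lap_times = laps_by_driver(SAMPLE)
```

The point is just that one nested loop turns the lap-by-lap structure into one list of lap times per driver, which is a much easier shape to work with.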

There is of course lots of data and I intend to look at it further, but for the time being I thought it would be really interesting to look at the variation in lap times for each driver through their mean, max, min and standard deviation. It turns out that the driver who was the most consistent with their lap times in the Monaco Grand Prix was Nico Hulkenberg, whose lap times had a standard deviation of 5.201 s. Webber won the race and had a standard deviation of 5.225 s. I’ll have to dig out the rest of the data; it could be interesting to see who is actually the most consistent driver in this very odd season.
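The per-driver summary itself is only a few lines. This is a rough sketch rather than my exact script: the function name `summarise` and the lap times are made up, and I am assuming the population standard deviation (`statistics.pstdev`); the sample version (`statistics.stdev`) would give slightly larger numbers.

```python
import statistics

def summarise(lap_times):
    """Return mean, max, min, population SD and lap count for one driver."""
    return {
        "mean": statistics.fmean(lap_times),
        "max": max(lap_times),
        "min": min(lap_times),
        "sd": statistics.pstdev(lap_times),  # assumption: population SD
        "laps": len(lap_times),
    }

# Made-up lap times in seconds, purely to show the call.
example = {
    "driver_a": [80.0, 82.0, 81.0, 81.0],
    "driver_b": [79.0, 85.0, 80.0, 84.0],
}
summary = {name: summarise(times) for name, times in example.items()}

# The "most consistent" driver is simply the one with the smallest SD.
most_consistent = min(summary, key=lambda name: summary[name]["sd"])
```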

Over the next few races I intend to apply a bunch of different approaches to the data, but for now here is a quick first plot looking at 10 drivers’ lap times over the race. You can really see when it started to get a bit damp (the bump towards the end of the race):

Oh and here is a quick data table I’ve produced that gives some overview info: ID number, driver name, mean lap time (s), max lap time (s), min lap time (s), standard deviation of the lap time (s) and finally the total number of laps completed:

| ID | Driver | Mean (s) | Max (s) | Min (s) | SD (s) | Laps |
|---|---|---|---|---|---|---|
| 1 | Webber | 81.2634358974 | 113.554 | 79.076 | 5.22471471332 | 78 |
| 2 | Rosberg | 81.3692179487 | 113.729 | 78.805 | 5.21267201929 | 78 |
| 3 | Hamilton | 81.7103974359 | 113.709 | 79.04 | 5.53280282831 | 78 |
| 4 | Alonso | 81.6865512821 | 113.85 | 78.857 | 5.56464331487 | 78 |
| 5 | Massa | 81.5894102564 | 114.276 | 78.806 | 5.49839624505 | 78 |
| 6 | Vettel | 82.2420512821 | 114.809 | 79.101 | 6.18925093413 | 78 |
| 7 | Raikkonen | 82.0597820513 | 114.904 | 78.904 | 5.24474347995 | 78 |
| 8 | Schumacher | 81.9103717949 | 115.898 | 79.082 | 5.20794225841 | 78 |
| 9 | Hulkenberg | 81.9712051282 | 115.635 | 79.457 | 5.20147106992 | 78 |
| 10 | Senna | 82.1178846154 | 115.198 | 79.719 | 5.4820607696 | 78 |
| 11 | Resta | 82.8519220779 | 115.304 | 79.246 | 6.337354458 | 77 |
| 12 | Ricciardo | 82.7586623377 | 115.818 | 78.423 | 5.854848841 | 77 |
| 13 | Kovalainen | 82.9670519481 | 117.791 | 79.305 | 6.34228615403 | 77 |
| 14 | Button | 82.6418701299 | 117.735 | 79.548 | 5.87962974219 | 77 |
| 15 | Glock | 83.6168026316 | 118.07 | 79.58 | 6.78045042755 | 76 |
| 16 | Pic | 82.9222 | 119.071 | 77.43 | 6.48074288422 | 70 |
| 17 | Vergne | 82.8413692308 | 119.216 | 77.296 | 7.0860552063 | 65 |
| 18 | Perez | 84.60925 | 118.874 | 78.53 | 8.55919369012 | 64 |
| 19 | Petrov | 84.1634285714 | 120.466 | 79.649 | 6.8907672407 | 63 |
| 20 | Karthikeyan | 86.6702666667 | 118.104 | 80.825 | 10.3405089976 | 15 |
| 21 | Kobayashi | 108.5892 | 135.62 | 81.603 | 23.0780821595 | 5 |

As I say, this is just a start. I’m hoping to make some nicer looking plots for the next race…

How much is the SD affected by the longer lap-times in the first three laps?

If we are using the SD as a measure of ability (or at least consistency), it strikes me that removing these laps may result in a ‘better’ assessment of the underlying ability of the driver.

Interesting thought. I should have noted I had already neglected the first lap – not thinking about the fact that the first few laps were very slow. Of course the deviation is also increased due to the period near the end where the track was wet.

I wonder if something like a rolling 10 lap standard deviation might be more interesting.
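A rolling standard deviation is easy enough to sketch. The function name and the lap times below are made up for illustration, and I am again assuming the population SD; the window is a parameter so that 10 laps is just the default.

```python
import statistics

def rolling_sd(lap_times, window=10):
    """Population SD over each run of `window` consecutive laps.

    Returns one value per complete window, so the result is shorter than
    the input by window - 1 (and empty if there are too few laps).
    """
    return [
        statistics.pstdev(lap_times[i:i + window])
        for i in range(len(lap_times) - window + 1)
    ]

# Made-up lap times in seconds with one slow lap, using a short window
# so the effect is easy to see.
laps = [80.0, 80.0, 80.0, 80.0, 90.0, 80.0]
rolling = rolling_sd(laps, window=3)
```

A single slow lap lifts every window that contains it, so pit stops and the damp spell should show up as plateaus rather than single spikes.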

For the full dataset I should probably also remove any laps where the drivers either pit or are on the way out from the pits (I’m not sure I have this data), i.e. any laps that are expected to be odd should be removed. Though I can probably guesstimate this by looking at my graph – I expect the spikes in the mean are due to the pit stops.

This is quite interesting, will take a look later today.

It might be interesting to see a plot of lap times normalized relative to the mean.

It’s interesting how in the first ‘lump’ lap times seem to break into two streams – hulkenberg and bruno_senna having longer, but consistent, lap times compared with the other group of webber and rosberg. I didn’t watch the GP – was there something going on at this point?

I should try and figure out a way of indicating their position on track too – it would make the graph more confusing but more useful. At some point, around then I think, there were definitely two clumps – those pulling away at the front and those stuck behind Raikkonen – but I’m not sure that this exactly corresponds. It might make sense though, as lots of the cars then pitted, and the reason Raikkonen was going slower was that his tyres were degrading.

“If we are using the SD as a measure of ability (or at least consistency), it strikes me that removing these laps may result in a ‘better’ assessment of the underlying ability of the driver.”

So if we remove the first 3 laps, it’s indeed Mark Webber that wins, with a standard deviation of just 1.302, and Nico Hulkenberg drops down to 5th in the ranking by standard deviation.
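In code, that re-ranking is roughly the following. The function name `rank_by_sd` and the lap times are made up for illustration (they are not the race data), and the population SD is an assumption as before.

```python
import statistics

def rank_by_sd(times_by_driver, skip=3):
    """Rank drivers by population SD after dropping the first `skip` laps."""
    sds = {
        name: statistics.pstdev(times[skip:])
        for name, times in times_by_driver.items()
        if len(times) > skip  # ignore drivers with no laps left after the cut
    }
    # Smallest SD first, i.e. most consistent driver at the top.
    return sorted(sds.items(), key=lambda item: item[1])

# Made-up lap times in seconds: driver_a has a slow start but is then
# very steady, driver_b starts better but is more erratic afterwards.
example = {
    "driver_a": [95.0, 90.0, 88.0, 80.0, 81.0, 80.0, 81.0],
    "driver_b": [85.0, 84.0, 83.0, 79.0, 83.0, 79.0, 83.0],
}
with_start = rank_by_sd(example, skip=0)
without_start = rank_by_sd(example, skip=3)
```

With these toy numbers the order flips once the opening laps are dropped, which is the same kind of reshuffle seen in the real data.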

“It might be interesting to see a plot of lap times normalized relative to the mean.”

I was thinking of doing this, but the question is: normalized to the mean of the most consistent driver, to each individual’s mean, or to the mean of all drivers?
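The three options are easy to compare side by side. This is a sketch under a couple of assumptions: “normalized” here means subtracting the reference mean (dividing by it would be an equally reasonable reading), and the function name, the `reference` argument and the lap times are made up for illustration.

```python
import statistics

def normalise(times_by_driver, reference="own"):
    """Subtract a reference mean from every lap time.

    reference="own": each driver's own mean
    reference="all": the mean over every lap by every driver
    anything else:   treated as a driver name whose mean is used for everyone
    """
    if reference == "own":
        ref = {name: statistics.fmean(t) for name, t in times_by_driver.items()}
    elif reference == "all":
        pooled = [t for times in times_by_driver.values() for t in times]
        overall = statistics.fmean(pooled)
        ref = {name: overall for name in times_by_driver}
    else:
        base = statistics.fmean(times_by_driver[reference])
        ref = {name: base for name in times_by_driver}
    return {
        name: [t - ref[name] for t in times]
        for name, times in times_by_driver.items()
    }

# Made-up lap times in seconds.
example = {"driver_a": [80.0, 82.0], "driver_b": [84.0, 86.0]}
own = normalise(example, "own")
pooled = normalise(example, "all")
vs_a = normalise(example, "driver_a")
```

Normalizing to each driver’s own mean hides who is faster and shows only consistency, while the other two keep the gaps between drivers visible.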

Basically, it seems like I should do another post, which I think I’ll do at the weekend (whilst trying to make the plots a bit nicer).

Though, saying that, I’ve now added a plot looking at the rolling (10 lap) standard deviations for the drivers. You can see that they all seem to get a bit more ragged at the end: http://www.flickr.com/photos/starrydude/7310446436/in/photostream

I have a simpler and more accurate way of working out driver performance. Would you be able to create the software for me?

If you post your method I’ll have a go.