-->

Thursday, 12 September 2013

Data Visualisation Rules

Data-Driven Journalism recently posted a survival guide for data visualisation. I found it to be almost* comprehensive and decided to put it to the test on the following image.


The Monkey Cage, which has recently moved to the Washington Post, asked whether it was the worst chart on record. I think there are a few points worth defending it on - and some problems that aren't even mentioned. So here is the survival guide on charts with my comments in bold.

Sunday, 8 September 2013

The Effect Of Being Observed

Two of my favourite Twitter accounts had this conversation on Saturday evening.

I had been looking out for an example like this after reading that Bayesian probability means that two people can update their beliefs differently on the same information.

This Is Anfield is correct in the case where we have observed the expected length of the injury. If a player is expected to be out for two weeks but then does not play in two weeks' time, it is far more likely that the manager is being cautious than that the player is now out for two months.

I have set up a fiddle at http://pythonfiddle.com/modelling-injuries-bayesically for the case where Anfield Index is correct - the case where we have not observed the expected length of the injury. I have taken the following injury lengths in weeks, [1,2,3,4,6,8,13,26,52], and given them probabilities of 40%^(array index + 1), so 40%, 16%, 6.4% and so on, except for 52, which gets whatever is left over.
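For readers who cannot open the fiddle, here is a minimal sketch of that kind of update. It is not the fiddle's code: the function names are made up for illustration, and it assumes that "the player is still out after n weeks" simply rules out every injury length of n weeks or less.

```python
# Injury lengths in weeks, as listed in the post.
LENGTHS = [1, 2, 3, 4, 6, 8, 13, 26, 52]


def build_prior(lengths, base=0.4):
    """Prior of base^(index + 1) per length; the last length gets the remainder."""
    probs = [base ** (i + 1) for i in range(len(lengths) - 1)]
    probs.append(1.0 - sum(probs))
    return dict(zip(lengths, probs))


def posterior_still_out(prior, weeks_elapsed):
    """Condition the prior on the player still being out after weeks_elapsed.

    Assumption: an injury of length L is ruled out once L <= weeks_elapsed.
    """
    surviving = {length: p for length, p in prior.items() if length > weeks_elapsed}
    total = sum(surviving.values())
    return {length: p / total for length, p in surviving.items()}


if __name__ == "__main__":
    prior = build_prior(LENGTHS)
    for weeks in (0, 2, 4):
        posterior = posterior_still_out(prior, weeks)
        print(weeks, {length: round(p, 3) for length, p in posterior.items()})
```

With no observed expected length, every week that passes without a return shifts weight towards the longer injuries, because the short ones are eliminated and the rest are renormalised.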

So what do we see?


Friday, 6 September 2013

Unproductive Productivity

Well the NFL is back and it got me thinking about this tweet:


I am interested in things like this because of records like Liverpool not losing when Ian Rush scored, and similar for Man U when Van Nistelrooy scored. I always thought these records reflected poorly on the player. Basically, the rest of the team was so good that if they got a goal from you they were really unlikely to lose.

What caught my eye about the above list was how similar the players' records were to what I thought their teams' records would be in those seasons. The two exceptions were Roddy White and Marques Colston, but it turns out Atlanta have won 67 games in that span and New Orleans 56 (they followed their shock 11-5 season, which won the 2006 NFC South, with 7-9 and 8-8 seasons, finished 13-3 the year they started 13-0, and of course you have the bounty season).

What other individual stats correlate poorly with team stats?

Monday, 2 September 2013

Not Prospect Theory

When reading the chapter on Prospect Theory in "Thinking, Fast and Slow", one of the things that struck me was that if you take the reference point for gains and losses to be the point you end up at, then you would have a pretty good model. For example, if you have a hundred and lose 20, your loss is 20/80, but if you win 20, your gain is 20/120.

The formula for this is x/(x+worth), where x is the amount gained (positive) or lost (negative) and worth is what you started with. Plots are done in Wolfram Alpha.
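As a quick check of the arithmetic, here is a small sketch of the formula in Python. The function name `relative_value` is just a label chosen for this example, and exact fractions are used so the results can be compared with the hand-worked figures.

```python
from fractions import Fraction


def relative_value(x, worth):
    """Gain or loss x measured against the wealth you end up with (x + worth)."""
    return Fraction(x, x + worth)


if __name__ == "__main__":
    worth = 100
    print(relative_value(-20, worth))  # losing 20 from 100: -20/80
    print(relative_value(20, worth))   # winning 20 from 100: 20/120
```

The loss of 20 comes out at -1/4 and the gain of 20 at 1/6, so the same amount weighs more as a loss than as a gain.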


Not sure if this is important yet but here is the derivative:
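In case the plot does not load, the derivative can be worked out by hand with the quotient rule. Writing w for worth (a shorthand used only here), and assuming x > -w so that you do not lose more than you have:

```latex
f(x) = \frac{x}{x + w},
\qquad
f'(x) = \frac{(x + w) - x}{(x + w)^2} = \frac{w}{(x + w)^2}
```

The slope is always positive, equals 1/w at x = 0, and grows without bound as x approaches -w, so the curve is steeper on the loss side than on the gain side.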


As opposed to Prospect Theory which looks like this:


One thing I didn't like about Prospect Theory is the steep curve for a loss: it does not explain why people are so ready to buy insurance. Under the above model, if you could lose 90% of your value then -90/(-90+100) gives -9, and if you have to pay a 1% premium then -1/(-1+100) gives -1/99. The big loss weighs 891 times as much as the premium, so you would pay the premium to cover anything more likely than a 1/891 chance. Whereas in PT the steep curve should mean people are against paying for insurance.
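The insurance numbers can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above: the function name is made up for the example, and the break-even chance is taken to be the probability at which the expected weight of the big loss equals the weight of the premium.

```python
from fractions import Fraction


def relative_value(x, worth):
    """Gain or loss x measured against the wealth you end up with (x + worth)."""
    return Fraction(x, x + worth)


if __name__ == "__main__":
    worth = 100
    big_loss = relative_value(-90, worth)  # lose 90% of your value
    premium = relative_value(-1, worth)    # pay 1% for insurance

    # Probability p at which p * big_loss equals the certain premium.
    break_even_chance = premium / big_loss

    print(big_loss, premium, break_even_chance)
```

Exact fractions avoid any rounding, so the break-even chance comes out as a clean ratio rather than a decimal.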

Someone on Hacker News submitted the link to the Wikipedia article on the St. Petersburg Paradox and so I set up a python fiddle to mess about with this: http://pythonfiddle.com/st-petersburgh-paradox

The game as explained on Wikipedia is:
A casino offers a game of chance for a single player in which a fair coin is tossed at each stage. The pot starts at 1 dollar and is doubled every time a head appears. The first time a tail appears, the game ends and the player wins whatever is in the pot. Thus the player wins 1 dollar if a tail appears on the first toss, 2 dollars if a head appears on the first toss and a tail on the second, 4 dollars if a head appears on the first two tosses and a tail on the third, 8 dollars if a head appears on the first three tosses and a tail on the fourth, and so on. In short, the player wins 2^(k−1) dollars if the coin is tossed k times until the first tail appears.
The paradox is that no rational person would pay a substantial amount to play yet it has an infinite expected payout. How does my model resolve this?
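Here is one way to play with the question in Python. It is a sketch rather than the code in the fiddle, and it assumes the thing being averaged is the model's x/(x+worth) value of each payout instead of the payout itself. The function names and the choice of worth = 100 are illustrative only.

```python
def raw_expected_payout(max_tosses):
    """Partial sum of probability * payout; every term contributes exactly 0.5."""
    return sum(0.5 ** k * 2 ** (k - 1) for k in range(1, max_tosses + 1))


def model_expected_value(worth, max_tosses):
    """Partial sum of probability * x/(x + worth) for payout x = 2^(k-1)."""
    total = 0.0
    for k in range(1, max_tosses + 1):
        payout = 2 ** (k - 1)
        total += 0.5 ** k * payout / (payout + worth)
    return total


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, raw_expected_payout(n), model_expected_value(100, n))
```

The raw expected payout grows by half a dollar with every extra toss allowed, which is the paradox. Under the model each payout is worth less than 1 however large it gets, so each term is smaller than its probability and the sum stays below 1 no matter how many tosses are allowed.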
