I remember reading about Benford's law years ago with fascination and thought I'd share it. Such a fun use of maths in the real world. Here's one application:
Dr. Theodore P. Hill asks his mathematics students at the Georgia Institute of Technology to go home and either flip a coin 200 times and record the results, or merely pretend to flip a coin and fake 200 results. The following day he runs his eye over the homework data, and to the students' amazement, he easily fingers nearly all those who faked their tosses.
Smart, eh? It's all because people don't know enough about how numbers really work and so can't fake data convincingly. The first thing that people get wrong when faking data is assuming that each digit from 1 to 9 has an equal chance of being the first digit of a number. They do not. In the real world, numbers are much more likely to start with a '1' than any other digit.
From "The First-Digit Phenomenon" by T. P. Hill, American Scientist, July-August 1998
Benford's law predicts how often each digit turns up as the first digit of a number. As you can see from the above, it closely matches real-world data sets. It predicts that '1' is the most likely first digit, then '2' less so, and so on down to '9'.
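For anyone who wants to try this on their own numbers, here is a minimal Python sketch. The formula is the standard statement of Benford's law, P(d) = log10(1 + 1/d) for first digits d = 1 to 9. The function names are my own, and the code assumes the data are non-zero whole numbers (like counts or amounts in whole pounds).

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each first digit under Benford's law:
    P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(n):
    """First digit of a non-zero integer (assumption: integer input)."""
    return int(str(abs(n))[0])

def first_digit_frequencies(values):
    """Observed share of each first digit 1..9 in a list of integers.
    Zeros are skipped because they have no leading non-zero digit."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

if __name__ == "__main__":
    for d, p in benford_expected().items():
        print(f"{d}: {p:.1%}")
```

Under the law, '1' should lead about 30% of the time and '9' under 5% of the time, so a data set where every first digit turns up roughly equally often is worth a second look.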
When you see the analysis of fraudulent data sets, it really comes to life:
From "The First-Digit Phenomenon" by T. P. Hill, American Scientist, July-August 1998
From the same article:
Benford's law can be used to test for fraudulent or random-guess data in income tax returns and other financial reports. Here the first significant digits of true tax data taken by Mark Nigrini from the lines of 169,662 IRS model files follow Benford's law closely. Fraudulent data taken from a 1995 King’s County, New York, District Attorney's Office study of cash disbursement and payroll in business do not follow Benford's law. Likewise, data taken from the author's study of 743 freshmen's responses to a request to write down a six-digit number at random do not follow the law. Although these are very specific examples, in general, fraudulent or concocted data appear to have far fewer numbers starting with 1 and many more starting with 6 than do true data.
Back in 2005 I worked with a team of organisers who would report the number of doors that each of their teams had knocked on each day. I thought it would be fun to see how that data compared to Benford's law. Overall, you see that it doesn't look like people were being honest:
But it's not all bad news. It looks like some were more honest than others:
So, help me understand how this would apply to fake coin flipping data?
First, I'm certain it's not about the total number of heads or tails. Comparing that across the class would allow you to estimate the proportion of the class that cheated, but not to identify individuals.
I suspect therefore it's about the sequence itself. Probably the lengths of the consecutive runs of H or T.
If it were me I would convert the sequence into numbers and then look at the distribution of these numbers. I would do this by listing the length of each 'run' of heads or tails in order.
E.g. HHTHTTHHTHTHHHT would become 2,1,1,2,2,1,1,1,3,1
Let's have a stab at the maths:
Once you have flipped your first coin, the probability of the second flip being different, and ending the 'run', is 50%. So we should see '1' occur about 50% of the time in our list. (In my made-up list above, it is 6/10 - not too bad!)
The probability of the second flip being the same is also 50%. And the probability of this 'run' ending on the next flip is 50%. This means the total probability of getting a run of length two is 25%. (In my made-up flips above, I had '2' 3/10 times, or 30% of the time, which is a little high.)
Similarly, the probability of getting a run of length n is (50%)^n.
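The conversion and the expected shares are easy to sketch in Python. This is just my reading of the idea, not anything taken from Dr. Hill; the function names and the short example sequence are made up for illustration.

```python
from itertools import groupby
from collections import Counter

def run_lengths(flips):
    """Turn a string of flips such as 'HHTH' into the lengths of its runs."""
    return [len(list(group)) for _, group in groupby(flips)]

def expected_run_share(n):
    """Probability that a run of fair coin flips has length exactly n: (1/2)^n."""
    return 0.5 ** n

if __name__ == "__main__":
    flips = "HHTHTTTHHT"          # made-up example sequence
    runs = run_lengths(flips)
    print(runs)                   # [2, 1, 1, 3, 2, 1]
    counts = Counter(runs)
    for n in sorted(counts):
        observed = counts[n] / len(runs)
        print(f"length {n}: observed {observed:.0%}, expected {expected_run_share(n):.1%}")
```

One caveat: the last run is cut short by the end of the list, so it can come out shorter than it would have been if the flipping had carried on. Over 200 flips that should barely matter.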
All the lecturer needs to do is convert each student's list of coin flips into the number sequence above and do a statistical test to check whether a difference this large from the expected pattern is less than 5% likely to be down to chance (or similar). Since the average run length is two, 200 coin flips should give about 100 numbers once converted, which seems to be a decent sample size.
Anyone want to chip in on what the statistical test should be?
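One candidate would be a chi-square goodness-of-fit test on the run lengths. Below is a rough Python sketch under my own assumptions: I lump all runs of length 5 or more into one bin so that every bin expects at least a handful of counts, which leaves 4 degrees of freedom and a 5% critical value of 9.488. I have no idea whether this is what Dr. Hill actually does, and because runs within a fixed 200 flips are not perfectly independent, it is only an approximation.

```python
from itertools import groupby
from collections import Counter

# Chi-square critical value for 4 degrees of freedom at the 5% level.
# It only applies to the five bins used below (1, 2, 3, 4, 5+).
CRITICAL_5PCT_DF4 = 9.488

def run_lengths(flips):
    """Lengths of consecutive runs of identical flips."""
    return [len(list(group)) for _, group in groupby(flips)]

def chi_square_runs(flips):
    """Chi-square statistic comparing observed run lengths with the
    (1/2)^n pattern, using bins 1, 2, 3, 4 and '5 or more'."""
    runs = run_lengths(flips)
    total = len(runs)
    observed = Counter(min(r, 5) for r in runs)
    # P(length = n) = (1/2)^n for n = 1..4; P(length >= 5) = (1/2)^4.
    probs = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.0625, 5: 0.0625}
    return sum(
        (observed.get(k, 0) - total * p) ** 2 / (total * p)
        for k, p in probs.items()
    )

def looks_faked(flips):
    """True if the run pattern is rejected at the 5% level."""
    return chi_square_runs(flips) > CRITICAL_5PCT_DF4

if __name__ == "__main__":
    too_tidy = "HT" * 100   # a student who alternates every single flip
    print(chi_square_runs(too_tidy), looks_faked(too_tidy))
```

At the 5% level roughly one honest student in twenty would be flagged by chance, so this would pick out suspects rather than prove anything.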
By the way, here's another (probable) application of Benford's law to coin-flipping in the real world: http://paul.kedrosky.com/archives/2008/07/21/hedge_fund_test.html
(From a blog I highly recommend, by the way)
Very cool! Who is organizer 4? Do I have to search around through my decommissioned computers for zone 4 to find the answer to this?
That would be very cruel of me. The organizer numbers have been changed. I don't think we should be surprised by the findings though, should we :)