I was playing around with Analyse-it, a really cool and easy-to-use analysis tool for Excel, and I thought I'd show how easy it is by running an old data set through it. So here is some quick analysis of an old political campaign I worked on. Seriously, this took about 30 minutes total.
We won the race overwhelmingly. In fact we won every precinct in the city. One thing the tool allows you to do is compare groups of data. Here is our vote share by city ward, with each blob representing a precinct:
This type of chart can be created with about three button presses. It shows we won big in our home ward (yay!). It also shows the distribution of precincts. They were pretty bunched together, but it might be interesting to look at the outliers. Why were they so different?
We did an OK job of predicting where turnout was going to come from. I can tell this by running a regression of the actual turnout by precinct against our predicted turnout by precinct. The tool does this very simply, with no specialist knowledge required:
This graph also takes about three button presses to generate, and comes with a neat set of stats telling us that we did an ok (not great) job of predicting turnout by precinct. But then we should have done, since historical turnout by precinct was widely available, and the best predictor of future behaviour is past behaviour.
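For anyone without the Excel add-in, the same kind of check can be sketched in a few lines of Python. This is only an illustration of a simple least-squares regression, not what the tool does internally: the function name `fit_line` and the turnout numbers are made up for the example and are not the campaign's data.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x, plus R-squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variation in y that the fitted line explains.
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical precinct numbers, for illustration only.
predicted_turnout = [120, 200, 310, 150, 260, 400]
actual_turnout = [100, 230, 280, 190, 240, 350]

slope, intercept, r_squared = fit_line(predicted_turnout, actual_turnout)
print(f"slope={slope:.2f} intercept={intercept:.1f} R^2={r_squared:.2f}")
```

A slope near 1 with a high R-squared would mean the predictions tracked actual turnout closely; a low R-squared is the "ok, not great" territory.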
Now here is the interesting bit: we were crap at predicting how people would actually vote. I should stress that this was a Democrat versus Democrat election, so it is a notoriously difficult thing to call. But either way, we didn't do a good job! Here is how we categorised precincts in advance, by how we thought they would vote, versus how they actually voted:
You have to say that's pretty bad. In the precincts we called for our opponent, we actually did better than in those that we thought were too close to call. It's good to know that we did better in the precincts we thought would support our candidate. But in those precincts there is a really wide range of support. They could probably have been better broken out into different categories.
We did try to break them out further, but without much luck. Here is a more detailed look at what vote we thought we would get versus what we actually got:
We did a good job of predicting the very best precincts, but beyond that group, all of the other five groups we decided on behaved very similarly.
We also broke out the precincts that we thought our opponent would do well in. That didn't turn out to be a good predictor of the actual vote either:
Pretty depressing really!
Data and targeting have played a huge part in many political campaigns. That is particularly clear from '08, and organisations like Catalist have played a huge part in this. In politics I had the pleasure of working with a fantastic group of people who would honestly evaluate their own work, but ...
... usually after elections finish, candidates' political campaigns close their doors and everyone goes off to the next campaign, tired and often ready to forget about the last one. There is no will to evaluate what worked and no money to fund evaluation. And besides, the people who did the analysis often don't want to evaluate it, for fear that the analysis they were paid handsomely for wasn't actually helpful after all. This is a tragedy.
Even if predictions aren't that helpful, having lots of data on a campaign makes people believe. Everyone has confidence in a well presented graph and they feel that those extra phone calls and those extra door knocks are being well targeted. In reality, I often believe that when you are lost, any old map will do. 
However, if there were more evaluation after the fact, perhaps campaign analysis would improve more quickly and would more often actually help the campaign to win, rather than just boosting morale.

  1. I gave a talk at the Big Data Insight Group in London recently and they've just posted my talk online.


    I talk about how we've helped EMI Music make use of data and about how we're doing so in zeebox.

    One of the themes throughout my talk is the importance of people, both in terms of how we use data to help people make decisions and in terms of how we need to understand the people we're trying to help in order to give them what they need. Technology enables this, but without the right people and without understanding people, technology is as good as useless.


    I also talk about how important skills and judgement are. And that, although data is sometimes seen as the thing that drives decisions, it's usually, or perhaps always, used alongside skills and judgement.


    I think that admitting to the role of skills and judgement isn't being 'anti-data'. I think that being honest about this enables and empowers us to better use data in the right ways. And it certainly helps people to feel comfortable with data, also!

    With the right people in place and data playing the right role in an organisation, the opportunity for data to help an organisation is massive. The way that EMI Music has embraced data across the organisation alongside skills and judgement shows that this is the case.



  2. We all know there are decisions where you need data to help you make them and there are decisions where data just isn't that important. This morning XKCD did a wonderful job of illustrating it. http://xkcd.com/1036/

    Buying a lamp is a creative decision. Turn your eye away from the reviews and go with your heart :)

    The same is true of many decisions data folks are asked to help with every day in organisations. We shouldn't be afraid to champion this strategy there, either!


  3. We sat down recently to talk data and insight. Here is what we talked about, plus a little video of me talking about insight at both zeebox and EMI.

    http://www.thebigdatainsightgroup.com/site/article/david-boyle-emi-zeebox-data-driven-includes-video

  4. I don't like the term 'scientist' as it makes the role sound inaccessible and elite. Google's Hal Varian said "the sexy job in the next ten years will be statisticians" ... but I don't like that term either. I'd replace 'statisticians' with 'working with data' or something ... and then I believe it! I think data people have a tendency to overplay the role of the 'statistics' and the magic of it, and to underplay the importance of the 'bringing it to life' and 'helping people understand / make use of it' parts of working with data.

    I thought about this because of this cool article in The Guardian about data scientists.

    As it points out, "science" is defined as "systematic study of natural or physical phenomena". I guess that's us all. Perhaps I shouldn't shy away from that phrase.

    The journalist describes the role well, as "someone who can bridge the raw data and the analysis - and make it accessible. It's a democratising role; by bringing the data to the people, you make the world just a little bit better." Perfect, eh?

    One last quote: "the four qualities of a great data scientist are creativity, tenacity, curiosity, and deep technical skills." That list sounds pretty good to me, also. So perhaps I should rename this the 'data scientist' blog and be done :)




  5. Some fun from http://fosslien.com/ via http://www.freakonomics.com/2012/02/29/the-life-of-the-number-crunching-analyst/


    I particularly like this one:


  6. So much data, so easily displayed in such a small but easy-to-understand format. I need say no more. I'm in love with the new sparklines just made available in Google Spreadsheets: http://support.google.com/docs/bin/answer.py?hl=en&answer=2371371


    It's this simple:

    Google Spreadsheets is rapidly becoming my go-to choice for building business dashboards. Bye, bye cost. Bye, bye developers (I would be VERY sad not to work with them, of course). Bye, bye Microsoft!



  7. I spoke on a panel last night on the subject 'data as the new black gold'. There are three challenges I think this metaphor poses to the data world.

    First, that of crude oil. Data is everywhere in organisations, but too often left in its crude form: gloopy and unusable. The oil industry had to work this out before it could go mainstream. It had to refine oil into a form that works for consumers day-to-day and it had to make it available to consumers in ways that fitted into their daily life. It's trivial to stop by a petrol station and pick up some oil in a format you can instantly make use of. Data doesn't yet work the same way: it's rare to find an organisation that appropriately refines it and then makes it available to its people in a way they can access and make use of as part of their day-to-day work.


    Second, I think we need to demand higher 'miles per gallon' from our data. Often we gather fantastic raw data, capable of being a really powerful part of decision making ... but then business leaders don't ask interesting questions of it. They don't demand smart analysis and challenge the data to offer insight. It's like demanding that cars offer higher miles per gallon from the oil they are burning.


    Finally, I think we need to embrace hybrid technology. In cars that's about oil being only part of the story for how the car gets powered. In data it's about saying that data is only part of the story for how organisations get powered. We need to be honest and bold about the role of skills & judgement alongside data in powering organisations. Too many people believe / pretend that data alone can power organisations to greatness. Everything I've seen tells me that data is necessary but not sufficient: smart people using the data alongside their expertise are ALWAYS required. The data world should be honest about this and build data and systems around that truth. I've always found that has a much greater impact :)

  8. I've used a lot of word clouds recently. But I think of them as charts really, since they are still pretty faithful to the underlying data. The size of the word is proportional to the number of times that word is in the data set. Simple.
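
    The "size proportional to count" rule is easy to sketch in Python. This is a minimal illustration, not how any particular word cloud tool works: the function name `word_sizes` and the `max_size` of 48 points are assumptions made for the example.

```python
import re
from collections import Counter


def word_sizes(text, max_size=48):
    """Map each word to a font size proportional to how often it appears."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the rest scale linearly with count.
    return {word: max_size * count / top for word, count in counts.items()}


sizes = word_sizes("data data data chart")
for word, size in sizes.items():
    print(word, size)
```

    A word that appears a third as often as the most common word gets a third of its font size, which is why the cloud still reads like a chart.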

    But reading a cool data visualization book I came across this. Really it's not based on 'data', but it's interesting how the words and their location on the page convey such a lot of information. Perhaps some good, well placed words can replace the need to chart actual data?

    http://creativeroots.org/2011/03/italy-infographic-map/

  9. Simple, easy to read, but really powerful. Nice little sparklines spotted in the papers from the 20-week scan my wife just had. Cool little charts like this should be everywhere!

    And by the way, it's a boy!
