1. I just had the chance to play with a fantastic data visualization tool from IBM: Many Eyes. You can upload data sets or use data sets that others have uploaded to create all the regular visualizations (charts, etc.), but also maps and treemaps, like this one that I built from party registration information in the early 2008 states:


    The big blobs are states (Iowa, South Carolina, New Hampshire, Nevada). The size of the blobs represents the number of registered voters in 2004. You can see the small county boxes inside the states, showing the number of registered voters in each county. You can then zoom in on a state to see just the counties within that state. Within New Hampshire, I had town-level data readily available, so you can zoom in to a particular county and see the towns in that county.

    As a user, you have the power to easily click away and play with the data in a way that a PowerPoint chart sent by email never allows. You can play with the chart and explore the data by changing almost anything about it:
    • The colour of the shading currently shows the number of Democrats. You can change it to Republicans or Independents at the flip of a switch
    • The size of the boxes is the number of registered voters. Again, you can change it to the number of Republicans or Independents at the flip of a switch
    • You can change the sorting from State > County > Town to sort instead by County, or by Town
    Isn't this a great way to let someone explore complex data? I love it!
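A treemap like this one rests on a simple roll-up: each box is sized by the sum of the boxes inside it. Here is a minimal Python sketch of that State > County > Town roll-up. The place names and voter counts are made up for illustration and are not the figures behind the chart.

```python
from collections import defaultdict

# Hypothetical rows: (state, county, town, registered_voters).
# These numbers are invented for illustration only.
rows = [
    ("New Hampshire", "County A", "Town 1", 1200),
    ("New Hampshire", "County A", "Town 2", 800),
    ("New Hampshire", "County B", "Town 3", 500),
    ("Iowa", "County C", "Town 4", 3000),
]


def roll_up(rows):
    """Sum voters at the county and state level so each box can be sized."""
    county_totals = defaultdict(int)
    state_totals = defaultdict(int)
    for state, county, _town, voters in rows:
        county_totals[(state, county)] += voters
        state_totals[state] += voters
    return dict(state_totals), dict(county_totals)


state_totals, county_totals = roll_up(rows)
grand_total = sum(state_totals.values())

# Each state's share of the total area of the treemap.
state_share = {s: v / grand_total for s, v in state_totals.items()}
print(state_share)
```

Switching what the box size represents (say, from registered voters to Republicans) only means rolling up a different column.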

    The site's purpose is to "harness the collective intelligence of the net for insight and analysis." This sounds like a great goal. The thing I'm really interested in is that it allows someone to gather some neat data and easily visualize it in novel and interesting ways that others can tweak. Power to the people, indeed.

    Is this the start of the end of sending a pack of PowerPoint charts? Instead, hopefully we'll soon be able to quickly and easily send links to online visualizations that users can play with and get a feel for. Let's hope this is just a first step in this area.

  2. I was going to only post on projects I've worked on, but I saw this today and HAD to post about it. This chap has created a personal annual report, detailing things like 'airmiles traveled', 'number of emails sent', 'number of photos taken', 'animals eaten', 'books read', 'most frequented bar', 'best meal', 'miles run', 'plants killed' or 'beverages drunk by type'. And he has done so with tremendous clarity of design. I have no idea how he kept track of all of this. Seriously impressive stuff! Thanks to information aesthetics for spotting this. I did one a few years ago, and have been working on one this year, but with WAY less information in it (and designed in a much less attractive format).

  3. It's great to have a team of people to do analysis for you. I love being one of the people who can do the analysis that people want to see (and the analysis they don't think they want to see!). It usually works something like this:
    1. They ask for some analysis
    2. I do the analysis and pass it to them
    3. Upon seeing it, they decide they want something a little different
    4. Go to 1
    Anticipating this, I make sure I'm pretty clear what the user wants and I try to provide analysis in a flexible way so that it can also be used to answer questions around the edge of the specific question that they asked. This works well, but it's becoming clearer that there is a much better way to structure it: give the user the tools to interrogate the data themselves. I'm not talking about giving them a database, or even a spreadsheet. I'm talking about giving them the analysis you would otherwise provide, but making it such that they can easily change it and play with it.

    My earlier post on mapping started to make this point. A GIS map in PowerPoint is a great output from analysis, but a Google Map where the user can add and remove layers and easily alter the data that is shown does all that and MUCH, MUCH more.

    I just came across another example to do with charts.

    A question that any number of people have been asking around election time for as long as I can remember is 'who votes when?'. What type of person votes early / late / on their lunch break? People have all sorts of theories that I've always found not to be based on data. So I got some data to answer the question. Here's a chart I did showing Democratic votes, Republican votes and Independent votes by time of day in a recent election.

    Here is the turnout by party the voter is registered with.
    Not much of interest here. Some difference, but not much of a difference and certainly not enough to use to change turnout operations.

    Here is the turnout by age:

    Much more interesting now. Big differences by age of voter.

    I did a few of these and PowerPointed them up and sent them around. But I knew what would happen. People would want something else. What about party registration and age combined? What about urban and rural differences by age? What about income and party registration crossed?

    So I squashed down the several-million-record database into a table that would fit into Excel's 65,536 record limit (while preserving the key data that the user cared about) and stuck on a pivot table and a pivot table chart that allowed the user to change the chart and answer all of the questions above by simply pointing and clicking. This allowed the user to interrogate the several million records with no database skills required.

    I'd love to hear about other examples or tools that can do this: tools that can provide output-ready analysis for non-technical users while giving the user the power to alter and tweak the analysis. Examples that give Power to the people!

    One example of the tables I created, which fit nicely into Excel, was:
    SELECT Count(VoterID) AS NumberOfVoters, Time, Gender, Income, Party, AgeGroup, Race ...
    GROUP BY Time, Gender, Income, Party, AgeGroup, Race ...
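To make the idea concrete, here is a small Python sketch that runs the same kind of GROUP BY against an in-memory SQLite table. The table name `Voters`, the subset of columns and the sample rows are assumptions for illustration; the real query had more columns than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Voters (VoterID INTEGER, Time TEXT, Party TEXT, AgeGroup TEXT)"
)
# Invented sample rows: one row per voter.
conn.executemany(
    "INSERT INTO Voters VALUES (?, ?, ?, ?)",
    [
        (1, "08:00", "Democrat", "18-39"),
        (2, "08:00", "Democrat", "18-39"),
        (3, "08:00", "Republican", "40-64"),
        (4, "12:00", "Democrat", "18-39"),
    ],
)

# Collapse one row per voter into one row per combination of attributes.
summary = conn.execute(
    """
    SELECT Count(VoterID) AS NumberOfVoters, Time, Party, AgeGroup
    FROM Voters
    GROUP BY Time, Party, AgeGroup
    ORDER BY Time, Party, AgeGroup
    """
).fetchall()

for row in summary:
    print(row)
```

The summary has one row per distinct combination rather than one row per voter, which is what lets millions of records shrink to something a pivot table can handle.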

  4. I've been keeping my eye open for a good example of a PowerPoint slide that needed some love, and in a meeting today, I finally found one. I definitely don't pretend to be an expert on how to make this chart into the perfect slide, but here are some things I would do to help it on its way.

    Here is an anonymized, but otherwise unaltered version of the original slide:
    The first thing I want to do when I see something like this is to remove the chart junk. I want to remove pixels that don't convey any information. The 3D and the lines around the bars don't help me to understand the data, and in fact they give my eye more work to do, and so detract from the data. The boxes around the chart area and the legend are unhelpful defaults in Windows. I don't need to know where the plot area or the key area ends, so they can go, also. The axes themselves can also go. They don't help and so they shouldn't stay.

    The grid lines I'll leave for now, but I'll grey them, so they aren't overpowering. When you look at the slide, the data should leap out and anything like gridlines, axes and labels that help me to understand it should blend into the background, and support the data from there.

    Finally I immediately want to label the y-axis with percent and move the key to sit alongside the data. This way I don't need the extra pixels of little blue boxes to show me what they relate to, and the labels sit where my eye finds them most easily: next to the data.

    We then get something like the chart below. Basically, this is a much cleaner and easier to read version of the original chart. Notice the use of grey for the control group and the bright colour for the group we should be interested in, the group that we contacted. Also notice that I changed the wording on the labels to say exactly what they are, without any jargon (such as the phrase 'control group').
    Next problem: the axis goes from 54% to 68%. This highlights the difference between the data elements, since the data is bunched up. But isn't the data being bunched up kinda the point? If on an honest scale (in this case 0% to 100%), the difference is slight, isn't that the point? Headline message: there isn't a glaring difference in the data!!!
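The effect of the truncated axis is easy to quantify. The sketch below uses two made-up values (60% and 64%, not the numbers on the slide) and compares how tall the bars are drawn relative to each other on the 54% to 68% axis versus a 0% to 100% axis.

```python
def drawn_height(value, axis_min, axis_max):
    """Fraction of the plot height a bar occupies on a given axis range."""
    return (value - axis_min) / (axis_max - axis_min)


# Hypothetical values for illustration; the slide's real numbers are not used here.
not_contacted, contacted = 60.0, 64.0

ratios = {}
for axis_min, axis_max in [(54.0, 68.0), (0.0, 100.0)]:
    ratios[(axis_min, axis_max)] = drawn_height(
        contacted, axis_min, axis_max
    ) / drawn_height(not_contacted, axis_min, axis_max)
    print(
        f"axis {axis_min:.0f}-{axis_max:.0f}: "
        f"taller bar is {ratios[(axis_min, axis_max)]:.2f}x the shorter"
    )
```

With these assumed values the taller bar is drawn about 1.67 times the height of the shorter one on the truncated axis, against about 1.07 times on the full scale.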

    Here's where a proper scale gets us:
    Now this is still the basic chart, with some tidying and presentational suggestions. I'd go one further. I'd present the data in a different chart format:
    This adds a number of steps:
    • The stacked bar draws attention to the fact that there is something 'else'. There is more data that is missing from the original chart. In response to a question, it turned out that the question that generated the chart had a couple of other answers: Voted for Jones, voted for someone else, 'I'm not telling you' and others. I suggest drawing them in grey since they're not the focus, but they're important and depending on the size and nature of them, might actually be more important than the data in the chart: If 'I'm not telling you' varies significantly by age and is big enough, it might completely explain the variance that the chart purports to show!

    • I added in black arrows to show the effect that is the point of the chart: the difference in Smith votes in people we contacted versus people we didn't contact.

    • I added a plain English explanation to the top of the chart, so that the slide means something to someone not listening or someone sent the PowerPoint by email. Why force them to listen and attend the presentation if they want to understand it?

    • Finally I added in places for two more pieces of data that are essential if you are to understand the meaning of the chart: the margin of error and the sample size. The speaker mentioned that the sample size was small in the 18 to 39 years category and so that shouldn't be trusted. If so I'd either not show it on the chart at all, or mark the sample size clearly to illustrate that it shouldn't be trusted.
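To see why sample size matters so much, here is a rough Python sketch using the standard textbook approximation for the 95% margin of error of a proportion. It assumes a simple random sample, and the sample sizes and the 60% figure are hypothetical; I don't know how the presenter calculated theirs.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes; the slide's real sample sizes are not known here.
for n in (50, 400, 1000):
    moe_points = 100 * margin_of_error(0.6, n)
    print(f"n={n}: plus or minus {moe_points:.1f} points")
```

Under these assumptions a group of 50 respondents carries a margin of roughly 14 points either way, which would swamp a difference of a few points between the bars.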
    In this example I don't mean to insult the presenter. This was one slide in a long presentation of some truly excellent work. The fact that this analysis had been done and written up and presented should be an inspiration to everyone in the room. I simply mean to provide an illustration of what to look out for in charts to ensure we read the right things from them and to provide some tips on how I think the display of charts might be improved.

  5. This post is less of an example of good practice, and more of an illustration of how technology is currently hampering best practice in one important area.

    Politics is all about people, but it's also about geographic neighbourhoods and communities.


    To use data to help to understand voters and to understand where they are and how their neighbourhoods and communities work, data needs to be mapped out. And here is the problem. Mapping out data is tough. There is no excuse for an organization not to gather data and produce tables and graphs from it. That technology is on everyone's computer and is extremely accessible. But technology hasn't yet brought mapping out data into the hands of the many.

    There are a number of barriers that make even basic maps (like the one above) out of the reach of most small organizations and political campaigns, including:
    1. GIS programs are expensive and inaccessible. Many organizations don't have the money to buy them or to hold on to someone with the skills to use them. We need a way to get mapping out to organizations and campaigns without them having to invest in expensive software and staff with specialist skills.

    2. Shapefiles are tough to acquire, often out of date and frequently almost impossible to match back to the data that you're trying to map out. Most organizations don't have the time to invest in finding, cleaning, checking and matching up shapefiles. We need an organization to gather recent shapefiles and make them available with a key that allows them to be matched back to other data sources. Or even better, we need a way to map out data without needing shapefiles to make it usable.

    3. Output is inaccessible to most users. A map like the ones above provides a helpful overview of the top-level information, and a well-constructed pdf can allow the user to zoom in and see more detail, but what is really needed is the ability to personalize it: for users to turn features on and off, to see satellite view, to see roads, and then to zoom out and see just the data again if they so choose.
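The matching problem in (2) comes down to joining two tables on a shared key and checking what fails to match. A rough Python sketch, assuming both the shapefile attributes and the campaign data carry a county code (the codes, names and counts below are invented):

```python
# Hypothetical shapefile attribute records, keyed by a county code.
shapes = {"001": "County A", "003": "County B", "005": "County C"}

# Hypothetical campaign data keyed by the same code.
voters_by_code = {"001": 12000, "003": 8500, "007": 400}


def match(shapes, data):
    """Split data into rows that can be mapped and rows with no matching shape."""
    matched = {
        code: (shapes[code], value) for code, value in data.items() if code in shapes
    }
    unmatched_data = sorted(set(data) - set(shapes))
    unmapped_shapes = sorted(set(shapes) - set(data))
    return matched, unmatched_data, unmapped_shapes


matched, unmatched_data, unmapped_shapes = match(shapes, voters_by_code)
print(matched)
print("data with no shape:", unmatched_data)
print("shapes with no data:", unmapped_shapes)
```

The two leftover lists are where the cleaning time goes: every code in them is either a typo, a boundary change or a genuinely missing record.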
    The only example I've yet seen of data really being mapped out in a way that non-expert users can access and play with is a site that combines U.S. Census Bureau data with Google Maps. It's called World Wide Webfoot Maps (thanks to Google Maps Mania).

    Here is an example:

    - Race and density
    - Age distribution
    - Housing units
    - Total population density
    - Male population percentage
    - Female population percentage
    - Average Household Size
    - Average Family Size
    - Blacks per housing unit
    - Percentage of population 18-22

    In this example someone has done (1) and (2) for you already. They've gathered the data and the shapefiles and matched them and programmed a web interface. What we need is some way for organizations and political campaigns to be able to upload their custom data and see it mapped out in an accessible way. I'm keeping my eye open for developments in this area, but please post a comment if you have ideas or solutions!

  6. I find a table of numbers the most inaccessible thing in the world. It just doesn't tell you anything without you having to read the table. You must literally read the numbers and remember the relative positions of big ones and small ones to make sense of it.

    ... my mind really doesn't work that way, and so I was VERY happy when I saw Edward Tufte talk about what he calls sparklines. The idea is to put small graphs right there in the table, or in the paragraph to allow your eye to immediately pick out the story. Here's a version implemented with a really basic formula in Excel:


    The basic formula is simple. It utilizes the 'rept' function to repeat a symbol a set number of times. Pretty clever, eh?
    =REPT("|",E11)
    To append the number on the end of the graph is also simple:
    =REPT("|",E11)&" "&TEXT(E11,"0")
    That's a pretty basic implementation, of course. For more adventurous implementations I use the BonaVista Systems MicroCharts Excel add-in, which I highly recommend. It allows you to easily create line, column, pie, win-lose and bar charts right there in Excel.
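The same in-cell bar trick carries over to any plain-text table. As a sketch outside Excel, here is a Python equivalent of the REPT formula; the labels and values are invented.

```python
def text_bar(value, symbol="|"):
    """Mimic REPT: repeat a symbol `value` times, then append the number."""
    return symbol * value + " " + str(value)


# Hypothetical table values.
table = {"North": 12, "South": 5, "East": 8}

for label, value in table.items():
    print(f"{label:<6}{text_bar(value)}")
```

As with the Excel version, this only works directly for small whole numbers; larger values would need scaling down before repeating the symbol.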


  7. I have always hated pie charts. I hated them even before L.E.K. trained me to use a stacked bar chart instead. This was a step forward, but wasn't ever quite right. Thanks to the guys at Juice Analytics, I've now (perhaps temporarily) fallen in love with the square pie chart.

    For example, I was taking a look at the population of Iowa today. There were several layers to the problem. How many people are there? How many are of voting age? How many are registered to vote? How many of them are Republican, Democrat or Independent? Are they active or Inactive? Who can vote in the Presidential Caucus? A lot of questions. I tried to come up with a graphic that would answer all of these questions. A pie just wouldn't do, so here's what I ended up with:
    You can already make out the red (Republicans), the blue (Democrats) and the purple (Independents). They look about the same and the labels on the detailed version will tell you the exact numbers. Then at the top you can see the Under 18 (dark grey) and the unregistered (light grey). As it turns out, only registered Democrats can attend the Iowa Caucus, and so there is a bold line drawn around them to signify this.
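A square pie is just a 10 by 10 grid in which each cell stands for one percent. Here is a small Python sketch of the layout, using invented whole-percent shares rather than the Iowa figures:

```python
# Hypothetical whole-percent shares that sum to 100 (not the real Iowa data).
shares = [
    ("u", 24),  # under 18
    ("n", 16),  # not registered
    ("R", 20),  # Republican
    ("D", 20),  # Democrat
    ("I", 20),  # Independent
]


def square_pie(shares, side=10):
    """Fill a side x side grid row by row, one cell per percentage point."""
    cells = [symbol for symbol, pct in shares for _ in range(pct)]
    if len(cells) != side * side:
        raise ValueError("shares must sum to side * side")
    return ["".join(cells[r * side:(r + 1) * side]) for r in range(side)]


grid = square_pie(shares)
print("\n".join(grid))
```

Real data rarely lands on whole percentages, so the shares would need rounding so that they still add up to 100 cells.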

    I took this one step further when looking at other early presidential primary states:

    At a glance you can see the relative size of the states' populations. Some are tiny compared to others. Some are Democrat-heavy. Some have no party registration at all (registered voters are all purple).

    The small purple bar charts under each state show the number of registered voters per person who is eligible to vote in the Presidential Primary. The size of this bar gives you an idea of how many voters you must talk to in an untargeted campaign to find a primary voter.

    I deliberately blurred the data in the jpg on this site but I created it in PowerPoint, where the full detail is visible. While it works on a projector, the resolution of a regular monitor doesn't enable this kind of diagram to work. You just can't see the detail. Technology gets in the way! A pdf is helpful, because you can easily zoom in, but who wants to scroll around a pdf? The beauty of this much information on a page is that you can see it all, see patterns from afar and then look at sections for more detail without losing the overall picture. Until technology catches up and monitors get higher resolution, the only real answer for a diagram like this is to go old skool: print it out. It works like a treat!

    Here's the Excel template I built for generating square pie charts. (Based on and inspired by the solutions to the excellent Juice Analytics challenge.)

  8. I watched Hans Rosling's inspiring presentation at TED and HAD to apply at least some of what he did to some data I was playing with that day. I was looking at patterns of voter registration data in Presidential years in New Hampshire. There were four variables that I was playing with: geography, three types of party registration (Democrat, Republican, Independent), number of registered voters and time. I didn't have a good way to show this until Hans' talk inspired me.

    I didn't have the sophisticated software that he has, and I like the challenge of doing fun things with the basics. With Excel and PowerPoint you can do almost anything! So I came up with an animated PowerPoint showing the New Hampshire counties moving in 'party registration' space. Here's a still of the end of the animation:

    You'll see that some are bigger than others, since there is a different number of registered voters in each. This still is 2004 and you can see the dotted trails that trace out the path that the counties took to get to their 2004 position from their 1992 position.

    The general trend is that they moved to the left (less Democratic) in 1996, down and to the left in 2000 (more Independent) and then slightly to the right in 2004 (more Democratic). The relative lengths and angles of the lines tell you about the relative differences in how party registration varied in those counties.

    Within each county, there are multiple towns, not all of which behave the same. So I created another type of chart and an animation to show this. Here is a still from 2004:

    Each county has a colour, and the sum of the sizes of the blobs shows you the number of registered voters. The spread on the x axis shows the variation in the percent of voters that are registered either Democrat or Independent in the various towns in each county. Note that a big blob could be a big town, or the sum of many small towns.

    Here's a link to the animated PowerPoint presentation itself.

  9. I was in a bar making a list of friends I'd lost touch with and, inspired by Garofalo's Genealogy of Pop/Rock Music chart, I started sketching a chart of my friends over time. A little odd I agree, but for sure it's an interesting way of showing a pattern in something that you don't usually see patterns in. Goes to show that even in areas where there doesn't appear to be any 'data', there is plenty there if you choose to look for it!

    I got home and knocked out some quick charts in Excel. Here's the top-level view by 'source' of friend:
    You'll see that I sadly haven't kept in touch with my school friends very well. They are on my call list for the next couple of months! University provided the most friends. And I'd say that the decent tail on the blue blob means that I've done a pretty good job of keeping up with them. Doing this helped me make a list of people I unfortunately haven't kept in touch with, though.

    Here's the friend-level chart that created the top-level version:
    This is comprised of actual friend-level data. I'm not going to label this version, because I KNOW I've missed people. This was done in a rush. But here it is, FYI. Please don't try to work out who is who, since some small categories are combined even here!

    Finally, here's the chart that inspired me. Pretty impressive stuff!
