2012-11-15

Exit PEBOS - Enter exit polls



PEBOS is over. Time to look at the details of the election. The final results are not yet in, but the exit polls are out and up for grabs. Just to warm up, here's a tiny example.

Obviously Romney had an age problem. But for now I don't want to speculate about political consequences. This is just an example plot.

Let's imagine we have a data.frame "EP" that contains the state level exit polls for the presidential election 2012. (Actually, I have these data, and tomorrow I'll post how I got them using R - and a tiny bit of Python. For today I just let them reside in a file called "PresExitPolls2012.Rdata".)

Update: I've released the code to create the PresExitPolls2012.Rdata file today.

I fire up R, and the first code snippet is:

library(ggplot2)
library(plyr)
library(reshape)
 
load(file="PresExitPolls2012.Rdata")
head(EP)

For now I'll concentrate on the "Vote by Age" question. There are two different age groupings for that question:

unique(EP$QNo[EP$question=="Vote by Age"])
# 4 category breakdown
head(EP[EP$QNo==2, ])
# 6 category breakdown
head(EP[EP$QNo==3, ])

Today I want to produce a plot of the 6 category breakdown, so I reduce the data and do some checks:

1. There might be some inconsistency between states in the numbering of the questions. There should be 6 categories for each state.

2. This year's exit polls were conducted in 31 states. In addition, the reduced dataset should contain the nationwide data. So I expect 32 "states" in the newly created VbA dataset.

Both checks can easily be implemented with the daply function from Hadley's plyr package:

VbA <- EP[EP$QNo==3, ]
unique(daply(VbA, .(state), nrow)) == 6
length(daply(VbA, .(state), nrow)) == 32
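
The same two checks can also be sketched in base R, without plyr. This is only an illustration: the data.frame "toy" and its two made-up states are assumptions standing in for VbA, and the age labels are placeholders. Wrapping the comparison in all() yields a single TRUE or FALSE even if the row counts differ between states.

```r
# Toy stand-in for VbA (hypothetical layout): two "states",
# six age categories each. The real VbA should have 32 "states".
toy <- data.frame(
  state  = rep(c("AZ", "CO"), each = 6),
  answer = rep(c("18-24", "25-29", "30-39", "40-49", "50-64", "65+"),
               times = 2),
  stringsAsFactors = FALSE
)

# Number of rows (age categories) per state
rows_per_state <- table(toy$state)

# Check 1: every state has exactly 6 categories
stopifnot(all(rows_per_state == 6))

# Check 2: the expected number of states (2 in this toy, 32 in the post)
stopifnot(length(rows_per_state) == 2)
```

Because stopifnot() aborts with an error when a check fails, a broken assumption cannot slip through unnoticed the way a printed FALSE might.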

The plot needs the data to be in "long format". I let Hadley's melt function (from the reshape package) do the job. Then I remove all candidates except Obama and Romney.

vba <- melt(VbA, id = c("state", "answer"), variable_name = "Candidate")
unique(vba$Candidate)
# we're only interested in Obama and Romney
vba <- vba[vba$Candidate %in% c("Obama", "Romney"), ]
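
To make "long format" concrete, here is a minimal base R sketch of the reshaping step. It is not the melt call itself: the data.frame "wide", the state "XX" and all the numbers are made up for illustration. Only the two candidate columns are stacked, so the value column stays numeric.

```r
# Toy wide-format data (all values are made up for illustration)
wide <- data.frame(
  state  = c("XX", "XX"),
  answer = c("18-24", "65+"),
  Obama  = c(60, 40),
  Romney = c(40, 60),
  stringsAsFactors = FALSE
)

# Columns to stack; only numeric columns are listed here
measure <- c("Obama", "Romney")

# Long format: one row per state, answer and candidate
long <- data.frame(
  state     = rep(wide$state,  times = length(measure)),
  answer    = rep(wide$answer, times = length(measure)),
  Candidate = rep(measure, each = nrow(wide)),
  value     = unlist(wide[measure], use.names = FALSE),
  stringsAsFactors = FALSE
)

long
```

Naming the columns to stack explicitly is one possible way to keep non-numeric columns out of the value column.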

Finally the plot can be created. Initially the plot was a mess with garbled and unreadable text elements. I'm indebted to the people over at is.R() for their most valuable hints that helped me arrive at a readable plot.


But before plotting there's a fix to be applied. In the melted vba data.frame the candidates' numbers were no longer numeric: the melt call had turned them into character values. For some reason I have yet to look into, this made the NAs appear as peaks, with both candidates showing roughly the same value of about 70. (Thanks to lemonlaug, whose comment alerted me to the absurdity in the original plot.)

Now to the fix. It's as simple as this:

vba$value <- as.numeric(vba$value)
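
A word of caution, sketched with toy vectors: as.numeric works as intended on character values, but if a column ever ends up as a factor, it returns the internal level codes instead of the numbers. The helper to_number below is a hypothetical name, not part of any package; going through as.character first is safe in both cases.

```r
# Toy values (made up): the same numbers as character and as factor
chr <- c("52", "47", NA)
fct <- factor(chr)

as.numeric(chr)   # 52 47 NA  (what we want)
as.numeric(fct)   #  2  1 NA  (level codes, not the numbers!)

# Hypothetical helper: convert via character, which works
# for character vectors and factors alike
to_number <- function(x) as.numeric(as.character(x))

to_number(chr)    # 52 47 NA
to_number(fct)    # 52 47 NA
```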

Here's the final code snippet:

png(file = "VbA2012.png", width = 960, height = 960)
ggplot(vba, aes(answer, value)) +
  geom_line(aes(group = Candidate, color = Candidate)) +
  facet_wrap(~ state, ncol = 4) +
  labs(title = "2012 Presidential Vote by Age\n",
    y = "Percentage\n",
    x = "Age group\n"
  ) +
  theme(axis.text.x = element_text(colour = "black",
          size = 9,
          angle = 45,
          vjust = 1,
          hjust = 1),
        axis.text.y = element_text(colour = "black",
          size = 9,
          angle = 0,
          vjust = 1,
          hjust = 1)
  ) +
  # value is numeric after the fix above, so a continuous y scale is needed
  scale_y_continuous(breaks = c(30, 50, 70)) +
  scale_colour_manual(values =  c("darkblue", "darkred"))
dev.off()

2 comments:

  1. Is something wrong here? I'd expect all of these to have a roughly symmetrical look (as in the example of AZ). How can both parties have 70% of a particular age group (as in the example of CO)?

    Replies
    1. Thank you, lemonlaug, for taking a closer look. (I should have done that!)

      There's been a problem with the melt function call turning the numerical values into characters.

      I'll correct this in the plot and code snippets.
