R + ggplot : Time series with events

I'm an R/ggplot newbie. I would like to create a geom_line plot of a continuous variable time series and then add a layer composed of events. The continuous variable and its timestamps is stored in one data.frame, the events and their timestamps are stored in another data.frame.

What I would really like to do is something like the charts on finance.google.com. In those, the time series is stock-price and there are "flags" to indicate news-events. I'm not actually plotting finance stuff, but the type of graph is similar. I am trying to plot visualizations of log file data. Here's an example of what I mean...

If advisable (?), I would like to use separate data.frames for each layer (one for continuous variable observations, another for events).

After some trial and error this is about as close as I can get. Here, I am using example data from data sets that come with ggplot. "economics" contains some time-series data that I'd like to plot and "presidential" contains a few events (presidential elections).

library(ggplot2)
data(presidential)
data(economics)

presidential <- presidential[-(1:3),]
yrng <- range(economics$unemploy)
ymin <- yrng[1]
ymax <- yrng[1] + 0.1*(yrng[2]-yrng[1])

p2 <- ggplot()
p2 <- p2 + geom_line(mapping=aes(x=date, y=unemploy), data=economics , size=3, alpha=0.5) 
p2 <- p2 + scale_x_date("time") +  scale_y_continuous(name="unemployed [1000's]")
p2 <- p2 + geom_segment(mapping=aes(x=start,y=ymin, xend=start, yend=ymax, colour=name), data=presidential, size=2, alpha=0.5)
p2 <- p2 + geom_point(mapping=aes(x=start,y=ymax, colour=name ), data=presidential, size=3) 
p2 <- p2 + geom_text(mapping=aes(x=start, y=ymax, label=name, angle=20, hjust=-0.1, vjust=0.1),size=6, data=presidential)
p2

Questions:

  • This is OK for very sparse events, but if there's a cluster of them (as often happens in a log file), it gets messy. Is there some technique I can use to neatly display a bunch of events occurring in a short time interval? I was thinking of position_jitter, but it was really hard for me to get this far. google charts stacks these event "flags" on top of each other if there's a lot of them.

  • I actually don't like sticking the event data in the same scale as the continuous measurement display. I would prefer to put it in a facet_grid. The problem is that the facets all must be sourced from the same data.frame (not sure if that's true). If so, that also seems not ideal (or maybe I'm just trying to avoid using reshape?)

Answers


As much as I like @JD Long's answer, I'll put one that is just in R/ggplot2.

The approach is to create a second data set of events and to use that to determine positions. Starting with what @Angelo had:

library(ggplot2)
data(presidential)
data(economics)

Pull out the event (presidential) data, and transform it. Compute baseline and offset as fractions of the economic data it will be plotted with. Set the bottom (ymin) to the baseline. This is where the tricky part comes. We need to be able to stagger labels if they are too close together. So determine the spacing between adjacent labels (assumes that the events are sorted). If it is less than some amount (I picked about 4 years for this scale of data), then note that that label needs to be higher. But it has to be higher than the one after it, so use rle to get the length of TRUE's (that is, must be higher) and compute an offset vector using that (each string of TRUE must count down from its length to 2, the FALSEs are just at an offset of 1). Use this to determine the top of the bars (ymax).

events <- presidential[-(1:3),]
baseline = min(economics$unemploy)
delta = 0.05 * diff(range(economics$unemploy))
events$ymin = baseline
events$timelapse = c(diff(events$start),Inf)
events$bump = events$timelapse < 4*370 # ~4 years
offsets <- rle(events$bump)
events$offset <- unlist(mapply(function(l,v) {if(v){(l:1)+1}else{rep(1,l)}}, l=offsets$lengths, v=offsets$values, USE.NAMES=FALSE))
events$ymax <- events$ymin + events$offset * delta

Putting this together into a plot:

ggplot() +
    geom_line(mapping=aes(x=date, y=unemploy), data=economics , size=3, alpha=0.5) +
    geom_segment(data = events, mapping=aes(x=start, y=ymin, xend=start, yend=ymax)) +
    geom_point(data = events, mapping=aes(x=start,y=ymax), size=3) +
    geom_text(data = events, mapping=aes(x=start, y=ymax, label=name), hjust=-0.1, vjust=0.1, size=6) +
    scale_x_date("time") +  
    scale_y_continuous(name="unemployed \[1000's\]")

You could facet, but it is tricky with different scales. Another approach is composing two graphs. There is some extra fiddling that has to be done to make sure the plots have the same x-range, to make the labels all fit in the lower plot, and to eliminate the x axis in the upper plot.

xrange = range(c(economics$date, events$start))

p1 <- ggplot(data=economics, mapping=aes(x=date, y=unemploy)) +
    geom_line(size=3, alpha=0.5) +
    scale_x_date("", limits=xrange) +  
    scale_y_continuous(name="unemployed [1000's]") +
    opts(axis.text.x = theme_blank(), axis.title.x = theme_blank())

ylims <- c(0, (max(events$offset)+1)*delta) + baseline
p2 <- ggplot(data = events, mapping=aes(x=start)) +
    geom_segment(mapping=aes(y=ymin, xend=start, yend=ymax)) +
    geom_point(mapping=aes(y=ymax), size=3) +
    geom_text(mapping=aes(y=ymax, label=name), hjust=-0.1, vjust=0.1, size=6) +
    scale_x_date("time", limits=xrange) +
    scale_y_continuous("", breaks=NA, limits=ylims)

#install.packages("ggExtra", repos="http://R-Forge.R-project.org")
library(ggExtra)

align.plots(p1, p2, heights=c(3,1))


Now I like ggplot as much as the next guy, but if you want to make the Google Finance type charts, why not just do it with the Google graphics API?!? You're going to love this:

install.packages("googleVis")
library(googleVis)

dates <- seq(as.Date("2011/1/1"), as.Date("2011/12/31"), "days")
happiness <- rnorm(365)^ 2
happiness[333:365] <- happiness[333:365]  * 3 + 20
Title <- NA
Annotation <- NA
df <- data.frame(dates, happiness, Title, Annotation)
df$Title[333] <- "Discovers Google Viz"
df$Annotation[333] <- "Google Viz API interface by Markus Gesmann causes acute increases in happiness."

### Everything above here is just for making up data ### 
## from here down is the actual graphics bits        ###
AnnoTimeLine  <- gvisAnnotatedTimeLine(df, datevar="dates",
                                       numvar="happiness", 
                                       titlevar="Title", annotationvar="Annotation",
                                       options=list(displayAnnotations=TRUE,
                                                    legendPosition='newRow',
                                                    width=600, height=300)
                                       )
# Display chart
plot(AnnoTimeLine) 
# Create Google Gadget
cat(createGoogleGadget(AnnoTimeLine), file="annotimeline.xml")

and it produces this fantastic chart:


Need Your Help

Removing Specific Lines From Contents of a Memo field

database delphi paradox

How would I be able to use Delphi to remove data from a memo field that comes after a certain string, for example the data in the database I'm going through it displayed as such:

How to fire off an AJAX request and redirect AFTER making the request?

php javascript jquery

I am making use of JQuery to fire off an AJAX request. As soon as that PHP script is initiated I want to redirect my browser to a different page. But the AJAX request must have been fired off