Open many files and plot colour by dataset in R

I am a newbie to R and I am sure this is simple, but I am not sure what terms to search for.

I have a series of data files in directories, each with the same format (tab separated, each has X_DATA and Y_DATA columns). I want to open them up and plot them with ggplot2, with each dataset having a different colour. I have tried the following, but it returns an error (also below).

Script:

require(ggplot2)

# ---
# open files
d_files <- list.files(pattern = '*.dat', recursive=TRUE)
d_list <- lapply(d_files, read.csv, sep = "\t")

## ---
# add a name attribute
for (i in seq_along(d_list)) attr(d_list[[i]], 'Name') <- d_files[[i]]

# ---
# join all of the files into a single dataset
d <- do.call('rbind', d_list)

# ---
# plot
p <- ggplot(data=d, aes(x=X_DATA, y=Y_DATA, colour=Name)) + geom_point()
ggsave(p, file="test.pdf", width=8, height=4.5)

Output:

Loading required package: ggplot2
Loading required package: methods
Error in eval(expr, envir, enclos) : object 'Name' not found
Calls: ggsave ... sapply -> lapply -> eval.quoted -> lapply -> FUN -> eval
Execution halted

EDIT:

Here is a Python script to generate some data:

from random import uniform
N = 100  # entries per file
M = 3  # number of files
for i in range(M):
    with open('%i.dat' % (i + 1), 'w') as f:
        f.write('X_DATA\tY_DATA\n')
        f.write('\n'.join((('%g\t%g' % (x, x ** (i + 1))) for x in (uniform(0,1) for j in range(N)))))

This R script shows what I would like. However, here I have to type out every file explicitly, which is not practical for the working version.

# ---
# read each file and assign the dataset a name
d1 <- read.csv('1.dat', sep='\t')
d1$Name = '1.dat'
d2 <- read.csv('2.dat', sep='\t')
d2$Name = '2.dat'
d3 <- read.csv('3.dat', sep='\t')
d3$Name = '3.dat'

# ---
# combine datasets
d <- rbind(d1, d2, d3)

# ---
# plot
p <- ggplot(data=d, aes(x=X_DATA, y=Y_DATA, colour=Name)) + geom_point()
ggsave(p, file="test.pdf", width=8, height=4.5)

In the original script, replacing the attr() line with the following works: for (i in seq_along(d_list)) d_list[[i]][['Name']] <- d_files[[i]]. But does this not mean that I now have a text element called Name for every data point? This strikes me as not ideal...

Answers


This is total guessology because you haven't given us a reproducible example.

Let's read the first bit of the error message. Always a good idea:

Error in eval(expr, envir, enclos) : object 'Name' not found

Now, Name appears twice in your script:

for (i in seq_along(d_list)) attr(d_list[[i]], 'Name') <- d_files[[i]]

and

p <- ggplot(data=d, aes(x=X_DATA, y=Y_DATA, colour=Name)) + geom_point()

And I suspect what you are trying to do is add a Name column to the data frame (Comment your code!!!). That is not what attr() does. In R, attributes are little additional bits of metadata you can stick on objects; they are used for things like dimensions, row names, and column names, but not for the actual data itself. ggplot looks for Name among the columns of d (and then in the calling environment), finds nothing, and hence the error.
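To make the distinction concrete, here is a minimal base-R sketch. The data frame a is a made-up stand-in for one of the files, not the asker's data.

```r
# Hypothetical stand-in for one of the data files
a <- data.frame(X_DATA = 1:2, Y_DATA = c(0.1, 0.2))

# attr() attaches metadata to the object; no column is created
attr(a, "Name") <- "1.dat"
has_column_after_attr <- "Name" %in% names(a)    # FALSE

# [[<- creates a real column; the single string is recycled to every row
a[["Name"]] <- "1.dat"
has_column_after_assign <- "Name" %in% names(a)  # TRUE
```

Only the second form gives ggplot a variable it can map to colour.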

I think in your loop, do:

 for (i in seq_along(d_list)) d_list[[i]][['Name']] <- d_files[[i]]

to add a Name column.
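Put together, the whole read-tag-combine step might look like the sketch below. It is only an illustration under stated assumptions: demo_dir and the three generated files are hypothetical, read.delim() is swapped in for read.csv(sep = "\t"), and the pattern is written as a regular expression because that is what list.files() expects. Converting Name to a factor is one way to ease the worry about a text value on every row, since a factor keeps an integer code per row and stores each label once as a level.

```r
# Hypothetical demo data: three small tab-separated files in a temp directory
demo_dir <- tempfile("dat_demo")
dir.create(demo_dir)
for (i in 1:3) {
  x <- seq(0, 1, length.out = 5)
  write.table(data.frame(X_DATA = x, Y_DATA = x^i),
              file.path(demo_dir, paste0(i, ".dat")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# list.files() takes a regular expression, so match a literal ".dat" suffix
d_files <- list.files(demo_dir, pattern = "\\.dat$", recursive = TRUE)

# Read each file and tag its rows with the file it came from
d_list <- lapply(d_files, function(f) {
  df <- read.delim(file.path(demo_dir, f))
  df$Name <- f
  df
})

# Stack into one long data frame and store the labels as a factor
d <- do.call(rbind, d_list)
d$Name <- factor(d$Name)

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = X_DATA, y = Y_DATA, colour = Name)) + geom_point()
}
```

One long data frame with a grouping column is the shape ggplot2 is designed around, so the extra column is usually the path of least resistance.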

A way that doesn't involve adding Name to every data frame is this:

ggplot()+geom_point(aes(x=X_DATA,y=Y_DATA,col=d_files[[1]]),d_list[[1]]) +
        geom_point(aes(x=X_DATA,y=Y_DATA,col=d_files[[2]]),d_list[[2]]) +
        geom_point(aes(x=X_DATA,y=Y_DATA,col=d_files[[3]]),d_list[[3]])

But try as I might I can't get this in a loop. Raaaage.

This almost works:

plots = lapply(1:3,function(i)
    {geom_point(aes(x=X_DATA,y=Y_DATA,col=d_files[[i]]),d_list[[i]])}
)
Reduce("+",plots,init=ggplot())

But it fails because ggplot evaluates d_list[[i]] at the time the geom is created, but evaluates d_files[[i]] only when the plot is drawn. So you'll see the points from all three sets, but they all seem to come from i=3. If you set i=2 and re-run the Reduce call, you'll see them all seeming to come from the second dataset.

There's probably a way round that that doesn't involve making a character string and evalling it. Will call a guru...
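One possible way round, offered as a sketch rather than the definitive fix: newer versions of ggplot2 (3.0.0 and later, which is an assumption about your installation) support the !! unquoting operator inside aes(), which inlines the value of d_files[[i]] when the layer is created instead of when the plot is drawn. The d_files and d_list below are made-up stand-ins.

```r
# Assumption: ggplot2 >= 3.0.0 is installed (needed for !! inside aes())
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Hypothetical stand-ins for d_files and d_list
  d_files <- c("1.dat", "2.dat", "3.dat")
  x <- seq(0, 1, length.out = 5)
  d_list <- lapply(1:3, function(i) data.frame(X_DATA = x, Y_DATA = x^i))

  # One layer per dataset; !! inlines the file name as a constant right away,
  # so the colour label no longer depends on the value of i at draw time
  layers <- lapply(seq_along(d_list), function(i) {
    geom_point(aes(x = X_DATA, y = Y_DATA, colour = !!d_files[[i]]),
               data = d_list[[i]])
  })

  # Adding a list to a plot adds each layer in turn
  p <- ggplot() + layers
}
```

This keeps the data frames free of a Name column, at the cost of one layer per file.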

