# comparing two integers in R: “longer object length not multiple of shorter object length” ddply

I'm getting a "longer object length is not a multiple of shorter object length" warning in R when comparing two integers to subset a dataframe inside a user-defined function.

The user defined function just returns the median of a subset of integers taken from a dataframe:

    medianfuncEDB <- function(s) {
      # Median of absStudentDeviation over the EDB rows whose validSession equals s
      median(subset(EDB, as.integer(validSession) == as.integer(s))$absStudentDeviation)
    }

(I did not originally have the as.integer coercions in there. I added them to debug and test, and I'm still getting the warning.)

The specific warning I'm getting is:

    In as.integer(validSession) == as.integer(s) :
      longer object length is not a multiple of shorter object length

I get this warning over 50 times when calling:

    mediandf <- ddply(mediandf, .(validSession), transform,
                      grossMed2 = medianfuncEDB(as.integer(validSession)))

The goal is to calculate the median of absStudentDeviation associated with each validSession in the large dataframe EDB and attach the result to mediandf as a new column.

I have actually double-checked that all values for validSession in both the mediandf dataframe and the EDB dataframe are integers by subsetting with is.integer(validSession).

Furthermore, the command appears to do what I intend: I get a new column in my dataframe, although I have not verified its values. I still want to understand the warning. If medianfuncEDB is being called with an integer as its input, why am I getting "longer object length is not a multiple of shorter object length" when s == validSession is evaluated?

Note that simple function calls, like medianfuncEDB(5), work without any problems, so why do I get warnings when using ddply?

EDIT: I found the problem with the help of Joran's comment. I did not know that transform feeds entire vectors into the function. Using validSession[1] instead gave no warnings.

## Answers

The ddply function already subsets your data frame by validSession. Hence transform is only fed a data frame with all the rows corresponding to a particular validSession.

That is, transform is already being fed subset(mediandf,validSession==s) for each s in unique(mediandf$validSession).
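This is also where the warning comes from: inside transform, validSession is the whole column for that group, so the function receives a vector rather than a single integer. A minimal base R sketch of the effect follows. The names edb_session and s_vector and their values are made up for illustration; they stand in for EDB$validSession and for one group's validSession column.

```r
# Hypothetical stand-in for EDB$validSession (7 values).
edb_session <- c(1L, 1L, 2L, 2L, 2L, 3L, 3L)

# Hypothetical stand-in for what transform passes for one group:
# the group's whole validSession column (length 2), not a scalar.
s_vector <- c(2L, 2L)

# 7 is not a multiple of 2, so recycling s_vector triggers the warning.
# tryCatch captures the warning text instead of printing it.
msg <- tryCatch(
  {
    edb_session == s_vector
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)

# Comparing against a single value recycles cleanly and does not warn.
cmp_scalar <- edb_session == s_vector[1]

# Every element of s_vector is the same, so the recycled comparison
# still produces the same logical vector as the scalar comparison.
cmp_vector <- suppressWarnings(edb_session == s_vector)
same_result <- identical(cmp_vector, cmp_scalar)
```

Because all values within a group are equal, the recycled comparison selects the same rows as the scalar one, which would explain why the new column still looks right despite the warnings.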

Since you don't have to do any subsetting (ddply takes care of that), all you need to do is:

    ddply(mediandf, .(validSession), transform, grossMed2 = median(absStudentDeviation))

And then you'll get mediandf back out with a new column grossMed2 with the value you want (so it will be the same value within each unique validSession).
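As a rough illustration of that call, here is a sketch on a toy data frame. The values are invented, and it assumes that mediandf itself contains an absStudentDeviation column, which the question does not confirm. It also assumes the plyr package is installed.

```r
library(plyr)

# Toy data frame; the values are made up for illustration only.
mediandf <- data.frame(
  validSession = c(1L, 1L, 1L, 2L, 2L),
  absStudentDeviation = c(0.5, 1.5, 4.0, 2.0, 3.0)
)

# ddply splits by validSession; transform then sees one group at a time,
# so median() is computed over that group's rows only and recycled
# across the group's rows in the new column.
result <- ddply(mediandf, .(validSession), transform,
                grossMed2 = median(absStudentDeviation))

print(result)
```

Each row in session 1 gets the median of 0.5, 1.5, and 4.0, and each row in session 2 gets the median of 2.0 and 3.0.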