Unsure how to pass a column variable to mean() in R

I'm a newbie, trying to write a program. I have a multi-column data frame and I want to calculate the mean of one of its columns. I want to pass the name of the column to use for the calculation to the mean() function. I have tried passing it a character string that contains the $ symbol, but it seems R doesn't accept a character string there and wants a logical or numeric value when $ is used to specify the column name. In short, I'm stuck. Is there another way to do this? Any suggestions would be appreciated. Code and results are below.

> ## df.final is the name of the dataframe

> car.type        <- "ford"
> col.name        <- paste("df.final","$", car.type, sep = "")

> print(col.name)
[1] "df.final$ford"

> mean(col.name, na.rm = TRUE)
[1] NA
Warning message:
In mean.default(col.name, na.rm = TRUE) :
argument is not numeric or logical: returning NA

> mean(df.final$ford, na.rm = TRUE)
[1] 3.14

Answers


(df.final <- data.frame(ford = sample(0:100, 5), toyota = sample(0:50, 5)))
#   ford toyota
# 1   42      5
# 2   30     46
# 3   45     29
# 4   69     48
# 5   18     14
col.name <- paste("df.final", "$", "ford", sep = "")
col.name
# [1] "df.final$ford"
typeof(col.name)
# [1] "character"

Currently, col.name is a character vector, so taking its mean makes no sense. Let's parse it into an expression:

temp <- parse(text = col.name)
temp
# expression(df.final$ford)
typeof(temp)
# [1] "expression"
mean(temp)
# [1] NA
# Warning message:
# In mean.default(temp) : argument is not numeric or logical: returning NA

Hmm. R still isn't happy, because taking the mean of an expression doesn't make sense either. Let's evaluate our expression.

temp <- eval(parse(text = col.name))
temp
# [1] 42 30 45 69 18
typeof(temp)
# [1] "integer"
mean(temp)
# [1] 40.8

Much better. So mean(eval(parse(text = col.name)), na.rm = TRUE) does the trick for your example. You might also check out the useful function do.call (see ?do.call):

do.call(mean, args = list(x = temp, na.rm = TRUE))
# [1] 40.8
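If you do go the parse route, a slightly more direct variant is a sketch like the one below, assuming R 3.6.0 or later, where str2lang() is available: it parses a single string into a call, which eval() then evaluates. The small df.final here is made-up illustration data, not the asker's.

```r
# Made-up illustration data (not the asker's real df.final)
df.final <- data.frame(ford = c(1, 2, NA), toyota = c(3, 2, 1))
car.type <- "ford"
col.name <- paste("df.final", "$", car.type, sep = "")

# str2lang() turns one string into a call object (requires R >= 3.6.0)
expr <- str2lang(col.name)
print(expr)

# eval() runs the call in the current environment, returning the ford column
ford.mean <- mean(eval(expr), na.rm = TRUE)
print(ford.mean)
```

This behaves like eval(parse(text = col.name)) for a single expression; the [ / [[ approach in the next answer avoids building code from strings altogether.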

You can use [ or [[ to access columns by name:

df.final <- data.frame(ford=c(1, 2, NA), toyota=c(3, 2, 1))
car.type <- "ford"
mean(df.final[,car.type], na.rm=TRUE)
# [1] 1.5
mean(df.final[[car.type]], na.rm=TRUE)
# [1] 1.5
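The reason the $ attempt in the question can't work directly is that $ does not evaluate its right-hand side: df.final$car.type looks for a column literally named "car.type" (and returns NULL here), whereas [[ evaluates car.type first and so finds the "ford" column. A minimal sketch, reusing the small data frame above; the vector car.types is a hypothetical example of computing several column means at once:

```r
df.final <- data.frame(ford = c(1, 2, NA), toyota = c(3, 2, 1))
car.type <- "ford"

# $ takes its right-hand side literally: there is no column named "car.type"
print(is.null(df.final$car.type))

# [[ evaluates car.type, so this selects the "ford" column
print(mean(df.final[[car.type]], na.rm = TRUE))

# Hypothetical vector of column names to summarise in one go
car.types <- c("ford", "toyota")
type.means <- vapply(car.types,
                     function(col) mean(df.final[[col]], na.rm = TRUE),
                     numeric(1))
print(type.means)

# colMeans() on the [-subset gives the same named result
print(colMeans(df.final[car.types], na.rm = TRUE))
```

vapply() names the result after the character vector it loops over, so each mean stays labelled with its column.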

Just to mention, you can also use eval() and parse():

> mean(eval(parse(text=col.name)), na.rm=TRUE)
[1] 1.5
