Descriptive Statistics with multiple columns and multiple categories

Consider the dummy data:

head(df)

group   v1  v2  v3
1       3   9   7
1       4   7   6
2       10  9   1
2       12  2   2
2       15  9   10

I need to compute the mean of each column (v1, v2, v3) for each group.

I tried using by() with colMeans(), as follows:

mean.df = by(df[,2:4],df$group,colMeans)

It works just fine, but it relies on a "column version" existing for every function I want to apply to my data. That is, when I need to compute the standard deviation or the interquartile range (IQR), for example, there are no colSds or colIQR functions, so just replacing colMeans doesn't do the trick.

I could use a "for" loop like the one below, but I'd prefer to do this without loops:

mean.df = data.frame(group = 1:2)
for (i in 2:ncol(df)) {
  mean.df[,i] = tapply(df[,i],df$group,mean)
}

This way I can plug in any descriptive statistics function and it returns the desired output:

> mean.df
  group       V2       V3       V4
1     1  3.50000 8.000000 6.500000
2     2 12.33333 6.666667 4.333333

Is there any better way to do this without using loops or relying on column-wise functions?

Thanks in advance

Answers


The function aggregate can be used to apply a function to multiple columns based on a grouping variable:

> aggregate(. ~ group, df, mean)
  group       v1       v2       v3
1     1  3.50000 8.000000 6.500000
2     2 12.33333 6.666667 4.333333

> aggregate(. ~ group, df, sd)
  group        v1       v2        v3
1     1 0.7071068 1.414214 0.7071068
2     2 2.5166115 4.041452 4.9328829

> aggregate(. ~ group, df, IQR)
  group  v1  v2  v3
1     1 0.5 1.0 0.5
2     2 2.5 3.5 4.5
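
To make the calls above easy to reproduce, here is a minimal sketch that rebuilds the example data and wraps the aggregate call in a small helper. The name stat_by_group is hypothetical (it is not part of base R), and the data frame is assumed to contain only the five rows shown in the question.

```r
# Rebuild the example data (assumption: these five rows are the whole data set)
df <- data.frame(
  group = c(1, 1, 2, 2, 2),
  v1    = c(3, 4, 10, 12, 15),
  v2    = c(9, 7, 9, 2, 9),
  v3    = c(7, 6, 1, 2, 10)
)

# Hypothetical helper: apply one summary function to every non-grouping column.
# Extra arguments in ... are forwarded to the summary function.
stat_by_group <- function(data, fun, ...) {
  aggregate(. ~ group, data = data, FUN = fun, ...)
}

means <- stat_by_group(df, mean)
sds   <- stat_by_group(df, sd)
iqrs  <- stat_by_group(df, IQR)
```

Any function that returns a single number per column can be passed the same way. Note that the formula interface of aggregate drops rows containing NA by default (na.action = na.omit), which may matter for real data.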

Another option is the data.table package:

> library(data.table)
> DT <- as.data.table(df)
> DT[ , lapply(.SD, mean), by = group]
   group       v1       v2       v3
1:     1  3.50000 8.000000 6.500000
2:     2 12.33333 6.666667 4.333333
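
The same pattern should carry over to other statistics by swapping the function passed to lapply. The sketch below assumes the data.table package is installed and that df holds only the five rows from the question; the .SDcols selection is just an illustration of restricting the summarised columns.

```r
library(data.table)

# Rebuild the example data (assumption: these five rows are the whole data set)
df <- data.frame(
  group = c(1, 1, 2, 2, 2),
  v1    = c(3, 4, 10, 12, 15),
  v2    = c(9, 7, 9, 2, 9),
  v3    = c(7, 6, 1, 2, 10)
)
DT <- as.data.table(df)

# Standard deviation of every non-grouping column, per group
sd_dt <- DT[, lapply(.SD, sd), by = group]

# Interquartile range per group, restricted to v1 and v3 via .SDcols
iqr_dt <- DT[, lapply(.SD, IQR), by = group, .SDcols = c("v1", "v3")]
```

Here .SD stands for the subset of the data for each group, so lapply applies the function to each column of that subset.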
