# How can I check whether data has an equal number of observations per group?

I'm writing some code where I need to check whether all group sizes for a given input of data are equal. For example, suppose I wanted to know whether the "mpg" dataset (in the ggplot2 package) has:

- Equal numbers of cars for every manufacturer
- Equal numbers of cars for each type of drive (4-wheel, front-wheel, rear-wheel)
- Equal numbers of cars for each engine type (4-cylinder, 6-cylinder, 8-cylinder)

For data like mpg, some of those questions can be answered by inspecting the summary output:

    library(ggplot2)  # contains the mpg dataset

    # shows the breakdown of cars by drive type,
    # which we can verify is unequal
    # (the column is named drv; factor() is needed because drv is stored as character)
    summary(factor(mpg$drv))

But I feel like I'm missing an easy way to check whether group sizes are equal. Is there some single, mythical function I can call like are.groups.of.equal.size(x)? Or another base function (or composition of them) that would return such information?

## Answers

As Joran said, we could invent hundreds of ways to do this between now and Christmas. I smell a microbenchmark challenge:

    are.groups.of.equal.size <- function(x) {
        y <- rle(as.character(sort(x)))$lengths
        all(y %in% mean(y))
    }

    are.groups.of.equal.size(c(3, 3, 3))
    are.groups.of.equal.size(mtcars$cyl)
    are.groups.of.equal.size(CO2$Plant)
    are.groups.of.equal.size(mtcars$carb)

Here is one way of doing it:

    are.groups.of.equal.size <- function(x) length(unique(table(x))) == 1L

    are.groups.of.equal.size(mpg$manufacturer)
    # [1] FALSE
    are.groups.of.equal.size(mpg$drv)
    # [1] FALSE
    are.groups.of.equal.size(mpg$year)
    # [1] TRUE

Note that, if needed, table has arguments (useNA and exclude) that control how NAs in your data are handled.
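To make the NA point concrete, here is a minimal sketch of the same idea with the NA handling exposed. The function name `are_groups_equal_na` and its `count_na` argument are made up for illustration; only `table()` and its `useNA` argument come from base R.

```r
# Hypothetical variant of the table()-based check; `count_na` is an
# illustrative argument, not part of the answer above.
are_groups_equal_na <- function(x, count_na = FALSE) {
  # useNA = "no" drops NAs; useNA = "ifany" counts them as their own group
  counts <- table(x, useNA = if (count_na) "ifany" else "no")
  length(unique(as.vector(counts))) == 1L
}

x <- c("a", "a", "b", "b", NA)

are_groups_equal_na(x)                   # NA dropped: a = 2, b = 2
# [1] TRUE
are_groups_equal_na(x, count_na = TRUE)  # NA counted as a group of size 1
# [1] FALSE
```

Whether a missing value should count as its own group depends on what "group" means for your data, so the default here (dropping NAs, as `table()` itself does) is only one reasonable choice.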

Using the sd approach:

    are.groups.of.equal.size <- function(x) {
        x2 <- tapply(x, x, length)
        # sd() of a single count is NA, so the length check covers the one-group case
        sd(x2) == 0 | length(x2) == 1
    }

    are.groups.of.equal.size(c(3, 3, 3))
    are.groups.of.equal.size(mtcars$cyl)
    are.groups.of.equal.size(CO2$Plant)
    are.groups.of.equal.size(mtcars$carb)
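Picking up the microbenchmark idea from the first answer, here is a rough sketch that checks the three versions agree and then times them with base R only. The names `rle_version`, `table_version`, and `sd_version` are labels invented here so the three functions can coexist; the test vector `big` and the repetition count are arbitrary assumptions, and no particular timing result is implied.

```r
# Illustrative labels for the three answers; the bodies follow the answers as posted.
rle_version <- function(x) {
  y <- rle(as.character(sort(x)))$lengths
  all(y %in% mean(y))
}
table_version <- function(x) length(unique(table(x))) == 1L
sd_version <- function(x) {
  x2 <- tapply(x, x, length)
  sd(x2) == 0 | length(x2) == 1
}

versions <- list(rle = rle_version, table = table_version, sd = sd_version)

# Check that the three versions agree on a few built-in vectors.
inputs <- list(c(3, 3, 3), mtcars$cyl, CO2$Plant, mtcars$carb)
results <- sapply(versions, function(f) sapply(inputs, f))
print(results)

# Rough timing with base R: elapsed seconds for 5 calls on a larger vector.
big <- sample(letters, 1e5, replace = TRUE)
timings <- sapply(versions, function(f) {
  system.time(for (i in 1:5) f(big))[["elapsed"]]
})
print(timings)
```

For finer resolution, the `microbenchmark` package (a separate install) could replace the `system.time()` loop; the agreement check is the part worth keeping either way.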