# Creating factors in R with differing numbers of replicates

The following command

```
region <- gl(6, 2, 24, labels = c("ag", "cb", "cx", "ec", "hp", "mb"))
```

creates a factor with the following structure:

```
structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 1L,
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L), .Label = c("ag",
"cb", "cx", "ec", "hp", "mb"), class = "factor")
```

But when I try to create it with a differing number of replicates per level, it goes wrong. For instance, when ag and cb have three replicates each, I would need something like this:

```
structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 1L,
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L), .Label = c("ag",
"cb", "cx", "ec", "hp", "mb"), class = "factor")
```

How should I write the following command now?

```
region <- gl(6, 2, 24, labels = c("ag", "cb", "cx", "ec", "hp", "mb"))
```

Do you need the elements in that exact order? If not, this will work:

```
factor(rep(c('ag', 'cb', 'cx', 'ec', 'hp', 'mb'), times = c(5, 6, 3, 4, 4, 4)))
```

If the order is important, adapting the code should be easy.
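As a rough sketch of such an adaptation, assuming the 26-element vector in the question is meant literally, the run lengths can be read off it and passed to rep together with the labels repeated once per block. The names lev, first_block and second_block are only illustrative.

```r
lev <- c("ag", "cb", "cx", "ec", "hp", "mb")

# Run lengths read off the vector in the question (assumption: that vector
# is taken literally, including its uneven first block).
first_block  <- c(3, 4, 1, 2, 2, 2)
second_block <- rep(2, 6)

# Repeat each label by the matching run length, block after block.
region <- factor(rep(c(lev, lev), times = c(first_block, second_block)),
                 levels = lev)
table(region)  # per-level totals, handy for checking against the times above
```

Passing levels explicitly keeps the level order fixed even if a label happened to be missing from the data.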

gl is simply a wrapper for rep.int, so you can call rep yourself:

```
l <- c("ag", "cb", "cx", "ec", "hp", "mb")
# I will presume you want the output to now be length 28 to account
# for the extra replications in the first two levels
factor(rep_len(rep.int(l, times = rep.int(c(3, 2), c(2, 4))), 28))
##  [1] ag ag ag cb cb cb cx cx ec ec hp hp mb mb
## [15] ag ag ag cb cb cb cx cx ec ec hp hp mb mb
## Levels: ag cb cx ec hp mb
```
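If this comes up often, the same idea could be wrapped in a small function. The following is only a sketch: uneven_gl is a hypothetical helper, not part of base R, and it assumes one replicate count per level with the whole block repeated a fixed number of times.

```r
# uneven_gl() is a hypothetical helper, not a base R function.
# labels: level names
# reps:   number of replicates of each level within one block
# blocks: how many times the whole block is repeated
uneven_gl <- function(labels, reps, blocks = 1L) {
  stopifnot(length(labels) == length(reps))
  # Integer codes for one block, e.g. 1 1 1 2 2 2 3 3 ...
  codes <- rep.int(seq_along(labels), times = reps)
  # Repeat the block and attach the labels as factor levels.
  factor(rep.int(codes, blocks), levels = seq_along(labels), labels = labels)
}

region <- uneven_gl(c("ag", "cb", "cx", "ec", "hp", "mb"),
                    reps = c(3, 3, 2, 2, 2, 2), blocks = 2)
```

Working with integer codes and attaching labels at the end mirrors how gl itself builds its result.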

If I understand your desired output correctly, you will have to do a bit of manual fiddling with rep, which is what gl uses internally to build its factors anyway:

```
region <- rep(c(rep(c("ag", "cb"), each = 3),
                rep(c("cx", "ec", "hp", "mb"), each = 2)),
              times = 2)
region <- as.factor(region)
region
# [1] ag ag ag cb cb cb cx cx ec ec hp hp mb mb ag ag ag cb cb cb cx cx ec ec hp hp mb mb
# Levels: ag cb cx ec hp mb
```

I would create a vector of numbers with the proper replicates using rep, and then convert it to a factor by specifying the labels:

```
vector <- c(rep(1:2, each = 3), rep(3:6, each = 2))

region <- factor(vector,
                 levels = 1:6,
                 labels = c("ag", "cb", "cx", "ec", "hp", "mb"))
```