R longitudinal data - Grouping by multiple factors

I am still trying to build a detailed time-series data frame: I want monthly totals for several measures, grouped by multiple factors. I'm not sure this is possible, as I haven't seen an example close to this in the documentation, the vignettes, or on SO.

Here is the sample data I am trying to structure:

clients <- 1:100
dates <- seq(as.Date("2012/1/1"), as.Date("2012/9/1"), "days")
categories <- LETTERS[1:5]
products <- data.frame(clientID = sample(clients, 10000, replace = TRUE), 
                       OrderDate = sample(dates, 10000, replace = TRUE), 
                       category = sample(categories, 10000, replace = TRUE),
                       numProducts = sample(1:10, 10000, replace = TRUE), 
                       OrderTotal = sample(1:100, 10000, replace = TRUE))

The output looks like this:

head(products)
  clientID  OrderDate category numProducts OrderTotal
1       90 2012-03-20        D           9         18
2       66 2012-08-19        A           3         50
3       45 2012-05-25        A          10         75
4       28 2012-01-01        D           4         27
5       71 2012-02-28        A           4         76
6       26 2012-01-28        C           8         89

The structure I am trying to get to would look something like this:

          Category A                                                                    ...   Category E
ClientID  Jan2012numProducts  Jan2012OrderTotal  Feb2012numProducts  Feb2012OrderTotal  ...  Sep2012numProducts  Sep2012OrderTotal
1         12                  78                 6                   52                      0                   0
2         7                   218                3                   15                      1                   28
...
99999     20                  192                10                  100                     28                  156

I realize that the column names will likely get long and would look something like AJan2012numProducts or AJan2012OrderTotal, and that's fine.

Here are the steps I'm unclear about. Again, I can't find them in the documentation or the vignettes:

1) Can zoo aggregate several observation fields at once? In this case, I want the monthly sums of both numProducts and OrderTotal. Even if zoo can't, I could aggregate them separately and use merge to join the results on clientID and category.

2) Can zoo group by a factor (or multiple factors) to perform the aggregation? I want to be able to look at clientID and category by month.

3) Can the data frame be laid out with category and month along the columns? If not, I could get the time-series data grouped by clientID and category and then make it wide with cast from the reshape package. For that I would need the data frame in this structure:

head(df)
clientID   Month     category    numProducts  OrderTotal
1        2012-01-31  A           12           78
1        2012-01-31  B           0            0
....
99999    2012-09-30  D           6            71
99999    2012-09-30  E           1            28



cast(df, Month ~ category, sum) (or something close to that)

Is any of this possible? Could you help with some examples?

Answers


A combination of format.Date, xtabs and ftable gets you pretty much exactly what you ask for. I shortened the example a bit, but the principle should be clear. If you wanted a shorter month field, you could rename that dimension of the table object, or you could make a month column and redo the work with that. (I admit I had trouble seeing how zoo would enter this picture; at the moment it looks like a simple tabulation problem. That said, I'm sure aggregate.zoo is capable of aggregating on multiple criteria with sum as the aggregation function.)
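On the aggregation side, base R's aggregate() (not zoo) can also sum several measures over several grouping factors in one call, which covers points 1) and 2) of the question. Below is a minimal sketch on a small made-up data set with the question's column names. The "%Y-%m" month key is my own assumption (it sorts chronologically), and expand.grid() plus merge() add the zero rows that aggregate() leaves out, giving roughly the long structure asked for in 3).

```r
# Small made-up order data with the same columns as `products` in the question
products <- data.frame(
  clientID    = c(1, 1, 1, 2, 2),
  OrderDate   = as.Date(c("2012-01-05", "2012-01-20", "2012-02-03",
                          "2012-01-11", "2012-02-28")),
  category    = c("A", "A", "B", "A", "A"),
  numProducts = c(2, 3, 1, 4, 5),
  OrderTotal  = c(10, 20, 5, 30, 40),
  stringsAsFactors = FALSE
)

# Month key that sorts chronologically, e.g. "2012-01" (an assumption)
products$Month <- format(products$OrderDate, "%Y-%m")

# Sum both measures at once, grouped by client, month and category
long <- aggregate(cbind(numProducts, OrderTotal) ~ clientID + Month + category,
                  data = products, FUN = sum)

# aggregate() only returns combinations that occur; add the missing ones as zeros
grid <- expand.grid(clientID = unique(products$clientID),
                    Month    = sort(unique(products$Month)),
                    category = sort(unique(products$category)),
                    stringsAsFactors = FALSE)
long <- merge(grid, long, all.x = TRUE)
long[is.na(long)] <- 0
long <- long[order(long$clientID, long$Month, long$category), ]
long
```

With real data this grid grows as clients x months x categories, which is the same number of cells the xtabs table below holds anyway.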

First the two commands, then a console session output:

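# Sum both measures by client, month label ("%b%Y", e.g. Jan2012) and category
prodtbl <- xtabs(cbind(numProducts, OrderTotal) ~ clientID + 
                                                 format(OrderDate, "%b%Y") + 
                                                 category, 
                 data=products)
# Flatten: category and client as rows, month and measure as columns
ftable(prodtbl, row.vars=c("category","clientID"))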

Now the output:

> xtabs(cbind(numProducts, OrderTotal) ~ clientID + format(OrderDate, "%b%Y")+category, data=products)
, , category = A,  = numProducts

        format(OrderDate, "%b%Y")
clientID Feb2012 Jan2012 Mar2012
       1      23       0      16
       2       0       6      27
       3      30       0      21
       4      13      33      24
       5       5      20      12

, , category = B,  = numProducts

        format(OrderDate, "%b%Y")
clientID Feb2012 Jan2012 Mar2012
       1       8      27      23
       2       8      14       4
       3       0       5       6
       4       8      13      39
       5       3      23       9

, , category = C,  = numProducts

        format(OrderDate, "%b%Y")
clientID Feb2012 Jan2012 Mar2012
       1       0       6      20
       2      20      20       4
       3       0      17       0
       4      17      11       2
       5       7       3       8

, , category = A,  = OrderTotal

        format(OrderDate, "%b%Y")
clientID Feb2012 Jan2012 Mar2012
       1      40       0      41
       2       0       5      33
       3      48       0      40
       4      16      28      24
       5      23      42      29

, , category = B,  = OrderTotal

        format(OrderDate, "%b%Y")
clientID Feb2012 Jan2012 Mar2012
       1      14      24      19
       2      22      19      19
       3       0       2       4
       4      19      46      62
       5      10      38      10

, , category = C,  = OrderTotal

        format(OrderDate, "%b%Y")
clientID Feb2012 Jan2012 Mar2012
       1       0       2      39
       2      30      33       7
       3       0      44       0
       4      50      21      19
       5      16      14      28
# You could have skipped the printout by assigning to 'prodtbl' in the step above.
# I thought it was useful pedagogically.

> prodtbl <- .Last.value

> ftable(prodtbl, row.vars=c("category","clientID"))
                  format(OrderDate, "%b%Y")     Feb2012                Jan2012                Mar2012           
                                            numProducts OrderTotal numProducts OrderTotal numProducts OrderTotal
category clientID                                                                                               
A        1                                           23         40           0          0          16         41
         2                                            0          0           6          5          27         33
         3                                           30         48           0          0          21         40
         4                                           13         16          33         28          24         24
         5                                            5         23          20         42          12         29
B        1                                            8         14          27         24          23         19
         2                                            8         22          14         19           4         19
         3                                            0          0           5          2           6          4
         4                                            8         19          13         46          39         62
         5                                            3         10          23         38           9         10
C        1                                            0          0           6          2          20         39
         2                                           20         30          20         33           4          7
         3                                            0          0          17         44           0          0
         4                                           17         50          11         21           2         19
         5                                            7         16           3         14           8         28
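If you want one row per client with columns named like AJan2012numProducts (as in the question), the 4-d xtabs result can also be flattened directly with aperm(). This is a sketch on made-up data with two deliberate deviations from the commands above: the month labels are built from month.abb instead of %b (whose output depends on the locale), and they are a factor whose levels follow calendar order, so Jan2012 no longer sorts after Feb2012 as it does in the output above.

```r
# Small made-up data with the same columns as `products`
products <- data.frame(
  clientID    = c(1, 1, 2, 2),
  OrderDate   = as.Date(c("2012-01-05", "2012-02-10", "2012-01-20", "2012-02-28")),
  category    = c("A", "B", "A", "A"),
  numProducts = c(2, 1, 4, 5),
  OrderTotal  = c(10, 5, 30, 40),
  stringsAsFactors = FALSE
)

# Month labels such as "Jan2012"; factor levels follow calendar order
ym  <- format(products$OrderDate, "%Y-%m")
lab <- paste0(month.abb[as.integer(format(products$OrderDate, "%m"))],
              format(products$OrderDate, "%Y"))
products$Month <- factor(lab, levels = unique(lab[order(ym)]))

# 4-d table: clientID x Month x category x measure (numProducts, OrderTotal)
tab <- xtabs(cbind(numProducts, OrderTotal) ~ clientID + Month + category,
             data = products)

# Put the measure right after clientID, then flatten to one row per client;
# the columns then vary by measure fastest, then month, then category
arr  <- aperm(tab, c(1, 4, 2, 3))
wide <- matrix(arr, nrow = dim(arr)[1])

# Build matching column names in the same order (expand.grid varies the first
# argument fastest)
dn   <- dimnames(arr)
cols <- expand.grid(measure = dn[[2]], month = dn[[3]], category = dn[[4]],
                    stringsAsFactors = FALSE)
colnames(wide) <- paste0(cols$category, cols$month, cols$measure)

wide <- data.frame(clientID = dn[[1]], wide)
wide
```

Client/category/month combinations with no orders show up as 0, because xtabs fills every cell of the table.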

This is the shortened example:

clients <- 1:5
dates <- seq(as.Date("2012/1/1"), as.Date("2012/3/31"), "days")
categories <- LETTERS[1:3]
products <- data.frame(clientID = sample(clients, 100, replace = TRUE), 
                       OrderDate = sample(dates, 100, replace = TRUE), 
                       category = sample(categories, 100, replace = TRUE),
                       numProducts = sample(1:10, 100, replace = TRUE), 
                       OrderTotal = sample(1:20, 100, replace = TRUE))
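If you'd rather start from a long data frame like the one sketched in the question, base R's reshape() can stand in for cast from the reshape package. A minimal sketch on a small made-up data frame; the combined category-plus-month key column and the "%Y-%m" month strings are my own assumptions, not part of the question:

```r
# Hypothetical long-format data shaped like the df sketched in the question
df <- data.frame(
  clientID    = c(1, 1, 2, 2),
  Month       = c("2012-01", "2012-01", "2012-01", "2012-02"),
  category    = c("A", "B", "A", "A"),
  numProducts = c(12, 0, 7, 3),
  OrderTotal  = c(78, 0, 218, 15),
  stringsAsFactors = FALSE
)

# Combine category and month into a single "time" key, e.g. "A2012-01"
df$key <- paste0(df$category, df$Month)

# One row per client; one numProducts.<key> and OrderTotal.<key> column per key
wide <- reshape(df[, c("clientID", "key", "numProducts", "OrderTotal")],
                idvar = "clientID", timevar = "key", direction = "wide")

# Keys a client never had come back as NA; treat them as zero
wide[is.na(wide)] <- 0
wide
```

reshape() names the new columns numProducts.A2012-01, OrderTotal.A2012-01 and so on, so they need backticks when accessed with $.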
