data.matrix() when a character column is involved

To find the row with the highest contribution per ID, I have a script that works when the IDs are numeric. Today, however, I found out that IDs can also contain characters (for instance ABC10101). For the function to work, the dataset is converted to a matrix, but data.matrix(df) does not support character columns. Can the code be altered so that the function works with all kinds of IDs (character, numeric, etc.)? For now I wrote a quick workaround that converts the IDs to numeric when they are character, but that slows the process down for large datasets.

Example with code (the function extracts the first entry with the highest contribution, so if two entries have the same contribution it selects the first):

Note: in this example ID is read as a factor and data.matrix() converts it to numeric codes. In the code below the ID column should be of type character, and the output should be as shown at the bottom. The order of the IDs must remain the same.
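
To illustrate the note above, here is a minimal sketch of what data.matrix() does to a factor column. The data frame `ids` is made up for illustration and is not part of the real dataset:

```r
# Hypothetical two-row data frame with the ID stored as a factor
ids <- data.frame(
  ID = factor(c("ABCUD022221", "ABCUD022222")),
  contribution = c(40, 90)
)

# data.matrix() replaces each factor level by its integer code,
# so the original ID labels are no longer in the matrix
m <- data.matrix(ids)
id_codes <- unname(m[, "ID"])
print(id_codes)
```

The labels can only be recovered by looking the codes up in levels(ids$ID) afterwards, which is the extra conversion step the question is trying to avoid.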

tc <- textConnection('
    ID   contribution   uniqID      
   ABCUD022221       40           101  
   ABCUD022221       40           102 
   ABCUD022222       20           103
   ABCUD022222       10           104
   ABCUD022222       90           105
   ABCUD022223       75           106
   ABCUD022223       15           107
   ABCUD022223       10           108        ')

df <- read.table(tc,header=TRUE)

# Function that needs to be altered
uniqueMaxContr <- function(m, ID = 1, contribution = 2) {
  t(
    vapply(
      split(1:nrow(m), m[, ID]),
      function(i, x, contribution) {
        x[i, , drop = FALSE][which.max(x[i, contribution]), ]
      },
      m[1, ],
      x = m, contribution = contribution
    )
  )
}

df <- data.matrix(df)  # only works when ID is numeric
highestdf <- uniqueMaxContr(df)
highestdf <- as.data.frame(highestdf)

In this case the outcome should be:

    ID   contribution   uniqID      
   ABCUD022221       40           101  
   ABCUD022222       90           105
   ABCUD022223       75           106

Answers


Others might be able to make it more concise, but this is my attempt at a data.table solution:

tc <- textConnection('
    ID   contribution   uniqID      
   ABCUD022221       40           101  
   ABCUD022221       40           102 
   ABCUD022222       20           103
   ABCUD022222       10           104
   ABCUD022222       90           105
   ABCUD022223       75           106
   ABCUD022223       15           107
   ABCUD022223       10           108        ')

df <- read.table(tc,header=TRUE)

library(data.table)
dt <- as.data.table(df)
setkey(dt,uniqID)

dt2 <- dt[,list(contribution=max(contribution)),by=ID]

setkeyv(dt2,c("ID","contribution"))
setkeyv(dt,c("ID","contribution"))

dt[dt2,mult="first"]

##               ID contribution uniqID
## [1,] ABCUD022221           40    101
## [2,] ABCUD022222           90    105
## [3,] ABCUD022223           75    106

EDIT: a more concise solution

You can use .SD, which is the subset of the data.table for each group, and then use which.max to extract a single row, all in one line:

dt[,.SD[which.max(contribution)],by=ID]

##               ID contribution uniqID
## [1,] ABCUD022221           40    101
## [2,] ABCUD022222           90    105
## [3,] ABCUD022223           75    106
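
For comparison, the same idea can be sketched in base R without converting to a matrix at all, so the ID column may stay character. This is only a sketch under assumptions: the helper name `uniqueMaxContrDF` is hypothetical and not from the original post, and the data frame is rebuilt by hand with the ID explicitly stored as character.

```r
# Hypothetical helper: works on the data frame directly, so the ID column
# can be character, factor, or numeric.
uniqueMaxContrDF <- function(d, ID = 1, contribution = 2) {
  ids <- d[[ID]]
  # Group row indices by ID; setting the levels to the order of first
  # appearance keeps the IDs in their original order.
  groups <- split(seq_len(nrow(d)), factor(ids, levels = unique(ids)))
  # which.max() returns the position of the first maximum, so ties
  # resolve to the first row of each group.
  keep <- vapply(groups, function(i) i[which.max(d[[contribution]][i])],
                 integer(1))
  d[keep, , drop = FALSE]
}

df <- data.frame(
  ID = c("ABCUD022221", "ABCUD022221", "ABCUD022222", "ABCUD022222",
         "ABCUD022222", "ABCUD022223", "ABCUD022223", "ABCUD022223"),
  contribution = c(40, 40, 20, 10, 90, 75, 15, 10),
  uniqID = 101:108,
  stringsAsFactors = FALSE
)

highestdf <- uniqueMaxContrDF(df)
print(highestdf)
```

On this sample it selects uniqID 101, 105 and 106, matching the expected output in the question. It has not been benchmarked against the data.table version, which is likely the better choice for large datasets.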
