big.matrix as data.frame in R
I've recently started using R for data analysis. I've run into a problem ranking a big query dataset (~1 GB as ASCII text; in binary form it exceeds my laptop's 4 GB of RAM). Using bigmemory::big.matrix for this dataset is a nice solution, but passing such a matrix 'm' to the gbm() or randomForest() algorithms causes the error:
cannot coerce class 'structure("big.matrix", package = "bigmemory")' into a data.frame
class(m) outputs the following:
    [1] "big.matrix"
    attr(,"package")
    [1] "bigmemory"
Is there a way to correctly pass a big.matrix instance into these algorithms?
I obviously can't test this using data of your scale, but I can reproduce your errors by using the formula interface of each function:
    require(bigmemory)
    m <- matrix(sample(0:1,5000,replace = TRUE),1000,5)
    colnames(m) <- paste("V",1:5,sep = "")
    bm <- as.big.matrix(m,type = "integer")

    require(gbm)
    require(randomForest)

    #Throws error you describe
    rs <- randomForest(V1~.,data = bm)
    #Runs without error (with a warning about the response only having two values)
    rs <- randomForest(x = bm[,-1],y = bm[,1])

    #Throws error you describe
    rs <- gbm(V1~.,data = bm)
    #Runs without error
    rs <- gbm.fit(x = bm[,-1],y = bm[,1])
Avoiding the formula interface for randomForest is fairly common advice for large data sets, because the formula interface can be quite inefficient. If you read ?gbm, you'll see a similar recommendation steering you towards gbm.fit for large data as well.
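One caveat about the x/y calls: indexing a big.matrix with `[` returns an ordinary in-memory R matrix, so `bm[,-1]` copies every selected row into RAM. If the full data will not fit, one option is to fit on a random subset of rows. The following is only a minimal sketch on the same toy data; the subset size of 200, `ntree = 50`, and the names `idx`, `x_sub` and `y_sub` are illustrative assumptions, not recommendations.

```r
library(bigmemory)
library(randomForest)

set.seed(1)
m <- matrix(sample(0:1, 5000, replace = TRUE), 1000, 5)
colnames(m) <- paste0("V", 1:5)
bm <- as.big.matrix(m, type = "integer")

# Assumed subset size: only these rows are copied out of the big.matrix
idx <- sort(sample(nrow(bm), 200))

# `[` on a big.matrix yields a plain R matrix / vector held in RAM
x_sub <- bm[idx, -1]
y_sub <- factor(bm[idx, 1])  # a factor response requests classification

rf <- randomForest(x = x_sub, y = y_sub, ntree = 50)
```

Converting the response to a factor also avoids the warning about a response with only two values, since randomForest then treats the problem as classification rather than regression.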