big.matrix as data.frame in R

I've recently started using R for data analysis. I've run into a problem ranking a large query dataset (~1 GB as ASCII, and more than my laptop's 4 GB of RAM once loaded in binary form). Using bigmemory::big.matrix for this dataset is a nice solution, but passing such a matrix 'm' to the gbm() or randomForest() algorithms causes the error:

cannot coerce class 'structure("big.matrix", package = "bigmemory")' into a data.frame

class(m) outputs the following:

[1] "big.matrix"
attr(,"package")
[1] "bigmemory"

Is there a way to correctly pass a big.matrix instance into these algorithms?


I obviously can't test this using data of your scale, but I can reproduce your errors by using the formula interface of each function:

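library(bigmemory)
library(randomForest)
library(gbm)

#Small stand-in data set: 1000 rows of five binary columns
m <- matrix(sample(0:1,5000,replace = TRUE),1000,5)
colnames(m) <- paste("V",1:5,sep = "")

bm <- as.big.matrix(m,type = "integer")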


#Throws error you describe
rs <- randomForest(V1~.,data = bm)
#Runs without error (with a warning about the response only having two values)
rs <- randomForest(x = bm[,-1],y = bm[,1])

#Throws error you describe
rs <- gbm(V1~.,data = bm)
#Runs without error
rs <- gbm.fit(x = bm[,-1],y = bm[,1])

Avoiding the formula interface of randomForest is fairly common advice for large data sets, since it can be quite inefficient. If you read ?gbm, you'll see a similar recommendation steering you towards gbm.fit for large data as well.
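
One caveat worth checking on your own data: indexing a big.matrix with [ hands back an ordinary in-memory R matrix, so the x/y interface avoids the data.frame coercion but not the RAM limit. The sketch below illustrates that on the same toy data, then fits on a random subset of rows. The subsample size of 200, ntree = 50, and the use of factor() to get a classification forest are my own assumptions for illustration, not something the question specifies.

```r
library(bigmemory)
library(randomForest)

set.seed(1)

# Same toy data as above: 1000 rows of five binary columns
m <- matrix(sample(0:1, 5000, replace = TRUE), 1000, 5)
colnames(m) <- paste("V", 1:5, sep = "")
bm <- as.big.matrix(m, type = "integer")

# Indexing with [ copies the selected cells into a regular R matrix
x_all <- bm[, -1]
is.matrix(x_all)      # TRUE
is.big.matrix(x_all)  # FALSE

# Assumed workaround: pull only a row subsample into memory
idx <- sample(nrow(bm), 200)   # 200 is an arbitrary illustrative size
x <- bm[idx, -1]
y <- factor(bm[idx, 1])        # factor response, so this is classification

rf <- randomForest(x = x, y = y, ntree = 50)
```

Whether a subsample is acceptable depends on the ranking task; the point is only that whatever you pass as x and y has to fit in memory as a plain matrix.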
