Read Table and Random Forest in R

I'm trying to use the Random Forest method in R. I need to read a txt file (training set).

dataset<- read.table(path1,header=TRUE,sep=",")

The column names start with a digit (e.g. 1005_at), so R automatically converts them by prepending an X (e.g. X1005_at). To work around this, I did:


Now the names are ok, but when I run the Random Forest:

model.rf <- randomForest(class ~ ., data=dataset, importance=TRUE,keep.forest=T, ntree=5, do.trace=T) 

I have this error:

Error in eval(expr, envir, enclos) : object '1005_at' not found

If I run the Random Forest on the original dataset instead (without modifying the names, so using X1005_at), this error doesn't occur. Why? How can I fix it?


Use read.csv, since it already has the appropriate defaults for header and sep, and pass check.names=FALSE to stop R from mangling the names.
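
As a rough illustration of what check.names changes, the sketch below reads a tiny in-memory CSV twice. The csv_text contents are made up for the example and only stand in for the real training file:

```r
# Hypothetical two-column CSV whose first header starts with a digit
csv_text <- "1005_at,class\n0.5,A\n1.2,B\n"

# Default (check.names = TRUE): headers are made syntactic
mangled <- read.csv(text = csv_text)

# check.names = FALSE: headers are kept exactly as written in the file
intact <- read.csv(text = csv_text, check.names = FALSE)

names(mangled)
names(intact)
```

With the default, the first column should come back as X1005_at; with check.names = FALSE it stays 1005_at.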

The formula method of randomForest will not accept non-syntactic names in the input data frame. Use the default method instead, which takes the predictors and the response as separate arguments.

Thus we have:

> library(randomForest)
> # dataset <- read.csv(path1, check.names = FALSE)
> # next few lines are to make example similar to the one in the question
> dataset <- CO2
> names(dataset) <- c(paste(1:4, names(dataset[1:4]), sep = "_"), "class")
> names(dataset)
[1] "1_Plant"     "2_Type"      "3_Treatment" "4_conc"      "class"      
> i <- match("class", names(dataset)) # i is index of class column
> fm <- randomForest(dataset[-i], dataset[[i]]
+    # other arguments - in this example none
+ )
> fm

Call:
 randomForest(x = dataset[-i], y = dataset[[i]]) 
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 1

          Mean of squared residuals: 26.43385
                    % Var explained: 77.13
> fm$importance
            IncNodePurity
1_Plant          2105.779
2_Type           1529.527
3_Treatment       557.300
4_conc           2265.724
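
If you would rather keep the formula interface, one possible alternative (not part of the approach above) is to fit with the syntactic names and translate them back afterwards. This is only a sketch: orig_names and the imp matrix are hypothetical stand-ins, and the values in imp are made up.

```r
# Hypothetical column names standing in for the real training set
orig_names <- c("1005_at", "1007_s_at", "class")

# make.names() performs the same kind of conversion read.table() applies
# to headers when check.names = TRUE
safe_names <- make.names(orig_names, unique = TRUE)

# Named vector mapping each syntactic name back to the original name
lookup <- setNames(orig_names, safe_names)

# `imp` stands in for a result matrix such as fm$importance from a model
# fitted on the syntactic names (the numbers here are placeholders)
imp <- matrix(c(1, 2), ncol = 1,
              dimnames = list(safe_names[1:2], "score"))

# Translate the row names back to the original column names
rownames(imp) <- unname(lookup[rownames(imp)])
rownames(imp)
```

The model itself only ever sees the X-prefixed names, and the lookup vector is used purely for reporting.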

