FNN Package : R and knn()
I am having trouble understanding what the arguments of knn() mean in the context of the R function, as I don't come from a statistics background.
Let's say that I am trying to predict the results of a pool race for each of pools A, B, and C.
I know the height and weight of each racing candidate competing in the race. Assuming that the candidates competing are the same every year, I also know who won for the past 30 years.
How would I predict who is going to win at pool A, B, and C this year?
The train argument is a data frame with columns for the weight, the height, and the pool of each competitor. This covers the first 29 of the 30 years.
The test argument is a data frame with the same columns (weight, height, and pool) for each competitor. This covers the most recent year.
The cl argument is a vector of which competitor won the race each year.
Is this how knn() was intended to be used?
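Not exactly. The train data is used for fitting, and the test data for checking how well the predictions hold up on data the method has not seen. You can't just train and apply it straight away: you need to cross-validate first. The aim is not to minimize the in-sample (training) error but to get a low out-of-sample error, with the gap between the two kept small. Otherwise you will overfit: with k = 1, for example, every training point is its own nearest neighbour, so the in-sample error is 0, which tells you nothing about how well real predictions will go. In that function, train is your in-sample data and test is your out-of-sample data. Note also that cl must contain one class label per row of train, so a vector with one winner per year will not line up with a data frame that has one row per competitor.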
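To make the in-sample versus out-of-sample point concrete, here is a minimal sketch using knn() and knn.cv() from FNN. The data are synthetic and the "won"/"lost" labels are invented for illustration. The pool column is left out for brevity, and scaling the columns first is my own assumption about sensible preprocessing, not something the question specified.

```r
library(FNN)

set.seed(1)  # make the synthetic data reproducible

# Synthetic stand-in for the historical data: one row per competitor per race.
n <- 90
train <- data.frame(
  height = rnorm(n, mean = 180, sd = 8),   # cm (made-up values)
  weight = rnorm(n, mean = 75,  sd = 10)   # kg (made-up values)
)

# Hypothetical labels: exactly one entry per row of train.
cl <- factor(ifelse(train$height + rnorm(n, sd = 5) > 183, "won", "lost"))

# Put both columns on the same scale so neither dominates the distance.
train_scaled <- scale(train)

# Leave-one-out cross-validated error rate for several values of k.
ks <- c(1, 3, 5, 7, 9)
cv_error <- sapply(ks, function(k) {
  pred <- knn.cv(train_scaled, cl, k = k)
  mean(as.character(pred) != as.character(cl))
})

# In-sample error with k = 1: each row is its own nearest neighbour.
in_sample_pred <- knn(train_scaled, train_scaled, cl, k = 1)
in_sample <- mean(as.character(in_sample_pred) != as.character(cl))

# Pick the k with the lowest cross-validated error.
best_k <- ks[which.min(cv_error)]
```

Comparing in_sample with cv_error shows why the training error alone is a poor guide: the k = 1 in-sample error is zero by construction, while the cross-validated errors estimate how the method would do on unseen rows.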
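Once cross-validation has given you a value of k, you can make a prediction (i.e., for the current year) by calling knn() with the past data as train and this year's competitors as test. There is no separate model object and no predict() step: knn() returns the predicted class for each row of test directly.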
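For the prediction step itself, a rough sketch might look like the following. In FNN, knn() takes the training rows, the new rows, and the labels in a single call and returns one predicted class per new row. All values, the names history, outcome, and this_year, and the choice k = 5 are made up for illustration. In practice k would come from cross-validation.

```r
library(FNN)

set.seed(2)  # make the synthetic data reproducible

# Synthetic history: one row per competitor per race (values are made up).
history <- data.frame(
  height = rnorm(60, mean = 180, sd = 8),
  weight = rnorm(60, mean = 75,  sd = 10)
)
# Hypothetical labels, one per row of history.
outcome <- factor(ifelse(history$height > 182, "won", "lost"))

# This year's competitors (also made up).
this_year <- data.frame(
  height = c(172, 185, 191),
  weight = c(68, 80, 77)
)

# Scale the new rows with the means and standard deviations of the history,
# so that both sets live in the same coordinate system.
history_scaled <- scale(history)
this_year_scaled <- scale(
  this_year,
  center = attr(history_scaled, "scaled:center"),
  scale  = attr(history_scaled, "scaled:scale")
)

# knn() returns a factor with one predicted class per row of test.
pred <- knn(train = history_scaled, test = this_year_scaled,
            cl = outcome, k = 5)
```

Here pred has three entries, one for each row of this_year, each being either "won" or "lost".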