Discussion:
[R] Help with caret, please
Iván Vallés Pérez
2014-10-11 23:58:11 UTC
Hello,

I am using the caret package to train a k-nearest neighbors model. To do this, I am running the following code:

Control <- trainControl(method = "cv", summaryFunction = twoClassSummary,
                        classProbs = TRUE)

tGrid <- data.frame(k = 1:100)

trainingInfo <- train(Formula, data = trainData, method = "knn",
                      tuneGrid = tGrid, trControl = Control, metric = "ROC")

As you can see, I am interested in obtaining the area under the ROC curve (AUC). This code works well, but it only returns the test performance (the value train uses to tune k), averaged over the cross-validation folds. In addition to the test performance, I would like the training performance: the mean, across folds, of the metric computed on each fold's training data. How can I do this?

Thank you
Max Kuhn
2014-10-12 01:21:17 UTC
What you are asking is a bad idea on multiple levels. You will grossly
over-estimate the area under the ROC curve. Consider the 1-NN model: every
training point is its own nearest neighbor, so you will have perfect
predictions every time.

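If you still want the training-set (resubstitution) estimate, you will need
to run train again and set the index and indexOut arguments of trainControl
so that each model is evaluated on the same rows it was fit on: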

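library(caret)

set.seed(1)
dat <- twoClassSim(200)   # simulated two-class data, 200 rows

set.seed(2)
# 10 folds; returnTrain = TRUE returns the training-row indices of each fold
folds <- createFolds(dat$Class, returnTrain = TRUE)

# index: rows used to fit each resampled model
# indexOut: rows used to evaluate it; reusing folds here means every
# model is scored on its own training data (resubstitution)
Control <- trainControl(method = "cv",
                        summaryFunction = twoClassSummary,
                        classProbs = TRUE,
                        index = folds,
                        indexOut = folds)

tGrid <- data.frame(k = 1:100)

set.seed(3)
a_bad_idea <- train(Class ~ ., data = dat,
                    method = "knn",
                    tuneGrid = tGrid,
                    trControl = Control, metric = "ROC")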
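To see both numbers the original post asks for side by side, one option (a sketch, not part of the reply above) is to fit twice over the same folds: once with only index set, so caret scores the rows left out of each fold, and once with indexOut = index, as in the code above. The smaller k grid and the helper fit_knn are illustrative assumptions; twoClassSummary also needs the pROC package installed.

```r
library(caret)

set.seed(1)
dat <- twoClassSim(200)

set.seed(2)
folds <- createFolds(dat$Class, returnTrain = TRUE)  # training-row indices per fold
tGrid <- data.frame(k = seq(1, 25, by = 2))          # assumed smaller grid, for speed

# Hypothetical helper: same model and seed, only the resampling control differs
fit_knn <- function(ctrl) {
  set.seed(3)
  train(Class ~ ., data = dat, method = "knn", tuneGrid = tGrid,
        trControl = ctrl, metric = "ROC")
}

# Held-out estimate: with indexOut unset, caret evaluates the rows not in each fold
cv_fit <- fit_knn(trainControl(method = "cv", summaryFunction = twoClassSummary,
                               classProbs = TRUE, index = folds))

# Resubstitution estimate: evaluate on the same rows each model was fit on
train_fit <- fit_knn(trainControl(method = "cv", summaryFunction = twoClassSummary,
                                  classProbs = TRUE, index = folds,
                                  indexOut = folds))

# One row per k, with training and cross-validated ROC next to each other
comparison <- merge(train_fit$results[, c("k", "ROC")],
                    cv_fit$results[, c("k", "ROC")],
                    by = "k", suffixes = c("_train", "_cv"))
print(comparison)
```

At k = 1 the training ROC should be 1, while the cross-validated value is lower; the gap is the over-estimate described above.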

Max

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.