Chapter 7 Comparing Models
One of the more frequent activities in machine learning is setting up “shoot-outs” between different models to see which one performs best. We could do this without caret, but the package makes it easier by providing a standard interface. We’ll keep using the Pima Indians data and (re)build a few models. We’ll use a common control object and reset the seed before each call to train() so that every model is evaluated on the same cross-validation folds, which keeps the comparison fair and reproducible.
control <- trainControl(method = "cv",
                        number = 5,
                        summaryFunction = twoClassSummary,
                        classProbs = TRUE)
# Train the glm model
set.seed(7)
model_glm <- train(diabetes ~ .,
                   data = Train,
                   method = "glm",
                   metric = "ROC",
                   trControl = control)
# Train the Decision Tree model
set.seed(7)
model_rpart <- train(diabetes ~ .,
                     data = Train,
                     method = "rpart",
                     metric = "ROC",
                     trControl = control)
# Train the Random Forest model
set.seed(7)
model_rf <- train(diabetes ~ .,
                  data = Train,
                  method = "rf",
                  metric = "ROC",
                  trControl = control)
# Train the knn model
set.seed(7)
model_knn <- train(diabetes ~ .,
                   data = Train,
                   method = "knn",
                   metric = "ROC",
                   trControl = control)
# Use the resamples function to prep for comparisons
results <- resamples(list(GLM = model_glm,
                          RPART = model_rpart,
                          RF = model_rf,
                          KNN = model_knn))
Now we can easily compare how well the different models perform:
summary(results)
##
## Call:
## summary.resamples(object = results)
##
## Models: GLM, RPART, RF, KNN
## Number of resamples: 5
##
## ROC
##            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## GLM   0.7805233 0.7892442 0.8188953 0.8267442 0.8514535 0.8936047    0
## RPART 0.6809593 0.7011628 0.7017442 0.7260756 0.7720930 0.7744186    0
## RF    0.7792151 0.7918605 0.8107558 0.8197674 0.8396802 0.8773256    0
## KNN   0.7088663 0.7329942 0.7729651 0.7706686 0.7889535 0.8495640    0
##
## Sens
##         Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
## GLM   0.8250   0.875 0.8875 0.8800  0.9000 0.9125    0
## RPART 0.8500   0.850 0.8500 0.8625  0.8625 0.9000    0
## RF    0.8250   0.825 0.8375 0.8450  0.8625 0.8750    0
## KNN   0.7875   0.800 0.8375 0.8350  0.8500 0.9000    0
##
## Spec
##            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## GLM   0.4418605 0.5348837 0.5813953 0.5674419 0.6279070 0.6511628    0
## RPART 0.4186047 0.5116279 0.5348837 0.5302326 0.5581395 0.6279070    0
## RF    0.4883721 0.5581395 0.5581395 0.5720930 0.5813953 0.6744186    0
## KNN   0.3953488 0.4651163 0.5581395 0.5255814 0.5581395 0.6511628    0
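The summary table can be followed up with a plot and a paired comparison of the models. The sketch below assumes the `results` object created above is still in the session. The name `differences` is only illustrative. caret provides `bwplot()` and `diff()` methods for resamples objects, and `diff()` works fold by fold, which is why the models need to share the same folds.

```r
library(caret)

# Assumes `results` is the resamples object built earlier in this chapter.

# Box-and-whisker plot of the resampled ROC values for each model
bwplot(results, metric = "ROC")

# Pairwise differences between the models, computed fold by fold
differences <- diff(results)

# Estimated differences and p-values (Bonferroni-adjusted by default)
summary(differences)
```

With only five resamples per model, the p-values are best treated as a rough guide rather than strong evidence.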


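Note that the capabilities of caret go well beyond model comparison. For example, train() can preprocess the data for us (centering, scaling, and so on) through its preProcess argument, and it applies those steps within each resample, so we do not have to transform the data by hand.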
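As a minimal sketch of that preprocessing support, the knn model could be refit with centering and scaling requested through `preProcess`. This assumes the `Train` data frame and the `control` object from earlier in the chapter are still available. The name `model_knn_scaled` is hypothetical and is not one of the models compared above.

```r
library(caret)

# Assumes `Train` and `control` from earlier in this chapter are available.
set.seed(7)  # same seed as the other models, so the folds should match
model_knn_scaled <- train(diabetes ~ .,
                          data = Train,
                          method = "knn",
                          metric = "ROC",
                          preProcess = c("center", "scale"),
                          trControl = control)

# Resampled ROC, Sens and Spec for each candidate value of k
model_knn_scaled$results
```

Distance-based methods such as knn are sensitive to the scale of the predictors, so this is a natural model on which to try the option. Whether it improves the ROC here would need to be checked by adding the new model to the list passed to `resamples()`.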