This is an introduction to the basemodels package. The package provides R equivalents of the dummy classifier and dummy regressor available in the 'Python' library 'scikit-learn', with some modifications. Our aim is to help R users quickly establish baseline performance for their classification and regression problems. These baseline models do not use any predictors to make predictions. They are useful in cases of class imbalance, in multi-class classification, and whenever users want to quickly compare their statistical and machine learning models against several baseline models to see how much they have improved.
We show a few examples here. First, we split the iris data into training and testing sets.
library(basemodels)
set.seed(2023)
# Split the iris data: 80% for training, 20% for testing
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index, ]
test_data <- iris[-index, ]
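Because the baseline strategies depend only on the distribution of the response, it is worth checking the class balance of the training set first. A quick look with base R:
# Class counts and proportions in the training set
table(train_data$Species)
prop.table(table(train_data$Species))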
We can use the dummyClassifier method with the train() function in the caret package.
ctrl1 <- caret::trainControl(method = "none")
# Train a dummy classifier with caret
dummy_model <- caret::train(Species ~ .,
                            data = train_data,
                            method = dummyClassifier,
                            strategy = "stratified",
                            trControl = ctrl1)
# Make predictions using the trained dummy classifier
pred_vec <- predict(dummy_model, test_data)
# Evaluate the performance of the dummy classifier
conf_matrix <- caret::confusionMatrix(pred_vec, test_data$Species)
print(conf_matrix)
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction setosa versicolor virginica
#> setosa 3 3 3
#> versicolor 4 2 2
#> virginica 5 4 4
#>
#> Overall Statistics
#>
#> Accuracy : 0.3
#> 95% CI : (0.1473, 0.494)
#> No Information Rate : 0.4
#> P-Value [Acc > NIR] : 0.9060
#>
#> Kappa : -0.0448
#>
#> Mcnemar's Test P-Value : 0.7269
#>
#> Statistics by Class:
#>
#> Class: setosa Class: versicolor Class: virginica
#> Sensitivity 0.2500 0.22222 0.4444
#> Specificity 0.6667 0.71429 0.5714
#> Pos Pred Value 0.3333 0.25000 0.3077
#> Neg Pred Value 0.5714 0.68182 0.7059
#> Prevalence 0.4000 0.30000 0.3000
#> Detection Rate 0.1000 0.06667 0.1333
#> Detection Prevalence 0.3000 0.26667 0.4333
#> Balanced Accuracy 0.4583 0.46825 0.5079
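The point of a baseline is comparison: any useful model should beat the accuracy above (and the No Information Rate). As a minimal sketch, assuming the nnet package that ships with R is available, we can fit a multinomial logistic regression on the same split and compare its confusion matrix:
# Fit a multinomial logistic regression as a point of comparison
multinom_fit <- nnet::multinom(Species ~ ., data = train_data, trace = FALSE)
multinom_pred <- predict(multinom_fit, newdata = test_data)
caret::confusionMatrix(multinom_pred, test_data$Species)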
For a classification problem, we can also use the dummy_classifier() function directly, without going through caret.
# Train a dummy classifier with the proportional strategy
dummy_model <- dummy_classifier(train_data$Species, strategy = "proportional", random_state = 2024)
# Make predictions using the trained dummy classifier
pred_vec <- predict_dummy_classifier(dummy_model, test_data)
# Evaluate the performance of the dummy classifier
conf_matrix <- caret::confusionMatrix(pred_vec, test_data$Species)
print(conf_matrix)
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction setosa versicolor virginica
#> setosa 6 3 3
#> versicolor 3 4 4
#> virginica 3 2 2
#>
#> Overall Statistics
#>
#> Accuracy : 0.4
#> 95% CI : (0.2266, 0.594)
#> No Information Rate : 0.4
#> P-Value [Acc > NIR] : 0.5689
#>
#> Kappa : 0.0909
#>
#> Mcnemar's Test P-Value : 0.8810
#>
#> Statistics by Class:
#>
#> Class: setosa Class: versicolor Class: virginica
#> Sensitivity 0.5000 0.4444 0.22222
#> Specificity 0.6667 0.6667 0.76190
#> Pos Pred Value 0.5000 0.3636 0.28571
#> Neg Pred Value 0.6667 0.7368 0.69565
#> Prevalence 0.4000 0.3000 0.30000
#> Detection Rate 0.2000 0.1333 0.06667
#> Detection Prevalence 0.4000 0.3667 0.23333
#> Balanced Accuracy 0.5833 0.5556 0.49206
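If only the overall accuracy is needed rather than the full confusion matrix, it can be computed directly from the prediction vector:
# Proportion of correct predictions on the test set
mean(pred_vec == test_data$Species)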
For a regression problem, we can use the dummy_regressor() function.
# Train a dummy regressor using the median strategy
reg_model <- dummy_regressor(train_data$Sepal.Length, strategy = "median")
# Make predictions using the trained dummy regressor
y_hat <- predict_dummy_regressor(reg_model, test_data)
# Find the mean squared error
mean((test_data$Sepal.Length-y_hat)^2)
#> [1] 0.6056667
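Again, the baseline is most informative when set against a real model. As a quick sketch using base R only, a linear regression fitted on the same split gives a test mean squared error to compare with the value above:
# Fit a linear regression for comparison with the baseline
lm_fit <- lm(Sepal.Length ~ ., data = train_data)
lm_pred <- predict(lm_fit, newdata = test_data)
# Test-set mean squared error of the linear model
mean((test_data$Sepal.Length - lm_pred)^2)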
The dummyRegressor method can likewise be used with the train() function in the caret package.
ctrl1 <- caret::trainControl(method = "none")
# Train a dummy regressor with caret
reg_model <- caret::train(Sepal.Length ~ ., data = train_data,
                          method = dummyRegressor,
                          strategy = "median",
                          trControl = ctrl1)
y_hat <- predict(reg_model, test_data)
# Find mean squared error
mean((test_data$Sepal.Length-y_hat)^2)
#> [1] 0.6056667