Title: | Baseline Models for Classification and Regression |
---|---|
Description: | Provides equivalents of the dummy classifier and regressor from the 'Python' 'scikit-learn' library. The goal is to let R users easily establish baseline performance for their classification and regression problems. The baseline models use no predictors and are useful under class imbalance, in multiclass classification, and when users want to quickly quantify how much their statistical and machine learning models improve on several baseline models. The dummy classifier uses a "better" default (proportional guessing) than the 'Python' implementation ("prior", which predicts the most frequent class in the training set). The functions in the package can be used on their own, or through the methods it introduces, named 'dummy_regressor' or 'dummy_classifier', within the caret package pipeline. |
Authors: | Ying-Ju Chen [aut, cre] |
Maintainer: | Ying-Ju Chen <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2025-02-19 02:58:11 UTC |
Source: | https://github.com/ying-ju/basemodels |
dummy classifier for a categorical variable.
dummy_classifier( y, strategy = "proportional", constant = NULL, random_state = NULL )
y |
a categorical vector, containing the outcomes of interest |
strategy |
a strategy from "constant", "most_frequent", "proportional", "uniform", or "stratified". |
constant |
a constant value for the constant strategy. |
random_state |
a random seed. |
a list
# Split the data into training and testing sets
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index,]
test_data <- iris[-index,]
dummy_model <- dummy_classifier(train_data$Species, strategy = "proportional", random_state = 2024)
dummy_model
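To make the strategies concrete, here is a minimal base R sketch of three no-predictor baselines: always predicting the most frequent class, guessing in proportion to the training class frequencies, and guessing uniformly. The helper `baseline_guess` and its arguments are hypothetical and are not part of the package; how the package's own strategy names map onto these behaviours is an assumption.

```r
# Base R sketch of three no-predictor classification baselines.
# `baseline_guess` is a hypothetical helper, not a basemodels function.
baseline_guess <- function(y_train, n, strategy = "proportional", seed = NULL) {
  y_train <- factor(y_train)
  classes <- levels(y_train)
  # Training class frequencies, in the same order as `classes`
  props <- as.numeric(prop.table(table(y_train)))
  if (!is.null(seed)) set.seed(seed)
  guess <- switch(
    strategy,
    most_frequent = rep(classes[which.max(props)], n),
    proportional  = sample(classes, n, replace = TRUE, prob = props),
    uniform       = sample(classes, n, replace = TRUE),
    stop("unknown strategy: ", strategy)
  )
  factor(guess, levels = classes)
}

y_train <- factor(c("a", "a", "a", "b", "b", "c"))
table(baseline_guess(y_train, 10, "most_frequent"))
table(baseline_guess(y_train, 1000, "proportional", seed = 1))
```

None of the strategies looks at predictor values, which is why only the training labels and the number of predictions are needed.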
dummy classifier for a categorical variable, used with the train function in caret.
dummy_classifier_caret( strategy = "proportional", constant = NULL, random_state = NULL )
strategy |
a strategy from "constant", "most_frequent", "proportional", "uniform", or "stratified". |
constant |
a constant value for the constant strategy. |
random_state |
a random seed. |
a list
dummy regressor for a numerical variable.
dummy_regressor(y, strategy = "mean", quantile = NULL, constant = NULL)
y |
a numerical vector. |
strategy |
a strategy from "constant", "mean", "median", or "quantile". |
quantile |
the quantile to use with the "quantile" strategy; a value between 0 and 1. |
constant |
the value to predict with the "constant" strategy; a numeric value. |
a list containing information of the model.
# Split the data into training and testing sets
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index,]
test_data <- iris[-index,]
reg_model <- dummy_regressor(train_data$Sepal.Length, strategy = "median")
reg_model
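As an illustration of what the four regression strategies amount to, the following base R sketch computes the single value each one would predict from the training response. The helper `baseline_value` is hypothetical and only mirrors the documented argument names; it is not the package's implementation.

```r
# Base R sketch: the one number each regression baseline predicts.
# `baseline_value` is a hypothetical helper, not a basemodels function.
baseline_value <- function(y_train, strategy = "mean",
                           quantile = NULL, constant = NULL) {
  switch(
    strategy,
    mean     = mean(y_train),
    median   = stats::median(y_train),
    quantile = {
      stopifnot(!is.null(quantile), quantile >= 0, quantile <= 1)
      unname(stats::quantile(y_train, probs = quantile))
    },
    constant = {
      stopifnot(is.numeric(constant), length(constant) == 1)
      constant
    },
    stop("unknown strategy: ", strategy)
  )
}

y_train <- c(1, 2, 3, 4, 10)
baseline_value(y_train, "mean")                     # 4
baseline_value(y_train, "median")                   # 3
baseline_value(y_train, "quantile", quantile = 0.5) # 3
baseline_value(y_train, "constant", constant = 7)   # 7
```

Every test observation receives the same predicted value, so the prediction vector is just that value repeated once per row.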
dummy regressor for a numerical variable, used in the train function in caret.
dummy_regressor_caret(strategy = "mean", quantile = NULL, constant = NULL)
strategy |
a strategy from "constant", "mean", "median", or "quantile". |
quantile |
the quantile to use with the "quantile" strategy; a value between 0 and 1. |
constant |
the value to predict with the "constant" strategy; a numeric value. |
a list containing information of the model.
a method used for the train function in caret
dummyClassifier
An object of class list of length 13.
# Split the data into training and testing sets
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index,]
test_data <- iris[-index,]
ctrl1 <- caret::trainControl(method = "none")
# Train a dummy classifier with caret
dummy_model <- caret::train(Species ~ ., data = train_data,
                            method = dummyClassifier,
                            strategy = "stratified",
                            trControl = ctrl1)
# Make predictions using the trained dummy classifier
pred_vec <- predict(dummy_model, test_data)
# Evaluate the performance of the dummy classifier
conf_matrix <- caret::confusionMatrix(pred_vec, test_data$Species)
print(conf_matrix)
a method used for the train function in caret
dummyRegressor
An object of class list of length 13.
# Split the data into training and testing sets
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index,]
test_data <- iris[-index,]
ctrl1 <- caret::trainControl(method = "none")
# Train a dummy regressor with caret
reg_model <- caret::train(Sepal.Length ~ ., data = train_data,
                          method = dummyRegressor,
                          strategy = "median",
                          trControl = ctrl1)
y_hat <- predict(reg_model, test_data)
# Find mean squared error
mean((test_data$Sepal.Length-y_hat)^2)
dummy classifier predictor
predict_dummy_classifier(object, X)
object |
a list created by the dummy_classifier function. |
X |
a data frame. |
predicted values for the response variable.
# Split the data into training and testing sets
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index,]
test_data <- iris[-index,]
dummy_model <- dummy_classifier(train_data$Species, strategy = "proportional", random_state = 2024)
# Make predictions using the trained dummy classifier
pred_vec <- predict_dummy_classifier(dummy_model, test_data)
# Evaluate the performance of the dummy classifier
conf_matrix <- caret::confusionMatrix(pred_vec, test_data$Species)
print(conf_matrix)
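For intuition about what such a confusion matrix should show, the expected accuracy of each no-predictor strategy follows directly from the class proportions. The sketch below assumes the test set has the same class proportions as the training set; the proportions in `p` are made up for illustration, and nothing here calls the package.

```r
# Expected accuracy of three no-predictor strategies, assuming the test
# labels follow the same class proportions `p` as the training labels.
# The proportions are hypothetical.
p <- c(a = 0.5, b = 0.3, c = 0.2)

acc_most_frequent <- max(p)        # always predict the largest class
acc_proportional  <- sum(p^2)      # guess class k with probability p[k]
acc_uniform       <- 1 / length(p) # guess every class equally often

round(c(most_frequent = acc_most_frequent,
        proportional  = acc_proportional,
        uniform       = acc_uniform), 3)
```

Under this assumption, always predicting the most frequent class has the highest expected accuracy of the three, but it never predicts any other class, whereas proportional guessing predicts every class at its observed rate.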
dummy regressor predictor
predict_dummy_regressor(object, X)
object |
a list from the dummy_regressor function |
X |
a data frame |
the predicted values
# Split the data into training and testing sets
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index,]
test_data <- iris[-index,]
# Make predictions using the trained dummy regressor
reg_model <- dummy_regressor(train_data$Sepal.Length, strategy = "median")
y_hat <- predict_dummy_regressor(reg_model, test_data)
# Find mean squared error
mean((test_data$Sepal.Length-y_hat)^2)
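A baseline error is mostly useful as a yardstick for a fitted model. The following sketch, which uses only base R and does not call the package, compares a median baseline with an ordinary linear model on the same split. The choice of `lm`, the helper `mse`, and the relative-improvement measure are illustrative assumptions.

```r
# Compare a median baseline with a linear model on the same train/test split.
# Base R only; `mse` is a hypothetical helper.
set.seed(2023)
index <- sample(seq_len(nrow(iris)), size = floor(nrow(iris) * 0.8))
train_data <- iris[index, ]
test_data <- iris[-index, ]

mse <- function(y, y_hat) mean((y - y_hat)^2)

# Baseline: predict the training median for every test row
baseline_hat <- rep(median(train_data$Sepal.Length), nrow(test_data))
baseline_mse <- mse(test_data$Sepal.Length, baseline_hat)

# Candidate model: linear regression on all remaining columns
fit <- lm(Sepal.Length ~ ., data = train_data)
model_hat <- predict(fit, newdata = test_data)
model_mse <- mse(test_data$Sepal.Length, model_hat)

# Fraction of the baseline error removed by the model
1 - model_mse / baseline_mse
```

A value near 0 would mean the model barely improves on the baseline; a value near 1 would mean it removes most of the baseline error.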
probabilities for predicting classes
predict_proba(model, X)
model |
a list from the dummy_classifier function. |
X |
a data frame. |
a probability matrix.
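As a rough picture of what a probability matrix looks like for a no-predictor classifier, the sketch below builds one in base R with one row per observation, one column per class, and every row equal to the training class proportions. Whether `predict_proba` returns exactly this layout for each strategy is an assumption; `class_prob_matrix` is a hypothetical helper, not a package function.

```r
# Base R sketch of a class-probability matrix for a no-predictor classifier.
# `class_prob_matrix` is a hypothetical helper; each row repeats the
# training class proportions because no predictors are used.
class_prob_matrix <- function(y_train, n) {
  props <- prop.table(table(factor(y_train)))
  matrix(rep(as.numeric(props), each = n),
         nrow = n,
         dimnames = list(NULL, names(props)))
}

m <- class_prob_matrix(c("a", "a", "b", "c"), n = 3)
m
rowSums(m)  # each row sums to 1
```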