This function wraps the easyml core framework, allowing a user to easily run the easyml methodology for a random forest model.
easy_random_forest(.data, dependent_variable, family = "gaussian", resample = NULL, preprocess = preprocess_identity, measure = NULL, exclude_variables = NULL, categorical_variables = NULL, train_size = 0.667, foldid = NULL, n_samples = 1000, n_divisions = 1000, n_iterations = 10, random_state = NULL, progress_bar = TRUE, n_core = 1, coefficients = FALSE, variable_importances = TRUE, predictions = TRUE, model_performance = TRUE, model_args = list())
Argument | Description |
---|---|
.data | A data.frame; the data to be analyzed. |
dependent_variable | A character vector of length one; the dependent variable for this analysis. |
family | A character vector of length one; the type of regression to run on the data. Choices are one of c("gaussian", "binomial"). Defaults to "gaussian". |
resample | A function; the function for resampling the data. Defaults to NULL. |
preprocess | A function; the function for preprocessing the data. Defaults to preprocess_identity. |
measure | A function; the function for measuring the results. Defaults to NULL. |
exclude_variables | A character vector; the variables from the data set to exclude. Defaults to NULL. |
categorical_variables | A character vector; the variables that are categorical. Defaults to NULL. |
train_size | A numeric vector of length one; specifies what proportion of the data should be used for the training data set. Defaults to 0.667. |
foldid | A vector with length equal to |
n_samples | An integer vector of length one; specifies the number of times the coefficients and predictions should be generated. Defaults to 1000. |
n_divisions | An integer vector of length one; specifies the number of times the data should be divided when replicating the measures of model performance. Defaults to 1000. |
n_iterations | An integer vector of length one; during each division, specifies the number of times the predictions should be generated. Defaults to 10. |
random_state | An integer vector of length one; specifies the seed to be used for the analysis. Defaults to NULL. |
progress_bar | A logical vector of length one; specifies whether to display a progress bar during calculations. Defaults to TRUE. |
n_core | An integer vector of length one; specifies the number of cores to use for this analysis. Currently works only on macOS and Unix/Linux systems. Defaults to 1. |
coefficients | A logical vector of length one; whether or not to generate coefficients for this analysis. Defaults to FALSE. |
variable_importances | A logical vector of length one; whether or not to generate variable importances for this analysis. Defaults to TRUE. |
predictions | A logical vector of length one; whether or not to generate predictions for this analysis. Defaults to TRUE. |
model_performance | A logical vector of length one; whether or not to generate measures of model performance for this analysis. Defaults to TRUE. |
model_args | A list; additional arguments passed to the underlying random forest algorithm. Defaults to list(). |
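As a rough illustration of how `train_size`, `n_divisions` and `random_state` relate, the base R sketch below repeatedly divides the row indices of a data set into training and test sets. It is a hypothetical helper (`make_divisions`), not easyml's own resampling code; the rounding rule and the use of simple, unstratified sampling are assumptions made for the example.

```r
# Hypothetical helper, NOT easyml's internal code: it only illustrates how
# train_size, n_divisions and random_state could interact when the data are
# repeatedly divided into training and test sets.
make_divisions <- function(n_obs, train_size = 0.667, n_divisions = 10,
                           random_state = NULL) {
  # Seeding the RNG makes the same divisions come out on every call
  if (!is.null(random_state)) set.seed(random_state)
  # Assumption: the training set size is the rounded proportion of rows
  n_train <- round(train_size * n_obs)
  lapply(seq_len(n_divisions), function(i) {
    # Simple random sample of row indices for training; the rest is test
    train <- sort(sample.int(n_obs, n_train))
    list(train = train, test = setdiff(seq_len(n_obs), train))
  })
}

# Three divisions of the 32 rows of the built-in mtcars data
divisions <- make_divisions(nrow(mtcars), train_size = 0.667,
                            n_divisions = 3, random_state = 12345)
sapply(divisions, function(d) length(d$train))
# [1] 21 21 21
```

Per the argument descriptions, easy_random_forest would additionally generate predictions `n_iterations` times within each division; that modeling step is left out of this sketch.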
A list of class easy_random_forest.
Other recipes: easy_analysis, easy_avNNet, easy_deep_neural_network, easy_glinternet, easy_glmnet, easy_neural_network, easy_support_vector_machine
# NOT RUN {
library(easyml) # https://github.com/CCS-Lab/easyml

# Gaussian
data("prostate", package = "easyml")
results <- easy_random_forest(prostate, "lpsa",
                              n_samples = 10L, n_divisions = 10,
                              n_iterations = 2, random_state = 12345,
                              n_core = 1)

# Binomial
data("cocaine_dependence", package = "easyml")
results <- easy_random_forest(cocaine_dependence, "diagnosis",
                              family = "binomial",
                              exclude_variables = c("subject"),
                              categorical_variables = c("male"),
                              n_samples = 10, n_divisions = 10,
                              n_iterations = 2, random_state = 12345,
                              n_core = 1)
# }