This recipe is the workhorse behind all of the easy_* functions.
easy_analysis(.data, dependent_variable, algorithm, family = "gaussian", resample = NULL, preprocess = NULL, measure = NULL, exclude_variables = NULL, categorical_variables = NULL, train_size = 0.667, foldid = NULL, survival_rate_cutoff = 0.05, n_samples = 1000, n_divisions = 1000, n_iterations = 10, random_state = NULL, progress_bar = TRUE, n_core = 1, coefficients = NULL, variable_importances = NULL, predictions = NULL, model_performance = NULL, model_args = list())
Argument | Description |
---|---|
.data | A data.frame; the data to be analyzed. |
dependent_variable | A character vector of length one; the dependent variable for this analysis. |
algorithm | A character vector of length one; the algorithm to run on the data. Choices are currently one of c("deep_neural_network", "glinternet", "glmnet", "neural_network", "random_forest", "support_vector_machine"). |
family | A character vector of length one; the type of regression to run on the data. Choices are one of c("gaussian", "binomial"). Defaults to "gaussian". |
resample | A function; the function for resampling the data. Defaults to NULL. |
preprocess | A function; the function for preprocessing the data. Defaults to NULL. |
measure | A function; the function for measuring the results. Defaults to NULL. |
exclude_variables | A character vector; the variables from the data set to exclude. Defaults to NULL. |
categorical_variables | A character vector; the variables that are categorical. Defaults to NULL. |
train_size | A numeric vector of length one; specifies what proportion of the data should be used for the training data set. Defaults to 0.667. |
foldid | A vector with length equal to |
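survival_rate_cutoff | A numeric vector of length one; for easy_glmnet, specifies the minimal threshold (as a percentage) for how often a coefficient must appear out of n_samples. Defaults to 0.05. |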
n_samples | An integer vector of length one; specifies the number of times the coefficients and predictions should be generated. Defaults to 1000. |
n_divisions | An integer vector of length one; specifies the number of times the data should be divided when replicating the measures of model performance. Defaults to 1000. |
n_iterations | An integer vector of length one; during each division, specifies the number of times the predictions should be generated. Defaults to 10. |
random_state | An integer vector of length one; specifies the seed to be used for the analysis. Defaults to NULL. |
progress_bar | A logical vector of length one; specifies whether to display a progress bar during calculations. Defaults to TRUE. |
n_core | An integer vector of length one; specifies the number of cores to use for this analysis. Currently works only on macOS and Unix/Linux systems. Defaults to 1. |
coefficients | A logical vector of length one; whether or not to generate coefficients for this analysis. Defaults to NULL. |
variable_importances | A logical vector of length one; whether or not to generate variable importances for this analysis. Defaults to NULL. |
predictions | A logical vector of length one; whether or not to generate predictions for this analysis. Defaults to NULL. |
model_performance | A logical vector of length one; whether or not to generate measures of model performance for this analysis. Defaults to NULL. |
model_args | A list; the arguments to be passed to the algorithm specified. Defaults to an empty list. |
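The train_size and random_state arguments together control how each train/test division is drawn. Below is a minimal base R sketch of that idea. It assumes a simple random (unstratified) split and rounds the training count to the nearest row. simple_train_test_split is a hypothetical helper for illustration, not an easyml function, and easyml's own resampling rules may differ.

```r
# Hypothetical resampling helper (not part of easyml) illustrating
# train_size and random_state with a simple random split.
simple_train_test_split <- function(X, y, train_size = 0.667, random_state = NULL) {
  # Seed the RNG only when a seed is supplied, mirroring random_state = NULL.
  if (!is.null(random_state)) set.seed(random_state)
  n <- nrow(X)
  # Number of training rows; rounding is an assumption of this sketch.
  n_train <- round(train_size * n)
  train_idx <- sample(seq_len(n), n_train)
  list(
    X_train = X[train_idx, , drop = FALSE],
    X_test  = X[-train_idx, , drop = FALSE],
    y_train = y[train_idx],
    y_test  = y[-train_idx]
  )
}

# Usage: predict mpg from the other mtcars columns (32 rows).
X <- mtcars[, setdiff(names(mtcars), "mpg")]
y <- mtcars$mpg
split <- simple_train_test_split(X, y, train_size = 0.667, random_state = 12345)
```

With 32 rows and train_size = 0.667, round(21.344) gives 21 training rows and 11 test rows. Reusing the same random_state reproduces the same split.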
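The n_core restriction to macOS and Unix/Linux fits fork-based parallelism such as parallel::mclapply, which does not support mc.cores > 1 on Windows. Whether easyml uses mclapply internally is an assumption here. The sketch only shows the general pattern of spreading n_samples replicates over n_core processes. one_replicate and run_replicates are hypothetical names, and one_replicate stands in for a single model fit.

```r
library(parallel)

# Hypothetical stand-in for one replicate (e.g. one model fit). It is
# deterministic here, so the result does not depend on how work is split.
one_replicate <- function(i) {
  sum(seq_len(i))
}

# Hypothetical driver: run n_samples replicates on n_core forked processes.
run_replicates <- function(n_samples, n_core = 1) {
  # mclapply forks worker processes, which Windows does not support,
  # so fall back to a single core there.
  if (.Platform$OS.type == "windows") n_core <- 1
  results <- mclapply(seq_len(n_samples), one_replicate, mc.cores = n_core)
  unlist(results)
}

out <- run_replicates(n_samples = 10, n_core = 2)
```

With a random model fit in place of one_replicate, forked workers need a parallel-safe generator such as RNGkind("L'Ecuyer-CMRG") for results to be reproducible from a seed.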
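The preprocess and measure arguments take functions. The sketch below shows the kind of functions one might pass. It assumes a preprocess function receives the training and test predictors and a measure function receives observed and predicted values. Both names and both signatures are hypothetical, and easyml's built-in functions may take different arguments.

```r
# Hypothetical preprocess function: standardize every column using statistics
# computed on the training set only, so no test-set information leaks in.
# Columns with zero variance in the training set would produce NaN.
preprocess_scale_sketch <- function(X_train, X_test) {
  centers <- colMeans(X_train)
  scales <- apply(X_train, 2, sd)
  list(
    X_train = scale(X_train, center = centers, scale = scales),
    X_test  = scale(X_test, center = centers, scale = scales)
  )
}

# Hypothetical measure function for family = "gaussian": mean squared error.
measure_mse_sketch <- function(y_true, y_pred) {
  mean((y_true - y_pred)^2)
}

X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
pre <- preprocess_scale_sketch(X[1:20, ], X[21:32, ])
mse <- measure_mse_sketch(c(1, 2, 3), c(1, 2, 5))
```

Fitting the centering and scaling on the training rows and then reusing them for the test rows keeps the test set out of preprocessing, which matters when performance is measured on held-out data.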
A list of class easy_*, where * is the name of the algorithm.
An object of class call; the original function call.
A data.frame; the original data.
A character vector of length one; the dependent variable for this analysis.
A character vector of length one; the algorithm to run on the data.
A character vector of length one; the class of the object.
A character vector of length one; the type of regression to run on the data. Choices are one of c("gaussian", "binomial"). Defaults to "gaussian".
A function; the function for resampling the data.
A function; the function for preprocessing the data.
A function; the function for measuring the results.
A character vector; the variables from the data set to exclude.
A numeric vector of length one; specifies what proportion of the data should be used for the training data set.
A numeric vector of length one; for easy_glmnet, specifies the minimal threshold (as a percentage) for how often a coefficient must appear out of n_samples.
An integer vector of length one; specifies the number of times the coefficients and predictions should be generated.
An integer vector of length one; specifies the number of times the data should be divided when generating measures of model performance.
An integer vector of length one; during each division, specifies the number of times the predictions should be generated.
An integer vector of length one; specifies the seed to be used for the analysis.
A logical vector of length one; specifies whether to display a progress bar during calculations.
An integer vector of length one; specifies the number of cores to use for this analysis.
A logical vector of length one; whether or not to generate coefficients for this analysis.
A logical vector of length one; whether or not to generate variable importances for this analysis.
A logical vector of length one; whether or not to generate predictions for this analysis.
A logical vector of length one; whether or not to generate measures of model performance for this analysis.
A list; the arguments to be passed to the algorithm specified.
A character vector; the column names.
A logical vector; the variables that are categorical.
A data.frame; the full dataset to be used for modeling.
A vector; the full response variable to be used for modeling.
A (n_variables, n_samples) matrix; the generated coefficients.
A data.frame; the coefficients after being processed.
A ggplot object; the plot of the processed coefficients.
A data.frame; the train dataset to be used for modeling.
A data.frame; the test dataset to be used for modeling.
A vector; the train response variable to be used for modeling.
A vector; the test response variable to be used for modeling.
A (nrow(X_train), n_samples) matrix; the train predictions.
A (nrow(X_test), n_samples) matrix; the test predictions.
A vector; the mean train predictions.
A vector; the mean test predictions.
A function; the function for plotting predictions generated by the model.
A ggplot object; the plot of the mean train predictions.
A ggplot object; the plot of the mean test predictions.
A vector of length n_divisions; the measures of model performance on the train datasets.
A vector of length n_divisions; the measures of model performance on the test datasets.
A function; the function for plotting the measures of model performance.
A ggplot object; the plot of the measures of model performance on the train datasets.
A ggplot object; the plot of the measures of model performance on the test datasets.
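The coefficients are stored as an (n_variables, n_samples) matrix, and survival_rate_cutoff filters that matrix. Below is a sketch of one plausible rule. It assumes a coefficient "appears" in a replicate when it is nonzero, which is how a lasso-style glmnet fit drops a variable. It also assumes the cutoff is a fraction compared with >=. The matrix values are toy numbers, and filter_by_survival is a hypothetical helper.

```r
# Toy (n_variables, n_samples) coefficient matrix: rows are variables,
# columns are replicates. Values are made up for illustration.
coefs <- matrix(
  c(0.5, 0.4, 0.6, 0.5,   # x1: nonzero in 4 of 4 replicates
    0.0, 0.0, 0.1, 0.0,   # x2: nonzero in 1 of 4
    0.0, 0.0, 0.0, 0.0),  # x3: never selected
  nrow = 3, byrow = TRUE,
  dimnames = list(c("x1", "x2", "x3"), NULL)
)

# Hypothetical helper: keep variables whose coefficient is nonzero in at
# least survival_rate_cutoff of the replicates (assumed rule).
filter_by_survival <- function(coefs, survival_rate_cutoff = 0.05) {
  survival_rate <- rowMeans(coefs != 0)
  coefs[survival_rate >= survival_rate_cutoff, , drop = FALSE]
}

kept <- filter_by_survival(coefs, survival_rate_cutoff = 0.3)
```

Here the survival rates are 1, 0.25 and 0. A cutoff of 0.3 keeps only x1, and the default 0.05 keeps x1 and x2.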
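The train and test prediction matrices have one row per observation and one column per replicate, shaped (nrow(X_train), n_samples) and (nrow(X_test), n_samples). A mean prediction per observation is therefore a row mean. Here is a toy illustration with made-up numbers, not easyml output:

```r
# Toy (nrow(X_test), n_samples) matrix: 3 test rows, 4 replicates.
predictions_test <- matrix(c(1, 2, 3, 4,
                             2, 2, 2, 2,
                             0, 1, 0, 1), nrow = 3, byrow = TRUE)

# Average across replicates (columns) to get one mean prediction per row.
predictions_test_mean <- rowMeans(predictions_test)
```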
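The model-performance vectors have length n_divisions: one score per train/test division. The sketch below shows how n_divisions and n_iterations could combine. It assumes that each division draws a fresh split, generates n_iterations predictions, averages them, and scores the average. That averaging step is an assumption, not confirmed easyml behavior. The stochastic learner is a hypothetical stand-in (lm on a bootstrap sample of the training rows), and mean squared error stands in for measure.

```r
# Hypothetical stochastic learner: lm fit on a bootstrap sample of the
# training rows, so repeated iterations give different predictions.
fit_predict <- function(train, test) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  predict(lm(mpg ~ wt + hp, data = boot), newdata = test)
}

# Hypothetical driver returning one test score per division.
replicate_performance <- function(data, n_divisions = 5, n_iterations = 3,
                                  train_size = 0.667, random_state = NULL) {
  if (!is.null(random_state)) set.seed(random_state)
  scores <- numeric(n_divisions)
  for (d in seq_len(n_divisions)) {
    # One division: a fresh random train/test split.
    idx <- sample(nrow(data), round(train_size * nrow(data)))
    train <- data[idx, ]
    test <- data[-idx, ]
    # n_iterations predictions for this division, one column each.
    preds <- replicate(n_iterations, fit_predict(train, test))
    mean_pred <- rowMeans(preds)
    # Score the averaged predictions with mean squared error.
    scores[d] <- mean((test$mpg - mean_pred)^2)
  }
  scores
}

scores_test <- replicate_performance(mtcars, n_divisions = 5, n_iterations = 3,
                                     random_state = 1)
```

Each division is scored separately, so the returned vector shows how much the performance estimate varies from one random split to the next.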
Other recipes: easy_avNNet, easy_deep_neural_network, easy_glinternet, easy_glmnet, easy_neural_network, easy_random_forest, easy_support_vector_machine