EnsembleIntegration

class eipy.ei.EnsembleIntegration(base_predictors=None, ensemble_predictors=None, k_outer=5, k_inner=5, n_samples=1, sampling_strategy='undersampling', sampling_aggregation=None, n_jobs=1, metrics=None, random_state=None, parallel_backend='loky', project_name='project', model_building=True, verbose=1)

Ensemble Integration.

Train and test a variety of ensemble classification algorithms using a nested cross-validation approach.
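
To make the nested scheme concrete, the sketch below splits sample indices into k_outer stratified outer folds and then splits each outer training set into k_inner inner folds. It is a pure-Python illustration of the general technique, not eipy's implementation (the class uses sklearn's StratifiedKFold, as listed under cv_outer and cv_inner). The helper names stratified_folds and nested_splits are hypothetical.

```python
from collections import defaultdict


def stratified_folds(indices, labels, k):
    """Split indices into k folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for cls in sorted(by_class):
        for i in by_class[cls]:
            folds[position % k].append(i)
            position += 1
    return folds


def nested_splits(labels, k_outer=5, k_inner=5):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold."""
    all_indices = list(range(len(labels)))
    outer_folds = stratified_folds(all_indices, labels, k_outer)
    for test in outer_folds:
        test_set = set(test)
        train = [i for i in all_indices if i not in test_set]
        # The inner folds only ever see the outer training indices.
        inner_splits = []
        for val in stratified_folds(train, labels, k_inner):
            val_set = set(val)
            inner_train = [i for i in train if i not in val_set]
            inner_splits.append((inner_train, val))
        yield train, test, inner_splits


# Toy labels: 12 negatives and 8 positives.
labels = [0] * 12 + [1] * 8
splits = list(nested_splits(labels, k_outer=4, k_inner=3))
```

Each outer test fold is held out from everything that happens inside its inner folds, which is what keeps the outer performance estimate unbiased by model selection.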

Parameters
base_predictors : dict, default=None

Dictionary of (sklearn-like) base predictors. Can also be passed in the fit_base method.

ensemble_predictors : dict, default=None

Dictionary of (sklearn-like) stacking algorithms. Can also be passed in the fit_ensemble method.

k_outer : int, default=5

Number of outer folds.

k_inner : int, default=5

Number of inner folds.

n_samples : int, default=1

The number of samples to take when balancing classes. Ignored if sampling_strategy is None.

sampling_strategy : str, default='undersampling'

The sampling method for class balancing. Can be set to 'undersampling', 'oversampling', 'hybrid', or None.

sampling_aggregation : str, default=None

Method for combining multiple samples. Only relevant when n_samples > 1. Can be 'mean' or None.

n_jobs : int, default=1

Number of workers for parallelization in joblib.

metrics : dict, default=None

A dictionary of metrics with which to evaluate ensembles. If None, fmax_score (the maximized F1-score) and roc_auc_score (AUC) are calculated.

random_state : int, default=None

Random state for cross-validation and use in some models.

parallel_backend : str, default='loky'

Backend to use in joblib. See joblib.Parallel() for other options.

project_name : str, default='project'

Name of project.

model_building : bool, default=True

Whether or not to train and save final models.

verbose : int, default=1

Verbosity level. Can be set to 0 or 1.
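
The interplay of sampling_strategy, n_samples, and sampling_aggregation can be sketched as follows. The sketch assumes that 'undersampling' means randomly dropping majority-class samples until the classes are equal in size, and that 'mean' averages the predictions obtained from each of the n_samples resamples. Both are assumptions about the behaviour rather than a description of eipy's code, and undersample and toy_predict are hypothetical names.

```python
import random
from statistics import mean


def undersample(labels, rng):
    """Return indices of a class-balanced subset (random undersampling)."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_minority = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_minority))
    return sorted(chosen)


def toy_predict(train_indices, features):
    """Stand-in for a fitted model: returns the mean feature of its training subset."""
    return mean(features[i] for i in train_indices)


# Imbalanced toy data: 8 negatives, 2 positives.
labels = [0] * 8 + [1] * 2
features = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

n_samples = 3
rng = random.Random(0)  # plays the role of random_state
subsets = [undersample(labels, rng) for _ in range(n_samples)]

# Assumed meaning of sampling_aggregation='mean': average over the resamples.
predictions = [toy_predict(s, features) for s in subsets]
aggregated = mean(predictions)
```

Drawing several balanced subsets and averaging reduces the variance introduced by discarding majority-class samples in any single draw.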

Attributes
base_summary : dict

Summary of performance scores for each base predictor. Scores can be accessed using the 'metrics' key and corresponding thresholds (if applicable) can be accessed in the 'thresholds' key.

ensemble_summary : dict

Summary of performance scores for each ensemble method. Scores can be accessed using the 'metrics' key and corresponding thresholds (if applicable) can be accessed in the 'thresholds' key.

ensemble_training_data : list of pandas.DataFrame

Training data for ensemble methods, one DataFrame per outer fold, so len(ensemble_training_data) == k_outer.

ensemble_test_data : list of pandas.DataFrame

Test data for ensemble methods, one DataFrame per outer fold, so len(ensemble_test_data) == k_outer.

ensemble_predictions : pandas.DataFrame

Combined predictions (across all outer folds) made by each ensemble method.

modality_names : list of str

List of modalities in the order in which they were passed to EnsembleIntegration.

n_features_per_modality : list of int

List of number of features in each modality corresponding to modality_names.

feature_names : dict

Feature names for each modality passed to fit_base.

random_numbers_for_samples : list of int

Random numbers used to sample each training fold.

final_models : dict

Dictionary of the form {"base models": {}, "ensemble models": {}}. Populated if model_building=True.

ensemble_training_data_final : list of pandas.DataFrame

List containing single dataframe of training data. Final models are trained on all available data.

cv_outer : StratifiedKFold

StratifiedKFold() cross-validator from sklearn, used for the outer folds.

cv_inner : StratifiedKFold

StratifiedKFold() cross-validator from sklearn, used for the inner folds.
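
The summaries pair each score with a threshold where one applies. For a threshold-dependent metric such as the maximized F1-score, the two come from the same search, as the sketch below shows. It is an illustrative pure-Python version; eipy's own fmax_score may differ in its threshold grid, tie-breaking, or return format, and the function names here are hypothetical.

```python
def f1_at_threshold(y_true, y_score, threshold):
    """F1 score after labelling scores >= threshold as positive."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fmax(y_true, y_score):
    """Return (best F1, threshold achieving it) over all observed scores."""
    best_score, best_threshold = 0.0, None
    for threshold in sorted(set(y_score)):
        score = f1_at_threshold(y_true, y_score, threshold)
        if score > best_score:
            best_score, best_threshold = score, threshold
    return best_score, best_threshold


y_true = [0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]
score, threshold = fmax(y_true, y_score)
```

Threshold-free metrics such as AUC have no corresponding threshold, which is why the summaries describe thresholds as available only "if applicable".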

Methods

fit_base(X, y[, base_predictors, modality_name])

Train base predictors and generate ensemble train/test data.

fit_ensemble([ensemble_predictors])

Train ensemble predictors on data generated by fit_base.

load(path)

Load from path.

predict(X_dict, ensemble_model_key)

Predict class labels for samples in X_dict.

save([path])

Save to path.

fit_base(X, y, base_predictors=None, modality_name=None)

Train base predictors and generate ensemble train/test data.

Parameters
X : array of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array of shape (n_samples,)

Target vector relative to X.

base_predictors : dict, default=None

Dictionary of (sklearn-like) base predictors.

modality_name : str, default=None

Name of the modality that X represents.

Returns
self

Ensemble train/test data and fitted final base predictors.

fit_ensemble(ensemble_predictors=None)

Train ensemble predictors on data generated by fit_base.

Parameters
ensemble_predictors : dict, default=None

Dictionary of (sklearn-like) stacking algorithms.

Returns
self

Summary of ensemble predictor performance and fitted final ensemble models.

predict(X_dict, ensemble_model_key)

Predict class labels for samples in X_dict.

Parameters
X_dict : dict

Dictionary of X modalities each having n_samples. Keys and n_features must match those seen by fit_base.

ensemble_model_key

The key of the ensemble method selected during performance analysis.

Returns
y_pred : array of shape (n_samples,)

Vector containing the class labels for each sample.
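
Taken together, fit_base, fit_ensemble, and predict suggest the workflow sketched below. It uses only the signatures documented on this page, and assumes that fit_base is called once per modality and that eipy and scikit-learn are installed. The synthetic data, the modality names, and the predictor keys ("LR", "DT", "stacked_LR") are hypothetical choices for illustration.

```python
def run_workflow():
    """Fit base and ensemble predictors on two synthetic modalities, then predict."""
    # Imported inside the function so the sketch can be loaded without
    # eipy or scikit-learn installed; both are required to actually run it.
    from eipy.ei import EnsembleIntegration
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    # Hypothetical modalities: two blocks of columns from the same samples.
    modalities = {"modality_a": X[:, :10], "modality_b": X[:, 10:]}

    base_predictors = {
        "LR": LogisticRegression(max_iter=1000),
        "DT": DecisionTreeClassifier(random_state=0),
    }
    ensemble_predictors = {"stacked_LR": LogisticRegression(max_iter=1000)}

    ei = EnsembleIntegration(
        base_predictors=base_predictors,
        k_outer=5,
        k_inner=5,
        random_state=0,
        project_name="demo",
    )
    # Assumed usage: one fit_base call per modality, then stack on the
    # ensemble train/test data that those calls generate.
    for name, X_modality in modalities.items():
        ei.fit_base(X_modality, y, modality_name=name)
    ei.fit_ensemble(ensemble_predictors=ensemble_predictors)

    # Keys and feature counts must match those seen by fit_base.
    return ei.predict(X_dict=modalities, ensemble_model_key="stacked_LR")
```

Because predict relies on final models, this sketch leaves model_building at its default of True.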

save(path=None)

Save to path.

Parameters
path : optional, default=None

Path to save the EnsembleIntegration class object.

classmethod load(path)

Load from path.

Parameters
path : str

Path to load the EnsembleIntegration class object.
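
A save and load round trip might look like the sketch below. It relies only on the save(path) and load(path) signatures documented above and requires eipy to be installed. The helper name save_and_reload and the default path are hypothetical, and the on-disk format is not specified here.

```python
def save_and_reload(ei, path="demo_ei_object"):
    """Persist a fitted EnsembleIntegration object and load it back."""
    # Imported lazily so the sketch can be loaded without eipy installed.
    from eipy.ei import EnsembleIntegration

    ei.save(path=path)
    # load is a classmethod, so it is called on the class, not an instance.
    return EnsembleIntegration.load(path)
```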