EnsembleIntegration

class eipy.ei.EnsembleIntegration(base_predictors=None, ensemble_predictors=None, k_outer=5, k_inner=5, n_samples=1, sampling_strategy='undersampling', sampling_aggregation=None, n_jobs=1, metrics=None, random_state=None, parallel_backend='loky', project_name='project', model_building=True, verbose=1)

Ensemble Integration.

Train and test a variety of ensemble classification algorithms using a nested cross-validation approach.
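The sketch below shows the nested split structure with no dependencies, assuming stratified folds as described for cv_outer and cv_inner. The helpers stratified_folds and nested_splits are hypothetical stand-ins for sklearn's StratifiedKFold and are not part of eipy. In a typical nested stacking setup, base predictors fitted on each inner training split score the matching inner test split. Those out-of-fold scores form the ensemble training data for that outer fold, and scores on the outer test fold form its ensemble test data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed):
    """Hypothetical helper: split positions 0..len(labels)-1 into k folds
    so that each class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pos, label in enumerate(labels):
        by_class[label].append(pos)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets a similar share.
        for i, pos in enumerate(members):
            folds[i % k].append(pos)
    return [sorted(fold) for fold in folds]


def nested_splits(labels, k_outer, k_inner, seed=0):
    """Hypothetical helper: yield (outer_train, outer_test, inner_splits),
    where inner_splits partitions outer_train into k_inner folds."""
    for outer_test in stratified_folds(labels, k_outer, seed):
        test_set = set(outer_test)
        outer_train = [i for i in range(len(labels)) if i not in test_set]
        train_labels = [labels[i] for i in outer_train]
        inner_splits = []
        # Inner folds are computed on positions within outer_train,
        # then mapped back to indices of the full dataset.
        for inner_test_pos in stratified_folds(train_labels, k_inner, seed):
            held_out = set(inner_test_pos)
            inner_train = [outer_train[p] for p in range(len(outer_train)) if p not in held_out]
            inner_test = [outer_train[p] for p in inner_test_pos]
            inner_splits.append((inner_train, inner_test))
        yield outer_train, outer_test, inner_splits


labels = [0] * 10 + [1] * 10
splits = list(nested_splits(labels, k_outer=5, k_inner=4))
```

Every sample lands in exactly one outer test fold, and the inner folds only partition the outer training indices, so an outer test fold never contributes to the ensemble training data built for it.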

Parameters

base_predictors (dict, default=None): Dictionary of (sklearn-like) base predictors. Can also be passed to the fit_base method.
ensemble_predictors (dict, default=None): Dictionary of (sklearn-like) stacking algorithms. Can also be passed to the fit_ensemble method.
k_outer (int, default=5): Number of outer cross-validation folds.
k_inner (int, default=5): Number of inner cross-validation folds.
n_samples (int, default=1): Number of samples to draw when balancing classes. Ignored if sampling_strategy is None.
sampling_strategy (str or None, default='undersampling'): Sampling method for class balancing. Can be 'undersampling', 'oversampling', 'hybrid', or None.
sampling_aggregation (str or None, default=None): Method for combining multiple samples. Only relevant when n_samples > 1. Can be 'mean' or None.
n_jobs (int, default=1): Number of workers for parallelization in joblib.
metrics (dict, default=None): Dictionary of metrics with which to evaluate ensembles. If None, the maximized F1-score (fmax_score) and the area under the ROC curve (roc_auc_score) are calculated.
random_state (int, default=None): Random state for cross-validation and for models that accept one.
parallel_backend (str, default='loky'): Backend to use in joblib. See joblib.Parallel() for other options.
project_name (str, default='project'): Name of the project.
model_building (bool, default=True): Whether to train and save final models.
verbose (int, default=1): Verbosity level. Can be 0 or 1.
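The sketch below illustrates how n_samples, sampling_strategy='undersampling' and sampling_aggregation='mean' can fit together. It assumes that undersampling means randomly keeping as many samples of each class as the minority class has, that this is repeated n_samples times with different seeds, and that 'mean' averages the per-sample scores of the resulting models. The helpers undersample and mean_aggregate are hypothetical; the library may delegate resampling to a dedicated package.

```python
import random
from collections import defaultdict


def undersample(labels, seed):
    """Hypothetical helper: indices of a class-balanced random subsample."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_min = min(len(members) for members in by_class.values())
    chosen = []
    for members in by_class.values():
        # Keep n_min randomly chosen members of every class.
        chosen.extend(rng.sample(members, n_min))
    return sorted(chosen)


def mean_aggregate(score_lists):
    """Hypothetical helper: average per-sample scores across resampled models."""
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]


# Analogue of n_samples=3: three balanced subsamples from different seeds.
labels = [0] * 8 + [1] * 2
samples = [undersample(labels, seed) for seed in range(3)]
```

Each subsample keeps both minority samples and two randomly chosen majority samples. A model trained on each subsample yields one score list, and mean_aggregate combines those lists into a single score per sample.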

Attributes

base_summary (dict): Summary of performance scores for each base predictor. Scores are stored under the 'metrics' key and the corresponding thresholds (where applicable) under the 'thresholds' key.
ensemble_summary (dict): Summary of performance scores for each ensemble method. Scores are stored under the 'metrics' key and the corresponding thresholds (where applicable) under the 'thresholds' key.
ensemble_training_data (list of pandas.DataFrame): Training data for the ensemble methods, one DataFrame per outer fold, so len(ensemble_training_data) == k_outer.
ensemble_test_data (list of pandas.DataFrame): Test data for the ensemble methods, one DataFrame per outer fold, so len(ensemble_test_data) == k_outer.
ensemble_predictions (pandas.DataFrame): Combined predictions (across all outer folds) made by each ensemble method.
modality_names (list of str): Modality names, in the order in which they were passed to EnsembleIntegration.
n_features_per_modality (list of int): Number of features in each modality, in the same order as modality_names.
feature_names (dict): Feature names for each modality passed to fit_base.
random_numbers_for_samples (list of int): Random numbers used to sample each training fold.
final_models (dict): Dictionary of the form {"base models": {}, "ensemble models": {}}. Populated if model_building=True.
ensemble_training_data_final (list of pandas.DataFrame): List containing a single DataFrame of training data. Final models are trained on all available data.
cv_outer (StratifiedKFold): scikit-learn StratifiedKFold cross-validator used for the outer folds.
cv_inner (StratifiedKFold): scikit-learn StratifiedKFold cross-validator used for the inner folds.
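The 'thresholds' key exists because a score such as the maximized F1 (fmax) depends on a decision threshold. The sketch below computes an fmax-style score together with the threshold that achieves it, assuming binary labels with 1 as the positive class and "score >= threshold" meaning a positive prediction. fmax_with_threshold is a hypothetical reimplementation for illustration, not the library's fmax_score.

```python
def fmax_with_threshold(y_true, y_score):
    """Hypothetical helper: return (best F1, threshold) over all distinct scores."""
    n_pos = sum(1 for y in y_true if y == 1)
    best_f1, best_threshold = 0.0, None
    for threshold in sorted(set(y_score)):
        tp = fp = 0
        for y, s in zip(y_true, y_score):
            if s >= threshold:  # predicted positive at this threshold
                if y == 1:
                    tp += 1
                else:
                    fp += 1
        if tp == 0:
            continue  # F1 is zero when nothing positive is recovered
        precision = tp / (tp + fp)
        recall = tp / n_pos
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_threshold = f1, threshold
    return best_f1, best_threshold


f1, threshold = fmax_with_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this example the best F1 (0.8) is reached at threshold 0.35, where both positives are recovered with one false positive.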

Methods

`fit_base`(X, y[, base_predictors, modality_name])	Train base predictors and generate ensemble train/test data.
`fit_ensemble`([ensemble_predictors])	Train ensemble predictors on data generated by fit_base.
`load`(path)	Load from path.
`predict`(X_dict, ensemble_model_key)	Predict class labels for samples in X_dict.
`save`([path])	Save to path.

fit_base(X, y, base_predictors=None, modality_name=None)

Train base predictors and generate ensemble train/test data.

Parameters

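X (array of shape (n_samples, n_features)): Training data, where n_samples is the number of samples and n_features is the number of features.
y (array of shape (n_samples,)): Target vector relative to X.
base_predictors (dict, default=None): Dictionary of (sklearn-like) base predictors, as an alternative to passing them to the constructor.
modality_name (str, default=None): Name of the modality represented by X.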

Returns

self: The object, holding the generated ensemble train/test data and the fitted final base predictors.
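Because fit_base receives one modality per call, identified by modality_name, the ensemble data grows by one block of columns per modality. The sketch below shows that idea, assuming each base predictor yields one positive-class score per sample. The {(modality, predictor): scores} layout and the helper stack_base_predictions are hypothetical; the real attributes are pandas DataFrames whose exact column layout is not described here.

```python
def stack_base_predictions(predictions):
    """Hypothetical helper: turn {(modality, predictor): [score per sample]}
    into (column_names, rows), one row of base-predictor scores per sample.

    Columns keep insertion order, i.e. the order in which modalities and
    predictors were added.
    """
    if not predictions:
        return [], []
    columns = list(predictions)
    n = len(predictions[columns[0]])
    if any(len(scores) != n for scores in predictions.values()):
        raise ValueError("all base predictions must cover the same samples")
    rows = [[predictions[col][i] for col in columns] for i in range(n)]
    return columns, rows


predictions = {
    ("rna", "rf"): [0.1, 0.9],       # first modality, two base predictors
    ("rna", "lr"): [0.2, 0.8],
    ("clinical", "rf"): [0.3, 0.7],  # second modality added later
}
columns, rows = stack_base_predictions(predictions)
```

Each row is then one training example for a stacking algorithm, with one feature per (modality, base predictor) pair.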

fit_ensemble(ensemble_predictors=None)

Train ensemble predictors on data generated by fit_base.

Parameters

ensemble_predictors (dict, default=None): Dictionary of (sklearn-like) stacking algorithms.

Returns

self: The object, holding a summary of ensemble predictor performance and the fitted final ensemble models.
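ensemble_predictors expects "sklearn-like" stacking algorithms. The sketch below shows what that convention could look like, assuming a stacker only needs fit(X, y) and predict_proba(X), where X holds the base-predictor scores produced by fit_base. MeanAggregation is a hypothetical, dependency-free class that returns plain lists instead of NumPy arrays; an sklearn estimator such as LogisticRegression follows the same fit/predict_proba convention.

```python
class MeanAggregation:
    """Hypothetical stacker: the ensemble score is the mean of the
    base-predictor scores in each row."""

    def fit(self, X, y):
        # Nothing to learn; return self as sklearn estimators do.
        return self

    def predict_proba(self, X):
        scores = [sum(row) / len(row) for row in X]
        # Two columns per sample, negative class then positive class,
        # mirroring the sklearn predict_proba layout for binary problems.
        return [[1.0 - s, s] for s in scores]


# Hypothetical key name; any label identifying the stacker would do.
ensemble_predictors = {"Mean": MeanAggregation()}
```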

predict(X_dict, ensemble_model_key)

Predict class labels for samples in X_dict.

Parameters

X_dict (dict): Dictionary of X modalities, each with n_samples samples. Keys and n_features must match those seen by fit_base.
ensemble_model_key: Key of the ensemble method selected during performance analysis.

Returns

y_pred (array of shape (n_samples,)): Vector containing the class label for each sample.
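predict requires X_dict to match what fit_base saw. The sketch below is a pre-flight check of that requirement built on the documented modality_names and n_features_per_modality attributes. check_x_dict is a hypothetical helper, and modalities are plain lists of rows here rather than arrays.

```python
def check_x_dict(X_dict, modality_names, n_features_per_modality):
    """Hypothetical helper: raise ValueError if X_dict does not match the
    modalities and feature counts seen during fitting; return n_samples."""
    if set(X_dict) != set(modality_names):
        raise ValueError(f"expected modalities {modality_names}, got {sorted(X_dict)}")
    n_samples = None
    for name, n_features in zip(modality_names, n_features_per_modality):
        rows = X_dict[name]
        # Every modality must describe the same samples.
        if n_samples is None:
            n_samples = len(rows)
        elif len(rows) != n_samples:
            raise ValueError(f"modality {name!r} has {len(rows)} samples, expected {n_samples}")
        # Every row must have the feature count seen by fit_base.
        for row in rows:
            if len(row) != n_features:
                raise ValueError(f"modality {name!r} expects {n_features} features, got {len(row)}")
    return n_samples


n = check_x_dict({"rna": [[1, 2, 3], [4, 5, 6]], "clinical": [[0], [1]]},
                 modality_names=["rna", "clinical"],
                 n_features_per_modality=[3, 1])
```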

save(path=None)

Save to path.

Parameters

path (optional, default=None): Path to which the EnsembleIntegration class object is saved.

classmethod load(path)

Load from path.

Parameters

path (str): Path from which to load the EnsembleIntegration class object.
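Pairing an instance method save with a classmethod load is a common Python pattern. The sketch below uses pickle and assumes the object's state is picklable. The Saveable class and its fallback file name when path=None are hypothetical; the library's actual serialization format and default path are not documented here. Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile


class Saveable:
    """Hypothetical stand-in showing a pickle-based save/load pair."""

    def __init__(self, project_name="project"):
        self.project_name = project_name

    def save(self, path=None):
        # Hypothetical fallback: name the file after the project.
        if path is None:
            path = f"{self.project_name}.pkl"
        with open(path, "wb") as f:
            # Store the instance state rather than the class itself.
            pickle.dump(self.__dict__, f)
        return path

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        # Rebuild an instance without calling __init__, then restore state.
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        return obj


with tempfile.TemporaryDirectory() as tmp:
    saved_path = Saveable("demo").save(os.path.join(tmp, "demo.pkl"))
    restored = Saveable.load(saved_path)
```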