EnsembleIntegration
- class eipy.ei.EnsembleIntegration(base_predictors=None, ensemble_predictors=None, k_outer=5, k_inner=5, n_samples=1, sampling_strategy='undersampling', sampling_aggregation=None, n_jobs=1, metrics=None, random_state=None, parallel_backend='loky', project_name='project', model_building=True, verbose=1)
Ensemble Integration.
Train and test a variety of ensemble classification algorithms using a nested cross validation approach.
- Parameters
- base_predictors : dict, default=None
Dictionary of (sklearn-like) base predictors. Can also be passed to the fit_base method.
- ensemble_predictors : dict, default=None
Dictionary of (sklearn-like) stacking algorithms. Can also be passed to the fit_ensemble method.
- k_outer : int, default=5
Number of outer cross-validation folds.
- k_inner : int, default=5
Number of inner cross-validation folds.
- n_samples : int, default=1
Number of samples to draw when balancing classes. Ignored if sampling_strategy is None.
- sampling_strategy : str, default='undersampling'
Sampling method for class balancing. Can be 'undersampling', 'oversampling', 'hybrid', or None.
- sampling_aggregation : str, default=None
Method for combining multiple samples. Only relevant when n_samples > 1. Can be 'mean' or None.
- n_jobs : int, default=1
Number of workers for parallelization in joblib.
- metrics : dict, default=None
Dictionary of metrics used to evaluate ensembles. If None, fmax_score (the maximized F1-score) and roc_auc_score (AUC) are calculated.
- random_state : int, default=None
Random state for cross-validation and for use in some models.
- parallel_backend : str, default='loky'
Backend to use in joblib. See joblib.Parallel() for other options.
- project_name : str, default='project'
Name of the project.
- model_building : bool, default=True
Whether or not to train and save final models.
- verbose : int, default=1
Verbosity level. Can be set to 0 or 1.
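The interplay of sampling_strategy, n_samples and sampling_aggregation can be illustrated with a small standard-library sketch. It is not eipy's implementation: the helper names `undersample_indices` and `mean_aggregate`, the seeds, and the score values are all hypothetical, and it assumes that 'mean' aggregation averages the predictions obtained from each resampled training set.

```python
import random
from statistics import mean


def undersample_indices(y, seed):
    """Return sorted indices of a class-balanced subset of y (undersampling)."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    # Every class is reduced to the size of the smallest class.
    n_min = min(len(members) for members in by_class.values())
    chosen = []
    for members in by_class.values():
        chosen.extend(rng.sample(members, n_min))
    return sorted(chosen)


def mean_aggregate(per_sample_scores):
    """Average score vectors element-wise, one vector per resampled training set."""
    return [mean(column) for column in zip(*per_sample_scores)]


y = [0] * 8 + [1] * 2          # imbalanced toy labels
seeds = [11, 22, 33]           # hypothetical seeds, one per sample (n_samples=3)
balanced_sets = [undersample_indices(y, seed) for seed in seeds]

# Made-up scores for two test points, one row per resampled training set.
scores = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
aggregated = mean_aggregate(scores)
```

Each entry of `balanced_sets` holds two indices per class, and `aggregated` holds one averaged score per test point.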
- Attributes
- base_summary : dict
Summary of performance scores for each base predictor. Scores can be accessed using the 'metrics' key, and the corresponding thresholds (if applicable) using the 'thresholds' key.
- ensemble_summary : dict
Summary of performance scores for each ensemble method. Scores can be accessed using the 'metrics' key, and the corresponding thresholds (if applicable) using the 'thresholds' key.
- ensemble_training_data : list of pandas.DataFrame
Training data for ensemble methods, one DataFrame per outer fold. len(ensemble_training_data) = k_outer
- ensemble_test_data : list of pandas.DataFrame
Test data for ensemble methods, one DataFrame per outer fold. len(ensemble_test_data) = k_outer
- ensemble_predictions : pandas.DataFrame
Combined predictions (across all outer folds) made by each ensemble method.
- modality_names : list of str
List of modalities in the order in which they were passed to EnsembleIntegration.
- n_features_per_modality : list of int
Number of features in each modality, in the same order as modality_names.
- feature_names : dict
Feature names for each modality passed to fit_base.
- random_numbers_for_samples : list of int
Random numbers used to sample each training fold.
- final_models : dict
Dictionary of the form {"base models": {}, "ensemble models": {}}. Populated if model_building=True.
- ensemble_training_data_final : list of pandas.DataFrame
List containing a single DataFrame of training data. Final models are trained on all available data.
- cv_outer : StratifiedKFold
StratifiedKFold() cross validator from sklearn, used for the outer folds.
- cv_inner : StratifiedKFold
StratifiedKFold() cross validator from sklearn, used for the inner folds.
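How k_outer and k_inner nest can be sketched with the standard library alone. The round-robin `stratified_folds` helper is a simplified, hypothetical stand-in for sklearn's StratifiedKFold, and `nested_splits` is likewise illustrative. The comments assume the usual stacking arrangement, in which inner folds supply the ensemble training data and outer test folds supply the ensemble test data.

```python
def stratified_folds(indices, labels, k):
    """Deal indices into k folds class by class so each fold keeps the class mix."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    position = 0
    for members in by_class.values():
        for i in members:
            folds[position % k].append(i)
            position += 1
    return folds


def nested_splits(labels, k_outer=5, k_inner=5):
    """Return one dict per outer fold, each holding its k_inner inner splits."""
    all_indices = list(range(len(labels)))
    outer_folds = stratified_folds(all_indices, labels, k_outer)
    result = []
    for o, outer_test in enumerate(outer_folds):
        outer_train = [i for j, fold in enumerate(outer_folds) if j != o for i in fold]
        # Inner folds only ever see the outer training portion.
        inner_folds = stratified_folds(outer_train, labels, k_inner)
        inner = []
        for n, inner_val in enumerate(inner_folds):
            inner_train = [i for j, fold in enumerate(inner_folds) if j != n for i in fold]
            inner.append((inner_train, inner_val))
        result.append(
            {"outer_train": outer_train, "outer_test": outer_test, "inner": inner}
        )
    return result


labels = [0] * 10 + [1] * 10
splits = nested_splits(labels, k_outer=5, k_inner=5)
```

With the defaults, `splits` has k_outer entries and each entry has k_inner inner (train, validation) pairs, which matches the stated lengths of ensemble_training_data and ensemble_test_data.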
Methods
fit_base(X, y[, base_predictors, modality_name]): Train base predictors and generate ensemble train/test data.
fit_ensemble([ensemble_predictors]): Train ensemble predictors on data generated by fit_base.
load(path): Load from path.
predict(X_dict, ensemble_model_key): Predict class labels for the samples in X_dict.
save([path]): Save to path.
- fit_base(X, y, base_predictors=None, modality_name=None)
Train base predictors and generate ensemble train/test data.
- Parameters
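- X : array of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array of shape (n_samples,)
Target vector relative to X.
- base_predictors : dict, default=None
Dictionary of (sklearn-like) base predictors. Can also be set when constructing EnsembleIntegration.
- modality_name : str, default=None
Name of the modality represented by X.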
- Returns
- self
Ensemble train/test data and fitted final base predictors.
- fit_ensemble(ensemble_predictors=None)
Train ensemble predictors on data generated by fit_base.
- Parameters
- ensemble_predictors : dict, default=None
Dictionary of (sklearn-like) stacking algorithms.
- Returns
- self
Summary of ensemble predictor performance and fitted final ensemble models.
- predict(X_dict, ensemble_model_key)
Predict class labels for the samples in X_dict.
- Parameters
- X_dict : dict
Dictionary of X modalities, each having n_samples. Keys and n_features must match those seen by fit_base.
- ensemble_model_key
The key of the ensemble method selected during performance analysis.
- Returns
- y_pred : array of shape (n_samples,)
Vector containing the class labels for each sample.
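The requirement that X_dict match what fit_base saw can be checked before calling predict. The helper below is a hypothetical sketch and not part of eipy. It assumes each modality is a sequence of rows (nested lists or a 2D NumPy array) and compares X_dict against the modality_names and n_features_per_modality attributes.

```python
def check_x_dict(X_dict, modality_names, n_features_per_modality):
    """Validate X_dict against the fitted modalities and return n_samples."""
    expected = dict(zip(modality_names, n_features_per_modality))
    if set(X_dict) != set(expected):
        raise KeyError(
            f"expected modalities {sorted(expected)}, got {sorted(X_dict)}"
        )
    n_samples = None
    for name, X in X_dict.items():
        n_rows = len(X)
        n_cols = len(X[0]) if n_rows else 0
        if n_cols != expected[name]:
            raise ValueError(
                f"{name}: expected {expected[name]} features, got {n_cols}"
            )
        # Every modality must describe the same samples.
        if n_samples is None:
            n_samples = n_rows
        elif n_rows != n_samples:
            raise ValueError(f"{name}: expected {n_samples} samples, got {n_rows}")
    return n_samples


# Hypothetical fitted state: two modalities with 3 and 2 features.
modality_names = ["modality_a", "modality_b"]
n_features_per_modality = [3, 2]

X_dict = {
    "modality_a": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "modality_b": [[1.0, 2.0], [3.0, 4.0]],
}
n_checked = check_x_dict(X_dict, modality_names, n_features_per_modality)
```

In practice the two lists would be read from a fitted object, for example `ei.modality_names` and `ei.n_features_per_modality`.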
- save(path=None)
Save to path.
- Parameters
- path : str, default=None
Path to save the EnsembleIntegration class object.
- classmethod load(path)
Load from path.
- Parameters
- path : str
Path to load the EnsembleIntegration class object.
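A possible end-to-end workflow, pieced together from the signatures documented above, might look like the following sketch. The synthetic data, the modality names, the predictor keys "LR", "DT" and "stacked_LR", and the file name are all hypothetical, and the choice of sklearn estimators is only one option. The imports sit inside the function so that the block can be loaded without numpy, scikit-learn or eipy installed.

```python
def run_example(save_path="ei_example.pkl"):
    """Fit, predict, save and reload; requires numpy, scikit-learn and eipy."""
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from eipy.ei import EnsembleIntegration

    # Synthetic two-modality data set with a weak class signal (illustrative only).
    rng = np.random.default_rng(0)
    y = np.array([0, 1] * 50)
    modalities = {
        "modality_a": rng.normal(size=(100, 4)) + y[:, None],
        "modality_b": rng.normal(size=(100, 6)) + y[:, None],
    }

    ei = EnsembleIntegration(
        base_predictors={
            "LR": LogisticRegression(),
            "DT": DecisionTreeClassifier(),
        },
        k_outer=5,
        k_inner=5,
        n_samples=1,
        sampling_strategy="undersampling",
        random_state=0,
        project_name="demo",
        model_building=True,  # needed so final models exist for predict
    )

    # fit_base is called once per modality.
    for name, X in modalities.items():
        ei.fit_base(X, y, modality_name=name)

    ei.fit_ensemble(ensemble_predictors={"stacked_LR": LogisticRegression()})

    # Keys of the dict passed to predict match the modality names used above.
    y_pred = ei.predict(modalities, ensemble_model_key="stacked_LR")

    ei.save(save_path)
    restored = EnsembleIntegration.load(save_path)
    return ei.ensemble_summary["metrics"], y_pred, restored
```

Calling `run_example()` returns the ensemble metrics, the predicted labels and the reloaded object. Predictions here are made on the training data purely to show the call sequence, not to estimate performance.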