[WIP] Add sample_weight to permutation importances scikit-learn interface #265
Sample weights are an important parameter for imbalanced problems and some forms of bias correction, and they are part of the scikit-learn API for all classifiers and regressors. They can also have a large impact on permutation importances, yet they are not supported through `fit_params`, because they need to be split into test/train sets alongside `X` and `y` in the CV case.

Opening this as a PR now to start a discussion before I do any more work. At the very least I would like to add to the documentation how you can use `get_score_importances` and pass sample weights through the `score_func` if you require them.

As a very quick (and perhaps slightly flawed) example: if we calculate permutation importances for our standard data and then imbalance it (the reverse of the standard case), the importances change. I have seen dramatic (and less uniform) changes in `feature_importances` on real-world imbalanced sets when doing this. I'll try to find a better example.
Issues

- `PermutationImportance` will fail if an `estimator` does not support `sample_weight` on its `fit` method.
- `class_weight`, which some classifiers (e.g. `RandomForestClassifier`) support, interacts with this: your sample weights are modified during the classifier fit to balance the classes, but not during the permutation importance fit for the test set, which can be misleading.
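Two quick sketches of those issues (the estimator choices below are my own examples): the first checks whether an estimator's `fit` accepts `sample_weight` at all; the second shows the per-sample weights that `class_weight="balanced"` implies at fit time, which are not applied when scoring the test set unless you pass them yourself:

```python
import inspect

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.class_weight import compute_sample_weight

def fit_accepts_sample_weight(estimator):
    """True if the estimator's fit() signature has a sample_weight parameter."""
    return "sample_weight" in inspect.signature(estimator.fit).parameters

print(fit_accepts_sample_weight(RandomForestClassifier()))  # True
print(fit_accepts_sample_weight(KNeighborsClassifier()))    # False

# class_weight="balanced" is equivalent to these per-sample weights at fit
# time; applying the same weights in the scorer keeps fit and scoring
# consistent.
y_test = np.array([0, 0, 0, 1])
w_test = compute_sample_weight("balanced", y_test)  # -> [2/3, 2/3, 2/3, 2.0]
```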