Permutation Importance

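Single-pass and multi-pass permutation importance are model-agnostic methods for ranking features by their contribution to model performance. In the single-pass method, each feature is permuted individually, with all other features left intact, and the resulting drop in performance is measured. The multi-pass method applies this step repeatedly. The most important feature is kept permuted while the remaining features are assessed again, and the process continues with the next most important feature. Keeping earlier features permuted reduces the ability of a correlated feature to compensate for the one that was permuted, an effect that can distort single-pass rankings.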
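The following is a minimal from-scratch NumPy sketch of the single-pass idea. It is not skexplain's implementation. It assumes a `predict` callable and a `score_fn` for which higher means better. The names `single_pass_importance` and `neg_mse` and the synthetic data are all hypothetical.

```python
import numpy as np


def single_pass_importance(predict, X, y, score_fn, n_permute=1, seed=0):
    """Mean drop in score when each column is shuffled on its own.

    Assumes a higher score_fn value means better performance.
    """
    rng = np.random.default_rng(seed)
    baseline = score_fn(y, predict(X))
    drops = {}
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_permute):
            X_perm = X.copy()
            # Shuffle column j only, breaking its link to y; other columns stay intact.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores.append(score_fn(y, predict(X_perm)))
        drops[j] = baseline - np.mean(scores)
    return drops


# Toy data: the "model" is a fixed function that uses columns 0 and 1 and ignores column 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3 * X[:, 0] + X[:, 1]


def predict(A):
    return 3 * A[:, 0] + A[:, 1]


def neg_mse(t, p):
    return -np.mean((t - p) ** 2)  # negated so that higher is better


drops = single_pass_importance(predict, X, y, neg_mse, n_permute=3)
ranking = sorted(drops, key=drops.get, reverse=True)
print(ranking)  # column indices, most important first
```

Column 2 is ignored by the toy model. Shuffling it leaves the score unchanged, so its importance is exactly zero.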

This notebook demonstrates how to compute and visualize both methods, compare forward vs. backward selection, and annotate correlated features.

[ ]:
import skexplain
import plotting_config  # local module with display names, units, and colors for the plots
[ ]:
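# Load the example estimators and dataset provided by skexplain.
estimators = skexplain.load_models()
X, y = skexplain.load_data()

print(estimators)
print(f'X Shape : {X.shape}')
# Fraction of positive examples (the class "skew", or base rate).
print(f'y Skew : {y.mean()*100:.1f}%')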
[ ]:
explainer = skexplain.ExplainToolkit(estimators=estimators, X=X, y=y)

explainer.set_plotting_config(
    display_feature_names=plotting_config.display_feature_names,
    display_units=plotting_config.display_units,
    feature_colors=plotting_config.color_dict,
)

Computing Permutation Importance

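We compute backward multi-pass permutation importance for the top 10 features (n_vars=10), using the Normalized Area Under the Performance Diagram Curve (NAUPDC) as the evaluation metric. Setting n_permute=5 permutes each feature five times, and the spread of scores across those repetitions is used to build confidence intervals. subsample=0.1 evaluates on a random 10% of the examples to reduce cost, and n_jobs=8 runs the computation in parallel.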
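This notebook does not show the exact formula skexplain uses for 'norm_aupdc'. The sketch below makes two assumptions. First, AUPDC is estimated as average precision, because the performance diagram plots POD (recall) against success ratio (precision). Second, the normalization is the unachievable-region bound of Boyd et al. (2012), which depends only on the positive-class frequency p: AUPDC_min = 1 + (1 - p) ln(1 - p) / p. Both `aupdc` and `norm_aupdc` here are hypothetical stand-ins, not the library's functions.

```python
import numpy as np


def aupdc(targets, scores):
    """Area under the performance diagram (POD vs. success ratio) curve,
    estimated as average precision: the mean success ratio at the rank of
    each positive example, with examples sorted by descending score."""
    targets = np.asarray(targets)
    order = np.argsort(-np.asarray(scores), kind="stable")
    t = targets[order]
    hits = np.cumsum(t)                                # true positives at each cutoff
    success_ratio = hits / np.arange(1, len(t) + 1)    # precision at each cutoff
    return success_ratio[t == 1].mean()


def norm_aupdc(targets, scores):
    """Rescale AUPDC so the minimum achievable value for positive-class
    frequency p maps to 0 and a perfect ranking maps to 1 (assumes 0 < p < 1)."""
    p = np.mean(targets)
    aupdc_min = 1 + (1 - p) * np.log(1 - p) / p
    return (aupdc(targets, scores) - aupdc_min) / (1 - aupdc_min)


y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0])
perfect = np.array([0.1, 0.2, 0.9, 0.3, 0.8, 0.1, 0.2, 0.0])
print(norm_aupdc(y_true, perfect))   # both positives ranked first
print(norm_aupdc(y_true, -perfect))  # both positives ranked last
```

Because the normalization accounts for the base rate, scores remain comparable across datasets with different class imbalance. This matters for the rare-event problems common in meteorology.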

[ ]:
results = explainer.permutation_importance(
    n_vars=10,
    evaluation_fn='norm_aupdc',
    n_permute=5,
    subsample=0.1,
    n_jobs=8,
    verbose=True,
    random_seed=42,
    direction='backward',
)
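The n_permute and subsample settings trade accuracy for cost. The sketch below shows our assumption of the mechanics for a single feature, not skexplain's code. It draws one 10% row subsample, reshuffles the column n_permute times, and summarizes the spread with a percentile interval. `permuted_scores` is a hypothetical helper, and skexplain may draw rows or build intervals differently.

```python
import numpy as np


def permuted_scores(predict, X, y, score_fn, col, n_permute=5, subsample=0.1, seed=42):
    """Score the model n_permute times with column `col` shuffled, on one
    random row subsample; returns an array of n_permute scores."""
    rng = np.random.default_rng(seed)
    n_rows = max(1, int(round(subsample * len(X))))
    rows = rng.choice(len(X), size=n_rows, replace=False)  # same rows for every repeat
    X_sub, y_sub = X[rows], y[rows]
    out = []
    for _ in range(n_permute):
        X_perm = X_sub.copy()
        X_perm[:, col] = rng.permutation(X_perm[:, col])  # a fresh shuffle each repeat
        out.append(score_fn(y_sub, predict(X_perm)))
    return np.array(out)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = 2 * X[:, 0]
scores = permuted_scores(lambda A: 2 * A[:, 0], X, y,
                         lambda t, p: -np.mean((t - p) ** 2), col=0)
low, high = np.percentile(scores, [2.5, 97.5])  # a simple percentile interval
print(len(scores), low <= high)
```

With only five repetitions, such an interval is coarse. Raising n_permute tightens it at a proportional cost in compute.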

Plotting Single-Pass Importance

The first iteration of the multi-pass method is identical to the single-pass method, so the single-pass result is stored by default. The panels argument is a list of (method, estimator_name) tuples, one per panel to draw.
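When many panels are needed, the (method, estimator_name) tuples can be generated rather than typed out. This uses only the standard library. The method and estimator strings are the ones used in this notebook.

```python
from itertools import product

methods = ['backward_singlepass', 'backward_multipass']
estimator_names = ['Random Forest', 'Logistic Regression']

# One (method, estimator_name) tuple per panel, in method-major order.
panels = list(product(methods, estimator_names))
print(panels)
```

The later cells in this notebook pass one results object per panel, for example data=[results]*2 for two panels. Following that pattern, this list would presumably pair with data=[results] * len(panels).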

[ ]:
fig = explainer.plot_importance(
    data=results,
    panels=[('backward_singlepass', 'Random Forest')],
    num_vars_to_plot=15,
)

Multi-Pass Importance

Multi-pass importance keeps previously selected features permuted while ranking the rest, which limits how much correlated features can mask one another.
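The backward multi-pass loop can be sketched greedily as follows. At each pass, every remaining column is trial-permuted on top of the columns already fixed in a permuted state. The column whose shuffling lowers the score most is ranked next and stays permuted. This is a hypothetical `backward_multipass` helper on synthetic data, assuming higher scores are better; it is not skexplain's implementation.

```python
import numpy as np


def backward_multipass(predict, X, y, score_fn, n_vars, seed=0):
    """Greedy backward multi-pass ranking (sketch); higher score_fn is better."""
    rng = np.random.default_rng(seed)
    X_work = X.copy()                  # columns already ranked stay permuted here
    remaining = list(range(X.shape[1]))
    ranking = []
    for _ in range(min(n_vars, X.shape[1])):
        trial_scores, trial_cols = {}, {}
        for j in remaining:
            X_try = X_work.copy()
            X_try[:, j] = rng.permutation(X_work[:, j])
            trial_cols[j] = X_try[:, j]
            trial_scores[j] = score_fn(y, predict(X_try))
        # The column whose shuffling hurts most is the next most important...
        best = min(trial_scores, key=trial_scores.get)
        # ...and it stays permuted for all later passes.
        X_work[:, best] = trial_cols[best]
        ranking.append(best)
        remaining.remove(best)
    return ranking


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3 * X[:, 0] + X[:, 1]


def predict(A):
    return 3 * A[:, 0] + A[:, 1]


def neg_mse(t, p):
    return -np.mean((t - p) ** 2)


ranking = backward_multipass(predict, X, y, neg_mse, n_vars=2)
print(ranking)
```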

[ ]:
fig = explainer.plot_importance(
    data=[results]*2,
    panels=[
        ('backward_multipass', 'Random Forest'),
        ('backward_multipass', 'Logistic Regression'),
    ],
    num_vars_to_plot=10,
)

Comparing Single-Pass vs Multi-Pass

Placing single-pass and multi-pass results side by side shows how correlations between features change the rankings.
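Comparing two rankings numerically can complement the visual comparison. The sketch below takes two ordered lists of feature names and reports how far each feature moves. How to extract those ordered names from `results` depends on skexplain's output format, which is not reproduced here. `rank_shifts` and the feature names are hypothetical.

```python
def rank_shifts(single_pass, multi_pass):
    """Compare two rankings (lists of feature names, most important first).

    Returns {name: (single-pass rank, multi-pass rank, shift)} for features in
    both lists; ranks are 1-based, and a positive shift means the feature moved
    up in the multi-pass ranking.
    """
    sp = {name: i + 1 for i, name in enumerate(single_pass)}
    mp = {name: i + 1 for i, name in enumerate(multi_pass)}
    return {name: (sp[name], mp[name], sp[name] - mp[name])
            for name in single_pass if name in mp}


# Hypothetical feature names, for illustration only.
shifts = rank_shifts(['feat_a', 'feat_b', 'feat_c'], ['feat_a', 'feat_c', 'feat_b'])
print(shifts)
```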

[ ]:
fig = explainer.plot_importance(
    data=[results]*2,
    panels=[
        ('backward_singlepass', 'Random Forest'),
        ('backward_multipass', 'Random Forest'),
    ],
    num_vars_to_plot=10,
)

Forward vs Backward Selection

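The backward method starts with all features intact and permutes them one at a time. The feature whose permutation most degrades performance is ranked next and stays permuted. The forward method starts with all features permuted and restores them one at a time. The feature whose restoration most improves performance is ranked next and stays intact. Features that rank highly in both directions are more robustly important. Disagreements between the two rankings can flag features whose importance depends on which other features are permuted, often a sign of correlation.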
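The forward direction mirrors the backward loop. The sketch begins from a fully permuted copy of the data and greedily restores the column that recovers the most skill. As before, this is a hypothetical `forward_multipass` helper on synthetic data, assuming higher scores are better, not skexplain's code.

```python
import numpy as np


def forward_multipass(predict, X, y, score_fn, n_vars, seed=0):
    """Greedy forward multi-pass ranking (sketch); higher score_fn is better."""
    rng = np.random.default_rng(seed)
    # Start with every column independently permuted.
    X_work = X.copy()
    for j in range(X.shape[1]):
        X_work[:, j] = rng.permutation(X[:, j])
    remaining = list(range(X.shape[1]))
    ranking = []
    for _ in range(min(n_vars, X.shape[1])):
        trial_scores = {}
        for j in remaining:
            X_try = X_work.copy()
            X_try[:, j] = X[:, j]        # restore the original column
            trial_scores[j] = score_fn(y, predict(X_try))
        # The column whose restoration helps most is the next most important...
        best = max(trial_scores, key=trial_scores.get)
        # ...and it stays restored for all later passes.
        X_work[:, best] = X[:, best]
        ranking.append(best)
        remaining.remove(best)
    return ranking


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3 * X[:, 0] + X[:, 1]


def predict(A):
    return 3 * A[:, 0] + A[:, 1]


def neg_mse(t, p):
    return -np.mean((t - p) ** 2)


ranking = forward_multipass(predict, X, y, neg_mse, n_vars=2)
print(ranking)
```

On this toy problem, with independent features, both directions agree. Correlated features are where the two orderings tend to diverge.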

[ ]:
forward_results = explainer.permutation_importance(
    n_vars=10,
    evaluation_fn='norm_aupdc',
    n_permute=5,
    subsample=0.1,
    n_jobs=8,
    verbose=True,
    random_seed=42,
    direction='forward',
)
[ ]:
fig = explainer.plot_importance(
    data=[results]*3 + [forward_results]*3,
    panels=[
        ('backward_multipass', 'Random Forest'),
        ('backward_multipass', 'Gradient Boosting'),
        ('backward_multipass', 'Logistic Regression'),
        ('forward_multipass', 'Random Forest'),
        ('forward_multipass', 'Gradient Boosting'),
        ('forward_multipass', 'Logistic Regression'),
    ],
    ylabels=['Backward', 'Forward'],
    figsize=(8, 5),
    hspace=0.2,
)

Annotating Correlated Features

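Permutation importance implicitly assumes that features are independent. When features are strongly correlated, permuting one at a time creates unrealistic combinations of values, and a correlated partner can supply the same information, so the rankings can be distorted. Setting plot_correlated_features=True annotates correlated feature pairs on the plot, with rho_threshold (0.6 here) as the correlation cutoff.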
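To check which pairs might be flagged before plotting, the correlations can be screened directly. This sketch assumes the absolute Pearson correlation is compared against the threshold. skexplain may compute or display correlations differently, and `correlated_pairs` and the synthetic columns are hypothetical.

```python
import numpy as np


def correlated_pairs(X, names, rho_threshold=0.6):
    """(name_i, name_j, rho) for each feature pair with |Pearson rho| > rho_threshold."""
    rho = np.corrcoef(X, rowvar=False)   # feature-by-feature correlation matrix
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(rho[i, j]) > rho_threshold:
                pairs.append((names[i], names[j], round(float(rho[i, j]), 3)))
    return pairs


rng = np.random.default_rng(0)
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.1 * rng.normal(size=500), rng.normal(size=500)])
pairs = correlated_pairs(X, ['a', 'a_noisy', 'b'])
print(pairs)
```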

[ ]:
fig = explainer.plot_importance(
    data=[results]*3,
    panels=[
        ('backward_multipass', 'Random Forest'),
        ('backward_multipass', 'Gradient Boosting'),
        ('backward_multipass', 'Logistic Regression'),
    ],
    plot_correlated_features=True,
    rho_threshold=0.6,
    figsize=(13, 4),
)
