Comparing Feature Ranking Methods

Different methods can rank features differently. This notebook compares multi-pass permutation importance, ALE variance-based ranking, SHAP-based ranking, and grouped permutation importance to provide a comprehensive view of feature relevance.

[ ]:
import skexplain
import plotting_config
import shap
from skexplain.common.utils import shap_values_to_importance
[ ]:
estimators = skexplain.load_models()
X, y = skexplain.load_data()
X = X.astype({'urban': 'category', 'rural': 'category'})

print(f'X Shape : {X.shape}')
# Share of positive labels (class balance), assuming y is a binary 0/1 target
print(f'y Skew : {y.mean()*100:.1f}%')
[ ]:
explainer = skexplain.ExplainToolkit(estimators=estimators, X=X, y=y)

explainer.set_plotting_config(
    display_feature_names=plotting_config.display_feature_names,
    display_units=plotting_config.display_units,
    feature_colors=plotting_config.color_dict,
)

Multi-Pass Permutation Importance

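The standard backward multi-pass method permutes each feature in turn and measures the drop in model performance. The feature whose permutation hurts performance most is ranked first and left permuted, and the procedure repeats on the remaining features until n_vars features are ranked.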

[ ]:
perm_results = explainer.permutation_importance(
    n_vars=10,
    evaluation_fn='norm_aupdc',
    n_permute=5,
    subsample=0.1,
    n_jobs=8,
    verbose=True,
    random_seed=42,
    direction='backward',
)
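
To make the procedure concrete, here is a minimal sketch of backward multi-pass permutation importance on a toy problem. It is not skexplain's implementation: toy_model, the accuracy metric, the synthetic data, and all helper names are hypothetical, and only the standard library is used. On each pass, every unranked feature is permuted on top of the features already left permuted, and the feature that lowers the score most is ranked next.

```python
import random

def toy_model(row):
    """Hypothetical classifier: uses features 0 and 1, ignores feature 2."""
    return int(1.5 * row[0] + row[1] > 0)

def accuracy(model, X, y):
    """Fraction of rows whose prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permute_column(X, col, rng):
    """Return a copy of X with one column shuffled across rows."""
    values = [row[col] for row in X]
    rng.shuffle(values)
    return [row[:col] + [v] + row[col + 1:] for row, v in zip(X, values)]

def backward_multipass(model, X, y, n_vars, seed=0):
    """Rank features by repeatedly permuting the most damaging one."""
    rng = random.Random(seed)
    current = [list(row) for row in X]       # ranked features stay permuted here
    remaining = list(range(len(X[0])))
    ranking = []
    for _ in range(min(n_vars, len(remaining))):
        # Permute each unranked feature on top of those already permuted.
        trials = {col: permute_column(current, col, rng) for col in remaining}
        scores = {col: accuracy(model, trial, y) for col, trial in trials.items()}
        worst = min(scores, key=scores.get)  # lowest score = largest drop
        ranking.append((worst, scores[worst]))
        current = trials[worst]              # keep that feature permuted
        remaining.remove(worst)
    return ranking

# Synthetic data; the labels come from the toy model itself, so the
# unpermuted accuracy is 1.0 by construction.
data_rng = random.Random(42)
X_toy = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(2000)]
y_toy = [toy_model(row) for row in X_toy]

ranking = backward_multipass(toy_model, X_toy, y_toy, n_vars=3)
for feature, score in ranking:
    print(f"feature {feature}: accuracy after permutation = {score:.3f}")
```

Because the top feature stays permuted, later passes measure what each remaining feature adds once the information in the earlier ones is gone. This is why multi-pass and single-pass rankings can differ when features are correlated.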

ALE Variance-Based Ranking

Instead of measuring performance loss, we can rank features by the standard deviation of their 1D Accumulated Local Effects (ALE) curves. A feature whose ALE curve varies more changes the model’s predictions over a wider range. This approach is inspired by Greenwell et al. (2018).

[ ]:
ale_1d_ds = explainer.ale(features='all', n_bootstrap=1, subsample=1000, n_jobs=8, n_bins=20)
ale_var_1d = explainer.ale_variance(ale=ale_1d_ds)
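
The idea behind this ranking can be sketched without skexplain. The example below computes a first-order ALE curve per feature for a hypothetical linear toy_model and ranks features by the standard deviation of that curve. The binning scheme, the helper names, and the synthetic data are all assumptions made for illustration. explainer.ale and explainer.ale_variance may differ in detail.

```python
import random
import statistics

def toy_model(row):
    """Hypothetical model: strong effect of x0, weak effect of x1, none of x2."""
    return 2.0 * row[0] + 0.5 * row[1]

def ale_1d(model, X, col, n_bins=20):
    """First-order ALE of feature `col`, one value per quantile bin."""
    values = sorted(row[col] for row in X)
    n = len(values)
    # Quantile-based edges so each bin holds roughly the same number of rows.
    edges = sorted({values[(i * (n - 1)) // n_bins] for i in range(n_bins + 1)})
    if len(edges) < 2:                        # constant feature: no effect
        return [0.0]
    effects, counts = [], []
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Rows in (lo, hi]; the first bin also includes its lower edge.
        rows = [r for r in X if lo < r[col] <= hi or (k == 0 and r[col] == lo)]
        # Local effect: mean prediction change when moving the feature lo -> hi.
        diffs = [
            model(r[:col] + [hi] + r[col + 1:]) - model(r[:col] + [lo] + r[col + 1:])
            for r in rows
        ]
        effects.append(statistics.fmean(diffs) if diffs else 0.0)
        counts.append(len(rows))
    # Accumulate the local effects, then center on the count-weighted mean.
    accumulated, total = [], 0.0
    for effect in effects:
        total += effect
        accumulated.append(total)
    center = sum(a * c for a, c in zip(accumulated, counts)) / sum(counts)
    return [a - center for a in accumulated]

rng = random.Random(0)
X_toy = [[rng.random() for _ in range(3)] for _ in range(500)]

# Importance = standard deviation of each feature's centered ALE curve.
importance = {
    col: statistics.pstdev(ale_1d(toy_model, X_toy, col)) for col in range(3)
}
for col in sorted(importance, key=importance.get, reverse=True):
    print(f"feature {col}: ALE std = {importance[col]:.3f}")
```

A feature the model ignores has a flat ALE curve and therefore a standard deviation of zero, while steeper curves give larger values. Unlike permutation importance, this score does not use y at all, so it describes the model's sensitivity to a feature and not that feature's contribution to skill.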

SHAP-Based Ranking

SHAP values provide per-example feature attributions. Summarizing the absolute SHAP values across examples gives another importance ranking. Here we use local_attributions(method='shap') and convert the results to an importance dataset with shap_values_to_importance.

[ ]:
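# Use a small subset for SHAP computation (can be expensive)
X_subset = shap.sample(X, 10, random_state=22)
# Background sample passed to SHAP as the masker below
background_dataset = shap.sample(X, 100)

# Explain only the last estimator, referred to below as 'Logistic Regression'
shap_explainer = skexplain.ExplainToolkit(estimators[-1], X=X_subset)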
shap_explainer.set_plotting_config(
    display_feature_names=plotting_config.display_feature_names,
    display_units=plotting_config.display_units,
    feature_colors=plotting_config.color_dict,
)

shap_results = shap_explainer.local_attributions(
    method='shap',
    shap_kws={'masker': background_dataset, 'algorithm': 'auto'},
)

shap_values = shap_results['shap_values__Logistic Regression'].values
shap_rank = shap_values_to_importance(
    shap_values,
    estimator_name='Logistic Regression',
    feature_names=X.columns,
)
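
One common way to summarize SHAP values into a ranking is the mean absolute value per feature. The sketch below shows that reduction on a small made-up matrix. The feature names and numbers are placeholders, and the mean is an assumption here: shap_values_to_importance may aggregate differently.

```python
# Hypothetical SHAP matrix: one row per example, one column per feature.
feature_names = ["feature_a", "feature_b", "feature_c"]  # placeholder names
shap_matrix = [
    [0.30, -0.10, 0.02],
    [-0.50, 0.20, -0.01],
    [0.40, -0.30, 0.03],
]

n_examples = len(shap_matrix)

# Mean absolute attribution per feature. Taking absolute values first
# prevents positive and negative contributions from cancelling out.
mean_abs = {
    name: sum(abs(row[j]) for row in shap_matrix) / n_examples
    for j, name in enumerate(feature_names)
}

# Rank features from most to least important.
shap_ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)

for name in shap_ranking:
    print(f"{name}: mean |SHAP| = {mean_abs[name]:.3f}")
```

With only 10 examples in X_subset, a ranking of this kind is noisy. A larger subset gives a more stable estimate, at higher computational cost.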

Grouped Permutation Importance

When features are correlated, grouping them reveals their joint importance. The Group Only Permutation Feature Importance (GOPFI) method from Au et al. (2021) clusters correlated features and evaluates each group together. The feature groups are stored in .attrs["feature_groups"] of the returned dataset.

[ ]:
grouped_results = explainer.grouped_permutation_importance(
    perm_method='grouped_only',
    evaluation_fn='norm_aupdc',
    n_permute=5,
    n_jobs=8,
    subsample=0.1,
    clustering_kwargs={'n_clusters': 10},
)

# Inspect the automatically determined feature groups
print('Feature groups:', grouped_results.attrs['feature_groups'])
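
The following sketch illustrates the group-only idea on synthetic data. As we read Au et al., GOPFI compares the loss when all features are permuted with the loss when only the features outside the group are permuted, so a large difference means the group alone recovers much of the model's skill. That reading, the toy_model, the mean-squared-error loss, and all helper names are assumptions. The grouped_only implementation in skexplain may differ in detail.

```python
import random

def toy_model(row):
    """Hypothetical regressor; feature 3 is ignored."""
    return row[0] + row[1] + row[2]

def mse(model, X, y):
    """Mean squared error of the model's predictions."""
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def permute_jointly(X, cols, rng):
    """Shuffle `cols` with one shared row order, preserving their joint structure."""
    order = list(range(len(X)))
    rng.shuffle(order)
    X_new = [list(row) for row in X]
    for i, src in enumerate(order):
        for c in cols:
            X_new[i][c] = X[src][c]
    return X_new

def group_only_importance(model, X, y, groups, n_permute=5, seed=0):
    """Loss with everything permuted minus loss with only the group intact."""
    rng = random.Random(seed)
    all_cols = list(range(len(X[0])))
    scores = {}
    for name, cols in groups.items():
        others = [c for c in all_cols if c not in cols]
        gains = []
        for _ in range(n_permute):
            loss_all = mse(model, permute_jointly(X, all_cols, rng), y)
            loss_group_intact = mse(model, permute_jointly(X, others, rng), y)
            gains.append(loss_all - loss_group_intact)
        scores[name] = sum(gains) / n_permute
    return scores

# Synthetic data: x0 and x1 are strongly correlated, x2 is independent,
# and x3 is noise that the toy model never uses.
data_rng = random.Random(1)
X_toy = []
for _ in range(2000):
    x0 = data_rng.gauss(0, 1)
    x1 = x0 + 0.1 * data_rng.gauss(0, 1)
    X_toy.append([x0, x1, data_rng.gauss(0, 1), data_rng.gauss(0, 1)])
y_toy = [toy_model(row) for row in X_toy]

groups = {"correlated_pair": [0, 1], "single": [2], "noise": [3]}  # hand-picked
gopfi = group_only_importance(toy_model, X_toy, y_toy, groups)
for name, score in gopfi.items():
    print(f"{name}: {score:.2f}")
```

Here the groups are chosen by hand, whereas the notebook lets clustering determine them. Permuting the correlated pair with a shared row order keeps x0 and x1 consistent with each other, which avoids evaluating the model on unrealistic combinations of the two.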

Comparing All Methods

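We can plot results from different ranking methods side by side using plot_importance with multiple panels, which shows where the methods agree or disagree on feature rankings. perm_results appears twice in data because the panels draw both its multi-pass and single-pass rankings. The grouped results are not included in this figure.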

[ ]:
data = [perm_results, perm_results, ale_var_1d, shap_rank]
panels = [
    ('backward_multipass', 'Logistic Regression'),
    ('backward_singlepass', 'Logistic Regression'),
    ('ale_variance', 'Logistic Regression'),
    ('shap', 'Logistic Regression'),
]

fig = explainer.plot_importance(
    data=data,
    panels=panels,
    num_vars_to_plot=10,
    figsize=(14, 4),
)

Analysis

Comparing the rankings across methods helps triangulate which features are truly important. Features that rank highly across multiple methods are more likely to be robustly important, while disagreements may indicate sensitivity to inter-feature correlations or the specific way each method defines “importance.” Grouped importance is particularly useful when many features are correlated, as it avoids double-counting redundant information.
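
Agreement between methods can also be quantified instead of only read off the panels. The sketch below uses Spearman rank correlation for pairwise agreement and a top-k intersection for consensus features. The three rankings are made-up placeholders, not results from this notebook, and the helper names are hypothetical.

```python
from itertools import combinations

def spearman(rank_a, rank_b):
    """Spearman correlation between two rankings of the same features (no ties)."""
    pos_a = {f: i for i, f in enumerate(rank_a)}
    pos_b = {f: i for i, f in enumerate(rank_b)}
    n = len(rank_a)
    d2 = sum((pos_a[f] - pos_b[f]) ** 2 for f in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

def consensus_top_k(rankings, k):
    """Features that appear in the top k of every ranking."""
    top_sets = [set(r[:k]) for r in rankings.values()]
    return set.intersection(*top_sets)

# Placeholder rankings, most important feature first.
rankings = {
    "multipass": ["a", "b", "c", "d", "e"],
    "ale_variance": ["a", "c", "b", "d", "e"],
    "shap": ["e", "d", "c", "b", "a"],
}

# Pairwise agreement: 1 means identical order, -1 means reversed order.
agreement = {
    (m1, m2): spearman(rankings[m1], rankings[m2])
    for m1, m2 in combinations(rankings, 2)
}
for pair, rho in agreement.items():
    print(pair, round(rho, 2))

print("In every top 3:", consensus_top_k(rankings, k=3))
```

Rank correlation treats a swap at the bottom of a ranking the same as one at the top, so the top-k intersection is a useful complement when only the leading features matter.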