Accumulated Local Effects (ALE)

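A 1D ALE curve shows how a single feature affects a model's predictions on average. Because the effect is built up from local prediction differences within bins of the feature's observed values, ALE remains reliable when features are correlated.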
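
To make the idea concrete, here is a minimal NumPy sketch of a first-order ALE computation for one numeric feature. It is not scikit-explain's implementation: the function name ale_1d_sketch, the toy linear predict function, and the synthetic data are all assumptions made for illustration, and the input is assumed to be a 2D float array.

```python
import numpy as np


def ale_1d_sketch(predict, X, feature_idx, n_bins=10):
    """Illustrative first-order ALE for one numeric column of a float array."""
    x = X[:, feature_idx]
    # Percentile-based bin edges; np.unique removes repeated edges.
    edges = np.unique(np.percentile(x, np.linspace(0, 100, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Index of the interval each sample falls into (0 .. n_intervals - 1).
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)

    local_effects = np.zeros(n_intervals)
    counts = np.zeros(n_intervals)
    for k in range(n_intervals):
        in_bin = bin_idx == k
        counts[k] = in_bin.sum()
        if counts[k] == 0:
            continue
        # Move only the feature of interest to the bin edges; the other
        # features keep their observed (possibly correlated) values.
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature_idx] = edges[k]
        X_hi[:, feature_idx] = edges[k + 1]
        local_effects[k] = np.mean(predict(X_hi) - predict(X_lo))

    # Accumulate the local effects, then center the curve so that its
    # count-weighted average over the intervals is zero.
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    interval_mid = 0.5 * (ale[:-1] + ale[1:])
    ale -= np.sum(interval_mid * counts) / counts.sum()
    return edges, ale


# Toy check with two strongly correlated features and a known linear model.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
X_toy = np.column_stack([x0, x0 + 0.1 * rng.normal(size=500)])


def predict(X):
    return 2.0 * X[:, 0] + 3.0 * X[:, 1]


edges, ale = ale_1d_sketch(predict, X_toy, feature_idx=0)
slopes = np.diff(ale) / np.diff(edges)  # slope of the ALE curve per interval
```

In this toy case the slope of the curve reflects only the coefficient on the first feature, even though the second feature moves almost in lockstep with it.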

[ ]:

import skexplain
import plotting_config

[ ]:

# Load the training data and pre-fit models
estimators = skexplain.load_models()
X, y = skexplain.load_data()
# Mark urban and rural as categorical so they are not treated as continuous
X = X.astype({'urban': 'category', 'rural': 'category'})

explainer = skexplain.ExplainToolkit(estimators, X=X, y=y)

explainer.set_plotting_config(
    display_feature_names=plotting_config.display_feature_names,
    display_units=plotting_config.display_units,
)

Computing 1D ALE

The ale method computes 1D ALE curves. Key arguments:

features: a single feature, list of features, or 'all'
n_bins: number of percentile-based bins (default 30)
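n_bootstrap: number of bootstrap iterations; values greater than 1 yield confidence intervals
subsample: number of examples to use (an integer), or a fraction of the data if between 0 and 1
n_jobs: number of processes to use for parallel computation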
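
As a rough illustration of what percentile-based binning means, the sketch below compares percentile edges with equal-width edges on a skewed synthetic feature. The exact binning inside scikit-explain may differ; the variable names and the data are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1000)  # skewed synthetic feature
n_bins = 5

# Percentile-based edges: each bin holds roughly the same number of examples.
percentile_edges = np.percentile(x, np.linspace(0, 100, n_bins + 1))
# Equal-width edges: bins cover equal ranges, so counts can be very uneven.
equal_width_edges = np.linspace(x.min(), x.max(), n_bins + 1)

counts_pct, _ = np.histogram(x, bins=percentile_edges)
counts_eq, _ = np.histogram(x, bins=equal_width_edges)
print("percentile bins :", counts_pct)
print("equal-width bins:", counts_eq)
```

With percentile edges, every bin has enough examples to estimate a local effect, which is why a larger n_bins gives finer resolution at the cost of noisier estimates per bin.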

[ ]:

ale_1d_ds = explainer.ale(
    features='all',
    n_bins=20,
    n_bootstrap=1,
    subsample=10000,
    n_jobs=1,
)

Plotting ALE Curves

Plot the ALE curve for a single feature. The light blue histogram in the background shows the distribution of the feature's values, which indicates where the curve is well supported by data.

[ ]:

fig, ax = explainer.plot_ale(
    ale=ale_1d_ds,
    features='sfc_temp',
)

Plotting Multiple Features

Use get_important_vars to select top features from a permutation importance result, then plot their ALE curves together.

[ ]:

# Load permutation importance results and get top features
results = explainer.load(fnames='../tutorial_data/multipass_importance_naupdc.nc')
important_vars = explainer.get_important_vars(
    results, multipass=True, n_vars=100, combine=True
)

fig, axes = explainer.plot_ale(
    ale=ale_1d_ds,
    features=important_vars,
)

Customizing ALE Plots

You can customize line colors, styles, and the background histogram color using line_kws and hist_color.

[ ]:

fig, axes = explainer.plot_ale(
    ale=ale_1d_ds,
    features=important_vars,
    line_kws={
        'line_colors': ['b', 'orange', 'k'],
        'linewidth': 3.0,
        'linestyle': 'dashed',
    },
    hist_color='red',
)

Confidence Intervals via Bootstrapping

Set n_bootstrap > 1 to compute confidence intervals on the ALE curves. The shaded bands represent the uncertainty in the mean ALE value across bootstrap samples.
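
The general idea behind such a band can be sketched as follows: resample rows with replacement, recompute the curve on each resample, and take percentiles across the resulting curves. This is an assumption-laden illustration, not scikit-explain's internals; bootstrap_band, toy_curve, the grid, and the 95% level are all hypothetical choices.

```python
import numpy as np


def bootstrap_band(curve_fn, X, n_bootstrap=10, subsample=1000, level=95, seed=0):
    """Mean curve and percentile band from curves computed on resampled rows."""
    rng = np.random.default_rng(seed)
    size = min(subsample, len(X))
    curves = np.array([
        curve_fn(X[rng.choice(len(X), size=size, replace=True)])
        for _ in range(n_bootstrap)
    ])
    tail = (100 - level) / 2
    lower, upper = np.percentile(curves, [tail, 100 - tail], axis=0)
    return curves.mean(axis=0), lower, upper


# Hypothetical stand-in for an effect curve: the average of a toy model's
# output over the sample, evaluated on a fixed grid of feature values.
grid = np.linspace(-1.0, 1.0, 5)


def toy_curve(X_sample):
    return np.array([np.mean(g * X_sample[:, 1] + g**2) for g in grid])


rng = np.random.default_rng(1)
X_toy = rng.normal(size=(5000, 2))
mean_curve, lower, upper = bootstrap_band(toy_curve, X_toy)
```

More bootstrap iterations give a smoother estimate of the band, while a smaller subsample makes each iteration cheaper but widens the band.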

[ ]:

ale_1d_ci = explainer.ale(
    features=important_vars,
    n_bootstrap=10,
    subsample=1000,
    n_jobs=4,
    n_bins=10,
)

fig, axes = explainer.plot_ale(ale=ale_1d_ci)

These confidence intervals reflect the uncertainty in the mean ALE value that comes from resampling the data. They are not the same as the spread of individual conditional expectation (ICE) curves, which captures how the effect varies from example to example because of feature interactions.
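
The distinction can be seen in a small synthetic example. The sketch below uses a toy interaction model and plain ICE values rather than ALE, so it only illustrates the statistical point; toy_model and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)  # second feature, which interacts with the first


def toy_model(x0, x1):
    return x0 * x1  # pure interaction: the effect of x0 depends on x1


# ICE values at x0 = 1: one prediction per example.
ice_at_1 = toy_model(1.0, x1)
ice_spread = ice_at_1.std()

# Bootstrap the mean of those ICE values.
boot_means = np.array([
    rng.choice(ice_at_1, size=n, replace=True).mean() for _ in range(200)
])
mean_uncertainty = boot_means.std()

print(f"spread of ICE values:    {ice_spread:.3f}")
print(f"uncertainty of the mean: {mean_uncertainty:.3f}")
```

The individual curves disagree strongly because of the interaction, yet their average is pinned down tightly, so a narrow confidence band does not imply that the feature acts the same way for every example.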

ALE for Regression

ALE works for regression problems as well. Here we use the California housing dataset as a quick example.

[ ]:

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

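# Load the California housing data as NumPy arrays
data = fetch_california_housing()
X_reg = data['data']
y_reg = data['target']
feature_names = data['feature_names']

# Fit a random forest; random_state is fixed so reruns give the same model
model = RandomForestRegressor(random_state=42)
model.fit(X_reg, y_reg)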

explainer_reg = skexplain.ExplainToolkit(
    ('Random Forest', model), X=X_reg, y=y_reg, feature_names=feature_names
)

ale_reg = explainer_reg.ale(
    features=feature_names,
    n_bootstrap=1,
    subsample=10000,
    n_jobs=6,
    n_bins=30,
)

fig, axes = explainer_reg.plot_ale(ale=ale_reg)