Accumulated Local Effects (ALE)
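1D ALE curves show how a model's predictions change, on average, as a single feature varies. Effects are computed from prediction differences within narrow bins of the feature, so the curves stay reliable when features are correlated.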
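To make the idea concrete, here is a minimal from-scratch sketch of a 1D ALE calculation using only the standard library. It is not how skexplain implements ALE; the helper `ale_1d`, the toy linear model, and the synthetic correlated data are all assumptions made for illustration.

```python
import random
from bisect import bisect_left
from statistics import quantiles


def ale_1d(predict, rows, j, edges):
    """Centered ALE values at edges[1:] for feature index j (illustrative helper)."""
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for row in rows:
        # Bin k covers (edges[k], edges[k + 1]]; clamp so the minimum lands in bin 0.
        k = min(max(bisect_left(edges, row[j]) - 1, 0), n_bins - 1)
        lo, hi = list(row), list(row)
        lo[j], hi[j] = edges[k], edges[k + 1]
        # Local effect: move only feature j across the bin, keep the rest of the row.
        sums[k] += predict(hi) - predict(lo)
        counts[k] += 1
    local = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    # Accumulate the local effects from the lowest bin upward.
    accumulated, total = [], 0.0
    for effect in local:
        total += effect
        accumulated.append(total)

    # Center so the count-weighted mean effect is zero.
    offset = sum(a * c for a, c in zip(accumulated, counts)) / sum(counts)
    return [a - offset for a in accumulated]


# Synthetic data (assumption): x1 tracks x0 closely, so the features are correlated.
random.seed(0)
rows = []
for _ in range(500):
    x0 = random.gauss(0.0, 1.0)
    rows.append([x0, x0 + 0.1 * random.gauss(0.0, 1.0)])


def predict(row):
    # Toy linear model standing in for a fitted estimator.
    return 2.0 * row[0] + 3.0 * row[1]


x0_values = [r[0] for r in rows]
edges = [min(x0_values)] + quantiles(x0_values, n=5, method="inclusive") + [max(x0_values)]
ale = ale_1d(predict, rows, 0, edges)
for edge, value in zip(edges[1:], ale):
    print(f"x0 <= {edge:+.2f}: ALE = {value:+.3f}")
```

For this linear toy model, each step of the curve equals 2 times the bin width, which is the coefficient of x0, even though x1 is strongly correlated with it.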
[ ]:
import skexplain
import plotting_config
[ ]:
# Load the training data and pre-fit models
estimators = skexplain.load_models()
X, y = skexplain.load_data()

# Mark the 'urban' and 'rural' columns as categorical features
X = X.astype({'urban': 'category', 'rural': 'category'})

# Build the explainer and apply display names and units to plots
explainer = skexplain.ExplainToolkit(estimators, X=X, y=y)
explainer.set_plotting_config(
    display_feature_names=plotting_config.display_feature_names,
    display_units=plotting_config.display_units,
)
Computing 1D ALE
The ale method computes 1D ALE curves. Key arguments:
features: a single feature, a list of features, or 'all'
n_bins: number of percentile-based bins (default 30)
n_bootstrap: number of bootstrap iterations for confidence intervals
subsample: number of examples to use
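As a rough illustration of what percentile-based bins mean, the sketch below compares percentile edges with equal-width edges on a skewed sample. It uses only the standard library and does not reproduce skexplain's internal binning; the data and the helper `bin_counts` are assumptions for the example.

```python
import random
from bisect import bisect_right
from statistics import quantiles

random.seed(0)
# Skewed synthetic sample (assumption): most values are small, a few are large.
values = [random.expovariate(1.0) for _ in range(1000)]
n_bins = 5
lo, hi = min(values), max(values)

# Percentile-based edges: cut points at the 20th, 40th, 60th and 80th percentiles.
pct_edges = [lo] + quantiles(values, n=n_bins, method="inclusive") + [hi]

# Equal-width edges over the same range, for comparison.
width_edges = [lo + (hi - lo) * k / n_bins for k in range(n_bins + 1)]


def bin_counts(data, edges):
    """Count samples per bin; bin k covers [edges[k], edges[k + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in data:
        # Clamp so the maximum value falls in the last bin.
        k = min(max(bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[k] += 1
    return counts


pct_counts = bin_counts(values, pct_edges)
width_counts = bin_counts(values, width_edges)
print("percentile bins: ", pct_counts)
print("equal-width bins:", width_counts)
```

Percentile bins hold roughly the same number of samples each, so every local effect is averaged over a similar amount of data. Equal-width bins on skewed data leave the upper bins nearly empty.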
[ ]:
ale_1d_ds = explainer.ale(
    features='all',
    n_bins=20,
    n_bootstrap=1,
    subsample=10000,
    n_jobs=1,
)
Plotting ALE Curves
Plot the ALE curve for a single feature. The light blue histogram in the background shows the distribution of that feature's values in the data.
[ ]:
fig, ax = explainer.plot_ale(
    ale=ale_1d_ds,
    features='sfc_temp',
)
Plotting Multiple Features
Use get_important_vars to select top features from a permutation importance result, then plot their ALE curves together.
[ ]:
# Load permutation importance results and get top features
results = explainer.load(fnames='../tutorial_data/multipass_importance_naupdc.nc')
important_vars = explainer.get_important_vars(
    results, multipass=True, n_vars=100, combine=True
)
fig, axes = explainer.plot_ale(
    ale=ale_1d_ds,
    features=important_vars,
)
Customizing ALE Plots
You can customize line colors, styles, and the background histogram color using line_kws and hist_color.
[ ]:
fig, axes = explainer.plot_ale(
    ale=ale_1d_ds,
    features=important_vars,
    line_kws={
        'line_colors': ['b', 'orange', 'k'],
        'linewidth': 3.0,
        'linestyle': 'dashed',
    },
    hist_color='red',
)
Confidence Intervals via Bootstrapping
Set n_bootstrap > 1 to compute confidence intervals on the ALE curves. The shaded bands represent the uncertainty in the mean ALE value across bootstrap samples.
[ ]:
ale_1d_ci = explainer.ale(
    features=important_vars,
    n_bootstrap=10,
    subsample=1000,
    n_jobs=4,
    n_bins=10,
)
fig, axes = explainer.plot_ale(ale=ale_1d_ci)
These confidence intervals reflect the sampling uncertainty in the mean ALE value, as estimated from the bootstrap resamples. They are not the same as the spread of individual conditional expectation (ICE) curves, which captures variation from feature interactions.
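The distinction can be seen in a small standard-library sketch. It builds a percentile bootstrap interval for the mean of some per-sample effects and compares its width with the spread of the individual values. The data are synthetic and `bootstrap_ci` is a hypothetical helper, not part of skexplain.

```python
import random
from statistics import mean

random.seed(0)
# Synthetic per-sample local effects for one bin (assumption for illustration).
effects = [random.gauss(0.5, 1.0) for _ in range(200)]


def bootstrap_ci(data, n_bootstrap=1000, alpha=0.05):
    """Percentile bootstrap interval for the mean of data."""
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(random.choices(data, k=len(data))) for _ in range(n_bootstrap)
    )
    lower = means[int((alpha / 2) * n_bootstrap)]
    upper = means[int((1 - alpha / 2) * n_bootstrap) - 1]
    return lower, upper


lower, upper = bootstrap_ci(effects)
print(f"mean effect:            {mean(effects):.3f}")
print(f"bootstrap interval:     [{lower:.3f}, {upper:.3f}]")
print(f"range of single values: [{min(effects):.3f}, {max(effects):.3f}]")
```

The interval around the mean is much narrower than the range of individual values, because it describes how well the average is pinned down rather than how much single examples differ from one another.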
ALE for Regression
ALE works for regression problems as well. Here we use the California housing dataset as a quick example.
[ ]:
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

# Load the California housing data as NumPy arrays
data = fetch_california_housing()
X_reg = data['data']
y_reg = data['target']
feature_names = data['feature_names']

# Fit a random forest regressor with default hyperparameters
model = RandomForestRegressor()
model.fit(X_reg, y_reg)

# X_reg is a plain array, so feature names are passed explicitly
explainer_reg = skexplain.ExplainToolkit(
    ('Random Forest', model), X=X_reg, y=y_reg, feature_names=feature_names
)

# Compute 1D ALE for every feature (n_bootstrap=1, so no confidence intervals)
ale_reg = explainer_reg.ale(
    features=feature_names,
    n_bootstrap=1,
    subsample=10000,
    n_jobs=6,
    n_bins=30,
)
fig, axes = explainer_reg.plot_ale(ale=ale_reg)