Partial Dependence (PD)
Partial dependence (PD) plots show the marginal effect of a feature on predictions: the average model prediction as that feature varies, with all other features averaged out.
[ ]:
import skexplain
import plotting_config
[ ]:
# Load the training data and pre-fit models
estimators = skexplain.load_models()
X, y = skexplain.load_data()
explainer = skexplain.ExplainToolkit(estimators, X=X, y=y)
explainer.set_plotting_config(
    display_feature_names=plotting_config.display_feature_names,
    display_units=plotting_config.display_units,
    feature_colors=plotting_config.color_dict,
)
Computing 1D PD
The pd method computes partial dependence curves. Key arguments:
- features: list of features to compute PD for
- n_bins: number of evenly-spaced bins
- n_bootstrap: number of bootstrap iterations for confidence intervals
- subsample: number of examples to use
[ ]:
important_vars = ['sfcT_hrs_bl_frez', 'temp2m', 'sfc_temp', 'uplwav_flux']
pd_1d_ds = explainer.pd(
    features=important_vars,
    n_bootstrap=1,
    subsample=1000,
    n_jobs=len(important_vars) * 3,
    n_bins=10,
)
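To make the idea behind a 1D PD curve concrete, here is a minimal brute-force sketch in plain NumPy. It is not skexplain's implementation: the helper `partial_dependence_1d`, the toy data, and the toy prediction function are all hypothetical, and the evenly spaced grid between the feature's minimum and maximum is an assumption about how the bins are laid out.

```python
import numpy as np


def partial_dependence_1d(predict, X, feature_idx, n_bins=10):
    """Brute-force 1D partial dependence for one column of a NumPy array."""
    # Evenly spaced grid between the observed min and max (an assumption).
    column = X[:, feature_idx]
    grid = np.linspace(column.min(), column.max(), n_bins)

    pd_values = np.empty(n_bins)
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # force every example to this value
        pd_values[i] = predict(X_mod).mean()  # average over the other features
    return grid, pd_values


# Toy data and a hypothetical "model" (not one of the pre-fit estimators).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 2))


def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1] ** 2


grid, pd_values = partial_dependence_1d(toy_predict, X_toy, feature_idx=0)
```

Because the toy model is additive, the resulting curve is a straight line in the first feature with slope 2, shifted by the average contribution of the second feature.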
Plotting PD Curves
Plot the partial dependence curves for the selected features.
[ ]:
fig, axes = explainer.plot_pd(pd=pd_1d_ds)
PD vs ALE
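PD averages predictions over the marginal distribution of all other features, which can be misleading when features are correlated. Accumulated local effects (ALE) account for correlations by restricting the computation to data points whose feature value lies near the value being evaluated.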
Below we compute the ALE for the same features so you can compare the two approaches.
[ ]:
ale_1d_ds = explainer.ale(
    features=important_vars,
    n_bootstrap=1,
    subsample=1000,
    n_jobs=1,
    n_bins=10,
)
fig, axes = explainer.plot_ale(ale=ale_1d_ds)
When features are independent, PD and ALE will give similar results. When features are correlated (e.g., sfc_temp and temp2m), the PD curves may show artifacts because they evaluate the model at unrealistic feature combinations. ALE avoids this by computing effects within narrow bins of the feature distribution.
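The binning idea can be sketched in a few lines of NumPy. This is a simplified first-order ALE, not skexplain's implementation: the helper `ale_1d` is hypothetical, the quantile-based bin edges are an assumption, and the final centering uses a plain mean rather than a bin-count-weighted one.

```python
import numpy as np


def ale_1d(predict, X, feature_idx, n_bins=10):
    """Simplified first-order ALE for one column of a NumPy array."""
    x = X[:, feature_idx]
    # Quantile-based edges so each bin holds roughly the same number of points.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Index of the bin each example falls into (the maximum goes in the last bin).
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)

    local_effects = np.zeros(n_intervals)
    for k in range(n_intervals):
        members = X[bin_idx == k]
        if len(members) == 0:
            continue
        lower, upper = members.copy(), members.copy()
        lower[:, feature_idx] = edges[k]      # move to the bin's left edge
        upper[:, feature_idx] = edges[k + 1]  # move to the bin's right edge
        # Only examples already inside the bin are perturbed, and only slightly.
        local_effects[k] = (predict(upper) - predict(lower)).mean()

    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    return edges, ale - ale.mean()  # simple (unweighted) centering


# Toy data with two strongly correlated features and a hypothetical linear model.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
X_corr = np.column_stack([x0, x0 + 0.1 * rng.normal(size=500)])


def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1]


edges, ale = ale_1d(toy_predict, X_corr, feature_idx=0)
```

Each example is only ever moved between the two edges of its own bin, so the model is never queried far from the observed joint distribution, which is where the PD artifacts come from.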
As a rule of thumb, prefer ALE when you suspect feature correlations, and use PD when features are roughly independent or you want the marginal effect interpretation.
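One quick way to act on this rule of thumb is to screen for strongly correlated feature pairs before choosing a method. The sketch below uses synthetic values that merely borrow three feature names from this notebook; the data, the `threshold` of 0.8, and the variable names are illustrative assumptions, not part of skexplain.

```python
import numpy as np

# Synthetic stand-in data: temp2m is built to track sfc_temp closely.
rng = np.random.default_rng(0)
names = ["sfc_temp", "temp2m", "uplwav_flux"]
sfc_temp = rng.normal(size=1000)
X_toy = np.column_stack(
    [
        sfc_temp,
        sfc_temp + 0.1 * rng.normal(size=1000),
        rng.normal(size=1000),
    ]
)

# Absolute Pearson correlation between every pair of columns.
corr = np.abs(np.corrcoef(X_toy, rowvar=False))

threshold = 0.8  # hypothetical cutoff; tune for your own data
correlated_pairs = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if corr[i, j] > threshold
]
print(correlated_pairs)
```

Pairs flagged this way are candidates for ALE rather than PD. Pearson correlation only captures linear relationships, so a clean result here does not guarantee that the features are independent.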