{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Partial Dependence (PD)\n",
    "A partial dependence (PD) plot shows the marginal effect of one or more features on a model's predictions, averaged over the values of all other features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "import skexplain\n",
    "import plotting_config"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Load the training data and pre-fit models\n",
    "estimators = skexplain.load_models()\n",
    "X, y = skexplain.load_data()\n",
    "\n",
    "explainer = skexplain.ExplainToolkit(estimators, X=X, y=y)\n",
    "\n",
    "explainer.set_plotting_config(\n",
    "    display_feature_names=plotting_config.display_feature_names,\n",
    "    display_units=plotting_config.display_units,\n",
    "    feature_colors=plotting_config.color_dict,\n",
    ")"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing 1D PD\n",
    "\n",
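    "The `pd` method computes partial dependence curves. Key arguments:\n",
    "- `features`: list of features to compute PD for\n",
    "- `n_bins`: number of evenly spaced bins along each feature's range\n",
    "- `n_bootstrap`: number of bootstrap iterations used to estimate confidence intervals (a single iteration yields one curve with no interval)\n",
    "- `subsample`: number of examples to randomly sample from the data (a value between 0 and 1 is interpreted as a fraction of the data)\n",
    "- `n_jobs`: number of processors to use for parallel computation"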
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "important_vars = ['sfcT_hrs_bl_frez', 'temp2m', 'sfc_temp', 'uplwav_flux']\n",
    "\n",
    "pd_1d_ds = explainer.pd(\n",
    "    features=important_vars,\n",
    "    n_bootstrap=1,\n",
    "    subsample=1000,\n",
    "    n_jobs=len(important_vars) * 3,\n",
    "    n_bins=10,\n",
    ")"
   ],
   "outputs": []
  },
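  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the computation concrete, the cell below is a minimal NumPy sketch of how a 1D PD curve can be computed by hand. It is not how `skexplain` implements `pd`: the names `toy_predict` and `partial_dependence_1d` are hypothetical, the grid is assumed to be evenly spaced between the feature's minimum and maximum, and bootstrapping and subsampling are left out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def toy_predict(X):\n",
    "    # Hypothetical stand-in for a fitted model's predict function\n",
    "    return 2.0 * X[:, 0] + X[:, 1] ** 2\n",
    "\n",
    "\n",
    "def partial_dependence_1d(predict, X, feature_idx, n_bins=10):\n",
    "    # Assumption: an evenly spaced grid between the feature's min and max\n",
    "    column = X[:, feature_idx]\n",
    "    grid = np.linspace(column.min(), column.max(), n_bins)\n",
    "    pd_values = np.empty(n_bins)\n",
    "    for i, value in enumerate(grid):\n",
    "        # Set the feature to the grid value for every example,\n",
    "        # leaving all other features at their observed values\n",
    "        X_mod = X.copy()\n",
    "        X_mod[:, feature_idx] = value\n",
    "        # Average the predictions over the whole dataset\n",
    "        pd_values[i] = predict(X_mod).mean()\n",
    "    return grid, pd_values\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "X_toy = rng.normal(size=(500, 2))\n",
    "grid, pd_values = partial_dependence_1d(toy_predict, X_toy, feature_idx=0)"
   ],
   "outputs": []
  },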
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting PD Curves\n",
    "\n",
    "Plot the partial dependence curves for the selected features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "fig, axes = explainer.plot_pd(pd=pd_1d_ds)"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PD vs ALE\n",
    "\n",
    "PD marginalizes over all other features, which can be misleading when features are correlated. Accumulated local effects (ALE) account for correlations by restricting the computation to nearby data points.\n",
    "\n",
    "Below we compute the ALE for the same features so you can compare the two approaches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "ale_1d_ds = explainer.ale(\n",
    "    features=important_vars,\n",
    "    n_bootstrap=1,\n",
    "    subsample=1000,\n",
    "    n_jobs=1,\n",
    "    n_bins=10,\n",
    ")\n",
    "\n",
    "fig, axes = explainer.plot_ale(ale=ale_1d_ds)"
   ],
   "outputs": []
  },
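  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The idea of restricting the computation to nearby data points can be sketched in NumPy as follows. This is an illustrative first-order ALE under simplifying assumptions (quantile-based bin edges, centering by the count-weighted mean), not the `skexplain` implementation; `ale_1d` and `toy_interaction_model` are hypothetical names. The toy data uses two strongly correlated features and a model with an interaction term, a setting in which PD and ALE can differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def toy_interaction_model(X):\n",
    "    # Hypothetical stand-in for a fitted model with an interaction term\n",
    "    return X[:, 0] * X[:, 1]\n",
    "\n",
    "\n",
    "def ale_1d(predict, X, feature_idx, n_bins=10):\n",
    "    x = X[:, feature_idx]\n",
    "    # Assumption: quantile-based edges, so bins hold similar numbers of examples\n",
    "    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))\n",
    "    # Assign each example to a bin index in 0..n_bins-1\n",
    "    bin_idx = np.clip(np.searchsorted(edges, x, side='right') - 1, 0, n_bins - 1)\n",
    "    local_effects = np.zeros(n_bins)\n",
    "    counts = np.zeros(n_bins)\n",
    "    for k in range(n_bins):\n",
    "        in_bin = bin_idx == k\n",
    "        counts[k] = in_bin.sum()\n",
    "        if counts[k] == 0:\n",
    "            continue\n",
    "        # Move only the examples in this bin to its lower and upper edge\n",
    "        X_lo = X[in_bin].copy()\n",
    "        X_hi = X[in_bin].copy()\n",
    "        X_lo[:, feature_idx] = edges[k]\n",
    "        X_hi[:, feature_idx] = edges[k + 1]\n",
    "        # Average prediction change across the bin\n",
    "        local_effects[k] = (predict(X_hi) - predict(X_lo)).mean()\n",
    "    # Accumulate the local effects, then center on the count-weighted mean\n",
    "    ale = np.cumsum(local_effects)\n",
    "    ale -= np.average(ale, weights=counts)\n",
    "    # ALE values are reported at the right edge of each bin\n",
    "    return edges[1:], ale\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "x0 = rng.normal(size=2000)\n",
    "# Second feature is nearly a copy of the first (strong correlation)\n",
    "X_corr = np.column_stack([x0, x0 + 0.1 * rng.normal(size=2000)])\n",
    "right_edges, ale = ale_1d(toy_interaction_model, X_corr, feature_idx=0)"
   ],
   "outputs": []
  },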
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When features are independent, PD and ALE give similar results. When features are correlated (e.g., `sfc_temp` and `temp2m`), the PD curves may show artifacts because they evaluate the model at unrealistic feature combinations. ALE avoids this by averaging prediction differences within narrow bins of the feature's distribution, so the model is only evaluated close to observed data.\n",
    "\n",
    "As a rule of thumb, prefer ALE when you suspect feature correlations, and use PD when features are roughly independent or you want the marginal effect interpretation."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}