{ "cells": [ { "cell_type": "markdown", "id": "0de7782b", "metadata": {}, "source": [ "# Comparing Different Ranking Methods\n", "\n", "skexplain offers several options for ranking features. " ] }, { "cell_type": "code", "execution_count": 1, "id": "8571d857", "metadata": {}, "outputs": [], "source": [ "import sys, os\n", "# Add the parent directory to the path so the local skexplain and plotting_config modules can be imported\n", "current_dir = os.getcwd()\n", "path = os.path.dirname(current_dir)\n", "sys.path.append(path)\n", "import skexplain\n", "import plotting_config\n", "import shap\n", "from skexplain.common.utils import shap_values_to_importance, coefficients_to_importance" ] }, { "cell_type": "code", "execution_count": 2, "id": "5fd55d65", "metadata": {}, "outputs": [], "source": [ "estimators = skexplain.load_models()\n", "X, y = skexplain.load_data()\n", "# Treat the urban and rural columns as categorical features\n", "X = X.astype({'urban': 'category', 'rural': 'category'})" ] }, { "cell_type": "markdown", "id": "b727eef3", "metadata": {}, "source": [ "For demonstration purposes, we will evaluate how the important predictors vary across ranking methods for the logistic regression model. " ] }, { "cell_type": "code", "execution_count": null, "id": "61749bbd", "metadata": {}, "outputs": [], "source": "# Use the last loaded estimator (the logistic regression model)\nexplainer = skexplain.InterpretToolkit(estimators=estimators[-1], X=X, y=y,)\nexplainer.set_plotting_config(\n    display_feature_names=plotting_config.display_feature_names,\n    display_units=plotting_config.display_units,\n    feature_colors=plotting_config.color_dict,\n)" }, { "cell_type": "markdown", "id": "4e2895ad", "metadata": {}, "source": [ "## Using ALE for feature rankings \n", "\n", "Though feature importance is defined with respect to model performance, we can also rank features by the magnitude of their first-order effect. Following Greenwell et al. (2018, https://arxiv.org/abs/1805.04755), we can compute the standard deviation of the ALE curve (rather than the partial dependence curve); the features with the highest standard deviation have the largest range of contributions to the model's predictions. 
This method has limitations, though: \n", "* The standard deviation is sensitive to outliers, which may skew the results. \n", "* When feature interactions are strong, the 1-D expected contribution may not fully capture the strength of the first-order effect.\n", "* It only considers first-order effects.\n", "* It does not explicitly measure the contribution of features to the model's accuracy.\n", "* It is unclear when an importance score is significantly or practically different from zero." ] }, { "cell_type": "code", "execution_count": 4, "id": "ee4cc11c", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3dc2317ec85e4a4a92482f4d75853395", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/28 [00:00