Feature/contribution plot improvement #553

Merged
23 commits
45d0820
smart subset ; distribution visualization
guillaume-vignal Apr 29, 2024
eb93363
contribution plot distribution color parameter
guillaume-vignal Apr 30, 2024
515be5a
Merge branch 'master' of https://github.com/guillaume-vignal/shapash …
guillaume-vignal Apr 30, 2024
cd237c2
Update of contribution test
guillaume-vignal May 3, 2024
991cc3b
Merge branch 'master' of https://github.com/guillaume-vignal/shapash …
guillaume-vignal May 6, 2024
5454569
Merge branch 'master' of https://github.com/guillaume-vignal/shapash …
guillaume-vignal May 6, 2024
56ec766
Merge branch 'master' of https://github.com/guillaume-vignal/shapash …
guillaume-vignal May 6, 2024
eeb047a
fix: distribution bar printing in report
guillaume-vignal May 13, 2024
ca6a80a
add explanation of the distributions to the contribution plot
guillaume-vignal May 13, 2024
271cf60
fix: jittering points when Shapley values are the same for each violin
guillaume-vignal May 13, 2024
1a7c516
fix bug violin plot when all the points have the same contribution value
guillaume-vignal May 14, 2024
b092275
fix test for the bugfix violin plot when all the points have the same…
guillaume-vignal May 14, 2024
184f628
Update of jupyter notebooks with the new visualisation of notebooks
guillaume-vignal May 16, 2024
0e0fc1d
fix: bug on colorscale with one value and density plot with less than…
guillaume-vignal Jun 13, 2024
57f90ea
fix: test on contribution plot
guillaume-vignal Jun 13, 2024
4a82275
code optimization
guillaume-vignal Jun 24, 2024
d046410
Merge branch 'MAIF:master' into feature/contribution_plot_improvment
guillaume-vignal Jun 24, 2024
e9e2337
Merge branch 'master' of https://github.com/guillaume-vignal/shapash …
guillaume-vignal Jun 24, 2024
a2cb295
handle nan values in KernelDensity and prediction with mostly the sam…
guillaume-vignal Jun 27, 2024
41750d6
Merge branch 'feature/contribution_plot_improvment' of https://github…
guillaume-vignal Jun 27, 2024
75228f9
fix: webapp where a float column has infinite value in it
guillaume-vignal Jul 1, 2024
7e36db5
improve quantile selection in prediction_regression_plot
guillaume-vignal Jul 2, 2024
786b5ff
Delete NaN values instead of replacing them with the median
guillaume-vignal Jul 4, 2024
Binary file modified docs/_static/tutorial/tuto-webapp01-additional_filtered.png
Binary file modified docs/_static/tutorial/tuto-webapp01-additional_in_dataset.png
Binary file modified docs/_static/tutorial/tuto-webapp01-additional_picking.png
2 changes: 1 addition & 1 deletion shapash/explainer/consistency.py
@@ -25,7 +25,7 @@ def tuning_colorscale(self, values):
 Parameters
 ----------
 values: 1 column pd.DataFrame
-    values ​​whose quantiles must be calculated
+    values whose quantiles must be calculated
 """
 desc_df = values.describe(percentiles=np.arange(0.1, 1, 0.1).tolist())
 min_pred, max_init = list(desc_df.loc[["min", "max"]].values)
905 changes: 697 additions & 208 deletions shapash/explainer/smart_plotter.py

Large diffs are not rendered by default.

6 changes: 5 additions & 1 deletion shapash/report/project_report.py
@@ -56,7 +56,7 @@ class ProjectReport:
     Information about the project (author, description, ...).
 x_train : pd.DataFrame
     DataFrame used for training the model.
-y_test : pd.Series or pd.DataFrame
+y_train : pd.Series or pd.DataFrame
     Series of labels in the train set.
 y_test : pd.Series or pd.DataFrame
     Series of labels in the test set.
@@ -393,6 +393,10 @@ def display_model_explainability(self):
 for feature_label in sorted(list_cols_labels):
     feature = self.explainer.inv_features_dict.get(feature_label, feature_label)
     fig = self.explainer.plot.contribution_plot(feature, label=label, max_points=200)
+    # Apparently markers are not supported during conversion into HTML
+    for el in fig.data:
+        if el.type == "bar":
+            el.marker.color = "lightgrey"
     explain_contrib_data.append(
         {
             "feature_index": int(inv_columns_dict[feature]),
2 changes: 2 additions & 0 deletions shapash/style/colors.json
@@ -27,6 +27,7 @@
     "rgb(0, 98, 128)",
     "rgb(0, 70, 92)"
 ],
+"contrib_distribution": "rgb(211, 211, 211)",
 "featureimp_bar": {
     "1": "rgba(0, 154, 203, 1)",
     "2": "rgba(223, 103, 0, 0.8)"
@@ -126,6 +127,7 @@
     "rgb(255, 123, 38)",
     "rgb(255, 77, 7)"
 ],
+"contrib_distribution": "rgb(211, 211, 211)",
 "featureimp_bar": {
     "1": "rgba(244, 192, 0, 1.0)",
     "2": "rgba(52, 55, 54, 0.7)"
1 change: 1 addition & 0 deletions shapash/style/style_utils.py
@@ -102,6 +102,7 @@ def define_style(palette):
 }
 style_dict["featureimp_groups"] = list(palette["featureimp_groups"].values())
 style_dict["init_contrib_colorscale"] = palette["contrib_colorscale"]
+style_dict["contrib_distribution"] = palette["contrib_distribution"]
 style_dict["violin_area_classif"] = list(palette["violin_area_classif"].values())
 style_dict["prediction_plot"] = list(palette["prediction_plot"].values())
 style_dict["violin_default"] = palette["violin_default"]
2 changes: 1 addition & 1 deletion shapash/utils/model.py
@@ -102,7 +102,7 @@ def predict_error(y_target, y_pred, case):
 """
 prediction_error = None
 if y_target is not None and y_pred is not None and case == "regression":
-    if (y_target == 0).any()[0]:
+    if (y_target == 0).any().iloc[0]:
         prediction_error = abs(y_target.values - y_pred.values)
     else:
         prediction_error = abs((y_target.values - y_pred.values) / y_target.values)
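The `.iloc[0]` change in `predict_error` can be illustrated with a small sketch. It assumes `y_target` and `y_pred` arrive as one-column DataFrames with named columns; the column names and values below are made up for the illustration. Under that assumption `(y_target == 0).any()` returns a Series indexed by column name, so `[0]` depends on a positional fallback that recent pandas releases deprecate, whereas `.iloc[0]` requests the first element explicitly.

```python
import pandas as pd

# Hypothetical one-column frames; names and values are illustrative only.
y_target = pd.DataFrame({"target": [0.0, 2.0, 4.0]})
y_pred = pd.DataFrame({"pred": [0.5, 1.0, 4.0]})

# .any() reduces per column, giving a Series labelled by column name.
has_zero = (y_target == 0).any()

# Explicit positional access: does not depend on the index labels.
flag = bool(has_zero.iloc[0])

if flag:
    # A zero target makes the relative error undefined, so use the absolute error.
    error = abs(y_target.values - y_pred.values)
else:
    error = abs((y_target.values - y_pred.values) / y_target.values)
```

With a zero present in the target, the absolute-error branch is taken, mirroring the logic shown in the hunk.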
23 changes: 13 additions & 10 deletions shapash/webapp/smart_app.py
@@ -4,7 +4,7 @@
 import copy
 import random
 import re
-from math import log10
+from math import isfinite, log10

 import dash
 import dash_bootstrap_components as dbc
@@ -193,7 +193,7 @@ def init_data(self, rows=None):
 typ = self.dataframe[col].dtype
 if typ == float:
     std = self.dataframe[col].std()
-    if std != 0:
+    if isfinite(std) and std != 0:
         digit = max(round(log10(1 / std) + 1) + 2, 0)
         self.round_dataframe[col] = self.dataframe[col].map(f"{{:.{digit}f}}".format).astype(float)
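To see why the `isfinite` guard matters, here is a minimal standard-library sketch of the digit computation from this hunk. The helper name `display_digits` and its `None` return value are assumptions made for the illustration; they are not part of shapash. Without the guard, an infinite `std` leads to `log10(0.0)` and a NaN `std` leads to `round(nan)`, and both raise `ValueError`.

```python
from math import isfinite, log10
from typing import Optional


def display_digits(std: float) -> Optional[int]:
    """Hypothetical helper: decimals to keep for a column with this std.

    Returns None when std gives no usable scale (zero, NaN or infinite),
    which corresponds to skipping the rounding step in the hunk above.
    """
    if isfinite(std) and std != 0:
        # Same formula as in the diff: smaller spread means more decimals.
        return max(round(log10(1 / std) + 1) + 2, 0)
    return None


small_spread = display_digits(0.01)
large_spread = display_digits(250.0)
skipped = [display_digits(v) for v in (0.0, float("inf"), float("nan"))]
```

A column with a tiny spread keeps more decimals than one with a large spread, and degenerate standard deviations are skipped instead of raising.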

@@ -1778,7 +1778,7 @@ def update_feature_selector(feature, data, label, click_zoom, points, violin, gf
 if feature is not None and feature["points"][0]["curveNumber"] == 0 and len(gfi_figure["data"]) == 2:
     subset = get_indexes_from_datatable(data, list_index)
 else:
-    subset = None
+    subset = self.list_index

 fs_figure = self.explainer.plot.contribution_plot(
     col=selected_feature,
@@ -1834,13 +1834,16 @@ def update_index_id(
 """
 ctx = dash.callback_context
 selected = None
-if ctx.triggered[0]["prop_id"] == "feature_selector.clickData":
-    selected = click_data["points"][0]["customdata"][1]
-elif ctx.triggered[0]["prop_id"] == "prediction_picking.clickData":
-    selected = prediction_picking["points"][0]["customdata"]
-elif ctx.triggered[0]["prop_id"] == "dataset.active_cell":
-    selected = data[cell["row"]]["_index_"]
-elif ("del_dropdown_button" in ctx.triggered[0]["prop_id"]) & (None in nclicks_del):
+try:
+    if ctx.triggered[0]["prop_id"] == "feature_selector.clickData":
+        selected = click_data["points"][0]["customdata"][1]
+    elif ctx.triggered[0]["prop_id"] == "prediction_picking.clickData":
+        selected = prediction_picking["points"][0]["customdata"]
+    elif ctx.triggered[0]["prop_id"] == "dataset.active_cell":
+        selected = data[cell["row"]]["_index_"]
+    elif ("del_dropdown_button" in ctx.triggered[0]["prop_id"]) & (None in nclicks_del):
+        selected = current_index_id
+except KeyError:
     selected = current_index_id
 return selected, True
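The `try`/`except KeyError` wrapper in `update_index_id` can be sketched without Dash. The function `pick_selected` and the payload shapes below are hypothetical stand-ins, not the webapp's real callback. The assumption is that some clicked points carry no `customdata` entry, presumably points on the new distribution trace, in which case the lookup raises `KeyError` and the previously selected index is kept.

```python
def pick_selected(prop_id, payload, current_index_id):
    """Hypothetical stand-in for the selection logic in update_index_id."""
    selected = None
    try:
        if prop_id == "feature_selector.clickData":
            # Contribution plot: the sample index is the second customdata item.
            selected = payload["points"][0]["customdata"][1]
        elif prop_id == "prediction_picking.clickData":
            selected = payload["points"][0]["customdata"]
    except KeyError:
        # The clicked point has no 'customdata': keep the current selection.
        selected = current_index_id
    return selected


# A point that carries customdata, and one that does not.
with_data = {"points": [{"customdata": [0.3, 42]}]}
without_data = {"points": [{"x": 1.0, "y": 5}]}

picked = pick_selected("feature_selector.clickData", with_data, 7)
kept = pick_selected("feature_selector.clickData", without_data, 7)
```

Only `KeyError` is caught, as in the diff, so other failures such as an empty `points` list would still propagate.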
5 changes: 4 additions & 1 deletion shapash/webapp/utils/explanations.py
@@ -41,7 +41,10 @@ def __init__(self):
 feature positively impacts the prediction. \n
 Positive impact means that the variable favors a higher probability
 returned by the model or
-increases the predicted value (in case of regression problem).
+increases the predicted value (in case of regression problem).\n
+In gray, the distribution of feature values is represented, either
+by a curve if the values are considered continuous or by bars if
+they are considered discrete.
 """
 self.prediction_picking = """
 **What are the samples with correct or wrong predictions?**
3 changes: 2 additions & 1 deletion shapash/webapp/utils/utils.py
@@ -1,4 +1,5 @@
 import pandas as pd
+from pandas.api.types import is_any_real_numeric_dtype


 def round_to_k(x, k):
@@ -37,7 +38,7 @@ def get_index_type(data):
 str
     Type numeric or text of the dataset index
 """
-if data.index.is_numeric():
+if is_any_real_numeric_dtype(data.index):
     return "number"
 else:
     return "text"
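As a rough check of the replacement for `Index.is_numeric()`, the sketch below rebuilds `get_index_type` around `is_any_real_numeric_dtype` and runs it on two made-up frames. It assumes pandas 2.0 or later, where this function is available in `pandas.api.types`; the sample data is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_any_real_numeric_dtype  # assumes pandas >= 2.0


def get_index_type(data: pd.DataFrame) -> str:
    """Return 'number' for a real numeric index, 'text' otherwise."""
    if is_any_real_numeric_dtype(data.index):
        return "number"
    return "text"


# Illustrative frames: one with integer labels, one with string labels.
numeric_df = pd.DataFrame({"x": [1.0, 2.0]}, index=[10, 20])
text_df = pd.DataFrame({"x": [1.0, 2.0]}, index=["a", "b"])

numeric_kind = get_index_type(numeric_df)
text_kind = get_index_type(text_df)
```

Integer-labelled frames are reported as "number" and string-labelled frames as "text", which matches what the webapp code in the hunk expects from this helper.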