mercury.explainability.explainers
run_until_timeout(timeout, fn, *args, **kwargs)
Runs fn, stopping execution if it takes longer than the timeout argument; after timeout seconds an exception is raised.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout | | Number of seconds until the Exception is raised. | required |
fn | | Function to execute. | required |
*args | | args of the function fn. | () |
**kwargs | | keyword args passed to fn. | {} |
Example
explanation = run_until_timeout(timeout, explainer.explain, data=data)
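A slightly fuller sketch, assuming a fitted `explainer` and a `data` frame (both hypothetical), showing how the timeout exception can be handled:
>>> try:
...     explanation = run_until_timeout(60, explainer.explain, data=data)
... except Exception:
...     explanation = None  # explain() did not finish within 60 seconds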
Source code in mercury/explainability/explainers/__init__.py
ale
ALEExplainer(predictor, target_names)
Bases: Explainer, MercuryExplainer
Accumulated Local Effects for tabular datasets. Current implementation supports first order feature effects of numerical features.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predictor | Callable | A callable that takes in an NxF array as input and outputs an NxT array (N - number of data points, F - number of features, T - number of outputs/targets, e.g. 1 for single output regression, >=2 for classification). | required |
target_names | Union[List[str], str] | A list of target/output names used for displaying results. | required |
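A minimal usage sketch; the fitted classifier `model` (exposing `predict_proba`) and the feature frame `X_train` are hypothetical:
>>> ale_explainer = ALEExplainer(model.predict_proba, target_names=['class_0', 'class_1'])
>>> ale_explanation = ale_explainer.explain(X_train, min_bin_points=4)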
Source code in mercury/explainability/explainers/ale.py
explain(X, min_bin_points=4, ignore_features=[])
Calculate the ALE curves for each feature with respect to the dataset X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | An NxF tabular dataset used to calculate the ALE curves. This is typically the training dataset or a representative sample. | required |
min_bin_points | int | Minimum number of points each discretized interval should contain to ensure more precise ALE estimation. | 4 |
ignore_features | list | Features that will be ignored while computing the ALE curves. Useful for reducing computing time if there are predictors we don't care about. | [] |
Returns:
Type | Description |
---|---|
Explanation | An Explanation object containing the computed ALE curves. |
Source code in mercury/explainability/explainers/ale.py
save(filename)
Overridden to ensure that MercuryExplainer.save is used.
Source code in mercury/explainability/explainers/ale.py
plot_ale(exp, features='all', targets='all', n_cols=3, sharey='all', constant=False, ax=None, line_kw=None, fig_kw=None)
Plot ALE curves on matplotlib axes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exp | Explanation | An Explanation object containing the ALE curves to plot. | required |
features | Union[List[Union[int, str]], str] | A list of features for which to plot the ALE curves, or 'all' to plot all features. | 'all' |
targets | Union[List[Union[int, str]], str] | A list of targets for which to plot the ALE curves, or 'all' to plot all targets. | 'all' |
n_cols | int | Number of columns to organize the resulting plot into. | 3 |
sharey | str | A parameter specifying whether the y-axis of the ALE curves should be on the same scale for several features. Possible values are 'all', 'row' and None. | 'all' |
constant | bool | A parameter specifying whether the constant zeroth order effects should be added to the ALE first order effects. | False |
ax | Union[Axes, ndarray, None] | A matplotlib axes object or a numpy array of matplotlib axes to plot on. | None |
line_kw | Optional[dict] | Keyword arguments passed to the plt.plot function. | None |
fig_kw | Optional[dict] | Keyword arguments passed to the fig.set function. | None |
Returns:
Type | Description |
---|---|
ndarray | An array of matplotlib axes with the resulting ALE plots. |
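A usage sketch, assuming `ale_explanation` is the result of `ALEExplainer.explain` from the example above:
>>> axes = plot_ale(ale_explanation, features='all', n_cols=3)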
Source code in mercury/explainability/explainers/ale.py
anchors
AnchorsWithImportanceExplainer(predict_fn, train_data, categorical_names={}, disc_perc=(25, 50, 75), *args, **kwargs)
Bases: AnchorTabular, MercuryExplainer
Extending Alibi's AnchorTabular implementation, this class allows for the computation of feature importance by means of calculating several anchors. Initialize the anchor tabular explainer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predict_fn | Callable | Model prediction function. | required |
train_data | DataFrame | Pandas DataFrame with the features. | required |
disc_perc | Tuple[Union[int, float], ...] | List or tuple with percentiles (int) used for discretization. | (25, 50, 75) |
categorical_names | Dict[str, List] | Dictionary where keys are feature columns and values are the categories for the feature. | {} |
Raises:
Type | Description |
---|---|
AttributeError | If categorical_names is not a dict. |
AttributeError | If train_data is not a pd.DataFrame. |
Example
>>> explain_data = pd.read_csv('./test/explain_data.csv')
>>> model = MyModel() # (Trained) model prediction function (has to be callable)
>>> explainer = AnchorsWithImportanceExplainer(model, explain_data)
>>> explanation = explainer.explain(explain_data.head(10).values) # For the first 10 samples
>>> explanation.interpret_explanations(n_important_features=2)
# We can also get the feature importances for the first 10 samples.
>>> explainer.get_feature_importance(explain_data=explain_data.head(10))
Source code in mercury/explainability/explainers/anchors.py
explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)
Explain prediction made by classifier on instance X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | Instance to be explained. | required |
threshold | float | Minimum precision threshold. | 0.95 |
delta | float | Used to compute beta. | 0.1 |
tau | float | Margin between lower confidence bound and minimum precision or upper bound. | 0.15 |
batch_size | int | Batch size used for sampling. | 100 |
coverage_samples | int | Number of samples used to estimate coverage from during result search. | 10000 |
beam_size | int | The number of anchors extended at each step of new anchors construction. | 1 |
stop_on_first | bool | If True, the beam search algorithm will return the first anchor that satisfies the probability constraint. | False |
max_anchor_size | Optional[int] | Maximum number of features in result. | None |
min_samples_start | int | Min number of initial samples. | 100 |
n_covered_ex | int | How many examples where anchors apply to store for each anchor sampled during search (both examples where the prediction on samples agrees and disagrees with the predicted label are stored). | 10 |
binary_cache_size | int | The result search pre-allocates binary_cache_size batches for storing the binary arrays returned during neighbourhood sampling. | 10000 |
cache_margin | int | When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples. | 1000 |
verbose | bool | Display updates during the anchor search iterations. | False |
verbose_every | int | Frequency of displayed iterations during anchor search process. | 1 |
Returns:
Type | Description |
---|---|
Explanation | An Explanation object containing the anchor found for the instance, together with additional metadata. |
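A minimal sketch, assuming `explainer` and `explain_data` were built as in the class-level example above and that a single instance is passed as a numpy array:
>>> single_instance = explain_data.head(1).values
>>> anchor_explanation = explainer.explain(single_instance, threshold=0.95)
>>> anchor_explanation.interpret_explanations(n_important_features=2)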
Source code in mercury/explainability/explainers/anchors.py
get_feature_importance(explain_data, threshold=0.95, print_every=0, print_explanations=False, n_important_features=3, tau=0.15, timeout=0)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
explain_data | DataFrame | Pandas dataframe containing all the instances for which to find an anchor and therefore obtain feature importances. | required |
threshold | float | To be used in and passed down to the anchor explainer as defined in Alibi's documentation. Controls the minimum precision desired when looking for anchors. Defaults to 0.95. | 0.95 |
print_every | int | Logging frequency. Defaults to 0 (no logging). | 0 |
print_explanations | bool | Whether to print the explanations at the end of the method. Defaults to False. | False |
n_important_features | int | Number of top features that will be printed. Defaults to 3. | 3 |
tau | float | To be used in and passed down to the anchor explainer as defined in Alibi's documentation. Used within the multi-armed bandit part of the optimisation problem. Defaults to 0.15. | 0.15 |
timeout | int | Maximum time to spend looking for an anchor, in seconds. A value of 0 means that no timeout is set. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
AnchorsWithImportanceExplanation | A list containing all the explanations. |
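A usage sketch with the parameters documented above (the `explainer` and `explain_data` names follow the class-level example; the 60-second timeout is illustrative):
>>> anchors_explanation = explainer.get_feature_importance(
...     explain_data=explain_data.head(10),
...     n_important_features=3,
...     print_explanations=True,
...     timeout=60,
... )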
Source code in mercury/explainability/explainers/anchors.py
save(filename)
Overridden to ensure that MercuryExplainer.save is used.
Source code in mercury/explainability/explainers/anchors.py
translate(explanation)
Translates an explanation into simple words
Parameters:
Name | Type | Description | Default |
---|---|---|---|
explanation | Explanation | Alibi explanation object. | required |
Source code in mercury/explainability/explainers/anchors.py
cf_strategies
Optimization strategies for simple counterfactual explanations.
Backtracking(state, bounds, step, fn, class_idx=1, threshold=0.0, kernel=None, report=False, keep_explored_points=True)
Bases: Strategy
Backtracking strategy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state | ndarray | Initial state (initial starting point). | required |
bounds | ndarray | Bounds to be used when moving around the probability space defined by fn. | required |
step | ndarray | Step size values to be used when moving around the probability space defined by fn. | required |
fn | Callable[[Union[ndarray, DataFrame]], ndarray] | Classifier prediction function. | required |
class_idx | int | Class to be explained (e.g. 1 for binary classifiers). Default value is 1. | 1 |
threshold | float | Probability to be achieved (if path is found). Default value is 0.0. | 0.0 |
kernel | Optional[ndarray] | Used to penalize certain dimensions when trying to move around the probability space (some dimensions may be more difficult to explain, hence don't move along them). | None |
report | bool | Whether to display probability updates during space search. | False |
keep_explored_points | bool | Whether to keep the points that the algorithm explores. Setting it to False will decrease the computation time and memory usage in some cases. Default value is True. | True |
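These strategies are normally driven by CounterFactualExplainerBasic, but they can also be used directly. A sketch assuming a hypothetical binary classifier `clf` exposing `predict_proba` over two features:
>>> import numpy as np
>>> start = np.array([0.2, 0.7])
>>> bounds = np.array([[0.0, 1.0], [0.0, 1.0]])  # per-feature [min, max]
>>> step = (bounds[:, 1] - bounds[:, 0]) / 200   # exploration step per feature
>>> strategy = Backtracking(start, bounds, step, clf.predict_proba,
...                         class_idx=1, threshold=0.9)
>>> solution, prob, visited, energies = strategy.run(max_iter=500)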
Source code in mercury/explainability/explainers/cf_strategies.py
run(*args, **kwargs)
Kick off the backtracking.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | | Backtracking-specific arguments: max_iter (max number of iterations). | {} |
Returns:
Name | Type | Description |
---|---|---|
result | Tuple[ndarray, float, ndarray, Optional[ndarray]] | Tuple containing found solution, probability achieved, points visited and corresponding energies (i.e. probabilities). |
Source code in mercury/explainability/explainers/cf_strategies.py
SimulatedAnnealing(state, bounds, step, fn, class_idx=1, threshold=0.0, kernel=None, report=False)
Bases: Strategy, Annealer
Simulated Annealing strategy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state | ndarray | Initial state (initial starting point). | required |
bounds | ndarray | Bounds to be used when moving around the probability space defined by fn. | required |
step | ndarray | Step size values to be used when moving around the probability space defined by fn. | required |
fn | Callable[[Union[ndarray, DataFrame]], ndarray] | Classifier prediction function. | required |
class_idx | int | Class to be explained (e.g. 1 for binary classifiers). Default value is 1. | 1 |
threshold | float | Probability to be achieved (if path is found). Default value is 0.0. | 0.0 |
kernel | Optional[ndarray] | Used to penalize certain dimensions when trying to move around the probability space (some dimensions may be more difficult to explain, hence don't move along them). | None |
report | bool | Whether to display probability updates during space search. | False |
Source code in mercury/explainability/explainers/cf_strategies.py
best_solution(n=3)
Returns the n best solutions found during the Simulated Annealing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Number of solutions to be retrieved. Default value is 3. | 3 |
Source code in mercury/explainability/explainers/cf_strategies.py
energy()
Energy step in Simulated Annealing.
Source code in mercury/explainability/explainers/cf_strategies.py
move()
Move step in Simulated Annealing.
Source code in mercury/explainability/explainers/cf_strategies.py
run(*args, **kwargs)
Kick off Simulated Annealing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | | Simulated Annealing specific arguments: tmin (min temperature), tmax (max temperature), steps (number of iterations). | required |
Returns:
Name | Type | Description |
---|---|---|
result | Tuple[ndarray, float, ndarray, Optional[ndarray]] | Tuple containing found solution, probability achieved, points visited and corresponding energies (i.e. probabilities). |
Source code in mercury/explainability/explainers/cf_strategies.py
Strategy(state, bounds, step, fn, class_idx=1, threshold=0.0, kernel=None, report=False)
Base class for explanation strategies.
Source code in mercury/explainability/explainers/cf_strategies.py
clustering_tree_explainer
ClusteringTreeExplainer(clustering_model, max_leaves, verbose=0, base_tree='IMM', n_jobs=None, random_state=None)
Bases: MercuryExplainer
ClusteringTreeExplainer explains a clustering model using a DecisionTree.
It is based on the Iterative Mistake Minimization (IMM) method. The high-level idea is to build a decision tree with the same number of leaves as the number of clusters. The tree is built from the clusters predicted for a dataset by a previously fitted clustering model (like K-means), fitting the decision tree with those clusters as labels. At each step, a node containing two or more reference centres is split so that the resulting split sends at least one reference centre to each side and produces the fewest mistakes: that is, it separates the minimum number of points from their corresponding centres.
There is also the option to create a decision tree with more leaves than clusters. This is based on the ExKMC method, an extension of the IMM algorithm. The goal in this case is to achieve fewer mistakes in the resulting decision tree, with the trade-off that it will be less explainable. You can see more details of the methods in the referenced papers below.
In this implementation, the clustering solution can be created before using the ClusteringTreeExplainer. Otherwise, a k-means with default parameters is created before fitting the decision tree.
References
"Explainable k-Means and k-Medians Clustering": (http://proceedings.mlr.press/v119/moshkovitz20a/moshkovitz20a.pdf) "ExKMC: Expanding Explainable k-Means Clustering": (https://arxiv.org/pdf/2006.02399.pdf)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clustering_model | Union[BaseEstimator, Model] | The fitted clustering model. It supports sklearn and pyspark. When using sklearn, the model must be a sklearn BaseEstimator. | required |
k | | Number of clusters. | required |
max_leaves | int | The maximum number of leaves. If max_leaves == k, then the method is the Iterative Mistake Minimization (IMM). If max_leaves > k, then the method to expand the tree further than k leaves is ExKMC. max_leaves < k is not allowed. | required |
verbose | int | Whether to show some messages during the process. If 0, no messages are shown. If >=1, messages are shown. | 0 |
base_tree | str | Method to use to build the tree. Defaults to 'IMM'. | 'IMM' |
n_jobs | int | Number of jobs. | None |
random_state | int | Seed to use. | None |
Example
>>> from mercury.explainability.explainers.clustering_tree_explainer import ClusteringTreeExplainer
>>> from sklearn.cluster import KMeans
>>> k = 4
>>> kmeans = KMeans(k, random_state=42)
>>> kmeans.fit(df)
>>> clustering_tree_explainer = ClusteringTreeExplainer(clustering_model=kmeans, k=k, max_leaves=k)
>>> explanation = clustering_tree_explainer.explain(df)
>>> plot_explanation = explanation.plot(filename="explanation")
>>> plot_explanation
Source code in mercury/explainability/explainers/clustering_tree_explainer.py
explain(X, subsample=None)
Create explanation for clustering algorithm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Union[DataFrame, DataFrame] | Inputs of the data that we are clustering (pandas or pyspark DataFrame). | required |
subsample | float | Percentage of X to use when building the explanation. | None |
Returns:
Type | Description |
---|---|
ClusteringTreeExplanation | ClusteringTreeExplanation object, which contains a plot() method to display the built decision tree. |
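A sketch continuing the class-level example above, comparing the surrogate tree against the original clustering (score and surrogate_score are described below):
>>> explanation = clustering_tree_explainer.explain(df)
>>> tree_cost = clustering_tree_explainer.score(df)
>>> surrogate_cost = clustering_tree_explainer.surrogate_score(df)  # expected to be >= tree_cost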
Source code in mercury/explainability/explainers/clustering_tree_explainer.py
score(X)
Return the k-means cost of X. The k-means cost is the sum of squared distances of each point to the mean of points associated with the cluster.
Args: X: The input samples.
Returns: k-means cost of X.
Source code in mercury/explainability/explainers/clustering_tree_explainer.py
surrogate_score(X)
Return the k-means surrogate cost of X. The k-means surrogate cost is the sum of squared distances of each point to the closest center of the kmeans given (or trained) in the fit method. k-means surrogate cost > k-means cost, as k-means cost is computed with respect to the optimal centers.
Args: X: The input samples.
Returns: k-means surrogate cost of X.
Source code in mercury/explainability/explainers/clustering_tree_explainer.py
Tree(base_tree, k, max_leaves, all_centers, verbose, n_jobs)
Source code in mercury/explainability/explainers/clustering_tree_explainer.py
__max_depth__(node)
Return the depth of the subtree rooted by node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node | Node | Root of a subtree. | required |
Returns:
Type | Description |
---|---|
| The depth of the subtree rooted by node. |
Source code in mercury/explainability/explainers/clustering_tree_explainer.py
__size__(node)
Return the number of nodes in the subtree rooted by node.
Args: node: root of a subtree.
Returns: The number of nodes in the subtree rooted by node.
Source code in mercury/explainability/explainers/clustering_tree_explainer.py
predict(X)
Predict clusters for X.
Args: X: The input samples.
Returns: The predicted clusters.
Source code in mercury/explainability/explainers/clustering_tree_explainer.py
counter_fact_basic
CounterFactualExplainerBasic(train, fn, labels=[], bounds=None, n_steps=200)
Bases: MercuryExplainer
Explains predictions on tabular (i.e. matrix) data for binary/multiclass classifiers. Currently two main strategies are implemented: one following a backtracking strategy and another following a probabilistic process (simulated annealing strategy).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train | Union[ndarray, DataFrame] | Training dataset to extract feature bounds from. | required |
fn | Callable[[Union[ndarray, DataFrame]], Union[float, ndarray]] | Classifier prediction function. | required |
labels | List[str] | List of labels to be used when plotting results. If a DataFrame is used, labels take the dataframe column names. Default is an empty list. | [] |
bounds | Optional[ndarray] | Feature bounds used when no train data is provided (shape must match labels'). Default is None. | None |
n_steps | int | Parameter used to indicate how small/large steps should be when exploring the space (default is 200). | 200 |
Raises:
Type | Description |
---|---|
AssertionError | If bounds.size <= 0, bounds.ndim != 2 or bounds.shape[1] != 2 when no train data is provided, or if bounds.shape[0] != len(labels). |
TypeError | If train is not a DataFrame or numpy array. |
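A usage sketch; the binary classifier `clf` (exposing `predict_proba`) and the training frame `X_train` are hypothetical:
>>> explainer = CounterFactualExplainerBasic(X_train, clf.predict_proba,
...                                          labels=list(X_train.columns))
>>> explanation = explainer.explain(
...     from_=X_train.iloc[0].values,
...     threshold=0.9,
...     class_idx=1,
...     strategy='backtracking',
... )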
Source code in mercury/explainability/explainers/counter_fact_basic.py
explain(from_, threshold, class_idx=1, kernel=None, bounds=None, step=None, strategy='backtracking', report=False, keep_explored_points=True, **kwargs)
Roll the panellet down the valley and find an explanation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
from_ | ndarray | Starting point. | required |
threshold | float | Probability to be achieved (if path is found). | required |
class_idx | int | Class to be explained (e.g. 1 for binary classifiers). | 1 |
kernel | Optional[ndarray] | Used to penalize certain dimensions when trying to move around the probability space (some dimensions may be more difficult to explain, hence don't move along them). Default is np.ones(n), meaning all dimensions can be used to move around the space (must be a value between 0 and 1). | None |
bounds | Optional[ndarray] | Feature bound values to be used when exploring the probability space. If not specified, the ones extracted from the training data are used instead. | None |
step | Optional[ndarray] | Step values to be used when moving around the probability space. If not specified, training bounds are divided by 200 (arbitrary value) and these are used as step value. | None |
strategy | str | If 'backtracking', the backtracking strategy is used to move around the probability space. If 'simanneal', the simulated annealing strategy is used to move around the probability space. | 'backtracking' |
report | bool | Whether to report the algorithm progress during the execution. | False |
keep_explored_points | bool | Whether to keep the points that the algorithm explores. Setting it to False will decrease the computation time and memory usage in some cases. Default value is True. | True |
Raises:
Type | Description |
---|---|
AssertionError | |
ValueError | If strategy is not 'backtracking' or 'simanneal'. |
Returns:
Name | Type | Description |
---|---|---|
explanation | CounterfactualBasicExplanation | CounterfactualBasicExplanation with the solution found and how it differs from the starting point. |
Source code in mercury/explainability/explainers/counter_fact_basic.py
counter_fact_importance
CounterFactualExplainerBase(feature_names, drop_features=[])
Bases: ABC
Abstract base class for CounterFactual explainers, holding common functionality. You should extend this if you are implementing a custom counterfactual explainer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feature_names | List[str] | List of names of the features used in the model. | required |
drop_features | List[Any] | List with the names, as specified in feature_names, or with the indexes of the variables to leave out while looking for explanations. | [] |
Source code in mercury/explainability/explainers/counter_fact_importance.py
get_feature_importance(explain_data, n_important_features=3, print_every=5, threshold_probability=0.5, threshold_incr=0.01, init_threshold=0.1, max_iterations=10000, get_report=True)
This method computes the feature importance for a set of observations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
explain_data | DataFrame | Pandas DataFrame containing all the instances for which to find a counterfactual and therefore obtain feature importances. | required |
n_important_features | int | Number of top features that will be printed. Defaults to 3. | 3 |
print_every | int | Logging frequency (default=5). | 5 |
threshold_incr | float | The increment of the threshold to be used in the regularizer so as to bring the differences between a counterfactual instance and the original one to zero. | 0.01 |
init_threshold | float | The initial and minimum value to be used in the regularizer to bring the differences between a counterfactual instance and the original one to zero. | 0.1 |
max_iterations | int | Maximum number of iterations for the regularizer. This parameter gets passed down to the regularize_counterfactual method. | 10000 |
get_report | bool | Boolean determining whether to print the explanation or not. | True |
Returns:
Type | Description |
---|---|
CounterfactualWithImportanceExplanation | A CounterfactualExtendedExplanation object that contains all explanations as well as their interpretation. |
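A sketch of the feature-importance flow, assuming a concrete subclass (e.g. CounterfactualExplainer, below) has already been instantiated as `cf_explainer` and that `explain_data` is a pandas DataFrame of instances:
>>> importance_explanation = cf_explainer.get_feature_importance(
...     explain_data=explain_data.head(5),
...     n_important_features=3,
...     get_report=True,
... )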
Source code in mercury/explainability/explainers/counter_fact_importance.py
CounterfactualExplainer(predict_fn, feature_names, shape=None, drop_features=[], distance_fn='l1', target_proba=1.0, target_class='other', max_iter=1000, early_stop=50, lam_init=0.1, max_lam_steps=10, tol=0.05, learning_rate_init=0.1, feature_range=(-10000000000.0, 10000000000.0), eps=0.01, init='identity', decay=True, write_dir=None, debug=False, sess=None)
Bases: CounterFactualExplainerBase
Backed by Alibi's CounterFactual, this class extends its functionality to allow for the computation of Feature Importances by means of the computation of several Counterfactuals.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predict_fn | Callable[[Union[ndarray, DataFrame]], Union[float, ndarray]] | Model prediction function. This function should return the probabilities of belonging to each class. | required |
feature_names | List[str] | List of names of the features used in the model. | required |
shape | Tuple[Any] | Shape of input data starting with batch size. By default it is inferred from feature_names. | None |
drop_features | List[Any] | List with the names, as specified in feature_names, or with the indexes of the variables to leave out while looking for explanations (in the get_feature_importance method). | [] |
distance_fn | str | Distance function to use in the loss term. | 'l1' |
target_proba | float | Target probability for the counterfactual to reach. | 1.0 |
target_class | Union[str, int] | Target class for the counterfactual to reach; one of 'other', 'same' or an integer denoting desired class membership for the counterfactual instance. | 'other' |
max_iter | int | Maximum number of iterations to run the gradient descent for (inner loop). | 1000 |
early_stop | int | Number of steps after which to terminate gradient descent if all or none of found instances are solutions. | 50 |
lam_init | float | Initial regularization constant for the prediction part of the Wachter loss. | 0.1 |
max_lam_steps | int | Maximum number of times to adjust the regularization constant (outer loop) before terminating the search. | 10 |
tol | float | Tolerance for the counterfactual target probability. | 0.05 |
learning_rate_init | float | Initial learning rate for each outer loop of lambda. | 0.1 |
feature_range | Union[Tuple[Any], str] | Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges. | (-10000000000.0, 10000000000.0) |
eps | Union[float, ndarray] | Gradient step sizes used in calculating numerical gradients; defaults to a single value for all features, but can be passed an array for feature-wise step sizes. | 0.01 |
init | str | Initialization method for the search of counterfactuals; currently must be 'identity'. | 'identity' |
decay | bool | Flag to decay learning rate to zero for each outer loop over lambda. | True |
write_dir | str | Directory to write Tensorboard files to. | None |
debug | bool | Flag to write Tensorboard summaries for debugging. | False |
sess | Session | Optional Tensorflow session that will be used if passed instead of creating or inferring one internally. | None |
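A construction sketch; the model `model` (returning class probabilities) and the feature frame `X_train` are hypothetical:
>>> cf_explainer = CounterfactualExplainer(
...     predict_fn=model.predict_proba,
...     feature_names=list(X_train.columns),
...     target_class='other',
...     max_iter=500,
... )
>>> cf_explanation = cf_explainer.explain(X_train.iloc[0].values.reshape(1, -1))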
Source code in mercury/explainability/explainers/counter_fact_importance.py
explain(explain_data)
This method serves two purposes: firstly, it acts as an interface between the user and the CounterFactual class explain method; secondly, if there are features that shouldn't be taken into account, it recomputes feature_range accordingly and reinstantiates the parent class (CounterFactual). Explains an instance and returns the counterfactual with metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
explain_data | ndarray | Instance to be explained. | required |
Returns:
Type | Description |
---|---|
Explanation | Explanation object as specified in Alibi's original explain method. |
Source code in mercury/explainability/explainers/counter_fact_importance.py
CounterfactualProtoExplainer(predict_fn, train_data, feature_names=None, shape=None, drop_features=[], kappa=0.0, beta=0.1, feature_range=None, gamma=0.0, ae_model=None, enc_model=None, theta=5.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000, 1000), update_num_grad=1, write_dir=None, sess=None, trustscore_kwargs=None, d_type='abdm', w=None, disc_perc=[25, 50, 75], standardize_cat_vars=False, smooth=1.0, center=True, update_feature_range=True)
Bases: CounterFactualExplainerBase
Backed by Alibi's CounterFactualProto, this class extends its functionality to allow for the computation of Feature Importances by means of the computation of several Counterfactuals.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predict_fn | Callable[[Union[ndarray, DataFrame]], Union[float, ndarray]] | Model prediction function. This function should return the probabilities of belonging to each class. | required |
train_data | Union[ndarray, DataFrame] | Numpy array or Pandas dataframe with the training examples. | required |
feature_names | List[str] | List of names of the features used in the model. Will be inferred by default from train_data. | None |
shape | Tuple[Any] | Shape of input data starting with batch size. Will be inferred by default from train_data. | None |
drop_features | List[Any] | List with the names, as specified in feature_names, or with the indexes of the variables to leave out while looking for explanations. | [] |
beta | float | Regularization constant for L1 loss term. | 0.1 |
kappa | float | Confidence parameter for the attack loss term. | 0.0 |
feature_range | Union[Tuple[Any], str] | Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges. Will be inferred by default from train_data. | None |
gamma | float | Regularization constant for optional auto-encoder loss term. | 0.0 |
ae_model | Model | Optional auto-encoder model used for loss regularization. | None |
enc_model | Model | Optional encoder model used to guide instance perturbations towards a class prototype. | None |
theta | float | Constant for the prototype search loss term. Default is 5. Set it to zero to disable it. | 5.0 |
cat_vars | dict | Dict with as keys the categorical columns and as values the number of categories per categorical variable. | None |
ohe | bool | Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings. | False |
use_kdtree | bool | Whether to use k-d trees for the prototype loss term if no encoder is available. | False |
learning_rate_init | float | Initial learning rate of optimizer. | 0.01 |
max_iterations | int | Maximum number of iterations for finding a counterfactual. | 1000 |
c_init | float | Initial value to scale the attack loss term. If computation needs to be sped up, this parameter should take a small value (0 or 1). | 10.0 |
c_steps | int | Number of iterations to adjust the constant scaling the attack loss term. If computation needs to be sped up, this parameter should take a somewhat small value (between 1 and 5). | 10 |
eps | Union[float, ndarray] | If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x nb of prediction categories) and for eps[1] it should be (1 x nb of features). | (0.001, 0.001) |
clip | Tuple[float] | Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph. | (-1000, 1000) |
update_num_grad | int | If numerical gradients are used, they will be updated every update_num_grad iterations. If computation needs to be sped up, this parameter should take a somewhat large value (between 100 and 250). | 1 |
write_dir | str | Directory to write tensorboard files to. | None |
sess | Session | Optional Tensorflow session that will be used if passed instead of creating or inferring one internally. | None |
trustscore_kwargs | dict | Keyword arguments for the trust score object used to define the k-d trees for each class. (See the original alibi counterfactuals-guided-by-prototypes docs.) | None |
d_type | str | Distance metric. Options are "abdm", "mvdm" or "abdm-mvdm". | 'abdm' |
w | float | If the combined metric "abdm-mvdm" is used, w is the weight (between 0 and 1) given to abdm. | None |
disc_perc | Union[List[Union[int, float]], Tuple[Union[int, float]]] | List with percentiles (int) used for discretization. | [25, 50, 75] |
standardize_cat_vars | bool | Whether to return the standardized values for the numerical distances of each categorical feature. | False |
smooth | float | If the difference in the distances between the categorical variables is too large, then a lower value of the smooth argument (0, 1) can smoothen out this difference. This would only be relevant if one categorical variable has significantly larger differences between its categories than others. As a result, the counterfactual search process will likely leave that categorical variable unchanged. | 1.0 |
center | bool | Whether to center the numerical distances of the categorical variables between the min and max feature ranges. | True |
update_feature_range | bool | Whether to update the feature_range parameter for the categorical variables based on the min and max values computed in the constructor. | True |
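A construction sketch; the model and training frame are hypothetical, and shape/feature names are left to be inferred from train_data:
>>> proto_explainer = CounterfactualProtoExplainer(
...     predict_fn=model.predict_proba,
...     train_data=X_train,
...     use_kdtree=True,       # no encoder available, fall back to k-d trees
...     max_iterations=500,
... )
>>> proto_explanation = proto_explainer.explain(X_train.iloc[0].values.reshape(1, -1))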
Source code in mercury/explainability/explainers/counter_fact_importance.py
explain(explain_data, Y=None, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)
This method serves three purposes: firstly, it acts as an interface between the user and the CounterFactualProto class explain method; secondly, if there are features that shouldn't be taken into account, it recomputes feature_range accordingly and reinstantiates the parent class (CounterFactualProto); and thirdly, if a new CounterFactualProto instance has been created, it is fitted. Explains an instance and returns the counterfactual with metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | | Instances to explain. | required |
Y | ndarray | Labels for X as one-hot-encoding. | None |
target_class | list | List with target classes used to find closest prototype. If None, the nearest prototype except for the predicted class of the instance is used. | None |
k | int | Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees. | None |
k_type | str | Use either the average encoding of the k nearest instances in a class (k_type='mean') or the k-nearest encoding in the class (k_type='point') to define the prototype of that class. Only relevant if an encoder is used to define the prototypes. | 'mean' |
threshold | float | Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements. | 0.0 |
verbose | bool | Print intermediate results of optimization if True. | False |
print_every | int | Print frequency if verbose is True. | 100 |
log_every | int | Tensorboard log frequency if write directory is specified. | 100 |
Returns:
Type | Description |
---|---|
Explanation | Dictionary containing the counterfactual with additional metadata. |
Source code in mercury/explainability/explainers/counter_fact_importance.py
explainer
MercuryExplainer
Bases: ABC
load(filename='explainer.pkl')
classmethod
Loads a previously saved explainer with its internal state from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | Path where the explainer is stored. | 'explainer.pkl' |
Source code in mercury/explainability/explainers/explainer.py
save(filename='explainer.pkl')
Saves the explainer with its internal state to a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | Path where the explainer will be saved. | 'explainer.pkl' |
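A round-trip sketch using any concrete explainer (here the ALE explainer from earlier sections is assumed):
>>> ale_explainer.save('explainer.pkl')
>>> restored_explainer = ALEExplainer.load('explainer.pkl')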
Source code in mercury/explainability/explainers/explainer.py
partial_dependence
PartialDependenceExplainer(predict_fn, output_col='prediction', max_categorical_thresh=5, quantiles=0.05, resolution=50, verbose=False)
Bases: MercuryExplainer
This explainer will calculate the partial dependences for a ML model. Also contains a distributed (pyspark) implementation which allows PySpark transformers/pipelines to be explained via PD.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predict_fn | Callable | Inference function. This function will take a DataFrame (Pandas or PySpark) and must return the output of your estimator (usually a NumPy array for a 'plain' python model or a DataFrame in case of PySpark). | required |
output_col | str | Name of the output column of the PySpark transformer model. This will only be used in case the passed explain data is of type pyspark.DataFrame. | 'prediction' |
max_categorical_thresh | int | If a column contains less unique values than this threshold, it will be auto-considered categorical. | 5 |
quantiles | float or tuple | Calculate the quantiles of the model predictions. If a float is given, the (quantiles, 1 - quantiles) range is calculated. If a tuple is given, that range is used directly. | 0.05 |
resolution | int | Number of different values to test for each non-categorical variable. Lowering this value will increase speed but reduce resolution in the plots. | 50 |
verbose | bool | Print progress status. Default is False. | False |
Example
# "Plain python" example
>>> features = dataset.loc[:, FEATURE_NAMES] # DataFrame with only features
# You can create a custom inference function.
>>> def my_inference_fn(feats):
... return my_fitted_model.predict(feats)
# You can also pass anything as long as it is callable and receives the predictors (e.g. my_fitted_model.predict / predict_proba).
>>> explainer = PartialDependenceExplainer(my_fitted_model.predict)
>>> explanation = explainer.explain(features)
# Plot a summary of partial dependences for all the features.
>>> explanation.plot()
# Plot the partial dependence of a single variable.
>>> fig, ax = plt.subplots()
>>> explanation.plot_single('FEATURE_1', ax=ax)
# Pyspark Example
# Dataset with ONLY feature columns. (Assumed an already trained model)
>>> features = dataset.drop('target')
# Inference function
>>> def my_pred_fn(data):
... temp_df = assembler.transform(data)
... return my_transformer.transform(temp_df)
# We can tell the explainer not to calculate partial dependences for certain features. This will save time.
>>> features_to_ignore = ['FEATURE_4', 'FEATURE_88']
# We pass our custom inference function and also, tell the explainer which column will hold the transformer's output (necessary only when explaining pyspark models).
>>> explainer = PartialDependenceExplainer(my_pred_fn, output_col='probability')
# Explain the model ignoring features.
>>> explanation = explainer.explain(features, ignore_feats=features_to_ignore)
Source code in mercury/explainability/explainers/partial_dependence.py
__base_impl(features, feat_names, categoricals)
Contains the logic for the base implementation, i.e. the one which uses np.ndarrays/pd.DataFrames and undistributed models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
features | DataFrame | pandas.DataFrame with ONLY the predictors. | required |
feat_names | list[str] | List containing the names of the features that will be explained. | required |
categoricals | set[str] | Set with the feature names that will be forced to be categorical. | required |
Returns:
Type | Description |
---|---|
dict | Dictionary with the partial dependences of each selected feature and whether that feature is categorical or not. |
Source code in mercury/explainability/explainers/partial_dependence.py
__partial_dep_base(feat_name, data, grid)
Logic for computing the partial dependence of one feature for the base implementation (undistributed models)
Source code in mercury/explainability/explainers/partial_dependence.py
__partial_dep_pyspark(feat_name, data, grid, is_categorical=False)
Helper method for the PySpark implementation (distributed models). Computes the partial dependences of one feature given its type (categorical or not)
Source code in mercury/explainability/explainers/partial_dependence.py
__pyspark_impl(features, feat_names, categoricals)
Contains the logic for the pyspark implementation, i.e. the one which uses pyspark.DataFrames and distributed models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
features | DataFrame | pyspark.DataFrame with ONLY the predictors. | required |
feat_names | list[str] | List containing the names of the features that will be explained. | required |
categoricals | set[str] | Set with the feature names that will be forced to be categorical. | required |
Returns:
Type | Description |
---|---|
dict | Dictionary with the partial dependences of each selected feature and whether that feature is categorical or not. |
Source code in mercury/explainability/explainers/partial_dependence.py
explain(features, ignore_feats=None, categoricals=None)
This method will compute the partial dependences for a ML model. This explainer also contains a distributed (pyspark) implementation which allows PySpark transformers/pipelines to be explained via PD.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
features | DataFrame (pandas or pyspark) | DataFrame with only the features needed by the model. This dataframe should ONLY contain the features consumed by the model and, in the PySpark case, the vector-assembled column should be generated inside of the predict_fn. | required |
ignore_feats | List | Feature names which won't be explained. | None |
categoricals | List | List of feature names that will be forced to be taken as categoricals. If it's empty, the explainer will guess which columns are categorical. | None |
Returns:
Type | Description |
---|---|
PartialDependenceExplanation | PartialDependenceExplanation containing the explanation results. |
Source code in mercury/explainability/explainers/partial_dependence.py
shuffle_importance
ShuffleImportanceExplainer(eval_fn, normalize=True)
Bases: MercuryExplainer
This explainer estimates the feature importance of each predictor for a given black-box model. The strategy consists of randomly shuffling one variable at a time and, at each step, checking how much a particular metric worsens. The features which make the model perform the worst are the most important ones.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
eval_fn | Callable | Custom evaluation function. It will receive a DataFrame with features and a Numpy array with the target outputs for each instance. It must implement an inference process and return a metric to score the model performance on the given data. This metric must be real numbered. If the metric is one where higher values mean a better model (like accuracy), take that into account when combining it with the normalize parameter. | required |
normalize | bool | Whether to scale the feature importances between 0 and 1. If True, it shows the relative importance of the features. If False, the feature importances will be the value of the metric returned by eval_fn for each shuffled feature. | True |
Example
# "Plain python" example
>>> features = pd.read_csv(PATH_DATA)
>>> targets = features['target'] # Targets
>>> features = features.loc[:, FEATURE_NAMES] # DataFrame with only features
>>> def my_inference_function(features, targets):
... predictions = model.predict(features)
... return mean_squared_error(targets, predictions)
>>> explainer = ShuffleImportanceExplainer(my_inference_function)
>>> explanation = explainer.explain(features, targets)
>>> explanation.plot()
# Explain a pyspark model (or pipeline)
>>> features = sess.createDataFrame(pandas_dataframe)
>>> target_colname = "target" # Column name with the ground truth labels
>>> def my_inference_function(features, targets):
... model_inp = vectorAssembler.transform(features)
... model_out = my_pyspark_transformer.transform(model_inp)
... return my_evaluator.evaluate(model_out)
>>> explainer = ShuffleImportanceExplainer(my_inference_function)
>>> explanation = explainer.explain(features, target_colname)
>>> explanation.plot()
Source code in mercury/explainability/explainers/shuffle_importance.py
explain(predictors, target)
Explains the model given the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predictors | Union[DataFrame, DataFrame] | DataFrame with the features the model needs and that will be explained. In the case of PySpark, this dataframe must also contain a column with the target. | required |
target | Union[ndarray, str] | The ground-truth target for each one of the instances. In the case of PySpark, this should be the name of the column in the DataFrame which holds the target. | required |
Raises:
Type | Description |
---|---|
ValueError | if type(predictors) == pyspark.sql.DataFrame && type(target) != str |
ValueError | if type(predictors) == pyspark.sql.DataFrame && target not in predictors.columns |
Returns:
Type | Description |
---|---|
FeatureImportanceExplanation | FeatureImportanceExplanation with the performances of the model. |
Source code in mercury/explainability/explainers/shuffle_importance.py