Common
compute_shapley_values(u, *, done=MaxUpdates(100), mode=ShapleyMode.TruncatedMontecarlo, n_jobs=1, seed=None, **kwargs)
Umbrella method to compute Shapley values with any of the available algorithms.
See [[data-valuation]] for an overview.
The following algorithms are available. Note that the exact methods only work with very small datasets and are therefore intended only for testing. Some algorithms also accept additional arguments; refer to the documentation of each particular method.
combinatorial_exact
: Uses the combinatorial implementation of data Shapley. Implemented in combinatorial_exact_shapley().
combinatorial_montecarlo
: Uses the approximate Monte Carlo implementation of combinatorial data Shapley. Implemented in combinatorial_montecarlo_shapley().
permutation_exact
: Uses the permutation-based implementation of data Shapley. Computation is not parallelized. Implemented in permutation_exact_shapley().
permutation_montecarlo
: Uses the approximate Monte Carlo implementation of permutation data Shapley. Accepts a TruncationPolicy to stop computing marginals. Implemented in permutation_montecarlo_shapley().
owen_sampling
: Uses the Owen continuous extension of the utility function to the unit cube. Implemented in owen_sampling_shapley(). This method does not take a StoppingCriterion but instead requires a parameter q_max for the number of subdivisions of the unit interval to use for integration, and another parameter n_samples for the number of subsets to sample for each \(q\).
owen_halved
: Same as 'owen_sampling' but uses correlated samples in the expectation. Implemented in owen_sampling_shapley(). This method requires an additional parameter q_max for the number of subdivisions of the interval [0,0.5] to use for integration, and another parameter n_samples for the number of subsets to sample for each \(q\).
group_testing
: Estimates differences of Shapley values and solves a constraint satisfaction problem. High sample complexity, not recommended. Implemented in group_testing_shapley(). This method does not take a StoppingCriterion but instead requires a parameter n_samples for the number of iterations to run.
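To make the difference between the exact, permutation Monte Carlo and Owen-style estimators concrete, here is a minimal, library-independent sketch on a three-point toy problem. Everything in it is an assumption for illustration: `toy_utility` is a hypothetical utility, the function names are not pyDVL's, and the midpoint grid used for the integration over \(q\) is one possible discretization, not necessarily the one used by owen_sampling_shapley().

```python
import itertools
import math
import random


def toy_utility(subset):
    """Hypothetical utility: additive weights plus a bonus when 0 and 1 co-occur."""
    weights = [1.0, 2.0, 3.0]
    bonus = 1.0 if {0, 1} <= set(subset) else 0.0
    return sum(weights[i] for i in subset) + bonus


def exact_combinatorial(n, u):
    """Weighted sum of marginals over all subsets: O(2^n) utility evaluations."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the subset S that excludes i
            coeff = 1.0 / (n * math.comb(n - 1, k))
            for s in itertools.combinations(others, k):
                values[i] += coeff * (u(s + (i,)) - u(s))
    return values


def permutation_montecarlo(n, u, n_permutations, seed=None):
    """Average marginal contribution along randomly sampled permutations."""
    rng = random.Random(seed)
    values = [0.0] * n
    for _ in range(n_permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        prefix, prev = [], u([])
        for i in perm:
            prefix.append(i)
            cur = u(prefix)
            values[i] += (cur - prev) / n_permutations
            prev = cur
    return values


def owen_sampling(n, u, q_max, n_samples, seed=None):
    """Integrate over q the expected marginal when every other point is
    included independently with probability q (midpoint rule, an assumption)."""
    rng = random.Random(seed)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for step in range(q_max):
            q = (step + 0.5) / q_max
            for _ in range(n_samples):
                s = [j for j in others if rng.random() < q]
                values[i] += (u(s + [i]) - u(s)) / (q_max * n_samples)
    return values


exact = exact_combinatorial(3, toy_utility)
approx = permutation_montecarlo(3, toy_utility, n_permutations=2000, seed=0)
owen = owen_sampling(3, toy_utility, q_max=20, n_samples=200, seed=0)
print(exact, approx, owen)
```

For this utility the exact values are 1.5, 2.5 and 3.0 (up to floating-point rounding): the bonus is shared equally between points 0 and 1. The exact routine enumerates every subset, which is why such methods are only practical for very small datasets, while the two sampling routines trade that cost for statistical error.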
Additionally, one can use model-specific methods:
knn
: Exact method for K-nearest-neighbour models. Implemented in knn_shapley().
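As a rough illustration of why an exact method is feasible for K-nearest-neighbour models, the sketch below implements the closed-form recursion for the unweighted KNN utility and a single test point (the form published by Jia et al., 2019). The function name and its arguments are hypothetical, and whether knn_shapley() follows exactly this form is not stated here.

```python
def knn_shapley_single_test(distances, labels, test_label, k):
    """Shapley values of training points for one test point, assuming the
    utility is the fraction of the k nearest neighbours with the right label."""
    n = len(labels)
    # Rank training points from nearest to farthest from the test point.
    order = sorted(range(n), key=lambda j: distances[j])
    match = [1.0 if labels[j] == test_label else 0.0 for j in order]

    sorted_values = [0.0] * n
    # The farthest point only matters in subsets where it is among the k nearest.
    sorted_values[n - 1] = match[n - 1] / n
    # Walk from the second-farthest point back to the nearest one.
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank of the point at this position
        sorted_values[pos] = sorted_values[pos + 1] + (
            (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        )

    # Map the values back to the original ordering of the training points.
    values = [0.0] * n
    for pos, j in enumerate(order):
        values[j] = sorted_values[pos]
    return values


values = knn_shapley_single_test(
    distances=[0.5, 0.1, 0.9], labels=["b", "a", "a"], test_label="a", k=2
)
print(values)
```

The recursion needs one sort and a single pass over the training points, instead of the exponential enumeration required by the general exact methods. In this example the values sum to 0.5, the utility of the full training set (one of the two nearest neighbours carries the correct label).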
PARAMETER | DESCRIPTION |
---|---|
u | Utility object with model, data, and scoring function. |
done | Object used to determine when to stop the computation for Monte Carlo methods. The default, MaxUpdates(100), stops after 100 iterations. See the available criteria in stopping. Several criteria can be combined using boolean operators. Some methods ignore this argument; others require specific subtypes. |
n_jobs | Number of parallel jobs (available only to some methods). Defaults to 1. |
seed | Either an instance of a numpy random number generator or a seed for it. Defaults to None. |
mode | Which Shapley algorithm to use. See ShapleyMode for a list of allowed values. Defaults to ShapleyMode.TruncatedMontecarlo. |
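The description of done mentions that stopping criteria can be combined with boolean operators. The following toy sketch shows how such composition can work in principle. The `Criterion` class, the `max_updates` and `max_stderr` helpers, and the dictionary used as state are all hypothetical stand-ins, not pyDVL's classes.

```python
import math


class Criterion:
    """Hypothetical stopping criterion: a predicate on the current state."""

    def __init__(self, check):
        self._check = check

    def __call__(self, state):
        return self._check(state)

    def __and__(self, other):
        # Stop only when both criteria are satisfied.
        return Criterion(lambda s: self(s) and other(s))

    def __or__(self, other):
        # Stop as soon as either criterion is satisfied.
        return Criterion(lambda s: self(s) or other(s))

    def __invert__(self):
        return Criterion(lambda s: not self(s))


def max_updates(n):
    return Criterion(lambda s: s["updates"] >= n)


def max_stderr(threshold):
    return Criterion(lambda s: s["stderr"] <= threshold)


def run(done):
    """Toy estimation loop whose standard error decays as 1/sqrt(updates)."""
    state = {"updates": 0, "stderr": 1.0}
    while not done(state):
        state["updates"] += 1
        state["stderr"] = 1.0 / math.sqrt(state["updates"])
    return state["updates"]


stopped_by_either = run(max_updates(100) | max_stderr(0.01))
stopped_by_both = run(max_updates(10) & max_stderr(0.3))
print(stopped_by_either, stopped_by_both)
```

With `|` the loop ends at the update cap because the error target is never reached within 100 updates; with `&` it continues past the cap of 10 until the error target is also met.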
RETURNS | DESCRIPTION |
---|---|
ValuationResult | Object with the results. |
Source code in src/pydvl/value/shapley/common.py