pydvl.value.shapley.common
compute_shapley_values
compute_shapley_values(
u: Utility,
*,
done: StoppingCriterion = MaxChecks(None),
mode: ShapleyMode = ShapleyMode.TruncatedMontecarlo,
n_jobs: int = 1,
seed: Optional[Seed] = None,
**kwargs
) -> ValuationResult
Umbrella method to compute Shapley values with any of the available algorithms.
See Data valuation for an overview.
The following algorithms are available. Note that the exact methods only work with very small datasets and are thus intended only for testing. Some algorithms also accept additional arguments; please refer to the documentation of each particular method.
combinatorial_exact
: Uses the combinatorial implementation of data Shapley. Implemented in combinatorial_exact_shapley().
combinatorial_montecarlo
: Uses the approximate Monte Carlo implementation of combinatorial data Shapley. Implemented in combinatorial_montecarlo_shapley().
permutation_exact
: Uses the permutation-based implementation of data Shapley. Computation is not parallelized. Implemented in permutation_exact_shapley().
permutation_montecarlo
: Uses the approximate Monte Carlo implementation of permutation data Shapley. Accepts a TruncationPolicy to stop computing marginals early. Implemented in permutation_montecarlo_shapley().
owen_sampling
: Uses the Owen continuous extension of the utility function to the unit cube. Implemented in owen_sampling_shapley(). This method does not take a StoppingCriterion but instead requires a parameter q_max for the number of subdivisions of the unit interval to use for integration, and another parameter n_samples for the number of subsets to sample for each \(q\).
owen_halved
: Same as owen_sampling but uses correlated samples in the expectation. Implemented in owen_sampling_shapley(). Like owen_sampling, this method requires a parameter q_max, here for the number of subdivisions of the interval [0, 0.5] to use for integration, and another parameter n_samples for the number of subsets to sample for each \(q\).
group_testing
: Estimates differences of Shapley values and solves a constraint satisfaction problem. High sample complexity, not recommended. Implemented in group_testing_shapley(). This method does not take a StoppingCriterion but instead requires a parameter n_samples for the number of iterations to run.
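To make the cost of the exact methods concrete, here is a minimal brute-force sketch of the combinatorial Shapley formula \(v_i = \frac{1}{n}\sum_{S \subseteq D \setminus \{i\}} \binom{n-1}{|S|}^{-1} [u(S \cup \{i\}) - u(S)]\). It is not pyDVL's combinatorial_exact_shapley(); the function name exact_shapley and the assumption that the utility is a plain Python function of a frozenset of indices are hypothetical, chosen for illustration only. Because every subset is visited, the number of utility evaluations grows as \(2^n\).

```python
from itertools import combinations
from math import comb
from typing import Callable, FrozenSet, List


def exact_shapley(n: int, utility: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Hypothetical brute-force Shapley values for data indices 0..n-1."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Enumerate all subsets S of the other points, grouped by size |S|.
        for size in range(n):
            weight = 1.0 / (n * comb(n - 1, size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weighted marginal contribution of i to S.
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


# Usage: for an additive utility the Shapley values equal the weights.
weights = [1.0, 2.0, -0.5]
print(exact_shapley(3, lambda s: sum(weights[j] for j in s)))
```

The printed values should match the weights up to floating-point rounding, which is a quick sanity check for any Shapley implementation.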
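The permutation Monte Carlo idea behind permutation_montecarlo can be sketched as follows: sample random permutations, add points one at a time, and credit each point with the change in utility it causes. The truncation rule here, which stops a permutation once the running utility is within a relative tolerance of the full-dataset utility and treats the remaining marginals as zero, is an assumption in the spirit of truncated Monte Carlo Shapley; it does not reproduce pyDVL's TruncationPolicy classes. The function name and its parameters are hypothetical.

```python
import random
from typing import Callable, FrozenSet, List, Optional, Set


def permutation_montecarlo_sketch(
    n: int,
    utility: Callable[[FrozenSet[int]], float],
    n_permutations: int,
    truncation_rtol: Optional[float] = None,
    seed: Optional[int] = None,
) -> List[float]:
    """Hypothetical permutation Monte Carlo Shapley estimate with optional truncation."""
    rng = random.Random(seed)
    total = utility(frozenset(range(n)))  # utility of the full dataset
    empty = utility(frozenset())          # utility of the empty set
    values = [0.0] * n
    for _ in range(n_permutations):
        permutation = list(range(n))
        rng.shuffle(permutation)
        prefix: Set[int] = set()
        previous = empty
        for idx in permutation:
            if truncation_rtol is not None and abs(total - previous) <= truncation_rtol * abs(total):
                break  # remaining marginals in this permutation are treated as zero
            prefix.add(idx)
            current = utility(frozenset(prefix))
            values[idx] += current - previous  # marginal contribution of idx
            previous = current
    # Average the accumulated marginals over all sampled permutations.
    return [v / n_permutations for v in values]
```

Truncation saves utility evaluations (each one typically means retraining a model) at the price of a bias whenever late marginals are not actually negligible.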
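The Owen-based modes rely on the identity \(v_i = \int_0^1 \mathbb{E}_{S \sim q}[u(S \cup \{i\}) - u(S)]\,dq\), where \(S \sim q\) includes each of the other points independently with probability \(q\). The sketch below approximates the integral with the midpoint rule over q_max subintervals and estimates each expectation with n_samples random subsets. The midpoint grid and the function name are assumptions for illustration; pyDVL's owen_sampling_shapley() may discretize the interval differently, and the correlated sampling of owen_halved is not shown.

```python
import random
from typing import Callable, FrozenSet, List, Optional


def owen_sampling_shapley_sketch(
    n: int,
    utility: Callable[[FrozenSet[int]], float],
    q_max: int,
    n_samples: int,
    seed: Optional[int] = None,
) -> List[float]:
    """Hypothetical Owen sampling estimate using the midpoint rule on [0, 1]."""
    rng = random.Random(seed)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        integral = 0.0
        for k in range(q_max):
            q = (k + 0.5) / q_max  # midpoint of the k-th subinterval of [0, 1]
            acc = 0.0
            for _ in range(n_samples):
                # Each other point enters the subset independently with probability q.
                subset = frozenset(j for j in others if rng.random() < q)
                acc += utility(subset | {i}) - utility(subset)
            integral += acc / n_samples  # Monte Carlo estimate of the expectation at q
        values[i] = integral / q_max     # midpoint-rule approximation of the integral
    return values
```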
Additionally, one can use model-specific methods:
knn
: Exact method for K-nearest neighbours (KNN) models. Implemented in knn_shapley().
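The knn mode can be exact because, for an unweighted KNN classifier, Shapley values follow a closed-form recursion over training points sorted by distance (Jia et al., 2019). The sketch below assumes a single test point and the utility \(u(S) = \frac{1}{K}\sum_{k=1}^{\min(K,|S|)} \mathbb{1}[y_{\alpha_k(S)} = y_{\text{test}}]\), where \(\alpha_k(S)\) is the k-th nearest point of \(S\). The function name knn_shapley_single is hypothetical; pyDVL's knn_shapley() handles a whole test set with its own neighbour search. Assuming the multi-point utility is the average over test points, values for several test points are the average of the per-point values.

```python
from typing import List, Sequence


def knn_shapley_single(
    distances: Sequence[float], labels: Sequence[int], test_label: int, k: int
) -> List[float]:
    """Hypothetical exact KNN Shapley values for one test point."""
    n = len(distances)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda j: distances[j])  # closest point first
    match = [1.0 if labels[j] == test_label else 0.0 for j in order]
    sorted_values = [0.0] * n
    # The farthest point is only decisive in the rare coalitions where it is among the K nearest.
    sorted_values[n - 1] = match[n - 1] / n
    for i in range(n - 2, -1, -1):
        rank = i + 1  # 1-based rank of this point in the sorted order
        sorted_values[i] = sorted_values[i + 1] + (match[i] - match[i + 1]) / k * min(k, rank) / rank
    # Map the values back from sorted order to the original indices.
    values = [0.0] * n
    for position, j in enumerate(order):
        values[j] = sorted_values[position]
    return values


# Usage: five training points, one test point with label 1, K = 2.
print(knn_shapley_single([0.3, 0.1, 0.5, 0.2, 0.9], [1, 0, 1, 1, 0], test_label=1, k=2))
```

The recursion needs only a sort, so its cost is \(O(n \log n)\) per test point instead of the \(2^n\) utility evaluations of a generic exact method.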
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
u | Utility | Utility object with model, data, and scoring function. |
done | StoppingCriterion | Object used to determine when to stop the computation for Monte Carlo methods. The default in the signature, MaxChecks(None), never signals convergence by itself, so pass an explicit criterion when using a Monte Carlo method. See the available criteria in the stopping module. Several criteria can be combined using boolean operators. Some methods ignore this argument, and others require specific subtypes. |
n_jobs | int | Number of parallel jobs (available only to some methods). |
seed | Optional[Seed] | Either an instance of a numpy random number generator or a seed for it. |
mode | ShapleyMode | Shapley algorithm to use. See ShapleyMode for the list of allowed values. |
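The done parameter can be built from several criteria joined with boolean operators. The sketch below illustrates that composition pattern with hypothetical classes: Criterion, max_iterations and small_change are not pyDVL names, and pyDVL's actual criteria inspect a ValuationResult and return a status rather than looking at an iteration count and a single change value.

```python
from typing import Callable


class Criterion:
    """Hypothetical stopping criterion: returns True when the computation should stop."""

    def __init__(self, check: Callable[[int, float], bool]):
        self._check = check

    def __call__(self, iteration: int, max_change: float) -> bool:
        return self._check(iteration, max_change)

    def __and__(self, other: "Criterion") -> "Criterion":
        # Stop only when both criteria agree.
        return Criterion(lambda it, ch: self(it, ch) and other(it, ch))

    def __or__(self, other: "Criterion") -> "Criterion":
        # Stop as soon as either criterion fires.
        return Criterion(lambda it, ch: self(it, ch) or other(it, ch))

    def __invert__(self) -> "Criterion":
        return Criterion(lambda it, ch: not self(it, ch))


def max_iterations(n: int) -> Criterion:
    """Hypothetical: fires once at least n iterations have run."""
    return Criterion(lambda it, ch: it >= n)


def small_change(tol: float) -> Criterion:
    """Hypothetical: fires when the largest update in the last iteration is below tol."""
    return Criterion(lambda it, ch: ch < tol)


# Usage: stop after 100 iterations, or earlier if updates become small
# once at least 10 iterations have run.
done = max_iterations(100) | (small_change(1e-3) & max_iterations(10))
print(done(5, 1e-6))   # False: updates are small, but fewer than 10 iterations ran
print(done(10, 1e-6))  # True: small updates after 10 iterations
print(done(100, 0.5))  # True: the iteration budget is exhausted
```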
RETURNS | DESCRIPTION |
---|---|
ValuationResult | Object with the results. |
Source code in src/pydvl/value/shapley/common.py