pydvl.valuation.methods.classwise_shapley
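Class-wise Shapley (Schoch et al., 2022) offers a Shapley framework tailored to classification problems. Let \(D\) be a dataset, \(D_{y_i}\) the subset of \(D\) with label \(y_i\), and \(D_{-y_i}\) the complement of \(D_{y_i}\) in \(D\). The key idea is that a sample \((x_i, y_i)\) might improve the overall performance on \(D\) while being detrimental to the performance on \(D_{y_i}\). The Class-wise value is defined as:
\[ v_u(i) = \frac{1}{2^{|D_{-y_i}|}} \sum_{S_{-y_i}} \left[ \frac{1}{|D_{y_i}|} \sum_{S_{y_i}} \binom{|D_{y_i}|-1}{|S_{y_i}|}^{-1} \delta(S_{y_i} | S_{-y_i}) \right], \]
where \(S_{y_i} \subseteq D_{y_i} \setminus \{i\}\) and \(S_{-y_i} \subseteq D_{-y_i}\), and \(\delta(S_{y_i} | S_{-y_i}) = u(S_{y_i} \cup \{i\} | S_{-y_i}) - u(S_{y_i} | S_{-y_i})\) is the marginal utility of adding \(i\) to \(S_{y_i}\) given the out-of-class samples \(S_{-y_i}\).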
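The definition can be made concrete with a brute-force sketch that enumerates every subset. It is an illustration, not pyDVL's implementation: it assumes in-class subsets receive the usual Shapley weights \(\frac{1}{|D_{y_i}|}\binom{|D_{y_i}|-1}{|S_{y_i}|}^{-1}\) and out-of-class subsets are averaged uniformly. The names `classwise_value`, `powerset` and `toy_utility` are hypothetical.

```python
from itertools import combinations
from math import comb


def powerset(items):
    """Yield every subset of `items` as a tuple."""
    items = list(items)
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def classwise_value(i, labels, utility):
    """Exact class-wise value of index `i` by full enumeration.

    `labels` maps index -> label. `utility(in_class, out_of_class)` is a
    hypothetical callable standing in for u(S_{y_i} | S_{-y_i}).
    """
    in_class = [j for j, y in labels.items() if y == labels[i]]
    out_class = [j for j, y in labels.items() if y != labels[i]]
    others = [j for j in in_class if j != i]
    n = len(in_class)

    total = 0.0
    for s_out in powerset(out_class):
        inner = 0.0
        for s_in in powerset(others):
            with_i = utility(set(s_in) | {i}, set(s_out))
            without_i = utility(set(s_in), set(s_out))
            # Shapley weight of an in-class subset of this size.
            inner += (with_i - without_i) / comb(n - 1, len(s_in))
        total += inner / n
    # Uniform average over all out-of-class subsets.
    return total / 2 ** len(out_class)


def toy_utility(in_class, out_of_class):
    """Made-up utility with an in-class/out-of-class interaction term."""
    return len(in_class) ** 0.5 + 0.1 * len(in_class) * len(out_of_class)


labels = {0: 0, 1: 0, 2: 1, 3: 1}
print(classwise_value(0, labels, toy_utility))
```

The cost grows as \(2^{|D|}\), so this is only usable on toy datasets, which is why sampling is used in practice.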
Analysis of Class-wise Shapley
For a detailed analysis of the method, with comparison to other valuation techniques, please refer to the main documentation.
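In practice, the quantity above is estimated using Monte Carlo sampling of the powerset and the set of index permutations. This results in the estimator
\[ v_u(i) \approx \frac{1}{K} \sum_{k=1}^{K} \frac{1}{L} \sum_{l=1}^{L} \left[ u(\sigma^{(l)}_{:i} \cup \{i\} | S^{(k)}) - u(\sigma^{(l)}_{:i} | S^{(k)}) \right], \]
with \(S^{(1)}, \dots, S^{(K)} \subseteq T_{-y_i},\) \(\sigma^{(1)}, \dots, \sigma^{(L)} \in \Pi(T_{y_i}\setminus\{i\}),\) and \(\sigma^{(l)}_{:i}\) denoting the set of indices in permutation \(\sigma^{(l)}\) before the position where \(i\) appears. Here \(u(\cdot | S)\) is the utility of an in-class set given the out-of-class set \(S\), and \(T_{y_i}\) and \(T_{-y_i}\) are the subsets of the training set with label \(y_i\) and with any other label, respectively.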
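A minimal sketch of such an estimator follows. It is not pyDVL's code, and it makes two assumptions: each out-of-class subset is drawn uniformly from the powerset by including every index with probability 1/2, and each permutation is drawn over the whole in-class set, with the prefix before \(i\) taken as \(\sigma_{:i}\). The function name and the `utility(in_class, out_of_class)` callable are hypothetical.

```python
import random


def mc_classwise_value(i, labels, utility, n_subsets=50, n_permutations=50, seed=0):
    """Monte Carlo estimate of the class-wise value of index `i`.

    `labels` maps index -> label. `utility(in_class, out_of_class)` is a
    hypothetical callable standing in for u(. | S).
    """
    rng = random.Random(seed)
    in_class = [j for j, y in labels.items() if y == labels[i]]
    out_class = [j for j, y in labels.items() if y != labels[i]]

    total = 0.0
    for _ in range(n_subsets):
        # Uniform draw from the powerset of the out-of-class indices.
        s_out = {j for j in out_class if rng.random() < 0.5}
        for _ in range(n_permutations):
            perm = in_class[:]
            rng.shuffle(perm)
            # Indices that appear before `i` in the sampled permutation.
            prefix = set(perm[: perm.index(i)])
            total += utility(prefix | {i}, s_out) - utility(prefix, s_out)
    return total / (n_subsets * n_permutations)


def toy_utility(in_class, out_of_class):
    """Made-up utility with an in-class/out-of-class interaction term."""
    return len(in_class) ** 0.5 + 0.1 * len(in_class) * len(out_of_class)


labels = {0: 0, 1: 0, 2: 1, 3: 1}
print(mc_classwise_value(0, labels, toy_utility))
```

Each sampled subset is reused for all `n_permutations` permutations, mirroring the nested sums over \(k\) and \(l\), so the total number of utility evaluations is `2 * n_subsets * n_permutations`.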
Notes for derivation of test cases
The unit tests include the following manually constructed data: Let \(D=\{(1,0),(2,0),(3,0),(4,1)\}\) be the test set and \(T=\{(1,0),(2,0),(3,1),(4,1)\}\) the train set. This specific dataset is chosen because it allows the model
\[ y = \max(0, \min(1, \text{round}(\beta^T x))) \]
to be solved in closed form as \(\beta = \frac{\text{dot}(x, y)}{\text{dot}(x, x)}\). From the closed-form solution, the tables for in-class accuracy \(a_S(D_{y_i})\) and out-of-class accuracy \(a_S(D_{-y_i})\) can be calculated. Using these tables and setting \(\{S^{(1)}, \dots, S^{(K)}\} = 2^{T_{-y_i}}\) and \(\{\sigma^{(1)}, \dots, \sigma^{(L)}\} = \Pi(T_{y_i}\setminus\{i\})\), where \(2^M\) denotes the powerset of \(M\), the Monte Carlo estimator can be evaluated exactly. The details of the derivation are left to the eager reader.
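As a starting point for that derivation, the sketch below computes \(\beta\) and the two accuracies for the full train set. It assumes the model rounds \(\beta x\) and clips the result to \(\{0, 1\}\), and that an empty training subset yields \(\beta = 0\). The helper names are hypothetical and are not taken from the pyDVL test suite.

```python
def fit_beta(points):
    """Least-squares slope through the origin: dot(x, y) / dot(x, x)."""
    xx = sum(x * x for x, _ in points)
    if xx == 0:
        return 0.0  # assumption: an empty training subset predicts label 0
    return sum(x * y for x, y in points) / xx


def predict(beta, x):
    """Round beta * x and clip the result to the label set {0, 1}."""
    return max(0, min(1, round(beta * x)))


def accuracy(beta, points):
    """Fraction of (x, y) pairs whose label is predicted correctly."""
    return sum(predict(beta, x) == y for x, y in points) / len(points)


test_set = [(1, 0), (2, 0), (3, 0), (4, 1)]   # D
train_set = [(1, 0), (2, 0), (3, 1), (4, 1)]  # T

beta = fit_beta(train_set)  # (3 + 4) / (1 + 4 + 9 + 16) = 7 / 30
in_class_0 = [(x, y) for x, y in test_set if y == 0]   # D_{y_i} for label 0
out_class_0 = [(x, y) for x, y in test_set if y != 0]  # D_{-y_i} for label 0
print(beta, accuracy(beta, in_class_0), accuracy(beta, out_class_0))
```

Repeating the fit for every subset \(S \subseteq T\) fills in the accuracy tables mentioned above.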
References
- Schoch, Stephanie, Haifeng Xu, and Yangfeng Ji. CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification. In Proc. of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS). New Orleans, Louisiana, USA, 2022.
ClasswiseShapleyValuation
ClasswiseShapleyValuation(
    utility: ClasswiseModelUtility,
    sampler: ClasswiseSampler,
    is_done: StoppingCriterion,
    progress: dict[str, Any] | bool = False,
    *,
    normalize_values: bool = True
)
Bases: Valuation
Class to compute Class-wise Shapley values.
It proceeds by sampling independent permutations of the index set for each label and index sets sampled from the powerset of the complement (with respect to the currently evaluated label).
PARAMETER | DESCRIPTION |
---|---|
utility | Classwise utility object with model and classwise scoring function. TYPE: ClasswiseModelUtility |
sampler | Classwise sampling scheme to use. TYPE: ClasswiseSampler |
is_done | Stopping criterion to use. TYPE: StoppingCriterion |
progress | Whether to show a progress bar. TYPE: dict[str, Any] \| bool. DEFAULT: False |
normalize_values | Whether to normalize values after valuation. TYPE: bool. DEFAULT: True |
Source code in src/pydvl/valuation/methods/classwise_shapley.py
values
values(sort: bool = False) -> ValuationResult
Returns a copy of the valuation result.
The valuation must have been run with fit() before calling this method.
PARAMETER | DESCRIPTION |
---|---|
sort | Whether to sort the valuation result before returning it. TYPE: bool. DEFAULT: False |
Returns: The result of the valuation.