Deprecation notice
This module is deprecated since v0.10.0 in favor of pydvl.valuation.
pydvl.value.shapley.truncated
References
- Ghorbani, A., Zou, J., 2019. Data Shapley: Equitable Valuation of Data for Machine Learning. In: Proceedings of the 36th International Conference on Machine Learning, PMLR, pp. 2242–2251.
BootstrapTruncation
Bases: TruncationPolicy
Break a permutation if the last computed utility is close to the total utility, where closeness is measured as a multiple of the standard deviation of the utilities.
| PARAMETER | DESCRIPTION |
|---|---|
| u | Utility object with model, data, and scoring function. |
| n_samples | Number of bootstrap samples to use to compute the variance of the utilities. |
| sigmas | Number of standard deviations to use as a threshold. |
Source code in src/pydvl/value/shapley/truncated.py
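As a rough illustration of the rule described above, the following sketch tracks a running mean and variance of the scores and stops once the last score lies within `sigmas` standard deviations of the total utility. It is not the pyDVL implementation: the class name `BootstrapStyleTruncation`, the `total_utility` argument (passed in directly instead of being derived from a Utility object), and the reading of `n_samples` as the number of scores collected before truncation is considered are all assumptions made for this example.

```python
import math


class BootstrapStyleTruncation:
    """Hypothetical stand-in for illustration, not the pyDVL class."""

    def __init__(self, total_utility: float, n_samples: int, sigmas: float = 1.0):
        self.total_utility = total_utility  # assumed to be known up front
        self.n_samples = n_samples
        self.sigmas = sigmas
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def __call__(self, idx: int, score: float) -> bool:
        # Welford's online update of the mean and variance of the scores.
        self.count += 1
        delta = score - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (score - self.mean)
        if self.count < self.n_samples:
            return False  # too few scores to estimate the spread
        std = math.sqrt(self._m2 / self.count)
        return abs(score - self.total_utility) < self.sigmas * std


policy = BootstrapStyleTruncation(total_utility=0.90, n_samples=3, sigmas=1.0)
scores = [0.50, 0.70, 0.80, 0.88]  # made-up utilities along one permutation
decisions = [policy(idx, score) for idx, score in enumerate(scores)]
print(decisions)
```

With these made-up scores the sketch first signals a stop at the third position: by then the estimated standard deviation (about 0.12) exceeds the remaining gap of 0.10 to the total utility.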
__call__
Check whether the computation should be interrupted.
| PARAMETER | DESCRIPTION |
|---|---|
| idx | Position in the permutation currently being computed. |
| score | Last utility computed. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the computation should be interrupted. |
Source code in src/pydvl/value/shapley/truncated.py
FixedTruncation
Bases: TruncationPolicy
Break a permutation after computing a fixed number of marginals.
The experiments in Appendix B of (Ghorbani and Zou, 2019) show that, when the training set is large enough, one can simply truncate the iteration over permutations after a fixed number of steps. This works because, beyond a certain training set size, the model becomes insensitive to additional samples. However, that size depends strongly on the data distribution and the model, and there is no automatic way of estimating it.
| PARAMETER | DESCRIPTION |
|---|---|
| u | Utility object with model, data, and scoring function. |
| fraction | Fraction of marginals in a permutation to compute before stopping (e.g. 0.5 to compute half of the marginals). |
Source code in src/pydvl/value/shapley/truncated.py
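The effect of `fraction` can be sketched in a few lines. This is a simplified stand-in rather than the library's code: the class name `FixedFractionTruncation` and the `n_points` argument are hypothetical, and the sketch assumes that `idx` is a zero-based position in the permutation.

```python
class FixedFractionTruncation:
    """Hypothetical stand-in for illustration, not the pyDVL class."""

    def __init__(self, n_points: int, fraction: float):
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be in (0, 1]")
        # Number of marginals to compute before stopping.
        self.max_marginals = int(n_points * fraction)

    def __call__(self, idx: int, score: float) -> bool:
        # idx is assumed zero-based, so idx + 1 marginals are done so far.
        # The score is ignored: only the position matters for this policy.
        return idx + 1 >= self.max_marginals


policy = FixedFractionTruncation(n_points=10, fraction=0.5)
first_stop = next(idx for idx in range(10) if policy(idx, score=0.0))
print(first_stop)
```

With ten points and `fraction=0.5`, the sketch stops after the fifth marginal, that is, at zero-based position 4.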
__call__
Check whether the computation should be interrupted.
| PARAMETER | DESCRIPTION |
|---|---|
| idx | Position in the permutation currently being computed. |
| score | Last utility computed. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the computation should be interrupted. |
Source code in src/pydvl/value/shapley/truncated.py
NoTruncation
Bases: TruncationPolicy
A policy which never interrupts the computation.
Source code in src/pydvl/value/shapley/truncated.py
__call__
Check whether the computation should be interrupted.
| PARAMETER | DESCRIPTION |
|---|---|
| idx | Position in the permutation currently being computed. |
| score | Last utility computed. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the computation should be interrupted. |
Source code in src/pydvl/value/shapley/truncated.py
RelativeTruncation
Bases: TruncationPolicy
Break a permutation if the marginal utility is too low.
This is called "performance tolerance" in (Ghorbani and Zou, 2019).
| PARAMETER | DESCRIPTION |
|---|---|
| u | Utility object with model, data, and scoring function. |
| rtol | Relative tolerance. The permutation is broken if the last computed utility is less than … |
Source code in src/pydvl/value/shapley/truncated.py
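One plausible reading of a relative tolerance is sketched below. Because the exact threshold expression is cut off in the parameter description, the criterion used here (stop once the gap between the last score and the total utility is at most `rtol` times the total utility) is an assumption, as are the class name `RelativeToleranceTruncation` and the `total_utility` argument.

```python
class RelativeToleranceTruncation:
    """Hypothetical stand-in for illustration, not the pyDVL class."""

    def __init__(self, total_utility: float, rtol: float):
        self.total_utility = total_utility  # assumed to be known up front
        self.rtol = rtol

    def __call__(self, idx: int, score: float) -> bool:
        # Assumed criterion: the remaining gap to the total utility is
        # within rtol, relative to the magnitude of the total utility.
        gap = abs(score - self.total_utility)
        return gap <= self.rtol * abs(self.total_utility)


policy = RelativeToleranceTruncation(total_utility=0.90, rtol=0.05)
scores = [0.50, 0.80, 0.86, 0.90]  # made-up utilities along one permutation
decisions = [policy(idx, score) for idx, score in enumerate(scores)]
print(decisions)
```

Here the tolerance band is 0.045 wide, so the sketch first signals a stop at the score 0.86, whose gap to the total utility is 0.04.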
__call__
Check whether the computation should be interrupted.
| PARAMETER | DESCRIPTION |
|---|---|
| idx | Position in the permutation currently being computed. |
| score | Last utility computed. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the computation should be interrupted. |
Source code in src/pydvl/value/shapley/truncated.py
TruncationPolicy
Bases: ABC
A policy for deciding whether to stop computing marginals in a permutation.
Statistics are kept on the number of calls and truncations as n_calls and
n_truncations respectively.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| n_calls | Number of calls to the policy. |
| n_truncations | Number of truncations made by the policy. |
Todo
Because the policy objects are copied to the workers, the statistics are not accessible from the coordinating process. We need to add methods for this.
Source code in src/pydvl/value/shapley/truncated.py
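The limitation noted in the Todo can be reproduced without any parallel backend. In the sketch below, `copy.deepcopy` stands in for shipping a policy to a worker, and the classes `CountingPolicy` and `StopAfterThree` as well as the `_check` hook are hypothetical names chosen for this example, not part of the documented interface.

```python
import copy
from abc import ABC, abstractmethod


class CountingPolicy(ABC):
    """Hypothetical base class that keeps call and truncation statistics."""

    def __init__(self) -> None:
        self.n_calls = 0
        self.n_truncations = 0

    @abstractmethod
    def _check(self, idx: int, score: float) -> bool:
        """Decide whether to stop; implemented by concrete policies."""

    def __call__(self, idx: int, score: float) -> bool:
        stop = self._check(idx, score)
        self.n_calls += 1
        if stop:
            self.n_truncations += 1
        return stop


class StopAfterThree(CountingPolicy):
    def _check(self, idx: int, score: float) -> bool:
        return idx >= 2  # stop from the third position onwards


original = StopAfterThree()
worker_copy = copy.deepcopy(original)  # stand-in for a copy sent to a worker
for idx in range(5):
    worker_copy(idx, score=0.0)

# The copy has accumulated statistics; the original has seen nothing.
print(worker_copy.n_calls, worker_copy.n_truncations)
print(original.n_calls, original.n_truncations)
```

The counters live on whichever object was actually called, so a coordinating process holding only `original` would need some explicit mechanism to collect them from the workers.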
__call__
Check whether the computation should be interrupted.
| PARAMETER | DESCRIPTION |
|---|---|
| idx | Position in the permutation currently being computed. |
| score | Last utility computed. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the computation should be interrupted. |
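To show where a call with `idx` and `score` fits into a computation, here is a minimal sketch of a loop over one permutation that consults a stopping callable after every marginal. Everything in it is an assumption made for illustration: the function `permutation_marginals`, the toy `CountingUtility` (which saturates after three points), and the two lambdas standing in for a truncating policy and a never-stopping one.

```python
import random
from typing import Callable, Dict, Iterable, Sequence


class CountingUtility:
    """Toy utility that saturates after three points and counts evaluations."""

    def __init__(self) -> None:
        self.n_evaluations = 0

    def __call__(self, subset: Sequence[int]) -> float:
        self.n_evaluations += 1
        return min(len(subset), 3) / 3


def permutation_marginals(
    indices: Iterable[int],
    utility: Callable[[Sequence[int]], float],
    should_stop: Callable[[int, float], bool],
    rng: random.Random,
) -> Dict[int, float]:
    permutation = list(indices)
    rng.shuffle(permutation)
    marginals = {point: 0.0 for point in permutation}
    prev_score = utility([])
    for idx, point in enumerate(permutation):
        score = utility(permutation[: idx + 1])
        marginals[point] = score - prev_score
        prev_score = score
        if should_stop(idx, score):
            break  # remaining points keep a marginal of zero
    return marginals


total_utility = 1.0  # utility of the full toy dataset

truncated_u = CountingUtility()
truncated = permutation_marginals(
    range(6), truncated_u, lambda idx, score: score >= total_utility, random.Random(0)
)

full_u = CountingUtility()
full = permutation_marginals(
    range(6), full_u, lambda idx, score: False, random.Random(0)
)

print(truncated_u.n_evaluations, full_u.n_evaluations)
```

Because the toy utility stops changing after three points, both runs produce the same marginals, but the truncated run needs 4 utility evaluations instead of 7. With a real model the saving comes at the price of some approximation error, which is what the different policies trade off.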