pydvl.value.shapley.truncated
References
[1] Ghorbani, A., Zou, J., 2019. Data Shapley: Equitable Valuation of Data for Machine Learning. In: Proceedings of the 36th International Conference on Machine Learning, PMLR, pp. 2242–2251.
TruncationPolicy
Bases: ABC
A policy for deciding whether to stop computing marginals in a permutation.
Statistics on the number of calls and truncations are kept in n_calls and n_truncations, respectively.
ATTRIBUTE | DESCRIPTION |
---|---|
n_calls | Number of calls to the policy. |
n_truncations | Number of truncations made by the policy. |
Todo
Because the policy objects are copied to the workers, the statistics are not accessible from the coordinating process. We need to add methods for this.
Source code in src/pydvl/value/shapley/truncated.py
reset (abstractmethod)
__call__
Check whether the computation should be interrupted.
PARAMETER | DESCRIPTION |
---|---|
idx | Position in the permutation currently being computed. |
score | Last utility computed. |

RETURNS | DESCRIPTION |
---|---|
bool | True if the computation should be interrupted. |
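The interface described above can be sketched in a few lines. This is a minimal, self-contained stand-in and not pyDVL's actual code: the names SketchPolicy, _check and NeverTruncate are hypothetical and chosen only for illustration. It shows one way a call could update n_calls and n_truncations while delegating the decision to a subclass.

```python
from abc import ABC, abstractmethod


class SketchPolicy(ABC):
    """Hypothetical stand-in for a truncation policy (not pyDVL's class)."""

    def __init__(self) -> None:
        self.n_calls = 0
        self.n_truncations = 0

    @abstractmethod
    def _check(self, idx: int, score: float) -> bool:
        """Return True if the permutation should be interrupted."""

    @abstractmethod
    def reset(self) -> None:
        """Prepare the policy for a new permutation."""

    def __call__(self, idx: int, score: float) -> bool:
        # Delegate the decision, then update the statistics.
        interrupt = self._check(idx, score)
        self.n_calls += 1
        self.n_truncations += 1 if interrupt else 0
        return interrupt


class NeverTruncate(SketchPolicy):
    """Analogue of a policy which never interrupts the computation."""

    def _check(self, idx: int, score: float) -> bool:
        return False

    def reset(self) -> None:
        pass  # nothing to reset: the sketch keeps no per-permutation state


policy = NeverTruncate()
results = [policy(idx, score=0.5) for idx in range(3)]
```

After the three calls, policy.n_calls is 3 and policy.n_truncations is 0, because this policy never asks for an interruption.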
NoTruncation
Bases: TruncationPolicy
A policy which never interrupts the computation.
__call__
Check whether the computation should be interrupted.
PARAMETER | DESCRIPTION |
---|---|
idx | Position in the permutation currently being computed. |
score | Last utility computed. |

RETURNS | DESCRIPTION |
---|---|
bool | True if the computation should be interrupted. |
FixedTruncation
Bases: TruncationPolicy
Break a permutation after computing a fixed number of marginals.
The experiments in Appendix B of (Ghorbani and Zou, 2019) [1] show that when the training set is large enough, one can simply truncate the iteration over each permutation after a fixed number of steps. This works because, beyond a certain number of samples in the training set, the model becomes insensitive to new ones. However, that number depends strongly on the data distribution and the model, and there is no automatic way of estimating it.
PARAMETER | DESCRIPTION |
---|---|
u | Utility object with model, data, and scoring function. |
fraction | Fraction of marginals in a permutation to compute before stopping (e.g. 0.5 to compute half of the marginals). |
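To make the role of fraction concrete, here is a minimal sketch of a fixed-count stopping rule. It does not use pyDVL: the function should_truncate and the variable n_points (the number of points in a permutation) are assumptions introduced for illustration, and the loop only counts marginals instead of computing them.

```python
def should_truncate(idx: int, n_points: int, fraction: float) -> bool:
    """Hypothetical rule: stop once `fraction` of the marginals are done.

    `idx` is the zero-based position in the permutation, so `idx + 1`
    marginals have been computed when this is called.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return idx + 1 >= n_points * fraction


n_points = 10
computed = 0
for idx in range(n_points):
    computed += 1  # stands in for computing the marginal at position idx
    if should_truncate(idx, n_points, fraction=0.5):
        break
```

With 10 points and fraction=0.5, the loop stops after 5 marginals. The rule ignores the score entirely, which is what distinguishes a fixed policy from the tolerance-based ones.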
__call__
Check whether the computation should be interrupted.
PARAMETER | DESCRIPTION |
---|---|
idx | Position in the permutation currently being computed. |
score | Last utility computed. |

RETURNS | DESCRIPTION |
---|---|
bool | True if the computation should be interrupted. |
RelativeTruncation
Bases: TruncationPolicy
Break a permutation if the marginal utility is too low.
This is called "performance tolerance" in (Ghorbani and Zou, 2019) [1].
PARAMETER | DESCRIPTION |
---|---|
u | Utility object with model, data, and scoring function. |
rtol | Relative tolerance. The permutation is broken if the last computed utility is less than a threshold set by this tolerance. |
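The exact threshold is not spelled out in the text above, so the following is only a sketch under a stated assumption: it follows the "performance tolerance" idea of stopping once the last computed utility is within a relative tolerance of the utility of the full training set. The function within_tolerance and the sample scores are hypothetical and do not come from pyDVL.

```python
def within_tolerance(score: float, total_utility: float, rtol: float) -> bool:
    """Assumed criterion: the score is within rtol * |total| of the total."""
    return abs(total_utility - score) <= rtol * abs(total_utility)


total_utility = 0.9  # hypothetical utility of the full training set
scores = [0.5, 0.8, 0.88, 0.895]  # hypothetical utilities along a permutation

stop_at = None
for idx, score in enumerate(scores):
    if within_tolerance(score, total_utility, rtol=0.05):
        stop_at = idx  # the remaining marginals would be skipped
        break
```

With these numbers the tolerance is 0.045, so the first two scores are too far from 0.9 and the permutation is cut at position 2.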
__call__
Check whether the computation should be interrupted.
PARAMETER | DESCRIPTION |
---|---|
idx | Position in the permutation currently being computed. |
score | Last utility computed. |

RETURNS | DESCRIPTION |
---|---|
bool | True if the computation should be interrupted. |
BootstrapTruncation
Bases: TruncationPolicy
Break a permutation if the last computed utility is close to the total utility, measured as a multiple of the standard deviation of the utilities.
PARAMETER | DESCRIPTION |
---|---|
u | Utility object with model, data, and scoring function. |
n_samples | Number of bootstrap samples to use to compute the variance of the utilities. |
sigmas | Number of standard deviations to use as a threshold. |
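The criterion can be illustrated with a small sketch. It is not pyDVL's implementation: the function close_to_total is hypothetical, and the list bootstrap_utilities stands in for utility values that would be obtained from n_samples bootstrap samples. How those samples are drawn is left out on purpose, since the text above does not specify it.

```python
import statistics
from typing import Sequence


def close_to_total(
    score: float,
    total_utility: float,
    bootstrap_utilities: Sequence[float],
    sigmas: float,
) -> bool:
    """Assumed criterion: |score - total| <= sigmas * std of the utilities."""
    std = statistics.stdev(bootstrap_utilities)  # sample standard deviation
    return abs(score - total_utility) <= sigmas * std


# Hypothetical utilities from four bootstrap samples.
bootstrap_utilities = [0.88, 0.90, 0.92, 0.90]
total_utility = 0.90

far = close_to_total(0.85, total_utility, bootstrap_utilities, sigmas=2.0)
near = close_to_total(0.88, total_utility, bootstrap_utilities, sigmas=2.0)
```

The sample standard deviation here is about 0.016, so with sigmas=2.0 a score of 0.85 is still too far from the total utility, while 0.88 is close enough to stop.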
__call__
Check whether the computation should be interrupted.
PARAMETER | DESCRIPTION |
---|---|
idx | Position in the permutation currently being computed. |
score | Last utility computed. |

RETURNS | DESCRIPTION |
---|---|
bool | True if the computation should be interrupted. |