Truncated
References
- Ghorbani, A., Zou, J., 2019. Data Shapley: Equitable Valuation of Data for Machine Learning. In: Proceedings of the 36th International Conference on Machine Learning, PMLR, pp. 2242–2251.
TruncationPolicy()
Bases: ABC
A policy for deciding whether to stop computing marginals in a permutation.
Statistics on the number of calls and truncations are kept as n_calls and n_truncations, respectively.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| n_calls | Number of calls to the policy. |
| n_truncations | Number of truncations made by the policy. |
Todo
Because the policy objects are copied to the workers, the statistics are not accessible from the coordinating process. We need to add methods for this.
Source code in src/pydvl/value/shapley/truncated.py
reset()
abstractmethod
__call__(idx, score)
Check whether the computation should be interrupted.
| PARAMETER | DESCRIPTION |
|---|---|
| idx | Position in the permutation currently being computed. |
| score | Last utility computed. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the computation should be interrupted. |
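The interface described above can be illustrated with a minimal, self-contained sketch. It is not the library's source: the class names `SketchTruncationPolicy` and `NeverTruncate` and the hook `_check` are hypothetical names chosen for illustration. The sketch only shows how a callable policy might count calls and truncations while delegating the actual decision to subclasses.

```python
from abc import ABC, abstractmethod


class SketchTruncationPolicy(ABC):
    """Hypothetical stand-in for a truncation policy (illustration only)."""

    def __init__(self) -> None:
        self.n_calls = 0        # how many times the policy was consulted
        self.n_truncations = 0  # how many times it asked to stop

    @abstractmethod
    def _check(self, idx: int, score: float) -> bool:
        """Assumed hook: return True to cut the permutation at position idx."""

    @abstractmethod
    def reset(self) -> None:
        """Clear any per-permutation state before a new permutation starts."""

    def __call__(self, idx: int, score: float) -> bool:
        # Every call is counted; truncations are counted only when the check fires.
        self.n_calls += 1
        should_stop = self._check(idx, score)
        self.n_truncations += int(should_stop)
        return should_stop


class NeverTruncate(SketchTruncationPolicy):
    """Trivial policy that never interrupts a permutation."""

    def _check(self, idx: int, score: float) -> bool:
        return False

    def reset(self) -> None:
        pass


policy = NeverTruncate()
results = [policy(i, 0.5) for i in range(4)]
```

After the four calls, `policy.n_calls` is 4 and `policy.n_truncations` is 0. Note that counters kept this way live on the object itself, which is why copies sent to workers do not update the original.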
NoTruncation
FixedTruncation(u, fraction)
Bases: TruncationPolicy
Break a permutation after computing a fixed number of marginals.
The experiments in Appendix B of (Ghorbani and Zou, 2019) show that, when the training set is large enough, one can simply truncate the iteration over a permutation after a fixed number of steps. This works because, beyond a certain number of samples in the training set, the model becomes insensitive to new ones. However, that number depends strongly on the data distribution and the model, and there is no automatic way of estimating it.
| PARAMETER | DESCRIPTION |
|---|---|
| u | Utility object with model, data, and scoring function |
| fraction | Fraction of marginals in a permutation to compute before stopping (e.g. 0.5 to compute half of the marginals). |
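As a rough illustration of what a fixed cutoff means for one permutation, the sketch below converts a fraction into a number of marginals and stops iterating once that number is reached. The helper names `fixed_cutoff` and `positions_computed` are hypothetical, and rounding down with `int()` is an assumption; the library may count or round differently.

```python
def fixed_cutoff(n_points: int, fraction: float) -> int:
    """Number of marginals to compute per permutation (assumed: rounded down)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(n_points * fraction)


def positions_computed(n_points: int, fraction: float) -> list:
    """Positions in a permutation whose marginals are computed before stopping."""
    cutoff = fixed_cutoff(n_points, fraction)
    computed = []
    for idx in range(n_points):
        if idx >= cutoff:
            # The remaining positions are skipped for this permutation.
            break
        computed.append(idx)
    return computed


half = positions_computed(10, 0.5)
everything = positions_computed(7, 1.0)
```

With 10 data points and `fraction=0.5`, only the first five positions of each permutation are evaluated.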
RelativeTruncation(u, rtol)
Bases: TruncationPolicy
Break a permutation if the marginal utility is too low.
This is called "performance tolerance" in (Ghorbani and Zou, 2019).
| PARAMETER | DESCRIPTION |
|---|---|
| u | Utility object with model, data, and scoring function |
| rtol | Relative tolerance. The permutation is broken if the last computed utility is less than … |
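The "performance tolerance" of Ghorbani and Zou stops a permutation once the score of the growing subset is close enough to the score on the full training set. The sketch below shows one way to express that check with a relative tolerance. The function name `within_relative_tolerance` is hypothetical, and scaling the tolerance by the total utility is an assumption; the exact threshold used by this class is not stated in this section.

```python
def within_relative_tolerance(score: float, total_utility: float, rtol: float) -> bool:
    """Assumed check: True when score is within rtol * |total_utility| of the total."""
    return abs(total_utility - score) <= rtol * abs(total_utility)


# Utilities observed while adding points along one permutation (made-up numbers).
scores = [0.50, 0.80, 0.86, 0.89]
total_utility = 0.90

# First position at which the permutation would be cut with rtol = 0.05.
stop_at = next(
    i for i, s in enumerate(scores)
    if within_relative_tolerance(s, total_utility, rtol=0.05)
)
```

Here the threshold is 0.045, so the first two scores are too far from 0.90 and the permutation is cut at position 2.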
BootstrapTruncation(u, n_samples, sigmas=1)
Bases: TruncationPolicy
Break a permutation if the last computed utility is close to the total utility, measured as a multiple of the standard deviation of the utilities.
| PARAMETER | DESCRIPTION |
|---|---|
| u | Utility object with model, data, and scoring function |
| n_samples | Number of bootstrap samples to use to compute the variance of the utilities. |
| sigmas | Number of standard deviations to use as a threshold. |
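To make the "multiple of the standard deviation" criterion concrete, here is a simplified sketch. It does not perform a real bootstrap: it swaps in a running (Welford) estimate of the variance of the scores seen so far, waits until `n_samples` scores have been observed, and then compares the distance to the total utility against `sigmas` standard deviations. The class name `RunningSpreadCheck` and this simplification are assumptions made for illustration.

```python
import math


class RunningSpreadCheck:
    """Hypothetical check: stop when the score is within sigmas * std of the total."""

    def __init__(self, total_utility: float, n_samples: int, sigmas: float = 1.0):
        self.total_utility = total_utility
        self.n_samples = n_samples
        self.sigmas = sigmas
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def __call__(self, score: float) -> bool:
        # Welford's online update of the mean and the sum of squared deviations.
        self.count += 1
        delta = score - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (score - self.mean)
        if self.count < self.n_samples:
            return False  # not enough scores yet to trust the variance estimate
        std = math.sqrt(self._m2 / self.count)
        return abs(score - self.total_utility) < self.sigmas * std


check = RunningSpreadCheck(total_utility=0.9, n_samples=3, sigmas=1.0)
decisions = [check(s) for s in [0.2, 0.6, 0.88]]
```

The first two calls return `False` because fewer than `n_samples` scores have been seen; the third returns `True` because 0.88 lies well within one standard deviation of 0.9.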
truncated_montecarlo_shapley(u, *, done, truncation, config=ParallelConfig(), n_jobs=1, coordinator_update_period=10, worker_update_period=5)
Warning
This method is deprecated and is only a wrapper for permutation_montecarlo_shapley.
Todo
Think of how to add Gelman-Rubin or some other more principled stopping criterion.
| PARAMETER | DESCRIPTION |
|---|---|
| u | Utility object with model, data, and scoring function |
| done | Check on the results which decides when to stop sampling permutations. |
| truncation | Callable that decides whether to stop computing marginals for a given permutation. |
| config | Object configuring parallel computation, with cluster address, number of CPUs, etc. |
| n_jobs | Number of permutation Monte Carlo jobs to run concurrently. |
Returns: Object with the data values.
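The interplay between permutation sampling and a truncation callable can be sketched in a few lines of plain Python. This is a toy, single-process illustration and not the library's implementation: the function `sketch_truncated_shapley`, its arguments, and the use of a fixed `n_permutations` in place of a `done` stopping criterion are all assumptions. After a truncation, the remaining points of that permutation receive a marginal of zero, as in the truncated Monte Carlo scheme of Ghorbani and Zou.

```python
import random


def sketch_truncated_shapley(indices, utility, should_truncate, n_permutations, seed=0):
    """Toy truncated permutation Monte Carlo estimate of Shapley values."""
    rng = random.Random(seed)
    totals = {i: 0.0 for i in indices}
    for _ in range(n_permutations):
        permutation = list(indices)
        rng.shuffle(permutation)
        subset = frozenset()
        prev_score = utility(subset)
        truncated = False
        for position, i in enumerate(permutation):
            if truncated:
                marginal = 0.0  # skipped: assume no further contribution
            else:
                subset = subset | {i}
                score = utility(subset)
                marginal = score - prev_score
                prev_score = score
                # Ask the policy whether to stop evaluating this permutation.
                truncated = should_truncate(position, score)
            totals[i] += marginal
    return {i: total / n_permutations for i, total in totals.items()}


# Additive toy utility: each point's exact Shapley value equals its weight.
weights = {0: 1.0, 1: 2.0, 2: 3.0}


def additive_utility(subset):
    return sum(weights[i] for i in subset)


values = sketch_truncated_shapley(
    [0, 1, 2], additive_utility, lambda position, score: False, n_permutations=20
)
```

With a policy that never truncates and an additive utility, every marginal equals the point's weight, so the estimate recovers the weights exactly. Swapping in a policy that returns `True` early trades accuracy for fewer utility evaluations.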
Created: 2023-09-02