pydvl.valuation.stopping
¶
Stopping criteria for value computations.
This module provides a basic set of stopping criteria to be used with valuation algorithms, in particular all semi-values. Common examples are MinUpdates, MaxTime, or HistoryDeviation.
Stopping criteria can behave in different ways depending on the context. For example, MaxUpdates limits the number of updates to values, which depending on the algorithm may mean a different number of utility evaluations or imply other computations like solving a linear or quadratic program. In the case of SemivalueValuation, the criteria are evaluated once per batch, which might lead to different behavior depending on the batch size (e.g. for certain batch_sizes it might happen that the number of updates to values after convergence is not exactly was required, since multiple updates might happen at once).
Stopping criteria are callables that are evaluated on a ValuationResult and return a Status object. They can be combined using boolean operators.
Combining stopping criteria¶
Objects of type StoppingCriterion can
be combined with the binary operators &
(and), and |
(or), following the
truth tables of Status. The unary operator ~
(not) is also supported. See
StoppingCriterion for details on how
these operations affect the behavior of the stopping criteria.
How convergence is determined¶
Most stopping criteria keep track of the convergence of each index separately but make global decisions based on the overall convergence of some fraction of all indices. For example, if we have a stopping criterion that checks whether the standard error of 90% of values is below a threshold, then methods will keep updating all indices until 90% of them have converged, irrespective of the quality of the individual estimates, and without freezing updates for indices along the way as values individually attain low standard error.
This has some practical implications, because some values do tend to converge
sooner than others. For example, assume we use the criterion
AbsoluteStandardError(0.02) | MaxUpdates(1000)
. Then values close to 0 might
be marked as "converged" rather quickly because they fulfill the first
criterion, say after 20 iterations, despite being poor estimates (see the section on
pitfalls below for commentary on stopping criteria based on standard errors).
Because other indices take longer to reach a low standard error and the criterion is a
global check, the "converged" ones keep being updated and end up being good
estimates. In this case, this has been beneficial, but one might not wish for
converged values to be updated, if one is sure that the criterion is adequate
for individual values.
Semi-value methods include a
parameter skip_converged
that allows to skip the computation of values that have
converged. The way to avoid doing this too early is to use a more stringent
check, e.g. AbsoluteStandardError(1e-3) | MaxUpdates(1000)
. With
skip_converged=True
this check can still take less time than the first one,
despite requiring more iterations for some indices.
Stoppig criterion for finite samplers
Using a finite sampler naturally defines when the valuation algorithm terminates. However, in order to properly report progress, we need to use a stopping criterion that keeps track of the number of iterations. In this case, one can use NoStopping with the sampler as an argument. This quirk is due to progress reported depending on the completion attribute of a criterion. Here's how it's done:
Choosing a stopping criterion¶
The choice of a stopping criterion greatly depends on the algorithm and the
context. A safe bet is to combine a MaxUpdates
or a MaxTime with a
HistoryDeviation or an
RankCorrelation. The former
will ensure that the computation does not run for too long, while the latter
will try to achieve results that are stable enough. Note however that if the
threshold is too strict, one will always end up running until a maximum number
of iterations or time. Also keep in mind that different values converge at
different times, so you might want to use tight thresholds and skip_converged
as described above for semi-values.
Example
from pydvl.valuation import DataBanzhafValuation, MinUpdates, MSRSampler, RankCorrelation
model = ... # Some sklearn-compatible model
scorer = SupervisedScorer("accuracy", test_data, default=0.0)
utility = ModelUtility(model, scorer)
sampler = MSRSampler(seed=seed)
stopping = RankCorrelation(rtol=1e-2, burn_in=32) | MinUpdates(1000)
valuation = DataBanzhafValuation(utility=utility, sampler=sampler, is_done=stopping)
with parallel_config(n_jobs=4):
valuation.fit(training_data)
result = valuation.values()
utility
until either the change in
Spearman rank correlation between updates is below 1e-2
or 1000
updates have
been performed. The burn_in
parameter is used to discard the first 32
updates
from the computation of the standard error.
Warning
Be careful not to reuse the same stopping criterion for different computations. The
object has state, which is reset by fit()
for some valuation methods, but this is
not guaranteed for all methods. If you need to reuse the same criterion, it's
safer to create a new instance.
Interactions with sampling schemes and other pitfalls of stopping criteria¶
Different samplers define different "update strategies" for values. For example,
MSRSampler updates the counts
field
of a ValuationResult only for about half of
the utility evaluations, because it reuses samples. This means that a stopping criterion
like MaxChecks will not work as expected, because
it will count the number of calls to the criterion, not the number of updates to the
values. In this case, one should use MaxUpdates
or, more likely, MinUpdates instead.
Another pitfall is the interaction with the skip_converged
parameter of
Semi-value methods. If this is
set to True
, the stopping criterion should probably be more stringent.
Finally, stopping criteria that rely on the standard error of the values, like AbsoluteStandardError, should be used with care. The standard error is a measure of the uncertainty of the estimate, but it does not guarantee that the estimate is close to the true value. For example, if the utility function is very noisy, the standard error might be very low, but the estimate might be far from the true value. In this case, one might want to use a RankCorrelation instead, which checks whether the rank of the values is stable.
Creating stopping criteria¶
In order to create a new stopping criterion, one can subclass
StoppingCriterion and implement the
_check
method. This method should return a Status and
update the _converged
attribute, which is a boolean array indicating whether the
value for each index has converged. When this does not make sense for a particular
stopping criterion, completion
should be overridden to provide an overall completion value, since its default
implementation attempts to compute the mean of _converged
.
References¶
-
Ghorbani, A., Zou, J., 2019. Data Shapley: Equitable Valuation of Data for Machine Learning. In: Proceedings of the 36th International Conference on Machine Learning, PMLR, pp. 2242–2251. ↩
StoppingCriterion
¶
StoppingCriterion(modify_result: bool = True)
Bases: ABC
A composable callable object to determine whether a computation must stop.
A StoppingCriterion
is a callable taking a
ValuationResult and returning a
Status. It also keeps track of individual
convergence of values with
converged, and reports
the overall completion of the computation with
completion.
Instances of StoppingCriterion
can be composed with the binary operators
&
(and), and |
(or), following the truth tables of
Status. The unary operator ~
(not) is
also supported. These boolean operations act according to the following
rules:
- The results of
check()
are combined with the operator. See Status for the truth tables. - The results of converged are combined with the operator (returning another boolean array).
- The completion method returns the min, max, or the complement to 1 of the completions of the operands, for AND, OR and NOT respectively. This is required for cases where one of the criteria does not keep track of the convergence of single values, e.g. MaxUpdates, because completion by default returns the mean of the boolean convergence array.
Subclassing¶
Subclassing this class requires implementing a check()
method that
returns a Status object based on a given
ValuationResult. This method should
update the attribute _converged
, which is a boolean array indicating
whether the value for each index has converged.
When this does not make sense for a particular stopping criterion,
completion should be
overridden to provide an overall completion value, since its default
implementation attempts to compute the mean of _converged
.
PARAMETER | DESCRIPTION |
---|---|
modify_result
|
If
TYPE:
|
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
completion
¶
completion() -> float
Returns a value between 0 and 1 indicating the completion of the computation.
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.
Source code in src/pydvl/valuation/stopping.py
AbsoluteStandardError
¶
AbsoluteStandardError(
threshold: float,
fraction: float = 1.0,
burn_in: int = 4,
modify_result: bool = True,
)
Bases: StoppingCriterion
Determine convergence based on the standard error of the values.
If \(s_i\) is the standard error for datum \(i\), then this criterion returns Converged if \(s_i < \epsilon\) for all \(i\) and a threshold value \(\epsilon \gt 0\).
Warning
This criterion should be used with care. The standard error is a measure of the uncertainty of the estimate, but it does not guarantee that the estimate is close to the true value. For example, if the utility function is very noisy, the standard error might be very low, but the estimate might be far from the true value. In this case, one might want to use a RankCorrelation instead, which checks whether the rank of the values is stable.
PARAMETER | DESCRIPTION |
---|---|
threshold
|
A value is considered to have converged if the standard error is below this threshold. A way of choosing it is to pick some percentage of the range of the values. For Shapley values this is the difference between the maximum and minimum of the utility function (to see this substitute the maximum and minimum values of the utility into the marginal contribution formula).
TYPE:
|
fraction
|
The fraction of values that must have converged for the criterion to return Converged.
TYPE:
|
burn_in
|
The number of iterations to ignore before checking for convergence. This is required because computations typically start with zero variance, as a result of using zeros(). The default is set to an arbitrary minimum which is usually enough but may need to be increased.
TYPE:
|
modify_result
|
If
TYPE:
|
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
completion
¶
completion() -> float
Returns a value between 0 and 1 indicating the completion of the computation.
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.
Source code in src/pydvl/valuation/stopping.py
MaxChecks
¶
Bases: StoppingCriterion
Terminate as soon as the number of checks exceeds the threshold.
A "check" is one call to the criterion. Note that this might have different
interpretations depending on the sampler. For example,
MSRSampler performs a single
utility evaluation to update all indices, so that's len(training_data)
checks for
a single training of the model. But it also only changes the counts
field of the
ValuationResult for about half of the
indices, which is what e.g. MaxUpdates checks.
PARAMETER | DESCRIPTION |
---|---|
n_checks
|
Threshold: if
TYPE:
|
modify_result
|
If
TYPE:
|
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.
Source code in src/pydvl/valuation/stopping.py
MaxUpdates
¶
Bases: StoppingCriterion
Terminate if any number of value updates exceeds or equals the given threshold.
Note
If you want to ensure that all values have been updated, you probably want MinUpdates instead.
This checks the counts
field of a
ValuationResult, i.e. the number of
times that each index has been updated. For powerset samplers, the maximum
of this number coincides with the maximum number of subsets sampled. For
permutation samplers, it coincides with the number of permutations sampled.
PARAMETER | DESCRIPTION |
---|---|
n_updates
|
Threshold: if
TYPE:
|
modify_result
|
If
TYPE:
|
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.
Source code in src/pydvl/valuation/stopping.py
NoStopping
¶
NoStopping(sampler: IndexSampler | None = None, modify_result: bool = True)
Bases: StoppingCriterion
Keep running forever or until sampling stops.
If a sampler instance is passed, and it is a finite sampler, its counter will be used to update completion status.
PARAMETER | DESCRIPTION |
---|---|
sampler
|
A sampler instance to use for completion status.
TYPE:
|
modify_result
|
If
TYPE:
|
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.
Source code in src/pydvl/valuation/stopping.py
MinUpdates
¶
Bases: StoppingCriterion
Terminate as soon as all value updates exceed or equal the given threshold.
This checks the counts
field of a
ValuationResult, i.e. the number of times
that each index has been updated. For powerset samplers, the minimum of this number
is a lower bound for the number of subsets sampled. For permutation samplers, it
lower-bounds the amount of permutations sampled.
PARAMETER | DESCRIPTION |
---|---|
n_updates
|
Threshold: if
TYPE:
|
modify_result
|
If
TYPE:
|
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.
Source code in src/pydvl/valuation/stopping.py
MaxTime
¶
Bases: StoppingCriterion
Terminate if the computation time exceeds the given number of seconds.
Checks the elapsed time since construction.
PARAMETER | DESCRIPTION |
---|---|
seconds
|
Threshold: The computation is terminated if the elapsed time
between object construction and a _check exceeds this value. If
TYPE:
|
modify_result
|
If
TYPE:
|
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.
Source code in src/pydvl/valuation/stopping.py
RollingMemory
¶
Bases: Generic[DT]
A simple rolling memory for the last n_steps
values of each index.
Updating the memory results in new values are copied to the last column of the matrix and old ones removed from the first.
PARAMETER | DESCRIPTION |
---|---|
n_steps
|
The number of steps to remember.
TYPE:
|
default
|
The default value to use when the memory is empty. |
Source code in src/pydvl/valuation/stopping.py
update
¶
update(values: NDArray[DT]) -> Self
Update the memory with the values of the current result.
The values are appended to the memory as its last column, and the oldest values (the first column) are removed
Source code in src/pydvl/valuation/stopping.py
HistoryDeviation
¶
HistoryDeviation(
n_steps: int,
rtol: float,
pin_converged: bool = True,
modify_result: bool = True,
)
Bases: StoppingCriterion
A simple check for relative distance to a previous step in the computation.
The method used by Ghorbani and Zou, (2019)1 computes the relative distances between the current values \(v_i^t\) and the values at the previous checkpoint \(v_i^{t-\tau}\). If the sum is below a given threshold, the computation is terminated.
When the denominator is zero, the summand is set to the value of \(v_i^{ t-\tau}\).
This implementation is slightly generalised to allow for different number of updates to individual indices, as happens with powerset samplers instead of permutations. Every subset of indices that is found to converge can be pinned to that state. Once all indices have converged the method has converged.
Warning
This criterion is meant for the reproduction of the results in the paper, but we do not recommend using it in practice.
PARAMETER | DESCRIPTION |
---|---|
n_steps
|
Compare values after so many steps. A step is one evaluation of the criterion, which happens once per batch.
TYPE:
|
rtol
|
Relative tolerance for convergence (\(\epsilon\) in the formula).
TYPE:
|
pin_converged
|
If
TYPE:
|
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
completion
¶
completion() -> float
Returns a value between 0 and 1 indicating the completion of the computation.
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.
Source code in src/pydvl/valuation/stopping.py
RankCorrelation
¶
Bases: StoppingCriterion
A check for stability of Spearman correlation between checks.
Convergence is reached when the change in rank correlation between two successive iterations is below a given threshold.
This criterion is used in (Wang et al.)2.
The meaning of successive iterations
Stopping criteria in pyDVL are typically evaluated after each batch of value
updates is received. This can imply very different things, depending on the
configuration of the samplers. For this reason, RankCorrelation
keeps itself
track of the number of updates that each index has seen, and only checks for
correlation changes when a given fraction of all indices has been updated more
than burn_in
times and once since last time the criterion was checked.
PARAMETER | DESCRIPTION |
---|---|
rtol
|
Relative tolerance for convergence (\(\epsilon\) in the formula)
TYPE:
|
burn_in
|
The minimum number of updates an index must have seen before checking for convergence. This is required because the first correlation checks are usually meaningless.
TYPE:
|
fraction
|
The fraction of values that must have been updated between two correlation checks. This is to avoid comparing two results where only one value has been updated, which would have almost perfect rank correlation.
TYPE:
|
modify_result
|
If
TYPE:
|
Added in 0.9.0
Changed in 0.10.0
The behaviour of the burn_in
parameter was changed to look at value updates.
The parameter fraction
was added.
Source code in src/pydvl/valuation/stopping.py
converged
property
¶
__call__
¶
__call__(result: ValuationResult) -> Status
Calls check()
, maybe updating the result.