Stopping
Stopping criteria for value computations.
This module provides a basic set of stopping criteria, like MaxUpdates, MaxTime, or HistoryDeviation among others. These can behave in different ways depending on the context. For example, MaxUpdates limits the number of updates to values, which depending on the algorithm may mean a different number of utility evaluations or imply other computations like solving a linear or quadratic program.
Creating stopping criteria¶
The easiest way is to declare a function implementing the interface StoppingCriterionCallable and wrap it with make_criterion(). This creates a StoppingCriterion object that can be composed with other stopping criteria.
Alternatively, and in particular if reporting of completion is required, one can inherit from this class and implement the abstract methods [_check][pydvl.value.stopping.StoppingCriterion._check] and completion.
Composing stopping criteria¶
Objects of type StoppingCriterion can
be composed with the binary operators &
(and), and |
(or), following the
truth tables of Status. The unary operator ~
(not) is also supported. See
StoppingCriterion for details on how
these operations affect the behavior of the stopping criteria.
References¶
-
Ghorbani, A., Zou, J., 2019. Data Shapley: Equitable Valuation of Data for Machine Learning. In: Proceedings of the 36th International Conference on Machine Learning, PMLR, pp. 2242–2251. ↩
StoppingCriterion(modify_result=True)
¶
Bases: ABC
A composable callable object to determine whether a computation must stop.
A StoppingCriterion
is a callable taking a
ValuationResult and returning a
Status. It also keeps track of individual
convergence of values with
converged, and reports
the overall completion of the computation with
completion.
Instances of StoppingCriterion
can be composed with the binary operators
&
(and), and |
(or), following the truth tables of
Status. The unary operator ~
(not) is
also supported. These boolean operations act according to the following
rules:
- The results of [_check][pydvl.value.stopping.StoppingCriterion._check] are combined with the operator. See Status for the truth tables.
- The results of converged are combined with the operator (returning another boolean array).
- The completion method returns the min, max, or the complement to 1 of the completions of the operands, for AND, OR and NOT respectively. This is required for cases where one of the criteria does not keep track of the convergence of single values, e.g. MaxUpdates, because completion by default returns the mean of the boolean convergence array.
Subclassing¶
Subclassing this class requires implementing a [_check][pydvl.value.stopping.StoppingCriterion._check] method that returns a Status object based on a given ValuationResult. This method should update the attribute [_converged][pydvl.value.stopping.StoppingCriterion._converged], which is a boolean array indicating whether the value for each index has converged. When this does not make sense for a particular stopping criterion, completion should be overridden to provide an overall completion value, since its default implementation attempts to compute the mean of [_converged][pydvl.value.stopping.StoppingCriterion._converged].
PARAMETER | DESCRIPTION |
---|---|
modify_result |
If
TYPE:
|
Source code in src/pydvl/value/stopping.py
converged: NDArray[np.bool_]
property
¶
Returns a boolean array indicating whether the values have converged for each data point.
Inheriting classes must set the _converged
attribute in their
[_check][pydvl.value.stopping.StoppingCriterion._check].
RETURNS | DESCRIPTION |
---|---|
NDArray[numpy.bool_]
|
A boolean array indicating whether the values have converged for |
NDArray[numpy.bool_]
|
each data point. |
completion()
¶
Returns a value between 0 and 1 indicating the completion of the computation.
__call__(result)
¶
Calls [_check][pydvl.value.stopping.StoppingCriterion._check], maybe updating the result.
Source code in src/pydvl/value/stopping.py
AbsoluteStandardError(threshold, fraction=1.0, burn_in=4, modify_result=True)
¶
Bases: StoppingCriterion
Determine convergence based on the standard error of the values.
If \(s_i\) is the standard error for datum \(i\) and \(v_i\) its value, then this criterion returns Converged if \(s_i < \epsilon\) for all \(i\) and a threshold value \(\epsilon \gt 0\).
PARAMETER | DESCRIPTION |
---|---|
threshold |
A value is considered to have converged if the standard error is below this value. A way of choosing it is to pick some percentage of the range of the values. For Shapley values this is the difference between the maximum and minimum of the utility function (to see this substitute the maximum and minimum values of the utility into the marginal contribution formula).
TYPE:
|
fraction |
The fraction of values that must have converged for the criterion to return Converged.
TYPE:
|
burn_in |
The number of iterations to ignore before checking for convergence. This is required because computations typically start with zero variance, as a result of using empty(). The default is set to an arbitrary minimum which is usually enough but may need to be increased.
TYPE:
|
Source code in src/pydvl/value/stopping.py
MaxChecks(n_checks, modify_result=True)
¶
Bases: StoppingCriterion
Terminate as soon as the number of checks exceeds the threshold.
A "check" is one call to the criterion.
PARAMETER | DESCRIPTION |
---|---|
n_checks |
Threshold: if |
Source code in src/pydvl/value/stopping.py
MaxUpdates(n_updates, modify_result=True)
¶
Bases: StoppingCriterion
Terminate if any number of value updates exceeds or equals the given threshold.
Note
If you want to ensure that all values have been updated, you probably want MinUpdates instead.
This checks the counts
field of a
ValuationResult, i.e. the number of
times that each index has been updated. For powerset samplers, the maximum
of this number coincides with the maximum number of subsets sampled. For
permutation samplers, it coincides with the number of permutations sampled.
PARAMETER | DESCRIPTION |
---|---|
n_updates |
Threshold: if |
Source code in src/pydvl/value/stopping.py
MinUpdates(n_updates, modify_result=True)
¶
Bases: StoppingCriterion
Terminate as soon as all value updates exceed or equal the given threshold.
This checks the counts
field of a
ValuationResult, i.e. the number of times that
each index has been updated. For powerset samplers, the minimum of this
number is a lower bound for the number of subsets sampled. For
permutation samplers, it lower-bounds the amount of permutations sampled.
PARAMETER | DESCRIPTION |
---|---|
n_updates |
Threshold: if |
Source code in src/pydvl/value/stopping.py
MaxTime(seconds, modify_result=True)
¶
Bases: StoppingCriterion
Terminate if the computation time exceeds the given number of seconds.
Checks the elapsed time since construction
PARAMETER | DESCRIPTION |
---|---|
seconds |
Threshold: The computation is terminated if the elapsed time
between object construction and a _check exceeds this value. If |
Source code in src/pydvl/value/stopping.py
HistoryDeviation(n_steps, rtol, pin_converged=True, modify_result=True)
¶
Bases: StoppingCriterion
A simple check for relative distance to a previous step in the computation.
The method used by (Ghorbani and Zou, 2019)1 computes the relative distances between the current values \(v_i^t\) and the values at the previous checkpoint \(v_i^{t-\tau}\). If the sum is below a given threshold, the computation is terminated.
When the denominator is zero, the summand is set to the value of \(v_i^{ t-\tau}\).
This implementation is slightly generalised to allow for different number of updates to individual indices, as happens with powerset samplers instead of permutations. Every subset of indices that is found to converge can be pinned to that state. Once all indices have converged the method has converged.
Warning
This criterion is meant for the reproduction of the results in the paper, but we do not recommend using it in practice.
PARAMETER | DESCRIPTION |
---|---|
n_steps |
Checkpoint values every so many updates and use these saved values to compare.
TYPE:
|
rtol |
Relative tolerance for convergence (\(\epsilon\) in the formula).
TYPE:
|
pin_converged |
If
TYPE:
|
Source code in src/pydvl/value/stopping.py
make_criterion(fun, converged=None, completion=None, name=None)
¶
Create a new StoppingCriterion from a function. Use this to enable simpler functions to be composed with bitwise operators
PARAMETER | DESCRIPTION |
---|---|
fun |
The callable to wrap. |
converged |
A callable that returns a boolean array indicating what values have converged.
TYPE:
|
completion |
A callable that returns a value between 0 and 1 indicating the rate of completion of the computation. If not provided, the fraction of converged values is used. |
name |
The name of the new criterion. If
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Type[StoppingCriterion]
|
A new subclass of StoppingCriterion. |
Source code in src/pydvl/value/stopping.py
Created: 2023-09-02