pydvl.valuation.methods.semivalue
This module contains the base class for all semi-value valuation methods.
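A semi-value is any valuation function of the form:
\[
v_\text{semi}(i) = \sum_{k=0}^{n-1} w(k) \sum_{\substack{S \subseteq D \setminus \{i\} \\ |S| = k}} \left[ U(S \cup \{i\}) - U(S) \right],
\]
where \(D\) is the set of \(n\) data points, \(U\) is the utility, and the coefficients \(w(k)\) satisfy the property:
\[
\sum_{k=0}^{n-1} \binom{n-1}{k} w(k) = 1.
\]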
This is the largest class of marginal-contribution-based valuation methods. These compute the value of a data point by evaluating the change in utility when the data point is removed from one or more subsets of the data.
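To make the definition concrete, here is a minimal brute-force sketch. It assumes that a semi-value is a weighted sum, over subset sizes \(k\), of the marginal contributions \(U(S \cup \{i\}) - U(S)\) for all subsets \(S\) of size \(k\) that do not contain \(i\). The Shapley coefficient is used only as a familiar example, and every name below (`semivalue`, `shapley_coefficient`, `toy_utility`) is hypothetical and not part of the library.

```python
from itertools import combinations
from math import comb


def shapley_coefficient(n: int, k: int) -> float:
    """Shapley weight for subsets of size k, with the 1/n factor folded in."""
    return 1.0 / (n * comb(n - 1, k))


def semivalue(i, indices, utility, coefficient):
    """Exact semi-value of point i, enumerating every subset that excludes i."""
    n = len(indices)
    others = [j for j in indices if j != i]
    total = 0.0
    for k in range(n):  # subset sizes 0 .. n-1
        w = coefficient(n, k)
        for subset in combinations(others, k):
            s = frozenset(subset)
            # Marginal contribution of i to the subset s
            total += w * (utility(s | {i}) - utility(s))
    return total


def toy_utility(subset) -> float:
    """Hypothetical additive utility: each point contributes a fixed weight."""
    weights = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(weights[j] for j in subset)


indices = [0, 1, 2]
values = [semivalue(i, indices, toy_utility, shapley_coefficient) for i in indices]
print(values)
```

With an additive utility every marginal contribution of a point equals its own weight, so each computed value should match that weight up to rounding. The enumeration is exponential in the number of points, which is why practical methods rely on sampling.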
SemivalueValuation
SemivalueValuation(
    utility: UtilityBase,
    sampler: IndexSampler,
    is_done: StoppingCriterion,
    skip_converged: bool = False,
    show_warnings: bool = True,
    progress: dict[str, Any] | bool = False,
)
Bases: Valuation
Abstract class to define semi-values.
Implementations only need to provide the log_coefficient() method, which returns the semi-value coefficient in log-space.
Note
For implementation consistency, we slightly depart from the common definition of semi-values, which includes a factor \(1/n\) in the sum over subsets. Instead, we subsume this factor into the coefficient \(w(k)\).
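As an illustration of this convention, take the Shapley value (chosen here only as a familiar example; it is not named in this section). With \(D\) the set of \(n\) data points, the common form keeps the factor \(1/n\) outside the sum:

\[
v_\text{shap}(i) = \frac{1}{n} \sum_{k=0}^{n-1} \binom{n-1}{k}^{-1} \sum_{\substack{S \subseteq D \setminus \{i\} \\ |S| = k}} \left[ U(S \cup \{i\}) - U(S) \right].
\]

Subsuming the factor into the coefficient would give

\[
w(k) = \frac{1}{n} \binom{n-1}{k}^{-1},
\]

so that the value is written as a single weighted sum over subset sizes with no separate prefactor.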
PARAMETER | DESCRIPTION |
---|---|
utility | Object to compute utilities. TYPE: UtilityBase |
sampler | Sampling scheme to use. TYPE: IndexSampler |
is_done | Stopping criterion to use. TYPE: StoppingCriterion |
skip_converged | Whether to skip converged indices, as determined by the stopping criterion. TYPE: bool DEFAULT: False |
show_warnings | Whether to show warnings. TYPE: bool DEFAULT: True |
progress | Whether to show a progress bar. If a dictionary, it is passed on to the progress bar as configuration. TYPE: dict[str, Any] \| bool DEFAULT: False |
Source code in src/pydvl/valuation/methods/semivalue.py
values
values(sort: bool = False) -> ValuationResult
Returns a copy of the valuation result.
The valuation must have been run with fit() before calling this method.
PARAMETER | DESCRIPTION |
---|---|
sort | Whether to sort the valuation result by value before returning it. TYPE: bool DEFAULT: False |
Returns: The result of the valuation.
Source code in src/pydvl/valuation/base.py
log_coefficient
abstractmethod
The semi-value coefficient in log-space.
The semi-value coefficient is a function of the number of elements in the set, and the size of the subset for which the coefficient is being computed. Because both coefficients and sampler weights can be very large or very small, we perform all computations in log-space to avoid numerical issues.
PARAMETER | DESCRIPTION |
---|---|
n | Total number of elements in the set. |
k | Size of the subset for which the coefficient is being computed. |
RETURNS | DESCRIPTION |
---|---|
float | The natural logarithm of the semi-value coefficient. |
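The following sketch illustrates why coefficients are handled in log-space. It is not the library's implementation: the functions are standalone and hypothetical, and the Shapley and Banzhaf weights are assumed here purely as examples of coefficients that depend on the set size `n` and the subset size `k`.

```python
import math


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def shapley_log_coefficient(n: int, k: int) -> float:
    """log w(k) for the Shapley weight w(k) = 1 / (n * C(n-1, k))."""
    return -math.log(n) - log_binom(n - 1, k)


def banzhaf_log_coefficient(n: int, k: int) -> float:
    """log w(k) for the Banzhaf weight w(k) = 1 / 2^(n-1), independent of k."""
    return -(n - 1) * math.log(2)


n, k = 5000, 2500
log_w = shapley_log_coefficient(n, k)
# log_w is an ordinary finite float, whereas exponentiating it underflows
# to 0.0 in double precision.
print(log_w, math.exp(log_w))
```

Working with `log_w` directly lets a coefficient be combined with other log-space quantities, such as sampler weights, by addition, and only the final combined quantity needs to be exponentiated.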