pydvl.valuation.methods.semivalue
This module contains the base class for all semi-value valuation methods.
A semi-value is any marginal contribution-based valuation method which weights the marginal contributions of a data point \(i\) to the utility of a subset \(S\) by weights \(w(k)\), where \(k\) is the size of the subset, fulfilling certain conditions. For details, please refer to the introduction to semi-values.
Implementing new methods with importance sampling
Semi-values and importance sampling
For a more detailed analysis of the ideas in this and the following section, please read Sampling strategies for semi-values.
Because almost every method employs Monte Carlo sampling of subsets, our architecture allows for importance sampling. Early valuation methods chose samplers to implicitly provide the weights \(w(k)\) as exactly the sampling probabilities of sets \(p(S|k)\), e.g. permutation Shapley.
However, this is not a requirement. In fact, other methods employ different forms of importance sampling to reduce the variance of both the Monte Carlo estimates and the utility function.
For this reason, our implementation allows mixing and matching any semi-value coefficient with any sampler. For importance sampling, the mechanism is as follows:
- Choose a sampler to go with the semi-value. The sampler must implement the log_weight() property, which returns the logarithm of the sampling probability of a subset \(S\) of size \(k\), i.e. \(p(S|k)\). Note that this is not \(p(|S|=k)\). The sampler also implements an EvaluationStrategy, which is used to compute the utility of the sampled subsets in subprocesses.
- Subclass SemivalueValuation and implement the log_coefficient property. The function it returns should compute the final coefficient in log-space, i.e. the natural logarithm of the coefficient, for numerical stability. The coefficient is a function of the number of elements in the set \(n\), the size of the subset \(k\) for which the coefficient is being computed, and the sampler's weight. You can combine the method's coefficient and the weight in any way. For instance, to compensate entirely for the sampling distribution, simply subtract the log-weights from the log-coefficient.
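The two steps above can be sketched in plain Python, without using the pyDVL API. Everything in this block is a hypothetical illustration: the function names, the toy utility, and the choice of a sampler that includes each remaining point independently with probability 1/2 are assumptions, and the sampler weight used here is the probability with which that toy sampler draws a given subset. The sketch subtracts the log-weight from the log-coefficient of the Shapley value, which compensates entirely for the sampling distribution.

```python
import math
import random
from itertools import combinations


def log_shapley_coefficient(n: int, k: int) -> float:
    """Log of w(k) = 1 / (n * C(n-1, k)), with the factor 1/n folded in."""
    return -math.log(n) - math.log(math.comb(n - 1, k))


def log_uniform_weight(n: int, k: int) -> float:
    """Log-probability of drawing one particular subset when each of the
    other n-1 points is included independently with probability 1/2.
    It does not depend on k for this toy sampler."""
    return -(n - 1) * math.log(2)


def utility(subset) -> float:
    """Toy, non-additive utility (an assumption made for the example)."""
    return math.sqrt(sum(subset))


def exact_value(i, points) -> float:
    """Reference value: sum over all subsets of w(|S|) * marginal."""
    n = len(points)
    others = [p for p in points if p != i]
    total = 0.0
    for k in range(n):
        for s in combinations(others, k):
            s = frozenset(s)
            marginal = utility(s | {i}) - utility(s)
            total += math.exp(log_shapley_coefficient(n, k)) * marginal
    return total


def sampled_value(i, points, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate with importance-sampling correction."""
    rng = random.Random(seed)
    n = len(points)
    others = [p for p in points if p != i]
    total = 0.0
    for _ in range(n_samples):
        s = frozenset(p for p in others if rng.random() < 0.5)
        k = len(s)
        # Subtract the log-weight from the log-coefficient, then leave log-space.
        log_correction = log_shapley_coefficient(n, k) - log_uniform_weight(n, k)
        total += math.exp(log_correction) * (utility(s | {i}) - utility(s))
    return total / n_samples
```

With enough samples, sampled_value(i, points, ...) should approach exact_value(i, points), because the expectation of the corrected marginal under the sampling distribution equals the weighted sum over all subsets.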
Disabling importance sampling
If your sampler already provides the coefficients you need implicitly as the sampling probabilities, you can override the log_coefficient property to return None.
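To see why no correction is needed in such a case, consider permutation sampling for the Shapley value. The following is a minimal sketch in plain Python, not pyDVL code: the toy utility and both function names are assumptions made for illustration. It enumerates all permutations instead of sampling them, so the comparison is exact: the plain average of marginal contributions over permutations matches the sum weighted by the explicit Shapley coefficient.

```python
import math
from itertools import combinations, permutations


def utility(subset) -> float:
    """Toy, non-additive utility (an assumption made for the example)."""
    return math.sqrt(sum(subset))


def shapley_by_coefficient(i, points) -> float:
    """Explicit weighting: w(k) = 1 / (n * C(n-1, k)) for subsets of size k."""
    n = len(points)
    others = [p for p in points if p != i]
    return sum(
        (utility(set(s) | {i}) - utility(set(s))) / (n * math.comb(n - 1, k))
        for k in range(n)
        for s in combinations(others, k)
    )


def shapley_by_permutations(i, points) -> float:
    """Implicit weighting: unweighted mean of the marginal contribution of i
    to the set of points preceding it, over all permutations."""
    marginals = []
    for perm in permutations(points):
        before = set(perm[: perm.index(i)])
        marginals.append(utility(before | {i}) - utility(before))
    return sum(marginals) / len(marginals)
```

Both functions agree up to floating-point error because, under a uniformly random permutation, the set of points preceding \(i\) has each size \(k\) with probability \(1/n\) and is uniform among the subsets of that size.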
SemivalueValuation
SemivalueValuation(
utility: UtilityBase,
sampler: IndexSampler,
is_done: StoppingCriterion,
skip_converged: bool = False,
show_warnings: bool = True,
progress: dict[str, Any] | bool = False,
)
Bases: Valuation
Abstract class to define semi-values.
Implementations need only provide the log_coefficient property, which corresponds to the semi-value coefficient.
Note
For implementation consistency, we slightly depart from the common definition of semi-values, which includes a factor \(1/n\) in the sum over subsets. Instead, we subsume this factor into the coefficient \(w(k)\).
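As an illustration, take the Shapley value, writing \(u\) for the utility and \(D\) for the dataset of size \(n\) (both symbols are assumed here for the example). The common form with the explicit factor \(1/n\) and the form with the factor subsumed into \(w(k)\) are related by:

\[
v(i) = \frac{1}{n} \sum_{k=0}^{n-1} \binom{n-1}{k}^{-1} \sum_{\substack{S \subseteq D \setminus \{i\} \\ |S| = k}} \left[ u(S \cup \{i\}) - u(S) \right]
= \sum_{k=0}^{n-1} w(k) \sum_{\substack{S \subseteq D \setminus \{i\} \\ |S| = k}} \left[ u(S \cup \{i\}) - u(S) \right],
\qquad w(k) = \frac{1}{n} \binom{n-1}{k}^{-1}.
\]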
PARAMETER | DESCRIPTION |
---|---|
utility | Object to compute utilities. TYPE: UtilityBase |
sampler | Sampling scheme to use. TYPE: IndexSampler |
is_done | Stopping criterion to use. TYPE: StoppingCriterion |
skip_converged | Whether to skip converged indices, as determined by the stopping criterion. TYPE: bool DEFAULT: False |
show_warnings | Whether to show warnings. TYPE: bool DEFAULT: True |
progress | Whether to show a progress bar. If a dictionary, it is passed to … TYPE: dict[str, Any] \| bool DEFAULT: False |
Source code in src/pydvl/valuation/methods/semivalue.py
log_coefficient abstractmethod property
log_coefficient: SemivalueCoefficient | None
This property returns the function computing the semi-value coefficient.
Return None in subclasses that do not need to correct for the sampling distribution probabilities because of a specific, fixed sampler choice which already yields the semi-value coefficient.
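The pattern can be sketched with stand-in classes. None of the classes below are pyDVL code: the base class, the assumed coefficient signature (n, k, log_weight), and the helper log_correction() are hypothetical, and treating None as "no correction" is an assumption about how a caller might consume the property.

```python
import math
from abc import ABC, abstractmethod
from typing import Callable, Optional

# Assumed signature for illustration: (n, k, log_weight) -> log-coefficient.
LogCoefficient = Callable[[int, int, float], float]


class SemivalueSketch(ABC):
    """Hypothetical stand-in for an abstract semi-value base class."""

    @property
    @abstractmethod
    def log_coefficient(self) -> Optional[LogCoefficient]:
        """Function computing the log-coefficient, or None."""

    def log_correction(self, n: int, k: int, log_weight: float) -> float:
        """Log-factor applied to a marginal contribution (assumed usage)."""
        coefficient = self.log_coefficient
        if coefficient is None:
            # The sampler already yields the coefficient: apply no correction.
            return 0.0
        return coefficient(n, k, log_weight)


class CorrectedShapleySketch(SemivalueSketch):
    """Shapley coefficient that fully compensates for the sampler weight."""

    @property
    def log_coefficient(self) -> Optional[LogCoefficient]:
        def _coefficient(n: int, k: int, log_weight: float) -> float:
            log_w = -math.log(n) - math.log(math.comb(n - 1, k))
            return log_w - log_weight

        return _coefficient


class FixedSamplerSketch(SemivalueSketch):
    """Method tied to a sampler whose probabilities are the coefficients."""

    @property
    def log_coefficient(self) -> Optional[LogCoefficient]:
        return None
```

Returning a function from the property, rather than a number, lets the caller evaluate the coefficient for each sampled subset size without re-reading the property.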
fit
fit(data: Dataset, continue_from: ValuationResult | None = None) -> Self
Fits the semi-value valuation to the data.
Access the results through the result property.
PARAMETER | DESCRIPTION |
---|---|
data | Data for which to compute values. TYPE: Dataset |
continue_from | A previously computed valuation result to continue from. TYPE: ValuationResult \| None DEFAULT: None |
values
values(sort: bool = False) -> ValuationResult
Returns a copy of the valuation result.
The valuation must have been run with fit() before calling this method.
PARAMETER | DESCRIPTION |
---|---|
sort | Whether to sort the valuation result by value before returning it. TYPE: bool DEFAULT: False |
Returns: The result of the valuation.