pydvl.valuation.methods.delta_shapley
This module implements the \(\delta\)-Shapley valuation method, introduced by Watson et al. (2023)[1].
\(\delta\)-Shapley uses stratified sampling, based on uniform stability bounds, to approximate Shapley values accurately for certain model classes.
It also reduces computation by skipping the marginal utilities for set sizes outside a small range.[2]
Info
See the documentation or Watson et al. (2023)[1] for a more detailed introduction to the method.
References
[1] Watson, Lauren, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, and Rik Sarkar. Accelerated Shapley Value Approximation for Data Evaluation. arXiv, 9 November 2023.
[2] When this is done, the final values are off by a constant factor with respect to the true Shapley values.
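The effect of restricting set sizes can be illustrated with a small, library-independent sketch. It is not pyDVL's implementation: the function `truncated_stratified_shapley`, the toy utility `additive_utility` and the `weights` list are hypothetical names introduced only for illustration. The sketch samples subsets of each retained size, averages the marginal utilities per size, and then applies the usual \(1/n\) Shapley normalization.

```python
import random
from typing import Callable, FrozenSet


def truncated_stratified_shapley(
    utility: Callable[[FrozenSet[int]], float],
    n: int,
    index: int,
    lower: int,
    upper: int,
    samples_per_size: int,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate for one point, using only set sizes in [lower, upper].

    Illustrative only: a plain stratified estimator, not pyDVL's sampler logic.
    Requires 0 <= lower <= upper <= n - 1.
    """
    rng = random.Random(seed)
    others = [j for j in range(n) if j != index]
    total = 0.0
    for k in range(lower, upper + 1):
        layer_sum = 0.0
        for _ in range(samples_per_size):
            subset = frozenset(rng.sample(others, k))
            # Marginal utility of adding `index` to a subset of size k
            layer_sum += utility(subset | {index}) - utility(subset)
        total += layer_sum / samples_per_size  # mean over this set size
    return total / n  # Shapley normalization over all n possible sizes


# Hypothetical toy utility: additive, so every marginal utility of point j is weights[j]
weights = [1.0, 2.0, 3.0, 4.0, 5.0]


def additive_utility(subset: FrozenSet[int]) -> float:
    return sum(weights[j] for j in subset)


full = truncated_stratified_shapley(additive_utility, 5, 2, 0, 4, samples_per_size=10)
truncated = truncated_stratified_shapley(additive_utility, 5, 2, 1, 2, samples_per_size=10)
```

For this additive toy utility, `full` equals the true Shapley value 3.0, while `truncated` is 1.2, that is, 3.0 scaled by 2/5 because only two of the five set sizes are kept. For non-additive utilities the relationship need not be this simple.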
DeltaShapleyValuation
DeltaShapleyValuation(
utility: UtilityBase,
sampler: StratifiedSampler | StratifiedPermutationSampler,
is_done: StoppingCriterion,
skip_converged: bool = False,
show_warnings: bool = True,
progress: dict[str, Any] | bool = False,
)
Bases: SemivalueValuation
Computes \(\delta\)-Shapley values.
| PARAMETER | DESCRIPTION |
|---|---|
| utility | Object to compute utilities. TYPE: `UtilityBase` |
| sampler | The sampling scheme to use. Must be a stratified sampler. TYPE: `StratifiedSampler \| StratifiedPermutationSampler` |
| is_done | Stopping criterion to use. TYPE: `StoppingCriterion` |
| skip_converged | Whether to skip converged indices, as determined by the stopping criterion. TYPE: `bool` DEFAULT: `False` |
| show_warnings | Whether to show warnings. TYPE: `bool` DEFAULT: `True` |
| progress | Whether to show a progress bar. If a dictionary, it is passed to TYPE: `dict[str, Any] \| bool` DEFAULT: `False` |
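A minimal usage sketch, relying only on the constructor signature and the `fit()` and `values()` methods documented on this page. The helper `run_delta_shapley` is a hypothetical name. Building the utility, the stratified sampler and the stopping criterion is deliberately left to the caller, since their constructors are not described here.

```python
from typing import Any


def run_delta_shapley(utility: Any, sampler: Any, is_done: Any, data: Any) -> Any:
    """Hypothetical helper: fit a DeltaShapleyValuation and return sorted values.

    Assumes `utility`, `sampler` (a stratified sampler), `is_done` and `data`
    were created elsewhere with the appropriate pyDVL classes.
    """
    # Imported lazily so that this sketch can be defined without pyDVL installed
    from pydvl.valuation.methods.delta_shapley import DeltaShapleyValuation

    valuation = DeltaShapleyValuation(
        utility=utility,
        sampler=sampler,
        is_done=is_done,
        skip_converged=False,
        show_warnings=True,
        progress=False,
    )
    valuation.fit(data)  # runs the valuation
    return valuation.values(sort=True)  # copy of the result, sorted by value
```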
Source code in src/pydvl/valuation/methods/delta_shapley.py
log_coefficient
property
log_coefficient: SemivalueCoefficient | None
Returns the log-coefficient of the \(\delta\)-Shapley valuation.
The coefficient is constructed to account for the sampling distribution of a StratifiedSampler, so that the effective coefficient is the Shapley coefficient (truncated by the size bounds in the sampler).
Normalization
This coefficient differs from the one used in the original paper by a normalization factor of \(m=\sum_k m_k\), where \(m_k\) is the number of samples of size \(k\). The paper takes layer-wise means, whereas here a running average is computed over all \(m\) value updates. The factor therefore cancels out, leaving the same effective coefficient.
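The cancellation can be checked numerically with a small sketch. It assumes, for illustration only, that the layer-wise estimate is an unweighted sum of per-size means and that each update in the running average is rescaled by \(m / m_k\). The function names and the `layers` data are hypothetical and do not come from pyDVL or the paper.

```python
import math
import random


def sum_of_layer_means(layers: dict[int, list[float]]) -> float:
    """Average the updates within each set size k, then add up the layers."""
    return sum(sum(xs) / len(xs) for xs in layers.values())


def running_average_rescaled(layers: dict[int, list[float]]) -> float:
    """Rescale each update by m / m_k, then average over all m updates."""
    m = sum(len(xs) for xs in layers.values())
    total = 0.0
    for xs in layers.values():
        scale = m / len(xs)  # the extra normalization factor m, divided by m_k
        for x in xs:
            total += scale * x
    return total / m  # dividing by m cancels the factor again


rng = random.Random(42)
# Hypothetical marginal utilities, grouped by set size k, with different m_k
layers = {
    1: [rng.random() for _ in range(3)],
    2: [rng.random() for _ in range(5)],
    3: [rng.random() for _ in range(2)],
}

same = math.isclose(sum_of_layer_means(layers), running_average_rescaled(layers))
```

Both functions compute \(\sum_k \frac{1}{m_k} \sum_j x_{kj}\), so `same` is `True` up to floating-point rounding.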
fit
fit(data: Dataset, continue_from: ValuationResult | None = None) -> Self
Fits the semi-value valuation to the data.
Access the results through the result property.
| PARAMETER | DESCRIPTION |
|---|---|
| data | Data for which to compute values. TYPE: `Dataset` |
| continue_from | A previously computed valuation result to continue from. TYPE: `ValuationResult \| None` DEFAULT: `None` |
Source code in src/pydvl/valuation/methods/semivalue.py
values
values(sort: bool = False) -> ValuationResult
Returns a copy of the valuation result.
The valuation must have been run with fit() before calling this method.
| PARAMETER | DESCRIPTION |
|---|---|
| sort | Whether to sort the valuation result by value before returning it. TYPE: `bool` DEFAULT: `False` |
Returns: The result of the valuation.