pydvl.valuation.methods.delta_shapley

This module implements the \(\delta\)-Shapley valuation method, introduced by Watson et al. (2023)¹.

\(\delta\)-Shapley uses a stratified sampling approach to accurately approximate Shapley values for certain model classes, based on uniform stability bounds.

Additionally, it reduces computation by skipping the evaluation of marginal utilities for set sizes outside a small range.²
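The effect of restricting set sizes to a range can be illustrated with a small, dependency-free sketch. It is not pyDVL's implementation: it enumerates subsets exactly instead of sampling them, and the function `size_restricted_values`, the toy utility and the weights are all hypothetical names chosen for illustration.

```python
from itertools import combinations
from statistics import mean


def size_restricted_values(n, utility, lower, upper):
    """Average marginal contributions over subset sizes in [lower, upper].

    With lower=0 and upper=n-1 this is the exact Shapley value.
    """
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        per_size = []
        for k in range(lower, upper + 1):
            # Marginal utility of adding i to every subset of size k.
            marginals = [
                utility(frozenset(s) | {i}) - utility(frozenset(s))
                for s in combinations(others, k)
            ]
            per_size.append(mean(marginals))
        # Each size (stratum) contributes equally to the final value.
        values.append(mean(per_size))
    return values


# Toy utility (an assumption): concave in the total weight of the subset.
weights = [1.0, 2.0, 3.0, 4.0]


def toy_utility(subset):
    return sum(weights[j] for j in subset) ** 0.5


exact = size_restricted_values(4, toy_utility, lower=0, upper=3)
restricted = size_restricted_values(4, toy_utility, lower=1, upper=2)
```

Narrowing the range from `[0, 3]` to `[1, 2]` skips the utility evaluations for the smallest and largest subsets. The restricted values generally differ from the exact ones, which is why the size bounds must be taken into account when interpreting the result.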

Info

See the documentation or Watson et al. (2023)¹ for a more detailed introduction to the method.

References

1. Watson, Lauren, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, and Rik Sarkar. Accelerated Shapley Value Approximation for Data Evaluation. arXiv, 9 November 2023.
2. When this is done, the final values are off by a constant factor with respect to the true Shapley values.

DeltaShapleyValuation

DeltaShapleyValuation(
    utility: UtilityBase,
    sampler: StratifiedSampler | StratifiedPermutationSampler,
    is_done: StoppingCriterion,
    skip_converged: bool = False,
    show_warnings: bool = True,
    progress: dict[str, Any] | bool = False,
)

Bases: SemivalueValuation

Computes \(\delta\)-Shapley values.

PARAMETER	DESCRIPTION
`utility`	Object to compute utilities. TYPE: `UtilityBase`
`sampler`	The sampling scheme to use. Must be a stratified sampler. TYPE: `StratifiedSampler \| StratifiedPermutationSampler`
`is_done`	Stopping criterion to use. TYPE: `StoppingCriterion`
`skip_converged`	Whether to skip converged indices, as determined by the stopping criterion's `converged` array. TYPE: `bool` DEFAULT: `False`
`show_warnings`	Whether to show warnings. TYPE: `bool` DEFAULT: `True`
`progress`	Whether to show a progress bar. If a dictionary, it is passed to `tqdm` as keyword arguments, and the progress bar is displayed. TYPE: `dict[str, Any] \| bool` DEFAULT: `False`

Source code in src/pydvl/valuation/methods/delta_shapley.py

def __init__(
    self,
    utility: UtilityBase,
    sampler: StratifiedSampler | StratifiedPermutationSampler,
    is_done: StoppingCriterion,
    skip_converged: bool = False,
    show_warnings: bool = True,
    progress: dict[str, Any] | bool = False,
):
    super().__init__(
        utility, sampler, is_done, skip_converged, show_warnings, progress
    )

log_coefficient `property`

log_coefficient: SemivalueCoefficient | None

Returns the log-coefficient of the \(\delta\)-Shapley valuation.

The coefficient is constructed to account for the sampling distribution of a StratifiedSampler, so that the effective coefficient is the Shapley coefficient (truncated by the size bounds in the sampler).

Normalization

This coefficient differs from the one used in the original paper by a normalization factor of \(m=\sum_k m_k,\) where \(m_k\) is the number of samples of size \(k\). Because a running average is computed over all \(m\) value updates, rather than the layer-wise means used in the paper, this factor cancels out and the effective coefficient is the same.
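The cancellation can be checked numerically with a short sketch. It assumes that the layer-wise estimator is a weighted sum of per-size means with weights \(w_k\), and that the running average uses the rescaled coefficient \(w_k \, m / m_k\). The function names, weights and sample values below are hypothetical and only serve the arithmetic.

```python
def layerwise_estimate(samples, weights):
    """Weighted sum of per-size (layer-wise) means of the marginals."""
    return sum(weights[k] * sum(v) / len(v) for k, v in samples.items())


def running_average_estimate(samples, weights):
    """Single average over all m updates, each scaled by w_k * m / m_k."""
    m = sum(len(v) for v in samples.values())
    total = 0.0
    for k, marginals in samples.items():
        coefficient = weights[k] * m / len(marginals)
        for delta in marginals:
            total += coefficient * delta
    return total / m


# Marginal utilities grouped by subset size k (made-up numbers).
samples = {1: [0.5, 0.7, 0.6], 2: [0.2, 0.4]}
weights = {1: 0.5, 2: 0.5}

a = layerwise_estimate(samples, weights)
b = running_average_estimate(samples, weights)
```

Both estimates agree up to floating point error: the factor \(m\) introduced in the coefficient is removed again when dividing by the total number of updates.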

result `property`

result: ValuationResult

The current valuation result (not a copy).

fit

fit(data: Dataset, continue_from: ValuationResult | None = None) -> Self

Fits the semi-value valuation to the data.

Access the results through the `result` property.

PARAMETER	DESCRIPTION
`data`	Data for which to compute values. TYPE: `Dataset`
`continue_from`	A previously computed valuation result to continue from. TYPE: `ValuationResult \| None` DEFAULT: `None`

Source code in src/pydvl/valuation/methods/semivalue.py

@suppress_warnings(flag="show_warnings")
def fit(self, data: Dataset, continue_from: ValuationResult | None = None) -> Self:
    """Fits the semi-value valuation to the data.

    Access the results through the `result` property.

    Args:
        data: Data for which to compute values
        continue_from: A previously computed valuation result to continue from.

    """
    self._result = self._init_or_check_result(data, continue_from)
    ensure_backend_has_generator_return()

    self.is_done.reset()
    self.utility = self.utility.with_dataset(data)

    strategy = self.sampler.make_strategy(self.utility, self.log_coefficient)
    updater = self.sampler.result_updater(self._result)
    processor = delayed(strategy.process)

    with Parallel(return_as="generator_unordered") as parallel:
        with make_parallel_flag() as flag:
            delayed_evals = parallel(
                processor(batch=list(batch), is_interrupted=flag)
                for batch in self.sampler.generate_batches(data.indices)
            )
            for batch in Progress(delayed_evals, self.is_done, **self.tqdm_args):
                for update in batch:
                    self._result = updater.process(update)
                if self.is_done(self._result):
                    flag.set()
                    self.sampler.interrupt()
                    break
                if self.skip_converged:
                    self.sampler.skip_indices = data.indices[self.is_done.converged]
    logger.debug(f"Fitting done after {updater.n_updates} value updates.")
    return self
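The control flow of `fit` (consume batches of updates, fold each update into a running result, stop once the criterion is met) can be illustrated with a sequential sketch that has no dependencies. Every name below (`RunningMeans`, `fit_sketch`, `min_updates`) is hypothetical and none of it is pyDVL's API; parallelism, progress reporting and skipping of converged indices are left out.

```python
from dataclasses import dataclass


@dataclass
class RunningMeans:
    """Per-index running averages of value updates."""

    values: list
    counts: list

    def process(self, idx, update):
        # Incremental mean: new = old + (x - old) / n
        self.counts[idx] += 1
        self.values[idx] += (update - self.values[idx]) / self.counts[idx]


def fit_sketch(batches, n_indices, min_updates):
    """Fold batches of (index, update) pairs into running means.

    Stops as soon as every index has received at least `min_updates` updates.
    Returns the result and the number of batches consumed.
    """
    result = RunningMeans([0.0] * n_indices, [0] * n_indices)
    n_batches = 0
    for batch in batches:
        n_batches += 1
        for idx, update in batch:
            result.process(idx, update)
        if all(c >= min_updates for c in result.counts):
            break  # stopping criterion met: ignore remaining batches
    return result, n_batches


batches = [
    [(0, 1.0), (1, 2.0)],
    [(0, 3.0), (1, 4.0)],
    [(0, 100.0), (1, 100.0)],  # never processed with min_updates=2
]
result, n_batches = fit_sketch(batches, n_indices=2, min_updates=2)
```

As in `fit`, the stopping check runs once per batch rather than once per update, so a batch is always processed in full before the loop can exit.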

values

values(sort: bool = False) -> ValuationResult

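Returns a copy of the valuation result.

Deprecated since version 0.10.0 and scheduled for removal in 0.11.0.

The valuation must have been run with `fit()` before calling this method; otherwise a `NotFittedException` is raised.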

PARAMETER	DESCRIPTION
`sort`	Whether to sort the valuation result by value before returning it. TYPE: `bool` DEFAULT: `False`

Returns: The result of the valuation.

Source code in src/pydvl/valuation/base.py

@deprecated(
    target=None,
    deprecated_in="0.10.0",
    remove_in="0.11.0",
)
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result by value before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self._result is not None

    r = self._result.copy()
    if sort:
        r.sort(inplace=True)
    return r