Skip to content

pydvl.valuation.methods.delta_shapley

This module implements the \(\delta\)-Shapley valuation method, introduced by Watson et al. (2023)1.

\(\delta\)-Shapley uses a stratified sampling approach to accurately approximate Shapley values for certain model classes, based on uniform stability bounds.

Additionally, it reduces computation by skipping the marginal utilities for set sizes outside a small range.2

Info

See the documentation or Watson et al. (2023)1 for a more detailed introduction to the method.

References


  1. Watson, Lauren, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, and Rik Sarkar. Accelerated Shapley Value Approximation for Data Evaluation. arXiv, 9 November 2023. 

  2. When this is done, the final values are off by a constant factor with respect to the true Shapley values. 

DeltaShapleyValuation

DeltaShapleyValuation(
    utility: UtilityBase,
    sampler: StratifiedSampler | StratifiedPermutationSampler,
    is_done: StoppingCriterion,
    skip_converged: bool = False,
    show_warnings: bool = True,
    progress: dict[str, Any] | bool = False,
)

Bases: SemivalueValuation

Computes \(\delta\)-Shapley values.

PARAMETER DESCRIPTION
utility

Object to compute utilities.

TYPE: UtilityBase

sampler

The sampling scheme to use. Must be a stratified sampler.

TYPE: StratifiedSampler | StratifiedPermutationSampler

is_done

Stopping criterion to use.

TYPE: StoppingCriterion

skip_converged

Whether to skip converged indices, as determined by the stopping criterion's converged array.

TYPE: bool DEFAULT: False

show_warnings

Whether to show warnings.

TYPE: bool DEFAULT: True

progress

Whether to show a progress bar. If a dictionary, it is passed to tqdm as keyword arguments, and the progress bar is displayed.

TYPE: dict[str, Any] | bool DEFAULT: False

Source code in src/pydvl/valuation/methods/delta_shapley.py
def __init__(
    self,
    utility: UtilityBase,
    sampler: StratifiedSampler | StratifiedPermutationSampler,
    is_done: StoppingCriterion,
    skip_converged: bool = False,
    show_warnings: bool = True,
    progress: dict[str, Any] | bool = False,
):
    super().__init__(
        utility, sampler, is_done, skip_converged, show_warnings, progress
    )

log_coefficient property

log_coefficient: SemivalueCoefficient | None

Returns the log-coefficient of the \(\delta\)-Shapley valuation.

This is constructed to account for the sampling distribution of a StratifiedSampler and yield the Shapley coefficient as effective coefficient (truncated by the size bounds in the sampler).

Normalization

This coefficient differs from the one used in the original paper by a normalization factor of \(m=\sum_k m_k,\) where \(m_k\) is the number of samples of size \(k\). Since, contrary to their layer-wise means, we are computing running averages of all \(m\) value updates, this cancels out, and we are left with the same effective coefficient.

result property

The current valuation result (not a copy).

fit

fit(data: Dataset, continue_from: ValuationResult | None = None) -> Self

Fits the semi-value valuation to the data.

Access the results through the result property.

PARAMETER DESCRIPTION
data

Data for which to compute values

TYPE: Dataset

continue_from

A previously computed valuation result to continue from.

TYPE: ValuationResult | None DEFAULT: None

Source code in src/pydvl/valuation/methods/semivalue.py
@suppress_warnings(flag="show_warnings")
def fit(self, data: Dataset, continue_from: ValuationResult | None = None) -> Self:
    """Fits the semi-value valuation to the data.

    Access the results through the `result` property.

    Args:
        data: Data for which to compute values
        continue_from: A previously computed valuation result to continue from.

    """
    self._result = self._init_or_check_result(data, continue_from)
    ensure_backend_has_generator_return()

    self.is_done.reset()
    self.utility = self.utility.with_dataset(data)

    strategy = self.sampler.make_strategy(self.utility, self.log_coefficient)
    updater = self.sampler.result_updater(self._result)
    processor = delayed(strategy.process)

    with Parallel(return_as="generator_unordered") as parallel:
        with make_parallel_flag() as flag:
            delayed_evals = parallel(
                processor(batch=list(batch), is_interrupted=flag)
                for batch in self.sampler.generate_batches(data.indices)
            )
            for batch in Progress(delayed_evals, self.is_done, **self.tqdm_args):
                for update in batch:
                    self._result = updater.process(update)
                if self.is_done(self._result):
                    flag.set()
                    self.sampler.interrupt()
                    break
                if self.skip_converged:
                    self.sampler.skip_indices = data.indices[self.is_done.converged]
    logger.debug(f"Fitting done after {updater.n_updates} value updates.")
    return self

values

values(sort: bool = False) -> ValuationResult

Returns a copy of the valuation result.

The valuation must have been run with fit() before calling this method.

PARAMETER DESCRIPTION
sort

Whether to sort the valuation result by value before returning it.

TYPE: bool DEFAULT: False

Returns: The result of the valuation.

Source code in src/pydvl/valuation/base.py
@deprecated(
    target=None,
    deprecated_in="0.10.0",
    remove_in="0.11.0",
)
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result by value before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self._result is not None

    r = self._result.copy()
    if sort:
        r.sort(inplace=True)
    return r