
pydvl.valuation.methods.semivalue

This module contains the base class for all semi-value valuation methods.

A semi-value is any valuation function with the form:

\[ v_\text{semi}(i) = \sum_{k=1}^n w(k) \sum_{S \subset D_{-i}^{(k)}} [U(S_{+i})-U(S)], \]

where \(U\) is the utility, \(S_{+i} = S \cup \{i\}\), and the coefficients \(w(k)\) satisfy the property:

\[ \sum_{k=1}^n w(k) = 1. \]

This is the largest class of valuation methods based on marginal contributions. They compute the value of a data point from the change in utility when the point is added to subsets of the remaining data.
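A brute-force sketch of the formula above, for small \(n\) only. It assumes that \(D_{-i}^{(k)}\) is the set of subsets of \(D \setminus \{i\}\) with exactly \(k\) elements, so \(k = |S|\) runs from 0 to \(n-1\) here. The function names `exact_semivalue` and `shapley_weight` are hypothetical and not part of pydvl; the Shapley weight \(1/(n\binom{n-1}{k})\) already contains the \(1/n\) factor.

```python
from itertools import combinations
from math import comb
from typing import Callable, FrozenSet, Sequence

Utility = Callable[[FrozenSet[int]], float]


def exact_semivalue(
    i: int,
    indices: Sequence[int],
    utility: Utility,
    weight: Callable[[int, int], float],
) -> float:
    """Hypothetical helper: exact semi-value of point i by full enumeration.

    Assumes the inner sum runs over all subsets S of the other points with
    |S| = k, for k = 0, ..., n - 1, and that weight(n, k) returns w(k).
    """
    n = len(indices)
    others = [j for j in indices if j != i]
    total = 0.0
    for k in range(n):  # subset sizes 0 .. n-1
        w = weight(n, k)
        for subset in combinations(others, k):
            s = frozenset(subset)
            # Marginal contribution of i to S, weighted by w(|S|)
            total += w * (utility(s | {i}) - utility(s))
    return total


def shapley_weight(n: int, k: int) -> float:
    # Shapley weight with the 1/n factor absorbed: 1 / (n * C(n-1, k))
    return 1.0 / (n * comb(n - 1, k))


# Toy additive utility: U(S) is the sum of fixed per-point scores.
scores = {0: 1.0, 1: 2.0, 2: 3.0}


def additive_utility(s: FrozenSet[int]) -> float:
    return sum(scores[j] for j in s)


values = {
    i: exact_semivalue(i, list(scores), additive_utility, shapley_weight)
    for i in scores
}
print(values)
```

For an additive utility every marginal contribution of point \(i\) equals its own score, so the Shapley values recover the scores up to rounding. The enumeration needs \(2^{n-1}\) pairs of utility evaluations per point, which is why sampling-based estimation is used in practice.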

SemivalueValuation

SemivalueValuation(
    utility: UtilityBase,
    sampler: IndexSampler,
    is_done: StoppingCriterion,
    progress: dict[str, Any] | bool = False,
)

Bases: Valuation

Abstract class to define semi-values.

Subclasses need only implement the coefficient() method, which returns the semi-value coefficient \(w(k)\).

Note

For implementation consistency, we slightly depart from the common definition of semi-values, which includes a factor \(1/n\) in the sum over subsets. Instead, we subsume this factor into the coefficient \(w(k)\). TODO: see ...
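The subclassing pattern can be sketched without pydvl, using Python's `abc` module. The classes `SemivalueSketch`, `ShapleySketch` and `BanzhafSketch` are hypothetical stand-ins, not pydvl's own implementations. Following the note, the \(1/n\) factor is folded into the coefficient, and `k` is assumed to be the size of the subset \(S\) that excludes the point being valued.

```python
from abc import ABC, abstractmethod
from math import comb


class SemivalueSketch(ABC):
    """Hypothetical stand-in: only the weighting scheme is left abstract."""

    @abstractmethod
    def coefficient(self, n: int, k: int) -> float:
        """Weight w(k) for a subset of size k, given n points (1/n absorbed)."""


class ShapleySketch(SemivalueSketch):
    def coefficient(self, n: int, k: int) -> float:
        # 1/n times the inverse number of size-k subsets of the n-1 other points
        return 1.0 / (n * comb(n - 1, k))


class BanzhafSketch(SemivalueSketch):
    def coefficient(self, n: int, k: int) -> float:
        # Every subset of the n-1 other points gets the same weight
        return 1.0 / 2 ** (n - 1)


def total_weight(method: SemivalueSketch, n: int) -> float:
    # Sum of w(|S|) over all subsets S of the n-1 other points
    return sum(comb(n - 1, k) * method.coefficient(n, k) for k in range(n))


for method in (ShapleySketch(), BanzhafSketch()):
    print(type(method).__name__, total_weight(method, 5))
```

Both weightings distribute a total weight of one over all subsets of the other points; they differ only in how that weight is spread across subset sizes. Instantiating `SemivalueSketch` directly raises `TypeError`, since `coefficient` is abstract.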

PARAMETER DESCRIPTION
utility

Object to compute utilities.

TYPE: UtilityBase

sampler

Sampling scheme to use.

TYPE: IndexSampler

is_done

Stopping criterion to use.

TYPE: StoppingCriterion

progress

Whether to show a progress bar. If a dictionary is given, its entries are passed to tqdm as additional keyword arguments.

TYPE: dict[str, Any] | bool DEFAULT: False

Source code in src/pydvl/valuation/methods/semivalue.py
def __init__(
    self,
    utility: UtilityBase,
    sampler: IndexSampler,
    is_done: StoppingCriterion,
    progress: dict[str, Any] | bool = False,
):
    super().__init__()
    self.utility = utility
    self.sampler = sampler
    self.is_done = is_done
    self.tqdm_args: dict[str, Any] = {
        "desc": f"{self.__class__.__name__}: {str(is_done)}"
    }
    # HACK: parse additional args for the progress bar if any (we probably want
    #  something better)
    if isinstance(progress, bool):
        self.tqdm_args.update({"disable": not progress})
    else:
        self.tqdm_args.update(progress if isinstance(progress, dict) else {})
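The progress-argument parsing in the constructor above can be isolated into a small function to make its behaviour explicit. `build_tqdm_args` is a hypothetical name and not part of pydvl; it mirrors the logic of `__init__` line by line.

```python
from __future__ import annotations

from typing import Any


def build_tqdm_args(progress: dict[str, Any] | bool, desc: str) -> dict[str, Any]:
    """Hypothetical mirror of the progress parsing in SemivalueValuation.__init__."""
    tqdm_args: dict[str, Any] = {"desc": desc}
    if isinstance(progress, bool):
        # A boolean only toggles the bar on or off
        tqdm_args.update({"disable": not progress})
    else:
        # A dict is merged in as extra tqdm keyword arguments; anything else is ignored
        tqdm_args.update(progress if isinstance(progress, dict) else {})
    return tqdm_args


print(build_tqdm_args(False, "ShapleyValuation"))
print(build_tqdm_args({"ncols": 80}, "ShapleyValuation"))
```

Passing a dictionary never sets `disable`, so the bar stays visible under tqdm's default, and a `"desc"` key in the dictionary overrides the generated description.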

values

values(sort: bool = False) -> ValuationResult

Returns a copy of the valuation result.

The valuation must have been run with fit() before calling this method.

PARAMETER DESCRIPTION
sort

Whether to sort the valuation result before returning it.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
ValuationResult

The result of the valuation.

Source code in src/pydvl/valuation/base.py
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self.result is not None

    from copy import copy

    r = copy(self.result)
    if sort:
        r.sort()
    return r
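The fit-then-read contract and the copy semantics of `values()` can be illustrated with toy classes. `ToyResult`, `ToyValuation` and `NotFittedError` are hypothetical stand-ins, not pydvl's `ValuationResult` or `NotFittedException`. Because `copy.copy` is shallow, sorting the returned object leaves the stored result untouched only if `sort()` rebinds attributes rather than mutating shared ones in place; the toy `sort()` rebinds.

```python
from __future__ import annotations

from copy import copy


class NotFittedError(Exception):
    """Hypothetical stand-in for pydvl's NotFittedException."""


class ToyResult:
    def __init__(self, values: list[float]):
        self.values = values

    def sort(self) -> None:
        # Rebinds the attribute instead of sorting the shared list in place
        self.values = sorted(self.values)


class ToyValuation:
    def __init__(self) -> None:
        self.result: ToyResult | None = None

    @property
    def is_fitted(self) -> bool:
        return self.result is not None

    def fit(self, raw: list[float]) -> ToyValuation:
        self.result = ToyResult(raw)
        return self

    def values(self, sort: bool = False) -> ToyResult:
        if not self.is_fitted:
            raise NotFittedError(type(self).__name__)
        assert self.result is not None
        r = copy(self.result)  # shallow copy, as in the method above
        if sort:
            r.sort()
        return r


valuation = ToyValuation().fit([3.0, 1.0, 2.0])
print(valuation.values(sort=True).values)  # sorted copy
print(valuation.result.values)  # stored result keeps its original order
```

Calling `values()` on a `ToyValuation` that has not been fitted raises `NotFittedError`.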

coefficient abstractmethod

coefficient(n: int, k: int) -> float

Computes the coefficient for a given subset size.

PARAMETER DESCRIPTION
n

Total number of elements in the set.

TYPE: int

k

Size of the subset for which the coefficient is being computed.

TYPE: int

Source code in src/pydvl/valuation/methods/semivalue.py
@abstractmethod
def coefficient(self, n: int, k: int) -> float:
    """Computes the coefficient for a given subset size.

    Args:
        n: Total number of elements in the set.
        k: Size of the subset for which the coefficient is being computed.
    """
    ...
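To see how a coefficient of this shape can weight sampled marginal contributions, the following is a minimal importance-sampling sketch. It is not how pydvl's samplers or stopping criteria work. It assumes subsets of the other points are drawn uniformly (each point kept with probability 1/2), so every draw is reweighted by \(2^{n-1} w(k)\) with \(k = |S|\). A fixed `max_samples` stands in for a stopping criterion, and `mc_semivalue` and `shapley_coefficient` are hypothetical names.

```python
from __future__ import annotations

import random
from math import comb
from typing import Callable, FrozenSet, Sequence


def shapley_coefficient(n: int, k: int) -> float:
    # Hypothetical Shapley coefficient with the 1/n factor absorbed
    return 1.0 / (n * comb(n - 1, k))


def mc_semivalue(
    i: int,
    indices: Sequence[int],
    utility: Callable[[FrozenSet[int]], float],
    coefficient: Callable[[int, int], float],
    max_samples: int,
    seed: int = 0,
) -> float:
    """Hypothetical Monte Carlo estimate of sum_S w(|S|) [U(S + i) - U(S)]."""
    rng = random.Random(seed)
    n = len(indices)
    others = [j for j in indices if j != i]
    total = 0.0
    for _ in range(max_samples):  # fixed budget instead of a stopping criterion
        # Uniform draw over all subsets of the other points
        s = frozenset(j for j in others if rng.random() < 0.5)
        marginal = utility(s | {i}) - utility(s)
        # Divide by the sampling probability 1 / 2**(n-1) to stay unbiased
        total += 2 ** (n - 1) * coefficient(n, len(s)) * marginal
    return total / max_samples


scores = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}


def additive_utility(s: FrozenSet[int]) -> float:
    return sum(scores[j] for j in s)


estimates = {
    i: mc_semivalue(i, list(scores), additive_utility, shapley_coefficient, 20_000)
    for i in scores
}
print(estimates)
```

With the additive utility the estimates should lie close to the per-point scores. Uniform subset sampling is only one option: sampling subset sizes in proportion to the coefficient would reduce the variance of the reweighting factor, which grows quickly with \(n\).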