pydvl.valuation.methods.semivalue

This module contains the base class for all semi-value valuation methods.

A semi-value is any valuation function with the form:

\[ v_\text{semi}(i) = \sum_{k=1}^n w(k) \sum_{S \subset D_{-i}^{(k)}} [U(S_{+i})-U(S)], \]

where \(U\) is the utility, and the coefficients \(w(k)\) satisfy the property:

\[ \sum_{k=1}^n w(k) = 1. \]

Semi-values are the largest class of marginal-contribution-based valuation methods. They compute the value of a data point from the change in utility when that point is added to (equivalently, removed from) subsets of the data.
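To make the formula concrete, here is a minimal brute-force sketch that evaluates it on a toy problem. It does not use pyDVL: `semivalue`, `shapley_coefficient`, `additive_utility` and `contributions` are hypothetical names introduced for illustration. It makes two assumptions: subset sizes \(k\) run over \(0, \dots, n-1\) (the possible sizes of subsets of \(D_{-i}\)), and the coefficient is the Shapley one with the \(1/n\) factor folded in, \(w(k) = 1 / (n \binom{n-1}{k})\).

```python
from itertools import combinations
from math import comb
from typing import Callable, FrozenSet, Sequence


def semivalue(
    i: int,
    data: Sequence[int],
    utility: Callable[[FrozenSet[int]], float],
    w: Callable[[int, int], float],
) -> float:
    """Brute-force semi-value of point i (exponential cost, toy sizes only)."""
    others = [j for j in data if j != i]
    n = len(data)
    total = 0.0
    for k in range(n):  # sizes of subsets of the data without i: 0, ..., n-1
        # Sum of marginal contributions U(S + i) - U(S) over all subsets of size k.
        marginals = sum(
            utility(frozenset(s) | {i}) - utility(frozenset(s))
            for s in combinations(others, k)
        )
        total += w(n, k) * marginals
    return total


def shapley_coefficient(n: int, k: int) -> float:
    # Assumption for this sketch: Shapley weights with 1/n folded in.
    return 1.0 / (n * comb(n - 1, k))


# Toy additive utility: each point contributes a fixed amount to any subset.
contributions = {0: 1.0, 1: 2.0, 2: 4.0}


def additive_utility(subset: FrozenSet[int]) -> float:
    return sum(contributions[j] for j in subset)


values = [
    semivalue(i, list(contributions), additive_utility, shapley_coefficient)
    for i in contributions
]
```

With an additive utility every marginal contribution of point \(i\) equals its own contribution, so each value in `values` should match the corresponding entry of `contributions` up to floating-point error. Real utilities require sampling rather than full enumeration.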

SemivalueValuation

SemivalueValuation(
    utility: UtilityBase,
    sampler: IndexSampler,
    is_done: StoppingCriterion,
    skip_converged: bool = False,
    show_warnings: bool = True,
    progress: dict[str, Any] | bool = False,
)

Bases: Valuation

Abstract class to define semi-values.

Subclasses only need to implement the log_coefficient() method, which returns the logarithm of the semi-value coefficient.

Note

For implementation consistency, we slightly depart from the common definition of semi-values, which includes a factor \(1/n\) in the sum over subsets. Instead, we subsume this factor into the coefficient \(w(k)\).
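The subclassing pattern described above can be sketched without pyDVL as follows. `SemivalueSketch` is a hypothetical stand-in for the abstract base class, not the library's `SemivalueValuation`, and `coefficient()` is a helper that exists only in this sketch. The Banzhaf-style coefficient \(1/2^{n-1}\) is used as an example of a coefficient that does not depend on \(k\).

```python
import math
from abc import ABC, abstractmethod


class SemivalueSketch(ABC):
    """Hypothetical stand-in for an abstract semi-value base class."""

    @abstractmethod
    def log_coefficient(self, n: int, k: int) -> float:
        """Natural log of the coefficient for subsets of size k out of n points."""
        ...

    def coefficient(self, n: int, k: int) -> float:
        # Helper for this sketch only: leave log-space for inspection.
        return math.exp(self.log_coefficient(n, k))


class BanzhafSketch(SemivalueSketch):
    """Banzhaf-style coefficient 1 / 2^(n-1), the same for every subset size."""

    def log_coefficient(self, n: int, k: int) -> float:
        return -(n - 1) * math.log(2)


banzhaf = BanzhafSketch()
coefficient = banzhaf.coefficient(n=4, k=1)  # 1 / 2^3
```

Because `log_coefficient` is abstract, instantiating the base class directly raises a `TypeError`. A concrete subclass only has to supply that single method.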

PARAMETER DESCRIPTION
utility

Object to compute utilities.

TYPE: UtilityBase

sampler

Sampling scheme to use.

TYPE: IndexSampler

is_done

Stopping criterion to use.

TYPE: StoppingCriterion

skip_converged

Whether to skip converged indices, as determined by the stopping criterion's converged array.

TYPE: bool DEFAULT: False

show_warnings

Whether to show warnings.

TYPE: bool DEFAULT: True

progress

Whether to show a progress bar. If a dictionary, it is passed to tqdm as keyword arguments, and the progress bar is displayed. Any other type raises a TypeError.

TYPE: dict[str, Any] | bool DEFAULT: False

Source code in src/pydvl/valuation/methods/semivalue.py
def __init__(
    self,
    utility: UtilityBase,
    sampler: IndexSampler,
    is_done: StoppingCriterion,
    skip_converged: bool = False,
    show_warnings: bool = True,
    progress: dict[str, Any] | bool = False,
):
    super().__init__()
    self.utility = utility
    self.sampler = sampler
    self.is_done = is_done
    self.skip_converged = skip_converged
    self.show_warnings = show_warnings
    self.tqdm_args: dict[str, Any] = {
        "desc": f"{self.__class__.__name__}: {str(is_done)}"
    }
    # HACK: parse additional args for the progress bar if any (we probably want
    #  something better)
    if isinstance(progress, bool):
        self.tqdm_args.update({"disable": not progress})
    elif isinstance(progress, dict):
        self.tqdm_args.update(progress)
    else:
        raise TypeError(f"Invalid type for progress: {type(progress)}")

values

values(sort: bool = False) -> ValuationResult

Returns a copy of the valuation result.

The valuation must have been run with fit() before calling this method.

PARAMETER DESCRIPTION
sort

Whether to sort the valuation result by value before returning it.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
ValuationResult

The result of the valuation.

Source code in src/pydvl/valuation/base.py
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result by value before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self.result is not None

    from copy import copy

    r = copy(self.result)
    if sort:
        r.sort()
    return r

log_coefficient abstractmethod

log_coefficient(n: int, k: int) -> float

The semi-value coefficient in log-space.

The semi-value coefficient is a function of the number of elements in the set, and the size of the subset for which the coefficient is being computed. Because both coefficients and sampler weights can be very large or very small, we perform all computations in log-space to avoid numerical issues.

PARAMETER DESCRIPTION
n

Total number of elements in the set.

TYPE: int

k

Size of the subset for which the coefficient is being computed.

TYPE: int

RETURNS DESCRIPTION
float

The natural logarithm of the semi-value coefficient.
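The following sketch illustrates why working in log-space helps. It is not pyDVL code: `log_binomial` and `log_shapley_coefficient` are hypothetical helpers, and the Shapley-style coefficient \(1 / (n \binom{n-1}{k})\) is assumed purely as an example. The log of the binomial coefficient is computed with `math.lgamma`, so no huge intermediate number is ever formed.

```python
import math


def log_binomial(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_shapley_coefficient(n: int, k: int) -> float:
    """log(1 / (n * C(n-1, k))): an assumed Shapley-style coefficient."""
    return -math.log(n) - log_binomial(n - 1, k)


# For small n the coefficient is representable and can be recovered with exp().
small = math.exp(log_shapley_coefficient(5, 2))

# For large n the coefficient itself underflows to zero as a float,
# while its logarithm remains an ordinary finite number.
large_log = log_shapley_coefficient(5000, 2500)
underflowed = math.exp(large_log)
```

Here `small` agrees with the directly computed \(1 / (5 \binom{4}{2})\), while `underflowed` is zero even though `large_log` is finite. Keeping coefficients and sampler weights as logarithms lets them be combined by addition and subtraction before anything is exponentiated.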

Source code in src/pydvl/valuation/methods/semivalue.py
@abstractmethod
def log_coefficient(self, n: int, k: int) -> float:
    """The semi-value coefficient in log-space.

    The semi-value coefficient is a function of the number of elements in the set,
    and the size of the subset for which the coefficient is being computed.
    Because both coefficients and sampler weights can be very large or very small,
    we perform all computations in log-space to avoid numerical issues.

    Args:
        n: Total number of elements in the set.
        k: Size of the subset for which the coefficient is being computed

    Returns:
        The natural logarithm of the semi-value coefficient.
    """
    ...