
pydvl.valuation.samplers.powerset

Powerset samplers.

TODO: explain the formulation and the different samplers.

Stochastic samplers

...

References


  1. Mitchell, Rory, Joshua Cooper, Eibe Frank, and Geoffrey Holmes. Sampling Permutations for Shapley Value Estimation. Journal of Machine Learning Research 23, no. 43 (2022): 1–46. 

  2. Watson, Lauren, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, and Rik Sarkar. Accelerated Shapley Value Approximation for Data Evaluation. arXiv, 9 November 2023. 

  3. Wu, Mengmeng, Ruoxi Jia, Changle Lin, Wei Huang, and Xiangyu Chang. Variance Reduced Shapley Value Estimation for Trustworthy Data Valuation. Computers & Operations Research 159 (1 November 2023): 106305. 

  4. Maleki, Sasan, Long Tran-Thanh, Greg Hines, Talal Rahwan, and Alex Rogers. Bounding the Estimation Error of Sampling-Based Shapley Value Approximation. arXiv:1306.4265 [Cs], 12 February 2014. 

PowersetSampler

PowersetSampler(
    batch_size: int = 1,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
)

Bases: IndexSampler, ABC

An abstract class for samplers which iterate over the powerset of the complement of an index in the training set.

This is done in two nested loops, where the outer loop iterates over the set of indices, and the inner loop iterates over subsets of the complement of the current index. The outer iteration can be either sequential or at random.

PARAMETER DESCRIPTION
batch_size

The number of samples to generate per batch. Batches are processed together by [UtilityEvaluator][pydvl.valuation.utility.evaluator.UtilityEvaluator].

TYPE: int DEFAULT: 1

index_iteration

The order in which indices are iterated over.

TYPE: Type[IndexIteration] DEFAULT: SequentialIndexIteration
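A minimal sketch of the two nested loops described above. This is illustrative only, not pyDVL's actual implementation; the powerset helper and the plain (idx, subset) tuples are stand-ins for the module's internals.

from itertools import chain, combinations

import numpy as np

def powerset(s):
    # all subsets of s, from the empty set to s itself
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def naive_powerset_sampler(indices):
    for idx in indices:                          # outer loop: one index at a time
        others = np.setdiff1d(indices, [idx])    # complement of the current index
        for subset in powerset(others):          # inner loop: subsets of the complement
            yield idx, np.array(subset, dtype=int)

for idx, subset in naive_powerset_sampler(np.arange(2)):
    print(f"{idx} - {subset}", end=", ")
# 0 - [], 0 - [1], 1 - [], 1 - [0],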
Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    batch_size: int = 1,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
):
    """
    Args:
        batch_size: The number of samples to generate per batch. Batches are
            processed together by
            [UtilityEvaluator][pydvl.valuation.utility.evaluator.UtilityEvaluator].
        index_iteration: the order in which indices are iterated over
    """
    super().__init__(batch_size)
    self._index_iteration = index_iteration

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break
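A usage sketch with a concrete subclass (DeterministicUniformSampler, defined further below), assuming samples expose the idx and subset fields of pyDVL's Sample; illustrative only:

sampler = DeterministicUniformSampler(batch_size=2)
for batch in sampler.generate_batches(np.arange(3)):
    # each batch is a list of at most batch_size samples
    for sample in batch:
        print(sample.idx, sample.subset)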

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.
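Concretely, writing \(\Delta_i(S) = U(S \cup \{i\}) - U(S)\) for the marginal contribution of index \(i\), each subset \(S \subseteq N_{-i}\) is drawn with probability \(2^{-(n-1)}\), so that

\[ \mathbb{E}_{S \sim \mathcal{U}(2^{N_{-i}})}\left[2^{n-1}\, \Delta_i(S)\right] = \sum_{S \subseteq N_{-i}} \Delta_i(S). \]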

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

LOOSampler

LOOSampler(batch_size: int = 1)

Bases: IndexSampler

Leave-One-Out sampler. For every index \(i\) in the set \(S\), the sample \((i, S_{-i})\) is returned.
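For example, for indices \(\{0, 1, 2\}\) the samples are \((0, \{1, 2\})\), \((1, \{0, 2\})\) and \((2, \{0, 1\})\): one leave-one-out evaluation per index.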

New in version 0.10.0

PARAMETER DESCRIPTION
batch_size

The number of samples to generate per batch. Batches are processed by the [EvaluationStrategy][pydvl.valuation.samplers.base.EvaluationStrategy].

TYPE: int DEFAULT: 1
Source code in src/pydvl/valuation/samplers/base.py
def __init__(self, batch_size: int = 1):
    """
    Args:
        batch_size: The number of samples to generate per batch. Batches are
            processed by the
            [EvaluationStrategy][pydvl.valuation.samplers.base.EvaluationStrategy]
    """
    self._batch_size = batch_size
    self._n_samples = 0
    self._interrupted = False

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

LOOEvaluationStrategy

LOOEvaluationStrategy(
    sampler: LOOSampler,
    utility: UtilityBase,
    coefficient: Callable[[int, int], float] | None = None,
)

Bases: EvaluationStrategy[LOOSampler, ValueUpdate]

Computes marginal values for LOO.
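For a sample \((i, D_{-i})\), the update is the leave-one-out marginal \(U(D) - U(D_{-i})\). This is why the total utility \(U(D)\) is evaluated once at construction (see the source below) and reused for every index.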

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    sampler: LOOSampler,
    utility: UtilityBase,
    coefficient: Callable[[int, int], float] | None = None,
):
    super().__init__(sampler, utility, coefficient)
    assert utility.training_data is not None
    self.total_utility = utility(Sample(None, utility.training_data.indices))

DeterministicUniformSampler

DeterministicUniformSampler(
    index_iteration: Type[
        SequentialIndexIteration | NoIndexIteration
    ] = SequentialIndexIteration,
    batch_size: int = 1,
)

Bases: PowersetSampler

An iterator to perform uniform deterministic sampling of subsets.

For every index \(i\), each subset of the complement indices - {i} is returned.

Note

Outer indices are iterated over sequentially.

Example
>>> sampler = DeterministicUniformSampler()
>>> for idx, s in sampler.generate_batches(np.arange(2)):
>>>    print(f"{idx} - {s}", end=", ")
0 - [], 0 - [1], 1 - [], 1 - [0],
Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    index_iteration: Type[
        SequentialIndexIteration | NoIndexIteration
    ] = SequentialIndexIteration,
    batch_size: int = 1,
):
    super().__init__(index_iteration=index_iteration, batch_size=batch_size)

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

UniformSampler

UniformSampler(
    batch_size: int = 1,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
    seed: Seed | None = None,
)

Bases: StochasticSamplerMixin, PowersetSampler

An iterator to perform uniform random sampling of subsets.

For every index \(i\), iterated either sequentially or at random depending on the value of index_iteration, one subset of the complement indices - {i} is sampled, each subset having equal probability \(2^{-(n-1)}\). The iterator never ends.

Example

The code

for idx, s in UniformSampler().generate_batches(np.arange(5)):
   print(f"{idx} - {s}", end=", ")
Produces the output:
0 - [1 4], 1 - [2 3], 2 - [0 1 3], 3 - [], 4 - [2], 0 - [1 3 4], 1 - [0 2]
(...)

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    batch_size: int = 1,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
    seed: Seed | None = None,
):
    super().__init__(
        batch_size=batch_size, index_iteration=index_iteration, seed=seed
    )

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

OwenSampler

OwenSampler(
    n_samples_outer: int,
    n_samples_inner: int = 2,
    batch_size: int = 1,
    seed: Seed | None = None,
)

Bases: StochasticSamplerMixin, PowersetSampler

A sampler for Owen Shapley values.

For each index \(i\) the Owen sampler loops over a deterministic grid of probabilities (containing n_samples_outer entries between 0 and 1) and then draws n_samples_inner subsets of the complement of the current index where each element is sampled with the given probability.

The total number of samples drawn is therefore n_samples_outer * n_samples_inner.
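A minimal NumPy sketch of this scheme for a single index (illustrative only, not pyDVL's implementation; the complement array and the marginal evaluation are placeholders):

import numpy as np

rng = np.random.default_rng(42)
complement = np.array([1, 2, 3, 4])                # indices other than the current one
n_samples_outer, n_samples_inner = 4, 2

for q in np.linspace(0.0, 1.0, n_samples_outer):   # deterministic probability grid
    for _ in range(n_samples_inner):
        mask = rng.uniform(size=len(complement)) < q   # include each element with prob. q
        subset = complement[mask]
        # ... evaluate the marginal contribution of the current index on `subset`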

PARAMETER DESCRIPTION
n_samples_outer

The number of entries in the probability grid used for the outer loop in Owen sampling.

TYPE: int

n_samples_inner

The number of samples drawn for each probability. In the original paper this was fixed to 2 for all experiments which is why we give it a default value of 2.

TYPE: int DEFAULT: 2

batch_size

The batch size of the sampler.

TYPE: int DEFAULT: 1

seed

The seed for the random number generator.

TYPE: Seed | None DEFAULT: None

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    n_samples_outer: int,
    n_samples_inner: int = 2,
    batch_size: int = 1,
    seed: Seed | None = None,
):
    super().__init__(
        batch_size=batch_size, index_iteration=SequentialIndexIteration, seed=seed
    )
    self._n_samples_inner = n_samples_inner
    self._n_samples_outer = n_samples_outer
    self._q_stop = 1.0

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

AntitheticOwenSampler

AntitheticOwenSampler(
    n_samples_outer: int,
    n_samples_inner: int = 2,
    batch_size: int = 1,
    seed: Seed | None = None,
)

Bases: OwenSampler

A sampler for antithetic Owen Shapley values.

For each index \(i\), the antithetic Owen sampler loops over a deterministic grid of probabilities (containing n_samples_outer entries between 0 and 0.5) and then draws n_samples_inner subsets of the complement of the current index where each element is sampled with the given probability. For each sample obtained that way, a second sample is generated by taking the complement of the first sample.

The total number of samples drawn is therefore 2 * n_samples_outer * n_samples_inner.

For the same total number of samples, the antithetic Owen sampler usually yields more precise estimates of Shapley values than the regular Owen sampler.
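Sketched as a variation of the Owen loop (again illustrative only): the probability grid stops at 0.5 and every draw is paired with its complement.

import numpy as np

rng = np.random.default_rng(42)
complement = np.array([1, 2, 3, 4])
n_samples_outer, n_samples_inner = 4, 2

for q in np.linspace(0.0, 0.5, n_samples_outer):   # grid only covers [0, 0.5]
    for _ in range(n_samples_inner):
        mask = rng.uniform(size=len(complement)) < q
        subset, antithetic = complement[mask], complement[~mask]  # antithetic pair
        # ... evaluate the marginal contribution on both `subset` and `antithetic`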

PARAMETER DESCRIPTION
n_samples_outer

The number of entries in the probability grid used for the outer loop in Owen sampling.

TYPE: int

n_samples_inner

The number of samples drawn for each probability. In the original paper this was fixed to 2 for all experiments which is why we give it a default value of 2.

TYPE: int DEFAULT: 2

batch_size

The batch size of the sampler.

TYPE: int DEFAULT: 1

seed

The seed for the random number generator.

TYPE: Seed | None DEFAULT: None

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    n_samples_outer: int,
    n_samples_inner: int = 2,
    batch_size: int = 1,
    seed: Seed | None = None,
):
    super().__init__(
        n_samples_outer=n_samples_outer,
        n_samples_inner=n_samples_inner,
        batch_size=batch_size,
        seed=seed,
    )
    self._q_stop = 0.5

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

AntitheticSampler

AntitheticSampler(*args, seed: Seed | None = None, **kwargs)

Bases: StochasticSamplerMixin, PowersetSampler

An iterator to perform uniform random sampling of subsets and their complements.

Works as UniformSampler, but for every tuple \((i,S)\), it subsequently returns \((i,S^c)\), where \(S^c\) is the complement of the set \(S\) in the set of indices, excluding \(i\).
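For example, with indices \(\{0, \ldots, 4\}\), after returning the sample \((1, \{2, 4\})\) the sampler returns \((1, \{0, 3\})\).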

Source code in src/pydvl/valuation/samplers/utils.py
def __init__(self, *args, seed: Seed | None = None, **kwargs):
    super().__init__(*args, **kwargs)
    self._rng = np.random.default_rng(seed)

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

UniformStratifiedSampler

UniformStratifiedSampler(*args, seed: Seed | None = None, **kwargs)

Bases: StochasticSamplerMixin, PowersetSampler

For every index, sample a set size, then a set of that size.
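A minimal sketch of one draw (illustrative only, not pyDVL's implementation):

import numpy as np

rng = np.random.default_rng(42)
complement = np.array([1, 2, 3, 4])          # indices other than the current one
k = rng.integers(0, len(complement) + 1)     # first sample a set size...
subset = rng.choice(complement, size=k, replace=False)  # ...then a set of that size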

Source code in src/pydvl/valuation/samplers/utils.py
def __init__(self, *args, seed: Seed | None = None, **kwargs):
    super().__init__(*args, **kwargs)
    self._rng = np.random.default_rng(seed)

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

TruncatedUniformStratifiedSampler

TruncatedUniformStratifiedSampler(
    *,
    lower_bound: int,
    upper_bound: int,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
    seed: Seed | None = None
)

Bases: UniformStratifiedSampler

A sampler which samples set sizes between two bounds.

This sampler was suggested in (Watson et al. 2023)2 for \(\delta\)-Shapley.

New in version 0.10.0
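Usage sketch, restricting draws to set sizes within the given bounds (illustrative only):

sampler = TruncatedUniformStratifiedSampler(lower_bound=1, upper_bound=3, seed=42)
# only set sizes within the given bounds are drawn for each index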

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    *,
    lower_bound: int,
    upper_bound: int,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
    seed: Seed | None = None,
):
    super().__init__(index_iteration=index_iteration, seed=seed)
    self.lower_bound = lower_bound
    self.upper_bound = upper_bound

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

VarianceReducedStratifiedSampler

VarianceReducedStratifiedSampler(
    samples_per_setsize: Callable[[int], int],
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
)

Bases: StochasticSamplerMixin, PowersetSampler

VRDS sampler.

This sampler was suggested in (Wu et al. 2023)3 as a generalization of the stratified sampler in (Maleki et al. 2014)4.

PARAMETER DESCRIPTION
samples_per_setsize

A function which returns the number of samples to take for a given set size.

TYPE: Callable[[int], int]

index_iteration

the order in which indices are iterated over

TYPE: Type[IndexIteration] DEFAULT: SequentialIndexIteration

New in version 0.10.0
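A usage sketch with a hypothetical allocation function; the constant budget below is an arbitrary choice for illustration, see (Wu et al. 2023)3 for principled choices:

def samples_per_setsize(k: int) -> int:
    # hypothetical: a fixed number of subsets for every set size k
    return 16

sampler = VarianceReducedStratifiedSampler(samples_per_setsize)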

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    samples_per_setsize: Callable[[int], int],
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
):
    super().__init__(index_iteration=index_iteration)
    self.samples_per_setsize = samples_per_setsize
    # HACK: closure around the argument to avoid weight() being an instance method
    # FIXME: is this the correct weight anyway?
    self.weight = lambda n, subset_len: samples_per_setsize(subset_len)  # type: ignore

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

complement

complement(include: IndexSetT, exclude: Iterable[IndexT]) -> NDArray[IndexT]

Returns the complement of the set of indices excluding the given indices.

PARAMETER DESCRIPTION
include

The set of indices to consider.

TYPE: IndexSetT

exclude

The indices to exclude from the complement.

TYPE: Iterable[IndexT]

RETURNS DESCRIPTION
NDArray[IndexT]

The complement of the set of indices excluding the given indices.

Source code in src/pydvl/valuation/samplers/powerset.py
def complement(include: IndexSetT, exclude: Iterable[IndexT]) -> NDArray[IndexT]:
    """Returns the complement of the set of indices excluding the given
    indices.

    Args:
        include: The set of indices to consider.
        exclude: The indices to exclude from the complement.

    Returns:
        The complement of the set of indices excluding the given indices.
    """
    _exclude = [i for i in exclude if i is not None]
    return np.setxor1d(include, _exclude).astype(np.int_)
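For example:

>>> complement(np.arange(5), [1, 3, None])
array([0, 2, 4])

None entries in exclude are ignored, e.g. the None index produced when using NoIndexIteration.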