
pydvl.valuation.samplers.powerset

Powerset samplers.

TODO: explain the formulation and the different samplers.

Stochastic samplers

...

References


  1. Mitchell, Rory, Joshua Cooper, Eibe Frank, and Geoffrey Holmes. Sampling Permutations for Shapley Value Estimation. Journal of Machine Learning Research 23, no. 43 (2022): 1–46. 

  2. Watson, Lauren, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, and Rik Sarkar. Accelerated Shapley Value Approximation for Data Evaluation. arXiv, 9 November 2023. 

  3. Wu, Mengmeng, Ruoxi Jia, Changle Lin, Wei Huang, and Xiangyu Chang. Variance Reduced Shapley Value Estimation for Trustworthy Data Valuation. Computers & Operations Research 159 (1 November 2023): 106305. 

  4. Maleki, Sasan, Long Tran-Thanh, Greg Hines, Talal Rahwan, and Alex Rogers. Bounding the Estimation Error of Sampling-Based Shapley Value Approximation. arXiv:1306.4265 [Cs], 12 February 2014. 

PowersetSampler

PowersetSampler(
    batch_size: int = 1,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
)

Bases: IndexSampler, ABC

An abstract class for samplers which iterate over the powerset of the complement of an index in the training set.

This is done in two nested loops, where the outer loop iterates over the set of indices, and the inner loop iterates over subsets of the complement of the current index. The outer iteration can be either sequential or at random.

PARAMETER DESCRIPTION
batch_size

The number of samples to generate per batch. Batches are processed together by [UtilityEvaluator][pydvl.valuation.utility.evaluator.UtilityEvaluator].

TYPE: int DEFAULT: 1

index_iteration

The order in which indices are iterated over.

TYPE: Type[IndexIteration] DEFAULT: SequentialIndexIteration
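A minimal sketch of the two nested loops described above. This is illustrative only, not pyDVL's actual implementation; the powerset helper and the plain (idx, subset) tuples are stand-ins for the module's internals.

from itertools import chain, combinations

import numpy as np

def powerset(s):
    # all subsets of s, from the empty set to s itself
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def naive_powerset_sampler(indices):
    for idx in indices:                          # outer loop: one index at a time
        others = np.setdiff1d(indices, [idx])    # complement of the current index
        for subset in powerset(others):          # inner loop: subsets of the complement
            yield idx, np.array(subset, dtype=int)

for idx, subset in naive_powerset_sampler(np.arange(2)):
    print(f"{idx} - {subset}", end=", ")
# 0 - [], 0 - [1], 1 - [], 1 - [0],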
Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    batch_size: int = 1,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
):
    """
    Args:
        batch_size: The number of samples to generate per batch. Batches are
            processed together by
            [UtilityEvaluator][pydvl.valuation.utility.evaluator.UtilityEvaluator].
        index_iteration: the order in which indices are iterated over
    """
    super().__init__(batch_size)
    self._index_iteration = index_iteration

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break
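A usage sketch with a concrete subclass (DeterministicUniformSampler, defined further below), assuming samples expose the idx and subset fields of pyDVL's Sample; illustrative only:

sampler = DeterministicUniformSampler(batch_size=2)
for batch in sampler.generate_batches(np.arange(3)):
    # each batch is a list of at most batch_size samples
    for sample in batch:
        print(sample.idx, sample.subset)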

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.
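Concretely, writing \(\Delta_i(S) = U(S \cup \{i\}) - U(S)\) for the marginal contribution of index \(i\), each subset \(S \subseteq N_{-i}\) is drawn with probability \(2^{-(n-1)}\), so that

\[ \mathbb{E}_{S \sim \mathcal{U}(2^{N_{-i}})}\left[2^{n-1}\, \Delta_i(S)\right] = \sum_{S \subseteq N_{-i}} \Delta_i(S). \]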

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

LOOSampler

LOOSampler(batch_size: int = 1)

Bases: IndexSampler

Leave-One-Out sampler. For every index \(i\) in the set \(S\), the sample \((i, S_{-i})\) is returned.
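For example, for indices \(\{0, 1, 2\}\) the samples are \((0, \{1, 2\})\), \((1, \{0, 2\})\) and \((2, \{0, 1\})\): one leave-one-out evaluation per index.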

New in version 0.10.0

PARAMETER DESCRIPTION
batch_size

The number of samples to generate per batch. Batches are processed by the [EvaluationStrategy][pydvl.valuation.samplers.base.EvaluationStrategy].

TYPE: int DEFAULT: 1
Source code in src/pydvl/valuation/samplers/base.py
def __init__(self, batch_size: int = 1):
    """
    Args:
        batch_size: The number of samples to generate per batch. Batches are
            processed by the
            [EvaluationStrategy][pydvl.valuation.samplers.base.EvaluationStrategy]
    """
    self._batch_size = batch_size
    self._n_samples = 0
    self._interrupted = False

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

LOOEvaluationStrategy

LOOEvaluationStrategy(
    sampler: LOOSampler,
    utility: UtilityBase,
    coefficient: Callable[[int, int], float] | None = None,
)

Bases: EvaluationStrategy[LOOSampler, ValueUpdate]

Computes marginal values for LOO.
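For a sample \((i, D_{-i})\), the update is the leave-one-out marginal \(U(D) - U(D_{-i})\). This is why the total utility \(U(D)\) is evaluated once at construction (see the source below) and reused for every index.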

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    sampler: LOOSampler,
    utility: UtilityBase,
    coefficient: Callable[[int, int], float] | None = None,
):
    super().__init__(sampler, utility, coefficient)
    assert utility.training_data is not None
    self.total_utility = utility(Sample(None, utility.training_data.indices))

DeterministicUniformSampler

DeterministicUniformSampler(
    index_iteration: Type[
        SequentialIndexIteration | NoIndexIteration
    ] = SequentialIndexIteration,
    batch_size: int = 1,
)

Bases: PowersetSampler

An iterator to perform uniform deterministic sampling of subsets.

For every index \(i\), each subset of the complement indices - {i} is returned.

Note

Outer indices are iterated over sequentially.

Example
>>> sampler = DeterministicUniformSampler()
>>> for idx, s in sampler.generate_batches(np.arange(2)):
>>>    print(f"{idx} - {s}", end=", ")
0 - [], 0 - [1], 1 - [], 1 - [0],
Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    index_iteration: Type[
        SequentialIndexIteration | NoIndexIteration
    ] = SequentialIndexIteration,
    batch_size: int = 1,
):
    super().__init__(index_iteration=index_iteration, batch_size=batch_size)

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

UniformSampler

UniformSampler(
    batch_size: int = 1,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
    seed: Seed | None = None,
)

Bases: StochasticSamplerMixin, PowersetSampler

An iterator to perform uniform random sampling of subsets.

For every index \(i\), iterated either sequentially or at random depending on the value of index_iteration, one subset of the complement indices - {i} is sampled, each subset having equal probability \(2^{-(n-1)}\). The iterator never ends.

Example

The code

for idx, s in UniformSampler().generate_batches(np.arange(5)):
   print(f"{idx} - {s}", end=", ")
Produces the output:
0 - [1 4], 1 - [2 3], 2 - [0 1 3], 3 - [], 4 - [2], 0 - [1 3 4], 1 - [0 2]
(...)

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    batch_size: int = 1,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
    seed: Seed | None = None,
):
    super().__init__(
        batch_size=batch_size, index_iteration=index_iteration, seed=seed
    )

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

OwenSampler

OwenSampler(
    n_samples_outer: int,
    n_samples_inner: int = 2,
    batch_size: int = 1,
    seed: Seed | None = None,
)

Bases: StochasticSamplerMixin, PowersetSampler

A sampler for Owen Shapley values.

For each index \(i\) the Owen sampler loops over a deterministic grid of probabilities (containing n_samples_outer entries between 0 and 1) and then draws n_samples_inner subsets of the complement of the current index where each element is sampled with the given probability.

The total number of samples drawn is therefore n_samples_outer * n_samples_inner.
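A minimal NumPy sketch of this scheme for a single index (illustrative only, not pyDVL's implementation; the complement array and the marginal evaluation are placeholders):

import numpy as np

rng = np.random.default_rng(42)
complement = np.array([1, 2, 3, 4])                # indices other than the current one
n_samples_outer, n_samples_inner = 4, 2

for q in np.linspace(0.0, 1.0, n_samples_outer):   # deterministic probability grid
    for _ in range(n_samples_inner):
        mask = rng.uniform(size=len(complement)) < q   # include each element with prob. q
        subset = complement[mask]
        # ... evaluate the marginal contribution of the current index on `subset`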

PARAMETER DESCRIPTION
n_samples_outer

The number of entries in the probability grid used for the outer loop in Owen sampling.

TYPE: int

n_samples_inner

The number of samples drawn for each probability. In the original paper this was fixed to 2 for all experiments which is why we give it a default value of 2.

TYPE: int DEFAULT: 2

batch_size

The batch size of the sampler.

TYPE: int DEFAULT: 1

seed

The seed for the random number generator.

TYPE: Seed | None DEFAULT: None

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    n_samples_outer: int,
    n_samples_inner: int = 2,
    batch_size: int = 1,
    seed: Seed | None = None,
):
    super().__init__(
        batch_size=batch_size, index_iteration=SequentialIndexIteration, seed=seed
    )
    self._n_samples_inner = n_samples_inner
    self._n_samples_outer = n_samples_outer
    self._q_stop = 1.0

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

AntitheticOwenSampler

AntitheticOwenSampler(
    n_samples_outer: int,
    n_samples_inner: int = 2,
    batch_size: int = 1,
    seed: Seed | None = None,
)

Bases: OwenSampler

A sampler for antithetic Owen Shapley values.

For each index \(i\), the antithetic Owen sampler loops over a deterministic grid of probabilities (containing n_samples_outer entries between 0 and 0.5) and then draws n_samples_inner subsets of the complement of the current index where each element is sampled with the given probability. For each sample obtained that way, a second sample is generated by taking the complement of the first sample.

The total number of samples drawn is therefore 2 * n_samples_outer * n_samples_inner.

For the same total number of samples, the antithetic Owen sampler usually yields more precise estimates of Shapley values than the regular Owen sampler.
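Sketched as a variation of the Owen loop (again illustrative only): the probability grid stops at 0.5 and every draw is paired with its complement.

import numpy as np

rng = np.random.default_rng(42)
complement = np.array([1, 2, 3, 4])
n_samples_outer, n_samples_inner = 4, 2

for q in np.linspace(0.0, 0.5, n_samples_outer):   # grid only covers [0, 0.5]
    for _ in range(n_samples_inner):
        mask = rng.uniform(size=len(complement)) < q
        subset, antithetic = complement[mask], complement[~mask]  # antithetic pair
        # ... evaluate the marginal contribution on both `subset` and `antithetic`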

PARAMETER DESCRIPTION
n_samples_outer

The number of entries in the probability grid used for the outer loop in Owen sampling.

TYPE: int

n_samples_inner

The number of samples drawn for each probability. In the original paper this was fixed to 2 for all experiments which is why we give it a default value of 2.

TYPE: int DEFAULT: 2

batch_size

The batch size of the sampler.

TYPE: int DEFAULT: 1

seed

The seed for the random number generator.

TYPE: Seed | None DEFAULT: None

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    n_samples_outer: int,
    n_samples_inner: int = 2,
    batch_size: int = 1,
    seed: Seed | None = None,
):
    super().__init__(
        n_samples_outer=n_samples_outer,
        n_samples_inner=n_samples_inner,
        batch_size=batch_size,
        seed=seed,
    )
    self._q_stop = 0.5

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

AntitheticSampler

AntitheticSampler(*args, seed: Seed | None = None, **kwargs)

Bases: StochasticSamplerMixin, PowersetSampler

An iterator to perform uniform random sampling of subsets and their complements.

Works as UniformSampler, but for every tuple \((i,S)\), it subsequently returns \((i,S^c)\), where \(S^c\) is the complement of the set \(S\) in the set of indices, excluding \(i\).
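For example, with indices \(\{0, \ldots, 4\}\), after returning the sample \((1, \{2, 4\})\) the sampler returns \((1, \{0, 3\})\).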

Source code in src/pydvl/valuation/samplers/utils.py
def __init__(self, *args, seed: Seed | None = None, **kwargs):
    super().__init__(*args, **kwargs)
    self._rng = np.random.default_rng(seed)

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

UniformStratifiedSampler

UniformStratifiedSampler(*args, seed: Seed | None = None, **kwargs)

Bases: StochasticSamplerMixin, PowersetSampler

For every index, sample a set size, then a set of that size.
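A minimal sketch of one draw (illustrative only, not pyDVL's implementation):

import numpy as np

rng = np.random.default_rng(42)
complement = np.array([1, 2, 3, 4])          # indices other than the current one
k = rng.integers(0, len(complement) + 1)     # first sample a set size...
subset = rng.choice(complement, size=k, replace=False)  # ...then a set of that size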

Source code in src/pydvl/valuation/samplers/utils.py
def __init__(self, *args, seed: Seed | None = None, **kwargs):
    super().__init__(*args, **kwargs)
    self._rng = np.random.default_rng(seed)

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

TruncatedUniformStratifiedSampler

TruncatedUniformStratifiedSampler(
    *,
    lower_bound: int,
    upper_bound: int,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
    seed: Seed | None = None
)

Bases: UniformStratifiedSampler

A sampler which samples set sizes between two bounds.

This sampler was suggested in (Watson et al. 2023)2 for \(\delta\)-Shapley.

New in version 0.10.0
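Usage sketch, restricting draws to set sizes within the given bounds (illustrative only):

sampler = TruncatedUniformStratifiedSampler(lower_bound=1, upper_bound=3, seed=42)
# only set sizes within the given bounds are drawn for each index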

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    *,
    lower_bound: int,
    upper_bound: int,
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
    seed: Seed | None = None,
):
    super().__init__(index_iteration=index_iteration, seed=seed)
    self.lower_bound = lower_bound
    self.upper_bound = upper_bound

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

weight staticmethod

weight(n: int, subset_len: int) -> float

Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with n-1 elements puts mass 1/2^{n-1} on each subset, hence the correction factor 2^{n-1}.

Source code in src/pydvl/valuation/samplers/powerset.py
@staticmethod
def weight(n: int, subset_len: int) -> float:
    """Correction coming from Monte Carlo integration so that the mean of
    the marginals converges to the value: the uniform distribution over the
    powerset of a set with n-1 elements has mass 2^{n-1} over each subset."""
    return float(2 ** (n - 1)) if n > 0 else 1.0

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

VarianceReducedStratifiedSampler

VarianceReducedStratifiedSampler(
    samples_per_setsize: Callable[[int], int],
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
)

Bases: StochasticSamplerMixin, PowersetSampler

VRDS sampler.

This sampler was suggested in (Wu et al. 2023)3 as a generalization of the stratified sampler in (Maleki et al. 2014)4.

PARAMETER DESCRIPTION
samples_per_setsize

A function which returns the number of samples to take for a given set size.

TYPE: Callable[[int], int]

index_iteration

the order in which indices are iterated over

TYPE: Type[IndexIteration] DEFAULT: SequentialIndexIteration

New in version 0.10.0
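A usage sketch with a hypothetical allocation function; the constant budget below is an arbitrary choice for illustration, see (Wu et al. 2023)3 for principled choices:

def samples_per_setsize(k: int) -> int:
    # hypothetical: a fixed number of subsets for every set size k
    return 16

sampler = VarianceReducedStratifiedSampler(samples_per_setsize)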

Source code in src/pydvl/valuation/samplers/powerset.py
def __init__(
    self,
    samples_per_setsize: Callable[[int], int],
    index_iteration: Type[IndexIteration] = SequentialIndexIteration,
):
    super().__init__(index_iteration=index_iteration)
    self.samples_per_setsize = samples_per_setsize
    # HACK: closure around the argument to avoid weight() being an instance method
    # FIXME: is this the correct weight anyway?
    self.weight = lambda n, subset_len: samples_per_setsize(subset_len)  # type: ignore

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside generator function, `return` acts like a `break`, which produces an
    # empty generator function. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out

index_iterator

index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]

Iterates over indices with the method specified at construction.

Source code in src/pydvl/valuation/samplers/powerset.py
def index_iterator(
    self, indices: IndexSetT
) -> Generator[IndexT | None, None, None]:
    """Iterates over indices with the method specified at construction."""
    if issubclass(self._index_iteration, StochasticSamplerMixin):
        # To-Do: Need to do something more elegant here
        seed = self._rng.integers(0, 2**32, dtype=np.uint32).item()  # type: ignore
        yield from self._index_iteration(indices, seed)  # type: ignore
    else:
        yield from self._index_iteration(indices)

complement

complement(include: IndexSetT, exclude: Iterable[IndexT]) -> NDArray[IndexT]

Returns the complement of the set of indices excluding the given indices.

PARAMETER DESCRIPTION
include

The set of indices to consider.

TYPE: IndexSetT

exclude

The indices to exclude from the complement.

TYPE: Iterable[IndexT]

RETURNS DESCRIPTION
NDArray[IndexT]

The complement of the set of indices excluding the given indices.

Source code in src/pydvl/valuation/samplers/powerset.py
def complement(include: IndexSetT, exclude: Iterable[IndexT]) -> NDArray[IndexT]:
    """Returns the complement of the set of indices excluding the given
    indices.

    Args:
        include: The set of indices to consider.
        exclude: The indices to exclude from the complement.

    Returns:
        The complement of the set of indices excluding the given indices.
    """
    _exclude = [i for i in exclude if i is not None]
    return np.setxor1d(include, _exclude).astype(np.int_)
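For example:

>>> complement(np.arange(5), [1, 3, None])
array([0, 2, 4])

None entries in exclude are ignored, e.g. the None index produced when using NoIndexIteration.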