pydvl.valuation.samplers.powerset
¶
This module provides the base implementation for powerset samplers.
These samplers operate in two loops:
- Outer iteration over all indices. This is configurable with subclasses of IndexIteration. At each step we fix an index \(i \in N\).
- Inner iteration over subsets of \(N_{-i}\). This step can return one or more subsets, sampled in different ways: uniformly, with varying probabilities, in tuples of complementary sets, etc.
Index iteration¶
The type of iteration over indices \(i\) and their complements is configured upon construction of the sampler with the classes SequentialIndexIteration, RandomIndexIteration, or their finite counterparts, when each index must be visited just once (albeit possibly generating many samples per index).
However, some valuation schemes require iteration over subsets of the whole set (as opposed to iterating over complements of individual indices). For this purpose, one can use NoIndexIteration or its finite counterpart.
See also
General information on semi-values and, an explanation of importance sampling in the context of semi-values.
References¶
-
Mitchell, Rory, Joshua Cooper, Eibe Frank, and Geoffrey Holmes. Sampling Permutations for Shapley Value Estimation. Journal of Machine Learning Research 23, no. 43 (2022): 1–46. ↩
-
Maleki, Sasan, Long Tran-Thanh, Greg Hines, Talal Rahwan, and Alex Rogers. Bounding the Estimation Error of Sampling-Based Shapley Value Approximation. arXiv:1306.4265 [Cs], 12 February 2014. ↩
AntitheticSampler
¶
Bases: StochasticSamplerMixin
, PowersetSampler
A sampler that draws samples uniformly, followed by their complements.
Works as UniformSampler, but for every tuple \((i,S)\), it subsequently returns \((i,S^c)\), where \(S^c\) is the complement of the set \(S\) in the set of indices, excluding \(i\).
By symmetry, the probability of sampling a set \(S\) is the same as the probability of
sampling its complement \(S^c\), so that \(p(S)\) in log_weight
is the same as in the
PowersetSampler class.
PARAMETER | DESCRIPTION |
---|---|
batch_size
|
The number of samples to generate per batch. Batches are
processed together by
|
index_iteration
|
the strategy to use for iterating over indices to update.
|
seed
|
Seed for the random number generator. Passed to numpy.random.default_rng.
TYPE:
|
Source code in src/pydvl/valuation/samplers/utils.py
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
index_iterable
¶
index_iterable(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
interrupt
¶
log_weight
¶
Probability of sampling a set S as a function of total number of indices and set size.
The uniform distribution over the powerset of a set with \(n\) elements has mass \(1/2^{n}\) over each subset.
PARAMETER | DESCRIPTION |
---|---|
n
|
The size of the index set. Note that the actual size of the set being sampled will often be n-1, as one index might be removed from the set. See IndexIteration for more.
TYPE:
|
subset_len
|
The size of the subset being sampled
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
float
|
The natural logarithm of the probability of sampling a set of the given
size, when the index set has size |
Source code in src/pydvl/valuation/samplers/powerset.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
DeterministicUniformSampler
¶
DeterministicUniformSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = FiniteSequentialIndexIteration,
)
Bases: PowersetSampler
An iterator to perform uniform deterministic sampling of subsets.
For every index \(i\), each subset of the complement indices - {i}
is
returned.
PARAMETER | DESCRIPTION |
---|---|
batch_size
|
The number of samples to generate per batch. Batches are processed together by each subprocess when working in parallel.
TYPE:
|
index_iteration
|
the strategy to use for iterating over indices to update. This iteration can be either finite or infinite.
TYPE:
|
Example
The code:
from pydvl.valuation.samplers import DeterministicUniformSampler
import numpy as np
sampler = DeterministicUniformSampler()
for idx, s in sampler.generate_batches(np.arange(2)):
print(f"{idx} - {s}", end=", ")
Should produce the output:
Source code in src/pydvl/valuation/samplers/powerset.py
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
index_iterable
¶
index_iterable(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
interrupt
¶
log_weight
¶
Probability of sampling a set S as a function of total number of indices and set size.
The uniform distribution over the powerset of a set with \(n\) elements has mass \(1/2^{n}\) over each subset.
PARAMETER | DESCRIPTION |
---|---|
n
|
The size of the index set. Note that the actual size of the set being sampled will often be n-1, as one index might be removed from the set. See IndexIteration for more.
TYPE:
|
subset_len
|
The size of the subset being sampled
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
float
|
The natural logarithm of the probability of sampling a set of the given
size, when the index set has size |
Source code in src/pydvl/valuation/samplers/powerset.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
FiniteIterationMixin
¶
Careful with MRO when using this and subclassing!
FiniteNoIndexIteration
¶
Bases: FiniteIterationMixin
, NoIndexIteration
A finite iteration over no indices.
The iterator will yield None
once and then stop.
Source code in src/pydvl/valuation/samplers/powerset.py
length
staticmethod
¶
FiniteRandomIndexIteration
¶
FiniteRandomIndexIteration(indices: NDArray[IndexT], seed: Seed)
Bases: FiniteIterationMixin
, RandomIndexIteration
Samples indices at random, once
Source code in src/pydvl/valuation/samplers/powerset.py
FiniteSequentialIndexIteration
¶
Bases: FiniteIterationMixin
, SequentialIndexIteration
Samples indices sequentially, once.
Source code in src/pydvl/valuation/samplers/powerset.py
IndexIteration
¶
Bases: ABC
An index iteration defines the way in which the outer loop over indices in a PowersetSampler is done.
Iterations can be finite, infinite, at random, etc. Subclasses must implement certain methods to inform the samplers of the size of the complement, the number of iterations, etc.
Source code in src/pydvl/valuation/samplers/powerset.py
complement_size
abstractmethod
staticmethod
¶
Returns the size of complements of sets of size n, with respect to the indices returned by the iteration.
If the iteration returns single indices, then this is n-1, if it returns no indices, then it is n. If it returned tuples, then n-2, etc. Args: n: The size of the index set.
Source code in src/pydvl/valuation/samplers/powerset.py
length
abstractmethod
staticmethod
¶
Returns the length of the iteration over the index set
PARAMETER | DESCRIPTION |
---|---|
n_indices
|
The size of the index set.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
int | None
|
The length of the iteration. It can be:
- a non-negative integer, if the iteration is finite
- |
Source code in src/pydvl/valuation/samplers/powerset.py
InfiniteIterationMixin
¶
Careful with MRO when using this and subclassing!
LOOEvaluationStrategy
¶
LOOEvaluationStrategy(
utility: UtilityBase, coefficient: SemivalueCoefficient | None
)
Bases: PowersetEvaluationStrategy[LOOSampler]
Computes marginal values for LOO.
Upon construction, the total utility is computed once. Then, the utility for every sample processed in [process()] is subtracted from it and returned as value update.
PARAMETER | DESCRIPTION |
---|---|
utility
|
The utility function to use.
TYPE:
|
coefficient
|
The coefficient to use. If
TYPE:
|
Source code in src/pydvl/valuation/samplers/powerset.py
LOOSampler
¶
LOOSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = FiniteSequentialIndexIteration,
seed: Seed | None = None,
)
Bases: PowersetSampler
Leave-One-Out sampler.
In this special case of a powerset sampler, for every index \(i\) in the set \(S\), the sample \((i, S_{-i})\) is returned.
PARAMETER | DESCRIPTION |
---|---|
batch_size
|
The number of samples to generate per batch. Batches are processed together by each subprocess when working in parallel.
TYPE:
|
index_iteration
|
the strategy to use for iterating over indices to update. By default, a finite sequential index iteration is used, which is what LOOValuation expects.
TYPE:
|
seed
|
The seed for the random number generator used in case the index iteration is random.
TYPE:
|
New in version 0.10.0
Source code in src/pydvl/valuation/samplers/powerset.py
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
index_iterable
¶
index_iterable(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
interrupt
¶
log_weight
¶
This sampler returns only sets of size n-1. There are n such sets, so the probability of drawing one is 1/n, or 0 if subset_len != n-1.
Source code in src/pydvl/valuation/samplers/powerset.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
NoIndexIteration
¶
Bases: InfiniteIterationMixin
, IndexIteration
An infinite iteration over no indices.
Source code in src/pydvl/valuation/samplers/powerset.py
PowersetEvaluationStrategy
¶
PowersetEvaluationStrategy(
utility: UtilityBase, log_coefficient: SemivalueCoefficient | None
)
Bases: Generic[PowersetSamplerT]
, EvaluationStrategy[PowersetSamplerT, ValueUpdate]
The standard strategy for evaluating the utility of subsets of a set.
This strategy computes the marginal value of each subset of the complement of an index in the training set. The marginal value is the difference between the utility of the subset and the utility of the subset with the index added back in.
It is the standard strategy for the direct implementation of semi-values, when sampling is done over the powerset of the complement of an index.
Source code in src/pydvl/valuation/samplers/base.py
PowersetSampler
¶
PowersetSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = SequentialIndexIteration,
)
Bases: IndexSampler
, ABC
An abstract class for samplers which iterate over the powerset of the complement of an index in the training set.
This is done in two nested loops, where the outer loop iterates over the set
of indices, and the inner loop iterates over subsets of the complement of
the current index. The outer iteration can be either sequential or at random.
Args:
batch_size: The number of samples to generate per batch. Batches are
processed together by process()
in the evaluation strategy
PowersetEvaluationStrategy.
index_iteration: the strategy to use for iterating over indices to update.
PARAMETER | DESCRIPTION |
---|---|
batch_size
|
The number of samples to generate per batch. Batches are processed together by EvaluationStrategy.
TYPE:
|
index_iteration
|
the strategy to use for iterating over indices to update
TYPE:
|
Source code in src/pydvl/valuation/samplers/powerset.py
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
generate
abstractmethod
¶
Generates samples over the powerset of indices
Each PowersetSampler
defines its own way to generate the subsets by
implementing this method. The outer loop is handled by the
index_iterable()
method. Batching is handled by the
generate_batches()
method.
PARAMETER | DESCRIPTION |
---|---|
indices
|
The set from which to generate samples.
TYPE:
|
Returns:
A generator that yields samples over the powerset of indices
.
Source code in src/pydvl/valuation/samplers/powerset.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
index_iterable
¶
index_iterable(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
interrupt
¶
log_weight
¶
Probability of sampling a set S as a function of total number of indices and set size.
The uniform distribution over the powerset of a set with \(n\) elements has mass \(1/2^{n}\) over each subset.
PARAMETER | DESCRIPTION |
---|---|
n
|
The size of the index set. Note that the actual size of the set being sampled will often be n-1, as one index might be removed from the set. See IndexIteration for more.
TYPE:
|
subset_len
|
The size of the subset being sampled
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
float
|
The natural logarithm of the probability of sampling a set of the given
size, when the index set has size |
Source code in src/pydvl/valuation/samplers/powerset.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
sample_limit
abstractmethod
¶
sample_limit(indices: IndexSetT) -> int | None
Returns the number of samples that can be generated from the index set.
This will depend, among other things, on the type of IndexIteration.
PARAMETER | DESCRIPTION |
---|---|
indices
|
The set of indices to sample from.
TYPE:
|
Returns: The number of samples that can be generated from the index set.
Source code in src/pydvl/valuation/samplers/powerset.py
RandomIndexIteration
¶
RandomIndexIteration(indices: NDArray[IndexT], seed: Seed)
Bases: InfiniteIterationMixin
, StochasticSamplerMixin
, IndexIteration
Samples indices at random, indefinitely
Source code in src/pydvl/valuation/samplers/powerset.py
SequentialIndexIteration
¶
Bases: InfiniteIterationMixin
, IndexIteration
Samples indices sequentially, indefinitely.
Source code in src/pydvl/valuation/samplers/powerset.py
UniformSampler
¶
UniformSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = SequentialIndexIteration,
seed: Seed | None = None,
)
Bases: StochasticSamplerMixin
, PowersetSampler
Draws random samples uniformly from the powerset of the index set.
Iterating over every index \(i\), either in sequence or at random depending on
the value of index_iteration
, one subset of the complement
indices - {i}
is sampled with equal probability \(2^{n-1}\).
PARAMETER | DESCRIPTION |
---|---|
batch_size
|
The number of samples to generate per batch. Batches are processed together by each subprocess when working in parallel.
TYPE:
|
index_iteration
|
the strategy to use for iterating over indices to update.
TYPE:
|
seed
|
The seed for the random number generator.
TYPE:
|
Example
The code
Produces the output:Source code in src/pydvl/valuation/samplers/powerset.py
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
index_iterable
¶
index_iterable(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
interrupt
¶
log_weight
¶
Probability of sampling a set S as a function of total number of indices and set size.
The uniform distribution over the powerset of a set with \(n\) elements has mass \(1/2^{n}\) over each subset.
PARAMETER | DESCRIPTION |
---|---|
n
|
The size of the index set. Note that the actual size of the set being sampled will often be n-1, as one index might be removed from the set. See IndexIteration for more.
TYPE:
|
subset_len
|
The size of the subset being sampled
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
float
|
The natural logarithm of the probability of sampling a set of the given
size, when the index set has size |
Source code in src/pydvl/valuation/samplers/powerset.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update