pydvl.valuation.samplers.powerset
This module provides the base implementation for powerset samplers.
These samplers operate in two loops:
- Outer iteration over all indices. This is configurable with subclasses of IndexIteration. At each step we fix an index \(i \in N\).
- Inner iteration over subsets of \(N_{-i}\). This step can return one or more subsets, sampled in different ways: uniformly, with varying probabilities, in tuples of complementary sets, etc.
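This scheme follows the usual definition of semi-values as:
\[v_\text{semi}(i) = \sum_{k=0}^{n-1} w(k) \sum_{S \subseteq N_{-i},\ |S| = k} \left[U(S \cup \{i\}) - U(S)\right],\]
where \(n = |N|\), \(U\) is the utility and \(w(k)\) is the weight that the semi-value assigns to subsets of size \(k\); see semivalues for reference.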
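To make the formula concrete, the following brute-force sketch evaluates it directly by enumerating every subset of \(N_{-i}\). It is not pydvl code: the names exact_semivalue and shapley_weight are hypothetical, the Shapley weights \(w(k) = 1/(n\binom{n-1}{k})\) are just one example of a semi-value, and the cost is exponential in \(n\), so it only suits toy problems such as the additive utility at the end.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical utility type: maps a subset of indices to a score
Utility = Callable[[FrozenSet[int]], float]


def shapley_weight(n: int, k: int) -> float:
    """Shapley weight w(k) = 1 / (n * C(n-1, k)) for subsets of size k."""
    return 1.0 / (n * comb(n - 1, k))


def exact_semivalue(
    indices: Sequence[int],
    utility: Utility,
    weight: Callable[[int, int], float],
) -> Dict[int, float]:
    """Evaluate the semi-value formula by brute force over all subsets of N_{-i}."""
    n = len(indices)
    values: Dict[int, float] = {}
    for i in indices:
        complement = [j for j in indices if j != i]  # N_{-i}
        total = 0.0
        for k in range(n):  # subset sizes 0, ..., n-1
            for subset in combinations(complement, k):
                s = frozenset(subset)
                total += weight(n, k) * (utility(s | {i}) - utility(s))
        values[i] = total
    return values


# Additive toy utility: the Shapley value of each index is its own contribution
contributions = {0: 1.0, 1: 2.0, 2: 3.5}
print(exact_semivalue([0, 1, 2], lambda s: sum(contributions[j] for j in s), shapley_weight))
```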
IndexIteration
Bases: ABC
length (abstractmethod, staticmethod)
Returns the length of the iteration over the index set.
PARAMETER | DESCRIPTION |
---|---|
n_indices | The number of indices in the set. TYPE: `int` |

RETURNS | DESCRIPTION |
---|---|
`int \| None` | The length of the iteration: a non-negative integer if the iteration is finite, or `None` if it is infinite. |
complement_size (abstractmethod, staticmethod)
Returns the size of the complements of sets of size n, with respect to the indices returned by the iteration.
If the iteration returns single indices, this is n-1; if it returns no indices, it is n; if it returned pairs of indices, it would be n-2, and so on.
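As a rough illustration of the iteration strategies documented in this module, the sketch below models them as plain Python generators. The function names are hypothetical and pydvl's classes are not used; in particular, reading "at random, once" as a single pass over a random permutation is an assumption. With the index-based iterations every subset is drawn from the \(n-1\) remaining indices, whereas the no-index ones leave all \(n\) available, which is what complement_size reports.

```python
import itertools
from typing import Iterator, Sequence

import numpy as np


def sequential_indices(indices: Sequence[int], finite: bool) -> Iterator[int]:
    """Like (Finite)SequentialIndexIteration: go through the indices in order,
    once if finite, cycling forever otherwise."""
    yield from (indices if finite else itertools.cycle(indices))


def random_indices(indices: Sequence[int], seed: int, finite: bool) -> Iterator[int]:
    """Like (Finite)RandomIndexIteration. Assumption: the finite variant visits a
    random permutation once; the infinite one draws with replacement forever."""
    rng = np.random.default_rng(seed)
    if finite:
        yield from (int(i) for i in rng.permutation(indices))
        return
    while True:
        yield int(rng.choice(indices))


def no_indices(finite: bool) -> Iterator[None]:
    """Like (Finite)NoIndexIteration: yield None, once if finite, forever otherwise."""
    yield None
    while not finite:
        yield None


print(list(itertools.islice(sequential_indices([0, 1, 2], finite=False), 5)))
print(list(no_indices(finite=True)))
```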
SequentialIndexIteration
Bases: InfiniteIterationMixin, IndexIteration
Samples indices sequentially, indefinitely.
FiniteSequentialIndexIteration
Bases: FiniteIterationMixin, SequentialIndexIteration
Samples indices sequentially, once.
RandomIndexIteration
RandomIndexIteration(indices: NDArray[IndexT], seed: Seed)
Bases: InfiniteIterationMixin, StochasticSamplerMixin, IndexIteration
Samples indices at random, indefinitely.
FiniteRandomIndexIteration
FiniteRandomIndexIteration(indices: NDArray[IndexT], seed: Seed)
Bases: FiniteIterationMixin, RandomIndexIteration
Samples indices at random, once.
NoIndexIteration
Bases: InfiniteIterationMixin, IndexIteration
An infinite iteration over no indices.
FiniteNoIndexIteration
Bases: FiniteIterationMixin, NoIndexIteration
A finite iteration over no indices. The iterator yields None once and then stops.
length (staticmethod)
PowersetSampler
PowersetSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = SequentialIndexIteration,
)
Bases: IndexSampler, ABC
An abstract class for samplers which iterate over the powerset of the complement of an index in the training set.
This is done in two nested loops, where the outer loop iterates over the set of indices, and the inner loop iterates over subsets of the complement of the current index. The outer iteration can be either sequential or at random.
PARAMETER | DESCRIPTION |
---|---|
batch_size | The number of samples to generate per batch. Batches are processed together by UtilityEvaluator. TYPE: `int` |
index_iteration | The strategy to use for iterating over indices to update. TYPE: `Type[IndexIteration]` |
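The two nested loops can be sketched as follows, under simplifying assumptions: samples are plain (index, subset) tuples rather than pydvl's own sample type, the inner loop enumerates the whole powerset of the complement (as DeterministicUniformSampler does), and batching is done with itertools.islice. All names are hypothetical.

```python
from itertools import chain, combinations, islice
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical sample type: (index, subset of the complement of that index)
Sample = Tuple[int, Tuple[int, ...]]


def powerset(items: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """All subsets of items, ordered by size."""
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def generate(indices: Sequence[int], index_iterator: Iterable[int]) -> Iterator[Sample]:
    """Outer loop over the index iteration, inner loop over subsets of N_{-i}."""
    for idx in index_iterator:
        complement = [j for j in indices if j != idx]
        for subset in powerset(complement):
            yield idx, subset


def generate_batches(samples: Iterator[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Group consecutive samples into lists of at most batch_size elements."""
    while True:
        batch = list(islice(samples, batch_size))
        if not batch:
            return
        yield batch


for batch in generate_batches(generate([0, 1], index_iterator=[0, 1]), batch_size=3):
    print(batch)
# [(0, ()), (0, (1,)), (1, ())]
# [(1, (0,))]
```

Keeping generation lazy means that an infinite outer iteration still yields batches one at a time.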
interrupt
__len__
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError` | If the sampler is infinite or generate_batches has not been called yet. |
generate_batches
Batches the samples and yields them.
sample_limit (abstractmethod)
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
PARAMETER | DESCRIPTION |
---|---|
indices | The indices used in the sampler. TYPE: `IndexSetT` |

RETURNS | DESCRIPTION |
---|---|
`int \| None` | The maximum number of samples that will be generated, or `None` if the number of samples is infinite. |
result_updater
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result | The result to update. TYPE: `ValuationResult` |

RETURNS | DESCRIPTION |
---|---|
`ResultUpdater[ValueUpdateT]` | A callable object that updates the result with a value update. |
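The idea behind log-space bookkeeping can be illustrated with a weighted running mean and variance whose weights arrive as logarithms, for instance from log_weight. This is a minimal sketch, not pydvl's result updater: LogWeightedMoments and log_add_exp are hypothetical names, and the variance update is West's incremental algorithm for weighted samples.

```python
import math


def log_add_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class LogWeightedMoments:
    """Hypothetical running weighted mean and (population) variance,
    with sample weights supplied as logarithms."""

    def __init__(self) -> None:
        self.log_total_weight = -math.inf  # log of the sum of weights seen so far
        self.mean = 0.0
        self.variance = 0.0

    def update(self, value: float, log_weight: float) -> None:
        new_log_total = log_add_exp(self.log_total_weight, log_weight)
        # Share of the new sample in the total weight, w / (W + w), computed from
        # logs so that tiny or huge weights never have to be materialised
        frac = math.exp(log_weight - new_log_total)
        delta = value - self.mean
        self.mean += frac * delta
        # Incremental update of the weighted variance (West's algorithm)
        self.variance = (1.0 - frac) * self.variance + frac * delta * (value - self.mean)
        self.log_total_weight = new_log_total


moments = LogWeightedMoments()
for x in (1.0, 2.0, 3.0):
    moments.update(x, log_weight=-2000.0)  # exp(-2000.0) underflows to 0.0
print(moments.mean, moments.variance)  # close to 2.0 and 2/3
```

Because only the ratio \(w/(W+w)\) is ever exponentiated, the estimates stay finite even when every individual weight underflows.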
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
generate (abstractmethod)
Generates samples over the powerset of indices.
Each PowersetSampler defines its own way to generate the subsets by implementing this method. The outer loop is handled by the index_iterator. Batching is handled by the generate_batches method.
PARAMETER | DESCRIPTION |
---|---|
indices | The set from which to generate samples. |
log_weight
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements has mass \(1/2^{n-1}\) over each subset.
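Working with the logarithm of this mass matters for large \(n\): the probability \(1/2^{n-1}\) underflows in double precision long before its logarithm \(-(n-1)\log 2\) becomes problematic. A small sketch, where uniform_log_weight is a hypothetical helper rather than the method's actual signature:

```python
import math


def uniform_log_weight(n: int) -> float:
    """Log-probability of one particular subset of a set with n - 1 elements
    under uniform sampling: log(1 / 2^(n-1)) = -(n - 1) * log(2)."""
    return -(n - 1) * math.log(2)


n = 2000
print(2.0 ** -(n - 1))  # the probability itself underflows to 0.0
print(uniform_log_weight(n))  # its logarithm, roughly -1385.6, is perfectly representable
```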
PowersetEvaluationStrategy
PowersetEvaluationStrategy(
sampler: SamplerT,
utility: UtilityBase,
log_coefficient: Callable[[int, int], float] | None = None,
)
Bases: Generic[PowersetSamplerT], EvaluationStrategy[PowersetSamplerT, ValueUpdate]
The standard strategy for evaluating the utility of subsets of a set.
This strategy computes the marginal value of each subset of the complement of an index in the training set, that is, the difference between the utility of the subset with the index added back in and the utility of the subset without it.
It is the standard strategy for the direct implementation of semi-values, when sampling is done over the powerset of the complement of an index.
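A sketch of how a marginal could be turned into a value update, assuming (this page does not confirm it) that each marginal is scaled by the ratio of the semi-value coefficient to the sampling probability, both given in log space as suggested by the log_coefficient and log_weight callables. Shapley coefficients, a uniform sampler and all function names are illustrative choices.

```python
import math
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, Sequence

# Hypothetical utility type: maps a subset of indices to a score
Utility = Callable[[FrozenSet[int]], float]
LogFn = Callable[[int, int], float]  # (n, subset size) -> log value


def shapley_log_coefficient(n: int, k: int) -> float:
    """log of the Shapley weight 1 / (n * C(n-1, k))."""
    return -math.log(n) - math.log(math.comb(n - 1, k))


def uniform_log_weight(n: int, k: int) -> float:
    """log-probability of one subset of N_{-i} under uniform sampling."""
    return -(n - 1) * math.log(2)


def marginal_update(utility: Utility, i: int, subset: Iterable[int], n: int,
                    log_coefficient: LogFn, log_weight: LogFn) -> float:
    """Marginal U(S + i) - U(S), reweighted by coefficient / sampling probability."""
    s = frozenset(subset)
    marginal = utility(s | {i}) - utility(s)
    return marginal * math.exp(log_coefficient(n, len(s)) - log_weight(n, len(s)))


def estimate(utility: Utility, indices: Sequence[int], i: int,
             subsets: Iterable[Iterable[int]]) -> float:
    """Average of reweighted marginals over the given subsets of N_{-i}."""
    n = len(indices)
    updates = [marginal_update(utility, i, s, n, shapley_log_coefficient, uniform_log_weight)
               for s in subsets]
    return sum(updates) / len(updates)


# Feeding every subset of N_{-0} once recovers the exact Shapley value of
# index 0 in a 3-player majority game (1/3, up to rounding)
majority: Utility = lambda s: float(len(s) >= 2)
others = [1, 2]
all_subsets = chain.from_iterable(combinations(others, k) for k in range(len(others) + 1))
print(estimate(majority, [0, 1, 2], 0, all_subsets))
```

Averaging over randomly drawn subsets instead gives an unbiased Monte Carlo estimate of the same quantity; the exhaustive enumeration is only used here so that the result can be checked exactly.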
LOOSampler
LOOSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = FiniteSequentialIndexIteration,
seed: Seed | None = None,
)
Bases: PowersetSampler
Leave-One-Out sampler.
In this special case of a powerset sampler, for every index \(i\) in the set \(S\), the sample \((i, S_{-i})\) is returned.
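Conceptually, the sampler produces one sample per index, and each sample requires a single utility difference. A minimal sketch with frozensets, where loo_samples and loo_values are hypothetical names and the utility is any callable on subsets:

```python
from typing import Callable, Dict, FrozenSet, Iterator, Sequence, Tuple

# Hypothetical utility type: maps a subset of indices to a score
Utility = Callable[[FrozenSet[int]], float]


def loo_samples(indices: Sequence[int]) -> Iterator[Tuple[int, FrozenSet[int]]]:
    """For every index i, yield the single sample (i, S_{-i})."""
    full = frozenset(indices)
    for i in indices:
        yield i, full - {i}


def loo_values(indices: Sequence[int], utility: Utility) -> Dict[int, float]:
    """Leave-one-out value of i: U(S) - U(S_{-i}), one utility difference per sample."""
    return {i: utility(rest | {i}) - utility(rest) for i, rest in loo_samples(indices)}


print(list(loo_samples([0, 1, 2])))
```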
PARAMETER | DESCRIPTION |
---|---|
batch_size | The number of samples to generate per batch. Batches are processed together by each subprocess when working in parallel. TYPE: `int` |
index_iteration | The strategy to use for iterating over indices to update. By default, a finite sequential index iteration is used, which is what LOOValuation expects. TYPE: `Type[IndexIteration]` |
seed | The seed for the random number generator used in case the index iteration is random. TYPE: `Seed \| None` |
New in version 0.10.0
interrupt
__len__
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError` | If the sampler is infinite or generate_batches has not been called yet. |
generate_batches
Batches the samples and yields them.
result_updater
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result | The result to update. TYPE: `ValuationResult` |

RETURNS | DESCRIPTION |
---|---|
`ResultUpdater[ValueUpdateT]` | A callable object that updates the result with a value update. |
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
log_weight
This sampler returns only sets of size \(n-1\). There are \(n\) such sets, so the probability of drawing one is \(1/n\), or \(0\) if subset_len \(\neq n-1\).
LOOEvaluationStrategy
LOOEvaluationStrategy(
sampler: LOOSampler,
utility: UtilityBase,
coefficient: Callable[[int, int], float] | None = None,
)
Bases: PowersetEvaluationStrategy[LOOSampler]
Computes marginal values for LOO.
DeterministicUniformSampler
DeterministicUniformSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = FiniteSequentialIndexIteration,
)
Bases: PowersetSampler
An iterator to perform uniform deterministic sampling of subsets.
For every index \(i\), each subset of the complement `indices - {i}` is returned.
PARAMETER | DESCRIPTION |
---|---|
batch_size | The number of samples to generate per batch. Batches are processed together by each subprocess when working in parallel. TYPE: `int` |
index_iteration | The strategy to use for iterating over indices to update. This iteration can be either finite or infinite. TYPE: `Type[IndexIteration]` |
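Example
The code:
from pydvl.valuation.samplers import DeterministicUniformSampler
import numpy as np

sampler = DeterministicUniformSampler()
# generate_batches yields batches of samples, each carrying an index and a subset
for batch in sampler.generate_batches(np.arange(2)):
    for sample in batch:
        print(f"{sample.idx} - {sample.subset}", end=", ")
Should print every index paired with every subset of the remaining indices, i.e. the samples (0, {}), (0, {1}), (1, {}) and (1, {0}).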
interrupt
__len__
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError` | If the sampler is infinite or generate_batches has not been called yet. |
generate_batches
Batches the samples and yields them.
log_weight
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements has mass \(1/2^{n-1}\) over each subset.
result_updater
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result | The result to update. TYPE: `ValuationResult` |

RETURNS | DESCRIPTION |
---|---|
`ResultUpdater[ValueUpdateT]` | A callable object that updates the result with a value update. |
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
UniformSampler
UniformSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = SequentialIndexIteration,
seed: Seed | None = None,
)
Bases: StochasticSamplerMixin, PowersetSampler
Draws random samples uniformly from the powerset of the index set.
Iterating over every index \(i\), either in sequence or at random depending on the value of index_iteration, one subset of the complement `indices - {i}` is sampled with equal probability \(1/2^{n-1}\).
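Uniform sampling over the powerset of \(N_{-i}\) is equivalent to including each remaining index independently with probability \(1/2\), since every subset then has probability \((1/2)^{n-1}\). The sketch below uses that equivalence with NumPy; uniform_samples is a hypothetical name, the outer loop is fixed to the sequential order, and the coin-flip construction is one way such a sampler can work rather than a description of pydvl's code.

```python
from typing import Iterator, Sequence, Tuple

import numpy as np


def uniform_samples(
    indices: Sequence[int], n_samples: int, seed: int
) -> Iterator[Tuple[int, np.ndarray]]:
    """Cycle over the indices in order; for each index i, draw one subset of
    indices - {i} by keeping every other index independently with probability 1/2,
    which gives each of the 2^(n-1) subsets probability 1/2^(n-1)."""
    rng = np.random.default_rng(seed)
    idx_array = np.asarray(indices)
    for t in range(n_samples):
        i = idx_array[t % len(idx_array)]  # sequential outer iteration
        complement = idx_array[idx_array != i]
        keep = rng.integers(0, 2, size=len(complement)).astype(bool)  # fair coin flips
        yield int(i), complement[keep]


for i, subset in uniform_samples(range(4), n_samples=4, seed=42):
    print(i, subset)
```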
PARAMETER | DESCRIPTION |
---|---|
batch_size | The number of samples to generate per batch. Batches are processed together by each subprocess when working in parallel. TYPE: `int` |
index_iteration | The strategy to use for iterating over indices to update. This iteration can be either finite or infinite. TYPE: `Type[IndexIteration]` |
seed | The seed for the random number generator. TYPE: `Seed \| None` |
interrupt
__len__
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError` | If the sampler is infinite or generate_batches has not been called yet. |
generate_batches
Batches the samples and yields them.
log_weight
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements has mass \(1/2^{n-1}\) over each subset.
result_updater
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result | The result to update. TYPE: `ValuationResult` |

RETURNS | DESCRIPTION |
---|---|
`ResultUpdater[ValueUpdateT]` | A callable object that updates the result with a value update. |
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
AntitheticSampler
Bases: StochasticSamplerMixin, PowersetSampler
A sampler that draws samples uniformly, together with their complements.
Works like UniformSampler, but for every tuple \((i,S)\), it subsequently returns \((i,S^c)\), where \(S^c\) is the complement of the set \(S\) in the set of indices, excluding \(i\).
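A sketch of the antithetic pairing, reusing a coin-flip construction of uniform subsets (an assumption, as is the name antithetic_samples): the complement \(S^c\) is obtained by negating the inclusion mask. Since the complement of a uniformly drawn subset is itself uniformly distributed, each returned subset still has probability \(1/2^{n-1}\); coupling \(S\) with \(S^c\) is the classical antithetic-variates idea, used in the hope of reducing the variance of the average.

```python
from typing import Iterator, Sequence, Tuple

import numpy as np


def antithetic_samples(
    indices: Sequence[int], n_pairs: int, seed: int
) -> Iterator[Tuple[int, np.ndarray]]:
    """Uniform sampling with a sequential outer loop, but every subset S of
    indices - {i} is immediately followed by its complement S^c within indices - {i}."""
    rng = np.random.default_rng(seed)
    idx_array = np.asarray(indices)
    for t in range(n_pairs):
        i = idx_array[t % len(idx_array)]
        complement = idx_array[idx_array != i]
        keep = rng.integers(0, 2, size=len(complement)).astype(bool)
        yield int(i), complement[keep]   # (i, S)
        yield int(i), complement[~keep]  # (i, S^c)


samples = list(antithetic_samples(range(5), n_pairs=3, seed=0))
for (i, s), (_, s_c) in zip(samples[::2], samples[1::2]):
    print(i, s, s_c)
```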
interrupt
__len__
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError` | If the sampler is infinite or generate_batches has not been called yet. |
generate_batches
Batches the samples and yields them.
log_weight
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements has mass \(1/2^{n-1}\) over each subset.
result_updater
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result | The result to update. TYPE: `ValuationResult` |

RETURNS | DESCRIPTION |
---|---|
`ResultUpdater[ValueUpdateT]` | A callable object that updates the result with a value update. |
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.