pydvl.valuation.samplers.powerset
Powerset samplers.
TODO: explain the formulation and the different samplers.
Stochastic samplers
...
References
- Mitchell, Rory, Joshua Cooper, Eibe Frank, and Geoffrey Holmes. Sampling Permutations for Shapley Value Estimation. Journal of Machine Learning Research 23, no. 43 (2022): 1–46.
- Watson, Lauren, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, and Rik Sarkar. Accelerated Shapley Value Approximation for Data Evaluation. arXiv, 9 November 2023.
- Wu, Mengmeng, Ruoxi Jia, Changle Lin, Wei Huang, and Xiangyu Chang. Variance Reduced Shapley Value Estimation for Trustworthy Data Valuation. Computers & Operations Research 159 (1 November 2023): 106305.
- Maleki, Sasan, Long Tran-Thanh, Greg Hines, Talal Rahwan, and Alex Rogers. Bounding the Estimation Error of Sampling-Based Shapley Value Approximation. arXiv:1306.4265 [Cs], 12 February 2014.
PowersetSampler
PowersetSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = SequentialIndexIteration,
)
Bases: IndexSampler, ABC
An abstract class for samplers which iterate over the powerset of the complement of an index in the training set.
This is done in two nested loops: the outer loop iterates over the set of indices, and the inner loop iterates over subsets of the complement of the current index. The outer iteration can be either sequential or random.

| PARAMETER | DESCRIPTION |
|---|---|
| batch_size | The number of samples to generate per batch. Batches are processed together by [UtilityEvaluator][pydvl.valuation.utility.evaluator.UtilityEvaluator]. TYPE: int DEFAULT: 1 |
| index_iteration | The order in which indices are iterated over. TYPE: Type[IndexIteration] DEFAULT: SequentialIndexIteration |
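The two nested loops can be sketched in plain Python. This is an illustration only and does not use pydvl's classes: the helper names (`sequential_indices`, `random_indices`, `nested_samples`, `empty_and_full`) are hypothetical, and drawing the outer indices with replacement is an assumption about what random iteration means here.

```python
import random
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Subset = Tuple[int, ...]


def sequential_indices(indices: Sequence[int]) -> Iterator[int]:
    """Outer loop, sequential flavour: visit every index once, in order."""
    yield from indices


def random_indices(indices: Sequence[int], n_draws: int, seed: int = 0) -> Iterator[int]:
    """Outer loop, random flavour (assumption: draws with replacement)."""
    rng = random.Random(seed)
    for _ in range(n_draws):
        yield rng.choice(list(indices))


def nested_samples(
    outer: Iterable[int],
    indices: Sequence[int],
    inner: Callable[[List[int]], Iterable[Subset]],
) -> Iterator[Tuple[int, Subset]]:
    """Pair each outer index with subsets of its complement."""
    for idx in outer:
        complement = [j for j in indices if j != idx]
        for subset in inner(complement):
            yield idx, tuple(subset)


def empty_and_full(complement: List[int]) -> List[Subset]:
    """Toy inner loop: only the empty set and the whole complement."""
    return [(), tuple(complement)]


indices = [0, 1, 2]
sequential = list(nested_samples(sequential_indices(indices), indices, empty_and_full))
randomized = list(nested_samples(random_indices(indices, 4), indices, empty_and_full))
```

Concrete samplers differ mainly in what the inner loop yields: every subset, a random subset, or subsets of a prescribed size.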
Source code in src/pydvl/valuation/samplers/powerset.py
generate_batches
Batches the samples and yields them.
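As a rough picture of what batching a stream of samples involves, here is a generic sketch using only the standard library. The function `batched` is a hypothetical helper, not pydvl's implementation, and keeping a shorter final batch is an assumption.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a (possibly infinite) stream of samples into lists of batch_size."""
    it = iter(samples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:  # the underlying stream is exhausted
            return
        yield batch


batches = list(batched(range(5), batch_size=2))
```

Because the grouping is lazy, the same pattern works for samplers that never terminate.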
Source code in src/pydvl/valuation/samplers/base.py
sample_limit
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
Returns None if the number of samples is infinite, which is the case for most stochastic samplers.
Source code in src/pydvl/valuation/samplers/base.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
weight
staticmethod
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements assigns probability \(1/2^{n-1}\) to each subset.
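To make the correction explicit, write \(f(S)\) for the quantity being summed over subsets \(S\) of the complement \(N_{-i}\) of index \(i\), where \(|N_{-i}| = n-1\). The symbols \(f\), \(N_{-i}\), \(m\) and \(S_j\) are notation introduced here for illustration. A uniform draw picks each subset with probability \(1/2^{n-1}\), so

\[
\sum_{S \subseteq N_{-i}} f(S)
= 2^{n-1} \, \mathbb{E}_{S \sim \mathcal{U}(2^{N_{-i}})} \left[ f(S) \right]
\approx \frac{2^{n-1}}{m} \sum_{j=1}^{m} f(S_j),
\qquad S_j \sim \mathcal{U}(2^{N_{-i}}).
\]

Multiplying the sample mean by \(2^{n-1}\) therefore gives an unbiased estimate of the full sum.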
Source code in src/pydvl/valuation/samplers/powerset.py
LOOSampler
LOOSampler(batch_size: int = 1)
Bases: IndexSampler
Leave-One-Out sampler. For every index \(i\) in the set \(S\), the sample \((i, S_{-i})\) is returned.
New in version 0.10.0
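
| PARAMETER | DESCRIPTION |
|---|---|
| batch_size | The number of samples to generate per batch. Batches are processed by the [EvaluationStrategy][pydvl.valuation.samplers.base.EvaluationStrategy]. TYPE: int DEFAULT: 1 |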
Source code in src/pydvl/valuation/samplers/base.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
LOOEvaluationStrategy
LOOEvaluationStrategy(
sampler: LOOSampler,
utility: UtilityBase,
coefficient: Callable[[int, int], float] | None = None,
)
Bases: EvaluationStrategy[LOOSampler, ValueUpdate]
Computes marginal values for LOO.
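For intuition, the sketch below computes leave-one-out values under the usual definition \(u(S) - u(S_{-i})\). It is standalone Python rather than pydvl code: `loo_values`, `toy_utility` and the `quality` table are hypothetical, and the additive utility is chosen only so that the result is easy to check.

```python
from typing import Callable, Dict, FrozenSet, Sequence


def loo_values(
    indices: Sequence[int], utility: Callable[[FrozenSet[int]], float]
) -> Dict[int, float]:
    """Value of i = utility of the full set minus utility without i."""
    full = frozenset(indices)
    u_full = utility(full)
    return {i: u_full - utility(full - {i}) for i in indices}


# Toy additive utility: each point contributes a fixed amount.
quality = {0: 1.0, 1: 2.0, 2: -0.5}


def toy_utility(subset: FrozenSet[int]) -> float:
    return sum(quality[j] for j in subset)


values = loo_values([0, 1, 2], toy_utility)
```

With an additive utility each point's leave-one-out value equals its own contribution, so `values` matches `quality`.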
Source code in src/pydvl/valuation/samplers/powerset.py
DeterministicUniformSampler
DeterministicUniformSampler(
index_iteration: Type[
SequentialIndexIteration | NoIndexIteration
] = SequentialIndexIteration,
batch_size: int = 1,
)
Bases: PowersetSampler
An iterator to perform uniform deterministic sampling of subsets.
For every index \(i\), each subset of the complement indices - {i} is returned.
Note
Outer indices are iterated over sequentially.
Example
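A minimal sketch of this enumeration in plain Python. It does not use pydvl: `deterministic_uniform` is a hypothetical helper, and generating subsets in order of increasing size is an assumption about the ordering.

```python
from itertools import chain, combinations
from typing import Iterator, List, Sequence, Tuple


def deterministic_uniform(indices: Sequence[int]) -> Iterator[Tuple[int, List[int]]]:
    """Yield (i, S) for every index i and every subset S of indices - {i}."""
    for idx in indices:
        rest = [j for j in indices if j != idx]
        all_subsets = chain.from_iterable(
            combinations(rest, size) for size in range(len(rest) + 1)
        )
        for subset in all_subsets:
            yield idx, list(subset)


pairs = list(deterministic_uniform([1, 2]))
for idx, subset in pairs:
    print(f"{idx} - {subset}", end=", ")
# prints: 1 - [], 1 - [2], 2 - [], 2 - [1],
```

For \(n\) indices this produces \(n \cdot 2^{n-1}\) samples, so exhaustive enumeration is only practical for small sets.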
Source code in src/pydvl/valuation/samplers/powerset.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
weight
staticmethod
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements assigns probability \(1/2^{n-1}\) to each subset.
Source code in src/pydvl/valuation/samplers/powerset.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
UniformSampler
UniformSampler(
batch_size: int = 1,
index_iteration: Type[IndexIteration] = SequentialIndexIteration,
seed: Seed | None = None,
)
Bases: StochasticSamplerMixin, PowersetSampler
An iterator to perform uniform random sampling of subsets.
Iterating over every index \(i\), either in sequence or at random depending on the value of index_iteration, one subset of the complement indices - {i} is sampled with equal probability \(1/2^{n-1}\). The iterator never ends.
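The sampling scheme can be sketched without pydvl as follows. Including each element of the complement independently with probability 1/2 gives every subset probability \(1/2^{n-1}\). The function name `uniform_sampler` is hypothetical, and only sequential outer iteration is shown.

```python
import random
from itertools import islice
from typing import Iterator, List, Optional, Sequence, Tuple


def uniform_sampler(
    indices: Sequence[int], seed: Optional[int] = None
) -> Iterator[Tuple[int, List[int]]]:
    """Endless stream of (i, S) with S uniform over subsets of indices - {i}."""
    rng = random.Random(seed)
    while True:  # never terminates; the caller decides when to stop
        for idx in indices:
            rest = [j for j in indices if j != idx]
            # A fair coin flip per element makes all subsets equally likely.
            subset = [j for j in rest if rng.random() < 0.5]
            yield idx, subset


first = list(islice(uniform_sampler([0, 1, 2, 3], seed=42), 6))
```

Since the stream is infinite, consumers need an external stopping rule, for example `islice` as above or a convergence criterion.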
Example
The code
Produces the output:
Source code in src/pydvl/valuation/samplers/powerset.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
sample_limit
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
Returns None if the number of samples is infinite, which is the case for most stochastic samplers.
Source code in src/pydvl/valuation/samplers/base.py
weight
staticmethod
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements assigns probability \(1/2^{n-1}\) to each subset.
Source code in src/pydvl/valuation/samplers/powerset.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
OwenSampler
OwenSampler(
n_samples_outer: int,
n_samples_inner: int = 2,
batch_size: int = 1,
seed: Seed | None = None,
)
Bases: StochasticSamplerMixin, PowersetSampler
A sampler for Owen Shapley values.
For each index \(i\), the Owen sampler loops over a deterministic grid of probabilities (containing n_samples_outer entries between 0 and 1) and then draws n_samples_inner subsets of the complement of the current index, where each element is sampled with the given probability.
The total number of samples drawn per index is therefore n_samples_outer * n_samples_inner.
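A standalone sketch of the procedure, written in plain Python rather than with pydvl. The name `owen_samples` is hypothetical, and the evenly spaced grid that includes both endpoints is an assumption about how the probabilities are chosen.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


def owen_samples(
    indices: Sequence[int],
    n_samples_outer: int,
    n_samples_inner: int = 2,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, float, List[int]]]:
    """Yield (i, q, S): each element of indices - {i} enters S with probability q."""
    rng = random.Random(seed)
    # Assumed grid: n_samples_outer evenly spaced probabilities in [0, 1].
    step = 1.0 / max(n_samples_outer - 1, 1)
    grid = [k * step for k in range(n_samples_outer)]
    for idx in indices:
        rest = [j for j in indices if j != idx]
        for q in grid:
            for _ in range(n_samples_inner):
                yield idx, q, [j for j in rest if rng.random() < q]


samples = list(owen_samples([0, 1, 2], n_samples_outer=5, n_samples_inner=2, seed=0))
```

Small probabilities tend to produce small subsets and large probabilities large ones, so the grid spreads the samples across subset sizes.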
| PARAMETER | DESCRIPTION |
|---|---|
| n_samples_outer | The number of entries in the probability grid used for the outer loop in Owen sampling. TYPE: int |
| n_samples_inner | The number of samples drawn for each probability. In the original paper this was fixed to 2 for all experiments which is why we give it a default value of 2. TYPE: int DEFAULT: 2 |
| batch_size | The batch size of the sampler. TYPE: int DEFAULT: 1 |
| seed | The seed for the random number generator. TYPE: Seed \| None DEFAULT: None |
Source code in src/pydvl/valuation/samplers/powerset.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
AntitheticOwenSampler
AntitheticOwenSampler(
n_samples_outer: int,
n_samples_inner: int = 2,
batch_size: int = 1,
seed: Seed | None = None,
)
Bases: OwenSampler
A sampler for antithetic Owen Shapley values.
For each index \(i\), the antithetic Owen sampler loops over a deterministic grid of probabilities (containing n_samples_outer entries between 0 and 0.5) and then draws n_samples_inner subsets of the complement of the current index, where each element is sampled with the given probability. For each sample obtained that way, a second sample is generated by taking the complement of the first sample.
The total number of samples drawn per index is therefore 2 * n_samples_outer * n_samples_inner.
For the same total number of samples, the antithetic Owen sampler usually yields more precise estimates of Shapley values than the regular Owen sampler.
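The pairing of each subset with its complement can be sketched in plain Python as follows. This is not pydvl's code: `antithetic_owen_samples` is a hypothetical name, and the evenly spaced grid on [0, 0.5] is an assumption.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


def antithetic_owen_samples(
    indices: Sequence[int],
    n_samples_outer: int,
    n_samples_inner: int = 2,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (i, S) immediately followed by (i, complement of S in indices - {i})."""
    rng = random.Random(seed)
    # Assumed grid: n_samples_outer evenly spaced probabilities in [0, 0.5].
    step = 0.5 / max(n_samples_outer - 1, 1)
    grid = [k * step for k in range(n_samples_outer)]
    for idx in indices:
        rest = [j for j in indices if j != idx]
        for q in grid:
            for _ in range(n_samples_inner):
                subset = [j for j in rest if rng.random() < q]
                chosen = set(subset)
                yield idx, subset
                yield idx, [j for j in rest if j not in chosen]


pairs = list(antithetic_owen_samples([0, 1, 2, 3], n_samples_outer=3, seed=1))
```

The complements cover the probabilities between 0.5 and 1 implicitly, which is why the grid only needs to span half the interval.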
| PARAMETER | DESCRIPTION |
|---|---|
| n_samples_outer | The number of entries in the probability grid used for the outer loop in Owen sampling. TYPE: int |
| n_samples_inner | The number of samples drawn for each probability. In the original paper this was fixed to 2 for all experiments which is why we give it a default value of 2. TYPE: int DEFAULT: 2 |
| batch_size | The batch size of the sampler. TYPE: int DEFAULT: 1 |
| seed | The seed for the random number generator. TYPE: Seed \| None DEFAULT: None |
Source code in src/pydvl/valuation/samplers/powerset.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
AntitheticSampler
Bases: StochasticSamplerMixin, PowersetSampler
An iterator to perform uniform random sampling of subsets, and their complements.
Works as UniformSampler, but for every tuple \((i,S)\), it subsequently returns \((i,S^c)\), where \(S^c\) is the complement of the set \(S\) in the set of indices, excluding \(i\).
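As an illustration, the sketch below emits each uniformly drawn subset together with its complement. It is plain Python, not the library's implementation, and `antithetic_sampler` is a hypothetical name.

```python
import random
from itertools import islice
from typing import Iterator, List, Optional, Sequence, Tuple


def antithetic_sampler(
    indices: Sequence[int], seed: Optional[int] = None
) -> Iterator[Tuple[int, List[int]]]:
    """Endless stream alternating (i, S) and (i, S^c), S uniform in indices - {i}."""
    rng = random.Random(seed)
    while True:
        for idx in indices:
            rest = [j for j in indices if j != idx]
            subset = [j for j in rest if rng.random() < 0.5]
            chosen = set(subset)
            yield idx, subset
            yield idx, [j for j in rest if j not in chosen]


draws = list(islice(antithetic_sampler([0, 1, 2], seed=7), 8))
```

In every pair the two subset sizes add up to \(n-1\), so a small subset is always balanced by a large one.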
Source code in src/pydvl/valuation/samplers/utils.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
sample_limit
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
Returns None if the number of samples is infinite, which is the case for most stochastic samplers.
Source code in src/pydvl/valuation/samplers/base.py
weight
staticmethod
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements assigns probability \(1/2^{n-1}\) to each subset.
Source code in src/pydvl/valuation/samplers/powerset.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
UniformStratifiedSampler
Bases: StochasticSamplerMixin, PowersetSampler
For every index, sample a set size, then a set of that size.
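The two-stage draw can be sketched as below, in plain Python and independent of pydvl. The name `uniform_stratified_sampler` is hypothetical, and choosing the size uniformly from 0 to \(n-1\) inclusive is an assumption.

```python
import random
from itertools import islice
from typing import Iterator, List, Optional, Sequence, Tuple


def uniform_stratified_sampler(
    indices: Sequence[int], seed: Optional[int] = None
) -> Iterator[Tuple[int, List[int]]]:
    """For each index: pick a size uniformly, then a uniform subset of that size."""
    rng = random.Random(seed)
    while True:
        for idx in indices:
            rest = [j for j in indices if j != idx]
            size = rng.randint(0, len(rest))  # both endpoints are included
            yield idx, rng.sample(rest, size)  # sampling without replacement


draws = list(islice(uniform_stratified_sampler([0, 1, 2, 3], seed=3), 12))
```

Unlike uniform sampling over the powerset, where mid-sized subsets dominate, this scheme makes every subset size equally likely.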
Source code in src/pydvl/valuation/samplers/utils.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
sample_limit
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
Returns None if the number of samples is infinite, which is the case for most stochastic samplers.
Source code in src/pydvl/valuation/samplers/base.py
weight
staticmethod
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements assigns probability \(1/2^{n-1}\) to each subset.
Source code in src/pydvl/valuation/samplers/powerset.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
TruncatedUniformStratifiedSampler
TruncatedUniformStratifiedSampler(
*,
lower_bound: int,
upper_bound: int,
index_iteration: Type[IndexIteration] = SequentialIndexIteration,
seed: Seed | None = None
)
Bases: UniformStratifiedSampler
A sampler which samples set sizes between two bounds.
This sampler was suggested in (Watson et al. 2023) for \(\delta\)-Shapley.
New in version 0.10.0
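A sketch of size-bounded sampling in plain Python, not tied to pydvl. The name `truncated_stratified_sampler` is hypothetical; treating both bounds as inclusive and clipping them to the size of the complement are assumptions.

```python
import random
from itertools import islice
from typing import Iterator, List, Optional, Sequence, Tuple


def truncated_stratified_sampler(
    indices: Sequence[int],
    lower_bound: int,
    upper_bound: int,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, List[int]]]:
    """Like stratified sampling, but set sizes stay within the two bounds."""
    rng = random.Random(seed)
    while True:
        for idx in indices:
            rest = [j for j in indices if j != idx]
            low = max(lower_bound, 0)
            high = min(upper_bound, len(rest))
            if low > high:
                raise ValueError("no admissible set size between the bounds")
            size = rng.randint(low, high)
            yield idx, rng.sample(rest, size)


draws = list(
    islice(truncated_stratified_sampler(list(range(6)), 2, 3, seed=0), 20)
)
```

Restricting the sizes concentrates the utility evaluations on a narrow band of subset sizes instead of the full range.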
Source code in src/pydvl/valuation/samplers/powerset.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
sample_limit
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
Returns None if the number of samples is infinite, which is the case for most stochastic samplers.
Source code in src/pydvl/valuation/samplers/base.py
weight
staticmethod
Correction coming from Monte Carlo integration so that the mean of the marginals converges to the value: the uniform distribution over the powerset of a set with \(n-1\) elements assigns probability \(1/2^{n-1}\) to each subset.
Source code in src/pydvl/valuation/samplers/powerset.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
VarianceReducedStratifiedSampler
VarianceReducedStratifiedSampler(
samples_per_setsize: Callable[[int], int],
index_iteration: Type[IndexIteration] = SequentialIndexIteration,
)
Bases: StochasticSamplerMixin, PowersetSampler
VRDS sampler.
This sampler was suggested in (Wu et al. 2023) as a generalization of the stratified sampler in (Maleki et al. 2014).
| PARAMETER | DESCRIPTION |
|---|---|
| samples_per_setsize | A function which returns the number of samples to take for a given set size. TYPE: Callable[[int], int] |
| index_iteration | The order in which indices are iterated over. TYPE: Type[IndexIteration] DEFAULT: SequentialIndexIteration |
New in version 0.10.0
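To show how a `samples_per_setsize` callable could drive the sampling, here is a plain-Python sketch. It is not the library's implementation: `vrds_samples` is a hypothetical name, and the range of set sizes (0 up to the size of the complement) is an assumption.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def vrds_samples(
    indices: Sequence[int],
    samples_per_setsize: Callable[[int], int],
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, List[int]]]:
    """For each index and each set size k, draw samples_per_setsize(k) subsets."""
    rng = random.Random(seed)
    for idx in indices:
        rest = [j for j in indices if j != idx]
        for size in range(len(rest) + 1):
            for _ in range(samples_per_setsize(size)):
                yield idx, rng.sample(rest, size)


# A constant allocation: two subsets for every set size.
draws = list(vrds_samples([0, 1, 2, 3], lambda size: 2, seed=5))
```

A non-constant callable would shift the sampling budget towards the set sizes where the marginals vary most.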
Source code in src/pydvl/valuation/samplers/powerset.py
generate_batches
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
sample_limit
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
Returns None if the number of samples is infinite, which is the case for most stochastic samplers.
Source code in src/pydvl/valuation/samplers/base.py
index_iterator
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
complement
Returns the indices in include that are not in exclude.
| PARAMETER | DESCRIPTION |
|---|---|
| include | The set of indices to consider. |
| exclude | The indices to exclude from the complement. |

| RETURNS | DESCRIPTION |
|---|---|
| NDArray[IndexT] | The complement of the set of indices excluding the given indices. |
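The behaviour described here amounts to a set difference. The sketch below uses plain Python lists instead of NumPy arrays; `complement_sketch` is a hypothetical stand-in, and skipping `None` entries in `exclude` is an assumption.

```python
from typing import Iterable, List, Optional, Sequence


def complement_sketch(
    include: Sequence[int], exclude: Iterable[Optional[int]]
) -> List[int]:
    """Return the entries of include that do not appear in exclude."""
    excluded = {j for j in exclude if j is not None}
    return [j for j in include if j not in excluded]


remaining = complement_sketch([0, 1, 2, 3], [1, 3])
# remaining == [0, 2]
```

Samplers need this operation on every outer iteration to obtain the complement of the current index.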