pydvl.valuation.samplers.permutation
¶
Permutation samplers draw permutations \(\sigma^N\) from the index set \(N\) uniformly at random and iterate through it returning the sets:
The probability of sampling a set¶
Read on below for the actual formula
This section describes the general case. However, because of how the samplers are used to compute marginals, the effective sampling probabilities are slightly different.
Let \(k\) be the size of a sampled set \(S\) and \(n=|N|\) the size of the index set. By the product rule:
Now, for a uniformly chosen permutation, every subset of size \(k\) appears as the prefix (the first \(k\) elements) with probability
and, because we iterate from left to right, each size \(k\) is sampled with probability \(p(k) = 1/n.\) Hence:
The same argument applies to the DeterministicPermutationSampler and PermutationSampler, but also to AntitheticPermutationSampler. For the latter, note that the sampling process is symmetric.
The case of a fixed index when computing marginals¶
However, when computing marginal utilities, which is what the samplers are used for, the situation is slightly different since we update indices \(i\) sequentially. In Sampling strategies for semi-values a more convoluted argument is made, whereby one splits the permutation by the index \(i\). This ends up being similar to what we found above:
Therefore, log_weight() is valid, when computing marginal utilities.
References¶
-
Mitchell, Rory, Joshua Cooper, Eibe Frank, and Geoffrey Holmes. Sampling Permutations for Shapley Value Estimation. Journal of Machine Learning Research 23, no. 43 (2022): 1–46. ↩
-
Watson, Lauren, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, and Rik Sarkar. Accelerated Shapley Value Approximation for Data Evaluation. arXiv, 9 November 2023. ↩
AntitheticPermutationSampler
¶
AntitheticPermutationSampler(
truncation: TruncationPolicy | None = None,
seed: Seed | None = None,
batch_size: int = 1,
)
Bases: PermutationSampler
Samples permutations like PermutationSampler, but after each permutation, it returns the same permutation in reverse order.
This sampler was suggested in (Mitchell et al. 2022)1
Batching
Even though this sampler supports batching, it is not recommended to use it since the PermutationEvaluationStrategy processes whole permutations in one go, effectively batching the computation of up to n-1 marginal utilities in one process.
PARAMETER | DESCRIPTION |
---|---|
truncation
|
A policy to stop the permutation early.
TYPE:
|
seed
|
Seed for the random number generator.
TYPE:
|
batch_size
|
The number of samples (full permutations) to generate at once.
TYPE:
|
New in version 0.7.1
Source code in src/pydvl/valuation/samplers/permutation.py
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
complement_size
¶
Size of the complement of an index wrt. set size n
.
Required in certain coefficient computations. Even though we are sampling permutations, updates are always done per-index and the size of the complement is always \(n-1\).
Source code in src/pydvl/valuation/samplers/permutation.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
interrupt
¶
log_weight
¶
Log probability of sampling a set S from a set of size n-1.
See the module's documentation for details.
Source code in src/pydvl/valuation/samplers/permutation.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
DeterministicPermutationSampler
¶
DeterministicPermutationSampler(
*args,
truncation: TruncationPolicy | None = None,
batch_size: int = 1,
**kwargs,
)
Bases: PermutationSamplerBase
Samples all n! permutations of the indices deterministically.
Batching
Even though this sampler supports batching, it is not recommended to use it since the PermutationEvaluationStrategy processes whole permutations in one go, effectively batching the computation of up to n-1 marginal utilities in one process.
Source code in src/pydvl/valuation/samplers/permutation.py
skip_indices
property
writable
¶
Indices being skipped in the sampler. The exact behaviour will be sampler-dependent, so that setting this property is disabled by default.
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
complement_size
¶
Size of the complement of an index wrt. set size n
.
Required in certain coefficient computations. Even though we are sampling permutations, updates are always done per-index and the size of the complement is always \(n-1\).
Source code in src/pydvl/valuation/samplers/permutation.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
interrupt
¶
log_weight
¶
Log probability of sampling a set S from a set of size n-1.
See the module's documentation for details.
Source code in src/pydvl/valuation/samplers/permutation.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
PermutationEvaluationStrategy
¶
PermutationEvaluationStrategy(
sampler: PermutationSamplerBase,
utility: UtilityBase,
coefficient: SemivalueCoefficient | None,
)
Bases: EvaluationStrategy[PermutationSamplerBase, ValueUpdate]
Computes marginal values for permutation sampling schemes.
This strategy iterates over permutations from left to right, computing the marginal utility wrt. the previous one at each step to save computation.
The TruncationPolicy of the sampler is copied and reset for each permutation in a batch.
Source code in src/pydvl/valuation/samplers/permutation.py
PermutationSampler
¶
PermutationSampler(
truncation: TruncationPolicy | None = None,
seed: Seed | None = None,
batch_size: int = 1,
)
Bases: StochasticSamplerMixin
, PermutationSamplerBase
Samples permutations of indices.
Batching
Even though this sampler supports batching, it is not recommended to use it since the PermutationEvaluationStrategy processes whole permutations in one go, effectively batching the computation of up to n-1 marginal utilities in one process.
PARAMETER | DESCRIPTION |
---|---|
truncation
|
A policy to stop the permutation early.
TYPE:
|
seed
|
Seed for the random number generator.
TYPE:
|
batch_size
|
The number of samples (full permutations) to generate at once.
TYPE:
|
Source code in src/pydvl/valuation/samplers/permutation.py
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
complement_size
¶
Size of the complement of an index wrt. set size n
.
Required in certain coefficient computations. Even though we are sampling permutations, updates are always done per-index and the size of the complement is always \(n-1\).
Source code in src/pydvl/valuation/samplers/permutation.py
generate
¶
Generates the permutation samples.
PARAMETER | DESCRIPTION |
---|---|
indices
|
The indices to sample from. If empty, no samples are generated. If skip_indices is set, these indices are removed from the set before generating the permutation.
TYPE:
|
Source code in src/pydvl/valuation/samplers/permutation.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
interrupt
¶
log_weight
¶
Log probability of sampling a set S from a set of size n-1.
See the module's documentation for details.
Source code in src/pydvl/valuation/samplers/permutation.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
PermutationSamplerBase
¶
PermutationSamplerBase(
*args,
truncation: TruncationPolicy | None = None,
batch_size: int = 1,
**kwargs,
)
Bases: IndexSampler
, ABC
Base class for permutation samplers.
Source code in src/pydvl/valuation/samplers/permutation.py
skip_indices
property
writable
¶
Indices being skipped in the sampler. The exact behaviour will be sampler-dependent, so that setting this property is disabled by default.
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
complement_size
¶
Size of the complement of an index wrt. set size n
.
Required in certain coefficient computations. Even though we are sampling permutations, updates are always done per-index and the size of the complement is always \(n-1\).
Source code in src/pydvl/valuation/samplers/permutation.py
generate
abstractmethod
¶
Generates single samples.
IndexSampler.generate_batches()
will batch these samples according to the
batch size set upon construction.
PARAMETER | DESCRIPTION |
---|---|
indices
|
TYPE:
|
YIELDS | DESCRIPTION |
---|---|
SampleGenerator
|
A tuple (idx, subset) for each sample. |
Source code in src/pydvl/valuation/samplers/base.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
interrupt
¶
log_weight
¶
Log probability of sampling a set S from a set of size n-1.
See the module's documentation for details.
Source code in src/pydvl/valuation/samplers/permutation.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns an object that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
sample_limit
abstractmethod
¶
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
PARAMETER | DESCRIPTION |
---|---|
indices
|
The indices used in the sampler.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
int | None
|
The maximum number of samples that will be generated, or |
Source code in src/pydvl/valuation/samplers/base.py
TruncationPolicy
¶
Bases: ABC
A policy for deciding whether to stop computation of a batch of samples
Statistics are kept on the total number of calls and truncations as n_calls
and
n_truncations
respectively.
ATTRIBUTE | DESCRIPTION |
---|---|
n_calls |
Number of calls to the policy.
TYPE:
|
n_truncations |
Number of truncations made by the policy.
TYPE:
|
Todo
Because the policy objects are copied to the workers, the statistics are not accessible from the coordinating process. We need to add methods for this.
Source code in src/pydvl/valuation/samplers/truncation.py
__call__
¶
Check whether the computation should be interrupted.
PARAMETER | DESCRIPTION |
---|---|
idx
|
Position in the batch currently being computed.
TYPE:
|
score
|
Last utility computed.
TYPE:
|
batch_size
|
Size of the batch being computed.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
|
Source code in src/pydvl/valuation/samplers/truncation.py
reset
abstractmethod
¶
reset(utility: UtilityBase)