pydvl.valuation.samplers.owen
¶
Samplers for Owen Shapley values.
The Owen Shapley value is a sampling-based method to estimate Shapley values. It samples probability values between 0 and 1 and then draws subsets of the complement of the current index where each element is sampled with the given probability.
Possible configurations¶
The basic sampler is OwenSampler. It can be configured with a deterministic grid of probability values or a uniform distribution between 0 and 1. The former follows the idea of the original paper (Okhrati and Lipani, 2021)1.
This strategy for the sampling of probability values can be specified with the GridOwenStrategy or UniformOwenStrategy classes.
In addition, and as is the case for all PowerSetSamplers, one can configure the way the sampler iterates over indices to be updated. This can be done with the IndexIteration strategy.
When using infinite index iteration, the sampler can be used with a stopping criterion to estimate Shapley values. This follows more closely the typical usage pattern in PyDVL than the original sampling method described in Okhrati and Lipani (2021)1.
Antithetic Owen sampling¶
We also provide an AntitheticOwenSampler, which draws probability values \(q\) between 0 and 0.5 again, either deterministically over a discrete grid or at uniformly at random, and then generates two samples for each index, one with the probability \(q\) and another with probability \(1-q\).
It can be configured in the same manner as the regular Owen sampler.
The four main configurations
standard_owen = OwenSampler(
outer_sampling_strategy=GridOwenStrategy(n_samples_outer=100),
n_samples_inner=2,
index_iteration=FiniteSequentialIndexIteration,
)
antithetic_owen = AntitheticOwenSampler(
outer_sampling_strategy=GridOwenStrategy(n_samples_outer=100),
n_samples_inner=2,
index_iteration=FiniteSequentialIndexIteration,
)
infinite_owen = OwenSampler(
outer_sampling_strategy=UniformOwenStrategy(seed=42),
n_samples_inner=2,
index_iteration=RandomIndexIteration,
seed=42
)
infinite_antithetic_owen = AntitheticOwenSampler(
outer_sampling_strategy=UniformOwenStrategy(seed=42),
n_samples_inner=2,
index_iteration=RandomIndexIteration,
seed=42
)
References¶
-
Okhrati, R., Lipani, A., 2021. A Multilinear Sampling Algorithm to Estimate Shapley Values. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7992–7999. IEEE. ↩↩
UniformOwenStrategy
¶
UniformOwenStrategy(n_samples_outer: int, seed: Seed | None = None)
Bases: OwenStrategy
A strategy for OwenSampler to sample probability values uniformly between 0 and \(q_ ext{stop}\).
PARAMETER | DESCRIPTION |
---|---|
n_samples_outer
|
The number of probability values \(q\) used for the outer loop. Since samples are taken anew for each index, a high number will delay updating new indices and has no effect on the final accuracy if using an infinite index iteration. In general, it only makes sense to change this number if using a finite index iteration.
TYPE:
|
seed
|
The seed for the random number generator.
TYPE:
|
Source code in src/pydvl/valuation/samplers/owen.py
GridOwenStrategy
¶
GridOwenStrategy(n_samples_outer: int)
Bases: OwenStrategy
A strategy for OwenSampler to sample probability values on a linear grid.
PARAMETER | DESCRIPTION |
---|---|
n_samples_outer
|
The number of probability values \(q\) used for the outer loop. These will be linearly spaced between 0 and \(q_ ext{stop}\).
TYPE:
|
Source code in src/pydvl/valuation/samplers/owen.py
OwenSampler
¶
OwenSampler(
outer_sampling_strategy: OwenStrategy,
n_samples_inner: int = 2,
batch_size: int = 1,
index_iteration: Type[IndexIteration] = FiniteSequentialIndexIteration,
seed: Seed | None = None,
)
Bases: StochasticSamplerMixin
, PowersetSampler
A sampler for semi-values using the Owen method.
For each index \(i\) we sample n_samples_outer
probability values \(q_j\) between 0
and 1 and then, for each \(j\) we draw n_samples_inner
subsets of the complement of
the current index where each element is sampled probability \(q_j\).
The distribution for the outer sampling can be either uniform or deterministic. The default is deterministic on a grid, which is the original method described in Okhrati and Lipani (2021)1. This can be achieved by using the GridOwenStrategy strategy.
Alternatively, the distribution can be uniform between 0 and 1. This can be achieved by using the UniformOwenStrategy strategy.
By combining a UniformOwenStrategy with an infinite IndexIteration strategy, this sampler can be used with a stopping criterion to estimate semi-values. This follows more closely the typical usage pattern in PyDVL than the original sampling method described in Okhrati and Lipani (2021)1.
Example usage
PARAMETER | DESCRIPTION |
---|---|
n_samples_inner
|
The number of samples drawn for each probability. In the original paper this was fixed to 2 for all experiments.
TYPE:
|
batch_size
|
The batch size of the sampler.
TYPE:
|
index_iteration
|
The index iteration strategy, sequential or random, finite or infinite.
TYPE:
|
seed
|
The seed for the random number generator.
TYPE:
|
Source code in src/pydvl/valuation/samplers/owen.py
interrupt
¶
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
index_iterator
¶
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.
Source code in src/pydvl/valuation/samplers/powerset.py
log_weight
¶
For each \(q_j, j \in \{1, ..., N\}\) in the outer probabilities, the probability of drawing a subset \(S_k\) of size \(k\) is:
So, if each \(q_j\) is chosen with equal weight (or more generally with probability \(p_j\)),then by total probability, the overall probability of obtaining a subset of size \(k\) is a mixture of the binomials: $$ P (| S | = k) = \sum_{j = 1}^N p_j \ \binom{n}{k} \ q_j^k (1 - q_j)^{n - k}. $$
In our case \(p_j = 1/N\), so that \(P(|S|=k) = \frac{1}{N} \sum_{j=1}^N P (| S_{q_j} | = k)\). For large enough \(N\) this is
where we computed the integral using the beta function and its expression as products of gamma functions.
Now, given the symmetry wrt. the indices in the sampling procedure, any given set \(S\) of size \(k\) is equally likely to be drawn. So the probability of a set being of size \(k\) must be equally divided by the number of sets of that size, and the weight of a set of size \(k\) is:
PARAMETER | DESCRIPTION |
---|---|
n
|
Size of the index set.
TYPE:
|
subset_len
|
Size of the subset.
TYPE:
|
Returns:
The logarithm of the weight of a subset of size subset_len
.
Source code in src/pydvl/valuation/samplers/owen.py
sample_limit
¶
sample_limit(indices: IndexSetT) -> int | None
The number of samples that will be generated by the sampler.
PARAMETER | DESCRIPTION |
---|---|
indices
|
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
int | None
|
0 if there are no indices, |
int | None
|
samples otherwise. |
Source code in src/pydvl/valuation/samplers/owen.py
AntitheticOwenSampler
¶
AntitheticOwenSampler(
outer_sampling_strategy: OwenStrategy,
n_samples_inner: int = 2,
batch_size: int = 1,
index_iteration: Type[IndexIteration] = FiniteSequentialIndexIteration,
seed: Seed | None = None,
)
Bases: OwenSampler
A sampler for antithetic Owen shapley values.
For each sample obtained with the method of OwenSampler, a second sample is generated by taking the complement of the first sample.
For the same number of total samples, the antithetic Owen sampler yields usually more precise estimates of shapley values than the regular Owen sampler.
Source code in src/pydvl/valuation/samplers/owen.py
interrupt
¶
__len__
¶
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError`
|
if the sampler is infinite or generate_batches has not been called yet. |
Source code in src/pydvl/valuation/samplers/base.py
generate_batches
¶
Batches the samples and yields them.
Source code in src/pydvl/valuation/samplers/base.py
log_weight
¶
For each \(q_j, j \in \{1, ..., N\}\) in the outer probabilities, the probability of drawing a subset \(S_k\) of size \(k\) is:
So, if each \(q_j\) is chosen with equal weight (or more generally with probability \(p_j\)),then by total probability, the overall probability of obtaining a subset of size \(k\) is a mixture of the binomials: $$ P (| S | = k) = \sum_{j = 1}^N p_j \ \binom{n}{k} \ q_j^k (1 - q_j)^{n - k}. $$
In our case \(p_j = 1/N\), so that \(P(|S|=k) = \frac{1}{N} \sum_{j=1}^N P (| S_{q_j} | = k)\). For large enough \(N\) this is
where we computed the integral using the beta function and its expression as products of gamma functions.
Now, given the symmetry wrt. the indices in the sampling procedure, any given set \(S\) of size \(k\) is equally likely to be drawn. So the probability of a set being of size \(k\) must be equally divided by the number of sets of that size, and the weight of a set of size \(k\) is:
PARAMETER | DESCRIPTION |
---|---|
n
|
Size of the index set.
TYPE:
|
subset_len
|
Size of the subset.
TYPE:
|
Returns:
The logarithm of the weight of a subset of size subset_len
.
Source code in src/pydvl/valuation/samplers/owen.py
result_updater
¶
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
result
|
The result to update
TYPE:
|
Returns: A callable object that updates the result with a value update
Source code in src/pydvl/valuation/samplers/base.py
index_iterator
¶
index_iterator(indices: IndexSetT) -> Generator[IndexT | None, None, None]
Iterates over indices with the method specified at construction.