pydvl.valuation.samplers.base
Base classes for samplers and evaluation strategies.
See pydvl.valuation.samplers for details.
ResultUpdater
ResultUpdater(result: ValuationResult)
IndexSampler
IndexSampler(batch_size: int = 1)
Bases: ABC, Generic[ValueUpdateT]
Samplers are custom iterables over batches of subsets of indices. Calling from_indices(indexset) on a sampler returns a generator over batches of Samples. A Sample is a tuple of the form \((i, S)\), where \(i\) is an index of interest, and \(S \subset I \setminus \{i\}\) is a subset of the complement of \(i\) in \(I\).
Note
Samplers are not iterators themselves, so each call to from_indices(data), e.g. in a new for loop, creates a new iterator.
Derived samplers must implement log_weight() and generate(). See the module's documentation for more on these.
Interrupting samplers
Calling interrupt() on a sampler will stop the batched generator after the current batch has been yielded.
PARAMETER | DESCRIPTION |
---|---|
`batch_size` | The number of samples to generate per batch. Batches are processed by EvaluationStrategy so that individual valuations in a batch are guaranteed to be received in the right sequence. TYPE: `int` DEFAULT: `1` |
Source code in src/pydvl/valuation/samplers/base.py
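The batching and interruption behaviour described above can be illustrated with a small self-contained mock. This is a sketch under stated assumptions, not pyDVL's implementation: `ToySampler`, `ExhaustiveToySampler` and `Sample` are hypothetical names, and only the shape of the interface (`generate()`, `generate_batches()`, `batch_size`, `interrupt()`) mirrors the text.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from itertools import combinations, islice
from typing import Iterator, NamedTuple, Sequence


class Sample(NamedTuple):
    """Hypothetical sample type: an index of interest and a subset of its complement."""

    idx: int
    subset: tuple[int, ...]


class ToySampler(ABC):
    """Hypothetical stand-in for IndexSampler (not pyDVL's class)."""

    def __init__(self, batch_size: int = 1):
        self.batch_size = batch_size
        self._interrupted = False

    def interrupt(self) -> None:
        # Checked by generate_batches() before it produces the next batch.
        self._interrupted = True

    @abstractmethod
    def generate(self, indices: Sequence[int]) -> Iterator[Sample]: ...

    def generate_batches(self, indices: Sequence[int]) -> Iterator[list[Sample]]:
        # A generator function: every call returns a fresh generator,
        # so the sampler itself is not an iterator.
        self._interrupted = False  # design choice: allow reuse after an interruption
        samples = self.generate(indices)
        while not self._interrupted:
            batch = list(islice(samples, self.batch_size))
            if not batch:
                return
            yield batch


class ExhaustiveToySampler(ToySampler):
    """Yields (i, S) for every index i and every subset S of the complement of i."""

    def generate(self, indices: Sequence[int]) -> Iterator[Sample]:
        for i in indices:
            rest = [j for j in indices if j != i]
            for k in range(len(rest) + 1):
                for subset in combinations(rest, k):
                    yield Sample(i, subset)


sampler = ExhaustiveToySampler(batch_size=5)
# 3 indices x 2**2 subsets each = 12 samples, i.e. batches of 5, 5 and 2
sizes = [len(b) for b in sampler.generate_batches([0, 1, 2])]

# Interrupting after the first batch stops the generator once that batch is yielded
first_only = []
for batch in sampler.generate_batches([0, 1, 2]):
    first_only.append(batch)
    sampler.interrupt()
```

Resetting the flag at the start of each call is a choice made for this sketch so that one sampler object can be iterated again after an interruption.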
skip_indices (property, writable)
Indices being skipped in the sampler. The exact behaviour is sampler-dependent, so setting this property is disabled by default.
interrupt
__len__
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError` | If the sampler is infinite or generate_batches has not been called yet. |
generate_batches
Batches the samples produced by generate() according to the batch size set upon construction, and yields them.
sample_limit (abstractmethod)
sample_limit(indices: IndexSetT) -> int | None
Number of samples that can be generated from the indices.
PARAMETER | DESCRIPTION |
---|---|
`indices` | The indices used in the sampler. TYPE: `IndexSetT` |
RETURNS | DESCRIPTION |
---|---|
`int \| None` | The maximum number of samples that will be generated, or `None` if the sampler is infinite. |
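As a hedged illustration of this contract, consider two hypothetical samplers: one that enumerates every pair \((i, S)\) exhaustively, assuming it iterates over all indices, and one that keeps drawing random subsets forever. The function names are made up for this sketch; the point is only that a finite sampler can report its limit while an infinite one returns `None`.

```python
from __future__ import annotations

from typing import Sequence


def exhaustive_sample_limit(indices: Sequence[int]) -> int | None:
    # Hypothetical: every index i is paired with each of the 2**(n-1)
    # subsets of its complement, so the total is n * 2**(n-1).
    n = len(indices)
    return n * 2 ** (n - 1) if n > 0 else 0


def random_sample_limit(indices: Sequence[int]) -> int | None:
    # Hypothetical: a sampler that draws random subsets indefinitely
    # never runs out, so there is no finite limit.
    return None
```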
generate (abstractmethod)
Generates single samples. IndexSampler.generate_batches() will batch these samples according to the batch size set upon construction.
PARAMETER | DESCRIPTION |
---|---|
`indices` | The index set from which samples are generated. |
YIELDS | DESCRIPTION |
---|---|
`SampleGenerator` | A tuple (idx, subset) for each sample. |
log_weight (abstractmethod)
Factor by which to multiply Monte Carlo samples, so that the mean converges to the desired expression.
Log-space computation
Because the weight is a probability that can be arbitrarily small, we compute it in log-space for numerical stability.
By the Law of Large Numbers, the sample mean of \(f(S_j)\) converges to the expectation under the distribution from which \(S_j\) is sampled.
We add the factor \(w(S_j)\) in order to have this expectation coincide with the desired expression, by cancelling out \(\mathbb{P} (S)\).
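Written out, under the assumption (implied by the cancellation just described) that the weight is the inverse of the sampling probability, \(w(S) = 1/\mathbb{P}(S)\), the argument for samples \(S_1, \dots, S_m\) drawn from \(\mathbb{P}\) reads:

\[
\frac{1}{m} \sum_{j=1}^{m} w(S_j)\, f(S_j)
\xrightarrow[m \to \infty]{}
\mathbb{E}_{S \sim \mathbb{P}}\big[w(S)\, f(S)\big]
= \sum_{S:\, \mathbb{P}(S) > 0} \mathbb{P}(S)\, \frac{f(S)}{\mathbb{P}(S)}
= \sum_{S:\, \mathbb{P}(S) > 0} f(S).
\]

In log-space this weight is \(-\log \mathbb{P}(S)\), which remains a moderate number even when \(\mathbb{P}(S)\) itself is too small to represent.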
PARAMETER | DESCRIPTION |
---|---|
`n` | The size of the index set. Note that the actual size of the set being sampled will often be n-1, as one index might be removed from the set. See IndexIteration for more. TYPE: `int` |
`subset_len` | The size of the subset being sampled. TYPE: `int` |
RETURNS | DESCRIPTION |
---|---|
`float` | The natural logarithm of the probability of sampling a set of the given size, when the index set has size `n`. |
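The need for log-space is easy to see with two hypothetical sampling schemes, assumed here for illustration and not taken from any particular pyDVL sampler. In the first, every subset of the \(n-1\) other indices is equally likely, so \(\mathbb{P}(S) = 2^{-(n-1)}\). In the second, a size \(k\) is drawn uniformly from \(\{0, \dots, n-1\}\) and then a subset of that size uniformly, so \(\mathbb{P}(S) = \frac{1}{n}\binom{n-1}{k}^{-1}\). Both are computed directly in log-space below, using `math.lgamma` for the log-binomial.

```python
import math


def uniform_log_weight(n: int, subset_len: int) -> float:
    """Hypothetical: log P(S) when every subset of the n-1 other indices is equally likely."""
    return -(n - 1) * math.log(2)


def size_first_log_weight(n: int, subset_len: int) -> float:
    """Hypothetical: log P(S) when the size k is uniform in {0, ..., n-1} and S is
    then uniform among subsets of that size, i.e. P(S) = 1 / (n * C(n-1, k))."""
    k = subset_len
    # log C(n-1, k) = lgamma(n) - lgamma(k+1) - lgamma(n-k)
    log_binom = math.lgamma(n) - math.lgamma(k + 1) - math.lgamma(n - k)
    return -math.log(n) - log_binom


n = 2000
direct = 2.0 ** -(n - 1)  # underflows to 0.0 in double precision
logged = uniform_log_weight(n, 10)  # about -1385.6, perfectly representable
```

Subtracting such a log-probability from another log-quantity and exponentiating only once keeps intermediate values in range.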
make_strategy (abstractmethod)
make_strategy(
utility: UtilityBase,
log_coefficient: Callable[[int, int], float] | None = None,
) -> EvaluationStrategy
Returns the strategy for this sampler.
result_updater
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
`result` | The result to update. TYPE: `ValuationResult` |
Returns: A callable object that updates the result with a value update.
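The idea of tracking running moments in log-space can be sketched as follows. This is an illustration under assumptions, not the actual LogResultUpdater: the hypothetical `LogSpaceMoments` class receives each update as the logarithm of its magnitude plus a sign, keeps separate log-sums for positive and negative contributions (a logarithm cannot hold a negative number), and a log-sum of squares, combining terms with `numpy.logaddexp`.

```python
import math

import numpy as np


class LogSpaceMoments:
    """Hypothetical running 1st/2nd moments for values given as (log|x|, sign(x))."""

    def __init__(self) -> None:
        self.count = 0
        self.log_sum_pos = -np.inf  # log of the sum of positive values
        self.log_sum_neg = -np.inf  # log of the sum of |negative values|
        self.log_sum_sq = -np.inf  # log of the sum of squares

    def update(self, log_abs: float, sign: int) -> None:
        self.count += 1
        if sign > 0:
            self.log_sum_pos = np.logaddexp(self.log_sum_pos, log_abs)
        elif sign < 0:
            self.log_sum_neg = np.logaddexp(self.log_sum_neg, log_abs)
        # log(x**2) = 2 * log|x|, independently of the sign
        self.log_sum_sq = np.logaddexp(self.log_sum_sq, 2 * log_abs)

    @property
    def mean(self) -> float:
        log_n = math.log(self.count)
        return math.exp(self.log_sum_pos - log_n) - math.exp(self.log_sum_neg - log_n)

    @property
    def variance(self) -> float:
        # Population variance computed as E[x^2] - E[x]^2
        second = math.exp(self.log_sum_sq - math.log(self.count))
        return second - self.mean**2


values = [1.0, -2.0, 3.5]
m = LogSpaceMoments()
for v in values:
    m.update(math.log(abs(v)), 1 if v > 0 else -1)
```

The \(E[x^2] - E[x]^2\) formula keeps the sketch short but can lose precision when the variance is small relative to the mean; a production updater may well use a more careful scheme.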
LogResultUpdater
LogResultUpdater(result: ValuationResult)
Bases: ResultUpdater[ValueUpdateT]
Updates a valuation result with a value update in log-space.
EvaluationStrategy
EvaluationStrategy(
sampler: SamplerT,
utility: UtilityBase,
log_coefficient: Callable[[int, int], float] | None = None,
)
Bases: ABC, Generic[SamplerT, ValueUpdateT]
An evaluation strategy for samplers.
Implements the processing strategy for batches returned by an IndexSampler.
Different sampling schemes require different strategies for the evaluation of the utilities. For instance, permutations generated by PermutationSampler must be evaluated in sequence to save computation; see PermutationEvaluationStrategy.
This class defines the common interface.
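The saving from sequential evaluation of permutations can be seen in a toy sketch, assuming each marginal along a permutation is the utility gain from adding an index to the prefix that precedes it. The utility of each prefix is computed once and reused as the baseline for the next index, so a permutation of \(n\) indices needs \(n + 1\) utility calls instead of \(2n\). The function and utility below are hypothetical and are not PermutationEvaluationStrategy itself.

```python
from __future__ import annotations

from typing import Callable, Sequence


def permutation_marginals(
    permutation: Sequence[int], utility: Callable[[frozenset[int]], float]
) -> dict[int, float]:
    """Hypothetical: marginal u(P | {i}) - u(P) for each i, where P is the prefix before i."""
    marginals: dict[int, float] = {}
    prefix: frozenset[int] = frozenset()
    prev = utility(prefix)
    for i in permutation:
        prefix = prefix | {i}
        curr = utility(prefix)
        marginals[i] = curr - prev
        prev = curr  # reuse: the same prefix is never evaluated twice
    return marginals


calls = 0


def counting_utility(s: frozenset[int]) -> float:
    # Toy utility |S|**2 that also counts how often it is evaluated
    global calls
    calls += 1
    return float(len(s) ** 2)


marginals = permutation_marginals([2, 0, 1], counting_utility)
```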
Usage pattern in valuation methods
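from joblib import Parallel, delayed

def fit(self, data: Dataset):
    # Assumes `flag` is a shared, callable interruption flag with a set() method,
    # created beforehand (its construction is not shown here).
    self.utility = self.utility.with_dataset(data)
    strategy = self.sampler.make_strategy(self.utility, self.log_coefficient)
    # return_as="generator" (joblib >= 1.3) yields results lazily, so we can stop early
    delayed_batches = Parallel(return_as="generator")(
        delayed(strategy.process)(batch=list(batch), is_interrupted=flag)
        for batch in self.sampler.generate_batches(data.indices)
    )
    for batch in delayed_batches:
        for evaluation in batch:
            self.result.update(evaluation.idx, evaluation.update)
        if self.is_done(self.result):
            flag.set()  # tell running workers to stop
            self.sampler.interrupt()  # stop producing new batches
            break  # leave the outer loop, not only the inner one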
PARAMETER | DESCRIPTION |
---|---|
`sampler` | Required to set up some strategies. TYPE: `SamplerT` |
`utility` | Required to set up some strategies and to process the samples. Since this contains the training data, it is expensive to pickle and send to workers. TYPE: `UtilityBase` |
`log_coefficient` | An additional coefficient, given in log-space, to multiply marginals with. This depends on the valuation method, hence the delayed setup. TYPE: `Callable[[int, int], float] \| None` DEFAULT: `None` |
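How a log-coefficient and a sampler's log-weight might combine can be sketched with a Monte Carlo estimate of a Shapley value. Everything here is an assumption for illustration, not pyDVL code: subsets of the complement of \(i\) are drawn uniformly (so \(\log \mathbb{P}(S) = -(n-1)\log 2\)), the Shapley coefficient is \(\frac{1}{n}\binom{n-1}{k}^{-1}\), each marginal is multiplied by \(\exp(\log \text{coefficient} - \log \mathbb{P}(S))\), and the toy utility \(u(S) = |S|^2\) is chosen because, by symmetry and efficiency, every index then has value \(u(I)/n = n\).

```python
from __future__ import annotations

import math
import random


def shapley_log_coefficient(n: int, k: int) -> float:
    # Hypothetical: log(1 / (n * C(n-1, k))), the Shapley weight of a size-k subset
    return -math.log(n) - math.log(math.comb(n - 1, k))


def uniform_log_weight(n: int, k: int) -> float:
    # Hypothetical: log P(S) when every subset of the n-1 other indices is equally likely
    return -(n - 1) * math.log(2)


def toy_utility(subset: frozenset[int]) -> float:
    return float(len(subset) ** 2)


def mc_value(i: int, n: int, num_samples: int, rng: random.Random) -> float:
    others = [j for j in range(n) if j != i]
    total = 0.0
    for _ in range(num_samples):
        # Uniform over subsets: include each other index with probability 1/2
        subset = frozenset(j for j in others if rng.random() < 0.5)
        k = len(subset)
        marginal = toy_utility(subset | {i}) - toy_utility(subset)
        # Combine in log-space, then exponentiate once
        factor = math.exp(shapley_log_coefficient(n, k) - uniform_log_weight(n, k))
        total += factor * marginal
    return total / num_samples


estimate = mc_value(i=0, n=4, num_samples=20_000, rng=random.Random(0))
```

With \(n = 4\) the estimate should land close to the exact value 4, and the factor \(\text{coefficient}/\mathbb{P}(S)\) is exactly what makes the weighted average unbiased.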
process (abstractmethod)
process(
batch: SampleBatch, is_interrupted: NullaryPredicate
) -> list[ValueUpdateT]
Processes a batch of samples using the evaluator, with the strategy required for the sampler.
Warning
This method is intended to be used by the evaluator to process the samples in one batch, which means it might be sent to another process. Be careful with the objects you use here, as they will be pickled and sent over the wire.
PARAMETER | DESCRIPTION |
---|---|
`batch` | A batch of samples to process. TYPE: `SampleBatch` |
`is_interrupted` | A predicate that returns True if the processing should be interrupted. TYPE: `NullaryPredicate` |
RETURNS | DESCRIPTION |
---|---|
`list[ValueUpdateT]` | Updates to values as tuples (idx, update). |
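A minimal sketch of a process-like function that honours `is_interrupted` follows. It is hypothetical throughout: samples are plain `(idx, subset)` tuples, `ValueUpdate` is a stand-in type defined here, the utility is a toy sum, and the bound method `threading.Event.is_set` plays the role of the nullary predicate. Checking the predicate before each evaluation lets a worker stop early once the caller has collected enough updates.

```python
from __future__ import annotations

import threading
from typing import Callable, NamedTuple, Sequence


class ValueUpdate(NamedTuple):
    """Hypothetical update: the index it refers to and its marginal value."""

    idx: int
    update: float


def toy_utility(subset: frozenset[int]) -> float:
    return float(sum(subset))


def process(
    batch: Sequence[tuple[int, tuple[int, ...]]],
    is_interrupted: Callable[[], bool],
) -> list[ValueUpdate]:
    updates: list[ValueUpdate] = []
    for idx, subset in batch:
        if is_interrupted():
            break  # stop evaluating and return what has been computed so far
        s = frozenset(subset)
        updates.append(ValueUpdate(idx, toy_utility(s | {idx}) - toy_utility(s)))
    return updates


flag = threading.Event()
batch = [(0, ()), (1, (0,)), (2, (0, 1))]
all_updates = process(batch, is_interrupted=flag.is_set)
flag.set()
none_updates = process(batch, is_interrupted=flag.is_set)
```

Keeping such a function and its arguments at module level also keeps them picklable, which matters when batches are sent to other processes as the warning above describes.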