pydvl.valuation.samplers¶
Samplers iterate over subsets of indices.
The classes in this module iterate over indices and over subsets of each index's complement in the whole set, as required to compute marginal utilities for semi-values and other marginal-utility-based methods.
Subclasses of IndexSampler are iterators over batches of Sample objects (pydvl.valuation.samplers.Sample) of the form \((i, S)\), where \(i\) is an index of interest, and \(S \subset I \setminus \{i\}\) is a subset of the complement of \(i\).
The samplers are used by all game-theoretic valuation methods, as well as for LOO and any other marginal-contribution-based method which iterates over subsets of the training data.
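To make the \((i, S)\) form concrete, the following is a minimal, library-independent sketch. The names `ToySample` and `iter_samples` are illustrative assumptions and not part of pyDVL; the function enumerates every subset of the complement deterministically, whereas real samplers typically draw them at random and in batches.

```python
from itertools import chain, combinations
from typing import Iterator, NamedTuple


class ToySample(NamedTuple):
    """Illustrative stand-in for a sample (i, S); not pyDVL's Sample class."""

    idx: int
    subset: frozenset


def iter_samples(indices: list[int]) -> Iterator[ToySample]:
    """Yield (i, S) for every index i and every subset S of its complement."""
    for i in indices:
        complement = [j for j in indices if j != i]
        # All subsets of the complement, ordered by size 0, 1, ..., n-1.
        all_subsets = chain.from_iterable(
            combinations(complement, k) for k in range(len(complement) + 1)
        )
        for s in all_subsets:
            yield ToySample(i, frozenset(s))


samples = list(iter_samples([0, 1, 2]))
print(len(samples))  # 3 indices x 4 subsets of each complement = 12
```

Note that the index of interest never appears in its own subset, which is what allows a valuation method to compare the utility of \(S\) with that of \(S \cup \{i\}\).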
Sampler evaluation¶
Because different samplers require different strategies for evaluating the utility of the subsets, the samplers are used in conjunction with an EvaluationStrategy. The basic usage pattern inside a valuation method is the following:
def fit(self, data: Dataset):
    self.utility.training_data = data
    # The sampler provides the evaluation strategy that matches it.
    strategy = self.sampler.strategy(self.utility, self.coefficient)
    # `flag` is a shared interruption flag defined outside this excerpt.
    delayed_batches = Parallel()(
        delayed(strategy.process)(batch=list(batch), is_interrupted=flag)
        for batch in self.sampler
    )
    for batch in delayed_batches:
        for evaluation in batch:
            self.result.update(evaluation.idx, evaluation.update)
        # Check the stopping criterion once per processed batch.
        if self.is_done(self.result):
            flag.set()
            break
See the EvaluationStrategy class for more details.
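The interplay between batches of samples and a strategy can be sketched without pyDVL or any parallel backend. Everything below is hypothetical (`ToyMarginalStrategy`, `batched`, the coefficient taking only the subset size); it only assumes, as in the pattern above, that a strategy is built from a utility and a coefficient and that processing a batch returns one update per sample.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

# A sample is a pair (i, S): an index and a subset of its complement.
ToySample = tuple[int, frozenset]


def batched(samples: Iterable[ToySample], batch_size: int) -> Iterator[list[ToySample]]:
    """Group samples into lists of at most batch_size elements."""
    it = iter(samples)
    while batch := list(islice(it, batch_size)):
        yield batch


class ToyMarginalStrategy:
    """Hypothetical strategy: one marginal utility per sample."""

    def __init__(
        self,
        utility: Callable[[frozenset], float],
        coefficient: Callable[[int], float],
    ):
        self.utility = utility
        self.coefficient = coefficient

    def process(self, batch: list[ToySample]) -> list[tuple[int, float]]:
        """Return (index, update) pairs, one for each sample in the batch."""
        return [
            (i, self.coefficient(len(s)) * (self.utility(s | {i}) - self.utility(s)))
            for i, s in batch
        ]


values = {0: 1.0, 1: 3.0}


def utility(subset: frozenset) -> float:
    """Toy additive utility: the sum of per-index values."""
    return sum(values[j] for j in subset)


samples = [
    (0, frozenset()),
    (0, frozenset({1})),
    (1, frozenset()),
    (1, frozenset({0})),
]
strategy = ToyMarginalStrategy(utility, coefficient=lambda size: 0.5)

# Accumulate the updates per index, batch by batch.
totals = {0: 0.0, 1: 0.0}
for batch in batched(samples, batch_size=3):
    for idx, update in strategy.process(batch):
        totals[idx] += update

print(totals)  # {0: 1.0, 1: 3.0}
```

Because the toy utility is additive, every marginal contribution of an index equals its own value, so the accumulated totals are easy to check by hand.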
Creating custom samplers¶
To create a custom sampler, subclass either PowersetSampler or PermutationSampler, or implement the IndexSampler interface directly.
There are two main methods to implement (and others that can be overridden):
- generate(), which yields samples of the form \((i, S)\). These will be batched together by __iter__. For PermutationSampler, the batch size is always the number of indices since permutations must always be processed in full.
- weight() to provide a factor by which to multiply Monte Carlo samples in stochastic methods, so that the mean converges to the desired expression.
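The role of the weight can be illustrated with a small, exact computation. The sketch below assumes uniform sampling over all subsets of the complement, so each subset has probability \(2^{-(n-1)}\), and takes the weight to be the inverse of that probability. The function names and signatures are hypothetical and chosen for illustration only; the expectation is computed by full enumeration instead of random draws so that the result is deterministic.

```python
from itertools import combinations
from math import comb

values = [1.0, 2.0, 3.0]


def utility(subset: tuple) -> float:
    """Toy non-additive utility: squared sum of the values in the subset."""
    return sum(values[j] for j in subset) ** 2


def shapley_coefficient(n: int, k: int) -> float:
    """Shapley coefficient for a subset of size k out of n indices."""
    return 1.0 / (n * comb(n - 1, k))


def uniform_weight(n: int, k: int) -> float:
    """Inverse of the probability 2^-(n-1) of drawing any one subset."""
    return 2.0 ** (n - 1)


def expected_estimate(i: int) -> float:
    """Exact expectation of weight * coefficient * marginal under uniform sampling."""
    n = len(values)
    complement = [j for j in range(n) if j != i]
    probability = 2.0 ** -(n - 1)
    total = 0.0
    for k in range(n):
        for s in combinations(complement, k):
            marginal = utility(s + (i,)) - utility(s)
            total += (
                probability
                * uniform_weight(n, k)
                * shapley_coefficient(n, k)
                * marginal
            )
    return total


estimates = [expected_estimate(i) for i in range(len(values))]
print(estimates)
print(sum(estimates), utility((0, 1, 2)))  # efficiency: both should agree
```

The weight cancels the sampling probability, so the mean of the weighted samples converges to the coefficient-weighted sum of marginals, here the Shapley value. A sampler with a different distribution over subsets would need a different weight to keep the same limit.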
Additionally, if the sampler requires a dedicated evaluation strategy different from the marginal evaluations for PowersetSampler or the successive evaluations for PermutationSampler, you need to subclass EvaluationStrategy and set the strategy_cls attribute of the sampler to this class.
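The pairing of a sampler with its strategy through a class attribute can be sketched as follows. This is a self-contained toy, not pyDVL code: `ToySuccessiveStrategy`, `ToyPermutationSampler`, and the `coefficient(n, k)` signature are assumptions made for illustration. The strategy shows what "successive evaluations" could look like, walking each permutation once and reusing the utility of the previous prefix.

```python
from itertools import permutations
from typing import Callable, Iterator, Sequence

values = [1.0, 2.0, 3.0]


def utility(subset: frozenset) -> float:
    """Toy non-additive utility: squared sum of the values in the subset."""
    return sum(values[j] for j in subset) ** 2


class ToySuccessiveStrategy:
    """Hypothetical strategy: successive marginals along one permutation."""

    def __init__(
        self,
        utility: Callable[[frozenset], float],
        coefficient: Callable[[int, int], float],
    ):
        self.utility = utility
        self.coefficient = coefficient

    def process(self, permutation: Sequence[int]) -> list[tuple[int, float]]:
        prefix: frozenset = frozenset()
        previous = self.utility(prefix)
        updates = []
        for i in permutation:
            new_prefix = prefix | {i}
            current = self.utility(new_prefix)  # one new evaluation per index
            scale = self.coefficient(len(permutation), len(prefix))
            updates.append((i, scale * (current - previous)))
            prefix, previous = new_prefix, current
        return updates


class ToyPermutationSampler:
    """Hypothetical sampler that names its strategy via a class attribute."""

    strategy_cls = ToySuccessiveStrategy

    def __init__(self, indices: Sequence[int]):
        self.indices = list(indices)

    def __iter__(self) -> Iterator[tuple[int, ...]]:
        # Deterministic enumeration; a real sampler would draw at random.
        yield from permutations(self.indices)

    def strategy(self, utility, coefficient):
        return self.strategy_cls(utility, coefficient)


sampler = ToyPermutationSampler(range(3))
strategy = sampler.strategy(utility, coefficient=lambda n, k: 1.0)

sums = {i: 0.0 for i in sampler.indices}
count = 0
for permutation in sampler:
    count += 1
    for idx, update in strategy.process(permutation):
        sums[idx] += update

averages = {i: total / count for i, total in sums.items()}
print(averages)
```

Swapping in a different evaluation scheme then only requires assigning another class to `strategy_cls` in a subclass of the sampler; the valuation loop that calls `sampler.strategy(...)` stays unchanged.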
Changed in version 0.10.0
All the samplers in this module have been changed to work with the new evaluation strategies.