pydvl.valuation.samplers.msr

MSRSampler

MSRSampler(batch_size: int = 1, seed: Seed | None = None)

Bases: StochasticSamplerMixin, IndexSampler

Sampler for unweighted Maximum Sample Re-use (MSR) valuation.

Sampling is similar to that of a UniformSampler, but without an outer index: each sample is just a random subset of the indices.

PARAMETER DESCRIPTION
batch_size

Number of samples to generate in each batch.

TYPE: int DEFAULT: 1

seed

Seed for the random number generator.

TYPE: Seed | None DEFAULT: None

Source code in src/pydvl/valuation/samplers/msr.py
def __init__(self, batch_size: int = 1, seed: Seed | None = None):
    super().__init__(batch_size=batch_size, seed=seed)
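
The idea of sampling subsets without an outer index can be sketched outside the library. This is a minimal illustration, not pyDVL's implementation: the function name `sample_subsets` is hypothetical, and it assumes every subset of the index set should be equally likely, which is achieved by keeping each index independently with probability 1/2.

```python
import numpy as np

def sample_subsets(indices, n_samples, seed=None):
    """Yield random subsets of `indices` (hypothetical helper, not pyDVL API)."""
    rng = np.random.default_rng(seed)
    indices = np.asarray(indices)
    for _ in range(n_samples):
        # Keeping each index with probability 1/2 makes all subsets equally likely.
        mask = rng.random(len(indices)) < 0.5
        yield indices[mask]

# No outer index is involved: every sample is only a subset.
subsets = list(sample_subsets(range(5), n_samples=4, seed=0))
```

Passing the same `seed` reproduces the same sequence of subsets, which is the role the `seed` parameter plays in the constructor above.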

generate_batches

generate_batches(indices: IndexSetT) -> BatchGenerator

Batches the samples and yields them.

Source code in src/pydvl/valuation/samplers/base.py
def generate_batches(self, indices: IndexSetT) -> BatchGenerator:
    """Batches the samples and yields them."""

    # Create an empty generator if the indices are empty. `generate_batches` is
    # a generator function because it has a yield statement later in its body.
    # Inside a generator function, `return` ends iteration much like a `break`,
    # so the caller receives an empty generator. See: https://stackoverflow.com/a/13243870
    if len(indices) == 0:
        return

    self._interrupted = False
    self._n_samples = 0
    for batch in chunked(self._generate(indices), self.batch_size):
        yield batch
        self._n_samples += len(batch)
        if self._interrupted:
            break
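
The early `return` and the batching can be tried in isolation. The sketch below is a simplified stand-in, not the library code: `chunked` is re-implemented locally with `itertools.islice` so the example has no dependencies, and the indices themselves are batched instead of the output of `self._generate`.

```python
from itertools import islice

def chunked(iterable, size):
    """Split an iterable into lists of at most `size` items (local stand-in)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def generate_batches(indices, batch_size):
    # The `yield` below makes this a generator function, so a bare `return`
    # here simply ends iteration and the caller gets an empty generator.
    if len(indices) == 0:
        return
    for batch in chunked(indices, batch_size):
        yield batch

empty = list(generate_batches([], batch_size=2))
batches = list(generate_batches([0, 1, 2, 3, 4], batch_size=2))
```

Here `empty` is `[]` and `batches` is `[[0, 1], [2, 3], [4]]`: the last batch may be shorter than `batch_size`.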

sample_limit

sample_limit(indices: IndexSetT) -> int | None

Number of samples that can be generated from the indices.

Returns None if the number of samples is infinite, which is the case for most stochastic samplers.

Source code in src/pydvl/valuation/samplers/base.py
def sample_limit(self, indices: IndexSetT) -> int | None:
    """Number of samples that can be generated from the indices.

    Returns None if the number of samples is infinite, which is the case for most
    stochastic samplers.
    """
    if len(indices) == 0:
        out = 0
    else:
        out = None
    return out
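
A caller has to treat `None` as "no limit" and not as a number. The following sketch shows one way to do that. It is illustrative only: `take_samples` and the standalone `sample_limit` are hypothetical helpers, and an endless counter stands in for a stochastic sampler's output.

```python
from itertools import count, islice
from typing import Iterable, List, Optional, Sequence

def sample_limit(indices: Sequence[int]) -> Optional[int]:
    """Same logic as the method above: 0 for empty indices, None for no limit."""
    return 0 if len(indices) == 0 else None

def take_samples(
    stream: Iterable[int], indices: Sequence[int], max_samples: int
) -> List[int]:
    """Draw at most `max_samples` items, honouring the sampler's own limit."""
    limit = sample_limit(indices)
    # None means the stream never ends, so only the caller's budget applies.
    n = max_samples if limit is None else min(limit, max_samples)
    return list(islice(stream, n))

taken = take_samples(count(), [0, 1, 2], max_samples=5)
none_taken = take_samples(count(), [], max_samples=5)
```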

MSREvaluationStrategy

MSREvaluationStrategy(
    sampler: SamplerT,
    utility: UtilityBase,
    coefficient: Callable[[int, int], float] | None = None,
)

Bases: EvaluationStrategy[SamplerT, MSRValueUpdate]

Evaluation strategy for Maximum Sample Re-use (MSR) valuation.

The MSR evaluation strategy makes one utility evaluation per sample but generates n_indices updates from it. These updates feed two running means that are later combined into one final value. We set the ValueUpdate.kind field to ValueUpdateKind.POSITIVE or ValueUpdateKind.NEGATIVE to decide which of the two running means is updated.
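
The re-use of a single utility evaluation for all indices can be sketched as follows. This is a simplified illustration under stated assumptions, not pyDVL's implementation: `msr_values` is a hypothetical function, subsets are drawn by keeping each index with probability 1/2, no sampler weights or coefficients are applied, and the two means are combined as "positive minus negative", which is how MSR is commonly used for Banzhaf values.

```python
import numpy as np

def msr_values(n_indices, utility, n_samples, seed=None):
    """Estimate one value per index from two running means (sketch only)."""
    rng = np.random.default_rng(seed)
    pos_sum = np.zeros(n_indices)
    pos_cnt = np.zeros(n_indices)
    neg_sum = np.zeros(n_indices)
    neg_cnt = np.zeros(n_indices)
    for _ in range(n_samples):
        mask = rng.random(n_indices) < 0.5
        u = utility(np.flatnonzero(mask))  # one utility evaluation per sample
        # The same evaluation yields an update for every index:
        pos_sum[mask] += u   # "positive" update for indices in the sample
        pos_cnt[mask] += 1
        neg_sum[~mask] += u  # "negative" update for indices not in it
        neg_cnt[~mask] += 1
    pos_mean = np.divide(pos_sum, pos_cnt, out=np.zeros(n_indices), where=pos_cnt > 0)
    neg_mean = np.divide(neg_sum, neg_cnt, out=np.zeros(n_indices), where=neg_cnt > 0)
    # Assumed combination rule: difference of the two means.
    return pos_mean - neg_mean

# Toy utility: 1.0 if index 0 is in the subset, else 0.0.
values = msr_values(3, lambda s: float(0 in s), n_samples=2000, seed=0)
```

With this toy utility only index 0 matters, so its estimate is 1 while the estimates for the other indices fluctuate around 0.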

Source code in src/pydvl/valuation/samplers/base.py
def __init__(
    self,
    sampler: SamplerT,
    utility: UtilityBase,
    coefficient: Callable[[int, int], float] | None = None,
):
    self.utility = utility
    self.n_indices = (
        len(utility.training_data.indices)
        if utility.training_data is not None
        else 0
    )
    self.coefficient: Callable[[int, int], float] = lambda n, k: 1.0

    if sampler is not None:
        if coefficient is not None:

            def coefficient_fun(n: int, subset_len: int) -> float:
                return sampler.weight(n, subset_len) * coefficient(n, subset_len)

            self.coefficient = coefficient_fun
        else:
            self.coefficient = sampler.weight
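
The way the constructor combines the sampler's weight with an optional coefficient can be reproduced with plain functions. In this sketch `compose_coefficient` and the two example callables are hypothetical and their values are arbitrary. Only the branching mirrors the code above, including the detail that `coefficient` is ignored when no sampler is given.

```python
from typing import Callable, Optional

Coefficient = Callable[[int, int], float]

def compose_coefficient(
    sampler_weight: Optional[Coefficient],
    coefficient: Optional[Coefficient],
) -> Coefficient:
    """Mirror the branching of the constructor above with plain callables."""
    if sampler_weight is None:
        # No sampler: fall back to the constant default.
        return lambda n, k: 1.0
    if coefficient is None:
        return sampler_weight
    # Both given: multiply the sampler's weight by the method's coefficient.
    return lambda n, k: sampler_weight(n, k) * coefficient(n, k)

# Hypothetical example functions; the values are arbitrary.
weight = lambda n, k: 2.0 ** n
coeff = lambda n, k: 1.0 / n

both = compose_coefficient(weight, coeff)(4, 2)       # 16.0 * 0.25
weight_only = compose_coefficient(weight, None)(3, 1)
no_sampler = compose_coefficient(None, coeff)(3, 1)
```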