pydvl.valuation.samplers.classwise
Class-wise sampler for the class-wise Shapley valuation method.
The class-wise Shapley method, introduced by Schoch et al. (2022) [1], uses a so-called set-conditional marginal Shapley value that requires selectively sampling subsets of data points whose class is either the same as or different from that of the data point of interest.
This sampling scheme is divided into an outer and an inner sampler. The outer sampler (the out_of_class argument of ClasswiseSampler) is any subclass of PowersetSampler that generates subsets of the complement set of the data point of interest, i.e., of the points with a different label. The inner sampler (the in_class argument) is any subclass of IndexSampler, typically a PermutationSampler, as in the paper.
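As a rough illustration of the two-step scheme for a single label, the following plain-Python sketch draws an out-of-class subset (outer step) and an in-class permutation (inner step). It is not the pydvl implementation: the function name `classwise_samples`, the inclusion probability of 1/2 used in the outer step, and the use of `random.Random` are assumptions made for this example.

```python
import random
from typing import Iterator, Sequence


def classwise_samples(
    labels: Sequence[int],
    label: int,
    n_samples: int,
    seed: int = 0,
) -> Iterator[tuple[list[int], list[int]]]:
    """Hypothetical sketch of the two-step scheme for one label.

    Outer step: draw a subset of the out-of-class indices by including
    each one independently with probability 1/2 (an assumption).
    Inner step: draw a random permutation of the in-class indices.
    """
    rng = random.Random(seed)
    in_class = [i for i, y in enumerate(labels) if y == label]
    out_of_class = [i for i, y in enumerate(labels) if y != label]
    for _ in range(n_samples):
        # Outer sampler: a subset of the complement set (different labels)
        subset = [i for i in out_of_class if rng.random() < 0.5]
        # Inner sampler: a permutation of the points sharing the label
        permutation = rng.sample(in_class, k=len(in_class))
        yield subset, permutation


labels = [0, 1, 0, 2, 1, 0]
for subset, permutation in classwise_samples(labels, label=0, n_samples=2):
    print(subset, permutation)
```

Each yielded pair keeps the two roles separate, so a valuation routine could compute marginal contributions along the in-class permutation while conditioning on the out-of-class subset.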
References
[1] Schoch, Stephanie, Haifeng Xu, and Yangfeng Ji. CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification. In Proc. of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS). New Orleans, Louisiana, USA, 2022.
ClasswiseSampler
ClasswiseSampler(
in_class: IndexSampler,
out_of_class: PowersetSampler,
*,
min_elements_per_label: int = 1,
batch_size: int = 1,
)
Bases: IndexSampler
A sampler that samples elements from a dataset in two steps, based on the labels.
It proceeds by sampling out-of-class indices (training points whose label differs from that of the point of interest), which form subsets of the complement set, and in-class indices (training points with the same label as the point of interest).
Used by the class-wise Shapley valuation method.
PARAMETER | DESCRIPTION |
---|---|
`in_class` | Sampling scheme for elements of a given label (in-class elements). TYPE: `IndexSampler` |
`out_of_class` | Sampling scheme for elements of different labels, i.e., the complement set. TYPE: `PowersetSampler` |
`min_elements_per_label` | Minimum number of elements per label to sample from the complement set, i.e., out-of-class elements. TYPE: `int` DEFAULT: `1` |
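The page does not spell out how `min_elements_per_label` is enforced. One plausible reading, which is an assumption here and not confirmed by this documentation, is that an out-of-class subset is only kept if every other label appears in it at least that many times. A hypothetical predicate for that reading (`has_min_elements_per_label` is an illustrative name, not a pydvl function):

```python
from collections import Counter
from typing import Sequence


def has_min_elements_per_label(
    subset: Sequence[int],
    labels: Sequence[int],
    in_class_label: int,
    min_elements_per_label: int = 1,
) -> bool:
    """Hypothetical check (assumed semantics): does the out-of-class subset
    contain at least `min_elements_per_label` points of every label other
    than `in_class_label`?"""
    counts = Counter(labels[i] for i in subset)
    other_labels = set(labels) - {in_class_label}
    # Counter returns 0 for labels missing from the subset
    return all(counts[y] >= min_elements_per_label for y in other_labels)


labels = [0, 0, 1, 1, 2, 2]
print(has_min_elements_per_label([2, 4], labels, in_class_label=0))  # labels 1 and 2 both present
```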
skip_indices (property, writable)
Indices being skipped in the sampler. The exact behaviour is sampler-dependent, so setting this property is disabled by default.
__len__
__len__() -> int
Returns the length of the current sample generation in generate_batches.
RAISES | DESCRIPTION |
---|---|
`TypeError` | If the sampler is infinite or generate_batches has not been called yet. |
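To illustrate the `TypeError` contract of `__len__`, here is a hypothetical toy sampler whose length is undefined until `generate_batches` has been called, together with a caller that falls back to `None` when no length is available. `ToySampler` and `safe_len` are illustrative names, not pydvl classes or functions.

```python
from typing import Iterator, Optional


class ToySampler:
    """Hypothetical sampler: its length is only known once generation starts."""

    def __init__(self, infinite: bool = False) -> None:
        self.infinite = infinite
        self._n_samples: Optional[int] = None

    def generate_batches(self, indices: list[int]) -> Iterator[list[int]]:
        # Runs eagerly (this is not a generator function), so the length is
        # recorded as soon as generate_batches is called
        self._n_samples = len(indices)
        return ([i] for i in indices)

    def __len__(self) -> int:
        if self.infinite or self._n_samples is None:
            raise TypeError("length is undefined for this sampler")
        return self._n_samples


def safe_len(sampler) -> Optional[int]:
    """Return len(sampler), or None if the sampler has no defined length."""
    try:
        return len(sampler)
    except TypeError:
        return None
```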
generate_batches
Batches the samples and yields them.
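The `batch_size` argument in the `ClasswiseSampler` signature suggests that samples are grouped before being yielded. A minimal sketch of such grouping, assuming a plain iterable of samples and a hypothetical helper name `batched_samples` (not part of pydvl):

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched_samples(samples: Iterable[T], batch_size: int = 1) -> Iterator[list[T]]:
    """Hypothetical helper: group a stream of samples into lists of `batch_size`.

    The last batch may be shorter if the stream does not divide evenly.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(samples)
    # Pull up to batch_size items at a time until the stream is empty
    while batch := list(islice(it, batch_size)):
        yield batch


print(list(batched_samples(range(5), batch_size=2)))
```

On Python 3.12 and later, `itertools.batched` performs the same grouping but yields tuples instead of lists.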
result_updater
result_updater(result: ValuationResult) -> ResultUpdater[ValueUpdateT]
Returns a callable that updates a valuation result with a value update.
Because we use log-space computation for numerical stability, the default result updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments.
PARAMETER | DESCRIPTION |
---|---|
`result` | The result to update. TYPE: `ValuationResult` |

Returns: A callable object that updates the result with a value update.
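The log-space bookkeeping mentioned above can be sketched as follows. This is a hypothetical tracker (`LogSpaceMoments` and `log_add` are illustrative names), not pydvl's result updater: it assumes each update arrives as a log-magnitude and a sign, keeps the sums of positive parts, negative parts and squares as log-sum-exp accumulators, and only exponentiates when the mean or the population variance is requested.

```python
import math


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))


class LogSpaceMoments:
    """Hypothetical running 1st/2nd moment tracker with sums kept in log-space."""

    def __init__(self) -> None:
        self.count = 0
        self.log_pos = -math.inf  # log of the sum of positive updates
        self.log_neg = -math.inf  # log of the sum of |negative updates|
        self.log_sq = -math.inf  # log of the sum of squared updates

    def update(self, log_abs: float, sign: int) -> None:
        """Add one update x = sign * exp(log_abs)."""
        self.count += 1
        if sign > 0:
            self.log_pos = log_add(self.log_pos, log_abs)
        elif sign < 0:
            self.log_neg = log_add(self.log_neg, log_abs)
        self.log_sq = log_add(self.log_sq, 2 * log_abs)

    def mean(self) -> float:
        return (math.exp(self.log_pos) - math.exp(self.log_neg)) / self.count

    def variance(self) -> float:
        # Population variance as E[x^2] - E[x]^2
        second_moment = math.exp(self.log_sq) / self.count
        return second_moment - self.mean() ** 2


moments = LogSpaceMoments()
for x in [1.0, -2.0, 3.0]:
    moments.update(math.log(abs(x)), 1 if x > 0 else -1)
print(moments.mean(), moments.variance())
```

Computing the variance as the second moment minus the squared mean keeps the sketch short, but it can lose precision when the variance is small relative to the mean.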
interrupt
Interrupts the current sampler as well as the passed-in samplers.
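Since `ClasswiseSampler` is built from an `in_class` and an `out_of_class` sampler, interrupting it should presumably also reach both of them. A minimal sketch of that propagation pattern, using a hypothetical `CompositeSampler` class with a boolean `interrupted` flag (an assumption, not pydvl's internal state):

```python
class CompositeSampler:
    """Hypothetical sampler that owns sub-samplers and forwards interrupts."""

    def __init__(self, *samplers: "CompositeSampler") -> None:
        self.samplers = samplers
        self.interrupted = False

    def interrupt(self) -> None:
        # Stop this sampler, then every sampler passed to the constructor
        self.interrupted = True
        for sampler in self.samplers:
            sampler.interrupt()


in_class, out_of_class = CompositeSampler(), CompositeSampler()
outer = CompositeSampler(in_class, out_of_class)
outer.interrupt()
```

Because the call recurses, nested compositions are interrupted all the way down.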
roundrobin
Take samples from batch generators in order until all of them are exhausted.
This was heavily inspired by the roundrobin recipe in the official Python documentation for the itertools package.
Examples:
>>> from pydvl.valuation.samplers.classwise import roundrobin
>>> list(roundrobin({"A": "123", "B": "456"}))
[('A', '1'), ('B', '4'), ('A', '2'), ('B', '5'), ('A', '3'), ('B', '6')]
PARAMETER | DESCRIPTION |
---|---|
`batch_generators` | Dictionary mapping labels to batch generators. |
RETURNS | DESCRIPTION |
---|---|
`Generator` | Combined generator yielding `(label, sample)` pairs, taking one sample from each batch generator in turn. |
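For readers who want to see the round-robin logic in isolation, here is a standard-library sketch adapted from the itertools `roundrobin` recipe. The name `roundrobin_sketch` is hypothetical and the code is not copied from pydvl; it only mirrors the behaviour documented above, yielding `(label, item)` pairs and dropping each generator once it is exhausted.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator, Mapping, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def roundrobin_sketch(generators: Mapping[K, Iterable[V]]) -> Iterator[Tuple[K, V]]:
    """Yield (label, item) pairs, taking one item from each iterable in turn."""
    n_active = len(generators)
    nexts = cycle((label, iter(it).__next__) for label, it in generators.items())
    while n_active:
        try:
            for label, next_item in nexts:
                yield label, next_item()
        except StopIteration:
            # Drop the exhausted iterator: keep only the next n_active
            # entries of the cycle, which are the still-active ones
            n_active -= 1
            nexts = cycle(islice(nexts, n_active))


print(list(roundrobin_sketch({"A": "123", "B": "456"})))
```

With iterables of unequal length, the longer ones keep yielding after the shorter ones are exhausted, e.g. `{"A": "1", "B": "234"}` gives `('A', '1'), ('B', '2'), ('B', '3'), ('B', '4')`.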
get_unique_labels
Returns unique labels in a categorical dataset.
PARAMETER | DESCRIPTION |
---|---|
`array` | The input array to find unique labels in. It should be of a categorical type such as object, string, Unicode, unsigned integer, signed integer, or boolean. |
RETURNS | DESCRIPTION |
---|---|
`NDArray` | An array of unique labels. |
RAISES | DESCRIPTION |
---|---|
`ValueError` | If the input array is not of a categorical type. |
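A minimal NumPy sketch of the documented behaviour, assuming that the categorical types listed above correspond to the NumPy dtype kinds `O`, `S`, `U`, `u`, `i` and `b`. The function name `unique_labels_sketch` is hypothetical, and the actual pydvl check may differ.

```python
import numpy as np
from numpy.typing import NDArray

# Assumed mapping: object, bytes string, Unicode, unsigned int, signed int, bool
CATEGORICAL_KINDS = "OSUuib"


def unique_labels_sketch(array: NDArray) -> NDArray:
    """Return the sorted unique labels, rejecting non-categorical dtypes."""
    if array.dtype.kind not in CATEGORICAL_KINDS:
        raise ValueError(f"Expected a categorical array, got dtype {array.dtype}")
    return np.unique(array)


print(unique_labels_sketch(np.array([2, 1, 2])))
```

Floating-point label arrays (dtype kind `f`) are rejected by this sketch, which matches the idea that labels should be discrete categories.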