pydvl.valuation.utility.knn
This module implements the utility function used in KNN-Shapley, as introduced by Jia et al. (2019) [1].
Uses of this utility
Although this class can be used with any semi-value method and sampler, use the dedicated valuation class KNNShapleyValuation when computing Shapley values: it implements a more efficient algorithm that runs in \(O(n \log n)\) time for each test point.
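To give a feel for where the \(O(n \log n)\) cost comes from, here is a minimal sketch of the exact recursion from Jia et al. (2019) for a single test point: one sort by distance, then one linear pass. It assumes the utility used in the paper, in which a subset with fewer than \(K\) points is scored on all of its members, which may differ in detail from the definition given on this page. The function name `knn_shapley_single` and the toy data are hypothetical and not part of pyDVL; this is not necessarily how KNNShapleyValuation is implemented.

```python
import math


def knn_shapley_single(x_train, y_train, x_test, y_test, k):
    """Exact KNN-Shapley values for one test point (recursion of Jia et al., 2019)."""
    n = len(x_train)
    # Sort training indices by distance to the test point: O(n log n).
    order = sorted(range(n), key=lambda i: math.dist(x_train[i], x_test))
    # match[pos] is 1.0 if the pos-th nearest point has the test label.
    match = [1.0 if y_train[i] == y_test else 0.0 for i in order]

    values = [0.0] * n
    # Base case: the farthest point.
    s = match[-1] / n
    values[order[-1]] = s
    # Walk from the second-farthest point towards the nearest one: O(n).
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank of the current point
        s += (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        values[order[pos]] = s
    return values


# Toy data (illustrative only): four 1-D training points, one test point.
x_train = [(0.0,), (1.0,), (3.0,), (6.0,)]
y_train = [1, 0, 1, 0]
x_test, y_test, k = (0.2,), 1, 2
values = knn_shapley_single(x_train, y_train, x_test, y_test, k)
```

The values sum to the utility of the full training set (here 0.5, since one of the two nearest neighbors carries the test label), as the efficiency property of the Shapley value requires.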
KNN-Shapley
See the documentation for an introduction to the method and our implementation.
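The utility implemented by the class KNNClassifierUtility is defined as:

\[
u(S) = \frac{1}{n_{\text{test}}} \sum_{j=1}^{n_{\text{test}}} \frac{1}{K} \sum_{k=1}^{|\alpha^{(j)}(S)|} \mathbb{I}\{y_{\alpha^{(j)}_k (S)} = y^{\text{test}}_j\},
\]

where \(n_{\text{test}}\) is the number of test points, \(y_i\) is the label of training point \(i\), and \(\alpha^{(j)} (S)\) is the intersection of the \(K\)-nearest neighbors of the test point \(x^{\text{test}}_j\) across the whole training set, and the sample \(S\). In particular, \(\alpha^{(j)}_k (S)\) is the index of the training point in \(S\) which is ranked \(k\)-th closest to test point \(x^{\text{test}}_j\).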
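The definition above can be sketched in a few lines of plain Python. This is an illustration of the formula as described here, not pyDVL's implementation: the function name `knn_utility`, the Euclidean distance, and the toy data are assumptions made for the example.

```python
import math


def knn_utility(x_train, y_train, x_test, y_test, subset, k):
    """Average, over test points, of the fraction of the K nearest training
    neighbors that belong to `subset` and carry the test point's label."""
    subset = set(subset)
    if not subset:
        return 0.0  # an empty sample has zero utility
    total = 0.0
    for xt, yt in zip(x_test, y_test):
        # Rank all training indices by distance to the test point.
        ranked = sorted(range(len(x_train)), key=lambda i: math.dist(x_train[i], xt))
        # alpha^{(j)}(S): the K nearest neighbors over the whole training set,
        # restricted to the indices in S.
        alpha = [i for i in ranked[:k] if i in subset]
        total += sum(y_train[i] == yt for i in alpha) / k
    return total / len(x_test)


# Toy data (illustrative only).
x_train = [(0.0,), (1.0,), (2.0,), (10.0,)]
y_train = [0, 0, 1, 1]
x_test = [(0.5,)]
y_test = [0]

u_partial = knn_utility(x_train, y_train, x_test, y_test, subset=[0, 2], k=2)
u_full = knn_utility(x_train, y_train, x_test, y_test, subset=[0, 1], k=2)
```

With \(K = 2\), the two nearest neighbors of the test point are training points 0 and 1. The sample `[0, 2]` contains only one of them, so `u_partial` is 0.5, while `[0, 1]` contains both and `u_full` is 1.0.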
References
[1] Jia, R. et al., 2019. Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms. In: Proceedings of the VLDB Endowment, Vol. 12, No. 11, pp. 1610–1623.
KNNClassifierUtility
KNNClassifierUtility(
    model: KNeighborsClassifier,
    test_data: Dataset,
    *,
    catch_errors: bool = False,
    show_warnings: bool = False,
    cache_backend: CacheBackend | None = None,
    cached_func_options: CachedFuncConfig | None = None,
    clone_before_fit: bool = True,
)
Bases: ModelUtility[Sample, KNeighborsClassifier]
Utility object for KNN Classifiers.
The utility function is the model's predicted probability for the true class.
Uses of this utility
Although this class can be used with any semi-value method and sampler, use the dedicated class KNNShapleyValuation when computing Shapley values: it implements a more efficient algorithm that runs in \(O(n \log n)\) time for each test point.
PARAMETER | DESCRIPTION |
---|---|
model | A KNN classifier model. TYPE: KNeighborsClassifier |
test_data | The test data to evaluate the model on. TYPE: Dataset |
catch_errors | Set to True to catch errors raised when fitting the model fails. TYPE: bool |
show_warnings | Set to False to suppress warnings. TYPE: bool |
cache_backend | Optional instance of CacheBackend (pydvl.utils.caching.base.CacheBackend) used to wrap the _utility method of the Utility instance. Defaults to None, in which case utility evaluations are not cached. TYPE: CacheBackend \| None |
cached_func_options | Optional configuration object for cached utility evaluation. TYPE: CachedFuncConfig \| None |
clone_before_fit | If True, the model is cloned before it is fitted. TYPE: bool |
Source code in src/pydvl/valuation/utility/knn.py
cache_stats
property
cache_stats: CacheStats | None
Cache statistics are gathered when cache is enabled. See CacheStats for all fields returned.
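As a rough illustration of what caching utility evaluations and gathering statistics can look like, the sketch below memoizes a utility on the set of indices in a sample and counts hits and misses. The names `ToyCacheStats` and `CachedUtility` are hypothetical; they are not pyDVL's CacheBackend or CacheStats, whose fields and behavior may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyCacheStats:
    """Hypothetical stand-in for cache statistics: just hits and misses."""
    hits: int = 0
    misses: int = 0


class CachedUtility:
    """Wraps a utility function and memoizes it on the subset of indices."""

    def __init__(self, utility):
        self._utility = utility
        self._cache = {}
        self.stats = ToyCacheStats()

    def __call__(self, subset):
        # The utility depends only on which indices are present, not on
        # their order, so a frozenset is a natural cache key.
        key = frozenset(subset)
        if key in self._cache:
            self.stats.hits += 1
            return self._cache[key]
        self.stats.misses += 1
        value = self._utility(key)
        self._cache[key] = value
        return value


# Toy utility (illustrative only): proportional to the subset size.
cached = CachedUtility(lambda s: len(s) / 10)
cached([1, 2])
cached([2, 1])  # same subset in a different order: served from the cache
cached([3])
```

After these three calls, `cached.stats` records one hit and two misses. Caching pays off in valuation methods that evaluate the same subset repeatedly, since each evaluation otherwise requires fitting and scoring a model.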
training_data
property
training_data: Dataset | None
Retrieves the training data used by this utility.
This property is read-only. In order to set it, use with_dataset().
__call__
__call__(sample: SampleT | None) -> float
PARAMETER | DESCRIPTION |
---|---|
sample | Contains a subset of valid indices for the training data. TYPE: SampleT \| None |
Source code in src/pydvl/valuation/utility/modelutility.py
_maybe_clone_model
staticmethod
_maybe_clone_model(model: ModelT, do_clone: bool) -> ModelT
Clones the passed model to avoid the possibility of reusing a fitted estimator.
PARAMETER | DESCRIPTION |
---|---|
model | Any supervised model. Typical choices can be found on this page. TYPE: ModelT |
do_clone | Whether to clone the model or not. TYPE: bool |
Source code in src/pydvl/valuation/utility/modelutility.py
_utility
_utility(sample: SampleT) -> float
PARAMETER | DESCRIPTION |
---|---|
sample | Contains a subset of valid indices for the training data. TYPE: SampleT |

RETURNS | DESCRIPTION |
---|---|
float | 0 if no indices are passed, otherwise the KNN utility for the sample. |
Source code in src/pydvl/valuation/utility/knn.py
sample_to_data
sample_to_data(sample: SampleT) -> tuple
Returns the raw data corresponding to a sample.
Subclasses can override this, e.g. to reshape tensors. Be careful not to rely on self.training_data staying the same between calls to this method. To manipulate it, use the with_dataset() method.
PARAMETER | DESCRIPTION |
---|---|
sample | Contains a subset of valid indices for the training data. TYPE: SampleT |
Returns: Tuple of the training data and labels corresponding to the sample indices.
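The mapping from a sample to raw data amounts to indexing the training features and labels with the sample's indices. A minimal sketch with plain lists, assuming a standalone function and toy data that are not part of pyDVL (the real method reads the data from the utility's training dataset):

```python
def sample_to_data(x, y, subset):
    """Return (features, labels) restricted to the indices in `subset`."""
    xs = [x[i] for i in subset]
    ys = [y[i] for i in subset]
    return xs, ys


# Toy training data (illustrative only).
x_train = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
y_train = [0, 1, 0]

x_sub, y_sub = sample_to_data(x_train, y_train, subset=[0, 2])
```

An override that reshapes tensors would apply its transformation to `xs` before returning the tuple.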
Source code in src/pydvl/valuation/utility/modelutility.py
with_dataset
Return the utility, or a copy of it, with the given dataset and the model fitted on it.
PARAMETER | DESCRIPTION |
---|---|
data | The dataset to use. TYPE: Dataset |
copy | Whether to copy the utility object or not. Additionally, if … TYPE: bool |
Returns: The utility object.