pydvl.valuation.methods.knn_shapley
This module contains Shapley computations for the K-Nearest Neighbours classifier, introduced by Jia et al. (2019) [1].
In particular, it provides KNNShapleyValuation to compute exact Shapley values for a KNN classifier in \(O(n \log n)\) time per test point, as opposed to \(O(n^2 \log^2 n)\) if the model were simply fed to a generic ShapleyValuation object.
See the documentation or the paper for details.
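For orientation, the recursion behind the exact computation for the unweighted KNN classifier can be summarised as below. This is a restatement of the result in Jia et al. (2019), written in the paper's notation rather than this module's API: \(\alpha_1, \dots, \alpha_n\) are the training points sorted by increasing distance to a single test point \((x_{\text{test}}, y_{\text{test}})\), \(K \le n\) is the number of neighbours, and \(s_{\alpha_i}\) is the value of the \(i\)-th nearest point.

\[
\begin{aligned}
s_{\alpha_n} &= \frac{\mathbb{1}[y_{\alpha_n} = y_{\text{test}}]}{n}, \\
s_{\alpha_i} &= s_{\alpha_{i+1}} + \frac{\mathbb{1}[y_{\alpha_i} = y_{\text{test}}] - \mathbb{1}[y_{\alpha_{i+1}} = y_{\text{test}}]}{K} \cdot \frac{\min(K, i)}{i}, \qquad i = n-1, \dots, 1.
\end{aligned}
\]

Sorting costs \(O(n \log n)\) and the recursion is a single linear pass, which is where the per-test-point complexity comes from. In the paper, the values for several test points are obtained by averaging the per-test-point values.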
Todo
Implement approximate KNN computation for sublinear complexity
References
[1] Jia, R. et al., 2019. Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms. In: Proceedings of the VLDB Endowment, Vol. 12, No. 11, pp. 1610–1623.
KNNShapleyValuation
KNNShapleyValuation(
model: KNeighborsClassifier,
test_data: Dataset,
progress: bool = True,
clone_before_fit: bool = True,
)
Bases: Valuation
Computes exact Shapley values for a KNN classifier.
This implements the method described in Jia et al. (2019) [1].
PARAMETER | DESCRIPTION |
---|---|
model | KNeighborsClassifier model to use for valuation. TYPE: KNeighborsClassifier |
test_data | Dataset containing test data to evaluate the model. TYPE: Dataset |
progress | Whether to display a progress bar. TYPE: bool DEFAULT: True |
clone_before_fit | Whether to clone the model before fitting. TYPE: bool DEFAULT: True |
Source code in src/pydvl/valuation/methods/knn_shapley.py
fit
fit(data: Dataset, continue_from: ValuationResult | None = None) -> Self
Calculate exact Shapley values for a KNN model on a dataset.
This fit method bypasses direct evaluations of the utility function and calculates the Shapley values directly.
In contrast to generic data valuation methods, the runtime grows near-linearly with the size of the training set: \(O(n \log n)\) per test point.
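To make the "direct" computation concrete, here is a minimal pure-Python sketch of the recursion from Jia et al. (2019) for a single test point. It is not the code in this module: the function name `knn_shapley_single_test_point` and its arguments are hypothetical, it takes precomputed distances instead of a fitted model, and it assumes \(1 \le k \le n\).

```python
def knn_shapley_single_test_point(distances, labels, test_label, k):
    """Exact KNN-Shapley values of all training points for ONE test point.

    distances[i] is the distance from training point i to the test point and
    labels[i] is its label. Illustrative helper, not part of pyDVL.
    Assumes 1 <= k <= len(distances).
    """
    n = len(distances)
    if n == 0:
        return []
    # Training indices ordered from nearest to farthest: the O(n log n) step.
    order = sorted(range(n), key=lambda i: distances[i])
    # match[r] is 1.0 if the point at 0-based position r has the test label.
    match = [float(labels[i] == test_label) for i in order]

    values = [0.0] * n
    # Base case: the farthest point.
    values[order[-1]] = match[-1] / n
    # Single linear pass from the second-farthest point to the nearest one.
    # Position r corresponds to rank r + 1, and min(k, rank) / (k * rank)
    # simplifies to 1 / max(k, rank).
    for r in range(n - 2, -1, -1):
        rank = r + 1
        step = (match[r] - match[r + 1]) / max(k, rank)
        values[order[r]] = values[order[r + 1]] + step
    return values


# Three training points; the nearest one (index 1) carries the test label.
distances = [0.9, 0.1, 0.5]
labels = [1, 1, 0]
values = knn_shapley_single_test_point(distances, labels, test_label=1, k=1)
# By the efficiency axiom the values add up to the utility of the full
# training set, which is 1.0 here because the single nearest neighbour
# has the test label.
total = sum(values)
```

No subset of the training set is ever evaluated: one sort and one backward pass replace the exponentially many utility evaluations that the definition of the Shapley value would require.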
Calculating the KNN valuation is computationally expensive, but it can be parallelized. To do so, call the fit() method inside a joblib.parallel_config context manager.
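A minimal sketch of such a call is given below. The wrapper function `value_with_knn_shapley`, its parameter names and the default of four jobs are illustrative assumptions; `train` and `test` are assumed to be Dataset objects prepared by the caller. Only the constructor, fit() and values() signatures documented on this page are used.

```python
def value_with_knn_shapley(train, test, n_neighbors=5, n_jobs=4):
    """Fit a KNNShapleyValuation in parallel and return the sorted result.

    Hypothetical convenience wrapper, not part of pyDVL.
    """
    # Third-party imports are kept inside the function so that the sketch
    # can be defined even where the packages are not installed.
    from joblib import parallel_config
    from sklearn.neighbors import KNeighborsClassifier

    from pydvl.valuation.methods.knn_shapley import KNNShapleyValuation

    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    valuation = KNNShapleyValuation(model=model, test_data=test, progress=False)

    # Everything that fit() runs through joblib inside this block picks up
    # the requested number of jobs.
    with parallel_config(n_jobs=n_jobs):
        valuation.fit(train)

    return valuation.values(sort=True)
```

The number of jobs is a tuning choice: more workers help when there are many test points, while for small test sets the overhead of starting workers may outweigh the gain.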
values
values(sort: bool = False) -> ValuationResult
Returns a copy of the valuation result.
The valuation must have been run with fit() before calling this method.
PARAMETER | DESCRIPTION |
---|---|
sort | Whether to sort the valuation result by value before returning it. TYPE: bool DEFAULT: False |
Returns: The result of the valuation.