# KNN-Shapley
Using the generic class ShapleyValuation together with a PermutationSampler to compute Shapley values for a KNN model has a time complexity of $O(n^2 \log^2 n)$.[^1] However, it is possible to exploit the local structure of K-Nearest Neighbours to drastically reduce the number of utility evaluations and bring the complexity down to $O(n \log n)$ per test point.
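For intuition, the expensive baseline can be sketched as a plain-NumPy permutation Monte Carlo estimate of the KNN utility. This is an illustrative toy, not pyDVL's implementation, and the names `knn_utility` and `mc_shapley` are chosen here for the sketch:

```python
import numpy as np


def knn_utility(subset, X, y, x_test, y_test, K):
    """Utility of a coalition: fraction of correct labels among the K
    nearest neighbours of x_test within the subset."""
    if len(subset) == 0:
        return 0.0
    subset = np.asarray(subset)
    dist = np.linalg.norm(X[subset] - x_test, axis=1)
    nearest = subset[np.argsort(dist)[:K]]
    return np.sum(y[nearest] == y_test) / K


def mc_shapley(X, y, x_test, y_test, K, n_permutations, seed=0):
    """Permutation Monte Carlo estimate of Shapley values for one test point."""
    rng = np.random.default_rng(seed)
    n = len(X)
    values = np.zeros(n)
    for _ in range(n_permutations):
        perm = rng.permutation(n)
        u_prev = 0.0
        for j in range(n):
            # Marginal contribution of perm[j] to the growing coalition
            u_curr = knn_utility(perm[: j + 1], X, y, x_test, y_test, K)
            values[perm[j]] += u_curr - u_prev
            u_prev = u_curr
    return values / n_permutations
```

Each of the $n$ marginal contributions per permutation requires sorting distances, which is exactly the per-evaluation cost that makes this approach quadratic and worse, and that the exact method below avoids.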
## The exact computation
Define the utility by the likelihood of the right label:

$$u(S) = \frac{1}{K} \sum_{k=1}^{\min(K, |S|)} \mathbb{1}\{y_{\alpha_k(S)} = y\},$$

where $\alpha_k(S)$ is the index of the $k$-th closest point to the test point $x$ among the training points in $S$, and $y$ is the true label of $x$. Because only the $K$ nearest neighbours affect the prediction, this utility admits a closed-form recursion for the exact Shapley values of a single test point (Jia et al., 2019): sorting the training points by increasing distance to $x$ as $x_{\alpha_1}, \ldots, x_{\alpha_n}$,

$$s_{\alpha_n} = \frac{\mathbb{1}\{y_{\alpha_n} = y\}}{n}, \qquad
s_{\alpha_i} = s_{\alpha_{i+1}} + \frac{\mathbb{1}\{y_{\alpha_i} = y\} - \mathbb{1}\{y_{\alpha_{i+1}} = y\}}{K} \, \frac{\min(K, i)}{i}.$$

The values are then averaged over all points in the test set to obtain the final value of each training point.
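The recursion can be sketched in plain NumPy for a single test point. This is an illustrative re-implementation of the formula from Jia et al. (2019), not pyDVL's code; the function name `knn_shapley_single` is chosen here:

```python
import numpy as np


def knn_shapley_single(X_train, y_train, x_test, y_test, K):
    """Exact Shapley values of all training points for one test point,
    via the O(n log n) recursion of Jia et al. (2019)."""
    n = len(X_train)
    # Sort training points by distance to the test point
    dist = np.linalg.norm(X_train - x_test, axis=1)
    order = np.argsort(dist)  # order[0] is the closest point
    match = (y_train[order] == y_test).astype(float)
    s = np.zeros(n)
    # Base case: the farthest point
    s[n - 1] = match[n - 1] / n
    # Recurse inwards from the second-farthest point (rank i+1, 1-based)
    for i in range(n - 2, -1, -1):
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)
    # Undo the sorting to return values in the original training order
    values = np.empty(n)
    values[order] = s
    return values
```

A single pass over the sorted points suffices, so the cost per test point is dominated by the sort. By efficiency, the values sum to the utility of the full training set.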
pyDVL implements the exact method in KNNShapleyValuation. Internally, it iterates through each training point with a LOOSampler and computes the recursive formula.[^2]
**Computing exact KNN-Shapley values**

```python
from joblib import parallel_config
from sklearn.neighbors import KNeighborsClassifier

from pydvl.valuation import Dataset, KNNShapleyValuation

model = KNeighborsClassifier(n_neighbors=5)
train, test = Dataset.from_arrays(...)
valuation = KNNShapleyValuation(model, test, progress=True)
with parallel_config(n_jobs=16):
    valuation.fit(train)
```
## Approximate computation
By using approximate KNN-based methods, such as locality-sensitive hashing, it is possible to shave off a further factor and bring the cost per test point down to sublinear in $n$ (Jia et al., 2019).
[^1]: This is based on a Hoeffding bound and the time complexity of sorting $n$ points, which is the asymptotic complexity of one utility evaluation of KNN (Jia et al., 2019).

[^2]: As already noted, it is possible to use the standard infrastructure instead, although this is only useful for testing purposes. To this end, we provide an implementation of the formula above in KNNClassifierUtility.

## References

Jia, R., Dao, D., Wang, B., Hubis, F.A., Gurel, N.M., Li, B., Zhang, C., Spanos, C., Song, D., 2019. Efficient task-specific data valuation for nearest neighbor algorithms. Proc. VLDB Endow. 12, 1610–1623. https://doi.org/10.14778/3342263.3342637