
pydvl.valuation.utility.knn

KNNClassifierUtility

KNNClassifierUtility(
    model: KNeighborsClassifier,
    test_data: Dataset,
    *,
    catch_errors: bool = True,
    show_warnings: bool = False,
    cache_backend: CacheBackend | None = None,
    cached_func_options: CachedFuncConfig | None = None,
    clone_before_fit: bool = True
)

Bases: ModelUtility[Sample, KNeighborsClassifier]

Utility object for KNN Classifiers.

The utility function is the likelihood of the true class given the model's prediction.

This works both as a Utility object for general game-theoretic valuation methods and for valuation methods specialized to KNN classifiers.
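As a rough illustration of the score described above, the following sketch computes, for each test point, the fraction of its k nearest training neighbours that carry the true label, and averages that fraction over the test set. It is a pure-Python stand-in and not pyDVL's implementation: the name `knn_true_class_likelihood`, the uniform neighbour weights, the Euclidean distance, and the averaging over test points are all assumptions made for the example.

```python
import math
from typing import Sequence


def knn_true_class_likelihood(
    x_train: Sequence[Sequence[float]],
    y_train: Sequence[int],
    x_test: Sequence[Sequence[float]],
    y_test: Sequence[int],
    k: int,
) -> float:
    """Mean fraction of the k nearest training neighbours that share the
    true label of each test point (hypothetical helper, uniform weights)."""
    likelihoods = []
    for x, y in zip(x_test, y_test):
        # Sort training indices by Euclidean distance to the test point.
        order = sorted(range(len(x_train)), key=lambda i: math.dist(x, x_train[i]))
        neighbours = order[:k]
        # Likelihood of the true class = share of neighbours with that label.
        hits = sum(1 for i in neighbours if y_train[i] == y)
        likelihoods.append(hits / len(neighbours))
    return sum(likelihoods) / len(likelihoods)


x_train = [[0.0], [0.1], [1.0], [1.1]]
y_train = [0, 0, 1, 1]
score = knn_true_class_likelihood(x_train, y_train, [[0.05], [0.9]], [0, 1], k=3)
```

With k=3, each test point here has two neighbours of its own class and one of the other, so a training subset that drops a same-class neighbour would lower the score. This is the kind of change that valuation methods measure.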

PARAMETER DESCRIPTION
model

A KNN classifier model.

TYPE: KNeighborsClassifier

test_data

The test data to evaluate the model on.

TYPE: Dataset

catch_errors

Set to True to catch errors raised when fit() fails. This can happen at several steps of the pipeline, e.g. when too little training data is passed, which is common during Shapley value calculations. In that case the scorer's default value is returned as the score and computation continues.

TYPE: bool DEFAULT: True

show_warnings

Set to False to suppress warnings thrown by fit().

TYPE: bool DEFAULT: False

cache_backend

Optional instance of CacheBackend (pydvl.utils.caching.base.CacheBackend) used to wrap the _utility method of the Utility instance. Defaults to None, in which case utility evaluations are not cached.

TYPE: CacheBackend | None DEFAULT: None

cached_func_options

Optional configuration object for cached utility evaluation.

TYPE: CachedFuncConfig | None DEFAULT: None

clone_before_fit

If True, the model will be cloned before calling fit().

TYPE: bool DEFAULT: True
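The fallback behaviour described for catch_errors can be pictured with a small stand-alone sketch. It does not use pyDVL: `safe_utility`, `fit_and_score`, and the `default` argument are hypothetical names, and the ValueError raised for too little data is only one example of a failure during fitting or scoring.

```python
from typing import Callable, Sequence


def safe_utility(
    fit_and_score: Callable[[Sequence[int]], float],
    indices: Sequence[int],
    default: float = 0.0,
    catch_errors: bool = True,
) -> float:
    """Evaluate a utility, falling back to `default` if evaluation fails."""
    try:
        return fit_and_score(indices)
    except Exception:
        if not catch_errors:
            raise  # propagate the failure to the caller
        return default  # keep the computation going with the default score


def fit_and_score(indices: Sequence[int]) -> float:
    # Stand-in for a 3-NN model: it fails when given fewer than 3 points.
    if len(indices) < 3:
        raise ValueError("need at least 3 training points")
    return 0.9


small = safe_utility(fit_and_score, [0, 1])     # falls back to the default
large = safe_utility(fit_and_score, [0, 1, 2])  # scored normally
```

Returning a default instead of raising matters when a valuation method samples many small subsets, since a single failing subset would otherwise abort the whole run.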

Source code in src/pydvl/valuation/utility/knn.py
def __init__(
    self,
    model: KNeighborsClassifier,
    test_data: Dataset,
    *,
    catch_errors: bool = True,
    show_warnings: bool = False,
    cache_backend: CacheBackend | None = None,
    cached_func_options: CachedFuncConfig | None = None,
    clone_before_fit: bool = True,
):
    scorer = KNNClassifierScorer(test_data)

    self.test_data = test_data

    super().__init__(
        model=model,
        scorer=scorer,
        catch_errors=catch_errors,
        show_warnings=show_warnings,
        cache_backend=cache_backend,
        cached_func_options=cached_func_options,
        clone_before_fit=clone_before_fit,
    )
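The effect of clone_before_fit, which the constructor forwards unchanged to the parent class, can be illustrated without pyDVL. The sketch below uses copy.deepcopy as a stand-in for cloning (scikit-learn estimators would normally be copied with sklearn.base.clone); `ToyModel` and `evaluate` are hypothetical names introduced for the example.

```python
import copy
from typing import List


class ToyModel:
    """Minimal stand-in for an estimator that remembers its training data."""

    def __init__(self) -> None:
        self.fitted_on: List[int] = []

    def fit(self, indices: List[int]) -> "ToyModel":
        self.fitted_on = list(indices)
        return self


def evaluate(
    model: ToyModel, indices: List[int], clone_before_fit: bool = True
) -> ToyModel:
    # Fit a copy so that the caller's model object is left untouched.
    target = copy.deepcopy(model) if clone_before_fit else model
    return target.fit(indices)


original = ToyModel()
fitted = evaluate(original, [0, 1, 2], clone_before_fit=True)
```

With cloning enabled, `original` keeps its unfitted state while `fitted` holds the training indices. With cloning disabled, the same object is fitted in place, which avoids a copy but changes the model the caller passed in.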

cache_stats property

cache_stats: CacheStats | None

Cache statistics are gathered only when caching is enabled. See CacheStats for the fields returned.
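To make the idea of cached utility evaluations and their statistics concrete, here is a minimal memoisation sketch. It is independent of pyDVL: `CachedUtility`, `hits`, and `misses` are hypothetical names and are not claimed to match the fields of CacheStats.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class CachedUtility:
    """Memoise a set function and count cache hits and misses."""

    def __init__(self, func: Callable[[FrozenSet[int]], float]) -> None:
        self.func = func
        self.cache: Dict[FrozenSet[int], float] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, indices: Iterable[int]) -> float:
        key = frozenset(indices)  # the order of the indices does not matter
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[key] = self.func(key)
        return self.cache[key]


# Toy utility: the value of a subset is a tenth of its size.
u = CachedUtility(lambda s: len(s) / 10)
values = [u([0, 1]), u([1, 0]), u([0, 1, 2])]
```

The second call reuses the result of the first because both describe the same subset, so this run records one hit and two misses.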

__call__

__call__(sample: SampleT | None) -> float
PARAMETER DESCRIPTION
sample

Contains a subset of valid indices for the x_train attribute of Dataset.

TYPE: SampleT | None

Source code in src/pydvl/valuation/utility/modelutility.py
def __call__(self, sample: SampleT | None) -> float:
    """
    Args:
        sample: contains a subset of valid indices for the
            `x_train` attribute of [Dataset][pydvl.utils.dataset.Dataset].
    """
    if sample is None or len(sample.subset) == 0:
        return self.scorer.default

    return cast(float, self._utility_wrapper(sample))
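The early return in __call__ can be mimicked with a self-contained sketch. `ToySample`, `toy_call`, `counting_utility`, and the default of 0.0 are hypothetical stand-ins and not pyDVL's classes; the point is only that None and empty subsets never reach the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ToySample:
    idx: Optional[int]
    subset: Tuple[int, ...]


def toy_call(
    sample: Optional[ToySample],
    utility: Callable[[ToySample], float],
    default: float = 0.0,
) -> float:
    # Mirror the guard: no sample or an empty subset yields the default score.
    if sample is None or len(sample.subset) == 0:
        return default
    return float(utility(sample))


calls: List[ToySample] = []


def counting_utility(sample: ToySample) -> float:
    # Record every sample that actually reaches the utility.
    calls.append(sample)
    return 0.5 * len(sample.subset)


results = [
    toy_call(None, counting_utility),
    toy_call(ToySample(None, ()), counting_utility),
    toy_call(ToySample(0, (1, 2)), counting_utility),
]
```

Only the third call reaches `counting_utility`; the first two return the default without evaluating anything.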