pydvl.valuation.utility.classwise ¶

This module defines the utility used by class-wise Shapley valuation methods.

See the class-wise Shapley documentation for more information.

ClasswiseModelUtility ¶

ClasswiseModelUtility(
    model: SupervisedModel,
    scorer: ClasswiseSupervisedScorer,
    *,
    catch_errors: bool = False,
    show_warnings: bool = False,
    cache_backend: CacheBackend | None = None,
    cached_func_options: CachedFuncConfig | None = None,
    clone_before_fit: bool = True,
)

Bases: ModelUtility[ClasswiseSample, SupervisedModel]

A ModelUtility specific to class-wise Shapley valuation.

It expects a class-wise scorer and a classification task.

PARAMETER	DESCRIPTION
`model`	Any supervised model. Typical choices can be found in the scikit-learn documentation. TYPE: `SupervisedModel`
`scorer`	A class-wise scoring object. TYPE: `ClasswiseSupervisedScorer`
`catch_errors`	Set to `True` to catch errors raised when `fit()` fails. Failures can occur at several steps of the pipeline, e.g. when too little training data is passed, which happens often during Shapley value calculations. In that case, the scorer's default value is returned as the score and computation continues. TYPE: `bool` DEFAULT: `False`
`show_warnings`	Set to `False` to suppress warnings thrown by `fit()`. TYPE: `bool` DEFAULT: `False`
`cache_backend`	Optional instance of CacheBackend used to wrap the `_utility` method of the utility instance. Defaults to `None`, in which case utility evaluations are not cached. TYPE: `CacheBackend \| None` DEFAULT: `None`
`cached_func_options`	Optional configuration object for cached utility evaluation. TYPE: `CachedFuncConfig \| None` DEFAULT: `None`
`clone_before_fit`	If `True`, the model will be cloned before calling `fit()`. TYPE: `bool` DEFAULT: `True`
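
The effect of `catch_errors` can be illustrated with a minimal, self-contained sketch. It does not use pyDVL: `FailingModel`, `ConstantScorer` and `evaluate` are hypothetical stand-ins, and the real utility may handle failures differently in detail. The sketch only shows the documented idea that a failed `fit()` yields the scorer's default value instead of aborting the computation.

```python
class FailingModel:
    """Hypothetical model whose fit() fails on very small training sets."""

    def fit(self, x, y):
        if len(x) < 2:
            raise ValueError("need at least two training points")
        return self


class ConstantScorer:
    """Hypothetical scorer stand-in: a fixed score plus a default value."""

    default = 0.0

    def __call__(self, model):
        return 1.0


def evaluate(model, scorer, x, y, *, catch_errors=False):
    """Fit and score; fall back to scorer.default if fitting fails."""
    try:
        model.fit(x, y)
        return scorer(model)
    except Exception:
        if not catch_errors:
            raise  # propagate the failure to the caller
        return scorer.default


# Too little data: fit() fails, the default score is returned.
score_small = evaluate(FailingModel(), ConstantScorer(), [1.0], [0], catch_errors=True)
# Enough data: fit() succeeds and the scorer is evaluated.
score_ok = evaluate(FailingModel(), ConstantScorer(), [1.0, 2.0], [0, 1], catch_errors=True)
```

With `catch_errors=False`, the same call on the one-point training set would raise the `ValueError` instead of returning a score.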

Source code in src/pydvl/valuation/utility/classwise.py

def __init__(
    self,
    model: SupervisedModel,
    scorer: ClasswiseSupervisedScorer,
    *,
    catch_errors: bool = False,
    show_warnings: bool = False,
    cache_backend: CacheBackend | None = None,
    cached_func_options: CachedFuncConfig | None = None,
    clone_before_fit: bool = True,
):
    super().__init__(
        model,
        scorer,
        catch_errors=catch_errors,
        show_warnings=show_warnings,
        cache_backend=cache_backend,
        cached_func_options=cached_func_options,
        clone_before_fit=clone_before_fit,
    )
    if not isinstance(self.scorer, ClasswiseSupervisedScorer):
        raise ValueError("Scorer must be an instance of ClasswiseSupervisedScorer")
    self.scorer: ClasswiseSupervisedScorer

cache_stats `property` ¶

cache_stats: CacheStats | None

Cache statistics, gathered only when caching is enabled. See CacheStats for all fields returned.

training_data `property` ¶

training_data: Dataset | None

Retrieves the training data used by this utility.

This property is read-only. In order to set it, use with_dataset().

__call__ ¶

__call__(sample: SampleT | None) -> float

PARAMETER	DESCRIPTION
`sample`	Contains a subset of valid indices for the `x_train` attribute of Dataset. TYPE: `SampleT \| None`

Source code in src/pydvl/valuation/utility/modelutility.py

def __call__(self, sample: SampleT | None) -> float:
    """
    Args:
        sample: contains a subset of valid indices for the
            `x_train` attribute of [Dataset][pydvl.utils.dataset.Dataset].
    """
    if sample is None or len(sample.subset) == 0:
        return self.scorer.default

    return cast(float, self._utility_wrapper(sample))
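
The short-circuit for empty samples shown above can be reproduced with stand-in classes. This is a sketch, not pyDVL code: `ToySample`, `ToyScorer` and `ToyUtility` are hypothetical, and the toy `_utility_wrapper` simply returns the subset size in place of fitting and scoring a model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToySample:
    """Hypothetical sample: only carries the subset of indices."""

    subset: Sequence[int]


class ToyScorer:
    """Hypothetical scorer exposing only the default value."""

    default = 0.0


class ToyUtility:
    def __init__(self, scorer):
        self.scorer = scorer
        self.n_evaluations = 0  # counts the expensive evaluations

    def _utility_wrapper(self, sample):
        # Placeholder for "fit the model on the subset and score it".
        self.n_evaluations += 1
        return float(len(sample.subset))

    def __call__(self, sample: Optional[ToySample]) -> float:
        # Same guard as in the source above: no data means default score.
        if sample is None or len(sample.subset) == 0:
            return self.scorer.default
        return float(self._utility_wrapper(sample))


u = ToyUtility(ToyScorer())
empty_value = u(ToySample(subset=()))
none_value = u(None)
full_value = u(ToySample(subset=(0, 1, 2)))
```

Only the last call reaches `_utility_wrapper`, so `u.n_evaluations` ends up at 1. Skipping the model fit for empty subsets matters in practice because many samplers include the empty set.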

__str__ ¶

__str__()

Returns a string representation of the utility. Subclasses should override this method to provide a more informative string.

Source code in src/pydvl/valuation/utility/base.py

def __str__(self):
    """Returns a string representation of the utility.
    Subclasses should override this method to provide a more informative string
    """
    return f"{self.__class__.__name__}"

sample_to_data ¶

sample_to_data(sample: SampleT) -> tuple

Returns the raw data corresponding to a sample.

Subclasses can override this, e.g. to reshape tensors. Do not assume that `self.training_data` stays the same between calls to this method. To change it, use the `with_dataset()` method.

PARAMETER	DESCRIPTION
`sample`	Contains a subset of valid indices for the `x_train` attribute of Dataset. TYPE: `SampleT`

Returns: Tuple of the training data and labels corresponding to the sample indices.

Source code in src/pydvl/valuation/utility/modelutility.py

def sample_to_data(self, sample: SampleT) -> tuple:
    """Returns the raw data corresponding to a sample.

    Subclasses can override this e.g. to do reshaping of tensors. Be careful not to
    rely on `self.training_data` not changing between calls to this method. For
    manipulations to it, use the `with_dataset()` method.

    Args:
        sample: contains a subset of valid indices for the
            `x_train` attribute of [Dataset][pydvl.utils.dataset.Dataset].
    Returns:
        Tuple of the training data and labels corresponding to the sample indices.
    """
    if self.training_data is None:
        raise ValueError("No training data provided")

    x_train, y_train = self.training_data.data(sample.subset)
    return x_train, y_train
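
As a rough illustration of what `sample_to_data` does, the sketch below selects rows by index from a toy dataset. `ToyDataset` and its `data()` method are hypothetical stand-ins built on plain lists; they only mirror the call shape `training_data.data(sample.subset)` seen in the source above and make no claim about the real `Dataset` class.

```python
class ToyDataset:
    """Hypothetical dataset stand-in backed by plain Python lists."""

    def __init__(self, x, y):
        self._x = list(x)
        self._y = list(y)

    def data(self, indices):
        # Return features and labels restricted to the given indices.
        xs = [self._x[i] for i in indices]
        ys = [self._y[i] for i in indices]
        return xs, ys


def sample_to_data(training_data, subset):
    """Mirror of the method above, written as a free function."""
    if training_data is None:
        raise ValueError("No training data provided")
    x_train, y_train = training_data.data(subset)
    return x_train, y_train


dataset = ToyDataset(x=[[0.1], [0.2], [0.3], [0.4]], y=[0, 1, 0, 1])
x_sub, y_sub = sample_to_data(dataset, subset=[0, 2])
```

Here `x_sub` holds the first and third feature rows and `y_sub` the matching labels. A subclass that needs reshaped inputs would apply the reshaping to `x_train` before returning.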

with_dataset ¶

with_dataset(data: Dataset, copy: bool = True) -> Self

Returns the utility, or a copy of it, with the given dataset.

PARAMETER	DESCRIPTION
`data`	The dataset to use for utility fitting (training data). TYPE: `Dataset`
`copy`	Whether to copy the utility object or not. Valuation methods should always make copies to avoid unexpected side effects. TYPE: `bool` DEFAULT: `True`

Returns: The utility object.

Source code in src/pydvl/valuation/utility/base.py

def with_dataset(self, data: Dataset, copy: bool = True) -> Self:
    """Returns the utility, or a copy of it, with the given dataset.
    Args:
        data: The dataset to use for utility fitting (training data)
        copy: Whether to copy the utility object or not. Valuation methods should
            always make copies to avoid unexpected side effects.
    Returns:
        The utility object.
    """
    utility = cp.copy(self) if copy else self
    utility._training_data = data
    return utility
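
The difference between `copy=True` and `copy=False` is easiest to see on a stand-in class. The sketch below assumes that `cp` in the source above is the standard `copy` module, so the copy is shallow. `ToyCopyableUtility` is hypothetical and keeps only the two attributes needed for the demonstration.

```python
import copy as cp


class ToyCopyableUtility:
    """Hypothetical utility stand-in with the same with_dataset() logic."""

    def __init__(self, model):
        self.model = model
        self._training_data = None

    def with_dataset(self, data, copy=True):
        utility = cp.copy(self) if copy else self
        utility._training_data = data
        return utility


base = ToyCopyableUtility(model={"name": "toy"})

# copy=True: a new object receives the data, the original is untouched.
bound = base.with_dataset([1, 2, 3])
base_untouched = base._training_data is None

# copy=False: the original object itself is modified and returned.
same = base.with_dataset([4, 5], copy=False)
```

Because the copy is shallow under this assumption, `bound.model` and `base.model` refer to the same object. Attributes that are mutated in place are therefore shared between the copy and the original.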
