
pydvl.valuation.methods.classwise_shapley

Class-wise Shapley (Schoch et al., 2022) [1] is a semi-value tailored for classification problems.

The core intuition behind the method is that a sample might improve the overall performance of the model while being detrimental to its performance when the model is restricted to items of the same class, and vice versa.
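The following toy sketch illustrates this intuition. It is not part of pyDVL: the nearest-centroid classifier, the data points, and all function names are made up for illustration. A candidate sample of class "a" shifts the decision boundary so that overall test accuracy rises while accuracy on test items of class "a" falls.

```python
from statistics import mean

# Hypothetical 1-D nearest-centroid classifier, used only to illustrate the idea.
def fit_centroids(train):
    """Return one centroid per label for (x, label) pairs."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def accuracy(centroids, test):
    """Fraction of correctly classified (x, label) pairs."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)

train = [(0.0, "a"), (10.0, "b")]
candidate = (-6.0, "a")  # the sample whose contribution we inspect
test = [(4.0, "a"), (4.2, "b"), (4.5, "b")]
# Test items sharing the candidate's class
in_class = [(x, y) for x, y in test if y == candidate[1]]

before = fit_centroids(train)
after = fit_centroids(train + [candidate])

# Marginal contribution of the candidate under each view
overall_gain = accuracy(after, test) - accuracy(before, test)
in_class_gain = accuracy(after, in_class) - accuracy(before, in_class)

print(f"overall gain:  {overall_gain:+.2f}")
print(f"in-class gain: {in_class_gain:+.2f}")
```

Here the candidate moves the centroid of class "a" away from the boundary region, which fixes two "b" predictions but breaks the single "a" prediction. A valuation based only on overall accuracy would reward the sample, whereas one that looks at in-class performance would penalize it.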

Analysis of Class-wise Shapley

For a detailed explanation and analysis of the method, with a comparison to other valuation techniques, please refer to the main documentation and to Semmler and de Benito Delgado (2024) [2].

References


  1. Schoch, Stephanie, Haifeng Xu, and Yangfeng Ji. CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification. In Proc. of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS). New Orleans, Louisiana, USA, 2022. 

  2. Semmler, Markus, and Miguel de Benito Delgado. [Re] Classwise-Shapley Values for Data Valuation. Transactions on Machine Learning Research, July 2024. 

ClasswiseShapleyValuation

ClasswiseShapleyValuation(
    utility: ClasswiseModelUtility,
    sampler: ClasswiseSampler,
    is_done: StoppingCriterion,
    progress: dict[str, Any] | bool = False,
    *,
    normalize_values: bool = True,
)

Bases: Valuation

Class to compute Class-wise Shapley values.

PARAMETER DESCRIPTION
utility

Class-wise utility object with a model and a class-wise scoring function. Its scorer must be an instance of ClasswiseSupervisedScorer; otherwise a ValueError is raised.

TYPE: ClasswiseModelUtility

sampler

Class-wise sampling scheme to use.

TYPE: ClasswiseSampler

is_done

Stopping criterion to use.

TYPE: StoppingCriterion

progress

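Whether to show a progress bar. If a dict is passed instead of a bool, its entries are used as additional arguments for the tqdm progress bar.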

TYPE: dict[str, Any] | bool DEFAULT: False

normalize_values

Whether to normalize values after valuation.

TYPE: bool DEFAULT: True

Source code in src/pydvl/valuation/methods/classwise_shapley.py
def __init__(
    self,
    utility: ClasswiseModelUtility,
    sampler: ClasswiseSampler,
    is_done: StoppingCriterion,
    progress: dict[str, Any] | bool = False,
    *,
    normalize_values: bool = True,
):
    super().__init__()
    self.utility = utility
    self.sampler = sampler
    self.labels: NDArray | None = None
    if not isinstance(utility.scorer, ClasswiseSupervisedScorer):
        raise ValueError("scorer must be an instance of ClasswiseSupervisedScorer")
    self.scorer: ClasswiseSupervisedScorer = utility.scorer
    self.is_done = is_done
    self.tqdm_args: dict[str, Any] = {
        "desc": f"{self.__class__.__name__}: {str(is_done)}"
    }
    # HACK: parse additional args for the progress bar if any (we probably want
    #  something better)
    if isinstance(progress, bool):
        self.tqdm_args.update({"disable": not progress})
    else:
        self.tqdm_args.update(progress if isinstance(progress, dict) else {})
    self.normalize_values = normalize_values
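
The handling of the progress argument in the constructor can be sketched in isolation. The helper name build_tqdm_args is hypothetical and does not exist in pyDVL; it only mirrors the branching shown in the source above.

```python
from typing import Any, Dict, Union

# Hypothetical helper mirroring how the constructor derives tqdm arguments.
def build_tqdm_args(
    progress: Union[Dict[str, Any], bool], desc: str
) -> Dict[str, Any]:
    args: Dict[str, Any] = {"desc": desc}
    if isinstance(progress, bool):
        # A bool only toggles the bar: True shows it, False disables it.
        args["disable"] = not progress
    else:
        # A dict is merged in as-is and may override "desc".
        args.update(progress if isinstance(progress, dict) else {})
    return args

silent = build_tqdm_args(False, "demo")
shown = build_tqdm_args(True, "demo")
custom = build_tqdm_args({"desc": "my run", "leave": False}, "demo")
print(silent, shown, custom, sep="\n")
```

Note that when a dict is passed, no "disable" entry is added, so the bar is left enabled unless the dict itself contains one.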

result property

The current valuation result (not a copy).

values

values(sort: bool = False) -> ValuationResult

Returns a copy of the valuation result. This method is deprecated since version 0.10.0 and scheduled for removal in 0.11.0.

The valuation must have been run with fit() before calling this method.

PARAMETER DESCRIPTION
sort

Whether to sort the valuation result by value before returning it.

TYPE: bool DEFAULT: False

Returns: The result of the valuation.

Source code in src/pydvl/valuation/base.py
@deprecated(
    target=None,
    deprecated_in="0.10.0",
    remove_in="0.11.0",
)
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result by value before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self._result is not None

    r = self._result.copy()
    if sort:
        r.sort(inplace=True)
    return r
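
The difference between the result property (no copy) and values() (a copy, optionally sorted) can be illustrated with a toy stand-in. ToyValuation and NotFittedError below are hypothetical and greatly simplified: they use a plain list instead of a ValuationResult and only mimic the fitted check and the copy semantics described above.

```python
import copy

class NotFittedError(RuntimeError):
    """Hypothetical stand-in for the library's not-fitted exception."""

class ToyValuation:
    """Toy object that stores a list of scores as its result."""

    def __init__(self):
        self._result = None

    @property
    def is_fitted(self):
        return self._result is not None

    def fit(self, scores):
        self._result = list(scores)
        return self

    @property
    def result(self):
        # Returns the stored object itself, not a copy.
        return self._result

    def values(self, sort=False):
        if not self.is_fitted:
            raise NotFittedError(type(self).__name__)
        r = copy.copy(self._result)  # callers get an independent copy
        if sort:
            r.sort()
        return r

v = ToyValuation().fit([0.3, -0.1, 0.2])
snapshot = v.values(sort=True)
snapshot.append(9.9)   # modifies only the copy
live = v.result        # same object as the internal result
print(snapshot, live)
```

Mutating the object returned by result would change the stored result, whereas the copy returned by values() can be sorted or modified freely.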