pydvl.valuation.methods.classwise_shapley

Class-wise Shapley (Schoch et al., 2022)¹ is a semi-value tailored for classification problems.

The core intuition behind the method is that a sample can improve the overall performance of the model while being detrimental to its performance on items of the sample's own class, and vice versa.
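
To make this intuition concrete, here is a minimal sketch in plain NumPy. It is not pyDVL's implementation. It assumes the utility form described by Schoch et al. (2022), in which in-class accuracy is multiplied by the exponential of out-of-class accuracy, with both accuracies divided by the total number of test points. The function name `classwise_score` and the toy predictions are hypothetical.

```python
import numpy as np


def classwise_score(y_true, y_pred, label):
    """Return (in-class accuracy, out-of-class accuracy, combined score).

    Illustrative only. Assumes the score in_acc * exp(out_acc), with both
    accuracies divided by the total number of test points.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    in_class = y_true == label
    correct = y_true == y_pred
    in_acc = np.count_nonzero(correct & in_class) / n
    out_acc = np.count_nonzero(correct & ~in_class) / n
    return in_acc, out_acc, in_acc * np.exp(out_acc)


# Toy test labels and the predictions of a model trained without / with
# some training sample of class 0 (made-up numbers).
y_test = np.array([0, 0, 0, 1, 1, 1])
pred_without = np.array([0, 0, 1, 1, 0, 0])
pred_with = np.array([0, 1, 1, 1, 1, 1])

in_wo, out_wo, score_wo = classwise_score(y_test, pred_without, label=0)
in_w, out_w, score_w = classwise_score(y_test, pred_with, label=0)

print("overall accuracy:", in_wo + out_wo, "->", in_w + out_w)
print("in-class accuracy:", in_wo, "->", in_w)
print("class-wise score:", score_wo, "->", score_w)
```

In this toy case the sample raises overall accuracy while lowering in-class accuracy for label 0, and the class-wise score drops accordingly.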

Analysis of Class-wise Shapley

For a detailed explanation and analysis of the method, with comparison to other valuation techniques, please refer to the main documentation and to Semmler and de Benito Delgado (2024).²

References

1. Schoch, Stephanie, Haifeng Xu, and Yangfeng Ji. CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification. In Proc. of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS). New Orleans, Louisiana, USA, 2022.
2. Semmler, Markus, and Miguel de Benito Delgado. [Re] Classwise-Shapley Values for Data Valuation. Transactions on Machine Learning Research, July 2024.

ClasswiseShapleyValuation

ClasswiseShapleyValuation(
    utility: ClasswiseModelUtility,
    sampler: ClasswiseSampler,
    is_done: StoppingCriterion,
    progress: dict[str, Any] | bool = False,
    *,
    normalize_values: bool = True,
)

Bases: Valuation

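Class to compute Class-wise Shapley values. The scorer of `utility` must be a `ClasswiseSupervisedScorer`; otherwise the constructor raises a `ValueError`.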

PARAMETER	DESCRIPTION
`utility`	Class-wise utility object with model and class-wise scoring function. TYPE: `ClasswiseModelUtility`
`sampler`	Class-wise sampling scheme to use. TYPE: `ClasswiseSampler`
`is_done`	Stopping criterion to use. TYPE: `StoppingCriterion`
`progress`	Whether to show a progress bar. A dictionary is treated as additional arguments for the progress bar. TYPE: `dict[str, Any] | bool` DEFAULT: `False`
`normalize_values`	Whether to normalize values after valuation. TYPE: `bool` DEFAULT: `True`
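
What the normalization does is not detailed here. As a rough sketch, assuming the normalization proposed by Schoch et al. (2022), in which the values of each class are rescaled so that they sum to the in-class accuracy of the model trained on all data, the step could look as follows. The function `normalize_per_class` and its arguments are hypothetical and not part of pyDVL's API.

```python
import numpy as np


def normalize_per_class(values, labels, in_class_accuracy):
    """Rescale the values of each class to sum to that class's accuracy.

    Hypothetical helper: `in_class_accuracy` maps each label to the in-class
    accuracy of the model trained on the full training set.
    """
    values = np.asarray(values, dtype=float).copy()
    labels = np.asarray(labels)
    for label, accuracy in in_class_accuracy.items():
        mask = labels == label
        total = values[mask].sum()
        if total != 0:  # leave the class untouched to avoid dividing by zero
            values[mask] *= accuracy / total
    return values


# Made-up raw values for four training points, two per class.
raw = np.array([0.2, 0.6, -0.1, 0.5])
labels = np.array([0, 0, 1, 1])
normalized = normalize_per_class(raw, labels, {0: 0.4, 1: 0.3})
print(normalized)
```

After this step the values within each class keep their relative proportions, and only their common scale changes.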

Source code in src/pydvl/valuation/methods/classwise_shapley.py

def __init__(
    self,
    utility: ClasswiseModelUtility,
    sampler: ClasswiseSampler,
    is_done: StoppingCriterion,
    progress: dict[str, Any] | bool = False,
    *,
    normalize_values: bool = True,
):
    super().__init__()
    self.utility = utility
    self.sampler = sampler
    self.labels: NDArray | None = None
    if not isinstance(utility.scorer, ClasswiseSupervisedScorer):
        raise ValueError("scorer must be an instance of ClasswiseSupervisedScorer")
    self.scorer: ClasswiseSupervisedScorer = utility.scorer
    self.is_done = is_done
    self.tqdm_args: dict[str, Any] = {
        "desc": f"{self.__class__.__name__}: {str(is_done)}"
    }
    # HACK: parse additional args for the progress bar if any (we probably want
    #  something better)
    if isinstance(progress, bool):
        self.tqdm_args.update({"disable": not progress})
    else:
        self.tqdm_args.update(progress if isinstance(progress, dict) else {})
    self.normalize_values = normalize_values
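
The handling of `progress` in the constructor can be isolated into a small standalone sketch. The helper `build_tqdm_args` is hypothetical; it mirrors the branching above and assumes the resulting dictionary is later passed as keyword arguments to `tqdm`.

```python
from typing import Any, Dict, Union


def build_tqdm_args(progress: Union[Dict[str, Any], bool], desc: str) -> Dict[str, Any]:
    """Mimic how the constructor turns `progress` into progress-bar options."""
    tqdm_args: Dict[str, Any] = {"desc": desc}
    if isinstance(progress, bool):
        # A plain boolean only switches the bar on or off.
        tqdm_args["disable"] = not progress
    elif isinstance(progress, dict):
        # A dictionary is merged in and may override the description.
        tqdm_args.update(progress)
    return tqdm_args


print(build_tqdm_args(False, "demo"))
print(build_tqdm_args(True, "demo"))
print(build_tqdm_args({"leave": False}, "demo"))
```

Note that passing a dictionary does not set `disable`, so in this sketch the bar stays enabled unless the dictionary says otherwise.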

result `property`

result: ValuationResult

The current valuation result (not a copy).

values

values(sort: bool = False) -> ValuationResult

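Returns a copy of the valuation result.

Deprecated in 0.10.0 and scheduled for removal in 0.11.0.

The valuation must have been run with `fit()` before calling this method; otherwise a `NotFittedException` is raised.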

PARAMETER	DESCRIPTION
`sort`	Whether to sort the valuation result by value before returning it. TYPE: `bool` DEFAULT: `False`

Returns: The result of the valuation.

Source code in src/pydvl/valuation/base.py

@deprecated(
    target=None,
    deprecated_in="0.10.0",
    remove_in="0.11.0",
)
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result by value before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self._result is not None

    r = self._result.copy()
    if sort:
        r.sort(inplace=True)
    return r
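
The difference between the `result` property (no copy) and `values()` (a copy, guarded by a fitted check) can be illustrated with a toy stand-in. `ToyValuation` and the use of a NumPy array in place of `ValuationResult` are assumptions made for this sketch only, and `RuntimeError` stands in for `NotFittedException`.

```python
import numpy as np


class ToyValuation:
    """Toy stand-in showing copy-versus-reference semantics."""

    def __init__(self):
        self._result = None

    @property
    def is_fitted(self) -> bool:
        return self._result is not None

    def fit(self):
        # Pretend these are computed values.
        self._result = np.array([0.3, -0.1, 0.5])
        return self

    @property
    def result(self):
        # Returns the internal object itself, so callers can mutate it.
        return self._result

    def values(self, sort: bool = False):
        if not self.is_fitted:
            raise RuntimeError("call fit() first")
        r = self._result.copy()
        if sort:
            r.sort()  # sorts the copy in place
        return r


valuation = ToyValuation().fit()
sorted_copy = valuation.values(sort=True)
sorted_copy[0] = 99.0  # changes only the copy
print(valuation.result)  # internal result is unchanged
```

Because `values()` works on a copy, sorting or editing its return value leaves the stored result untouched, whereas anything obtained through `result` is the stored object itself.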