Beta Shapley

In ML applications, where the utility is the performance of a model trained on a set \(S \subset D\), the marginal utility of adding a new data point often shows diminishing returns as \(S\) grows.¹

Beta Shapley is a weighting scheme that uses the Beta function to place more weight on subsets deemed to be more informative. The weights are defined as:

\[ w(k) := \frac{B(k+\beta, n-k-1+\alpha)}{B(\alpha, \beta)}, \]

where \(B\) is the Beta function, \(k = |S|\) is the size of the subset, \(n = |D|\) is the size of the dataset, and \(\alpha\) and \(\beta\) are parameters that control how the weight is distributed across subset sizes. Setting both to 1 recovers Shapley values, while \(\alpha = 1\) and \(\beta = 16\) is reported in (Kwon and Zou, 2022)² to be a good choice for some applications. Beta Shapley values are available in pyDVL through BetaShapleyValuation:

Beta Shapley values
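from joblib import parallel_config
from pydvl.valuation import (
    BetaShapleyValuation,
    Dataset,
    MaxUpdates,
    ModelUtility,
    PermutationSampler,
    RankCorrelation,
    SupervisedScorer,
)

model = ...
train, test = Dataset.from_arrays(...)
# Score the model on the test set
scorer = SupervisedScorer(model, test, default=0.0)
utility = ModelUtility(model, scorer)
sampler = PermutationSampler()
# Stop when the ranking of values stabilizes or after 2000 updates
stopping = RankCorrelation(rtol=1e-5, burn_in=100) | MaxUpdates(2000)
valuation = BetaShapleyValuation(
    utility, sampler, stopping, alpha=1, beta=16
)
# Run the computation with 16 parallel jobs
with parallel_config(n_jobs=16):
    valuation.fit(train)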
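
The claim that \(\alpha = \beta = 1\) recovers Shapley values can be checked numerically. The following is a minimal standard-library sketch, not pyDVL's implementation. It assumes that the weight of a single subset takes the form \(B(k+\beta, n-k-1+\alpha) / B(\alpha, \beta)\), which is how we read the definition in (Kwon and Zou, 2022), with \(k\) the size of a subset that excludes the point being valued and \(n\) the size of the dataset. All function names are hypothetical and chosen for illustration only.

```python
from math import comb, exp, isclose, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_shapley_weight(n: int, k: int, alpha: float, beta: float) -> float:
    """Assumed weight of one subset of size k (0 <= k <= n - 1), drawn from
    the n - 1 points other than the one being valued."""
    return exp(log_beta(k + beta, n - k - 1 + alpha) - log_beta(alpha, beta))


def shapley_weight(n: int, k: int) -> float:
    """Classical Shapley coefficient 1 / (n * C(n - 1, k))."""
    return 1.0 / (n * comb(n - 1, k))


def mass_per_size(n: int, alpha: float, beta: float) -> list:
    """Total weight placed on all C(n - 1, k) subsets of each size k."""
    return [
        comb(n - 1, k) * beta_shapley_weight(n, k, alpha, beta)
        for k in range(n)
    ]


n = 10
# With alpha = beta = 1 every weight equals the Shapley coefficient.
assert all(
    isclose(beta_shapley_weight(n, k, 1, 1), shapley_weight(n, k))
    for k in range(n)
)
# The total mass over subset sizes sums to 1 for other parameters too.
assert isclose(sum(mass_per_size(n, 1, 16)), 1.0)
```

Under these assumptions, `mass_per_size(n, 1, 1)` is uniform, with mass \(1/n\) on every subset size, which is the Shapley case. Other choices of \(\alpha\) and \(\beta\) redistribute that mass across sizes while keeping the total at 1.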

See, however, Banzhaf indices for an alternative choice of weights, which is reported to work better when the utility function has high variance.


  1. This observation is made somewhat formal for some model classes in (Watson et al., 2023)³, which motivates a complete truncation of the sampling space; see \(\delta\)-Shapley.

  2. Kwon, Y., Zou, J., 2022. Beta Shapley: A Unified and Noise-reduced Data Valuation Framework for Machine Learning, in: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022. Presented at AISTATS 2022, PMLR, Valencia, Spain.

  3. Watson, L., Kujawa, Z., Andreeva, R., Yang, H.-T., Elahi, T., Sarkar, R., 2023. Accelerated Shapley Value Approximation for Data Evaluation. https://doi.org/10.48550/arXiv.2311.05346