pydvl.valuation.methods.msr_banzhaf
This module implements the MSR-Banzhaf valuation method, as described in (Wang and Jia, 2023) [1].
References
1. Wang, J.T. and Jia, R., 2023. Data Banzhaf: A Robust Data Valuation Framework for Machine Learning. In: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, pp. 6388-6421.
MSRBanzhafValuation
MSRBanzhafValuation(
utility: UtilityBase,
sampler: MSRSampler,
is_done: StoppingCriterion,
progress: bool = True,
)
Bases: SemivalueValuation
Class to compute Maximum Sample Re-use (MSR) Banzhaf values.
See Data Valuation for an overview.
The MSR Banzhaf valuation approximates the Banzhaf valuation and is much more efficient than traditional Monte Carlo approaches, because every sampled subset is re-used to update the values of all data points.
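The sample re-use idea can be illustrated with a minimal, pure-Python sketch. This is not pyDVL's implementation: the function name `msr_banzhaf_sketch` and the toy additive utility are assumptions made for illustration only. Each subset is drawn by including every index with probability 1/2, its utility is evaluated once, and that single evaluation updates the running means of all indices. The estimate for index `i` is the mean utility of the sampled subsets that contain `i` minus the mean utility of those that do not.

```python
import random
from typing import Callable, FrozenSet, List


def msr_banzhaf_sketch(
    n: int,
    utility: Callable[[FrozenSet[int]], float],
    n_samples: int,
    seed: int = 0,
) -> List[float]:
    """Illustrative MSR estimator of Banzhaf values (hypothetical helper)."""
    rng = random.Random(seed)
    sum_with = [0.0] * n      # sum of utilities over samples containing i
    sum_without = [0.0] * n   # sum of utilities over samples not containing i
    count_with = [0] * n
    count_without = [0] * n

    for _ in range(n_samples):
        # Uniform sampling over all subsets: each index is kept with prob. 1/2.
        subset = frozenset(i for i in range(n) if rng.random() < 0.5)
        u = utility(subset)  # a single utility evaluation per sample
        # Re-use that evaluation to update the statistics of every index.
        for i in range(n):
            if i in subset:
                sum_with[i] += u
                count_with[i] += 1
            else:
                sum_without[i] += u
                count_without[i] += 1

    values = []
    for i in range(n):
        mean_with = sum_with[i] / count_with[i] if count_with[i] else 0.0
        mean_without = sum_without[i] / count_without[i] if count_without[i] else 0.0
        values.append(mean_with - mean_without)
    return values


# Toy additive utility (an assumption): the exact Banzhaf value of index i
# is then weights[i], so the estimates should land close to the weights.
weights = [1.0, 2.0, 3.0]
estimates = msr_banzhaf_sketch(
    n=3,
    utility=lambda s: sum(weights[i] for i in s),
    n_samples=20_000,
    seed=0,
)
print(estimates)
```

In a real valuation the utility call trains and scores a model, so evaluating it once per sample rather than once per (sample, index) pair is where the savings come from.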
PARAMETER | DESCRIPTION |
---|---|
utility | Utility object with model, data and scoring function. TYPE: UtilityBase |
sampler | Sampling scheme to use. Currently, only one MSRSampler is implemented. In the future, weighted MSRSamplers will be supported. TYPE: MSRSampler |
is_done | Stopping criterion to use. TYPE: StoppingCriterion |
progress | Whether to show a progress bar. TYPE: bool DEFAULT: True |
Source code in src/pydvl/valuation/methods/msr_banzhaf.py
values
values(sort: bool = False) -> ValuationResult
Returns a copy of the valuation result.
The valuation must have been run with fit() before calling this method.
PARAMETER | DESCRIPTION |
---|---|
sort | Whether to sort the valuation result before returning it. TYPE: bool DEFAULT: False |
Returns: The result of the valuation.
Source code in src/pydvl/valuation/base.py
fit
fit(data: Dataset) -> Self
Calculate the MSR Banzhaf valuation on a dataset.
This method has to be called before calling values().
Calculating the Banzhaf valuation is a computationally expensive task that can be parallelized. To do so, call the fit() method inside a joblib.parallel_config context manager as follows: