pydvl.valuation.utility.learning
This module implements Data Utility Learning (Wang et al., 2021)[1].
Parallel processing not supported
As of version 0.9.0, this method does not support parallel processing: to support it, DataUtilityLearning would have to collect all utility samples in a single process before fitting the model.
References
[1] Wang, T., Yang, Y. and Jia, R., 2021. Improving cooperative game theory-based data valuation via data utility learning. arXiv preprint arXiv:2107.06336.
DataUtilityLearning
DataUtilityLearning(
utility: UtilityBase, training_budget: int, model: SupervisedModel
)
Bases: UtilityBase[SampleT]
This object wraps a Utility and delegates calls to it until a given
budget (number of utility evaluations) is exhausted. Every pair of input
and output (a so-called utility sample) is stored. Once the budget is
exhausted, DataUtilityLearning fits the given model to the stored utility
samples. Subsequent calls use the learned model to predict the utility
instead of delegating.
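The delegate-then-predict cycle described above can be illustrated with a small, library-independent sketch in Python with NumPy. It is a hypothetical illustration, not pydvl's implementation: the class `BudgetedUtilityLearner`, the representation of subsets as frozensets of indices, the binary indicator encoding passed to the model, and the `LeastSquaresModel` stand-in for a regressor such as scikit-learn's `LinearRegression` are all assumptions made for this example.

```python
from typing import Callable, FrozenSet, List, Tuple

import numpy as np

Subset = FrozenSet[int]


class LeastSquaresModel:
    """Stand-in regressor (hypothetical): ordinary least squares with an intercept."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LeastSquaresModel":
        A = np.hstack([np.ones((X.shape[0], 1)), X])  # column of ones = intercept
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        A = np.hstack([np.ones((X.shape[0], 1)), X])
        return A @ self.coef_


class BudgetedUtilityLearner:
    """Hypothetical sketch of the data utility learning pattern, not pydvl's class."""

    def __init__(self, utility: Callable[[Subset], float], n_data: int,
                 training_budget: int, model: LeastSquaresModel) -> None:
        self.utility = utility
        self.n_data = n_data
        self.training_budget = training_budget
        self.model = model
        self.samples: List[Tuple[Subset, float]] = []  # stored utility samples
        self.is_fitted = False

    def _encode(self, subset: Subset) -> np.ndarray:
        # Assumed encoding: entry i is 1.0 iff data point i is in the subset.
        x = np.zeros(self.n_data)
        for i in subset:
            x[i] = 1.0
        return x

    def __call__(self, subset: Subset) -> float:
        if len(self.samples) < self.training_budget:
            # Budget not yet exhausted: delegate to the real utility and store the sample.
            value = self.utility(subset)
            self.samples.append((subset, value))
            return value
        if not self.is_fitted:
            # First call after the budget is exhausted: fit the model once on all samples.
            X = np.stack([self._encode(s) for s, _ in self.samples])
            y = np.array([v for _, v in self.samples])
            self.model.fit(X, y)
            self.is_fitted = True
        # From here on, predict the utility instead of delegating.
        return float(self.model.predict(self._encode(subset)[None, :])[0])


# Toy run: an additive utility u(S) = sum of per-point weights, so a linear model fits it exactly.
weights = [0.5, 1.0, 2.0]
calls: List[Subset] = []


def toy_utility(subset: Subset) -> float:
    calls.append(subset)  # record every evaluation of the "expensive" utility
    return float(sum(weights[i] for i in subset))


learner = BudgetedUtilityLearner(toy_utility, n_data=3, training_budget=4,
                                 model=LeastSquaresModel())
for s in [frozenset(), frozenset({0}), frozenset({1}), frozenset({2})]:
    learner(s)  # these 4 calls are delegated to toy_utility
pred = learner(frozenset({0, 1, 2}))  # predicted, not delegated
print(len(calls), round(pred, 6))  # prints: 4 3.5
```

Fitting lazily, on the first call after the budget is spent, means the model sees exactly `training_budget` samples, and the wrapped utility is never evaluated again afterwards; the toy run checks this by counting calls.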
PARAMETER | DESCRIPTION | TYPE |
---|---|---|
utility | The Utility to learn. | UtilityBase |
training_budget | Number of utility samples to collect before fitting the given model. | int |
model | A supervised regression model, fitted to the utility samples once the budget is exhausted. | SupervisedModel |
Example
>>> import numpy as np
>>> from pydvl.valuation.dataset import Dataset
>>> from pydvl.valuation.utility import ModelUtility, DataUtilityLearning
>>> from pydvl.valuation.types import Sample
>>> from sklearn.linear_model import LinearRegression, LogisticRegression
>>> from sklearn.datasets import load_iris
>>>
>>> train, test = Dataset.from_sklearn(load_iris())
>>> u = ModelUtility(LogisticRegression())
>>> u.training_data = train
>>> wrapped_u = DataUtilityLearning(u, 3, LinearRegression())
>>> # The first 3 calls are delegated to u and stored as utility samples
>>> for i in range(3):
...     _ = wrapped_u(Sample(0, np.array([])))
>>> # Subsequent calls are predicted by the fitted LinearRegression model
>>> wrapped_u(Sample(0, np.array([1, 2, 3])))
0.0