pydvl.valuation.utility.learning

This module implements Data Utility Learning (Wang et al., 2021) [1].

Parallel processing not supported

As of version 0.9.0, this method does not support parallel processing: DataUtilityLearning would have to collect all utility samples in a single process before it could fit the model.

References


  1. Wang, T., Yang, Y. and Jia, R., 2021. Improving cooperative game theory-based data valuation via data utility learning. arXiv preprint arXiv:2107.06336. 

DataUtilityLearning

DataUtilityLearning(
    utility: UtilityBase, training_budget: int, model: SupervisedModel
)

Bases: UtilityBase[SampleT]

This object wraps a Utility (pydvl.valuation.utility.Utility) and delegates calls to it until a given budget (number of iterations) is exhausted. Every input-output pair (a so-called utility sample) is stored. Once the budget is exhausted, DataUtilityLearning fits the given model to the stored utility samples. Subsequent calls use the learned model to predict the utility instead of delegating.
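The delegate-then-predict behavior described above can be illustrated with a minimal, library-independent sketch. Everything in it is an assumption made for illustration and is not pyDVL code: the class name BudgetedUtilityWrapper, the n_points argument, the encoding of subsets as indicator vectors, and the additive toy utility are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


class BudgetedUtilityWrapper:
    """Hypothetical stand-in for the behavior described above (not pyDVL code)."""

    def __init__(self, utility, n_points, training_budget, model):
        self.utility = utility  # callable: tuple of indices -> float
        self.n_points = n_points  # size of the full index set (assumed known)
        self.training_budget = training_budget
        self.model = model  # any regressor with fit(X, y) and predict(X)
        self._samples = {}  # subset -> (indicator vector, utility value)
        self._calls = 0
        self._is_fitted = False

    def _encode(self, subset):
        # Assumed encoding: one 0/1 feature per data point.
        mask = np.zeros(self.n_points, dtype=float)
        mask[np.asarray(subset, dtype=int)] = 1.0
        return mask

    def __call__(self, subset):
        subset = tuple(sorted(subset))
        if self._calls < self.training_budget:
            # Within budget: delegate and store the utility sample.
            self._calls += 1
            value = self.utility(subset)
            self._samples[subset] = (self._encode(subset), value)
            return value
        if not self._is_fitted:
            # Budget exhausted: fit the model once on all stored samples.
            X, y = zip(*self._samples.values())
            self.model.fit(np.array(X, dtype=float), np.array(y, dtype=float))
            self._is_fitted = True
        # From here on, predict instead of delegating.
        return float(self.model.predict(self._encode(subset).reshape(1, -1))[0])


calls = []  # records every real evaluation of the toy utility


def toy_utility(subset):
    calls.append(subset)
    return float(len(subset))  # additive toy utility: size of the subset


wrapped = BudgetedUtilityWrapper(
    toy_utility, n_points=3, training_budget=4, model=LinearRegression()
)
for s in [(), (0,), (1,), (2,)]:
    wrapped(s)  # these four calls are delegated to toy_utility
pred = wrapped((0, 1, 2))  # predicted by the fitted model, not delegated
```

In this toy setting the fifth call never reaches toy_utility, which is the whole point of the approach: expensive utility evaluations are replaced by cheap model predictions once the budget is spent. How well the predictions approximate the true utility depends on the model and on which samples were collected.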

PARAMETER DESCRIPTION
utility

The Utility (pydvl.valuation.utility.Utility) to learn.

TYPE: UtilityBase

training_budget

Number of utility samples to collect before fitting the given model.

TYPE: int

model

A supervised regression model, fitted to the collected utility samples once the budget is exhausted.

TYPE: SupervisedModel

Example
>>> import numpy as np
>>> from pydvl.valuation.dataset import Dataset
>>> from pydvl.valuation.utility import ModelUtility, DataUtilityLearning
>>> from pydvl.valuation.types import Sample
>>> from sklearn.linear_model import LinearRegression, LogisticRegression
>>> from sklearn.datasets import load_iris
>>>
>>> train, test = Dataset.from_sklearn(load_iris())
>>> u = ModelUtility(LogisticRegression())
>>> u.training_data = train
>>> wrapped_u = DataUtilityLearning(u, 3, LinearRegression())
>>> # The first 3 calls are delegated to the wrapped utility
>>> for i in range(3):
...     _ = wrapped_u(Sample(0, np.array([])))
>>> # Subsequent calls are predicted by the model fitted for DUL
>>> wrapped_u(Sample(0, np.array([1, 2, 3])))
0.0
Source code in src/pydvl/valuation/utility/learning.py
def __init__(
    self, utility: UtilityBase, training_budget: int, model: SupervisedModel
) -> None:
    self.utility = utility
    self.training_budget = training_budget
    self.model = model
    self._current_iteration = 0
    self._is_fitted = False
    self._utility_samples: Dict[Sample, Tuple[NDArray[np.bool_], float]] = {}