pydvl.valuation.utility.modelutility
ModelUtility
ModelUtility(
model: ModelT,
scorer: Scorer,
*,
catch_errors: bool = True,
show_warnings: bool = False,
cache_backend: CacheBackend | None = None,
cached_func_options: CachedFuncConfig | None = None,
clone_before_fit: bool = True,
)
Bases: UtilityBase[SampleT], Generic[SampleT, ModelT]
Convenience wrapper with configurable memoization of the utility.
An instance of ModelUtility holds the pair of model and scoring function that determines the value of data points. It is used in the computation of all game-theoretic values like Shapley values and the Least Core.
ModelUtility expects the model to fulfill at least the BaseModel interface, i.e. to have a fit() method.
When calling the utility, the model is cloned if it is a scikit-learn model; otherwise a copy is created using copy.deepcopy.
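For instance, any object exposing a fit() method can serve as the model. A minimal sketch (this class is illustrative only, not part of pyDVL):
>>> import numpy as np
>>> class ConstantModel:
...     # Illustrative model fulfilling the fit() requirement: it predicts
...     # the mean of the training targets. It is deepcopy-able, so it can
...     # be safely copied before each fit.
...     def fit(self, x, y):
...         self.mean_ = np.mean(y)
...         return self
...     def predict(self, x):
...         return np.full(len(x), self.mean_)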
Since evaluating the scoring function requires retraining the model, which can be time-consuming, this class wraps it and caches the result of each execution. Caching is available both locally and across nodes, but it is disabled by default and must be explicitly enabled for your project, because most stochastic methods do not benefit much from it. See the caching documentation and the pydvl.utils.caching module documentation.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| model | The supervised model. TYPE: ModelT |
| scorer | A scoring function. If None, the score() method of the model is used. TYPE: Scorer |
| PARAMETER | DESCRIPTION |
|---|---|
| model | Any supervised model. Typical choices can be found in the scikit-learn documentation. TYPE: ModelT |
| scorer | A scoring object. If None, the score() method of the model is used. TYPE: Scorer |
| catch_errors | Set to True to catch errors when fit() fails, e.g. when too little training data is passed. In that case the scorer's default value is returned as the score and computation continues. TYPE: bool DEFAULT: True |
| show_warnings | Set to False to suppress warnings thrown by fit(). TYPE: bool DEFAULT: False |
| cache_backend | Optional instance of CacheBackend used to memoize results and avoid duplicate computation. Note, however, that for most stochastic methods cache hits are rare, making the memory expense of caching not worth it (YMMV). TYPE: CacheBackend \| None DEFAULT: None |
| cached_func_options | Optional configuration object for cached utility evaluation. TYPE: CachedFuncConfig \| None DEFAULT: None |
| clone_before_fit | If True, the model is cloned before calling fit(). TYPE: bool DEFAULT: True |
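As a hedged illustration of the error-handling parameters (object construction follows the example below; whether a given failure in fit() is caught depends on the model and the data passed):
>>> u = ModelUtility(
...     LogisticRegression(random_state=16),
...     Scorer("accuracy"),
...     catch_errors=True,   # failed fits yield the scorer's default value
...     show_warnings=True,  # surface warnings thrown during fitting
... )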
Example
>>> from pydvl.valuation.utility import ModelUtility
>>> from pydvl.valuation.dataset import Dataset
>>> from pydvl.valuation.scorers import Scorer  # import path assumed
>>> from pydvl.valuation.types import Sample
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import load_iris
>>> train, test = Dataset.from_sklearn(load_iris(), random_state=16)
>>> u = ModelUtility(LogisticRegression(random_state=16), Scorer("accuracy"))
>>> u = u.with_dataset(train)
>>> u(Sample(subset=train.indices))
0.9
With caching enabled:
>>> from pydvl.valuation.utility import ModelUtility
>>> from pydvl.valuation.dataset import Dataset
>>> from pydvl.valuation.scorers import Scorer  # import path assumed
>>> from pydvl.valuation.types import Sample
>>> from pydvl.utils.caching.memory import InMemoryCacheBackend
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import load_iris
>>> train, test = Dataset.from_sklearn(load_iris(), random_state=16)
>>> cache_backend = InMemoryCacheBackend()
>>> u = ModelUtility(
...     LogisticRegression(random_state=16),
...     Scorer("accuracy"),
...     cache_backend=cache_backend,
... )
>>> u = u.with_dataset(train)
>>> u(Sample(subset=train.indices))
0.9
Source code in src/pydvl/valuation/utility/modelutility.py
training_data property
training_data: Dataset | None
Retrieves the training data used by this utility. This property is read-only; to set it, use with_dataset().
cache_stats property
cache_stats: CacheStats | None
Cache statistics are gathered when caching is enabled. See CacheStats for all fields returned.
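A brief sketch continuing the caching example above (the hits field name is assumed from CacheStats; exact counts depend on the backend and sampling):
>>> stats = u.cache_stats   # None unless a cache backend was configured
>>> stats.hits >= 0         # number of memoized results that were reused
True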
with_dataset
Returns the utility, or a copy of it, with the given dataset.

| PARAMETER | DESCRIPTION |
|---|---|
| data | The dataset to use for utility fitting (training data). |
| copy | Whether to copy the utility object or not. Valuation methods should always make copies to avoid unexpected side effects. |

RETURNS: The utility object.
Source code in src/pydvl/valuation/utility/base.py
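A short usage sketch of the copy semantics described above (assuming that with_dataset returns the utility itself when copy=False):
>>> u2 = u.with_dataset(train)               # default: returns a copy
>>> u3 = u.with_dataset(train, copy=False)   # modifies u in place
>>> u3 is u
True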
__call__
__call__(sample: SampleT | None) -> float

| PARAMETER | DESCRIPTION |
|---|---|
| sample | Contains a subset of valid indices for the training Dataset. TYPE: SampleT \| None |
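A hedged usage sketch, reusing the objects from the examples above (it is assumed that a None or empty sample evaluates to the scorer's default value):
>>> small = Sample(subset=train.indices[:10])  # a ten-point coalition
>>> score = u(small)       # fits the model on the subset, then scores it
>>> default = u(None)      # assumed to return the scorer's default value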