pydvl.valuation.utility.modelutility
ModelUtility
ModelUtility(
model: ModelT,
scorer: Scorer,
*,
catch_errors: bool = True,
show_warnings: bool = False,
cache_backend: CacheBackend | None = None,
cached_func_options: CachedFuncConfig | None = None,
clone_before_fit: bool = True
)
Bases: UtilityBase[SampleT], Generic[SampleT, ModelT]
Convenience wrapper with configurable memoization of the scoring function.
An instance of ModelUtility holds the triple of model, dataset and scoring function which determines the value of data points. It is used to compute all game-theoretic values, such as Shapley values (pydvl.valuation.shapley) and the Least Core (pydvl.valuation.least_core).
The utility expects the model to fulfill at least the BaseModel interface, i.e. to have a fit() method.
When the utility is called, the model is cloned if it is a scikit-learn model; otherwise a copy is created with copy.deepcopy.
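Copying before fitting matters because fit() mutates the model, so state from one evaluation could otherwise leak into the next. The following is a minimal sketch of that idea using only the standard library. MeanModel and fit_on_subset are hypothetical names introduced for illustration and are not part of pyDVL; the sketch uses copy.deepcopy only and does not reproduce the scikit-learn cloning path.

```python
import copy


class MeanModel:
    """Toy stand-in for a model: fit() stores the mean of the targets."""

    def __init__(self):
        self.mean_ = None  # populated by fit()

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self):
        return self.mean_


def fit_on_subset(model, y, indices, copy_before_fit=True):
    """Hypothetical helper: fit on y[indices], optionally on a deep copy."""
    # With copy_before_fit=True the caller's model object is left untouched.
    target = copy.deepcopy(model) if copy_before_fit else model
    return target.fit([y[i] for i in indices])


y = [1.0, 2.0, 3.0, 4.0]
template = MeanModel()
fitted = fit_on_subset(template, y, [0, 1])
print(fitted.predict())  # 1.5
print(template.mean_)    # None: the template was never fitted
```

Passing copy_before_fit=False in this sketch fits the template in place, which is cheaper but makes repeated evaluations share state.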
Evaluating the scoring function requires retraining the model, which can be time-consuming, so this class wraps the evaluation and caches the result of each execution. Caching is available both locally and across nodes, but must always be enabled for your project first; see the documentation and the module documentation.
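The caching idea can be illustrated with a small, self-contained sketch: results are stored under a key derived from the subset of indices, so a repeated subset does not trigger retraining. MemoizedUtility and slow_score are hypothetical names for illustration only; this is not how pyDVL implements its cache backends, and the hit and miss counters only loosely mirror the idea of cache statistics.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class MemoizedUtility:
    """Hypothetical wrapper caching an expensive function of an index subset."""

    def __init__(self, evaluate: Callable[[FrozenSet[int]], float]):
        self._evaluate = evaluate
        self._cache: Dict[FrozenSet[int], float] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, indices: Iterable[int]) -> float:
        key = frozenset(indices)  # the order of the indices is irrelevant
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._evaluate(key)  # the expensive step (retrain and score)
        self._cache[key] = value
        return value


def slow_score(subset: FrozenSet[int]) -> float:
    # Placeholder for "retrain the model on the subset and score it".
    return len(subset) / 10


u = MemoizedUtility(slow_score)
u([0, 1, 2])
u([2, 1, 0])  # same subset in a different order: served from the cache
print(u.hits, u.misses)  # 1 1
```

Using a frozenset as the key treats a subset as unordered, which fits utilities that depend only on which points are included.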
ATTRIBUTE | DESCRIPTION |
---|---|
model | The supervised model. |
scorer | A scoring function. If None, the |
PARAMETER | DESCRIPTION |
---|---|
model | Any supervised model. Typical choices can be found in the [scikit-learn documentation](https://scikit-learn.org/stable/supervised_learning.html). |
scorer | A scoring object. If None, the |
catch_errors | Set to |
show_warnings | Set to |
cache_backend | Optional instance of CacheBackend used to wrap the _utility method of the ModelUtility instance. Defaults to None, in which case utility evaluations are not cached. |
cached_func_options | Optional configuration object for cached utility evaluation. |
clone_before_fit | If |
Example
>>> # The import paths for Scorer and Sample are assumptions; check them against your installed pyDVL version.
>>> from pydvl.valuation.utility import ModelUtility
>>> from pydvl.valuation.dataset import Dataset
>>> from pydvl.valuation.scorers import Scorer
>>> from pydvl.valuation.types import Sample
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import load_iris
>>> train, test = Dataset.from_sklearn(load_iris(), random_state=16)
>>> u = ModelUtility(LogisticRegression(random_state=16), Scorer("accuracy"))
>>> u(Sample(subset=train.indices))
0.9
With caching enabled:
>>> # The import paths for Scorer and Sample are assumptions; check them against your installed pyDVL version.
>>> from pydvl.valuation.utility import ModelUtility
>>> from pydvl.valuation.dataset import Dataset
>>> from pydvl.valuation.scorers import Scorer
>>> from pydvl.valuation.types import Sample
>>> from pydvl.utils.caching.memory import InMemoryCacheBackend
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import load_iris
>>> train, test = Dataset.from_sklearn(load_iris(), random_state=16)
>>> cache_backend = InMemoryCacheBackend()
>>> u = ModelUtility(LogisticRegression(random_state=16), Scorer("accuracy"), cache_backend=cache_backend)
>>> u(Sample(subset=train.indices))
0.9
Source code in src/pydvl/valuation/utility/modelutility.py
cache_stats
property
cache_stats: CacheStats | None
Cache statistics are gathered when caching is enabled. See CacheStats for all fields returned.
__call__
__call__(sample: SampleT | None) -> float
PARAMETER | DESCRIPTION |
---|---|
sample | Contains a subset of valid indices for the |