pydvl.value.shapley.naive
¶
This module implements exact Shapley values using either the combinatorial or permutation definition.
The exact computation of \(n\) values takes \(\mathcal{O}(2^n)\) evaluations of the utility and is therefore only possible for small datasets. For larger datasets, consider using any of the approximations, such as Monte Carlo, or proxy models like kNN.
See Data valuation for details.
permutation_exact_shapley
¶
permutation_exact_shapley(
u: Utility, *, progress: bool = True
) -> ValuationResult
Computes the exact Shapley value using the formulation with permutations:
See Data valuation for details.
When the length of the training set is > 10 this prints a warning since the computation becomes too expensive. Used mostly for internal testing and simple use cases. Please refer to the Monte Carlo approximations for practical applications.
PARAMETER | DESCRIPTION |
---|---|
u |
Utility object with model, data, and scoring function
TYPE:
|
progress |
Whether to display progress bars for each job.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
ValuationResult
|
Object with the data values. |
Source code in src/pydvl/value/shapley/naive.py
combinatorial_exact_shapley
¶
combinatorial_exact_shapley(
u: Utility,
*,
n_jobs: int = 1,
parallel_backend: Optional[ParallelBackend] = None,
config: Optional[ParallelConfig] = None,
progress: bool = False
) -> ValuationResult
Computes the exact Shapley value using the combinatorial definition.
See Data valuation for details.
Note
If the length of the training set is > n_jobs*20 this prints a warning because the computation is very expensive. Used mostly for internal testing and simple use cases. Please refer to the Monte Carlo approximations for practical applications.
PARAMETER | DESCRIPTION |
---|---|
u |
Utility object with model, data, and scoring function
TYPE:
|
n_jobs |
Number of parallel jobs to use
TYPE:
|
parallel_backend |
Parallel backend instance to use
for parallelizing computations. If
TYPE:
|
config |
(DEPRECATED) Object configuring parallel computation, with cluster address, number of cpus, etc.
TYPE:
|
progress |
Whether to display progress bars for each job.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
ValuationResult
|
Object with the data values. |
Changed in version 0.9.0
Deprecated config
argument and added a parallel_backend
argument to allow users to pass the Parallel Backend instance
directly.