pydvl.value.least_core
New in version 0.4.0
This package holds all routines for the computation of Least Core data values.
Please refer to Data valuation for an overview.
Computing Least Core values requires solving a linear and a quadratic problem once all utility values have been computed. Besides the standard interface, compute_least_core_values(), each of these steps can therefore be run separately. This is useful when running multiple experiments: use lc_prepare_problem() or mclc_prepare_problem() to prepare a list of problems, then solve them in parallel with lc_solve_problems().
Note that mclc_prepare_problem() is itself parallelized, so in that case the problems should be prepared one after another. The linear systems can then be solved in parallel.
montecarlo_least_core
montecarlo_least_core(
u: Utility,
n_iterations: int,
*,
n_jobs: int = 1,
parallel_backend: Optional[ParallelBackend] = None,
config: Optional[ParallelConfig] = None,
non_negative_subsidy: bool = False,
solver_options: Optional[dict] = None,
progress: bool = False,
seed: Optional[Seed] = None,
) -> ValuationResult
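Computes approximate Least Core values using a Monte Carlo approach.
\[
\begin{array}{lll}
\text{minimize} & e & \\
\text{subject to} & \sum_{i \in N} x_{i} = v(N) & \\
& \sum_{i \in S} x_{i} + e \geq v(S) & , \forall S \in \{S_1, S_2, \dots, S_m \overset{\mathrm{iid}}{\sim} U(2^N)\}
\end{array}
\]
Where: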
- \(U(2^N)\) is the uniform distribution over the powerset of \(N\).
- \(m\) is the number of subsets that will be sampled and whose utility will be computed and used to compute the data values.
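As a rough illustration of the sampling step, the sketch below draws \(m\) subsets uniformly from the powerset by including each index independently with probability 1/2, and records one 0/1 membership row and one utility value per subset. It uses only the standard library. The name `sample_constraints` and the toy utility are hypothetical and not part of pyDVL, whose actual implementation may differ.

```python
import random
from typing import Callable, FrozenSet, List, Tuple


def sample_constraints(
    n: int,
    m: int,
    utility: Callable[[FrozenSet[int]], float],
    seed: int = 0,
) -> Tuple[List[List[int]], List[float]]:
    """Sample m subsets of {0, ..., n-1} uniformly from the powerset.

    Returns the 0/1 membership rows and the utility of each sampled subset.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    rows: List[List[int]] = []
    values: List[float] = []
    for _ in range(m):
        # Uniform over 2^N: include each index independently with probability 1/2.
        row = [rng.randint(0, 1) for _ in range(n)]
        subset = frozenset(i for i, bit in enumerate(row) if bit)
        rows.append(row)
        values.append(utility(subset))
    return rows, values


# Toy utility (an assumption for this sketch): the size of the subset.
rows, values = sample_constraints(n=5, m=8, utility=lambda s: float(len(s)))
```

Each row would correspond to the left-hand side of one sampled inequality constraint, and each value to its right-hand side.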
PARAMETER | DESCRIPTION |
---|---|
u | Utility object with model, data, and scoring function. TYPE: Utility |
n_iterations | Total number of iterations to use. TYPE: int |
n_jobs | Number of jobs across which to distribute the computation. TYPE: int |
parallel_backend | Parallel backend instance to use for parallelizing computations. If TYPE: Optional[ParallelBackend] |
config | (DEPRECATED) Object configuring parallel computation, with cluster address, number of cpus, etc. TYPE: Optional[ParallelConfig] |
non_negative_subsidy | If True, the least core subsidy \(e\) is constrained to be non-negative. TYPE: bool |
solver_options | Dictionary of options that will be used to select a solver and to configure it. Refer to cvxpy's documentation for all possible options. |
progress | If True, shows a tqdm progress bar. TYPE: bool |
seed | Either an instance of a numpy random number generator or a seed for it. TYPE: Optional[Seed] |

RETURNS | DESCRIPTION |
---|---|
ValuationResult | Object with the data values and the least core value. |
Changed in version 0.9.0
Deprecated the config argument and added a parallel_backend argument to allow users to pass the Parallel Backend instance directly.
Source code in src/pydvl/value/least_core/montecarlo.py
mclc_prepare_problem
mclc_prepare_problem(
u: Utility,
n_iterations: int,
*,
n_jobs: int = 1,
parallel_backend: Optional[ParallelBackend] = None,
config: Optional[ParallelConfig] = None,
progress: bool = False,
seed: Optional[Seed] = None,
) -> LeastCoreProblem
Prepares a linear problem by sampling subsets of the data. Use this to separate the problem preparation from the solving with lc_solve_problem(). Useful for parallel execution of multiple experiments.
See montecarlo_least_core for argument descriptions.
Changed in version 0.9.0
Deprecated the config argument and added a parallel_backend argument to allow users to pass the Parallel Backend instance directly.
Source code in src/pydvl/value/least_core/montecarlo.py
exact_least_core
exact_least_core(
u: Utility,
*,
non_negative_subsidy: bool = False,
solver_options: Optional[dict] = None,
progress: bool = True,
) -> ValuationResult
Computes the exact Least Core values.
Note
If the training set contains more than 20 instances, a warning is printed because the computation is very expensive. This method is mostly used for internal testing and simple use cases. Please refer to the Monte Carlo method for practical applications.
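The least core is the solution to the following Linear Programming problem:
\[
\begin{array}{lll}
\text{minimize} & e & \\
\text{subject to} & \sum_{i \in N} x_{i} = v(N) & \\
& \sum_{i \in S} x_{i} + e \geq v(S) & , \forall S \subseteq N
\end{array}
\]
Where \(N = \{1, 2, \dots, n\}\) are the training set's indices.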
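To make the role of the subsidy concrete, here is a minimal sketch. It assumes the standard least core formulation, in which every subset \(S\) contributes a constraint of the form \(\sum_{i \in S} x_i + e \geq v(S)\). For a fixed allocation \(x\), the smallest feasible subsidy is then \(\max_{S} \left( v(S) - \sum_{i \in S} x_i \right)\). The helper names and the three-player majority game are hypothetical choices for illustration. The code does not solve the linear program and does not use pyDVL.

```python
from fractions import Fraction
from itertools import chain, combinations
from typing import Callable, Iterable, Sequence, Tuple


def powerset(indices: Sequence[int]) -> Iterable[Tuple[int, ...]]:
    """All subsets of indices, from the empty set to the full set."""
    return chain.from_iterable(
        combinations(indices, r) for r in range(len(indices) + 1)
    )


def minimal_subsidy(
    x: Sequence[Fraction],
    utility: Callable[[Tuple[int, ...]], Fraction],
) -> Fraction:
    """Smallest e such that sum(x[i] for i in S) + e >= utility(S) for all S."""
    return max(
        utility(S) - sum(x[i] for i in S) for S in powerset(range(len(x)))
    )


def majority(S: Tuple[int, ...]) -> Fraction:
    """Toy utility: a coalition is worth 1 if it has at least 2 of 3 members."""
    return Fraction(1) if len(S) >= 2 else Fraction(0)


# Both allocations are efficient: they sum to v(N) = 1.
equal_split = [Fraction(1, 3)] * 3
lopsided = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]

print(minimal_subsidy(equal_split, majority))  # 1/3
print(minimal_subsidy(lopsided, majority))     # 1/2
```

In this toy game the equal split attains the optimum: summing the three pair constraints gives \(2 v(N) + 3e \geq 3\), hence \(e \geq 1/3\) for any efficient allocation.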
PARAMETER | DESCRIPTION |
---|---|
u | Utility object with model, data, and scoring function. TYPE: Utility |
non_negative_subsidy | If True, the least core subsidy \(e\) is constrained to be non-negative. TYPE: bool |
solver_options | Dictionary of options that will be used to select a solver and to configure it. Refer to cvxpy's documentation for all possible options. |
progress | If True, shows a tqdm progress bar. TYPE: bool |

RETURNS | DESCRIPTION |
---|---|
ValuationResult | Object with the data values and the least core value. |
Source code in src/pydvl/value/least_core/naive.py
lc_prepare_problem
Prepares a linear problem with all subsets of the data. Use this to separate the problem preparation from the solving with lc_solve_problem(). Useful for parallel execution of multiple experiments.
See exact_least_core() for argument descriptions.
Source code in src/pydvl/value/least_core/naive.py
compute_least_core_values
compute_least_core_values(
u: Utility,
*,
n_jobs: int = 1,
n_iterations: Optional[int] = None,
mode: LeastCoreMode = MonteCarlo,
non_negative_subsidy: bool = False,
solver_options: Optional[dict] = None,
progress: bool = False,
**kwargs,
) -> ValuationResult
Umbrella method to compute Least Core values with any of the available algorithms.
See Data valuation for an overview.
The following algorithms are available. Note that the exact method can only work with very small datasets and is thus intended only for testing.
- exact: uses the complete powerset of the training set for the constraints. Implemented in exact_least_core().
- montecarlo: uses the approximate Monte Carlo Least Core algorithm. Implemented in montecarlo_least_core().
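The mode-based selection described above can be sketched as a small dispatcher. Everything in this block is a hypothetical stand-in: `Mode` mimics LeastCoreMode, the two stub functions only return a description instead of computing values, and raising an error when `n_iterations` is missing in Monte Carlo mode is an assumption about reasonable behaviour, not a statement about pyDVL's implementation.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Hypothetical stand-in for LeastCoreMode."""

    MonteCarlo = "montecarlo"
    Exact = "exact"


def exact_stub(n_points: int) -> str:
    # The exact method needs one constraint per subset of the training set.
    return f"exact: {2 ** n_points} subset constraints"


def montecarlo_stub(n_points: int, n_iterations: int) -> str:
    # The Monte Carlo method needs one constraint per sampled subset.
    return f"montecarlo: {n_iterations} sampled constraints"


def dispatch(
    n_points: int,
    mode: Mode = Mode.MonteCarlo,
    n_iterations: Optional[int] = None,
) -> str:
    """Route to one of the two stubs depending on the requested mode."""
    if mode == Mode.MonteCarlo:
        if n_iterations is None:
            # Assumed behaviour: sampling cannot proceed without a budget.
            raise ValueError("n_iterations is required for the Monte Carlo mode")
        return montecarlo_stub(n_points, n_iterations)
    if mode == Mode.Exact:
        return exact_stub(n_points)
    raise ValueError(f"Unknown mode: {mode}")


print(dispatch(4, Mode.Exact))        # exact: 16 subset constraints
print(dispatch(4, n_iterations=10))   # montecarlo: 10 sampled constraints
```

The constraint counts in the output show why the exact mode is only practical for very small datasets: the number of constraints doubles with every additional training point.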
PARAMETER | DESCRIPTION |
---|---|
u | Utility object with model, data, and scoring function. TYPE: Utility |
n_jobs | Number of jobs to run in parallel. Only used for Monte Carlo Least Core. TYPE: int |
n_iterations | Number of subsets to sample and evaluate the utility on. Only used for Monte Carlo Least Core. |
mode | Algorithm to use. See LeastCoreMode for available options. TYPE: LeastCoreMode |
non_negative_subsidy | If True, the least core subsidy \(e\) is constrained to be non-negative. TYPE: bool |
solver_options | Optional dictionary of options passed to the solvers. |

RETURNS | DESCRIPTION |
---|---|
ValuationResult | Object with the computed values. |
New in version 0.5.0