Skip to content

pydvl.value.least_core.naive

exact_least_core

exact_least_core(
    u: Utility,
    *,
    non_negative_subsidy: bool = False,
    solver_options: Optional[dict] = None,
    progress: bool = True
) -> ValuationResult

Computes the exact Least Core values.

Note

If the training set contains more than 20 instances a warning is printed because the computation is very expensive. This method is mostly used for internal testing and simple use cases. Please refer to the Monte Carlo method for practical applications.

The least core is the solution to the following Linear Programming problem:

\[ \begin{array}{lll} \text{minimize} & \displaystyle{e} & \\ \text{subject to} & \displaystyle\sum_{i\in N} x_{i} = v(N) & \\ & \displaystyle\sum_{i\in S} x_{i} + e \geq v(S) &, \forall S \subseteq N \\ \end{array} \]

Where \(N = \{1, 2, \dots, n\}\) are the training set's indices.

PARAMETER DESCRIPTION
u

Utility object with model, data, and scoring function

TYPE: Utility

non_negative_subsidy

If True, the least core subsidy \(e\) is constrained to be non-negative.

TYPE: bool DEFAULT: False

solver_options

Dictionary of options that will be used to select a solver and to configure it. Refer to the cvxpy's documentation for all possible options.

TYPE: Optional[dict] DEFAULT: None

progress

If True, shows a tqdm progress bar

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
ValuationResult

Object with the data values and the least core value.

Source code in src/pydvl/value/least_core/naive.py
def exact_least_core(
    u: Utility,
    *,
    non_negative_subsidy: bool = False,
    solver_options: Optional[dict] = None,
    progress: bool = True,
) -> ValuationResult:
    r"""Computes the exact Least Core values.

    !!! Note
        If the training set contains more than 20 instances a warning is printed
        because the computation is very expensive. This method is mostly used for
        internal testing and simple use cases. Please refer to the
        [Monte Carlo method][pydvl.value.least_core.montecarlo.montecarlo_least_core]
        for practical applications.

    The least core is the solution to the following Linear Programming problem:

    $$
    \begin{array}{lll}
    \text{minimize} & \displaystyle{e} & \\
    \text{subject to} & \displaystyle\sum_{i\in N} x_{i} = v(N) & \\
    & \displaystyle\sum_{i\in S} x_{i} + e \geq v(S) &, \forall S \subseteq N \\
    \end{array}
    $$

    Where $N = \{1, 2, \dots, n\}$ are the training set's indices.

    Args:
        u: Utility object with model, data, and scoring function
        non_negative_subsidy: If True, the least core subsidy $e$ is constrained
            to be non-negative.
        solver_options: Dictionary of options that will be used to select a solver
            and to configure it. Refer to the [cvxpy's
            documentation](https://www.cvxpy.org/tutorial/advanced/index.html#setting-solver-options)
            for all possible options.
        progress: If True, shows a tqdm progress bar

    Returns:
        Object with the data values and the least core value.
    """
    n = len(u.data)
    if n > 20:  # Arbitrary choice, will depend on time required, caching, etc.
        warnings.warn(f"Large dataset! Computation requires 2^{n} calls to model.fit()")

    problem = lc_prepare_problem(u, progress=progress)
    return lc_solve_problem(
        problem=problem,
        u=u,
        algorithm="exact_least_core",
        non_negative_subsidy=non_negative_subsidy,
        solver_options=solver_options,
    )

lc_prepare_problem

lc_prepare_problem(u: Utility, progress: bool = False) -> LeastCoreProblem

Prepares a linear problem with all subsets of the data Use this to separate the problem preparation from the solving with lc_solve_problem(). Useful for parallel execution of multiple experiments.

See exact_least_core() for argument descriptions.

Source code in src/pydvl/value/least_core/naive.py
def lc_prepare_problem(u: Utility, progress: bool = False) -> LeastCoreProblem:
    """Prepares a linear problem with all subsets of the data
    Use this to separate the problem preparation from the solving with
    [lc_solve_problem()][pydvl.value.least_core.common.lc_solve_problem]. Useful for
    parallel execution of multiple experiments.

    See [exact_least_core()][pydvl.value.least_core.naive.exact_least_core] for argument
    descriptions.
    """
    n = len(u.data)

    logger.debug("Building vectors and matrices for linear programming problem")
    powerset_size = 2**n
    A_lb = np.zeros((powerset_size, n))

    logger.debug("Iterating over all subsets")
    utility_values = np.zeros(powerset_size)
    for i, subset in enumerate(  # type: ignore
        tqdm(
            powerset(u.data.indices),
            disable=not progress,
            total=powerset_size - 1,
            position=0,
        )
    ):
        indices: NDArray[np.bool_] = np.zeros(n, dtype=bool)
        indices[list(subset)] = True
        A_lb[i, indices] = 1
        utility_values[i] = u(subset)  # type: ignore

    return LeastCoreProblem(utility_values, A_lb)