pydvl.valuation.methods.least_core

LeastCoreValuation

LeastCoreValuation(
    utility: UtilityBase,
    sampler: PowersetSampler,
    n_samples: int | None = None,
    non_negative_subsidy: bool = False,
    solver_options: dict | None = None,
    progress: bool = True,
)

Bases: Valuation

Umbrella class to calculate least-core values with multiple sampling methods.

See Data valuation for an overview.

Different samplers correspond to different least-core methods from the literature. For those, we provide convenience subclasses of LeastCoreValuation: see ExactLeastCoreValuation and MonteCarloLeastCoreValuation.

Other samplers allow you to create your own method and might yield computational gains over a standard Monte Carlo method.

PARAMETER DESCRIPTION
utility

Utility object with model, data and scoring function.

TYPE: UtilityBase

sampler

The sampler to use for the valuation.

TYPE: PowersetSampler

n_samples

The number of samples to use for the valuation. If None, it will be set to the sample limit of the chosen sampler (for finite samplers) or 1000 * len(data) (for infinite samplers).

TYPE: int | None DEFAULT: None

non_negative_subsidy

If True, the least core subsidy \(e\) is constrained to be non-negative.

TYPE: bool DEFAULT: False

solver_options

Optional dictionary containing a CVXPY solver and options to configure it. For valid values of the "solver" key and for additional options, see the CVXPY documentation.

TYPE: dict | None DEFAULT: None

progress

Whether to show a progress bar during the construction of the least-core problem.

TYPE: bool DEFAULT: True
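
The default for n_samples described above can be sketched in a few lines. This is only an illustration of the stated rule, not the library's `_get_default_n_samples`: the function name and the `sample_limit` argument are hypothetical, standing in for whatever a finite sampler reports as its maximum number of samples.

```python
def default_n_samples(n_indices, sample_limit=None):
    """Illustrative stand-in for the documented default of `n_samples`.

    `sample_limit` is a hypothetical argument: the number of samples a finite
    sampler can produce, or None for an infinite sampler.
    """
    if sample_limit is not None:
        # Finite sampler: use everything it can generate.
        return sample_limit
    # Infinite sampler: fall back to 1000 samples per data point.
    return 1000 * n_indices


# A finite sampler enumerating the whole powerset of 4 indices.
print(default_n_samples(4, sample_limit=2**4))  # 16
# An infinite sampler over 4 indices.
print(default_n_samples(4))  # 4000
```

Because a full enumeration of the powerset grows as 2^n, the finite default is only practical for very small datasets.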

Source code in src/pydvl/valuation/methods/least_core.py
def __init__(
    self,
    utility: UtilityBase,
    sampler: PowersetSampler,
    n_samples: int | None = None,
    non_negative_subsidy: bool = False,
    solver_options: dict | None = None,
    progress: bool = True,
):
    super().__init__()

    _check_sampler(sampler)
    self._utility = utility
    self._sampler = sampler
    self._non_negative_subsidy = non_negative_subsidy
    self._solver_options = solver_options
    self._n_samples = n_samples
    self._progress = progress

values

values(sort: bool = False) -> ValuationResult

Returns a copy of the valuation result.

The valuation must have been run with fit() before calling this method.

PARAMETER DESCRIPTION
sort

Whether to sort the valuation result before returning it.

TYPE: bool DEFAULT: False

Returns: The result of the valuation.

Source code in src/pydvl/valuation/base.py
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self.result is not None

    from copy import copy

    r = copy(self.result)
    if sort:
        r.sort()
    return r
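
The fit()/values() contract shown above can be illustrated with a toy class. Everything here is hypothetical: `ToyValuation` and `NotFittedError` only mimic the control flow of the listing, with a plain list in place of a ValuationResult, so the copy behaviour shown is that of Python lists and may differ for the real result type.

```python
from copy import copy


class NotFittedError(RuntimeError):
    """Hypothetical stand-in for the library's NotFittedException."""


class ToyValuation:
    """Hypothetical class mimicking only the fit()/values() contract."""

    def __init__(self):
        self.result = None  # set by fit()

    def fit(self, data):
        # Stand-in for a real valuation: the "value" of each point is itself.
        self.result = list(data)
        return self  # returning self allows chaining: fit(...).values()

    def values(self, sort=False):
        if self.result is None:
            raise NotFittedError(type(self).__name__)
        r = copy(self.result)  # the caller gets a copy, not the stored result
        if sort:
            r.sort()
        return r


valuation = ToyValuation()
try:
    valuation.values()
except NotFittedError:
    print("values() called before fit()")

ranked = valuation.fit([0.3, -0.1, 0.2]).values(sort=True)
print(ranked)            # [-0.1, 0.2, 0.3]
print(valuation.result)  # [0.3, -0.1, 0.2]
```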

fit

fit(data: Dataset) -> Valuation

Calculate the least core valuation on a dataset.

This method has to be called before calling values().

Calculating the least core valuation is a computationally expensive task that can be parallelized. To do so, call the fit() method inside a joblib.parallel_config context manager as follows:

from joblib import parallel_config

with parallel_config(n_jobs=4):
    valuation.fit(data)

Source code in src/pydvl/valuation/methods/least_core.py
def fit(self, data: Dataset) -> Valuation:
    """Calculate the least core valuation on a dataset.

    This method has to be called before calling `values()`.

    Calculating the least core valuation is a computationally expensive task that
    can be parallelized. To do so, call the `fit()` method inside a
    `joblib.parallel_config` context manager as follows:

    ```python
    from joblib import parallel_config

    with parallel_config(n_jobs=4):
        valuation.fit(data)
    ```

    """
    self._utility = self._utility.with_dataset(data)
    if self._n_samples is None:
        self._n_samples = _get_default_n_samples(
            sampler=self._sampler, indices=data.indices
        )

    algorithm = str(self._sampler)

    problem = create_least_core_problem(
        u=self._utility,
        sampler=self._sampler,
        n_samples=self._n_samples,
        progress=self._progress,
    )

    solution = lc_solve_problem(
        problem=problem,
        u=self._utility,
        algorithm=algorithm,
        non_negative_subsidy=self._non_negative_subsidy,
        solver_options=self._solver_options,
    )

    self.result = solution
    return self

ExactLeastCoreValuation

ExactLeastCoreValuation(
    utility: UtilityBase,
    non_negative_subsidy: bool = False,
    solver_options: dict | None = None,
    progress: bool = True,
    batch_size: int = 1,
)

Bases: LeastCoreValuation

Class to calculate exact least-core values.

Equivalent to calling LeastCoreValuation with a DeterministicUniformSampler and n_samples=None.

The definition of the exact least-core valuation is:

\[ \begin{array}{lll} \text{minimize} & \displaystyle{e} & \\ \text{subject to} & \displaystyle\sum_{i\in N} x_{i} = v(N) & \\ & \displaystyle\sum_{i\in S} x_{i} + e \geq v(S) &, \forall S \subseteq N \\ \end{array} \]

Where \(N = \{1, 2, \dots, n\}\) are the training set's indices.
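
To make the constraints concrete, the following sketch enumerates every subset of a tiny index set and computes, for one fixed allocation \(x\), the smallest subsidy \(e\) that satisfies all of them. It does not solve the linear program (the minimization over \(x\) is left out), and the utility `v` is an assumed toy game, not something provided by the library.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(indices):
    """All subsets of `indices`, from the empty set up to the full set."""
    return chain.from_iterable(
        combinations(indices, k) for k in range(len(indices) + 1)
    )


def v(subset):
    """Toy utility (an assumption): a 3-player majority game."""
    return 1 if len(subset) >= 2 else 0


def minimal_subsidy(x, indices):
    """Smallest e with sum_{i in S} x_i + e >= v(S) for all S, for a fixed x."""
    return max(v(s) - sum(x[i] for i in s) for s in powerset(indices))


indices = (0, 1, 2)
x = {i: Fraction(1, 3) for i in indices}  # efficient: sums to v(N) = 1
assert sum(x.values()) == v(indices)

e = minimal_subsidy(x, indices)
print(e)  # 1/3
```

The binding constraints are the three pairs: each pair has utility 1 but is allocated only 2/3, so a subsidy of 1/3 is needed.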

PARAMETER DESCRIPTION
utility

Utility object with model, data and scoring function.

TYPE: UtilityBase

non_negative_subsidy

If True, the least core subsidy \(e\) is constrained to be non-negative.

TYPE: bool DEFAULT: False

solver_options

Optional dictionary containing a CVXPY solver and options to configure it. For valid values of the "solver" key and for additional options, see the CVXPY documentation.

TYPE: dict | None DEFAULT: None

progress

Whether to show a progress bar during the construction of the least-core problem.

TYPE: bool DEFAULT: True

Source code in src/pydvl/valuation/methods/least_core.py
def __init__(
    self,
    utility: UtilityBase,
    non_negative_subsidy: bool = False,
    solver_options: dict | None = None,
    progress: bool = True,
    batch_size: int = 1,
):
    super().__init__(
        utility=utility,
        sampler=DeterministicUniformSampler(
            index_iteration=NoIndexIteration, batch_size=batch_size
        ),
        n_samples=None,
        non_negative_subsidy=non_negative_subsidy,
        solver_options=solver_options,
        progress=progress,
    )

fit

fit(data: Dataset) -> Valuation

Calculate the least core valuation on a dataset.

This method has to be called before calling values().

Calculating the least core valuation is a computationally expensive task that can be parallelized. To do so, call the fit() method inside a joblib.parallel_config context manager as follows:

from joblib import parallel_config

with parallel_config(n_jobs=4):
    valuation.fit(data)

Source code in src/pydvl/valuation/methods/least_core.py
def fit(self, data: Dataset) -> Valuation:
    """Calculate the least core valuation on a dataset.

    This method has to be called before calling `values()`.

    Calculating the least core valuation is a computationally expensive task that
    can be parallelized. To do so, call the `fit()` method inside a
    `joblib.parallel_config` context manager as follows:

    ```python
    from joblib import parallel_config

    with parallel_config(n_jobs=4):
        valuation.fit(data)
    ```

    """
    self._utility = self._utility.with_dataset(data)
    if self._n_samples is None:
        self._n_samples = _get_default_n_samples(
            sampler=self._sampler, indices=data.indices
        )

    algorithm = str(self._sampler)

    problem = create_least_core_problem(
        u=self._utility,
        sampler=self._sampler,
        n_samples=self._n_samples,
        progress=self._progress,
    )

    solution = lc_solve_problem(
        problem=problem,
        u=self._utility,
        algorithm=algorithm,
        non_negative_subsidy=self._non_negative_subsidy,
        solver_options=self._solver_options,
    )

    self.result = solution
    return self

values

values(sort: bool = False) -> ValuationResult

Returns a copy of the valuation result.

The valuation must have been run with fit() before calling this method.

PARAMETER DESCRIPTION
sort

Whether to sort the valuation result before returning it.

TYPE: bool DEFAULT: False

Returns: The result of the valuation.

Source code in src/pydvl/valuation/base.py
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self.result is not None

    from copy import copy

    r = copy(self.result)
    if sort:
        r.sort()
    return r

MonteCarloLeastCoreValuation

MonteCarloLeastCoreValuation(
    utility: UtilityBase,
    n_samples: int,
    non_negative_subsidy: bool = False,
    solver_options: dict | None = None,
    progress: bool = True,
    seed: Seed | None = None,
    batch_size: int = 1,
)

Bases: LeastCoreValuation

Class to calculate least-core values with Monte Carlo sampling.

Equivalent to calling LeastCoreValuation with a UniformSampler.

The definition of the Monte Carlo least-core valuation is:

\[ \begin{array}{lll} \text{minimize} & \displaystyle{e} & \\ \text{subject to} & \displaystyle\sum_{i\in N} x_{i} = v(N) & \\ & \displaystyle\sum_{i\in S} x_{i} + e \geq v(S) & , \forall S \in \{S_1, S_2, \dots, S_m \overset{\mathrm{iid}}{\sim} U(2^N) \} \end{array} \]

Where:

  • \(U(2^N)\) is the uniform distribution over the powerset of \(N\).
  • \(m\) is the number of subsets that will be sampled and whose utility will be computed and used to compute the data values.
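
The sampling step in this definition can be sketched with the standard library alone. Drawing from \(U(2^N)\) is equivalent to including each index independently with probability 1/2; whether the library's UniformSampler is implemented this way is not shown here, and both `sample_subsets` and the toy utility `v` are assumptions made for illustration.

```python
import random


def sample_subsets(n, m, seed=None):
    """Draw m subsets of range(n) i.i.d. from the uniform distribution
    over the powerset.

    Including each index independently with probability 1/2 gives each of
    the 2**n subsets the same probability.
    """
    rng = random.Random(seed)
    return [
        tuple(i for i in range(n) if rng.random() < 0.5)
        for _ in range(m)
    ]


def v(subset):
    """Toy utility (an assumption): a 3-player majority game."""
    return 1 if len(subset) >= 2 else 0


n, m = 3, 5
subsets = sample_subsets(n, m, seed=42)

# One sampled constraint per subset: sum_{i in S_j} x_i + e >= v(S_j)
constraints = [(s, v(s)) for s in subsets]
for s, utility in constraints:
    print(s, utility)
```

Only these \(m\) constraints enter the program, so the cost is \(m\) utility evaluations instead of \(2^n\).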

PARAMETER DESCRIPTION
utility

Utility object with model, data and scoring function.

TYPE: UtilityBase

n_samples

The number of samples to use for the valuation. If None, it will be set to 1000 * len(data).

TYPE: int

non_negative_subsidy

If True, the least core subsidy \(e\) is constrained to be non-negative.

TYPE: bool DEFAULT: False

solver_options

Optional dictionary containing a CVXPY solver and options to configure it. For valid values of the "solver" key and for additional options, see the CVXPY documentation.

TYPE: dict | None DEFAULT: None

progress

Whether to show a progress bar during the construction of the least-core problem.

TYPE: bool DEFAULT: True
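
As a worked illustration of non_negative_subsidy, and assuming the option simply appends the constraint \(e \geq 0\) to the sampled program (which is how its description reads), the problem with non_negative_subsidy=True would be:

\[ \begin{array}{lll} \text{minimize} & \displaystyle{e} & \\ \text{subject to} & \displaystyle\sum_{i\in N} x_{i} = v(N) & \\ & \displaystyle\sum_{i\in S} x_{i} + e \geq v(S) & , \forall S \in \{S_1, S_2, \dots, S_m\} \\ & e \geq 0 & \end{array} \]

Without the extra constraint, a sampled set of subsets can admit a negative optimal \(e\).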

Source code in src/pydvl/valuation/methods/least_core.py
def __init__(
    self,
    utility: UtilityBase,
    n_samples: int,
    non_negative_subsidy: bool = False,
    solver_options: dict | None = None,
    progress: bool = True,
    seed: Seed | None = None,
    batch_size: int = 1,
):
    super().__init__(
        utility=utility,
        sampler=UniformSampler(
            index_iteration=NoIndexIteration, seed=seed, batch_size=batch_size
        ),
        n_samples=n_samples,
        non_negative_subsidy=non_negative_subsidy,
        solver_options=solver_options,
        progress=progress,
    )

fit

fit(data: Dataset) -> Valuation

Calculate the least core valuation on a dataset.

This method has to be called before calling values().

Calculating the least core valuation is a computationally expensive task that can be parallelized. To do so, call the fit() method inside a joblib.parallel_config context manager as follows:

from joblib import parallel_config

with parallel_config(n_jobs=4):
    valuation.fit(data)

Source code in src/pydvl/valuation/methods/least_core.py
def fit(self, data: Dataset) -> Valuation:
    """Calculate the least core valuation on a dataset.

    This method has to be called before calling `values()`.

    Calculating the least core valuation is a computationally expensive task that
    can be parallelized. To do so, call the `fit()` method inside a
    `joblib.parallel_config` context manager as follows:

    ```python
    from joblib import parallel_config

    with parallel_config(n_jobs=4):
        valuation.fit(data)
    ```

    """
    self._utility = self._utility.with_dataset(data)
    if self._n_samples is None:
        self._n_samples = _get_default_n_samples(
            sampler=self._sampler, indices=data.indices
        )

    algorithm = str(self._sampler)

    problem = create_least_core_problem(
        u=self._utility,
        sampler=self._sampler,
        n_samples=self._n_samples,
        progress=self._progress,
    )

    solution = lc_solve_problem(
        problem=problem,
        u=self._utility,
        algorithm=algorithm,
        non_negative_subsidy=self._non_negative_subsidy,
        solver_options=self._solver_options,
    )

    self.result = solution
    return self

values

values(sort: bool = False) -> ValuationResult

Returns a copy of the valuation result.

The valuation must have been run with fit() before calling this method.

PARAMETER DESCRIPTION
sort

Whether to sort the valuation result before returning it.

TYPE: bool DEFAULT: False

Returns: The result of the valuation.

Source code in src/pydvl/valuation/base.py
def values(self, sort: bool = False) -> ValuationResult:
    """Returns a copy of the valuation result.

    The valuation must have been run with `fit()` before calling this method.

    Args:
        sort: Whether to sort the valuation result before returning it.
    Returns:
        The result of the valuation.
    """
    if not self.is_fitted:
        raise NotFittedException(type(self))
    assert self.result is not None

    from copy import copy

    r = copy(self.result)
    if sort:
        r.sort()
    return r

create_least_core_problem

create_least_core_problem(
    u: UtilityBase, sampler: PowersetSampler, n_samples: int, progress: bool
) -> LeastCoreProblem

Create a Least Core problem from a utility and a sampler.

PARAMETER DESCRIPTION
u

Utility object with model, data and scoring function.

TYPE: UtilityBase

sampler

The sampler to use for the valuation.

TYPE: PowersetSampler

n_samples

The maximum number of samples to use for the valuation.

TYPE: int

progress

Whether to show a progress bar during the construction of the least-core problem.

TYPE: bool

RETURNS DESCRIPTION
LeastCoreProblem

The least core problem to solve.

TYPE: LeastCoreProblem

Source code in src/pydvl/valuation/methods/least_core.py
def create_least_core_problem(
    u: UtilityBase, sampler: PowersetSampler, n_samples: int, progress: bool
) -> LeastCoreProblem:
    """Create a Least Core problem from a utility and a sampler.

    Args:
        u: Utility object with model, data and scoring function.
        sampler: The sampler to use for the valuation.
        n_samples: The maximum number of samples to use for the valuation.
        progress: Whether to show a progress bar during the construction of the
            least-core problem.

    Returns:
        LeastCoreProblem: The least core problem to solve.

    """
    utility_values, masks = compute_utility_values_and_sample_masks(
        utility=u, sampler=sampler, n_samples=n_samples, progress=progress
    )

    return LeastCoreProblem(utility_values=utility_values, A_lb=masks.astype(float))
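
The listing suggests that each row of `A_lb` is a 0/1 indicator of one sampled subset and that `utility_values` holds the matching utilities, so that the inequality constraints read `A_lb @ x + e >= utility_values`. The following pure-Python sketch works under that assumption; `build_problem`, `is_feasible` and the toy utility are hypothetical names, not part of the library.

```python
def build_problem(subsets, n, utility):
    """Return (utility_values, a_lb) for the given subsets of range(n)."""
    utility_values = [float(utility(s)) for s in subsets]
    # Row j is the indicator vector of subset j, stored as floats.
    a_lb = [[1.0 if i in s else 0.0 for i in range(n)] for s in subsets]
    return utility_values, a_lb


def is_feasible(x, e, utility_values, a_lb, tol=1e-9):
    """Check a_lb @ x + e >= utility_values row by row."""
    return all(
        sum(a * xi for a, xi in zip(row, x)) + e >= u - tol
        for row, u in zip(a_lb, utility_values)
    )


def v(subset):
    """Toy utility (an assumption): a 3-player majority game."""
    return 1 if len(subset) >= 2 else 0


subsets = [(0, 1), (2,), (0, 1, 2)]
utility_values, a_lb = build_problem(subsets, n=3, utility=v)
print(a_lb)  # [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

x = [1 / 3, 1 / 3, 1 / 3]
print(is_feasible(x, 1 / 3, utility_values, a_lb))  # True
print(is_feasible(x, 0.0, utility_values, a_lb))    # False
```

With a subsidy of 0 the pair (0, 1) is violated: it is allocated 2/3 but has utility 1.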