pydvl.valuation.result ¶

This module collects types and methods for the inspection of the results of valuation algorithms.

The most important class is ValuationResult, which provides access to raw values, as well as convenient behaviour as a Sequence with extended indexing and updating abilities, and conversion to pandas DataFrames.

Indexing and slicing¶

Indexing and slicing of results is supported in a natural way and ValuationResult objects are returned. Indexing follows the sorting order. See the class documentation for more on this.

Setting items and slices is also possible with other valuation results. Index and name clashes are detected and raise an exception. Note that any sorted state is potentially lost when setting items or slices.

Addition¶

Results can be added together with the standard + operator. Because values are typically running averages of iterative algorithms, addition behaves like a weighted average of the two results, with the weights being the number of updates in each result: adding two results is the same as generating one result with the mean of the values of the two results as values. The variances are updated accordingly. See ValuationResult for details.
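The weighted-average behaviour can be illustrated for a single data index with a minimal pure-Python sketch. The function name `combine` is hypothetical and not part of pyDVL. It mirrors the pooling formulas that appear in the `__add__` source listing further down, assuming that each result carries a count, a mean and a variance.

```python
def combine(n, xn, vn, m, xm, vm):
    """Pool two (count, mean, variance) triples for one data index."""
    total = max(1, n + m)  # guards against the case n = m = 0
    # Mean of n + m samples from the two partial means
    mean = (n * xn + m * xm) / total
    # Variance of n + m samples from the two partial variances
    var = (n * (vn + xn**2) + m * (vm + xm**2)) / total - mean**2
    # Small negative values can appear through rounding; clip them
    return n + m, mean, max(var, 0.0)


# Two results with two updates each, means 1.0 and 3.0, zero variance
count, mean, var = combine(2, 1.0, 0.0, 2, 3.0, 0.0)
```

The counts act as the weights, so a result with more updates pulls the combined mean towards its own value.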

Comparing¶

Results can be compared with the equality operator. The comparison is "semantic" in the sense that it's the valuation for data indices that matters and not the order in which they are in the ValuationResult. Values, variances and counts are compared.
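As a rough illustration of what "semantic" comparison means, the sketch below compares two plain dictionaries keyed by data index, ignoring the order of the entries. The helper `semantically_equal` and the use of `math.isclose` are assumptions made for this example; the tolerance the library actually applies may differ.

```python
import math


def semantically_equal(a, b):
    """a, b: dicts mapping data index -> (value, variance, count)."""
    if a.keys() != b.keys():
        return False
    return all(
        math.isclose(a[i][0], b[i][0])      # values
        and math.isclose(a[i][1], b[i][1])  # variances
        and a[i][2] == b[i][2]              # counts
        for i in a
    )


left = {0: (0.5, 0.1, 3), 7: (-1.0, 0.0, 1)}
right = {7: (-1.0, 0.0, 1), 0: (0.5, 0.1, 3)}  # same content, other order
same = semantically_equal(left, right)
```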

Sorting¶

Results can be sorted (also in-place) by value, variance or number of updates, see sort(). All the properties ValuationResult.values, ValuationResult.variances, ValuationResult.counts, ValuationResult.indices, ValuationResult.stderr, ValuationResult.names are then sorted according to the same order.

Updating¶

Updating results as new values arrive from workers in valuation algorithms can depend on the algorithm used. The most common case is to use the LogResultUpdater class, which uses the log-sum-exp trick to update the values and variances for better numerical stability. This is the default behaviour with the base IndexSampler, but other sampling schemes might require different ones. In particular, MSRResultUpdater must keep track of separate positive and negative updates.

Factories¶

Besides copy(), the most commonly used factory method is ValuationResult.zeros(), which creates a result object with all values, variances and counts set to zero.

ValuationResult.empty() creates an empty result object, which can be used as a starting point for adding results together. Any metadata in empty results is discarded when added to other results.

Finally, ValuationResult.from_random() samples random values uniformly.

LogResultUpdater ¶

LogResultUpdater(result: ValuationResult)

Bases: ResultUpdater[ValueUpdateT]

An object to update valuation results in log-space.

This updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments. It also uses the log-sum-exp trick for numerical stability.
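The idea of accumulating sums in log-space can be sketched in pure Python as follows. This is not the library's implementation: the class `SignedLogSum` and the helper `logaddexp` are hypothetical, and the sketch only tracks the first moment. It assumes, in line with the constructor shown below, that positive and negative contributions are kept in separate log-sums because the logarithm is only defined for positive numbers.

```python
import math


def logaddexp(a, b):
    """Return log(exp(a) + exp(b)) without overflowing for large inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


class SignedLogSum:
    """Running mean with positive and negative parts stored as logs."""

    def __init__(self):
        self.log_pos = -math.inf  # log of the sum of positive updates
        self.log_neg = -math.inf  # log of the sum of |negative updates|
        self.count = 0

    def update(self, x):
        if x > 0:
            self.log_pos = logaddexp(self.log_pos, math.log(x))
        elif x < 0:
            self.log_neg = logaddexp(self.log_neg, math.log(-x))
        self.count += 1

    def mean(self):
        if self.count == 0:
            return 0.0
        return (math.exp(self.log_pos) - math.exp(self.log_neg)) / self.count


acc = SignedLogSum()
for update in (1.0, -0.5, 2.5):
    acc.update(update)
```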

Source code in src/pydvl/valuation/result.py

def __init__(self, result: ValuationResult):
    super().__init__(result)
    self._log_sum_positive = np.full_like(result.values, -np.inf)

    pos = result.values > 0
    self._log_sum_positive[pos] = np.log(result.values[pos] * result.counts[pos])
    self._log_sum_negative = np.full_like(result.values, -np.inf)

    neg = result.values < 0
    self._log_sum_negative[neg] = np.log(-result.values[neg] * result.counts[neg])
    self._log_sum2 = np.full_like(result.values, -np.inf)

    nz = result.values != 0
    x2 = (
        result.variances[nz] * np.maximum(1, result.counts[nz] - 1) ** 2
        + result.values[nz] ** 2 * result.counts[nz]
    )
    self._log_sum2[nz] = np.log(x2)

ResultUpdater ¶

ResultUpdater(result: ValuationResult)

Bases: ABC, Generic[ValueUpdateT]

Base class for result updaters.

A result updater is a strategy to update a valuation result with a value update. It is used by the valuation methods to process the ValueUpdates emitted by the EvaluationStrategy corresponding to the sampler.

Source code in src/pydvl/valuation/result.py

def __init__(self, result: ValuationResult):
    self.result = result
    self.n_updates = 0

ValuationResult ¶

ValuationResult(
    *,
    values: Sequence[float64] | NDArray[float64] | Array,
    variances: Sequence[float64] | NDArray[float64] | Array | None = None,
    counts: Sequence[int_] | NDArray[int_] | Array | None = None,
    indices: Sequence[IndexT] | NDArray[IndexT] | Array | None = None,
    data_names: Sequence[NameT] | NDArray[NameT] | Array | None = None,
    algorithm: str = "",
    status: Status = Pending,
    sort: bool | None = None,
    **extra_values: Any,
)

Bases: Sequence, Iterable[ValueItem]

Objects of this class hold the results of valuation algorithms.

These include indices in the original Dataset, any data names (e.g. group names in GroupedDataset), the values themselves, and variance of the computation in the case of Monte Carlo methods. ValuationResults can be iterated over like any Sequence: iter(valuation_result) returns a generator of ValueItem in the order in which the object is sorted.

Indexing¶

Indexing is sort-based when accessing any of the attributes values, variances, counts and indices, when iterating over the object, and when using the item access operator (both getter and setter). The "position" is either the original sequence in which the data was passed to the constructor, or the sequence in which the object has been sorted, see below. One can retrieve the sorted position for a given data index using the method positions().

Some methods use data indices instead. This is the case for get().

Sorting¶

Results can be sorted (also in-place) with sort(), or alternatively using Python's standard sorted() and reversed(). Note that sorting values affects how iterators and the object itself as Sequence behave: values[0] returns a ValueItem with the highest or lowest ranking point if this object is sorted by descending or ascending value, respectively. If unsorted, values[0] returns the ValueItem at position 0, which has data index indices[0] in the Dataset.

The same applies to direct indexing of the ValuationResult: the index is positional, according to the sorting. It does not refer to the "data index". To sort according to data index, use sort() with key="index".

In order to access ValueItem objects by their data index, use get(), or use positions() to convert data indices to positions.

Converting back and forth from data indices and positions

data_indices = result.indices[result.positions(data_indices)] is a noop.
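The relationship between data indices, internal indices and sorted positions can be sketched with plain lists. The variable names loosely follow the private attributes in the constructor shown below, but this is a simplified stand-in written for illustration, not the library's code.

```python
data_indices = [10, 20, 30]   # indices in the original dataset
values = [0.3, -1.0, 0.7]     # one value per data index

# Sorted position -> internal index (ascending by value)
positions_to_indices = sorted(range(len(values)), key=lambda i: values[i])

# Internal index -> sorted position (the inverse permutation)
indices_to_positions = [0] * len(values)
for pos, idx in enumerate(positions_to_indices):
    indices_to_positions[idx] = pos

# Data index -> internal index
internal = {d: i for i, d in enumerate(data_indices)}


def positions(query):
    """Sorted positions of the given data indices."""
    return [indices_to_positions[internal[d]] for d in query]


# What an `indices` property would expose after sorting
sorted_indices = [data_indices[i] for i in positions_to_indices]

query = [30, 10]
roundtrip = [sorted_indices[p] for p in positions(query)]
```

Here `roundtrip` equals `query`, which is the no-op stated above.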

Operating on results¶

Results can be added to each other with the + operator. Means and variances are updated accordingly using the Welford algorithm.

Empty objects behave in a special way, see empty().

PARAMETER	DESCRIPTION
`values`	An array of values. If omitted, defaults to an empty array or to an array of zeros if `indices` are given. TYPE: `Sequence[float64] \| NDArray[float64] \| Array`
`indices`	An optional array of indices in the original dataset. If omitted, defaults to `np.arange(len(values))`. Warning: It is common to pass the indices of a Dataset here. Attention must be paid in a parallel context to copy them to the local process. Just do `indices=np.copy(data.indices)`. TYPE: `Sequence[IndexT] \| NDArray[IndexT] \| Array \| None` DEFAULT: `None`
`variances`	An optional array of variances of the marginals from which the values are computed. TYPE: `Sequence[float64] \| NDArray[float64] \| Array \| None` DEFAULT: `None`
`counts`	An optional array with the number of updates for each value. Defaults to an array of ones. TYPE: `Sequence[int_] \| NDArray[int_] \| Array \| None` DEFAULT: `None`
`data_names`	Names for the data points. Defaults to index numbers if not set. TYPE: `Sequence[NameT] \| NDArray[NameT] \| Array \| None` DEFAULT: `None`
`algorithm`	The method used. TYPE: `str` DEFAULT: `''`
`status`	The end status of the algorithm. TYPE: `Status` DEFAULT: `Pending`
`sort`	Whether to sort the indices. Defaults to `None` for no sorting. Set to `True` for ascending order by value, `False` for descending. See above how sorting affects usage as an iterable or sequence. TYPE: `bool \| None` DEFAULT: `None`
`extra_values`	Additional values that can be passed as keyword arguments. This can contain, for example, the least core value. TYPE: `Any` DEFAULT: `{}`

RAISES	DESCRIPTION
`ValueError`	If input arrays have mismatching lengths.

Tensor Support

PyTorch tensors are not accepted as inputs to ValuationResult. You must explicitly convert tensor inputs to numpy arrays using .cpu().numpy(). ValuationResult requires numpy arrays internally for efficiency and interoperability with other tools.

Changed in 0.10.0

Changed the behaviour of sorting, slicing, and indexing.

Source code in src/pydvl/valuation/result.py

def __init__(
    self,
    *,
    values: Sequence[np.float64] | NDArray[np.float64] | Array,
    variances: Sequence[np.float64] | NDArray[np.float64] | Array | None = None,
    counts: Sequence[np.int_] | NDArray[np.int_] | Array | None = None,
    indices: Sequence[IndexT] | NDArray[IndexT] | Array | None = None,
    data_names: Sequence[NameT] | NDArray[NameT] | Array | None = None,
    algorithm: str = "",
    status: Status = Status.Pending,
    sort: bool | None = None,
    **extra_values: Any,
):
    if (
        is_tensor(values)
        or (variances is not None and is_tensor(variances))
        or (counts is not None and is_tensor(counts))
        or (indices is not None and is_tensor(indices))
        or (data_names is not None and is_tensor(data_names))
    ):
        raise TypeError(
            "ValuationResult requires numpy arrays. "
            "Please convert tensor to numpy using tensor.cpu().numpy() explicitly."
        )

    # Convert non-tensor sequences to numpy arrays
    values = to_numpy(values)
    variances = to_numpy(variances) if variances is not None else None
    counts = to_numpy(counts) if counts is not None else None
    indices = to_numpy(indices) if indices is not None else None
    data_names = to_numpy(data_names) if data_names is not None else None

    if variances is not None and len(variances) != len(values):
        raise ValueError(
            f"Lengths of values ({len(values)}) "
            f"and variances ({len(variances)}) do not match"
        )
    if data_names is not None and len(data_names) != len(values):
        raise ValueError(
            f"Lengths of values ({len(values)}) "
            f"and data_names ({len(data_names)}) do not match"
        )
    if indices is not None and len(indices) != len(values):
        raise ValueError(
            f"Lengths of values ({len(values)}) "
            f"and indices ({len(indices)}) do not match"
        )

    self._algorithm = algorithm
    self._status = Status(status)  # Just in case we are given a string
    self._values = np.asarray(values, dtype=np.float64)
    self._variances = (
        np.zeros_like(values) if variances is None else np.asarray(variances)
    )
    self._counts = (
        np.ones_like(values, dtype=int) if counts is None else np.asarray(counts)
    )
    self._sort_order = None
    self._extra_values = extra_values or {}

    # Internal indices -> data indices
    self._data_indices = self._create_indices_array(indices, len(self._values))
    self._names = self._create_names_array(data_names, self._data_indices)

    # Data indices -> Internal indices
    self._indices = {idx: pos for pos, idx in enumerate(self._data_indices)}

    # Sorted indices ("positions") -> Internal indices
    self._positions_to_indices = np.arange(len(self._values), dtype=np.int_)

    # Internal indices -> Sorted indices ("positions")
    self._indices_to_positions = np.arange(len(self._values), dtype=np.int_)

    if sort is not None:
        self.sort(reverse=not sort, inplace=True)

counts `property` ¶

counts: NDArray[int_]

The raw counts, possibly sorted.

indices `property` ¶

indices: NDArray[IndexT]

The data indices, possibly sorted.

If the object is unsorted, then these are the same as declared at construction. If no data indices were manually assigned, then they are just consecutive integers starting from zero.

names `property` ¶

names: NDArray[NameT]

The names for the values, possibly sorted. If the object is unsorted, then these are the same as declared at construction or np.arange(len(values)) if none were passed.

stderr `property` ¶

stderr: NDArray[float64]

Standard errors of the value estimates, possibly sorted.

values `property` ¶

values: NDArray[float64]

The values, possibly sorted.

variances `property` ¶

variances: NDArray[float64]

Variances of the marginals from which values were computed, possibly sorted.

Note that this is not the variance of the value estimate, but the sample variance of the marginals used to compute it.

__add__ ¶

__add__(other: ValuationResult) -> ValuationResult

Adds two ValuationResults.

The values must have been computed with the same algorithm. An exception to this is if one argument has empty or all-zero values, in which case the other argument is returned.

Danger

Abusing this will introduce numerical errors.

Means and standard errors are correctly handled. Statuses are added with bit-wise &, see Status. data_names are taken from the left summand, or if unavailable from the right one. The algorithm string is carried over if both terms have the same one or concatenated.

It is possible to add ValuationResults of different lengths, and with different or overlapping indices. The result contains the union of the indices, with values combined where the indices overlap.

Warning

FIXME: Arbitrary extra_values aren't handled.

Source code in src/pydvl/valuation/result.py

def __add__(self, other: ValuationResult) -> ValuationResult:
    """Adds two ValuationResults.

    The values must have been computed with the same algorithm. An exception to this
    is if one argument has empty or all-zero values, in which case the other
    argument is returned.

    !!! danger
        Abusing this will introduce numerical errors.

    Means and standard errors are correctly handled. Statuses are added with
    bit-wise `&`, see [Status][pydvl.valuation.result.Status].
    `data_names` are taken from the left summand, or if unavailable from
    the right one. The `algorithm` string is carried over if both terms
    have the same one or concatenated.

    It is possible to add ValuationResults of different lengths, and with
    different or overlapping indices. The result will have the union of
    indices, and the values.

    !!! Warning
        FIXME: Arbitrary `extra_values` aren't handled.

    """
    self._check_compatible(other)

    if len(self.values) == 0 or (all(self.values == 0.0) and all(self.counts == 0)):
        return other
    if len(other.values) == 0 or (
        all(other.values == 0.0) and all(other.counts == 0)
    ):
        return self

    indices = np.union1d(self._data_indices, other._data_indices).astype(
        self._data_indices.dtype
    )
    this_pos = np.searchsorted(indices, self._data_indices)
    other_pos = np.searchsorted(indices, other._data_indices)

    n: NDArray[np.int_] = np.zeros_like(indices, dtype=int)
    m: NDArray[np.int_] = np.zeros_like(indices, dtype=int)
    xn: NDArray[np.float64] = np.zeros_like(indices, dtype=float)
    xm: NDArray[np.float64] = np.zeros_like(indices, dtype=float)
    vn: NDArray[np.float64] = np.zeros_like(indices, dtype=float)
    vm: NDArray[np.float64] = np.zeros_like(indices, dtype=float)

    n[this_pos] = self._counts
    xn[this_pos] = self._values
    vn[this_pos] = self._variances
    m[other_pos] = other._counts
    xm[other_pos] = other._values
    vm[other_pos] = other._variances

    # np.maximum(1, n + m) covers case n = m = 0.
    n_m_sum = np.maximum(1, n + m)

    # Sample mean of n+m samples from two means of n and m samples
    xnm = (n * xn + m * xm) / n_m_sum

    # Sample variance of n+m samples from two sample variances of n and m samples
    vnm = (n * (vn + xn**2) + m * (vm + xm**2)) / n_m_sum - xnm**2

    if np.any(vnm < 0):
        if np.any(vnm < -1e-6):
            logger.warning(
                "Numerical error in variance computation. "
                f"Negative sample variances clipped to 0 in {vnm}"
            )
        vnm[np.where(vnm < 0)] = 0

    # Merging of names:
    # If an index has the same name in both results, it must be the same.
    # If an index has a name in one result but not the other, the name is
    # taken from the result with the name.
    if self._names.dtype != other._names.dtype:
        if np.can_cast(other._names.dtype, self._names.dtype, casting="safe"):
            logger.warning(
                f"Casting ValuationResult.names from {other._names.dtype} to "
                f"{self._names.dtype}"
            )
            other._names = other._names.astype(self._names.dtype)
        else:
            raise TypeError(
                f"Cannot cast ValuationResult.names from "
                f"{other._names.dtype} to {self._names.dtype}"
            )

    both_pos = np.intersect1d(this_pos, other_pos)

    if len(both_pos) > 0:
        this_names: NDArray = np.empty_like(indices, dtype=np.str_)
        other_names: NDArray = np.empty_like(indices, dtype=np.str_)
        this_names[this_pos] = self._names
        other_names[other_pos] = other._names

        this_shared_names = np.take(this_names, both_pos)
        other_shared_names = np.take(other_names, both_pos)

        if np.any(this_shared_names != other_shared_names):
            raise ValueError("Mismatching names in ValuationResults")

    names = np.empty_like(indices, dtype=self._names.dtype)
    names[this_pos] = self._names
    names[other_pos] = other._names

    return ValuationResult(
        algorithm=self.algorithm or other.algorithm or "",
        status=self.status & other.status,
        indices=indices,
        values=xnm,
        variances=vnm,
        counts=n + m,
        data_names=names,
        # FIXME: What to do with extra_values? This is not commutative:
        # extra_values=self._extra_values.update(other._extra_values),
    )

__getattr__ ¶

__getattr__(attr: str) -> Any

Allows access to extra values as if they were properties of the instance.
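The pattern can be illustrated with a small stand-alone class. `WithExtras` is a hypothetical name used only for this sketch; it relies on the standard Python rule that `__getattr__` is called only after normal attribute lookup has failed.

```python
class WithExtras:
    """Exposes keyword arguments as read-only attributes."""

    def __init__(self, **extra_values):
        self._extra_values = dict(extra_values)

    def __getattr__(self, attr):
        # Read from __dict__ directly so that a missing _extra_values
        # (e.g. during copying or unpickling) cannot recurse into here.
        extras = self.__dict__.get("_extra_values", {})
        try:
            return extras[attr]
        except KeyError as e:
            raise AttributeError(
                f"{type(self).__name__} object has no attribute {attr}"
            ) from e


r = WithExtras(least_core_value=0.25)
subsidy = r.least_core_value
```

Raising `AttributeError` rather than `KeyError` keeps `hasattr()` and `getattr()` with a default working as expected.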

Source code in src/pydvl/valuation/result.py

def __getattr__(self, attr: str) -> Any:
    """Allows access to extra values as if they were properties of the instance."""
    # This is here to avoid a RecursionError when copying or pickling the object
    if attr == "_extra_values":
        if "_extra_values" in self.__dict__:
            return self.__dict__["_extra_values"]
        return {}  # Return empty dict as fallback to prevent pickle from failing
    try:
        return self._extra_values[attr]
    except KeyError as e:
        raise AttributeError(
            f"{self.__class__.__name__} object has no attribute {attr}"
        ) from e

__getitem__ ¶

__getitem__(key: int) -> ValuationResult

__getitem__(key: slice) -> ValuationResult

__getitem__(key: Iterable[int]) -> ValuationResult

__getitem__(key: Union[slice, Iterable[int], int]) -> ValuationResult

Get a ValuationResult for the given key.

The key can be an integer, a slice, or an iterable of integers. The returned object is a new ValuationResult with all metadata copied, except for the sorting order. If the key is a slice or sequence, the returned object will contain the items in the order specified by the sequence.

RETURNS	DESCRIPTION
`ValuationResult`	A new object containing only the selected items.

Source code in src/pydvl/valuation/result.py

def __getitem__(self, key: Union[slice, Iterable[int], int]) -> ValuationResult:
    """Get a ValuationResult for the given key.

    The key can be an integer, a slice, or an iterable of integers.
    The returned object is a new `ValuationResult` with all metadata copied, except
    for the sorting order. If the key is a slice or sequence, the returned object
    will contain the items **in the order specified by the sequence**.

    Returns:
        A new object containing only the selected items.
    """

    positions = self._key_to_positions(key)

    # Convert positions to original indices in the sort order
    sort_indices = self._positions_to_indices[positions]

    return ValuationResult(
        values=self._values[sort_indices].copy(),
        variances=self._variances[sort_indices].copy(),
        counts=self._counts[sort_indices].copy(),
        indices=self._data_indices[sort_indices].copy(),
        data_names=self._names[sort_indices].copy(),
        algorithm=self._algorithm,
        status=self._status,
        # sort=self._sort_order,  # makes no sense
        **self._extra_values,
    )

__iter__ ¶

__iter__() -> Iterator[ValueItem]

Iterate over the results returning ValueItem objects. To sort in place before iteration, use sort().

Source code in src/pydvl/valuation/result.py

def __iter__(self) -> Iterator[ValueItem]:
    """Iterate over the results returning
    [ValueItem][pydvl.valuation.result.ValueItem] objects. To sort in
    place before iteration, use
    [sort()][pydvl.valuation.result.ValuationResult.sort].
    """
    for pos in self._positions_to_indices:
        yield ValueItem(
            self._data_indices[pos],
            self._names[pos],
            self._values[pos],
            self._variances[pos],
            self._counts[pos],
        )

__setitem__ ¶

__setitem__(key: int, value: ValuationResult) -> None

__setitem__(key: slice, value: ValuationResult) -> None

__setitem__(key: Iterable[int], value: ValuationResult) -> None

__setitem__(
    key: Union[slice, Iterable[int], int], value: ValuationResult
) -> None

Set items in the ValuationResult using another ValuationResult.

This method provides a symmetrical counterpart to __getitem__, both operating on ValuationResult objects.

The key can be an integer, a slice, or an iterable of integers. The value must be a ValuationResult with length matching the number of positions specified by key.

PARAMETER	DESCRIPTION
`key`	Position(s) to set TYPE: `Union[slice, Iterable[int], int]`
`value`	A ValuationResult to set at the specified position(s) TYPE: `ValuationResult`

RAISES	DESCRIPTION
`TypeError`	If value is not a ValuationResult
`ValueError`	If value's length doesn't match the number of positions specified by the key

Source code in src/pydvl/valuation/result.py

def __setitem__(
    self, key: Union[slice, Iterable[int], int], value: ValuationResult
) -> None:
    """Set items in the `ValuationResult` using another `ValuationResult`.

    This method provides a symmetrical counterpart to `__getitem__`, both
    operating on `ValuationResult` objects.

    The key can be an integer, a slice, or an iterable of integers.
    The value must be a `ValuationResult` with length matching the number of
    positions specified by key.

    Args:
        key: Position(s) to set
        value: A ValuationResult to set at the specified position(s)

    Raises:
        TypeError: If value is not a ValuationResult
        ValueError: If value's length doesn't match the number of positions
            specified by the key
    """
    if not isinstance(value, ValuationResult):
        raise TypeError(
            f"Value must be a ValuationResult, got {type(value)}. "
            f"To set individual ValueItems, use the set() method "
            f"instead."
        )

    positions = self._key_to_positions(key)

    if len(value) != len(positions):
        raise ValueError(
            f"Cannot set {len(positions)} positions with a ValuationResult of "
            f"length {len(value)}"
        )

    # Convert sorted positions (user-facing) to original indices in the sort order
    destination = self._positions_to_indices[positions]
    # For the source, use the first sorted n items
    source = list(range(len(positions)))

    # Check that the operation won't result in duplicate indices or names
    new_indices = self._data_indices.copy()
    new_indices[destination] = value.indices[source]
    new_names = self._names.copy()
    new_names[destination] = value.names[source]

    if len(np.unique(new_indices)) != len(new_indices):
        raise ValueError("Operation would result in duplicate indices")
    if len(np.unique(new_names)) != len(new_names):
        raise ValueError("Operation would result in duplicate names")

    # Update data index -> internal index mapping
    for data_idx in self._data_indices[destination]:
        del self._indices[data_idx]
    for data_idx, dest in zip(value.indices[source], destination):
        self._indices[data_idx] = dest

    self._data_indices[destination] = value.indices[source]
    self._names[destination] = value.names[source]
    self._values[destination] = value.values[source]
    self._variances[destination] = value.variances[source]
    self._counts[destination] = value.counts[source]

copy ¶

copy() -> ValuationResult

Returns a copy of the object.

Source code in src/pydvl/valuation/result.py

def copy(self) -> ValuationResult:
    """Returns a copy of the object."""
    return ValuationResult(
        values=self._values.copy(),
        variances=self._variances.copy(),
        counts=self._counts.copy(),
        indices=self._data_indices.copy(),
        data_names=self._names.copy(),
        algorithm=self._algorithm,
        status=self._status,
        sort=self._sort_order,
        **self._extra_values,
    )

empty `classmethod` ¶

empty(algorithm: str = '', **kwargs: dict[str, Any]) -> ValuationResult

Creates an empty ValuationResult object.

Empty results are characterised by having an empty array of values.

Tip

When a result is added to an empty one, the empty one is entirely discarded.

PARAMETER	DESCRIPTION
`algorithm`	Name of the algorithm used to compute the values TYPE: `str` DEFAULT: `''`
`kwargs`	Additional options to pass to the constructor of ValuationResult. Use to override status, extra_values, etc. TYPE: `dict[str, Any]` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ValuationResult`	Object with the results.

RAISES	DESCRIPTION
`TypeError`	If any input contains tensor values (see class docstring).

Source code in src/pydvl/valuation/result.py

@classmethod
def empty(cls, algorithm: str = "", **kwargs: dict[str, Any]) -> ValuationResult:
    """Creates an empty [ValuationResult][pydvl.valuation.result.ValuationResult]
    object.

    Empty results are characterised by having an empty array of values.

    !!! tip
        When a result is added to an empty one, the empty one is entirely discarded.

    Args:
        algorithm: Name of the algorithm used to compute the values
        kwargs: Additional options to pass to the constructor of
            [ValuationResult][pydvl.valuation.result.ValuationResult]. Use to
            override status, extra_values, etc.

    Returns:
        Object with the results.

    Raises:
        TypeError: If any input contains tensor values (see class docstring).
    """
    options: dict[str, Any] = dict(
        algorithm=algorithm, status=Status.Pending, values=np.array([])
    )
    return cls(**(options | kwargs))

from_random `classmethod` ¶

from_random(
    size: int, total: float | None = None, seed: Seed | None = None, **kwargs
) -> ValuationResult

Creates a ValuationResult object and fills it with an array of random values from a uniform distribution in [-1,1]. The values can be made to sum up to a given total number (doing so will change their range).

PARAMETER	DESCRIPTION
`size`	Number of values to generate TYPE: `int`
`total`	If set, the values are normalized to sum to this number ("efficiency" property of Shapley values). TYPE: `float \| None` DEFAULT: `None`
`seed`	Random seed to use TYPE: `Seed \| None` DEFAULT: `None`
`kwargs`	Additional options to pass to the constructor of ValuationResult. Use to override status, names, etc. DEFAULT: `{}`

RETURNS	DESCRIPTION
`ValuationResult`	A valuation result with its status set to Status.Converged by default.

RAISES	DESCRIPTION
`ValueError`	If `size` is less than 1.
`TypeError`	If any input contains tensor values (see class docstring).

Changed in version 0.6.0

Added parameter total. Added a check for zero size.
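The normalization step can be sketched without NumPy as follows. The function `random_values` is a hypothetical stand-in that uses the standard library's `random` module, so the numbers it produces differ from those of the NumPy generator used in the source below.

```python
import random


def random_values(size, total=None, seed=None):
    """Uniform values in [-1, 1], optionally rescaled to sum to `total`."""
    if size < 1:
        raise ValueError("Size must be a positive integer")
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    if total is not None:
        s = sum(values)
        # Rescaling changes the range: values may leave [-1, 1]
        values = [v * total / s for v in values]
    return values


vals = random_values(5, total=2.0, seed=42)
```

Because the raw values can sum to a number close to zero, the rescaled values may become large in magnitude.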

Source code in src/pydvl/valuation/result.py

@classmethod
def from_random(
    cls,
    size: int,
    total: float | None = None,
    seed: Seed | None = None,
    **kwargs,
) -> ValuationResult:
    """Creates a [ValuationResult][pydvl.valuation.result.ValuationResult] object
    and fills it with an array of random values from a uniform distribution in
    [-1,1]. The values can be made to sum up to a given total number (doing so will
    change their range).

    Args:
        size: Number of values to generate
        total: If set, the values are normalized to sum to this number
            ("efficiency" property of Shapley values).
        seed: Random seed to use
        kwargs: Additional options to pass to the constructor of
            [ValuationResult][pydvl.valuation.result.ValuationResult]. Use to
            override status, names, etc.

    Returns:
        A valuation result with its status set to
        [Status.Converged][pydvl.utils.status.Status] by default.

    Raises:
         ValueError: If `size` is less than 1.
         TypeError: If any input contains tensor values (see class docstring).

    !!! tip "Changed in version 0.6.0"
        Added parameter `total`. Check for zero size
    """
    if size < 1:
        raise ValueError("Size must be a positive integer")

    rng = np.random.default_rng(seed)
    values = rng.uniform(low=-1, high=1, size=size)
    if total is not None:
        values *= total / np.sum(values)

    options = dict(values=values, status=Status.Converged, algorithm="random")
    options.update(kwargs)
    return cls(**options)  # type: ignore

get ¶

get(data_idx: IndexT) -> ValueItem

Retrieves a ValueItem object by data index, as opposed to the sort index (position) used by the indexing operator.

PARAMETER	DESCRIPTION
`data_idx`	Data index of the value to retrieve. TYPE: `IndexT`

RAISES	DESCRIPTION
`IndexError`	If the index is not found.

Source code in src/pydvl/valuation/result.py

def get(self, data_idx: IndexT) -> ValueItem:
    """Retrieves a [ValueItem][pydvl.valuation.result.ValueItem] object by data
    index, as opposed to sort index, like the indexing operator.

    Args:
        data_idx: Data index of the value to retrieve.

    Raises:
         IndexError: If the index is not found.
    """
    try:
        pos = self._indices[data_idx]
    except KeyError:
        raise IndexError(f"Index {data_idx} not found in ValuationResult")

    return ValueItem(
        data_idx,
        self._names[pos],
        self._values[pos],
        self._variances[pos],
        self._counts[pos],
    )

positions ¶

positions(data_indices: IndexSetT | list[IndexT]) -> IndexSetT

Return the location (indices) within the ValuationResult for the given data indices.

Sorting is taken into account. This operation is the inverse of indexing the indices property:

np.all(v.indices[v.positions(data_indices)] == data_indices) == True

Source code in src/pydvl/valuation/result.py

def positions(self, data_indices: IndexSetT | list[IndexT]) -> IndexSetT:
    """Return the location (indices) within the `ValuationResult` for the given
    data indices.

    Sorting is taken into account. This operation is the inverse of indexing the
    [indices][pydvl.valuation.result.ValuationResult.indices] property:

        np.all(v.indices[v.positions(data_indices)] == data_indices) == True
    """
    indices = [self._indices[idx] for idx in data_indices]
    return self._indices_to_positions[indices]

scale ¶

scale(factor: float, data_indices: NDArray[IndexT] | None = None)

Scales the values and variances of the result by a coefficient.

PARAMETER	DESCRIPTION
`factor`	Factor to scale by. TYPE: `float`
`data_indices`	Data indices to scale. If `None`, all values are scaled. TYPE: `NDArray[IndexT] \| None` DEFAULT: `None`

Source code in src/pydvl/valuation/result.py

def scale(self, factor: float, data_indices: NDArray[IndexT] | None = None):
    """
    Scales the values and variances of the result by a coefficient.

    Args:
        factor: Factor to scale by.
        data_indices: Data indices to scale. If `None`, all values are scaled.
    """
    if data_indices is None:
        positions = None
    else:
        positions = [self._indices[idx] for idx in data_indices]
    self._values[positions] *= factor
    self._variances[positions] *= factor**2
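
The variances are multiplied by `factor**2` rather than `factor` because scaling a random quantity by a constant scales its variance by the square of that constant. A quick numerical check with the standard library, using made-up sample data:

```python
from statistics import pvariance

# Made-up samples standing in for the marginals behind one value.
samples = [1.0, 2.0, 4.0, 7.0]
factor = 3.0

scaled = [factor * x for x in samples]

var_before = pvariance(samples)
var_after = pvariance(scaled)

# The variance of the scaled samples equals factor**2 times the original one.
ratio = var_after / var_before
```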

set ¶

set(data_idx: IndexT, value: ValueItem) -> Self

Set a ValueItem in the result by its data index.

This is the complement to the get() method and allows setting individual ValueItems directly by their data index rather than by (sort) position.

PARAMETER	DESCRIPTION
`data_idx`	Data index of the value to set TYPE: `IndexT`
`value`	The data to set TYPE: `ValueItem`

RETURNS	DESCRIPTION
`Self`	A reference to self for method chaining

RAISES	DESCRIPTION
`IndexError`	If the index is not found
`ValueError`	If the `ValueItem`'s idx doesn't match `data_idx`

Source code in src/pydvl/valuation/result.py

def set(self, data_idx: IndexT, value: ValueItem) -> Self:
    """Set a [ValueItem][pydvl.valuation.result.ValueItem] in the result by its data
    index.

    This is the complement to the [get()][
    pydvl.valuation.result.ValuationResult.get]
    method and allows setting individual `ValueItems` directly by their data index
    rather than (sort-) position.

    Args:
        data_idx: Data index of the value to set
        value: The data to set

    Returns:
        A reference to self for method chaining

    Raises:
        IndexError: If the index is not found
        ValueError: If the `ValueItem`'s idx doesn't match `data_idx`
    """
    if value.idx != data_idx:
        raise ValueError(
            f"ValueItem's idx ({value.idx}) doesn't match the provided "
            f"data_idx ({data_idx})"
        )

    try:
        pos = self._indices[data_idx]
    except KeyError:
        raise IndexError(f"Index {data_idx} not found in ValuationResult")

    self._data_indices[pos] = value.idx
    self._names[pos] = value.name
    self._values[pos] = value.value
    self._variances[pos] = value.variance
    self._counts[pos] = value.count

    return self

sort ¶

sort(
    reverse: bool = False,
    key: Literal["value", "variance", "index", "name"] = "value",
    inplace: bool = False,
) -> ValuationResult

Sorts the indices by key, in ascending order unless reverse is True.

Once sorted, iteration over the results, and indexing of all the properties ValuationResult.values, ValuationResult.variances, ValuationResult.stderr, ValuationResult.counts, ValuationResult.indices and ValuationResult.names will follow the same order.

PARAMETER	DESCRIPTION
`reverse`	Whether to sort in descending order. TYPE: `bool` DEFAULT: `False`
`key`	The key to sort by. Defaults to ValueItem.value. TYPE: `Literal['value', 'variance', 'index', 'name']` DEFAULT: `'value'`
`inplace`	Whether to sort the object in place or return a new object. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`ValuationResult`	A new object with the sorted values, or the same object, sorted, if `inplace` is `True`.

Source code in src/pydvl/valuation/result.py

def sort(
    self,
    reverse: bool = False,  # Need a "Comparable" type here
    key: Literal["value", "variance", "index", "name"] = "value",
    inplace: bool = False,
) -> ValuationResult:
    """Sorts the indices in ascending order by `key`.

    Once sorted, iteration over the results, and indexing of all the
    properties
    [ValuationResult.values][pydvl.valuation.result.ValuationResult.values],
    [ValuationResult.variances][pydvl.valuation.result.ValuationResult.variances],
    [ValuationResult.stderr][pydvl.valuation.result.ValuationResult.stderr],
    [ValuationResult.counts][pydvl.valuation.result.ValuationResult.counts],
    [ValuationResult.indices][pydvl.valuation.result.ValuationResult.indices]
    and [ValuationResult.names][pydvl.valuation.result.ValuationResult.names]
    will follow the same order.

    Args:
        reverse: Whether to sort in descending order.
        key: The key to sort by. Defaults to
            [ValueItem.value][pydvl.valuation.result.ValueItem].
        inplace: Whether to sort the object in place or return a new object.
    Returns:
        A new object with the sorted values, or the same object, sorted, if
            `inplace` is `True`.
    """
    keymap = {
        "index": "_data_indices",
        "value": "_values",
        "variance": "_variances",
        "name": "_names",
    }

    obj = self if inplace else self.copy()

    obj._positions_to_indices = np.argsort(getattr(self, keymap[key])).astype(int)
    if reverse:
        obj._positions_to_indices = obj._positions_to_indices[::-1]
    obj._sort_order = not reverse
    obj._indices_to_positions = np.argsort(obj._positions_to_indices).astype(int)
    return obj
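
The two permutations maintained by the listing above can be reproduced in plain Python. This is a sketch under the assumption that all values are distinct; the helper `argsort` is a hypothetical stand-in for `np.argsort`, and the variable names only mirror those in the source.

```python
def argsort(seq):
    """Indices that would sort seq in ascending order."""
    return sorted(range(len(seq)), key=seq.__getitem__)


values = [0.5, -1.0, 2.0]

# Ascending order: position -> storage index.
positions_to_indices = argsort(values)

# reverse=True simply flips that permutation.
descending = positions_to_indices[::-1]

# Sorting a permutation yields its inverse: storage index -> position.
indices_to_positions = argsort(descending)

sorted_desc = [values[i] for i in descending]

# Composing a permutation with its inverse gives the identity.
identity = [descending[indices_to_positions[i]] for i in range(len(values))]
```

Applying the sort a second time to the permutation itself is what makes the reverse lookup (from storage index to position) a constant-time array access.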

to_dataframe ¶

to_dataframe(column: str | None = None, use_names: bool = False) -> DataFrame

Returns values as a dataframe.

PARAMETER	DESCRIPTION
`column`	Name for the column holding the data value. Defaults to the name of the algorithm used. TYPE: `str \| None` DEFAULT: `None`
`use_names`	Whether to use data names instead of indices for the DataFrame's index. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`DataFrame`	A dataframe with three columns: `name`, `name_variances` and `name_counts`, where `name` is the value of argument `column`.

Source code in src/pydvl/valuation/result.py

def to_dataframe(
    self, column: str | None = None, use_names: bool = False
) -> pd.DataFrame:
    """Returns values as a dataframe.

    Args:
        column: Name for the column holding the data value. Defaults to
            the name of the algorithm used.
        use_names: Whether to use data names instead of indices for the
            DataFrame's index.

    Returns:
        A dataframe with three columns: `name`, `name_variances` and
            `name_counts`, where `name` is the value of argument `column`.
    """
    column = column or self._algorithm
    df = pd.DataFrame(
        self._values[self._positions_to_indices],
        index=(
            self._names[self._positions_to_indices]
            if use_names
            else self._data_indices[self._positions_to_indices]
        ),
        columns=[column],
    )
    df[column + "_variances"] = self.variances[self._positions_to_indices]
    df[column + "_counts"] = self.counts[self._positions_to_indices]
    return df

zeros `classmethod` ¶

zeros(
    algorithm: str = "",
    indices: IndexSetT | None = None,
    data_names: Sequence[NameT] | NDArray[NameT] | None = None,
    size: int = 0,
    **kwargs: dict[str, Any],
) -> ValuationResult

Creates a ValuationResult filled with zeros.

Info

When a result is added to a zeroed one, the zeroed one is entirely discarded.

PARAMETER	DESCRIPTION
`algorithm`	Name of the algorithm used to compute the values TYPE: `str` DEFAULT: `''`
`indices`	Data indices to use. A copy will be made. If not given, the indices will be set to the range `[0, size)`. TYPE: `IndexSetT \| None` DEFAULT: `None`
`data_names`	Data names to use. A copy will be made. If not given, the names will be set to the string representation of the indices. TYPE: `Sequence[NameT] \| NDArray[NameT] \| None` DEFAULT: `None`
`size`	Number of data points whose values are computed. If not given, the length of `indices` will be used. TYPE: `int` DEFAULT: `0`
`kwargs`	Additional options to pass to the constructor of ValuationResult. Use to override status, extra_values, etc. TYPE: `dict[str, Any]` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ValuationResult`	Object with the results.

RAISES	DESCRIPTION
`TypeError`	If any input contains tensor values (see class docstring).

Source code in src/pydvl/valuation/result.py

@classmethod
def zeros(
    cls,
    algorithm: str = "",
    indices: IndexSetT | None = None,
    data_names: Sequence[NameT] | NDArray[NameT] | None = None,
    size: int = 0,
    **kwargs: dict[str, Any],
) -> ValuationResult:
    """Creates a [ValuationResult][pydvl.valuation.result.ValuationResult] filled
    with zeros.

    !!! info
        When a result is added to a zeroed one, the zeroed one is entirely
        discarded.

    Args:
        algorithm: Name of the algorithm used to compute the values
        indices: Data indices to use. A copy will be made. If not given,
            the indices will be set to the range `[0, size)`.
        data_names: Data names to use. A copy will be made. If not given,
            the names will be set to the string representation of the indices.
        size: Number of data points whose values are computed. If
            not given, the length of `indices` will be used.
        kwargs: Additional options to pass to the constructor of
            [ValuationResult][pydvl.valuation.result.ValuationResult]. Use to
            override status, extra_values, etc.

    Returns:
        Object with the results.

    Raises:
        TypeError: If any input contains tensor values (see class docstring).
    """
    from pydvl.utils.array import is_tensor

    if (indices is not None and is_tensor(indices)) or (
        data_names is not None and is_tensor(data_names)
    ):
        raise TypeError(
            "ValuationResult requires numpy arrays. "
            "Please convert tensor to numpy using tensor.cpu().numpy() explicitly."
        )

    indices = cls._create_indices_array(indices, size)
    data_names = cls._create_names_array(data_names, indices)

    options: dict[str, Any] = dict(
        algorithm=algorithm,
        status=Status.Pending,
        indices=indices,
        data_names=data_names,
        values=np.zeros(len(indices)),
        variances=np.zeros(len(indices)),
        counts=np.zeros(len(indices), dtype=np.int_),
    )
    return cls(**(options | kwargs))
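
The last line of the listing merges the defaults with `kwargs` using the dict union operator (Python 3.9+), so any key supplied by the caller wins over the default. A minimal sketch with hypothetical keys and values:

```python
# Hypothetical defaults and overrides; only the merge semantics matter here.
defaults = dict(algorithm="toy", status="pending", values=[0.0, 0.0])
overrides = dict(status="converged", extra="anything")

# In `a | b`, entries of the right operand replace those of the left one.
options = defaults | overrides
```

The operator builds a new dictionary, so `defaults` itself is left unchanged.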

ValueItem `dataclass` ¶

ValueItem(
    idx: IndexT,
    name: NameT,
    value: float,
    variance: float | None,
    count: int | None,
)

The result of a value computation for one datum.

ValueItems can be compared with the usual operators, forming a total order. Comparisons take only the idx, name and value into account.

Todo

Maybe have a mode of comparison taking the variance into account.

ATTRIBUTE	DESCRIPTION
`idx`	Index of the sample with this value in the original Dataset TYPE: `IndexT`
`name`	Name of the sample if it was provided. Otherwise, `str(idx)` TYPE: `NameT`
`value`	The value TYPE: `float`
`variance`	Variance of the marginals from which the value was computed. TYPE: `float \| None`
`count`	Number of updates for this value TYPE: `int \| None`
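
To illustrate comparisons that ignore `variance` and `count`, here is a toy dataclass. It is not the pyDVL implementation: `ToyItem` is a hypothetical name, and the sketch covers equality only, not the ordering operators.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyItem:
    idx: int
    name: str
    value: float
    # Excluded from comparisons, mirroring the behaviour described above.
    variance: Optional[float] = field(default=None, compare=False)
    count: Optional[int] = field(default=None, compare=False)


a = ToyItem(0, "a", 1.0, variance=0.1, count=3)
b = ToyItem(0, "a", 1.0, variance=0.9, count=50)
c = ToyItem(0, "a", 2.0, variance=0.1, count=3)

same = a == b       # differ only in variance and count
different = a != c  # differ in value
```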

load_result ¶

load_result(
    file: str | PathLike | IOBase, ignore_missing: bool = False
) -> ValuationResult | None

Load a valuation result from a file or file-like object.

The file or stream must be in the format written by save_result(). If the file does not exist and ignore_missing is True, None is returned; otherwise a FileNotFoundError is raised.

PARAMETER	DESCRIPTION
`file`	The name or path of the file to load, or a file-like object. TYPE: `str \| PathLike \| IOBase`
`ignore_missing`	If `True`, do not raise an error if the file does not exist. TYPE: `bool` DEFAULT: `False`

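RETURNS	DESCRIPTION
`ValuationResult \| None`	The loaded result, or `None` if the file does not exist and `ignore_missing` is `True`.

RAISES	DESCRIPTION
`FileNotFoundError`	If the file does not exist and `ignore_missing` is `False`.
`ValueError`	If the loaded object is not a `ValuationResult` or cannot be deserialized.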

Source code in src/pydvl/valuation/result.py

def load_result(
    file: str | os.PathLike | io.IOBase, ignore_missing: bool = False
) -> ValuationResult | None:
    """Load a valuation result from a file or file-like object.

    The file or stream must be in the format written by `save_result()`. If the
    file does not exist and `ignore_missing` is `True`, `None` is returned.

    Args:
        file: The name or path of the file to load, or a file-like object.
        ignore_missing: If `True`, do not raise an error if the file does not exist.
    Raises:
        FileNotFoundError: If the file does not exist and `ignore_missing` is `False`.
    """

    try:
        if isinstance(file, (os.PathLike, str)):
            with open(file, "rb") as f:
                result = load(f)
        else:
            result = load(file)
        if not isinstance(result, ValuationResult):
            raise ValueError(
                f"Loaded object is not a ValuationResult but {type(result)}"
            )
    except FileNotFoundError as e:
        msg = f"File '{file}' not found. Cannot load valuation result."
        if ignore_missing:
            logger.debug(msg + " Ignoring.")
            return None
        raise FileNotFoundError(msg) from e
    except (ValueError, TypeError, AttributeError, ModuleNotFoundError) as e:
        raise ValueError(f"Failed to load valid ValuationResult: {e}") from e

    return result
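
The handling of missing files can be sketched with the standard library alone. The helper `toy_load` below is hypothetical, and it assumes `pickle` as the serializer, which the listing does not confirm since the import of `load` is not shown.

```python
import os
import pickle
import tempfile


def toy_load(path, ignore_missing=False):
    """Toy stand-in for load_result: unpickle path, optionally tolerating absence."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        if ignore_missing:
            return None
        raise


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "no_such_result.pkl")

    # With ignore_missing=True the absence is tolerated and None comes back.
    quiet = toy_load(missing, ignore_missing=True)

    # Without it, the FileNotFoundError propagates to the caller.
    try:
        toy_load(missing)
        raised = False
    except FileNotFoundError:
        raised = True
```

Callers that pass `ignore_missing=True` should therefore check for `None` before using the return value.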

save_result ¶

save_result(result: ValuationResult, file: str | PathLike | IOBase)

Save the valuation result to a file or file-like object.

The result is written in the format read by load_result(). If the file already exists, it will be overwritten.

PARAMETER	DESCRIPTION
`result`	The valuation result to save. TYPE: `ValuationResult`
`file`	The name or path of the file to save to, or a file-like object. TYPE: `str \| PathLike \| IOBase`

Source code in src/pydvl/valuation/result.py

def save_result(result: ValuationResult, file: str | os.PathLike | io.IOBase):
    """Save the valuation result to a file or file-like object.

    The result is written in the format read by `load_result()`. If the file
    already exists, it will be overwritten.

    Args:
        result: The valuation result to save.
        file: The name or path of the file to save to, or a file-like object.
    """
    if isinstance(file, Path):
        os.makedirs(file.parent, exist_ok=True)
    if isinstance(file, (os.PathLike, str)):
        with open(file, "wb") as f:
            dump(result, f, protocol=5)
    else:
        dump(result, file, protocol=5)
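
Both functions accept a file-like object, so a round trip can be done entirely in memory. The sketch below assumes `pickle` as the serializer (the listing does not show where `dump` is imported from) and uses a plain dictionary as a stand-in for a ValuationResult.

```python
import io
import pickle

# Stand-in for a ValuationResult; any picklable object behaves the same way.
result = {"algorithm": "toy", "values": [0.5, -1.0, 2.0]}

buffer = io.BytesIO()
pickle.dump(result, buffer, protocol=5)  # same protocol number as in save_result
buffer.seek(0)                           # rewind before reading back
restored = pickle.load(buffer)
```

Forgetting to rewind the buffer before loading is a common mistake with in-memory streams, since the read would otherwise start at the end of the data.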
