pydvl.valuation.result
This module collects types and methods for the inspection of the results of valuation algorithms.
The most important class is ValuationResult, which provides access to raw values, as well as convenient behaviour as a Sequence with extended indexing and updating abilities, and conversion to pandas DataFrames.
Indexing and slicing
Results support indexing and slicing like any Sequence, and both operations return ValuationResult objects. Indexing is positional and follows the current sorting order. See the class documentation for more on this.
Setting items and slices is also possible with other valuation results. Index and name clashes are detected and raise an exception. Note that any sorted state is potentially lost when setting items or slices.
Addition
Results can be added together with the standard + operator. Because values are typically running averages of iterative algorithms, addition behaves like a weighted average of the two results, with the weights being the number of updates in each result: adding two results yields the same values as a single result computed from all the updates of both. The variances are updated accordingly. See ValuationResult for details.
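The count-weighted combination can be illustrated with the standard formulas for merging two running means and variances (the pairwise update of Chan et al.). This is a minimal NumPy sketch, not pyDVL's implementation: merge_moments is a hypothetical helper, and it assumes population variances (sums of squared deviations divided by the count), which may differ from the library's conventions.

```python
import numpy as np


def merge_moments(m1, v1, n1, m2, v2, n2):
    """Hypothetical helper: combine two running means and variances.

    Assumes population variances (divided by the count). Works
    elementwise on scalars or NumPy arrays; weights are the counts.
    """
    m1, v1, n1 = (np.asarray(x, dtype=float) for x in (m1, v1, n1))
    m2, v2, n2 = (np.asarray(x, dtype=float) for x in (m2, v2, n2))
    n = n1 + n2
    safe_n = np.where(n > 0, n, 1.0)  # avoid division by zero
    delta = m2 - m1
    # Count-weighted mean of the two partial means
    mean = m1 + delta * n2 / safe_n
    # Combine sums of squared deviations, then renormalise
    sq_dev = v1 * n1 + v2 * n2 + delta**2 * n1 * n2 / safe_n
    return mean, sq_dev / safe_n, n
```

Merging the statistics of [1, 2, 3] and [4, 5] this way gives the mean 3 and the population variance 2 of [1, 2, 3, 4, 5].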
Comparing
Results can be compared with the equality operator. The comparison is "semantic" in the sense that what matters is the valuation of each data index, not the order in which the indices are stored in the ValuationResult. Values, variances and counts are compared.
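To illustrate what "semantic" comparison means, the following sketch aligns two sets of per-index statistics by data index before comparing them, so that storage order is irrelevant. The helper semantically_equal and the tuple layout (indices, values, variances, counts) are hypothetical, and comparing floats with np.allclose is an assumption of the sketch.

```python
import numpy as np


def semantically_equal(a, b):
    """Hypothetical helper: compare (indices, values, variances, counts)
    tuples regardless of the order in which indices are stored."""
    idx_a, *fields_a = (np.asarray(x) for x in a)
    idx_b, *fields_b = (np.asarray(x) for x in b)
    if idx_a.shape != idx_b.shape:
        return False
    # Align both by data index so that position no longer matters
    order_a = np.argsort(idx_a, kind="stable")
    order_b = np.argsort(idx_b, kind="stable")
    if not np.array_equal(idx_a[order_a], idx_b[order_b]):
        return False
    # Assumption: floating-point fields are compared with a tolerance
    return all(
        np.allclose(fa[order_a], fb[order_b])
        for fa, fb in zip(fields_a, fields_b)
    )
```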
Sorting
Results can be sorted (also in-place) by value, variance or number of updates, see sort(). All the properties ValuationResult.values, ValuationResult.variances, ValuationResult.counts, ValuationResult.indices, ValuationResult.stderr, ValuationResult.names are then sorted according to the same order.
Updating
How results are updated as new values arrive from workers depends on the valuation algorithm. The most common case is the LogResultUpdater class, which uses the log-sum-exp trick to update values and variances with better numerical stability. This is the default with the base IndexSampler, but other sampling schemes might require different updaters. In particular, MSRResultUpdater must keep track of separate positive and negative updates.
Factories
Besides copy(), the most commonly used factory method is ValuationResult.zeros(), which creates a result object with all values, variances and counts set to zero.
ValuationResult.empty() creates an empty result object, which can be used as a starting point for adding results together. Any metadata in empty results is discarded when added to other results.
Finally, ValuationResult.from_random() samples random values uniformly.
LogResultUpdater
LogResultUpdater(result: ValuationResult)
Bases: ResultUpdater[ValueUpdateT]
An object to update valuation results in log-space.
This updater keeps track of several quantities required to maintain accurate running 1st and 2nd moments. It also uses the log-sum-exp trick for numerical stability.
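The log-sum-exp idea can be sketched for the running mean alone. Positive and negative updates are accumulated separately as logarithms of their sums with np.logaddexp, so that large magnitudes do not overflow. This is an illustrative sketch under assumptions: the class LogSpaceMean is hypothetical, it does not track the second moment, and it is not the actual internal state of LogResultUpdater.

```python
import numpy as np


class LogSpaceMean:
    """Hypothetical running mean of signed values kept in log space."""

    def __init__(self) -> None:
        self.log_pos = -np.inf  # log of the sum of positive updates
        self.log_neg = -np.inf  # log of the sum of |negative updates|
        self.count = 0

    def update(self, log_abs_value: float, sign: int) -> None:
        # np.logaddexp computes log(exp(a) + exp(b)) without overflow
        if sign > 0:
            self.log_pos = np.logaddexp(self.log_pos, log_abs_value)
        elif sign < 0:
            self.log_neg = np.logaddexp(self.log_neg, log_abs_value)
        self.count += 1  # zero updates still count towards the mean

    @property
    def mean(self) -> float:
        if self.count == 0:
            return 0.0
        log_n = np.log(self.count)
        # Divide each partial sum by the count before leaving log space
        return float(np.exp(self.log_pos - log_n) - np.exp(self.log_neg - log_n))
```

With two updates of 1e308, a plain np.sum overflows to inf, whereas this accumulator still recovers a mean of about 1e308.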
Source code in src/pydvl/valuation/result.py
ResultUpdater
ResultUpdater(result: ValuationResult)
Bases: ABC, Generic[ValueUpdateT]
Base class for result updaters.
A result updater is a strategy to update a valuation result with a value update. It is used by the valuation methods to process the ValueUpdates emitted by the EvaluationStrategy corresponding to the sampler.
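As an illustration of this strategy pattern, the following sketch defines a minimal updater interface and one concrete strategy that maintains per-index running means and variances with Welford's online algorithm. Update, Updater and WelfordUpdater are hypothetical names, not part of pyDVL; the real ResultUpdater operates on ValueUpdate objects and ValuationResults.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Update:
    """Hypothetical stand-in for a value update: an index and a marginal."""

    idx: int
    value: float


class Updater(ABC):
    """Hypothetical strategy interface, loosely mirroring ResultUpdater."""

    @abstractmethod
    def process(self, update: Update) -> None: ...


class WelfordUpdater(Updater):
    """Keeps running means and variances per index (Welford's algorithm)."""

    def __init__(self, n: int) -> None:
        self.means = [0.0] * n
        self.m2 = [0.0] * n  # running sums of squared deviations
        self.counts = [0] * n

    def process(self, update: Update) -> None:
        i = update.idx
        self.counts[i] += 1
        delta = update.value - self.means[i]
        self.means[i] += delta / self.counts[i]
        # Uses the deviation from both the old and the new mean
        self.m2[i] += delta * (update.value - self.means[i])

    def variance(self, i: int) -> float:
        # Population variance; 0.0 until at least one update has arrived
        return self.m2[i] / self.counts[i] if self.counts[i] else 0.0
```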
ValuationResult
ValuationResult(
*,
values: Sequence[float64] | NDArray[float64],
variances: Sequence[float64] | NDArray[float64] | None = None,
counts: Sequence[int_] | NDArray[int_] | None = None,
indices: Sequence[IndexT] | NDArray[IndexT] | None = None,
data_names: Sequence[NameT] | NDArray[NameT] | None = None,
algorithm: str = "",
status: Status = Pending,
sort: bool | None = None,
**extra_values: Any,
)
Bases: Sequence, Iterable[ValueItem]
Objects of this class hold the results of valuation algorithms. These include indices in the original Dataset, any data names (e.g. group names in GroupedDataset), the values themselves, and the variance of the computation in the case of Monte Carlo methods. ValuationResults can be iterated over like any Sequence: iter(valuation_result) returns a generator of ValueItem in the order in which the object is sorted.
Indexing
Indexing is sort-based when accessing any of the attributes values, variances, counts and indices, when iterating over the object, and when using the item access operator (both getter and setter). The "position" refers either to the original order in which the data was passed to the constructor, or to the order in which the object has been sorted, see below. The sorted position for a given data index can be retrieved with the method positions().
Some methods use data indices instead. This is the case for get().
Sorting
Results can be sorted (also in-place) with sort(), or alternatively using Python's standard sorted() and reversed(). Note that sorting values affects how iterators and the object itself as Sequence behave: values[0] returns a ValueItem with the highest or lowest ranking point if this object is sorted by descending or ascending value, respectively. If unsorted, values[0] returns the ValueItem at position 0, which has data index indices[0] in the Dataset.
The same applies to direct indexing of the ValuationResult: the index is positional, according to the sorting. It does not refer to the "data index". To sort according to data index, use sort() with key="index".
In order to access ValueItem objects by their data index, use get(), or use positions() to convert data indices to positions.
Converting back and forth between data indices and positions, as in
data_indices = result.indices[result.positions(data_indices)]
is a no-op.
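The round trip can be reproduced with plain NumPy arrays standing in for a sorted result. The helper positions below is hypothetical; it only mimics, assuming positions refer to the current sorted order, what ValuationResult.positions() is documented to do.

```python
import numpy as np

# Hypothetical stand-ins for a result's data indices and values
indices = np.array([10, 11, 12, 13])
values = np.array([0.3, -0.1, 0.8, 0.2])

# Sorting by value (descending) permutes indices and values together
order = np.argsort(values)[::-1]
sorted_indices = indices[order]


def positions(sorted_indices, data_indices):
    """Hypothetical helper: positions of data_indices in the sorted order."""
    lookup = {int(d): pos for pos, d in enumerate(sorted_indices)}
    return np.array([lookup[int(d)] for d in data_indices], dtype=int)


query = np.array([13, 10])
pos = positions(sorted_indices, query)
# Converting data indices to positions and back is a no-op
assert np.array_equal(sorted_indices[pos], query)
```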
Operating on results
Results can be added to each other with the + operator. Means and variances are updated accordingly using Welford's algorithm.
Empty objects behave in a special way, see empty().
PARAMETER | DESCRIPTION |
---|---|
values | An array of values. If omitted, defaults to an empty array or to an array of zeros if |
indices | An optional array of indices in the original dataset. If omitted, defaults to np.arange(len(values)). |
variances | An optional array of variances of the marginals from which the values are computed. |
counts | An optional array with the number of updates for each value. Defaults to an array of ones. |
data_names | Names for the data points. Defaults to index numbers if not set. |
algorithm | The method used. |
status | The end status of the algorithm. |
sort | Whether to sort the indices. Defaults to None. |
extra_values | Additional values that can be passed as keyword arguments. This can contain, for example, the least core value. |
RAISES | DESCRIPTION |
---|---|
ValueError | If input arrays have mismatching lengths. |
Changed in 0.10.0
Changed the behaviour of sorting, slicing, and indexing.
indices
property
indices: NDArray[IndexT]
The indices for the values, possibly sorted. If the object is unsorted, then these are the same as declared at construction or np.arange(len(values)) if none were passed.
names
property
names: NDArray[NameT]
The names for the values, possibly sorted. If the object is unsorted, then these are the same as declared at construction or np.arange(len(values)) if none were passed.
variances
property
Variances of the marginals from which values were computed, possibly sorted.
Note that this is not the variance of the value estimate, but the sample variance of the marginals used to compute it.
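Since variances holds the variance of the marginals rather than of the estimate, the uncertainty of each value is better described by the standard error of the mean, sqrt(variance / count). The sketch below assumes that this is how a standard error such as ValuationResult.stderr relates to variances and counts; the guard against zero counts is also an assumption.

```python
import numpy as np

# Hypothetical per-datum statistics, as stored in a result
variances = np.array([0.04, 0.25, 0.0])  # variance of the marginals
counts = np.array([100, 25, 0])  # number of updates per value

# Standard error of the mean: sqrt(variance / n), guarding n = 0
stderr = np.sqrt(variances / np.maximum(counts, 1))
```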
__add__
__add__(other: ValuationResult) -> ValuationResult
Adds two ValuationResults.
The values must have been computed with the same algorithm. An exception to this is if one argument has empty or all-zero values, in which case the other argument is returned.
Danger
Abusing this will introduce numerical errors.
Means and standard errors are correctly handled. Statuses are added with bit-wise &, see Status. data_names are taken from the left summand, or if unavailable from the right one. The algorithm string is carried over if both terms have the same one; otherwise the two strings are concatenated.
It is possible to add ValuationResults of different lengths, and with different or overlapping indices. The result will have the union of the indices, with values combined for the indices present in both.
Warning
FIXME: Arbitrary extra_values aren't handled.
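The union-of-indices behaviour can be pictured with dictionaries mapping data indices to (mean, count) pairs. This hypothetical add_results helper is only a sketch: it combines means with count weights for indices present in both operands, and it ignores variances, statuses, names and the special handling of empty or all-zero results.

```python
def add_results(a: dict, b: dict) -> dict:
    """Hypothetical helper: merge {data_index: (mean, count)} maps.

    The output covers the union of both index sets; means of indices
    present in both are combined as a count-weighted average.
    """
    out = dict(a)
    for idx, (m_b, n_b) in b.items():
        if idx not in out:
            out[idx] = (m_b, n_b)  # only in b: copied as-is
            continue
        m_a, n_a = out[idx]
        n = n_a + n_b
        mean = (m_a * n_a + m_b * n_b) / n if n else 0.0
        out[idx] = (mean, n)
    return out
```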
__getattr__
Allows access to extra values as if they were properties of the instance.
__getitem__
__getitem__(key: int) -> ValuationResult
__getitem__(key: slice) -> ValuationResult
__getitem__(key: Iterable[int]) -> ValuationResult
Get a ValuationResult for the given key.
The key can be an integer, a slice, or an iterable of integers.
The returned object is a new ValuationResult with all metadata copied, except for the sorting order. If the key is a slice or sequence, the returned object will contain the items in the order specified by the sequence.
RETURNS | DESCRIPTION |
---|---|
ValuationResult | A new object containing only the selected items. |
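Positional selection of this kind maps naturally onto NumPy fancy indexing. The sketch below uses hypothetical parallel arrays, assumed to be already in the object's sorted order, and a hypothetical select helper; it shows how an integer, a slice or a list of positions each produce a sub-selection whose order follows the key.

```python
import numpy as np

# Hypothetical parallel arrays, already in the object's (sorted) order
indices = np.array([7, 3, 5, 1])
values = np.array([0.9, 0.4, 0.2, -0.3])


def select(key):
    """Hypothetical positional selection mirroring __getitem__ semantics."""
    if isinstance(key, (int, np.integer)):
        key = [key]  # a single position still yields a length-1 selection
    elif not isinstance(key, slice):
        key = list(key)  # keep the order given by the caller
    return indices[key], values[key]
```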
__iter__
Iterate over the results returning ValueItem objects. To sort in place before iteration, use sort().
__setitem__
__setitem__(key: int, value: ValuationResult) -> None
__setitem__(key: slice, value: ValuationResult) -> None
__setitem__(key: Iterable[int], value: ValuationResult) -> None
Set items in the ValuationResult using another ValuationResult.
This method provides a symmetrical counterpart to __getitem__, both operating on ValuationResult objects.
The key can be an integer, a slice, or an iterable of integers. The value must be a ValuationResult with length matching the number of positions specified by key.
PARAMETER | DESCRIPTION |
---|---|
key | Position(s) to set. |
value | A ValuationResult to set at the specified position(s). |
RAISES | DESCRIPTION |
---|---|
TypeError | If value is not a ValuationResult. |
ValueError | If value's length doesn't match the number of positions specified by the key. |
copy
copy() -> ValuationResult
Returns a copy of the object.
empty
classmethod
empty(algorithm: str = '', **kwargs: dict[str, Any]) -> ValuationResult
Creates an empty ValuationResult object.
Empty results are characterised by having an empty array of values.
Tip
When a result is added to an empty one, the empty one is entirely discarded.
PARAMETER | DESCRIPTION |
---|---|
algorithm | Name of the algorithm used to compute the values. |
kwargs | Additional options to pass to the constructor of ValuationResult. Use to override status, extra_values, etc. |
Returns: Object with the results.
from_random
classmethod
from_random(
size: int, total: float | None = None, seed: Seed | None = None, **kwargs
) -> ValuationResult
Creates a ValuationResult object and fills it with an array of random values from a uniform distribution in [-1,1]. The values can be made to sum up to a given total number (doing so will change their range).
PARAMETER | DESCRIPTION |
---|---|
size | Number of values to generate. |
total | If set, the values are normalized to sum to this number ("efficiency" property of Shapley values). |
seed | Random seed to use. |
kwargs | Additional options to pass to the constructor of ValuationResult. Use to override status, names, etc. |
RETURNS | DESCRIPTION |
---|---|
ValuationResult | A valuation result with its status set to Status.Converged by default. |
RAISES | DESCRIPTION |
---|---|
ValueError | If size is zero. |
Changed in version 0.6.0
Added parameter total. Check for zero size.
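The generation step can be sketched as follows. random_values is a hypothetical helper, and the exact way pyDVL rescales to the requested total is an assumption; here the values are multiplied by total / sum(values).

```python
import numpy as np


def random_values(size, total=None, seed=None):
    """Hypothetical sketch of the value generation in from_random()."""
    if size < 1:
        raise ValueError("size must be positive")
    rng = np.random.default_rng(seed)
    # Uniform draws in [-1, 1)
    values = rng.uniform(low=-1.0, high=1.0, size=size)
    if total is not None:
        # Assumption: rescale so that the values sum to `total`,
        # which changes their range
        values *= total / np.sum(values)
    return values
```

Rescaling by the sum is ill-conditioned when the random values happen to sum to nearly zero, which is one reason the range changes when total is set.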
get
get(data_idx: IndexT) -> ValueItem
Retrieves a ValueItem object by data index, as opposed to the sort position used by the indexing operator.
PARAMETER | DESCRIPTION |
---|---|
data_idx | Data index of the value to retrieve. |
RAISES | DESCRIPTION |
---|---|
IndexError | If the index is not found. |
positions
positions(data_indices: IndexSetT | list[IndexT]) -> IndexSetT
Return the locations (positions) within the ValuationResult for the given data indices.
Sorting is taken into account. This operation is the inverse of indexing the indices property: result.indices[result.positions(data_indices)] returns data_indices.
scale
Scales the values and variances of the result by a coefficient.
PARAMETER | DESCRIPTION |
---|---|
factor | Factor to scale by. |
data_indices | Data indices to scale. If |
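Scaling a value by a factor c multiplies its variance by c², since Var(cX) = c² Var(X). The sketch below applies this to plain NumPy arrays. The scale function here is hypothetical and, for simplicity, takes positions rather than data indices; that the library scales variances by the square of the factor is an assumption based on this identity.

```python
import numpy as np


def scale(values, variances, factor, positions=None):
    """Hypothetical sketch: scale values and variances in place."""
    if positions is None:
        positions = slice(None)  # scale every entry
    values[positions] *= factor
    # Var(c * X) = c**2 * Var(X), so variances scale with the square
    variances[positions] *= factor**2
```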
set
set(data_idx: IndexT, value: ValueItem) -> Self
Set a ValueItem in the result by its data index.
This is the complement to the get() method and allows setting individual ValueItems directly by their data index rather than by (sort) position.
PARAMETER | DESCRIPTION |
---|---|
data_idx | Data index of the value to set. |
value | The data to set. |
RETURNS | DESCRIPTION |
---|---|
Self | A reference to self for method chaining. |
RAISES | DESCRIPTION |
---|---|
IndexError | If the index is not found. |
ValueError | If the |
sort
sort(
reverse: bool = False,
key: Literal["value", "variance", "index", "name"] = "value",
inplace: bool = False,
) -> ValuationResult
Sorts the indices by key, in ascending order unless reverse is True.
Once sorted, iteration over the results, and indexing of all the properties ValuationResult.values, ValuationResult.variances, ValuationResult.stderr, ValuationResult.counts, ValuationResult.indices and ValuationResult.names will follow the same order.
PARAMETER | DESCRIPTION |
---|---|
reverse | Whether to sort in descending order. |
key | The key to sort by. Defaults to ValueItem.value. |
inplace | Whether to sort the object in place or return a new object. |
Returns: A new object with the sorted values, or the same object, sorted, if inplace is True.
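Sorting several parallel arrays by one of them amounts to computing a single permutation with np.argsort and applying it to every array. The sort_result helper and its dictionary layout are hypothetical; the sketch only illustrates why all properties stay aligned after sorting.

```python
import numpy as np


def sort_result(fields, key="value", reverse=False):
    """Hypothetical sketch: sort parallel arrays by one of them.

    `fields` maps names such as "value", "variance" or "index" to
    equally long arrays; every array is permuted with the same order.
    """
    order = np.argsort(fields[key], kind="stable")
    if reverse:
        order = order[::-1]  # descending order
    return {name: np.asarray(arr)[order] for name, arr in fields.items()}
```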
to_dataframe
Returns values as a dataframe.
PARAMETER | DESCRIPTION |
---|---|
column | Name for the column holding the data value. Defaults to the name of the algorithm used. |
use_names | Whether to use data names instead of indices for the DataFrame's index. |
RETURNS | DESCRIPTION |
---|---|
DataFrame | A dataframe with three columns: |
zeros
classmethod
zeros(
algorithm: str = "",
indices: IndexSetT | None = None,
data_names: Sequence[NameT] | NDArray[NameT] | None = None,
size: int = 0,
**kwargs: dict[str, Any],
) -> ValuationResult
Creates a ValuationResult filled with zeros.
Info
When a result is added to a zeroed one, the zeroed one is entirely discarded.
PARAMETER | DESCRIPTION |
---|---|
algorithm | Name of the algorithm used to compute the values. |
indices | Data indices to use. A copy will be made. If not given, the indices will be set to the range |
data_names | Data names to use. A copy will be made. If not given, the names will be set to the string representation of the indices. |
size | Number of data points whose values are computed. If not given, the length of |
kwargs | Additional options to pass to the constructor of ValuationResult. Use to override status, extra_values, etc. |
Returns: Object with the results.
ValueItem
dataclass
The result of a value computation for one datum.
ValueItems can be compared with the usual operators, forming a total order. Comparisons take only the idx, name and value into account.
Todo
Maybe have a mode of comparison taking the variance into account.
ATTRIBUTE | DESCRIPTION |
---|---|
idx | Index of the sample with this value in the original Dataset. |
name | Name of the sample if it was provided. Otherwise, |
value | The value. |
variance | Variance of the marginals from which the value was computed. |
count | Number of updates for this value. |
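The standard dataclasses module offers one way to exclude fields from comparisons, via field(compare=False). In this hypothetical Item class, variance and count take no part in == or <. Note that with order=True the ordering is lexicographic over (idx, name, value) in field order; this is an assumption of the sketch, and ValueItem may order its items differently.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Item:
    """Hypothetical stand-in for ValueItem."""

    idx: int
    name: str
    value: float
    # Excluded from ==, <, <=, > and >=
    variance: float = field(default=0.0, compare=False)
    count: int = field(default=0, compare=False)
```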
load_result
load_result(
file: str | PathLike | IOBase, ignore_missing: bool = False
) -> ValuationResult | None
Load a valuation result from a file or file-like object.
The file or stream must be in the format used by save_result(). If the file does not exist and ignore_missing is True, None is returned.
PARAMETER | DESCRIPTION |
---|---|
file | The name or path of the file to load, or a file-like object. |
ignore_missing | If True, return None instead of raising when the file does not exist. |
Raises:
FileNotFoundError: If the file does not exist and ignore_missing is False.
save_result
save_result(result: ValuationResult, file: str | PathLike | IOBase)
Save the valuation result to a file or file-like object.
The data is written in the format read by load_result(). If the file already exists, it will be overwritten.
PARAMETER | DESCRIPTION |
---|---|
result | The valuation result to save. |
file | The name or path of the file to save to, or a file-like object. |
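A minimal sketch of how such save and load functions can accept either a path or a binary stream, and how ignore_missing can turn a missing file into None. The on-disk format used by pyDVL is not described here; pickle, and the names save_obj and load_obj, are assumptions for illustration only.

```python
import io
import pickle


def save_obj(obj, file):
    """Hypothetical: pickle obj to a path (overwriting it) or a binary stream."""
    if isinstance(file, io.IOBase):
        pickle.dump(obj, file)
    else:
        with open(file, "wb") as f:  # "wb" truncates an existing file
            pickle.dump(obj, f)


def load_obj(file, ignore_missing=False):
    """Hypothetical inverse of save_obj; None for a missing file if allowed."""
    if isinstance(file, io.IOBase):
        return pickle.load(file)
    try:
        with open(file, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        if ignore_missing:
            return None
        raise
```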