pydvl.value.result
This module collects types and methods for the inspection of the results of valuation algorithms.
The most important class is ValuationResult, which provides access to raw values, as well as convenient behaviour as a Sequence with extended indexing and updating abilities, and conversion to pandas DataFrames.
Operating on results
Results can be added together with the standard + operator. Because values are typically running averages of iterative algorithms, addition behaves like a weighted average of the two results, with the weights being the number of updates in each result: adding two results is the same as generating one result whose values are the count-weighted mean of the values of the two results. The variances are updated accordingly. See ValuationResult for details.
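The weighted-average behaviour of addition can be illustrated with a minimal, standalone sketch. It does not call pyDVL: the function name `combine` and the use of population variances (dividing by the count) are assumptions made only for this illustration.

```python
from statistics import fmean, pvariance


def combine(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Pool two (mean, variance, count) summaries into one.

    Assumption of this sketch: variances are population variances.
    """
    n = n_a + n_b
    # Weighted average of the means; the weights are the update counts.
    mean = (n_a * mean_a + n_b * mean_b) / n
    # Pooled sum of squared deviations (pairwise update formula).
    delta = mean_b - mean_a
    m2 = n_a * var_a + n_b * var_b + delta**2 * n_a * n_b / n
    return mean, m2 / n, n


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0, 7.0]
mean, var, n = combine(
    fmean(a), pvariance(a), len(a), fmean(b), pvariance(b), len(b)
)
```

Here `mean` and `var` agree with the mean and population variance of the concatenated samples `a + b`, which is the sense in which adding two results matches one longer run.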
Results can also be sorted by value, variance, index or name; see sort(). The arrays of ValuationResult.values, ValuationResult.variances, ValuationResult.counts, ValuationResult.indices, ValuationResult.names are sorted in the same way.
Indexing and slicing of results is supported and ValueItem objects are returned. These objects can be compared with the usual operators, which take only the ValueItem.value into account.
Creating result objects
The most commonly used factory method is ValuationResult.zeros(), which creates a result object with all values, variances and counts set to zero. ValuationResult.empty() creates an empty result object, which can be used as a starting point for adding results together. Empty results are discarded when added to other results. Finally, ValuationResult.from_random() samples random values uniformly.
ValueItem
dataclass
ValueItem(
index: IndexT,
name: NameT,
value: float,
variance: Optional[float],
count: Optional[int],
)
Bases: Generic[IndexT, NameT]
The result of a value computation for one datum.
ValueItems can be compared with the usual operators, forming a total order. Comparisons take only the value into account.
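A value-only ordering of this kind can be sketched as follows. The class `Item` is a hypothetical stand-in written for illustration, not the library's ValueItem; it only mirrors the fields listed in the signature above.

```python
from dataclasses import dataclass
from functools import total_ordering
from typing import Optional


@total_ordering
@dataclass(eq=False)
class Item:
    """Hypothetical stand-in for ValueItem; compares by value only."""

    index: int
    name: str
    value: float
    variance: Optional[float] = None
    count: Optional[int] = None

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self.value == other.value

    def __lt__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self.value < other.value


items = [Item(0, "a", 0.3), Item(1, "b", -0.1), Item(2, "c", 0.3, variance=0.5)]
ranked = sorted(items)  # ascending by value; ties keep their input order
```

Note that under this ordering two items with the same value compare equal even if their index, name or variance differ.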
Todo
Maybe have a mode of comparing similar to np.isclose, or taking the variance into account.
ATTRIBUTE | DESCRIPTION |
---|---|
index | Index of the sample with this value in the original Dataset. |
name | Name of the sample if it was provided. Otherwise, |
value | The value. |
variance | Variance of the value if it was computed with an approximate method. |
count | Number of updates for this value. |
ValuationResult
ValuationResult(
*,
values: NDArray[float_],
variances: Optional[NDArray[float_]] = None,
counts: Optional[NDArray[int_]] = None,
indices: Optional[NDArray[IndexT]] = None,
data_names: Optional[Sequence[NameT] | NDArray[NameT]] = None,
algorithm: str = "",
status: Status = Status.Pending,
sort: bool = False,
**extra_values
)
Bases: Sequence, Iterable[ValueItem[IndexT, NameT]], Generic[IndexT, NameT]
Objects of this class hold the results of valuation algorithms.
These include indices in the original Dataset, any data names (e.g. group names in GroupedDataset), the values themselves, and variance of the computation in the case of Monte Carlo methods. ValuationResults can be iterated over like any Sequence: iter(valuation_result) returns a generator of ValueItem in the order in which the object is sorted.
Indexing
Indexing can be position-based, when accessing any of the attributes values, variances, counts and indices, as well as when iterating over the object, or using the item access operator, both getter and setter. The "position" is either the original sequence in which the data was passed to the constructor, or the sequence in which the object is sorted, see below.
Alternatively, indexing can be data-based, i.e. using the indices in the original dataset. This is the case for the methods get() and update().
Sorting
Results can be sorted in-place with sort(), or alternatively using Python's standard sorted() and reversed(). Note that sorting values affects how iterators and the object itself as Sequence behave: values[0] returns a ValueItem with the highest or lowest ranking point if this object is sorted by descending or ascending value, respectively. If unsorted, values[0] returns the ValueItem at position 0, which has data index indices[0] in the Dataset.
The same applies to direct indexing of the ValuationResult: the index is positional, according to the sorting. It does not refer to the "data index". To sort according to data index, use sort() with key="index".
In order to access ValueItem objects by their data index, use get().
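The difference between positional and data-based access can be made concrete with a small standalone sketch. `TinyResult` is a hypothetical class written for this illustration; it is not pyDVL code and only mimics the behaviour described above with plain lists.

```python
class TinyResult:
    """Hypothetical stand-in for a valuation result (not pyDVL code)."""

    def __init__(self, indices, values):
        self.indices = list(indices)  # data indices, in current order
        self.values = list(values)    # values, in the same order

    def sort(self, reverse=False):
        # Reorder indices and values together, by value.
        order = sorted(
            range(len(self.values)),
            key=self.values.__getitem__,
            reverse=reverse,
        )
        self.indices = [self.indices[i] for i in order]
        self.values = [self.values[i] for i in order]

    def __getitem__(self, position):
        # Positional access: depends on the current sort order.
        return self.indices[position], self.values[position]

    def get(self, data_index):
        # Data-based access: independent of the sort order.
        try:
            position = self.indices.index(data_index)
        except ValueError:
            raise IndexError(f"Index {data_index} not found") from None
        return self.indices[position], self.values[position]


result = TinyResult(indices=[10, 11, 12], values=[0.5, -0.2, 0.9])
before = result[0]              # first position as constructed
result.sort(reverse=True)
after = result[0]               # highest value now comes first
by_data_index = result.get(10)  # unaffected by sorting
```

After sorting, position 0 refers to a different datum, while looking up data index 10 still returns the same pair.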
Operating on results
Results can be added to each other with the + operator. Means and variances are correctly updated, using the counts attribute.
Results can also be updated with new values using update(). Means and variances are updated accordingly using the Welford algorithm.
Empty objects behave in a special way, see empty().
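For reference, one step of the Welford algorithm mentioned above can be sketched in a few lines. This is a generic, standalone version: the function name `welford_update` and the choice to report a population variance are assumptions of the sketch, not a description of the library's internals.

```python
from statistics import fmean, pvariance


def welford_update(mean, m2, count, new_value):
    """One step of Welford's online algorithm.

    `m2` is the running sum of squared deviations from the mean.
    """
    count += 1
    delta = new_value - mean
    mean += delta / count
    m2 += delta * (new_value - mean)
    return mean, m2, count


samples = [0.2, 0.4, 0.9, -0.3]
mean, m2, count = 0.0, 0.0, 0
for x in samples:
    mean, m2, count = welford_update(mean, m2, count, x)

variance = m2 / count  # population variance; an assumption of this sketch
```

The running values match the batch mean and variance of `samples` without storing the samples themselves, which is why this scheme suits iterative valuation algorithms.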
PARAMETER | DESCRIPTION |
---|---|
values | An array of values. If omitted, defaults to an empty array or to an array of zeros if |
indices | An optional array of indices in the original dataset. If omitted, defaults to |
variances | An optional array of variances in the computation of each value. |
counts | An optional array with the number of updates for each value. Defaults to an array of ones. |
data_names | Names for the data points. Defaults to index numbers if not set. |
algorithm | The method used. |
status | The end status of the algorithm. |
sort | Whether to sort the indices by ascending value. See above how this affects usage as an iterable or sequence. |
extra_values | Additional values that can be passed as keyword arguments. This can contain, for example, the least core value. |
RAISES | DESCRIPTION |
---|---|
ValueError | If input arrays have mismatching lengths. |
Source code in src/pydvl/value/result.py
indices
property
indices: NDArray[IndexT]
The indices for the values, possibly sorted.
If the object is unsorted, then these are the same as declared at construction or np.arange(len(values)) if none were passed.
names
property
names: NDArray[NameT]
The names for the values, possibly sorted.
If the object is unsorted, then these are the same as declared at construction or np.arange(len(values)) if none were passed.
sort
sort(
reverse: bool = False,
key: Literal["value", "variance", "index", "name"] = "value",
) -> None
Sorts the indices in place by key.
Once sorted, iteration over the results, and indexing of all the properties ValuationResult.values, ValuationResult.variances, ValuationResult.counts, ValuationResult.indices and ValuationResult.names will follow the same order.
PARAMETER | DESCRIPTION |
---|---|
reverse | Whether to sort in descending order. |
key | The key to sort by. Defaults to "value", i.e. ValueItem.value. |
__getattr__
Allows access to extra values as if they were properties of the instance.
__iter__
Iterate over the results returning ValueItem objects. To sort in place before iteration, use sort().
__add__
__add__(
other: ValuationResult[IndexT, NameT]
) -> ValuationResult[IndexT, NameT]
Adds two ValuationResults.
The values must have been computed with the same algorithm. An exception to this is if one argument has empty values, in which case the other argument is returned.
Warning
Abusing this will introduce numerical errors.
Means and standard errors are correctly handled. Statuses are added with bit-wise &, see Status. data_names are taken from the left summand, or if unavailable from the right one. The algorithm string is carried over if both terms have the same one; otherwise the two strings are concatenated.
It is possible to add ValuationResults of different lengths, and with different or overlapping indices. The result will contain the union of the indices, together with the correspondingly combined values.
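How a sum over the union of indices might look can be sketched with plain dictionaries. Everything here is hypothetical: `add_results` is an illustrative helper, results are reduced to `{data_index: (mean, count)}`, variances are left out, and treating an index missing from one operand as having zero updates is an assumption rather than documented behaviour.

```python
def add_results(left, right):
    """Add two results given as {data_index: (mean, count)} dicts.

    Hypothetical helper for illustration; variances are omitted.
    """
    if not left:  # an empty operand is discarded
        return dict(right)
    if not right:
        return dict(left)
    total = {}
    for idx in sorted(left.keys() | right.keys()):  # union of data indices
        mean_l, n_l = left.get(idx, (0.0, 0))
        mean_r, n_r = right.get(idx, (0.0, 0))
        n = n_l + n_r
        # Count-weighted mean; an index with no updates keeps a mean of 0.
        mean = (n_l * mean_l + n_r * mean_r) / n if n else 0.0
        total[idx] = (mean, n)
    return total


a = {0: (1.0, 2), 1: (3.0, 1)}
b = {1: (5.0, 3), 2: (2.0, 4)}
combined = add_results(a, b)
```

Index 1 appears in both operands and receives the count-weighted mean, while indices 0 and 2 are carried over unchanged.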
Warning
FIXME: Arbitrary extra_values aren't handled.
update
update(idx: int, new_value: float) -> ValuationResult[IndexT, NameT]
Updates the result in place with a new value, using running mean and variance.
PARAMETER | DESCRIPTION |
---|---|
idx | Data index of the value to update. |
new_value | New value to add to the result. |
RETURNS | DESCRIPTION |
---|---|
ValuationResult[IndexT, NameT] | A reference to the same, modified result. |
RAISES | DESCRIPTION |
---|---|
IndexError | If the index is not found. |
scale
Scales the values and variances of the result by a coefficient.
PARAMETER | DESCRIPTION |
---|---|
factor | Factor to scale by. |
indices | Indices to scale. If None, all values are scaled. |
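The effect of scaling on values and variances can be sketched as below. The text does not say how the variances are rescaled; this standalone sketch assumes the usual statistical identity Var(c·X) = c²·Var(X). The function and its `positions` argument are hypothetical and operate on plain lists, not on a ValuationResult.

```python
def scale(values, variances, factor, positions=None):
    """Return scaled copies of values and variances.

    Uses Var(c * X) = c**2 * Var(X). `positions` is a hypothetical
    argument selecting which entries to scale; None means all.
    """
    selected = range(len(values)) if positions is None else positions
    new_values, new_variances = list(values), list(variances)
    for i in selected:
        new_values[i] *= factor
        new_variances[i] *= factor**2
    return new_values, new_variances


values, variances = scale([1.0, -2.0, 0.5], [0.25, 1.0, 0.0], factor=2.0)
```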
get
Retrieves a ValueItem by data index, as opposed to the sort (positional) index used by the indexing operator.
RAISES | DESCRIPTION |
---|---|
IndexError | If the index is not found. |
to_dataframe
Returns values as a dataframe.
PARAMETER | DESCRIPTION |
---|---|
column | Name for the column holding the data value. Defaults to the name of the algorithm used. |
use_names | Whether to use data names instead of indices for the DataFrame's index. |
RETURNS | DESCRIPTION |
---|---|
DataFrame | A dataframe with two columns, one for the values, with name given as explained in |
from_random
classmethod
from_random(
size: int,
total: Optional[float] = None,
seed: Optional[Seed] = None,
**kwargs
) -> "ValuationResult"
Creates a ValuationResult object and fills it with an array of random values from a uniform distribution in [-1,1]. The values can be made to sum up to a given total number (doing so will change their range).
PARAMETER | DESCRIPTION |
---|---|
size | Number of values to generate. |
total | If set, the values are normalized to sum to this number ("efficiency" property of Shapley values). |
kwargs | Additional options to pass to the constructor of ValuationResult. Use to override status, names, etc. |
RETURNS | DESCRIPTION |
---|---|
ValuationResult | A valuation result with its status set to Status.Converged by default. |
RAISES | DESCRIPTION |
---|---|
ValueError | If |
Changed in version 0.6.0
Added parameter total. Check for zero size.
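The sampling and normalization described for from_random can be approximated with the standard library alone. This is a sketch under stated assumptions: `random_values` is a hypothetical function, normalization is done by multiplying all values by a common factor, and raising ValueError for a size below one is this sketch's reading of the zero-size check, not a confirmed detail.

```python
import random


def random_values(size, total=None, seed=None):
    """Sample `size` values uniformly from [-1, 1].

    If `total` is given, rescale so the values sum to it. Rescaling by
    a common factor is an assumption of this sketch.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    if total is not None:
        factor = total / sum(values)  # undefined if the sum is exactly 0
        values = [v * factor for v in values]
    return values


values = random_values(size=5, total=1.0, seed=42)
```

Because the rescaling factor can exceed one in magnitude, the normalized values may fall outside [-1, 1], which is the change of range the description warns about.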
empty
classmethod
empty(
algorithm: str = "",
indices: Optional[Sequence[IndexT] | NDArray[IndexT]] = None,
data_names: Optional[Sequence[NameT] | NDArray[NameT]] = None,
n_samples: int = 0,
) -> ValuationResult
Creates an empty ValuationResult object.
Empty results are characterised by having an empty array of values. When another result is added to an empty one, the empty one is discarded.
PARAMETER | DESCRIPTION |
---|---|
algorithm | Name of the algorithm used to compute the values. |
indices | Optional sequence or array of indices. |
data_names | Optional sequence or array of names for the data points. Defaults to index numbers if not set. |
n_samples | Number of valuation result entries. |
RETURNS | DESCRIPTION |
---|---|
ValuationResult | Object with the results. |
zeros
classmethod
zeros(
algorithm: str = "",
indices: Optional[Sequence[IndexT] | NDArray[IndexT]] = None,
data_names: Optional[Sequence[NameT] | NDArray[NameT]] = None,
n_samples: int = 0,
) -> ValuationResult
Creates a ValuationResult object with all values, variances and counts set to zero.
PARAMETER | DESCRIPTION |
---|---|
algorithm | Name of the algorithm used to compute the values. |
indices | Data indices to use. A copy will be made. If not given, the indices will be set to the range |
data_names | Data names to use. A copy will be made. If not given, the names will be set to the string representation of the indices. |
n_samples | Number of data points whose values are computed. If not given, the length of |
RETURNS | DESCRIPTION |
---|---|
ValuationResult | Object with the results. |