pydvl.utils.numeric
This module contains routines for numerical computations used across the library.
powerset
powerset(s: NDArray[T]) -> Iterator[Collection[T]]
Returns an iterator for the power set of the argument.
Subsets are generated in order of increasing size. See random_powerset() for random sampling.
Example
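A minimal sketch of the ordering described above. It uses an itertools-based stand-in so that it runs without pydvl installed; `powerset_sketch` is a hypothetical name, not the library function.

```python
from itertools import chain, combinations
from typing import Collection, Iterator

import numpy as np


def powerset_sketch(s: np.ndarray) -> Iterator[Collection]:
    """Hypothetical stand-in: yields every subset of s, smallest subsets first."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))


# Convert numpy scalars to plain ints for readable output
subsets = [tuple(int(x) for x in subset) for subset in powerset_sketch(np.array([1, 2]))]
print(subsets)  # [(), (1,), (2,), (1, 2)]
```

The iterator is lazy, so subsets are only materialised as they are consumed. The total number of subsets is 2**len(s), which limits exhaustive iteration to small sets.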
PARAMETER | DESCRIPTION |
---|---|
s | The set to use. TYPE: NDArray[T] |
RETURNS | DESCRIPTION |
---|---|
Iterator[Collection[T]] | An iterator over all subsets of the set of indices |
Source code in src/pydvl/utils/numeric.py
num_samples_permutation_hoeffding
Lower bound on the number of samples required for Monte Carlo Shapley to obtain an (ε,δ)-approximation.
That is: with probability 1-δ, the estimated value for one data point will be ε-close to the true quantity, if at least this many permutations are sampled.
PARAMETER | DESCRIPTION |
---|---|
eps | ε > 0 |
delta | 0 < δ <= 1 |
u_range | Range of the Utility function |
RETURNS | DESCRIPTION |
---|---|
int | Number of permutations required to guarantee ε-correct Shapley values with probability 1-δ |
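As an illustration of where such a bound can come from, here is a sketch based on the two-sided Hoeffding inequality. The function name is hypothetical and the exact constant is an assumption, since this page does not show the library's formula. Each marginal contribution is a difference of two utilities, so it lies in an interval of width 2 * u_range, and Hoeffding's inequality then gives m >= 2 * u_range**2 * log(2/δ) / ε**2.

```python
import math


def num_samples_hoeffding_sketch(eps: float, delta: float, u_range: float) -> int:
    """Hypothetical sketch: smallest m with 2 * exp(-m * eps**2 / (2 * u_range**2)) <= delta.

    Assumes independently sampled permutations and marginal contributions
    confined to an interval of width 2 * u_range.
    """
    if eps <= 0 or not 0 < delta <= 1:
        raise ValueError("Require eps > 0 and 0 < delta <= 1")
    return math.ceil(2 * u_range**2 * math.log(2 / delta) / eps**2)


n_permutations = num_samples_hoeffding_sketch(eps=0.1, delta=0.05, u_range=1.0)
print(n_permutations)  # 738
```

The count grows quadratically as ε shrinks but only logarithmically as δ shrinks, so tightening the accuracy is far more expensive than tightening the confidence.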
random_subset
Returns one subset at random from s.
PARAMETER | DESCRIPTION |
---|---|
s | Set to sample from |
q | Sampling probability for elements. The default 0.5 yields a uniform distribution over the power set of s. |
seed | Either an instance of a numpy random number generator or a seed for it. |
RETURNS | DESCRIPTION |
---|---|
NDArray[T] | The subset |
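The sampling scheme can be sketched as one independent Bernoulli draw per element. The name `random_subset_sketch` is hypothetical and the body is an assumption about one way to do this with numpy, not the library source.

```python
from typing import Optional, Union

import numpy as np


def random_subset_sketch(
    s: np.ndarray,
    q: float = 0.5,
    seed: Optional[Union[int, np.random.Generator]] = None,
) -> np.ndarray:
    """Hypothetical stand-in: keeps each element of s independently with probability q."""
    rng = np.random.default_rng(seed)  # accepts a seed or an existing Generator
    keep = rng.uniform(size=len(s)) < q  # one Bernoulli(q) draw per element
    return s[keep]


subset = random_subset_sketch(np.arange(10), q=0.5, seed=42)
```

With q = 0.5 every one of the 2**len(s) subsets has the same probability, which is why that value gives a uniform distribution over the power set.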
random_powerset
random_powerset(
s: NDArray[T],
n_samples: Optional[int] = None,
q: float = 0.5,
seed: Optional[Seed] = None,
) -> Generator[NDArray[T], None, None]
Samples subsets from the power set of the argument, without pre-generating all subsets and in no order.
See powerset if you wish to deterministically generate all subsets.
To generate each subset, len(s) Bernoulli draws with probability q are made. The default value of q = 0.5 provides a uniform distribution over the power set of s. Other choices can be used e.g. to implement owen_sampling_shapley.
PARAMETER | DESCRIPTION |
---|---|
s | Set to sample from. TYPE: NDArray[T] |
n_samples | If set, stop the generator after this many steps. TYPE: Optional[int] DEFAULT: None |
q | Sampling probability for elements. The default 0.5 yields a uniform distribution over the power set of s. TYPE: float DEFAULT: 0.5 |
seed | Either an instance of a numpy random number generator or a seed for it. TYPE: Optional[Seed] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
Generator[NDArray[T], None, None] | Samples from the power set of s |
RAISES | DESCRIPTION |
---|---|
ValueError | If the element sampling probability is not in [0,1] |
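The generator behaviour described above might look roughly as follows. This is a sketch under stated assumptions: `random_powerset_sketch` is a hypothetical name, and treating n_samples=None as "never stop" is an assumption about the intended semantics.

```python
from typing import Generator, Optional, Union

import numpy as np


def random_powerset_sketch(
    s: np.ndarray,
    n_samples: Optional[int] = None,
    q: float = 0.5,
    seed: Optional[Union[int, np.random.Generator]] = None,
) -> Generator[np.ndarray, None, None]:
    """Hypothetical stand-in: lazily yields random subsets of s."""
    # In a generator function this check only runs on the first next() call
    if q < 0 or q > 1:
        raise ValueError("Element sampling probability must be in [0,1]")
    rng = np.random.default_rng(seed)
    produced = 0
    # Assumption: n_samples=None means the generator never stops on its own
    while n_samples is None or produced < n_samples:
        yield s[rng.uniform(size=len(s)) < q]  # len(s) Bernoulli(q) draws
        produced += 1


samples = list(random_powerset_sketch(np.arange(5), n_samples=3, seed=42))
```

Because nothing is precomputed, memory use stays proportional to len(s) regardless of how many subsets are drawn.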
random_powerset_label_min
random_powerset_label_min(
s: NDArray[T],
labels: NDArray[int_],
min_elements_per_label: int = 1,
seed: Optional[Seed] = None,
) -> Generator[NDArray[T], None, None]
Draws random subsets from s, while ensuring that at least min_elements_per_label elements per label are included in the draw. It can be used for classification problems to ensure that a set contains information for all labels (or not if min_elements_per_label=0).
PARAMETER | DESCRIPTION |
---|---|
s | Set to sample from. TYPE: NDArray[T] |
labels | Labels for the samples. TYPE: NDArray[int_] |
min_elements_per_label | Minimum number of elements for each label. TYPE: int DEFAULT: 1 |
seed | Either an instance of a numpy random number generator or a seed for it. TYPE: Optional[Seed] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
Generator[NDArray[T], None, None] | Generated draw from the powerset of s with at least min_elements_per_label elements for each label. |
RAISES | DESCRIPTION |
---|---|
ValueError | If |
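One way to satisfy the per-label minimum is to sample each label group separately and then merge the groups. The sketch below does this; the function name and the per-group size distribution are assumptions made for illustration, not the library's implementation, and it assumes s and labels have the same length.

```python
from itertools import islice
from typing import Generator, Optional, Union

import numpy as np


def random_powerset_label_min_sketch(
    s: np.ndarray,
    labels: np.ndarray,
    min_elements_per_label: int = 1,
    seed: Optional[Union[int, np.random.Generator]] = None,
) -> Generator[np.ndarray, None, None]:
    """Hypothetical sketch: random subsets with a minimum count per label."""
    rng = np.random.default_rng(seed)
    unique_labels = np.unique(labels)
    while True:
        parts = []
        for label in unique_labels:
            candidates = s[labels == label]
            # A group smaller than the minimum contributes all of its elements
            lower = min(min_elements_per_label, len(candidates))
            size = int(rng.integers(lower, len(candidates) + 1))
            parts.append(rng.choice(candidates, size=size, replace=False))
        subset = np.concatenate(parts) if parts else s[:0]
        rng.shuffle(subset)  # avoid grouping the output by label
        yield subset


s = np.arange(6)
labels = np.array([0, 0, 0, 1, 1, 2])
draws = list(islice(random_powerset_label_min_sketch(s, labels, seed=0), 5))
```

The generator is infinite, so callers bound it themselves, here with `islice`.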
random_subset_of_size
Samples a random subset of given size uniformly from the powerset of s.
PARAMETER | DESCRIPTION |
---|---|
s | Set to sample from |
size | Size of the subset to generate |
seed | Either an instance of a numpy random number generator or a seed for it. |
RETURNS | DESCRIPTION |
---|---|
NDArray[T] | The subset |
RAISES | DESCRIPTION |
---|---|
ValueError | If size > len(s) |
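A fixed-size draw can be sketched with numpy's sampling without replacement. `random_subset_of_size_sketch` is a hypothetical name and the body is an assumption, not the library source.

```python
from typing import Optional, Union

import numpy as np


def random_subset_of_size_sketch(
    s: np.ndarray,
    size: int,
    seed: Optional[Union[int, np.random.Generator]] = None,
) -> np.ndarray:
    """Hypothetical stand-in: uniform draw among all subsets of s with `size` elements."""
    if size > len(s):
        raise ValueError("Cannot sample a subset larger than the set")
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no element appears twice
    return rng.choice(s, size=size, replace=False)


subset = random_subset_of_size_sketch(np.arange(10), size=3, seed=0)
```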
random_matrix_with_condition_number
random_matrix_with_condition_number(
n: int, condition_number: float, seed: Optional[Seed] = None
) -> NDArray
Constructs a square matrix with a given condition number.
Taken from: https://gist.github.com/bstellato/23322fe5d87bb71da922fbc41d658079#file-random_mat_condition_number-py
Also see: https://math.stackexchange.com/questions/1351616/condition-number-of-ata.
PARAMETER | DESCRIPTION |
---|---|
n | Size of the matrix. TYPE: int |
condition_number | Desired condition number of the matrix. TYPE: float |
seed | Either an instance of a numpy random number generator or a seed for it. TYPE: Optional[Seed] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
NDArray | An (n,n) matrix with the requested condition number. |
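To illustrate the idea, here is a simplified SVD-based construction. It is not the approach from the gist referenced above, and the function name and argument checks are assumptions. It builds A = U diag(σ) Vᵀ with random orthogonal U and V, so the 2-norm condition number is σ_max / σ_min.

```python
from typing import Optional, Union

import numpy as np


def matrix_with_condition_number_sketch(
    n: int,
    condition_number: float,
    seed: Optional[Union[int, np.random.Generator]] = None,
) -> np.ndarray:
    """Hypothetical sketch: (n, n) matrix with a prescribed 2-norm condition number."""
    if n < 2 or condition_number < 1:
        raise ValueError("Require n >= 2 and condition_number >= 1")
    rng = np.random.default_rng(seed)
    # Random orthogonal factors from QR decompositions of Gaussian matrices
    u, _ = np.linalg.qr(rng.standard_normal((n, n)))
    v, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Singular values spaced geometrically from 1 up to condition_number
    singular_values = np.geomspace(1.0, condition_number, num=n)
    return u @ np.diag(singular_values) @ v.T


a = matrix_with_condition_number_sketch(5, 50.0, seed=0)
print(np.isclose(np.linalg.cond(a), 50.0))  # True
```

Unlike a construction of the form P Pᵀ, this sketch returns a general matrix that is not symmetric.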
running_moments
running_moments(
previous_avg: float | NDArray[float64],
previous_variance: float | NDArray[float64],
count: int,
new_value: float | NDArray[float64],
) -> Tuple[float | NDArray[float64], float | NDArray[float64]]
Uses Welford's algorithm to calculate the running average and variance of a set of numbers.
See Welford's algorithm on Wikipedia.
Warning
This is not really using Welford's correction for numerical stability for the variance. (FIXME)
Todo
This could be generalised to arbitrary moments. See this paper
PARAMETER | DESCRIPTION |
---|---|
previous_avg | Average value at previous step |
previous_variance | Variance at previous step |
count | Number of points seen so far. TYPE: int |
new_value | New value in the series of numbers |
RETURNS | DESCRIPTION |
---|---|
Tuple[float | NDArray[float64], float | NDArray[float64]] | new_average, new_variance, calculated with the new count |
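A single update step can be sketched as below. Two assumptions are made: count is the number of points seen before new_value, and the variance is the population variance (dividing by the number of points). The function name is hypothetical and the library's exact update may differ.

```python
from typing import Tuple, Union

import numpy as np

Number = Union[float, np.ndarray]


def running_moments_sketch(
    previous_avg: Number, previous_variance: Number, count: int, new_value: Number
) -> Tuple[Number, Number]:
    """Hypothetical sketch: one-pass update of the mean and population variance."""
    new_average = (new_value + count * previous_avg) / (count + 1)
    # Welford-style increment: (x - old_mean) * (x - new_mean)
    increment = (new_value - previous_avg) * (new_value - new_average)
    new_variance = previous_variance + (increment - previous_variance) / (count + 1)
    return new_average, new_variance


data = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
avg, var = 0.0, 0.0
for count, x in enumerate(data):  # count = number of points seen before x
    avg, var = running_moments_sketch(avg, var, count, x)
print(np.isclose(avg, data.mean()), np.isclose(var, data.var()))  # True True
```

The update only uses elementwise arithmetic, so it also works when the inputs are arrays of the same shape.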
top_k_value_accuracy
Computes the top-k accuracy for the estimated values by comparing indices of the highest k values.
PARAMETER | DESCRIPTION |
---|---|
y_true | Exact/true value |
y_pred | Predicted/estimated value |
k | Number of the highest values taken into account |
RETURNS | DESCRIPTION |
---|---|
float | Accuracy |
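The comparison of top-k indices can be sketched as follows. `top_k_value_accuracy_sketch` is a hypothetical name, and the default k and the handling of ties (left to `np.argsort`) are assumptions.

```python
import numpy as np


def top_k_value_accuracy_sketch(y_true: np.ndarray, y_pred: np.ndarray, k: int = 3) -> float:
    """Hypothetical sketch: fraction of the k highest-valued indices shared by both arrays."""
    top_true = np.argsort(y_true)[-k:]  # indices of the k largest true values
    top_pred = np.argsort(y_pred)[-k:]  # indices of the k largest estimated values
    return len(np.intersect1d(top_true, top_pred)) / k


y_true = np.array([0.1, 0.9, 0.5, 0.7])
y_pred = np.array([0.2, 0.8, 0.9, 0.1])
accuracy = top_k_value_accuracy_sketch(y_true, y_pred, k=2)
print(accuracy)  # 0.5
```

In this example the true top two indices are {1, 3} and the estimated top two are {1, 2}, so one of the two matches.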