pydvl.valuation.types
This module contains the types used by pydvl.valuation.
If you are interested in extending valuation methods, you might need to subclass ValueUpdate, Sample or ClasswiseSample. These are the data types used for communication between the samplers on the main process and the workers.
BaggingModel
Bases: Protocol[ArrayT, ArrayRetT]
Any model with the attributes n_estimators and max_samples is considered a bagging model.
After fitting, the model must have the estimators_ attribute.
If it defines estimators_samples_, it will be used by DataOOBValuation.
fit
Fit the model to the data.
PARAMETER | DESCRIPTION |
---|---|
x | Independent variables |
y | Dependent variable |
predict
Compute predictions for the input.
PARAMETER | DESCRIPTION |
---|---|
x | Independent variables for which to compute predictions |
RETURNS | DESCRIPTION |
---|---|
ArrayRetT | Predictions for the input |
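As a rough illustration of the structural requirements above, the following sketch defines a toy model that would qualify. `BaggingLike` is a local stand-in written for this example (it is not the library's BaggingModel class), `ToyBaggedMean` is hypothetical, and plain lists replace arrays to keep the example dependency-free.

```python
import random
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class BaggingLike(Protocol):
    """Local stand-in mirroring the members described above (an assumption)."""

    n_estimators: int
    max_samples: float

    def fit(self, x, y): ...
    def predict(self, x): ...


class ToyBaggedMean:
    """Hypothetical bagging model: each estimator is the mean of a bootstrap sample."""

    def __init__(self, n_estimators: int = 5, max_samples: float = 0.8, seed: int = 0):
        self.n_estimators = n_estimators
        self.max_samples = max_samples
        self._rng = random.Random(seed)

    def fit(self, x: List[float], y: List[float]) -> "ToyBaggedMean":
        n_draws = max(1, int(self.max_samples * len(y)))
        # Indices drawn with replacement for each estimator.
        self.estimators_samples_ = [
            [self._rng.randrange(len(y)) for _ in range(n_draws)]
            for _ in range(self.n_estimators)
        ]
        # One fitted "estimator" per bootstrap sample: here simply a mean.
        self.estimators_ = [
            sum(y[i] for i in idxs) / len(idxs) for idxs in self.estimators_samples_
        ]
        return self

    def predict(self, x: List[float]) -> List[float]:
        # Aggregate the estimators by averaging, then repeat for every input.
        avg = sum(self.estimators_) / len(self.estimators_)
        return [avg for _ in x]


model = ToyBaggedMean().fit([0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 3.0, 3.0])
print(isinstance(model, BaggingLike), len(model.estimators_))
```

Because the requirement is structural, no inheritance is needed: having the attributes and methods is enough.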
BaseModel
ClasswiseSample
dataclass
ClasswiseSample(
idx: IndexT | None,
subset: NDArray[int_],
label: int,
ooc_subset: NDArray[int_],
)
Bases: Sample
Sample class for class-wise Shapley valuation.
ooc_subset
instance-attribute
Indices of out-of-class elements, i.e., those with a label different from this sample's label.
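To make the in-class / out-of-class distinction concrete, here is a minimal sketch that splits a set of indices by label. The helper `split_by_class` is hypothetical, and it assumes that `subset` holds the in-class indices while `ooc_subset` holds the rest; plain lists are used instead of numpy arrays.

```python
from typing import List, Sequence, Tuple


def split_by_class(
    indices: Sequence[int], labels: Sequence[int], label: int
) -> Tuple[List[int], List[int]]:
    """Split indices into those with the given label and all others."""
    in_class = [i for i in indices if labels[i] == label]
    out_of_class = [i for i in indices if labels[i] != label]
    return in_class, out_of_class


labels = [0, 1, 0, 2, 1]
subset, ooc_subset = split_by_class(range(len(labels)), labels, label=1)
print(subset, ooc_subset)
```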
__post_init__
Ensure that the subset and ooc_subset are numpy arrays of integers.
Source code in src/pydvl/valuation/types.py
with_idx
Return a copy of the sample with idx changed.
Returns the original sample if idx is the same.
PARAMETER | DESCRIPTION |
---|---|
idx | New value for idx. |
RETURNS | DESCRIPTION |
---|---|
Sample | A copy of the sample with idx changed. |
with_idx_in_subset
Return a copy of the sample with idx added to the subset.
Returns the original sample if idx was already part of the subset.
RETURNS | DESCRIPTION |
---|---|
Sample | A copy of the sample with idx added to the subset. |
RAISES | DESCRIPTION |
---|---|
ValueError | If idx is None. |
Sample
dataclass
__hash__
This type must be hashable for the utility caching to work. We use hashlib.sha256, which is about 4-5x faster than hash() and returns the same value in all processes, as opposed to hash(), which is salted in each process.
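The idea of a process-independent hash can be sketched as follows. `ToySample` is a hypothetical stand-in, not the library's Sample class: it stores the subset as a tuple rather than a numpy array, and the choice of serialising with `repr` and keeping an 8-byte prefix of the digest is an assumption made for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToySample:
    idx: Optional[int]
    subset: Tuple[int, ...]

    def __hash__(self) -> int:
        # Serialise the fields deterministically, then digest them.
        payload = repr((self.idx, self.subset)).encode("utf-8")
        digest = hashlib.sha256(payload).digest()
        # An 8-byte prefix is plenty; Python reduces large ints to its
        # own hash width without any per-process salt.
        return int.from_bytes(digest[:8], "big")


a = ToySample(idx=1, subset=(0, 2, 3))
b = ToySample(idx=1, subset=(0, 2, 3))
cache = {a: 0.42}  # e.g. a cached utility value keyed by sample
print(hash(a) == hash(b), cache[b])
```

Since the digest depends only on the field values, two workers that build the same sample independently would look up the same cache entry.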
__post_init__
Ensure that the subset is a numpy array of integers.
with_idx
Return a copy of the sample with idx changed.
Returns the original sample if idx is the same.
PARAMETER | DESCRIPTION |
---|---|
idx | New value for idx. |
RETURNS | DESCRIPTION |
---|---|
Sample | A copy of the sample with idx changed. |
with_idx_in_subset
Return a copy of the sample with idx added to the subset.
Returns the original sample if idx was already part of the subset.
RETURNS | DESCRIPTION |
---|---|
Sample | A copy of the sample with idx added to the subset. |
RAISES | DESCRIPTION |
---|---|
ValueError | If idx is None. |
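The copy-or-return-self behaviour of with_idx and with_idx_in_subset can be sketched with a frozen dataclass. `MiniSample` is a hypothetical stand-in for illustration only; it uses a tuple for the subset, whereas the library's Sample works with numpy arrays.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniSample:
    idx: Optional[int]
    subset: Tuple[int, ...]

    def with_idx(self, idx: int) -> "MiniSample":
        # Avoid a needless copy when nothing changes.
        if self.idx == idx:
            return self
        return replace(self, idx=idx)

    def with_idx_in_subset(self) -> "MiniSample":
        if self.idx is None:
            raise ValueError("Cannot add idx to subset if idx is None.")
        # Return the original object if idx is already in the subset.
        if self.idx in self.subset:
            return self
        return replace(self, subset=self.subset + (self.idx,))


s = MiniSample(idx=3, subset=(0, 1))
t = s.with_idx_in_subset()
print(t.subset, t.with_idx_in_subset() is t, s.with_idx(3) is s)
```

Returning the original object when nothing changes keeps samples immutable while avoiding unnecessary allocations.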
SemivalueCoefficient
Bases: Protocol
__call__
A semi-value coefficient is a function of the number of elements in the set and of the size of the subset for which the coefficient is being computed. Because both coefficients and sampler weights can be very large or very small, we perform all computations in log-space to avoid numerical issues.
PARAMETER | DESCRIPTION |
---|---|
n | Total number of elements in the set. |
k | Size of the subset for which the coefficient is being computed. |
RETURNS | DESCRIPTION |
---|---|
float | The natural logarithm of the semi-value coefficient. |
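As an example of a callable with this shape, the sketch below computes the logarithm of the familiar Shapley weight 1 / (n · C(n−1, k)). The function names are hypothetical, and whether the library normalises its own coefficients in exactly this way is an assumption; the point is only to show why log-space helps.

```python
import math


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_shapley_coefficient(n: int, k: int) -> float:
    """log(1 / (n * C(n - 1, k))), computed without forming the huge binomial."""
    return -math.log(n) - log_binom(n - 1, k)


# Small case: C(3, 1) = 3, so the coefficient is 1 / (4 * 3) = 1 / 12.
small = math.exp(log_shapley_coefficient(4, 1))

# Large case: the linear-space value underflows, the log stays finite.
large_log = log_shapley_coefficient(10_000, 5_000)
print(small, large_log)
```

For n = 10,000 the linear-space coefficient is far below the smallest representable float, while its logarithm is an ordinary number that can be added to log-weights.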
SkorchSupervisedModel
Bases: Protocol[ArrayT]
This is the standard sklearn Protocol with the methods fit(), predict() and score(), but accepting Tensors and with any additional info required.
It is compatible with skorch.net.NeuralNet.
fit
fit(x: ArrayT, y: Tensor)
Fit the model to the data.
PARAMETER | DESCRIPTION |
---|---|
x | Independent variables |
y | Dependent variable |
predict
predict(x: ArrayT) -> NDArray
Compute predictions for the input.
PARAMETER | DESCRIPTION |
---|---|
x | Independent variables for which to compute predictions |
RETURNS | DESCRIPTION |
---|---|
NDArray | Predictions for the input |
SupervisedModel
Bases: Protocol[ArrayT, ArrayRetT]
This is the standard sklearn Protocol with the methods fit(), predict() and score().
fit
Fit the model to the data.
PARAMETER | DESCRIPTION |
---|---|
x | Independent variables |
y | Dependent variable |
predict
Compute predictions for the input.
PARAMETER | DESCRIPTION |
---|---|
x | Independent variables for which to compute predictions |
RETURNS | DESCRIPTION |
---|---|
ArrayRetT | Predictions for the input |
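A class only needs the three methods to satisfy a protocol of this shape. In the sketch below, `SupervisedLike` is a local stand-in defined for the example (not the library's SupervisedModel), `MeanRegressor` is hypothetical, and using the negative mean squared error as the score is an arbitrary choice.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class SupervisedLike(Protocol):
    """Local stand-in with the three sklearn-style methods (an assumption)."""

    def fit(self, x, y): ...
    def predict(self, x): ...
    def score(self, x, y): ...


class MeanRegressor:
    """Hypothetical model that always predicts the training mean."""

    def fit(self, x: List[List[float]], y: List[float]) -> "MeanRegressor":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x: List[List[float]]) -> List[float]:
        return [self.mean_ for _ in x]

    def score(self, x: List[List[float]], y: List[float]) -> float:
        # Negative mean squared error, so that higher is better.
        preds = self.predict(x)
        return -sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


model = MeanRegressor().fit([[0.0], [1.0]], [1.0, 3.0])
print(isinstance(model, SupervisedLike), model.predict([[5.0]]))
```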
ValueUpdate
dataclass
ValueUpdates are emitted by evaluation strategies.
Typically, a value update is the product of a marginal utility, the sampler weight and the valuation's coefficient. Instead of multiplying weights, coefficients and utilities directly, the strategy works in log-space for numerical stability using the samplers' log-weights and the valuation methods' log-coefficients.
The updates from all workers are converted back to linear space by LogResultUpdater.
ATTRIBUTE | DESCRIPTION |
---|---|
idx | Index of the sample the update corresponds to. |
log_update | Logarithm of the absolute value of the update. |
sign | Sign of the update. |
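The log-space representation described above can be sketched as follows. `ToyValueUpdate`, `make_update` and `to_linear` are hypothetical names for this example, and the handling of a zero marginal (log of -inf with sign 0) is an assumption rather than documented library behaviour.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyValueUpdate:
    idx: int
    log_update: float  # log of the absolute value of the update
    sign: int  # -1, 0 or 1


def make_update(
    idx: int, marginal: float, log_weight: float, log_coefficient: float
) -> ToyValueUpdate:
    """Combine marginal, weight and coefficient by adding logarithms."""
    if marginal == 0:
        return ToyValueUpdate(idx, -math.inf, 0)
    sign = 1 if marginal > 0 else -1
    log_update = math.log(abs(marginal)) + log_weight + log_coefficient
    return ToyValueUpdate(idx, log_update, sign)


def to_linear(update: ToyValueUpdate) -> float:
    """Convert a log-space update back to its signed linear value."""
    return update.sign * math.exp(update.log_update)


u = make_update(
    idx=7, marginal=-0.5, log_weight=math.log(4.0), log_coefficient=math.log(0.25)
)
print(u.sign, to_linear(u))
```

Storing the sign separately is what allows negative marginal utilities to be represented, since the logarithm itself is only defined for positive values.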
validate_number
validate_number(
name: str,
value: Any,
dtype: Type[T],
lower: T | None = None,
upper: T | None = None,
) -> T
Ensure that the value is of the given type and within the given bounds.
For int and float types, this function is lenient with numpy numeric types and will convert them to the appropriate Python type as long as no precision is lost.
PARAMETER | DESCRIPTION |
---|---|
name | The name of the variable to validate. |
value | The value to validate. |
dtype | The type to convert the value to. |
lower | The lower bound for the value (inclusive). |
upper | The upper bound for the value (inclusive). |
RAISES | DESCRIPTION |
---|---|
TypeError | If the value is not of the given type. |
ValueError | If the value is not within the given bounds, if there is precision loss, e.g. when forcing a float to an int, or if |
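The checks described above (type check, lossless conversion, inclusive bounds) can be approximated with the standard library alone. `validate_number_sketch` is a hypothetical re-implementation written for illustration, not the library's code; in particular, rejecting booleans and relying on `numbers.Real` instead of explicit numpy handling are assumptions.

```python
import numbers
from typing import Any, Optional, Type, TypeVar

T = TypeVar("T", int, float)


def validate_number_sketch(
    name: str,
    value: Any,
    dtype: Type[T],
    lower: Optional[T] = None,
    upper: Optional[T] = None,
) -> T:
    # Reject non-numeric input (and bool, which is technically an int).
    if isinstance(value, bool) or not isinstance(value, numbers.Real):
        raise TypeError(f"'{name}' must be a number, got {type(value).__name__}")
    try:
        converted = dtype(value)
    except (OverflowError, ValueError) as e:  # e.g. int(float("inf"))
        raise ValueError(f"'{name}' cannot be converted to {dtype.__name__}") from e
    # A lossless conversion must compare equal to the original value.
    if converted != value:
        raise ValueError(f"'{name}' would lose precision as {dtype.__name__}")
    if lower is not None and converted < lower:
        raise ValueError(f"'{name}' is {converted}, below the minimum {lower}")
    if upper is not None and converted > upper:
        raise ValueError(f"'{name}' is {converted}, above the maximum {upper}")
    return converted


n_jobs = validate_number_sketch("n_jobs", 3.0, int, lower=1)
print(n_jobs, type(n_jobs).__name__)
```

Here `3.0` is accepted as the int `3` because no information is lost, whereas `3.5` would raise a ValueError under the same call.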