pydvl.valuation.types

This module contains the types used by pydvl.valuation.

If you are interested in extending valuation methods, you might need to subclass ValueUpdate, Sample or ClasswiseSample. These are the data types used for communication between the samplers on the main process and the workers.

BaggingModel

Bases: Protocol[ArrayT, ArrayRetT]

Any model with the attributes n_estimators and max_samples is considered a bagging model. After fitting, the model must have the estimators_ attribute. If it also defines estimators_samples_, that attribute will be used by DataOOBValuation.

fit

fit(x: ArrayT, y: ArrayT | None)

Fit the model to the data

PARAMETER DESCRIPTION
x

Independent variables

TYPE: ArrayT

y

Dependent variable

TYPE: ArrayT | None

Source code in src/pydvl/valuation/types.py
def fit(self, x: ArrayT, y: ArrayT | None):
    """Fit the model to the data

    Args:
        x: Independent variables
        y: Dependent variable
    """
    pass

predict

predict(x: ArrayT) -> ArrayRetT

Compute predictions for the input

PARAMETER DESCRIPTION
x

Independent variables for which to compute predictions

TYPE: ArrayT

RETURNS DESCRIPTION
ArrayRetT

Predictions for the input

Source code in src/pydvl/valuation/types.py
def predict(self, x: ArrayT) -> ArrayRetT:
    """Compute predictions for the input

    Args:
        x: Independent variables for which to compute predictions

    Returns:
        Predictions for the input
    """
    pass
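
As a rough illustration of the criterion above, the following sketch checks an object for the attributes and methods named in the protocol. Both `looks_like_bagging_model` and `ToyBagger` are hypothetical names introduced here and are not part of pyDVL. The library relies on structural typing, not on a helper like this one.

```python
from typing import Any


def looks_like_bagging_model(model: Any, fitted: bool = False) -> bool:
    """Hypothetical helper: check the names listed in the protocol description."""
    required = ["n_estimators", "max_samples", "fit", "predict"]
    if fitted:
        # estimators_ only has to exist once fit() has been called
        required.append("estimators_")
    return all(hasattr(model, name) for name in required)


class ToyBagger:
    """Hypothetical stand-in that 'fits' by storing the data once per estimator."""

    def __init__(self, n_estimators: int = 3, max_samples: float = 1.0):
        self.n_estimators = n_estimators
        self.max_samples = max_samples

    def fit(self, x, y=None):
        self.estimators_ = [(x, y) for _ in range(self.n_estimators)]
        return self

    def predict(self, x):
        return [0 for _ in x]


model = ToyBagger()
before_fit = looks_like_bagging_model(model, fitted=True)  # estimators_ is missing
model.fit([[0.0], [1.0]], [0, 1])
after_fit = looks_like_bagging_model(model, fitted=True)   # estimators_ now exists
```

The `fitted` flag separates the two stages the description distinguishes: the constructor attributes are always required, while `estimators_` is only expected after fitting.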

BaseModel

Bases: Protocol[ArrayT]

This is the minimal model protocol, requiring only the method fit().

fit

fit(x: ArrayT, y: ArrayT | None)

Fit the model to the data

PARAMETER DESCRIPTION
x

Independent variables

TYPE: ArrayT

y

Dependent variable

TYPE: ArrayT | None

Source code in src/pydvl/valuation/types.py
def fit(self, x: ArrayT, y: ArrayT | None):
    """Fit the model to the data

    Args:
        x: Independent variables
        y: Dependent variable
    """
    pass

ClasswiseSample dataclass

ClasswiseSample(
    idx: IndexT | None,
    subset: NDArray[int_],
    label: int,
    ooc_subset: NDArray[int_],
)

Bases: Sample

Sample class for class-wise Shapley valuation.

idx instance-attribute

idx: IndexT | None

Index of current sample

label instance-attribute

label: int

Label of the current sample

ooc_subset instance-attribute

ooc_subset: NDArray[int_]

Indices of out-of-class elements, i.e., those with a label different from this sample's label

subset instance-attribute

subset: NDArray[int_]

Indices of the elements in the current sample's subset
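
To make the relationship between label and ooc_subset concrete, here is a small sketch that partitions dataset indices by label. The function `split_by_label` is a hypothetical helper and not part of pyDVL. A real sampler would presumably draw subsets of these two index sets, and how pyDVL's samplers do that is not shown on this page.

```python
from typing import List, Sequence, Tuple


def split_by_label(
    labels: Sequence[int], label: int
) -> Tuple[List[int], List[int]]:
    """Return (in-class indices, out-of-class indices) for the given label."""
    in_class = [i for i, y in enumerate(labels) if y == label]
    out_of_class = [i for i, y in enumerate(labels) if y != label]
    return in_class, out_of_class


labels = [0, 1, 0, 2, 1]
in_class, out_of_class = split_by_label(labels, label=1)
print(in_class)      # [1, 4]
print(out_of_class)  # [0, 2, 3]
```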

__post_init__

__post_init__()

Ensure that the subset and ooc_subset are numpy arrays of integers.

Source code in src/pydvl/valuation/types.py
def __post_init__(self):
    """Ensure that the subset and ooc_subset are numpy arrays of integers."""
    super().__post_init__()
    try:
        self.__dict__["ooc_subset"] = to_numpy(self.ooc_subset)
    except Exception:
        raise TypeError(
            f"ooc_subset must be a numpy array, got {type(self.ooc_subset).__name__}"
        )
    if self.ooc_subset.size == 0:
        self.__dict__["ooc_subset"] = self.ooc_subset.astype(int)
    if not np.issubdtype(self.ooc_subset.dtype, np.integer):
        raise TypeError(
            f"ooc_subset must be a numpy array of integers, got {self.ooc_subset.dtype}"
        )

with_idx

with_idx(idx: IndexT) -> Self

Return a copy of sample with idx changed.

Returns the original sample if idx is the same.

PARAMETER DESCRIPTION
idx

New value for idx.

TYPE: IndexT

RETURNS DESCRIPTION
Sample

A copy of the sample with idx changed.

TYPE: Self

Source code in src/pydvl/valuation/types.py
def with_idx(self, idx: IndexT) -> Self:
    """Return a copy of sample with idx changed.

    Returns the original sample if idx is the same.

    Args:
        idx: New value for idx.

    Returns:
        Sample: A copy of the sample with idx changed.
    """
    if self.idx == idx:
        return self

    return replace(self, idx=idx)

with_idx_in_subset

with_idx_in_subset() -> Self

Return a copy of sample with idx added to the subset.

Returns the original sample if idx was already part of the subset.

RETURNS DESCRIPTION
Sample

A copy of the sample with idx added to the subset.

TYPE: Self

RAISES DESCRIPTION
ValueError

If idx is None.

Source code in src/pydvl/valuation/types.py
def with_idx_in_subset(self) -> Self:
    """Return a copy of sample with idx added to the subset.

    Returns the original sample if idx was already part of the subset.

    Returns:
        Sample: A copy of the sample with idx added to the subset.

    Raises:
        ValueError: If idx is None.
    """
    if self.idx in self.subset:
        return self

    if self.idx is None:
        raise ValueError("Cannot add idx to subset if idx is None.")

    new_subset = array_concatenate([self.subset, np.array([self.idx])])
    return replace(self, subset=new_subset)

with_subset

with_subset(subset: Array[IndexT]) -> Self

Return a copy of sample with the subset changed.

PARAMETER DESCRIPTION
subset

New value for subset.

TYPE: Array[IndexT]

RETURNS DESCRIPTION
Self

A copy of the sample with subset changed.

Source code in src/pydvl/valuation/types.py
def with_subset(self, subset: Array[IndexT]) -> Self:
    """Return a copy of sample with the subset changed.

    Args:
        subset: New value for subset.

    Returns:
        A copy of the sample with subset changed.
    """
    return replace(self, subset=to_numpy(subset))

Sample dataclass

Sample(idx: IndexT | None, subset: NDArray[int_])

idx instance-attribute

idx: IndexT | None

Index of current sample

subset instance-attribute

subset: NDArray[int_]

Indices of the elements in the current sample's subset

__hash__

__hash__()

This type must be hashable for the utility caching to work. We use hashlib.sha256, which is about 4-5x faster than hash() and returns the same value in all processes, whereas hash() is salted in each process.

Source code in src/pydvl/valuation/types.py
def __hash__(self):
    """This type must be hashable for the utility caching to work.
    We use hashlib.sha256 which is about 4-5x faster than hash(), and returns the
    same value in all processes, as opposed to hash() which is salted in each
    process
    """
    sha256_hash = hashlib.sha256(self.subset.tobytes()).hexdigest()
    return int(sha256_hash, base=16)
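
The digest-based approach can be tried out without numpy. The sketch below is an illustration only: `stable_hash` is a hypothetical name, and a stdlib `array` of 64-bit integers stands in for the numpy array whose `tobytes()` output the listing hashes.

```python
import hashlib
from array import array
from typing import Iterable


def stable_hash(indices: Iterable[int]) -> int:
    """Digest-based hash of a sequence of integer indices (illustrative only)."""
    # array("q") stores signed 64-bit integers, standing in for an integer numpy array
    raw = array("q", indices).tobytes()
    return int(hashlib.sha256(raw).hexdigest(), base=16)


a = stable_hash([1, 2, 3])
b = stable_hash([1, 2, 3])
c = stable_hash([3, 2, 1])
```

Here `a` and `b` are equal in every process, while `c` differs because the bytes depend on the order of the indices. Note also that the listing hashes only subset, so two samples that differ only in idx produce the same hash.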

__post_init__

__post_init__()

Ensure that the subset is a numpy array of integers.

Source code in src/pydvl/valuation/types.py
def __post_init__(self):
    """Ensure that the subset is a numpy array of integers."""
    try:
        self.__dict__["subset"] = to_numpy(self.subset)
    except Exception:
        raise TypeError(
            f"subset must be a numpy array, got {type(self.subset).__name__}"
        )
    if self.subset.size == 0:
        self.__dict__["subset"] = self.subset.astype(int)
    if not np.issubdtype(self.subset.dtype, np.integer):
        raise TypeError(
            f"subset must be a numpy array of integers, got {self.subset.dtype}"
        )

with_idx

with_idx(idx: IndexT) -> Self

Return a copy of sample with idx changed.

Returns the original sample if idx is the same.

PARAMETER DESCRIPTION
idx

New value for idx.

TYPE: IndexT

RETURNS DESCRIPTION
Sample

A copy of the sample with idx changed.

TYPE: Self

Source code in src/pydvl/valuation/types.py
def with_idx(self, idx: IndexT) -> Self:
    """Return a copy of sample with idx changed.

    Returns the original sample if idx is the same.

    Args:
        idx: New value for idx.

    Returns:
        Sample: A copy of the sample with idx changed.
    """
    if self.idx == idx:
        return self

    return replace(self, idx=idx)

with_idx_in_subset

with_idx_in_subset() -> Self

Return a copy of sample with idx added to the subset.

Returns the original sample if idx was already part of the subset.

RETURNS DESCRIPTION
Sample

A copy of the sample with idx added to the subset.

TYPE: Self

RAISES DESCRIPTION
ValueError

If idx is None.

Source code in src/pydvl/valuation/types.py
def with_idx_in_subset(self) -> Self:
    """Return a copy of sample with idx added to the subset.

    Returns the original sample if idx was already part of the subset.

    Returns:
        Sample: A copy of the sample with idx added to the subset.

    Raises:
        ValueError: If idx is None.
    """
    if self.idx in self.subset:
        return self

    if self.idx is None:
        raise ValueError("Cannot add idx to subset if idx is None.")

    new_subset = array_concatenate([self.subset, np.array([self.idx])])
    return replace(self, subset=new_subset)

with_subset

with_subset(subset: Array[IndexT]) -> Self

Return a copy of sample with the subset changed.

PARAMETER DESCRIPTION
subset

New value for subset.

TYPE: Array[IndexT]

RETURNS DESCRIPTION
Self

A copy of the sample with subset changed.

Source code in src/pydvl/valuation/types.py
def with_subset(self, subset: Array[IndexT]) -> Self:
    """Return a copy of sample with the subset changed.

    Args:
        subset: New value for subset.

    Returns:
        A copy of the sample with subset changed.
    """
    return replace(self, subset=to_numpy(subset))
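
The three with_* methods follow a copy-on-change pattern built on dataclasses.replace. The sketch below reproduces that pattern with a hypothetical `MiniSample` class that stores the subset as a tuple instead of a numpy array. It assumes an immutable (frozen) dataclass and, unlike the listing, checks for a missing idx before testing membership.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniSample:
    """Hypothetical stand-in for Sample that stores the subset as a tuple."""

    idx: Optional[int]
    subset: Tuple[int, ...]

    def with_idx(self, idx: int) -> "MiniSample":
        # Reuse the same object when nothing changes
        return self if self.idx == idx else replace(self, idx=idx)

    def with_idx_in_subset(self) -> "MiniSample":
        if self.idx is None:
            raise ValueError("Cannot add idx to subset if idx is None.")
        if self.idx in self.subset:
            return self
        return replace(self, subset=self.subset + (self.idx,))

    def with_subset(self, subset: Tuple[int, ...]) -> "MiniSample":
        return replace(self, subset=tuple(subset))


s = MiniSample(idx=None, subset=(0, 2))
t = s.with_idx(5)            # new object; s is left unchanged
u = t.with_idx_in_subset()   # subset becomes (0, 2, 5)
v = u.with_subset((7,))      # idx is kept, subset is swapped out
```

Returning `self` when nothing changes avoids allocating a new object for a no-op, which is safe only because the instances are never mutated in place.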

SemivalueCoefficient

Bases: Protocol

__call__

__call__(n: int, k: int) -> float

A semi-value coefficient is a function of the number of elements in the set, and the size of the subset for which the coefficient is being computed. Because both coefficients and sampler weights can be very large or very small, we perform all computations in log-space to avoid numerical issues.

PARAMETER DESCRIPTION
n

Total number of elements in the set.

TYPE: int

k

Size of the subset for which the coefficient is being computed

TYPE: int

RETURNS DESCRIPTION
float

The natural logarithm of the semi-value coefficient.

Source code in src/pydvl/valuation/types.py
def __call__(self, n: int, k: int) -> float:
    """A semi-value coefficient is a function of the number of elements in the set,
    and the size of the subset for which the coefficient is being computed.
    Because both coefficients and sampler weights can be very large or very small,
    we perform all computations in log-space to avoid numerical issues.

    Args:
        n: Total number of elements in the set.
        k: Size of the subset for which the coefficient is being computed

    Returns:
        The natural logarithm of the semi-value coefficient.
    """
    ...
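
To show why the protocol returns logarithms, the sketch below implements one callable with the (n, k) -> float signature. It uses the weight 1 / (n * C(n-1, k)), familiar from Shapley values, purely as an illustration. Whether this matches the normalization of any coefficient in pyDVL is an assumption, and the function names are hypothetical.

```python
import math


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def shapley_log_coefficient(n: int, k: int) -> float:
    """Log of 1 / (n * C(n-1, k)); an illustrative coefficient, see the text above."""
    return -math.log(n) - log_binom(n - 1, k)


# In linear space this coefficient underflows: C(1999, 1000) far exceeds the
# largest double (about 1.8e308), so its reciprocal rounds to 0.0.
log_c = shapley_log_coefficient(2000, 1000)
linear_c = math.exp(log_c)

# Sanity check for a small n: summed over all subset sizes, weighted by the
# number of subsets of each size, the coefficients add up to 1.
n = 10
total = sum(
    math.exp(log_binom(n - 1, k) + shapley_log_coefficient(n, k)) for k in range(n)
)
```

The logarithm `log_c` stays a perfectly ordinary finite float even though `linear_c` has collapsed to zero, which is the numerical issue the docstring refers to.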

SkorchSupervisedModel

Bases: Protocol[ArrayT]

This is the standard sklearn Protocol with the methods fit(), predict() and score(), but accepting Tensors and with any additional info required. It is compatible with skorch.net.NeuralNet.

fit

fit(x: ArrayT, y: Tensor)

Fit the model to the data

PARAMETER DESCRIPTION
x

Independent variables

TYPE: ArrayT

y

Dependent variable

TYPE: Tensor

Source code in src/pydvl/valuation/types.py
def fit(self, x: ArrayT, y: Tensor):
    """Fit the model to the data

    Args:
        x: Independent variables
        y: Dependent variable
    """
    ...

predict

predict(x: ArrayT) -> NDArray

Compute predictions for the input

PARAMETER DESCRIPTION
x

Independent variables for which to compute predictions

TYPE: ArrayT

RETURNS DESCRIPTION
NDArray

Predictions for the input

Source code in src/pydvl/valuation/types.py
def predict(self, x: ArrayT) -> NDArray:
    """Compute predictions for the input

    Args:
        x: Independent variables for which to compute predictions

    Returns:
        Predictions for the input
    """
    ...

score

score(x: ArrayT, y: NDArray) -> float

Compute the score of the model given test data

PARAMETER DESCRIPTION
x

Independent variables

TYPE: ArrayT

y

Dependent variable

TYPE: NDArray

RETURNS DESCRIPTION
float

The score of the model on (x, y)

Source code in src/pydvl/valuation/types.py
def score(self, x: ArrayT, y: NDArray) -> float:
    """Compute the score of the model given test data

    Args:
        x: Independent variables
        y: Dependent variable

    Returns:
        The score of the model on `(x, y)`
    """
    ...

SupervisedModel

Bases: Protocol[ArrayT, ArrayRetT]

This is the standard sklearn Protocol with the methods fit(), predict() and score().

fit

fit(x: ArrayT, y: ArrayT)

Fit the model to the data

PARAMETER DESCRIPTION
x

Independent variables

TYPE: ArrayT

y

Dependent variable

TYPE: ArrayT

Source code in src/pydvl/valuation/types.py
def fit(self, x: ArrayT, y: ArrayT):
    """Fit the model to the data

    Args:
        x: Independent variables
        y: Dependent variable
    """
    pass

predict

predict(x: ArrayT) -> ArrayRetT

Compute predictions for the input

PARAMETER DESCRIPTION
x

Independent variables for which to compute predictions

TYPE: ArrayT

RETURNS DESCRIPTION
ArrayRetT

Predictions for the input

Source code in src/pydvl/valuation/types.py
def predict(self, x: ArrayT) -> ArrayRetT:
    """Compute predictions for the input

    Args:
        x: Independent variables for which to compute predictions

    Returns:
        Predictions for the input
    """
    pass

score

score(x: ArrayT, y: ArrayT) -> float

Compute the score of the model given test data

PARAMETER DESCRIPTION
x

Independent variables

TYPE: ArrayT

y

Dependent variable

TYPE: ArrayT

RETURNS DESCRIPTION
float

The score of the model on (x, y)

Source code in src/pydvl/valuation/types.py
def score(self, x: ArrayT, y: ArrayT) -> float:
    """Compute the score of the model given test data

    Args:
        x: Independent variables
        y: Dependent variable

    Returns:
        The score of the model on `(x, y)`
    """
    pass
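
Because this is a Protocol, a model does not need to inherit from it; it only needs matching methods. The sketch below demonstrates the idea with a locally defined, runtime-checkable protocol. `SupervisedLike` and `MeanRegressor` are hypothetical names for this example, and the sketch does not claim that pyDVL's own protocol supports isinstance checks.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class SupervisedLike(Protocol):
    """Hypothetical local protocol mirroring fit(), predict() and score()."""

    def fit(self, x, y): ...

    def predict(self, x): ...

    def score(self, x, y) -> float: ...


class MeanRegressor:
    """Predicts the training mean; does not inherit from any protocol class."""

    def fit(self, x: List[List[float]], y: List[float]):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x: List[List[float]]) -> List[float]:
        return [self.mean_ for _ in x]

    def score(self, x: List[List[float]], y: List[float]) -> float:
        # Negative mean squared error, so that higher is better
        preds = self.predict(x)
        return -sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


model = MeanRegressor().fit([[0.0], [1.0]], [1.0, 3.0])
conforms = isinstance(model, SupervisedLike)  # checks method presence only
```

A runtime isinstance check against a protocol only verifies that the methods exist, not that their signatures or return types match; static type checkers perform the stricter comparison.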

ValueUpdate dataclass

ValueUpdate(idx: IndexT | None, log_update: float, sign: int)

ValueUpdates are emitted by evaluation strategies.

Typically, a value update is the product of a marginal utility, the sampler weight and the valuation's coefficient. Instead of multiplying weights, coefficients and utilities directly, the strategy works in log-space for numerical stability using the samplers' log-weights and the valuation methods' log-coefficients.

The updates from all workers are converted back to linear space by LogResultUpdater.

ATTRIBUTE DESCRIPTION
idx

Index of the sample the update corresponds to.

TYPE: IndexT | None

log_update

Logarithm of the absolute value of the update.

TYPE: float

sign

Sign of the update.

TYPE: int

Source code in src/pydvl/valuation/types.py
def __init__(self, idx: IndexT | None, log_update: float, sign: int):
    object.__setattr__(self, "idx", idx)
    object.__setattr__(self, "log_update", log_update)
    object.__setattr__(self, "sign", sign)
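
The log_update and sign fields together encode a signed number whose magnitude is kept as a logarithm. The sketch below shows one way such an encoding and its inverse can work, taking the description of the update as a product at face value. The helper names are hypothetical, and the exact way the library combines log-weights and log-coefficients is not shown on this page.

```python
import math
from typing import Tuple


def to_log_update(
    marginal: float, log_weight: float, log_coefficient: float
) -> Tuple[float, int]:
    """Encode marginal * weight * coefficient as (log of absolute value, sign)."""
    if marginal == 0:
        # log(0) is undefined, so a zero update is represented as (-inf, 0)
        return -math.inf, 0
    sign = 1 if marginal > 0 else -1
    return math.log(abs(marginal)) + log_weight + log_coefficient, sign


def to_linear(log_update: float, sign: int) -> float:
    """Decode back to linear space; math.exp(-inf) evaluates to 0.0."""
    return sign * math.exp(log_update)


log_update, sign = to_log_update(-0.5, math.log(2.0), math.log(0.25))
value = to_linear(log_update, sign)  # approximately -0.5 * 2.0 * 0.25
```

Keeping the sign separate is what makes the logarithm usable at all: marginal utilities can be negative, and only their absolute value has a real logarithm.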

validate_number

validate_number(
    name: str,
    value: Any,
    dtype: Type[T],
    lower: T | None = None,
    upper: T | None = None,
) -> T

Ensure that the value is of the given type and within the given bounds.

For int and float types, this function is lenient with numpy numeric types and will convert them to the appropriate Python type as long as no precision is lost.

PARAMETER DESCRIPTION
name

The name of the variable to validate.

TYPE: str

value

The value to validate.

TYPE: Any

dtype

The type to convert the value to.

TYPE: Type[T]

lower

The lower bound for the value (inclusive).

TYPE: T | None DEFAULT: None

upper

The upper bound for the value (inclusive).

TYPE: T | None DEFAULT: None

RAISES DESCRIPTION
TypeError

If the value is not of the given type.

ValueError

If the value is not within the given bounds, if there is precision loss, e.g. when forcing a float to an int, or if dtype is not a valid scalar type.

Source code in src/pydvl/valuation/types.py
def validate_number(
    name: str,
    value: Any,
    dtype: Type[T],
    lower: T | None = None,
    upper: T | None = None,
) -> T:
    """Ensure that the value is of the given type and within the given bounds.

    For int and float types, this function is lenient with numpy numeric types and
    will convert them to the appropriate Python type as long as no precision is lost.

    Args:
        name: The name of the variable to validate.
        value: The value to validate.
        dtype: The type to convert the value to.
        lower: The lower bound for the value (inclusive).
        upper: The upper bound for the value (inclusive).

    Raises:
        TypeError: If the value is not of the given type.
        ValueError: If the value is not within the given bounds, if there is precision
            loss, e.g. when forcing a float to an int, or if `dtype` is not a valid
            scalar type.
    """
    if not isinstance(value, (int, float, np.number)):
        raise TypeError(f"'{name}' is not a number, it is {type(value).__name__}")
    if not issubclass(dtype, (np.number, int, float)):
        raise ValueError(f"type '{dtype}' is not a valid scalar type")

    converted = dtype(value)
    if not np.isnan(converted) and not np.isclose(converted, value, rtol=0, atol=0):
        raise ValueError(
            f"'{name}' cannot be converted to {dtype.__name__} without precision loss"
        )
    value = cast(T, converted)

    if lower is not None and value < lower:  # type: ignore
        raise ValueError(f"'{name}' is {value}, but it should be >= {lower}")
    if upper is not None and value > upper:  # type: ignore
        raise ValueError(f"'{name}' is {value}, but it should be <= {upper}")
    return value
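
The convert-then-compare rule and the inclusive bounds can be illustrated with a reduced stand-in limited to Python's built-in int and float. `check_number` and the parameter names `n_jobs` and `fraction` are hypothetical. Unlike the listing, this sketch omits the type checks and ignores numpy types and NaN.

```python
from typing import Optional, Type, Union

Number = Union[int, float]


def check_number(
    name: str,
    value: Number,
    dtype: Type,
    lower: Optional[Number] = None,
    upper: Optional[Number] = None,
) -> Number:
    """Simplified stand-in: convert, reject lossy conversions, enforce bounds."""
    converted = dtype(value)
    # A lossless conversion compares equal to the original value
    if converted != value:
        raise ValueError(
            f"'{name}' cannot be converted to {dtype.__name__} without precision loss"
        )
    if lower is not None and converted < lower:
        raise ValueError(f"'{name}' is {converted}, but it should be >= {lower}")
    if upper is not None and converted > upper:
        raise ValueError(f"'{name}' is {converted}, but it should be <= {upper}")
    return converted


n_jobs = check_number("n_jobs", 4.0, int, lower=1)      # 4.0 converts cleanly to 4
fraction = check_number("fraction", 1, float, upper=1)  # bounds are inclusive
```

A call such as `check_number("n_jobs", 2.5, int)` raises ValueError in this sketch, because `int(2.5)` is 2 and no longer equals the input.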