pydvl.valuation.types ¶

This module contains the types used throughout pydvl.valuation.

If you are interested in extending valuation methods, you might need to subclass ValueUpdate, Sample or ClasswiseSample. These are the data types used for communication between the samplers on the main process and the workers.

BaggingModel ¶

Bases: Protocol[ArrayT, ArrayRetT]

Any model with the attributes n_estimators and max_samples is considered a bagging model. After fitting, the model must have the estimators_ attribute. If it also defines estimators_samples_, DataOOBValuation will use it.

fit ¶

fit(x: ArrayT, y: ArrayT | None)

Fit the model to the data

PARAMETER	DESCRIPTION
`x`	Independent variables TYPE: `ArrayT`
`y`	Dependent variable TYPE: `ArrayT \| None`

Source code in src/pydvl/valuation/types.py

def fit(self, x: ArrayT, y: ArrayT | None):
    """Fit the model to the data

    Args:
        x: Independent variables
        y: Dependent variable
    """
    pass

predict ¶

predict(x: ArrayT) -> ArrayRetT

Compute predictions for the input

PARAMETER	DESCRIPTION
`x`	Independent variables for which to compute predictions TYPE: `ArrayT`

RETURNS	DESCRIPTION
`ArrayRetT`	Predictions for the input

Source code in src/pydvl/valuation/types.py

def predict(self, x: ArrayT) -> ArrayRetT:
    """Compute predictions for the input

    Args:
        x: Independent variables for which to compute predictions

    Returns:
        Predictions for the input
    """
    pass

BaseModel ¶

Bases: Protocol[ArrayT]

This is the minimal model protocol: it only requires the method fit().
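
Because the model types here are protocols, a class can satisfy them by shape alone, without inheriting from anything. The following is a minimal sketch of that idea using only the standard library. `FitOnly` and `MeanModel` are hypothetical names made up for this illustration; `FitOnly` is a re-declaration that mimics a fit-only protocol and is not imported from pydvl.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class FitOnly(Protocol):
    """Hypothetical stand-in for a minimal model protocol: only fit() is required."""

    def fit(self, x: Any, y: Optional[Any]) -> Any: ...


class MeanModel:
    """Toy model that never inherits from FitOnly but still matches it."""

    def __init__(self) -> None:
        self.mean_: Optional[float] = None

    def fit(self, x: list, y: Optional[list] = None) -> "MeanModel":
        # Store the mean of the targets, or of the inputs when y is None.
        data = y if y is not None else x
        self.mean_ = sum(data) / len(data)
        return self


model = MeanModel().fit([1.0, 2.0, 3.0], None)

# runtime_checkable protocols only check that the method names exist.
assert isinstance(model, FitOnly)
assert not isinstance(object(), FitOnly)
```

Static type checkers verify the full signature, whereas the runtime `isinstance` check above only looks for an attribute named `fit`.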

fit ¶

fit(x: ArrayT, y: ArrayT | None)

Fit the model to the data

PARAMETER	DESCRIPTION
`x`	Independent variables TYPE: `ArrayT`
`y`	Dependent variable TYPE: `ArrayT \| None`

Source code in src/pydvl/valuation/types.py

def fit(self, x: ArrayT, y: ArrayT | None):
    """Fit the model to the data

    Args:
        x: Independent variables
        y: Dependent variable
    """
    pass

ClasswiseSample `dataclass` ¶

ClasswiseSample(
    idx: IndexT | None,
    subset: NDArray[int_],
    label: int,
    ooc_subset: NDArray[int_],
)

Bases: Sample

Sample class for class-wise Shapley valuation.

idx `instance-attribute` ¶

idx: IndexT | None

Index of current sample

label `instance-attribute` ¶

label: int

Label of the current sample

ooc_subset `instance-attribute` ¶

ooc_subset: NDArray[int_]

Indices of out-of-class elements, i.e., those with a label different from this sample's label

subset `instance-attribute` ¶

subset: NDArray[int_]

Indices of the elements in the current sample's subset

__post_init__ ¶

__post_init__()

Ensure that the subset and ooc_subset are numpy arrays of integers.

Source code in src/pydvl/valuation/types.py

def __post_init__(self):
    """Ensure that the subset and ooc_subset are numpy arrays of integers."""
    super().__post_init__()
    try:
        self.__dict__["ooc_subset"] = to_numpy(self.ooc_subset)
    except Exception:
        raise TypeError(
            f"ooc_subset must be a numpy array, got {type(self.ooc_subset).__name__}"
        )
    if self.ooc_subset.size == 0:
        self.__dict__["ooc_subset"] = self.ooc_subset.astype(int)
    if not np.issubdtype(self.ooc_subset.dtype, np.integer):
        raise TypeError(
            f"ooc_subset must be a numpy array of integers, got {self.ooc_subset.dtype}"
        )

with_idx ¶

with_idx(idx: IndexT) -> Self

Return a copy of sample with idx changed.

Returns the original sample if idx is the same.

PARAMETER	DESCRIPTION
`idx`	New value for idx. TYPE: `IndexT`

RETURNS	DESCRIPTION
`Self`	A copy of the sample with idx changed.

Source code in src/pydvl/valuation/types.py

def with_idx(self, idx: IndexT) -> Self:
    """Return a copy of sample with idx changed.

    Returns the original sample if idx is the same.

    Args:
        idx: New value for idx.

    Returns:
        Sample: A copy of the sample with idx changed.
    """
    if self.idx == idx:
        return self

    return replace(self, idx=idx)

with_idx_in_subset ¶

with_idx_in_subset() -> Self

Return a copy of sample with idx added to the subset.

Returns the original sample if idx was already part of the subset.

RETURNS	DESCRIPTION
`Self`	A copy of the sample with idx added to the subset.

RAISES	DESCRIPTION
`ValueError`	If idx is None.

Source code in src/pydvl/valuation/types.py

def with_idx_in_subset(self) -> Self:
    """Return a copy of sample with idx added to the subset.

    Returns the original sample if idx was already part of the subset.

    Returns:
        Sample: A copy of the sample with idx added to the subset.

    Raises:
        ValueError: If idx is None.
    """
    if self.idx in self.subset:
        return self

    if self.idx is None:
        raise ValueError("Cannot add idx to subset if idx is None.")

    new_subset = array_concatenate([self.subset, np.array([self.idx])])
    return replace(self, subset=new_subset)

with_subset ¶

with_subset(subset: Array[IndexT]) -> Self

Return a copy of sample with the subset changed.

PARAMETER	DESCRIPTION
`subset`	New value for subset. TYPE: `Array[IndexT]`

RETURNS	DESCRIPTION
`Self`	A copy of the sample with subset changed.

Source code in src/pydvl/valuation/types.py

def with_subset(self, subset: Array[IndexT]) -> Self:
    """Return a copy of sample with the subset changed.

    Args:
        subset: New value for subset.

    Returns:
        A copy of the sample with subset changed.
    """
    return replace(self, subset=to_numpy(subset))

Sample `dataclass` ¶

Sample(idx: IndexT | None, subset: NDArray[int_])

idx `instance-attribute` ¶

idx: IndexT | None

Index of current sample

subset `instance-attribute` ¶

subset: NDArray[int_]

Indices of the elements in the current sample's subset

__hash__ ¶

__hash__()

This type must be hashable for the utility caching to work. We use hashlib.sha256, which is about 4-5x faster than hash() and returns the same value in all processes, unlike hash(), which is salted in each process.

Source code in src/pydvl/valuation/types.py

def __hash__(self):
    """This type must be hashable for the utility caching to work.
    We use hashlib.sha256 which is about 4-5x faster than hash(), and returns the
    same value in all processes, as opposed to hash() which is salted in each
    process
    """
    sha256_hash = hashlib.sha256(self.subset.tobytes()).hexdigest()
    return int(sha256_hash, base=16)
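
The listing above hashes only the bytes of `subset`, so `idx` does not contribute to the result. As a rough illustration of why a digest-based hash is reproducible across processes, here is a standard-library sketch. `stable_subset_hash` is a hypothetical helper, not part of pydvl, and it packs the indices with `struct` instead of calling `tobytes()` on a numpy array.

```python
import hashlib
import struct
from typing import Sequence


def stable_subset_hash(indices: Sequence[int]) -> int:
    """Hash a sequence of indices with SHA-256 (hypothetical helper)."""
    # Pack as fixed-width little-endian signed 64-bit integers so that the
    # byte string does not depend on the platform.
    payload = struct.pack(f"<{len(indices)}q", *indices)
    return int(hashlib.sha256(payload).hexdigest(), base=16)


# The digest depends only on the bytes, so equal contents give equal hashes,
# in this process and in any other one.
same = stable_subset_hash([1, 2, 3]) == stable_subset_hash((1, 2, 3))
# The order of the indices matters, because the bytes differ.
different = stable_subset_hash([1, 2, 3]) != stable_subset_hash([3, 2, 1])
```

Note that a digest over raw bytes is order-sensitive: two subsets with the same elements in a different order produce different hashes in this sketch.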

__post_init__ ¶

__post_init__()

Ensure that the subset is a numpy array of integers.

Source code in src/pydvl/valuation/types.py

def __post_init__(self):
    """Ensure that the subset is a numpy array of integers."""
    try:
        self.__dict__["subset"] = to_numpy(self.subset)
    except Exception:
        raise TypeError(
            f"subset must be a numpy array, got {type(self.subset).__name__}"
        )
    if self.subset.size == 0:
        self.__dict__["subset"] = self.subset.astype(int)
    if not np.issubdtype(self.subset.dtype, np.integer):
        raise TypeError(
            f"subset must be a numpy array of integers, got {self.subset.dtype}"
        )

with_idx ¶

with_idx(idx: IndexT) -> Self

Return a copy of sample with idx changed.

Returns the original sample if idx is the same.

PARAMETER	DESCRIPTION
`idx`	New value for idx. TYPE: `IndexT`

RETURNS	DESCRIPTION
`Self`	A copy of the sample with idx changed.

Source code in src/pydvl/valuation/types.py

def with_idx(self, idx: IndexT) -> Self:
    """Return a copy of sample with idx changed.

    Returns the original sample if idx is the same.

    Args:
        idx: New value for idx.

    Returns:
        Sample: A copy of the sample with idx changed.
    """
    if self.idx == idx:
        return self

    return replace(self, idx=idx)

with_idx_in_subset ¶

with_idx_in_subset() -> Self

Return a copy of sample with idx added to the subset.

Returns the original sample if idx was already part of the subset.

RETURNS	DESCRIPTION
`Self`	A copy of the sample with idx added to the subset.

RAISES	DESCRIPTION
`ValueError`	If idx is None.

Source code in src/pydvl/valuation/types.py

def with_idx_in_subset(self) -> Self:
    """Return a copy of sample with idx added to the subset.

    Returns the original sample if idx was already part of the subset.

    Returns:
        Sample: A copy of the sample with idx added to the subset.

    Raises:
        ValueError: If idx is None.
    """
    if self.idx in self.subset:
        return self

    if self.idx is None:
        raise ValueError("Cannot add idx to subset if idx is None.")

    new_subset = array_concatenate([self.subset, np.array([self.idx])])
    return replace(self, subset=new_subset)

with_subset ¶

with_subset(subset: Array[IndexT]) -> Self

Return a copy of sample with the subset changed.

PARAMETER	DESCRIPTION
`subset`	New value for subset. TYPE: `Array[IndexT]`

RETURNS	DESCRIPTION
`Self`	A copy of the sample with subset changed.

Source code in src/pydvl/valuation/types.py

def with_subset(self, subset: Array[IndexT]) -> Self:
    """Return a copy of sample with the subset changed.

    Args:
        subset: New value for subset.

    Returns:
        A copy of the sample with subset changed.
    """
    return replace(self, subset=to_numpy(subset))
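
The `with_*` methods follow a copy-on-change pattern built on `dataclasses.replace`: they return the same object when nothing changes and a new one otherwise. The sketch below shows the pattern with a hypothetical `MiniSample` class that stores the subset as a tuple instead of a numpy array. It is a simplified stand-in for illustration, not the pydvl class, and it checks for `None` before testing membership.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniSample:
    """Hypothetical stand-in for Sample; uses a tuple instead of a numpy array."""

    idx: Optional[int]
    subset: Tuple[int, ...]

    def with_idx(self, idx: int) -> "MiniSample":
        # Reuse the same object when the index is unchanged.
        return self if self.idx == idx else replace(self, idx=idx)

    def with_idx_in_subset(self) -> "MiniSample":
        if self.idx is None:
            raise ValueError("Cannot add idx to subset if idx is None.")
        if self.idx in self.subset:
            return self
        # Build a new instance; the original is frozen and stays untouched.
        return replace(self, subset=self.subset + (self.idx,))


s = MiniSample(idx=3, subset=(0, 1))
t = s.with_idx_in_subset()  # new object whose subset also contains 3
u = t.with_idx_in_subset()  # idx already present, so t itself is returned
```

Returning the original object when nothing changes avoids needless allocations in tight sampling loops, and freezing the dataclass makes sharing it safe.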

SemivalueCoefficient ¶

Bases: Protocol

__call__ ¶

__call__(n: int, k: int) -> float

A semi-value coefficient is a function of the number of elements in the set, and the size of the subset for which the coefficient is being computed. Because both coefficients and sampler weights can be very large or very small, we perform all computations in log-space to avoid numerical issues.

PARAMETER	DESCRIPTION
`n`	Total number of elements in the set. TYPE: `int`
`k`	Size of the subset for which the coefficient is being computed TYPE: `int`

RETURNS	DESCRIPTION
`float`	The natural logarithm of the semi-value coefficient.

Source code in src/pydvl/valuation/types.py

def __call__(self, n: int, k: int) -> float:
    """A semi-value coefficient is a function of the number of elements in the set,
    and the size of the subset for which the coefficient is being computed.
    Because both coefficients and sampler weights can be very large or very small,
    we perform all computations in log-space to avoid numerical issues.

    Args:
        n: Total number of elements in the set.
        k: Size of the subset for which the coefficient is being computed

    Returns:
        The natural logarithm of the semi-value coefficient.
    """
    ...
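
To make the log-space convention concrete, here is a sketch of a callable with the `(n, k) -> float` shape described above. It assumes the Shapley coefficient 1 / (n · C(n-1, k)) as the example; the function names are hypothetical and not taken from pydvl.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_shapley_coefficient(n: int, k: int) -> float:
    """Natural log of 1 / (n * C(n-1, k)), the Shapley weight for size-k subsets."""
    return -math.log(n) - log_comb(n - 1, k)


# Small case: n = 4, k = 1 gives 1 / (4 * C(3, 1)) = 1 / 12.
small = math.exp(log_shapley_coefficient(4, 1))

# Large case: the log stays finite even though C(9999, 5000) is far beyond
# the range of a double-precision float.
large_log = log_shapley_coefficient(10_000, 5_000)
```

Working with `lgamma` keeps every intermediate quantity within floating-point range, which is the reason for returning the logarithm instead of the coefficient itself.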

SkorchSupervisedModel ¶

Bases: Protocol[ArrayT]

This is the standard sklearn Protocol with the methods fit(), predict() and score(), but accepting Tensors and with any additional info required. It is compatible with skorch.net.NeuralNet.

fit ¶

fit(x: ArrayT, y: Tensor)

Fit the model to the data

PARAMETER	DESCRIPTION
`x`	Independent variables TYPE: `ArrayT`
`y`	Dependent variable TYPE: `Tensor`

Source code in src/pydvl/valuation/types.py

def fit(self, x: ArrayT, y: Tensor):
    """Fit the model to the data

    Args:
        x: Independent variables
        y: Dependent variable
    """
    ...

predict ¶

predict(x: ArrayT) -> NDArray

Compute predictions for the input

PARAMETER	DESCRIPTION
`x`	Independent variables for which to compute predictions TYPE: `ArrayT`

RETURNS	DESCRIPTION
`NDArray`	Predictions for the input

Source code in src/pydvl/valuation/types.py

def predict(self, x: ArrayT) -> NDArray:
    """Compute predictions for the input

    Args:
        x: Independent variables for which to compute predictions

    Returns:
        Predictions for the input
    """
    ...

score ¶

score(x: ArrayT, y: NDArray) -> float

Compute the score of the model given test data

PARAMETER	DESCRIPTION
`x`	Independent variables TYPE: `ArrayT`
`y`	Dependent variable TYPE: `NDArray`

RETURNS	DESCRIPTION
`float`	The score of the model on `(x, y)`

Source code in src/pydvl/valuation/types.py

def score(self, x: ArrayT, y: NDArray) -> float:
    """Compute the score of the model given test data

    Args:
        x: Independent variables
        y: Dependent variable

    Returns:
        The score of the model on `(x, y)`
    """
    ...

SupervisedModel ¶

Bases: Protocol[ArrayT, ArrayRetT]

This is the standard sklearn Protocol with the methods fit(), predict() and score().

fit ¶

fit(x: ArrayT, y: ArrayT)

Fit the model to the data

PARAMETER	DESCRIPTION
`x`	Independent variables TYPE: `ArrayT`
`y`	Dependent variable TYPE: `ArrayT`

Source code in src/pydvl/valuation/types.py

def fit(self, x: ArrayT, y: ArrayT):
    """Fit the model to the data

    Args:
        x: Independent variables
        y: Dependent variable
    """
    pass

predict ¶

predict(x: ArrayT) -> ArrayRetT

Compute predictions for the input

PARAMETER	DESCRIPTION
`x`	Independent variables for which to compute predictions TYPE: `ArrayT`

RETURNS	DESCRIPTION
`ArrayRetT`	Predictions for the input

Source code in src/pydvl/valuation/types.py

def predict(self, x: ArrayT) -> ArrayRetT:
    """Compute predictions for the input

    Args:
        x: Independent variables for which to compute predictions

    Returns:
        Predictions for the input
    """
    pass

score ¶

score(x: ArrayT, y: ArrayT) -> float

Compute the score of the model given test data

PARAMETER	DESCRIPTION
`x`	Independent variables TYPE: `ArrayT`
`y`	Dependent variable TYPE: `ArrayT`

RETURNS	DESCRIPTION
`float`	The score of the model on `(x, y)`

Source code in src/pydvl/valuation/types.py

def score(self, x: ArrayT, y: ArrayT) -> float:
    """Compute the score of the model given test data

    Args:
        x: Independent variables
        y: Dependent variable

    Returns:
        The score of the model on `(x, y)`
    """
    pass

ValueUpdate `dataclass` ¶

ValueUpdate(idx: IndexT | None, log_update: float, sign: int)

ValueUpdates are emitted by evaluation strategies.

Typically, a value update is the product of a marginal utility, the sampler weight and the valuation's coefficient. Instead of multiplying weights, coefficients and utilities directly, the strategy works in log-space for numerical stability using the samplers' log-weights and the valuation methods' log-coefficients.

The updates from all workers are converted back to linear space by LogResultUpdater.

ATTRIBUTE	DESCRIPTION
`idx`	Index of the sample the update corresponds to. TYPE: `IndexT \| None`
`log_update`	Logarithm of the absolute value of the update. TYPE: `float`
`sign`	Sign of the update. TYPE: `int`

Source code in src/pydvl/valuation/types.py

def __init__(self, idx: IndexT | None, log_update: float, sign: int):
    object.__setattr__(self, "idx", idx)
    object.__setattr__(self, "log_update", log_update)
    object.__setattr__(self, "sign", sign)
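
The `(log_update, sign)` representation can be sketched with the standard library as follows. Both helpers are hypothetical and not part of pydvl. `log_factor` stands for whatever combination of sampler log-weight and log-coefficient a strategy applies, and the handling of a zero marginal (sign 0, log of minus infinity) is an assumption of this sketch.

```python
import math
from typing import Tuple


def to_log_update(marginal: float, log_factor: float) -> Tuple[float, int]:
    """Return (log|update|, sign) for update = marginal * exp(log_factor)."""
    if marginal == 0.0:
        # log(0) is undefined; represent a zero update explicitly.
        return -math.inf, 0
    sign = 1 if marginal > 0 else -1
    return math.log(abs(marginal)) + log_factor, sign


def to_linear(log_update: float, sign: int) -> float:
    """Convert a (log|update|, sign) pair back to linear space."""
    return sign * math.exp(log_update)


# A marginal of -0.5 scaled by a factor of 4 is an update of -2.
log_u, sgn = to_log_update(-0.5, math.log(4.0))
restored = to_linear(log_u, sgn)
```

Keeping the sign separate is what allows negative marginal utilities to be carried through log-space, since the logarithm is only defined for positive magnitudes.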

validate_number ¶

validate_number(
    name: str,
    value: Any,
    dtype: Type[T],
    lower: T | None = None,
    upper: T | None = None,
) -> T

Ensure that the value is of the given type and within the given bounds.

For int and float types, this function is lenient with numpy numeric types and will convert them to the appropriate Python type as long as no precision is lost.

PARAMETER	DESCRIPTION
`name`	The name of the variable to validate. TYPE: `str`
`value`	The value to validate. TYPE: `Any`
`dtype`	The type to convert the value to. TYPE: `Type[T]`
`lower`	The lower bound for the value (inclusive). TYPE: `T \| None` DEFAULT: `None`
`upper`	The upper bound for the value (inclusive). TYPE: `T \| None` DEFAULT: `None`

RAISES	DESCRIPTION
`TypeError`	If the value is not of the given type.
`ValueError`	If the value is not within the given bounds, if there is precision loss, e.g. when forcing a float to an int, or if `dtype` is not a valid scalar type.

Source code in src/pydvl/valuation/types.py

def validate_number(
    name: str,
    value: Any,
    dtype: Type[T],
    lower: T | None = None,
    upper: T | None = None,
) -> T:
    """Ensure that the value is of the given type and within the given bounds.

    For int and float types, this function is lenient with numpy numeric types and
    will convert them to the appropriate Python type as long as no precision is lost.

    Args:
        name: The name of the variable to validate.
        value: The value to validate.
        dtype: The type to convert the value to.
        lower: The lower bound for the value (inclusive).
        upper: The upper bound for the value (inclusive).

    Raises:
        TypeError: If the value is not of the given type.
        ValueError: If the value is not within the given bounds, if there is precision
            loss, e.g. when forcing a float to an int, or if `dtype` is not a valid
            scalar type.
    """
    if not isinstance(value, (int, float, np.number)):
        raise TypeError(f"'{name}' is not a number, it is {type(value).__name__}")
    if not issubclass(dtype, (np.number, int, float)):
        raise ValueError(f"type '{dtype}' is not a valid scalar type")

    converted = dtype(value)
    if not np.isnan(converted) and not np.isclose(converted, value, rtol=0, atol=0):
        raise ValueError(
            f"'{name}' cannot be converted to {dtype.__name__} without precision loss"
        )
    value = cast(T, converted)

    if lower is not None and value < lower:  # type: ignore
        raise ValueError(f"'{name}' is {value}, but it should be >= {lower}")
    if upper is not None and value > upper:  # type: ignore
        raise ValueError(f"'{name}' is {value}, but it should be <= {upper}")
    return value
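
The round-trip idea behind the precision check can be illustrated without numpy. `convert_checked` below is a hypothetical, standard-library-only analogue for plain `int` and `float`; it is a sketch of the documented behaviour, not the pydvl function, and it omits the type checks.

```python
import math
from typing import Optional, Type, Union

Number = Union[int, float]


def convert_checked(
    name: str,
    value: Number,
    dtype: Type[Number],
    lower: Optional[Number] = None,
    upper: Optional[Number] = None,
) -> Number:
    """Convert value to dtype, rejecting lossy conversions and out-of-bounds values."""
    converted = dtype(value)
    is_nan = isinstance(converted, float) and math.isnan(converted)
    # A lossless conversion must compare equal to the original value.
    if not is_nan and converted != value:
        raise ValueError(
            f"'{name}' cannot be converted to {dtype.__name__} without precision loss"
        )
    if lower is not None and converted < lower:
        raise ValueError(f"'{name}' is {converted}, but it should be >= {lower}")
    if upper is not None and converted > upper:
        raise ValueError(f"'{name}' is {converted}, but it should be <= {upper}")
    return converted


# 4.0 converts to the int 4 without loss and satisfies the lower bound.
n_jobs = convert_checked("n_jobs", 4.0, int, lower=1)
```

With this sketch, converting 2.5 to `int` raises `ValueError` because `int(2.5)` is 2, which no longer equals the input.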
