pydvl.valuation.types

This module contains the types used by pydvl.valuation.

If you are interested in extending valuation methods, you might need to subclass ValueUpdate, Sample or ClasswiseSample. These are the data types used for communication between the samplers on the main process and the workers.

ClasswiseSample dataclass

ClasswiseSample(
    idx: IndexT | None,
    subset: NDArray[IndexT],
    label: int,
    ooc_subset: NDArray[IndexT],
)

Bases: Sample

Sample class for classwise Shapley valuation.

idx instance-attribute

idx: IndexT | None

Index of current sample

label instance-attribute

label: int

Label of the current sample

ooc_subset instance-attribute

ooc_subset: NDArray[IndexT]

Indices of out-of-class elements, i.e., those with a label different from this sample's label

subset instance-attribute

subset: NDArray[IndexT]

Indices of the elements in the current sample's subset
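As a rough illustration of how the four fields relate, the sketch below builds a stand-in object from made-up labels. `MiniClasswiseSample` is a hypothetical mirror of the documented fields, not the pyDVL class, and it uses plain tuples instead of NumPy arrays. It also places every in-class index in `subset`, whereas a real sampler would typically draw only some of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MiniClasswiseSample:
    """Hypothetical stand-in mirroring the four documented fields."""

    idx: Optional[int]
    subset: tuple[int, ...]
    label: int
    ooc_subset: tuple[int, ...]


labels = [0, 1, 0, 1, 1, 0]  # made-up labels for six data points
idx = 2
label = labels[idx]

# In-class indices share the label of idx (idx itself is left out here).
in_class = tuple(i for i, y in enumerate(labels) if y == label and i != idx)
# Out-of-class indices carry any other label.
out_of_class = tuple(i for i, y in enumerate(labels) if y != label)

sample = MiniClasswiseSample(
    idx=idx, subset=in_class, label=label, ooc_subset=out_of_class
)
```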

with_idx

with_idx(idx: IndexT) -> Self

Return a copy of sample with idx changed.

Returns the original sample if idx is the same.

PARAMETER DESCRIPTION
idx

New value for idx.

TYPE: IndexT

RETURNS DESCRIPTION
Sample

A copy of the sample with idx changed.

TYPE: Self

Source code in src/pydvl/valuation/types.py
def with_idx(self, idx: IndexT) -> Self:
    """Return a copy of sample with idx changed.

    Returns the original sample if idx is the same.

    Args:
        idx: New value for idx.

    Returns:
        Sample: A copy of the sample with idx changed.
    """
    if self.idx == idx:
        return self

    return replace(self, idx=idx)

with_idx_in_subset

with_idx_in_subset() -> Self

Return a copy of sample with idx added to the subset.

Returns the original sample if idx was already part of the subset.

RETURNS DESCRIPTION
Sample

A copy of the sample with idx added to the subset.

TYPE: Self

RAISES DESCRIPTION
ValueError

If idx is None.

Source code in src/pydvl/valuation/types.py
def with_idx_in_subset(self) -> Self:
    """Return a copy of sample with idx added to the subset.

    Returns the original sample if idx was already part of the subset.

    Returns:
        Sample: A copy of the sample with idx added to the subset.

    Raises:
        ValueError: If idx is None.
    """
    if self.idx in self.subset:
        return self

    if self.idx is None:
        raise ValueError("Cannot add idx to subset if idx is None.")

    new_subset = np.append(self.subset, self.idx)
    return replace(self, subset=new_subset)

with_subset

with_subset(subset: NDArray[IndexT]) -> Self

Return a copy of sample with subset changed.

Returns the original sample if subset is the same.

PARAMETER DESCRIPTION
subset

New value for subset.

TYPE: NDArray[IndexT]

RETURNS DESCRIPTION
Sample

A copy of the sample with subset changed.

TYPE: Self

Source code in src/pydvl/valuation/types.py
def with_subset(self, subset: NDArray[IndexT]) -> Self:
    """Return a copy of sample with subset changed.

    Returns the original sample if subset is the same.

    Args:
        subset: New value for subset.

    Returns:
        Sample: A copy of the sample with subset changed.
    """
    if np.array_equal(self.subset, subset):
        return self

    return replace(self, subset=subset)

Sample dataclass

Sample(idx: IndexT | None, subset: NDArray[IndexT])

idx instance-attribute

idx: IndexT | None

Index of current sample

subset instance-attribute

subset: NDArray[IndexT]

Indices of the elements in the current sample's subset

__hash__

__hash__()

This type must be hashable for the utility caching to work. We use hashlib.sha256, which is about 4-5x faster than hash() and returns the same value in all processes, whereas hash() is salted in each process.

Source code in src/pydvl/valuation/types.py
def __hash__(self):
    """This type must be hashable for the utility caching to work.
    We use hashlib.sha256 which is about 4-5x faster than hash(), and returns the
    same value in all processes, as opposed to hash() which is salted in each
    process
    """
    sha256_hash = hashlib.sha256(self.subset.tobytes()).hexdigest()
    return int(sha256_hash, base=16)
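The following sketch reproduces the hashing scheme with the standard library only. `SubsetKey` is a hypothetical stand-in, and `array("q", ...)` plays the role of an int64 NumPy array. As in the source above, only the bytes of `subset` enter the digest, so the result depends on element order. Python reduces the 256-bit integer returned by `__hash__` to machine size the same way it hashes any large `int`.

```python
import hashlib
from array import array


class SubsetKey:
    """Hypothetical stand-in that hashes only the bytes of its subset."""

    def __init__(self, subset):
        # array("q") stores signed 64-bit integers, similar to an int64 array.
        self.subset = array("q", subset)

    def __hash__(self):
        digest = hashlib.sha256(self.subset.tobytes()).hexdigest()
        return int(digest, base=16)


a = SubsetKey([1, 2, 3])
b = SubsetKey([1, 2, 3])
c = SubsetKey([3, 2, 1])

# Equal byte content gives equal hashes, independent of object identity.
same_content_same_hash = hash(a) == hash(b)
# The digest is computed over raw bytes, so element order matters.
order_matters = hash(a) != hash(c)

# hash() reduces the full 256-bit digest like any other large integer.
full_digest = int(hashlib.sha256(array("q", [1, 2, 3]).tobytes()).hexdigest(), 16)
reduced_by_python = hash(a) == hash(full_digest)
```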

with_idx

with_idx(idx: IndexT) -> Self

Return a copy of sample with idx changed.

Returns the original sample if idx is the same.

PARAMETER DESCRIPTION
idx

New value for idx.

TYPE: IndexT

RETURNS DESCRIPTION
Sample

A copy of the sample with idx changed.

TYPE: Self

Source code in src/pydvl/valuation/types.py
def with_idx(self, idx: IndexT) -> Self:
    """Return a copy of sample with idx changed.

    Returns the original sample if idx is the same.

    Args:
        idx: New value for idx.

    Returns:
        Sample: A copy of the sample with idx changed.
    """
    if self.idx == idx:
        return self

    return replace(self, idx=idx)

with_idx_in_subset

with_idx_in_subset() -> Self

Return a copy of sample with idx added to the subset.

Returns the original sample if idx was already part of the subset.

RETURNS DESCRIPTION
Sample

A copy of the sample with idx added to the subset.

TYPE: Self

RAISES DESCRIPTION
ValueError

If idx is None.

Source code in src/pydvl/valuation/types.py
def with_idx_in_subset(self) -> Self:
    """Return a copy of sample with idx added to the subset.

    Returns the original sample if idx was already part of the subset.

    Returns:
        Sample: A copy of the sample with idx added to the subset.

    Raises:
        ValueError: If idx is None.
    """
    if self.idx in self.subset:
        return self

    if self.idx is None:
        raise ValueError("Cannot add idx to subset if idx is None.")

    new_subset = np.append(self.subset, self.idx)
    return replace(self, subset=new_subset)

with_subset

with_subset(subset: NDArray[IndexT]) -> Self

Return a copy of sample with subset changed.

Returns the original sample if subset is the same.

PARAMETER DESCRIPTION
subset

New value for subset.

TYPE: NDArray[IndexT]

RETURNS DESCRIPTION
Sample

A copy of the sample with subset changed.

TYPE: Self

Source code in src/pydvl/valuation/types.py
def with_subset(self, subset: NDArray[IndexT]) -> Self:
    """Return a copy of sample with subset changed.

    Returns the original sample if subset is the same.

    Args:
        subset: New value for subset.

    Returns:
        Sample: A copy of the sample with subset changed.
    """
    if np.array_equal(self.subset, subset):
        return self

    return replace(self, subset=subset)
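The three `with_*` methods share one pattern: return `self` when nothing changes, otherwise build a modified copy with `dataclasses.replace`. A minimal sketch of that pattern follows. `MiniSample` is a hypothetical stand-in that stores the subset as a tuple rather than a NumPy array, and it checks for a missing `idx` before testing membership.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MiniSample:
    """Hypothetical stand-in for the copy-on-change pattern."""

    idx: Optional[int]
    subset: tuple[int, ...]

    def with_idx(self, idx: int) -> "MiniSample":
        # No copy is made when the value is unchanged.
        return self if self.idx == idx else replace(self, idx=idx)

    def with_idx_in_subset(self) -> "MiniSample":
        if self.idx is None:
            raise ValueError("Cannot add idx to subset if idx is None.")
        if self.idx in self.subset:
            return self
        return replace(self, subset=self.subset + (self.idx,))

    def with_subset(self, subset: tuple[int, ...]) -> "MiniSample":
        return self if self.subset == subset else replace(self, subset=subset)


s = MiniSample(idx=1, subset=(0, 2))
same = s.with_idx(1)            # unchanged, so the same object comes back
moved = s.with_idx(5)           # new object, the original is untouched
grown = s.with_idx_in_subset()  # idx appended to the subset
```

Returning the original object on a no-op avoids allocating a new instance for samples that are already in the requested state.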

SemivalueCoefficient

Bases: Protocol

__call__

__call__(n: int, k: int) -> float

A semi-value coefficient is a function of the number of elements in the set, and the size of the subset for which the coefficient is being computed. Because both coefficients and sampler weights can be very large or very small, we perform all computations in log-space to avoid numerical issues.

PARAMETER DESCRIPTION
n

Total number of elements in the set.

TYPE: int

k

Size of the subset for which the coefficient is being computed.

TYPE: int

RETURNS DESCRIPTION
float

The natural logarithm of the semi-value coefficient.

Source code in src/pydvl/valuation/types.py
def __call__(self, n: int, k: int) -> float:
    """A semi-value coefficient is a function of the number of elements in the set,
    and the size of the subset for which the coefficient is being computed.
    Because both coefficients and sampler weights can be very large or very small,
    we perform all computations in log-space to avoid numerical issues.

    Args:
        n: Total number of elements in the set.
        k: Size of the subset for which the coefficient is being computed

    Returns:
        The natural logarithm of the semi-value coefficient.
    """
    ...
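To illustrate why the protocol returns a logarithm, here is a sketch of one possible coefficient with the `(n, k) -> float` shape. It assumes the textbook Shapley weight 1 / (n * C(n-1, k)); the function name `log_shapley_coefficient` is hypothetical, and this is not claimed to be the formula pyDVL uses internally.

```python
import math


def log_shapley_coefficient(n: int, k: int) -> float:
    """Natural log of 1 / (n * C(n-1, k)), valid for 0 <= k <= n - 1.

    The binomial is never formed explicitly: its logarithm is assembled
    from log-gamma terms, which stay small even for large n.
    """
    log_binom = math.lgamma(n) - math.lgamma(k + 1) - math.lgamma(n - k)
    return -math.log(n) - log_binom


# Small case: n = 4, k = 1 gives 1 / (4 * 3) = 1/12.
small = math.exp(log_shapley_coefficient(4, 1))

# Large case: the log is an ordinary float, but the coefficient itself
# underflows to zero once converted back to linear space.
large_log = log_shapley_coefficient(10_000, 5_000)
large_linear = math.exp(large_log)
```

The large case is the reason for staying in log-space: the logarithm remains usable in sums with other log-terms, while the linear value is lost to underflow.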

ValueUpdate dataclass

ValueUpdate(idx: IndexT | None, log_update: float, sign: int)

ValueUpdates are emitted by evaluation strategies.

Typically, a value update is the product of a marginal utility, the sampler weight and the valuation's coefficient. Instead of multiplying weights, coefficients and utilities directly, the strategy works in log-space for numerical stability using the samplers' log-weights and the valuation methods' log-coefficients.

The updates from all workers are converted back to linear space by LogResultUpdater.

ATTRIBUTE DESCRIPTION
idx

Index of the sample the update corresponds to.

TYPE: IndexT | None

log_update

Logarithm of the absolute value of the update.

TYPE: float

sign

Sign of the update.

TYPE: int

Source code in src/pydvl/valuation/types.py
def __init__(self, idx: IndexT | None, log_update: float, sign: int):
    object.__setattr__(self, "idx", idx)
    object.__setattr__(self, "log_update", log_update)
    object.__setattr__(self, "sign", sign)
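The description above can be sketched as follows: the magnitude of an update is accumulated as a sum of logarithms, and the sign is carried separately. `MiniUpdate`, `make_update` and `to_linear` are hypothetical names for illustration only. In particular, `to_linear` merely mimics the kind of conversion attributed to LogResultUpdater and is not its implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniUpdate:
    """Hypothetical stand-in with the three documented attributes."""

    idx: int
    log_update: float
    sign: int


def make_update(idx, marginal, log_weight, log_coefficient):
    # The sign is stored separately because log() is undefined for
    # negative numbers.
    sign = (marginal > 0) - (marginal < 0)
    # A zero marginal maps to log(0) = -inf, which exp() turns back into 0.
    log_abs = math.log(abs(marginal)) if marginal != 0 else -math.inf
    # A product in linear space becomes a sum in log-space.
    return MiniUpdate(idx, log_abs + log_weight + log_coefficient, sign)


def to_linear(update):
    return update.sign * math.exp(update.log_update)


u = make_update(3, -0.5, math.log(2.0), math.log(0.25))
linear = to_linear(u)  # -0.5 * 2.0 * 0.25
zero = to_linear(make_update(3, 0.0, math.log(2.0), math.log(0.25)))
```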