pydvl.influence.influence_calculator
This module provides functionality for calculating influences for large amounts of data. The computation is based on a chunked computation model in the form of an instance of InfluenceFunctionModel, which is mapped over a collection of chunks.
DisableClientSingleThreadCheck
This type can be provided to the initialization of a DaskInfluenceCalculator instead of a distributed client object. It is useful in scenarios where the user wants to disable the thread-safety check during initialization, e.g. when using the single-machine synchronous scheduler for debugging purposes.
Example
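A minimal sketch: the names model, loss and train_dataloader are placeholders, assumed to be defined as in the DaskInfluenceCalculator example below, and DisableClientSingleThreadCheck is assumed importable from pydvl.influence like DaskInfluenceCalculator.
import torch
from pydvl.influence import DaskInfluenceCalculator, DisableClientSingleThreadCheck
from pydvl.influence.torch import CgInfluence
from pydvl.influence.torch.util import TorchNumpyConverter
infl_model = CgInfluence(model, loss, hessian_regularization=0.01)
infl_model = infl_model.fit(train_dataloader)
# pass the type itself instead of a distributed client, skipping the
# single-thread safety check (e.g. when debugging with the synchronous
# scheduler)
infl_calc = DaskInfluenceCalculator(
    infl_model,
    TorchNumpyConverter(device=torch.device("cpu")),
    DisableClientSingleThreadCheck,
)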
DaskInfluenceCalculator
DaskInfluenceCalculator(
influence_function_model: InfluenceFunctionModel,
converter: NumpyConverter,
client: Union[Client, Type[DisableClientSingleThreadCheck]],
)
This class is designed to compute influences over dask.array.Array collections, leveraging the capabilities of Dask for distributed computing and parallel processing. It requires an influence computation model of type InfluenceFunctionModel, which defines how influences are computed on a chunk of data. Essentially, this class functions by mapping the influence function model across the various chunks of a dask.array.Array collection.
PARAMETER | DESCRIPTION
---|---
influence_function_model | An instance of type InfluenceFunctionModel that specifies the computation logic for influence on data chunks. It is a pivotal part of the calculator, determining how influence is computed and applied across the data array. TYPE: InfluenceFunctionModel
converter | A utility for converting numpy arrays to TensorType objects, facilitating the interaction between numpy arrays and the influence function model. TYPE: NumpyConverter
client | Either a distributed Client object or the type DisableClientSingleThreadCheck. During initialization, the system verifies that all workers operate in single-threaded mode if the provided influence_function_model is designated as not thread-safe. To intentionally skip this safety check (e.g., for debugging purposes using the single-machine synchronous scheduler), supply the DisableClientSingleThreadCheck type. TYPE: Union[Client, Type[DisableClientSingleThreadCheck]]
Warning
Make sure to set threads_per_worker=1 when using the distributed scheduler for computing, if your implementation of InfluenceFunctionModel is not thread-safe.
Example
import torch
from torch.utils.data import Dataset, DataLoader
from pydvl.influence import DaskInfluenceCalculator
from pydvl.influence.torch import CgInfluence
from pydvl.influence.torch.util import (
torch_dataset_to_dask_array,
TorchNumpyConverter,
)
from distributed import Client
# potentially large datasets that do not fit into memory
train_data_set: Dataset = LargeDataSet(...)
test_data_set: Dataset = LargeDataSet(...)
train_dataloader = DataLoader(train_data_set)
infl_model = CgInfluence(model, loss, hessian_regularization=0.01)
infl_model = infl_model.fit(train_dataloader)
# wrap your input data into dask arrays
chunk_size = 10
da_x, da_y = torch_dataset_to_dask_array(train_data_set, chunk_size=chunk_size)
da_x_test, da_y_test = torch_dataset_to_dask_array(test_data_set,
chunk_size=chunk_size)
# use only one thread for scheduling, due to non-thread safety of some torch
# operations
client = Client(n_workers=4, threads_per_worker=1)
infl_calc = DaskInfluenceCalculator(infl_model,
TorchNumpyConverter(device=torch.device("cpu")),
client)
da_influences = infl_calc.influences(da_x_test, da_y_test, da_x, da_y)
# da_influences is a dask.array.Array
# trigger computation and write chunks to disk in parallel
da_influences.to_zarr("path/or/url")
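To continue working with the persisted result, it can be lazily re-opened, e.g. (a small sketch; the path is the placeholder used above):
import dask.array as da
# lazily re-open the influence values written above
da_influences = da.from_zarr("path/or/url")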
n_parameters
property
Number of trainable parameters of the underlying model used in the batch computation
influence_factors
influence_factors(x: Array, y: Array) -> Array
Computes the expression
\[ H^{-1}\nabla_{\theta} \ell(y, f_{\theta}(x)) \]
where the gradients are computed for the chunks of \((x, y)\).
PARAMETER | DESCRIPTION
---|---
x | model input to use in the gradient computations. TYPE: Array
y | label tensor to compute gradients. TYPE: Array
RETURNS | DESCRIPTION
---|---
Array | dask.array.Array representing the element-wise inverse Hessian matrix vector products for the provided batch.
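For illustration, a sketch that reuses infl_calc, da_x_test and da_y_test from the class example above:
# lazily compute the inverse Hessian-vector products for the test data
da_factors = infl_calc.influence_factors(da_x_test, da_y_test)
# da_factors is a dask.array.Array; trigger computation, e.g. by persisting
da_factors.to_zarr("path/or/url")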
influences
influences(
x_test: Array,
y_test: Array,
x: Optional[Array] = None,
y: Optional[Array] = None,
mode: InfluenceMode = InfluenceMode.Up,
) -> Array
Compute approximation of
\[ \langle H^{-1}\nabla_{\theta} \ell(y_{\text{test}}, f_{\theta}(x_{\text{test}})), \nabla_{\theta} \ell(y, f_{\theta}(x)) \rangle \]
for the case of up-weighting influence, resp.
\[ \langle H^{-1}\nabla_{\theta} \ell(y_{\text{test}}, f_{\theta}(x_{\text{test}})), \nabla_{x} \nabla_{\theta} \ell(y, f_{\theta}(x)) \rangle \]
for the perturbation type influence case. The computation is done block-wise for the chunks of the provided dask arrays.
PARAMETER | DESCRIPTION
---|---
x_test | model input to use in the gradient computations of \(H^{-1}\nabla_{\theta} \ell(y_{\text{test}}, f_{\theta}(x_{\text{test}}))\). TYPE: Array
y_test | label tensor to compute gradients. TYPE: Array
x | optional model input to use in the gradient computations \(\nabla_{\theta}\ell(y, f_{\theta}(x))\), resp. \(\nabla_{x}\nabla_{\theta}\ell(y, f_{\theta}(x))\); if None, use \(x=x_{\text{test}}\). TYPE: Optional[Array]
y | optional label tensor to compute gradients. TYPE: Optional[Array]
mode | enum value of InfluenceMode. TYPE: InfluenceMode
RETURNS | DESCRIPTION
---|---
Array | dask.array.Array representing the element-wise scalar products for the provided batch.
influences_from_factors
influences_from_factors(
z_test_factors: Array,
x: Array,
y: Array,
mode: InfluenceMode = InfluenceMode.Up,
) -> Array
Computation of
\[ \langle z_{\text{test\_factors}}, \nabla_{\theta} \ell(y, f_{\theta}(x)) \rangle \]
for the case of up-weighting influence, resp.
\[ \langle z_{\text{test\_factors}}, \nabla_{x} \nabla_{\theta} \ell(y, f_{\theta}(x)) \rangle \]
for the perturbation type influence case. The gradient is meant to be per sample of the batch \((x, y)\).
PARAMETER | DESCRIPTION
---|---
z_test_factors | pre-computed array, approximating \(H^{-1}\nabla_{\theta} \ell(y_{\text{test}}, f_{\theta}(x_{\text{test}}))\). TYPE: Array
x | model input to use in the gradient computations \(\nabla_{\theta}\ell(y, f_{\theta}(x))\), resp. \(\nabla_{x}\nabla_{\theta}\ell(y, f_{\theta}(x))\). TYPE: Array
y | label tensor to compute gradients. TYPE: Array
mode | enum value of InfluenceMode. TYPE: InfluenceMode
RETURNS | DESCRIPTION
---|---
Array | dask.array.Array representing the element-wise scalar product of the provided batch.
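A sketch of the resulting two-step workflow, reusing the names from the class example above: the factors are computed once and then combined with each training chunk.
# step 1: compute the test factors once
da_factors = infl_calc.influence_factors(da_x_test, da_y_test)
# step 2: combine the factors with the training data chunks
da_influences = infl_calc.influences_from_factors(da_factors, da_x, da_y)
# trigger computation and write chunks to disk in parallel
da_influences.to_zarr("path/or/url")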
SequentialInfluenceCalculator
SequentialInfluenceCalculator(influence_function_model: InfluenceFunctionModel)
This class serves as a simple wrapper for processing batches of data in a sequential manner. It is particularly useful in scenarios where parallel or distributed processing is not required or not feasible. The core functionality of this class is to apply a specified influence computation model, of type InfluenceFunctionModel, to batches of data one at a time.
PARAMETER | DESCRIPTION
---|---
influence_function_model | An instance of type InfluenceFunctionModel that specifies the computation logic for influence on data chunks. TYPE: InfluenceFunctionModel
Example
from torch.utils.data import DataLoader
from pydvl.influence import SequentialInfluenceCalculator
from pydvl.influence.torch.util import (
NestedTorchCatAggregator,
TorchNumpyConverter,
)
from pydvl.influence.torch import CgInfluence
batch_size = 10
train_dataloader = DataLoader(..., batch_size=batch_size)
test_dataloader = DataLoader(..., batch_size=batch_size)
infl_model = CgInfluence(model, loss, hessian_regularization=0.01)
infl_model = infl_model.fit(train_dataloader)
infl_calc = SequentialInfluenceCalculator(infl_model)
# this does not trigger the computation
lazy_influences = infl_calc.influences(test_dataloader, train_dataloader)
# trigger computation and pull the result into main memory, result is the full
# tensor for all combinations of the two loaders
influences = lazy_influences.compute(aggregator=NestedTorchCatAggregator())
# or
# trigger computation and write results chunk-wise to disk using zarr in a
# sequential manner
lazy_influences.to_zarr("local_path/or/url", TorchNumpyConverter())
influence_factors
influence_factors(
data_iterable: Iterable[Tuple[TensorType, TensorType]]
) -> LazyChunkSequence
Compute the expression
\[ H^{-1}\nabla_{\theta} \ell(y, f_{\theta}(x)) \]
where the gradients are computed for the chunks \((x, y)\) of the data_iterable in a sequential manner.
PARAMETER | DESCRIPTION
---|---
data_iterable | An iterable that returns tuples of tensors. Each tuple consists of a pair of tensors (x, y), representing input data and corresponding targets. TYPE: Iterable[Tuple[TensorType, TensorType]]
RETURNS | DESCRIPTION
---|---
LazyChunkSequence | A lazy data structure representing the chunks of the resulting tensor.
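For illustration, a sketch reusing infl_calc and test_dataloader from the class example above; any iterable of (x, y) batches works. The to_zarr call mirrors the one shown for NestedLazyChunkSequence in the class example and is an assumption here.
# lazily map the fitted influence model over the test batches
lazy_factors = infl_calc.influence_factors(test_dataloader)
# trigger computation and write the chunks to disk sequentially
lazy_factors.to_zarr("local_path/or/url", TorchNumpyConverter())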
influences
influences(
test_data_iterable: Iterable[Tuple[TensorType, TensorType]],
train_data_iterable: Iterable[Tuple[TensorType, TensorType]],
mode: InfluenceMode = InfluenceMode.Up,
) -> NestedLazyChunkSequence
Compute approximation of
\[ \langle H^{-1}\nabla_{\theta} \ell(y_{\text{test}}, f_{\theta}(x_{\text{test}})), \nabla_{\theta} \ell(y, f_{\theta}(x)) \rangle \]
for the case of up-weighting influence, resp.
\[ \langle H^{-1}\nabla_{\theta} \ell(y_{\text{test}}, f_{\theta}(x_{\text{test}})), \nabla_{x} \nabla_{\theta} \ell(y, f_{\theta}(x)) \rangle \]
for the perturbation type influence case. The computation is done block-wise for the chunks of the provided data iterables and aggregated into a single tensor in memory.
PARAMETER | DESCRIPTION
---|---
test_data_iterable | An iterable that returns tuples of tensors. Each tuple consists of a pair of tensors (x, y), representing input data and corresponding targets. TYPE: Iterable[Tuple[TensorType, TensorType]]
train_data_iterable | An iterable that returns tuples of tensors. Each tuple consists of a pair of tensors (x, y), representing input data and corresponding targets. TYPE: Iterable[Tuple[TensorType, TensorType]]
mode | enum value of InfluenceMode. TYPE: InfluenceMode
RETURNS | DESCRIPTION
---|---
NestedLazyChunkSequence | A lazy data structure representing the chunks of the resulting tensor.
influences_from_factors
influences_from_factors(
z_test_factors: Iterable[TensorType],
train_data_iterable: Iterable[Tuple[TensorType, TensorType]],
mode: InfluenceMode = InfluenceMode.Up,
) -> NestedLazyChunkSequence
Computation of
\[ \langle z_{\text{test\_factors}}, \nabla_{\theta} \ell(y, f_{\theta}(x)) \rangle \]
for the case of up-weighting influence, resp.
\[ \langle z_{\text{test\_factors}}, \nabla_{x} \nabla_{\theta} \ell(y, f_{\theta}(x)) \rangle \]
for the perturbation type influence case. The gradient is meant to be per sample of the batch \((x, y)\).
PARAMETER | DESCRIPTION
---|---
z_test_factors | Pre-computed iterable of tensors, approximating \(H^{-1}\nabla_{\theta} \ell(y_{\text{test}}, f_{\theta}(x_{\text{test}}))\). TYPE: Iterable[TensorType]
train_data_iterable | An iterable that returns tuples of tensors. Each tuple consists of a pair of tensors (x, y), representing input data and corresponding targets. TYPE: Iterable[Tuple[TensorType, TensorType]]
mode | enum value of InfluenceMode. TYPE: InfluenceMode
RETURNS | DESCRIPTION
---|---
NestedLazyChunkSequence | A lazy data structure representing the chunks of the resulting tensor.
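A sketch of the two-step workflow under the names of the class example above; here the test factors are computed eagerly per batch with the fitted influence model and then reused for the training data.
# step 1: compute factors for each test batch with the fitted model;
# the resulting list of tensors is a valid Iterable[TensorType]
z_test_factors = [
    infl_model.influence_factors(x, y) for x, y in test_dataloader
]
# step 2: lazily combine the factors with the training batches
lazy_influences = infl_calc.influences_from_factors(
    z_test_factors, train_dataloader
)
# trigger computation and pull the result into memory
influences = lazy_influences.compute(aggregator=NestedTorchCatAggregator())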