Util

TorchTensorContainerType = Union[torch.Tensor, Collection[torch.Tensor], Mapping[str, torch.Tensor]] module-attribute

Type for a PyTorch tensor or a container thereof.

TorchNumpyConverter(device=None)

Bases: NumpyConverter[Tensor]

Helper class for converting between torch.Tensor and numpy.ndarray.

PARAMETER DESCRIPTION
device

Optional device parameter to move the resulting torch tensors to the specified device

TYPE: Optional[device] DEFAULT: None

Source code in src/pydvl/influence/torch/util.py
def __init__(self, device: Optional[torch.device] = None):
    self.device = device

to_numpy(x)

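Convert a detached torch.Tensor to a numpy.ndarray. The tensor is first copied to the CPU if it lives on another device; it must not require gradients.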

Source code in src/pydvl/influence/torch/util.py
def to_numpy(self, x: torch.Tensor) -> NDArray:
    """
    Convert a detached [torch.Tensor][torch.Tensor] to
    [numpy.ndarray][numpy.ndarray]
    """
    arr: NDArray = x.cpu().numpy()
    return arr

from_numpy(x)

Convert a numpy.ndarray to torch.Tensor and optionally move it to a provided device

Source code in src/pydvl/influence/torch/util.py
def from_numpy(self, x: NDArray) -> torch.Tensor:
    """
    Convert a [numpy.ndarray][numpy.ndarray] to [torch.Tensor][torch.Tensor] and
    optionally move it to a provided device
    """
    t = torch.from_numpy(x)
    if self.device is not None:
        t = t.to(self.device)
    return t
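
Both directions can be tried out with plain PyTorch and NumPy calls. The sketch below does not import pyDVL; it only mirrors the calls visible in the source above (`torch.from_numpy` and `Tensor.cpu().numpy()`), and all variable names are illustrative. It also shows why `to_numpy` expects a detached tensor.

```python
import numpy as np
import torch

arr = np.arange(6, dtype=np.float32).reshape(2, 3)

# from_numpy direction: on the CPU the tensor shares memory with the array.
t = torch.from_numpy(arr)

# to_numpy direction: move to the CPU (a no-op here), then expose as ndarray.
back = t.cpu().numpy()

# A tensor that requires gradients cannot be converted directly.
w = torch.ones(3, requires_grad=True)
try:
    w.cpu().numpy()
    raised = False
except RuntimeError:
    raised = True

# Detaching first makes the conversion succeed.
w_np = w.detach().cpu().numpy()
```

In practice this means calling `.detach()` on any tensor that is part of an autograd graph before handing it to the converter.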

TorchCatAggregator

Bases: SequenceAggregator[Tensor]

An aggregator that concatenates tensors using PyTorch's torch.cat function. Concatenation is done along the first dimension of the chunks.

__call__(tensor_generator)

Aggregates tensors from a single-level generator into a single tensor by concatenating them. This method is a straightforward way to combine a sequence of tensors into one larger tensor.

PARAMETER DESCRIPTION
tensor_generator

A generator that yields torch.Tensor objects.

TYPE: Generator[Tensor, None, None]

RETURNS DESCRIPTION

A single tensor formed by concatenating all tensors from the generator. The concatenation is performed along the default dimension (0).

Source code in src/pydvl/influence/torch/util.py
def __call__(self, tensor_generator: Generator[torch.Tensor, None, None]):
    """
    Aggregates tensors from a single-level generator into a single tensor by
    concatenating them. This method is a straightforward way to combine a sequence
    of tensors into one larger tensor.

    Args:
        tensor_generator: A generator that yields `torch.Tensor` objects.

    Returns:
        A single tensor formed by concatenating all tensors from the generator.
            The concatenation is performed along the default dimension (0).
    """
    return torch.cat(list(tensor_generator))
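
As a small illustration of the concatenation above, the following sketch feeds a generator of differently sized chunks to `torch.cat`. The generator `chunks` is a made-up stand-in for whatever produces the tensors; only the trailing dimensions have to agree.

```python
import torch

def chunks():
    # Hypothetical chunks: sizes may differ along dimension 0 only.
    yield torch.zeros(2, 3)
    yield torch.ones(1, 3)

# Same call as in the aggregator: materialize the generator, then concatenate.
result = torch.cat(list(chunks()))
```

Here `result` has shape `(3, 3)`: the two rows of the first chunk followed by the single row of the second.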

NestedTorchCatAggregator

Bases: NestedSequenceAggregator[Tensor]

An aggregator that concatenates tensors using PyTorch's torch.cat function. Concatenation is done along the first two dimensions of the chunks.

__call__(nested_generators_of_tensors)

Aggregates tensors from a nested generator structure into a single tensor by concatenating. Each inner generator is first concatenated along dimension 1 into a tensor, and then these tensors are concatenated along dimension 0 together to form the final tensor.

PARAMETER DESCRIPTION
nested_generators_of_tensors

A generator of generators, where each inner generator yields torch.Tensor objects.

TYPE: Generator[Generator[Tensor, None, None], None, None]

RETURNS DESCRIPTION

A single tensor formed by concatenating all tensors from the nested generators.

Source code in src/pydvl/influence/torch/util.py
def __call__(
    self,
    nested_generators_of_tensors: Generator[
        Generator[torch.Tensor, None, None], None, None
    ],
):
    """
    Aggregates tensors from a nested generator structure into a single tensor by
    concatenating. Each inner generator is first concatenated along dimension 1 into
    a tensor, and then these tensors are concatenated along dimension 0 together to
    form the final tensor.

    Args:
        nested_generators_of_tensors: A generator of generators, where each inner
            generator yields `torch.Tensor` objects.

    Returns:
        A single tensor formed by concatenating all tensors from the nested
        generators.

    """
    return torch.cat(
        list(
            map(
                lambda tensor_gen: torch.cat(list(tensor_gen), dim=1),
                nested_generators_of_tensors,
            )
        )
    )
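
The nested concatenation can be pictured as assembling a block matrix: each inner generator yields the column blocks of one row block. The sketch below uses hypothetical helper generators (`row_block`, `blocks`) and the same two-level `torch.cat` pattern as the source above.

```python
import torch

def row_block(n_rows, fill):
    # Inner generator: two column chunks (widths 3 and 2) for one row block.
    yield torch.full((n_rows, 3), fill)
    yield torch.full((n_rows, 2), fill)

def blocks():
    # Outer generator: one inner generator per row block.
    yield row_block(2, 0.0)
    yield row_block(1, 1.0)

# Inner chunks are joined along dimension 1, the results along dimension 0.
result = torch.cat([torch.cat(list(inner), dim=1) for inner in blocks()])
```

The result has shape `(3, 5)`. Within one inner generator all chunks must share the same number of rows, and all row blocks must end up with the same total width.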

to_model_device(x, model)

Returns the tensor x moved to the device of the model, determined from the model's first parameter.

PARAMETER DESCRIPTION
x

The tensor to be moved to the device of the model.

TYPE: Tensor

model

The model whose device will be used to move the tensor.

TYPE: Module

RETURNS DESCRIPTION
Tensor

The tensor x moved to the device of the model's first parameter.

Source code in src/pydvl/influence/torch/util.py
def to_model_device(x: torch.Tensor, model: torch.nn.Module) -> torch.Tensor:
    """
    Returns the tensor `x` moved to the device of the `model`, determined from the
    model's first parameter.

    Args:
        x: The tensor to be moved to the device of the model.
        model: The model whose device will be used to move the tensor.

    Returns:
        The tensor `x` moved to the device of the model's first parameter.
    """
    device = next(model.parameters()).device
    return x.to(device)
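
The device lookup relies on the model having at least one parameter. The following sketch, using only standard `torch.nn` modules and illustrative variable names, shows the lookup on a small model and what happens for a model without parameters.

```python
import torch

model = torch.nn.Linear(3, 1)
x = torch.zeros(4, 3)

# Same lookup as in the source: the device of the first parameter.
device = next(model.parameters()).device
x_on_device = x.to(device)

# A model without parameters has no "first parameter" to inspect.
empty_model = torch.nn.Sequential()
try:
    next(empty_model.parameters())
    raised = False
except StopIteration:
    raised = True
```

For parameter-free models the lookup raises `StopIteration`, so callers may want to guard against that case.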

reshape_vector_to_tensors(input_vector, target_shapes)

Reshape a 1D tensor into multiple tensors with specified shapes.

This function takes a 1D tensor (input_vector) and reshapes it into a series of tensors with shapes given by 'target_shapes'. The reshaped tensors are returned as a tuple in the same order as their corresponding shapes.

Note

The total number of elements in 'input_vector' must be equal to the sum of the products of the shapes in 'target_shapes'.

PARAMETER DESCRIPTION
input_vector

The 1D tensor to be reshaped. Must be 1D.

TYPE: Tensor

target_shapes

An iterable of tuples. Each tuple defines the shape of a tensor to be reshaped from the 'input_vector'.

TYPE: Iterable[Tuple[int, ...]]

RETURNS DESCRIPTION
Tuple[Tensor, ...]

A tuple of reshaped tensors.

RAISES DESCRIPTION
ValueError

If 'input_vector' is not a 1D tensor or if the total number of elements in 'input_vector' does not match the sum of the products of the shapes in 'target_shapes'.

Source code in src/pydvl/influence/torch/util.py
def reshape_vector_to_tensors(
    input_vector: torch.Tensor, target_shapes: Iterable[Tuple[int, ...]]
) -> Tuple[torch.Tensor, ...]:
    """
    Reshape a 1D tensor into multiple tensors with specified shapes.

    This function takes a 1D tensor (input_vector) and reshapes it into a series of
    tensors with shapes given by 'target_shapes'.
    The reshaped tensors are returned as a tuple in the same order
    as their corresponding shapes.

    Note:
        The total number of elements in 'input_vector' must be equal to the
            sum of the products of the shapes in 'target_shapes'.

    Args:
        input_vector: The 1D tensor to be reshaped. Must be 1D.
        target_shapes: An iterable of tuples. Each tuple defines the shape of a tensor
            to be reshaped from the 'input_vector'.

    Returns:
        A tuple of reshaped tensors.

    Raises:
        ValueError: If 'input_vector' is not a 1D tensor or if the total
            number of elements in 'input_vector' does not
            match the sum of the products of the shapes in 'target_shapes'.
    """

    if input_vector.dim() != 1:
        raise ValueError("Input vector must be a 1D tensor")

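    # Materialize the shapes so that a one-shot iterator can be traversed twice.
    target_shapes = list(target_shapes)
    total_elements = sum(math.prod(shape) for shape in target_shapes)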

    if total_elements != input_vector.shape[0]:
        raise ValueError(
            f"The total elements in shapes {total_elements} "
            f"does not match the vector length {input_vector.shape[0]}"
        )

    tensors = []
    start = 0
    for shape in target_shapes:
        size = math.prod(shape)  # compute the total size of the tensor with this shape
        tensors.append(
            input_vector[start : start + size].view(shape)
        )  # slice the vector and reshape it
        start += size
    return tuple(tensors)
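
The slicing logic boils down to running offsets into the flat vector. The helper below, `slice_bounds`, is a hypothetical name introduced only for this illustration; it computes the `(start, stop)` range that each target shape would occupy, using the same `math.prod` arithmetic as the source.

```python
import math
from typing import Iterable, List, Tuple

def slice_bounds(target_shapes: Iterable[Tuple[int, ...]]) -> List[Tuple[int, int]]:
    """Return the (start, stop) offsets in the flat vector for each shape."""
    bounds = []
    start = 0
    for shape in target_shapes:
        size = math.prod(shape)  # number of elements for this shape
        bounds.append((start, start + size))
        start += size
    return bounds

bounds = slice_bounds([(2, 3), (4,), (1, 2, 2)])
print(bounds)  # [(0, 6), (6, 10), (10, 14)]
```

The last `stop` value (14 here) is the length the input vector must have. An empty shape `()` counts as one element, since `math.prod(())` is 1.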

align_structure(source, target)

Transforms target into a dictionary with the same structure as source: the result has the same keys as source, and each value has the same shape as the corresponding value in source.

PARAMETER DESCRIPTION
source

The reference dictionary containing PyTorch tensors.

TYPE: Mapping[str, Tensor]

target

The input to be harmonized. It can be a dictionary, a tuple or list of tensors, or a single 1D tensor.

TYPE: TorchTensorContainerType

RETURNS DESCRIPTION
Dict[str, Tensor]

The harmonized version of target.

RAISES DESCRIPTION
ValueError

If target cannot be harmonized to match source.

Source code in src/pydvl/influence/torch/util.py
def align_structure(
    source: Mapping[str, torch.Tensor],
    target: TorchTensorContainerType,
) -> Dict[str, torch.Tensor]:
    """
    This function transforms `target` to have the same structure as `source`, i.e.,
    it should be a dictionary with the same keys as `source` and each corresponding
    value in `target` should have the same shape as the value in `source`.

    Args:
        source: The reference dictionary containing PyTorch tensors.
        target: The input to be harmonized. It can be a dictionary, tuple, or tensor.

    Returns:
        The harmonized version of `target`.

    Raises:
        ValueError: If `target` cannot be harmonized to match `source`.
    """

    tangent_dict: Dict[str, torch.Tensor]

    if isinstance(target, dict):

        if list(target.keys()) != list(source.keys()):
            raise ValueError("The keys in 'target' do not match the keys in 'source'.")

        if [v.shape for v in target.values()] != [v.shape for v in source.values()]:

            raise ValueError(
                "The shapes of the values in 'target' do not match the shapes "
                "of the values in 'source'."
            )

        tangent_dict = target

    elif isinstance(target, tuple) or isinstance(target, list):

        if [v.shape for v in target] != [v.shape for v in source.values()]:

            raise ValueError(
                "'target' is a tuple/list but its elements' shapes do not match "
                "the shapes of the values in 'source'."
            )

        tangent_dict = dict(zip(source.keys(), target))

    elif isinstance(target, torch.Tensor):

        try:
            tangent_dict = dict(
                zip(
                    source.keys(),
                    reshape_vector_to_tensors(
                        target, [p.shape for p in source.values()]
                    ),
                )
            )
        except Exception as e:
            # Chain the original exception so the full traceback is preserved.
            raise ValueError(
                f"'target' is a tensor but cannot be reshaped to match 'source'. "
                f"Original error: {e}"
            ) from e

    else:
        raise ValueError(f"'target' is of type {type(target)} which is not supported.")

    return tangent_dict
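
To make the tuple and flat-tensor branches concrete, here is a minimal sketch that does not call pyDVL. The flat-tensor case uses `torch.split` as a swapped-in technique in place of the library's own slicing helper, and the dictionary `source` is a made-up example.

```python
import math
import torch

# Hypothetical reference structure, e.g. the parameters of a small layer.
source = {"weight": torch.zeros(2, 3), "bias": torch.zeros(2)}

# Flat 1D tensor: cut it into pieces of the right size, then reshape each one.
flat = torch.arange(8.0)
sizes = [math.prod(v.shape) for v in source.values()]
pieces = torch.split(flat, sizes)
aligned = {k: p.view(v.shape) for (k, v), p in zip(source.items(), pieces)}

# Tuple or list: elements are paired with the keys of source in order.
as_tuple = (torch.ones(2, 3), torch.ones(2))
aligned_from_tuple = dict(zip(source.keys(), as_tuple))
```

In both cases the pairing is purely positional, so the order of the entries in `source` determines which slice or element ends up under which key.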

align_with_model(x, model)

Aligns an input to the model's parameter structure, i.e. transforms it into a dict with the same keys and tensor shapes as the entries of model.named_parameters() that have requires_grad=True.

PARAMETER DESCRIPTION
x

The input to be aligned. It can be a dictionary, a tuple or list of tensors, or a single 1D tensor.

TYPE: TorchTensorContainerType

model

model to use for alignment

TYPE: Module

RETURNS DESCRIPTION

The aligned version of x.

RAISES DESCRIPTION
ValueError

If x cannot be aligned to match the model's parameters.

Source code in src/pydvl/influence/torch/util.py
def align_with_model(x: TorchTensorContainerType, model: torch.nn.Module):
    """
    Aligns an input to the model's parameter structure, i.e. transforms it into a dict
    with the same keys as model.named_parameters() and matching tensor shapes

    Args:
        x: The input to be aligned. It can be a dictionary, tuple, or tensor.
        model: model to use for alignment

    Returns:
        The aligned version of `x`.

    Raises:
        ValueError: If `x` cannot be aligned to match the model's parameters.

    """
    model_params = {k: p for k, p in model.named_parameters() if p.requires_grad}
    return align_structure(model_params, x)
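
Because the reference structure is built only from parameters with `requires_grad=True`, frozen parameters do not count towards the expected keys or sizes. A short sketch with a standard `torch.nn.Linear` layer (variable names are illustrative):

```python
import torch

model = torch.nn.Linear(3, 2)
model.bias.requires_grad_(False)  # freeze the bias

# Same filter as in the source: keep trainable parameters only.
trainable = {
    k: tuple(p.shape) for k, p in model.named_parameters() if p.requires_grad
}

# Number of elements a flat input tensor would need to have.
n_expected = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Here a flat input would need 6 elements (the weight only), not 8, and a dictionary input would need the single key `"weight"`.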

flatten_dimensions(tensors, shape=None, concat_at=-1)

Flattens the dimensions of each tensor in the given iterable and concatenates them along a specified dimension.

This function takes an iterable of PyTorch tensors and flattens each tensor. Optionally, each tensor can be reshaped to a specified shape before concatenation. The concatenation is performed along the dimension specified by concat_at.

PARAMETER DESCRIPTION
tensors

An iterable containing PyTorch tensors to be flattened and concatenated.

TYPE: Iterable[Tensor]

shape

A tuple representing the desired shape to which each tensor is reshaped before concatenation. If None, tensors are flattened to 1D.

TYPE: Optional[Tuple[int, ...]] DEFAULT: None

concat_at

The dimension along which to concatenate the tensors.

TYPE: int DEFAULT: -1

RETURNS DESCRIPTION
Tensor

A single tensor resulting from the concatenation of the input tensors, each either flattened or reshaped as specified.

Example
>>> tensors = [torch.tensor([[1, 2], [3, 4]]), torch.tensor([[5, 6], [7, 8]])]
>>> flatten_dimensions(tensors)
tensor([1, 2, 3, 4, 5, 6, 7, 8])

>>> flatten_dimensions(tensors, shape=(2, 2), concat_at=0)
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
Source code in src/pydvl/influence/torch/util.py
def flatten_dimensions(
    tensors: Iterable[torch.Tensor],
    shape: Optional[Tuple[int, ...]] = None,
    concat_at: int = -1,
) -> torch.Tensor:
    """
    Flattens the dimensions of each tensor in the given iterable and concatenates them
    along a specified dimension.

    This function takes an iterable of PyTorch tensors and flattens each tensor.
    Optionally, each tensor can be reshaped to a specified shape before concatenation.
    The concatenation is performed along the dimension specified by `concat_at`.

    Args:
        tensors: An iterable containing PyTorch tensors to be flattened
            and concatenated.
        shape: A tuple representing the desired shape to which each tensor is reshaped
            before concatenation. If None, tensors are flattened to 1D.
        concat_at: The dimension along which to concatenate the tensors.

    Returns:
        A single tensor resulting from the concatenation of the input tensors,
        each either flattened or reshaped as specified.

    ??? Example
        ```pycon
        >>> tensors = [torch.tensor([[1, 2], [3, 4]]), torch.tensor([[5, 6], [7, 8]])]
        >>> flatten_dimensions(tensors)
        tensor([1, 2, 3, 4, 5, 6, 7, 8])

        >>> flatten_dimensions(tensors, shape=(2, 2), concat_at=0)
        tensor([[1, 2],
                [3, 4],
                [5, 6],
                [7, 8]])
        ```
    """
    return torch.cat(
        [t.reshape(-1) if shape is None else t.reshape(*shape) for t in tensors],
        dim=concat_at,
    )
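
Since `shape` is passed straight to `Tensor.reshape`, it may contain `-1`. One plausible use, sketched below with plain PyTorch calls and made-up tensors, is to keep a leading batch dimension and flatten everything else; this corresponds to calling the function with `shape=(batch, -1)` and the default `concat_at=-1`.

```python
import torch

batch = 2
# Hypothetical per-sample quantities with different trailing shapes.
grads = [torch.zeros(batch, 3, 2), torch.ones(batch, 5)]

# Equivalent to the body of flatten_dimensions with shape=(batch, -1).
flat = torch.cat([g.reshape(batch, -1) for g in grads], dim=-1)
```

The result has shape `(2, 11)`: 6 flattened entries from the first tensor followed by 5 from the second, per batch row.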

torch_dataset_to_dask_array(dataset, chunk_size, total_size=None, resulting_dtype=np.float32)

Construct a tuple of Dask arrays from a PyTorch dataset, using dask.delayed.

PARAMETER DESCRIPTION
dataset

A PyTorch dataset

TYPE: Dataset

chunk_size

The size of the chunks for the resulting Dask arrays.

TYPE: int

total_size

If the dataset does not implement __len__, provide the length via this parameter. If None, the length is inferred by iterating over the dataset once, which can cause severe overhead.

TYPE: Optional[int] DEFAULT: None

resulting_dtype

The dtype of the resulting dask.array.Array

DEFAULT: float32

Example
import torch
from torch.utils.data import TensorDataset
from pydvl.influence.torch.util import torch_dataset_to_dask_array
x = torch.rand((20, 3))
y = torch.rand((20, 1))
dataset = TensorDataset(x, y)
da_x, da_y = torch_dataset_to_dask_array(dataset, 4)

RETURNS DESCRIPTION
Tuple[Array, ...]

Tuple of Dask arrays corresponding to each tensor in the dataset.

Source code in src/pydvl/influence/torch/util.py
def torch_dataset_to_dask_array(
    dataset: Dataset,
    chunk_size: int,
    total_size: Optional[int] = None,
    resulting_dtype=np.float32,
) -> Tuple[da.Array, ...]:
    """
    Construct tuple of dask arrays from a PyTorch dataset, using dask.delayed

    Args:
        dataset: A PyTorch [dataset][torch.utils.data.Dataset]
        chunk_size: The size of the chunks for the resulting Dask arrays.
        total_size: If the dataset does not implement len, provide the length
            via this parameter. If None
            the length of the dataset is inferred via accessing the dataset once.
        resulting_dtype: The dtype of the resulting [dask.array.Array][dask.array.Array]

    ??? Example
        ```python
        import torch
        from torch.utils.data import TensorDataset
        x = torch.rand((20, 3))
        y = torch.rand((20, 1))
        dataset = TensorDataset(x, y)
        da_x, da_y = torch_dataset_to_dask_array(dataset, 4)
        ```

    Returns:
        Tuple of Dask arrays corresponding to each tensor in the dataset.
    """

    def _infer_data_len(d_set: Dataset):
        try:
            n_data = len(d_set)
            if total_size is not None and n_data != total_size:
                raise ValueError(
                    f"The number of samples in the dataset ({n_data}), derived "
                    f"from calling `len`, does not match the provided "
                    f"total number of samples ({total_size}). "
                    f"Call the function without total_size."
                )
            return n_data
        except TypeError as e:
            err_msg = (
                f"Could not infer the number of samples in the dataset from "
                f"calling `len`. Original error: {e}."
            )
            if total_size is not None:
                logger.warning(
                    err_msg
                    + f" Using the provided total number of samples {total_size}."
                )
                return total_size
            else:
                logger.warning(
                    err_msg + " Inferring the number of samples by "
                    "iterating over the dataset once. "
                    "This might induce severe overhead, so consider "
                    "providing total_size if you know the number of samples "
                    "beforehand."
                )
                idx = 0
                while True:
                    try:
                        t = d_set[idx]
                        if all(_t.numel() == 0 for _t in t):
                            return idx
                        idx += 1

                    except IndexError:
                        return idx

    sample = dataset[0]
    if not isinstance(sample, tuple):
        sample = (sample,)

    def _get_chunk(
        start_idx: int, stop_idx: int, d_set: Dataset
    ) -> Tuple[torch.Tensor, ...]:
        try:
            t = d_set[start_idx:stop_idx]
            if not isinstance(t, tuple):
                t = (t,)
            return t  # type:ignore
        except Exception:
            nested_tensor_list = [
                [d_set[idx][k] for idx in range(start_idx, stop_idx)]
                for k in range(len(sample))
            ]
            return tuple(map(torch.stack, nested_tensor_list))

    n_samples = _infer_data_len(dataset)
    chunk_indices = [
        (i, min(i + chunk_size, n_samples)) for i in range(0, n_samples, chunk_size)
    ]
    delayed_dataset = dask.delayed(dataset)
    delayed_chunks = [
        dask.delayed(partial(_get_chunk, start, stop))(delayed_dataset)
        for (start, stop) in chunk_indices
    ]

    delayed_arrays_dict: Dict[int, List[da.Array]] = {k: [] for k in range(len(sample))}

    for chunk, (start, stop) in zip(delayed_chunks, chunk_indices):
        for tensor_idx, sample_tensor in enumerate(sample):

            delayed_tensor = da.from_delayed(
                dask.delayed(lambda t: t.cpu().numpy())(chunk[tensor_idx]),
                shape=(stop - start, *sample_tensor.shape),
                dtype=resulting_dtype,
            )

            delayed_arrays_dict[tensor_idx].append(delayed_tensor)

    return tuple(
        da.concatenate(array_list) for array_list in delayed_arrays_dict.values()
    )
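
The chunking itself is independent of Dask and PyTorch. The sketch below isolates the index computation from the source above in a hypothetical helper, `chunk_bounds`, to show how the last chunk is shortened when the dataset size is not a multiple of `chunk_size`.

```python
from typing import List, Tuple

def chunk_bounds(n_samples: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs covering range(n_samples)."""
    return [
        (i, min(i + chunk_size, n_samples))
        for i in range(0, n_samples, chunk_size)
    ]

bounds = chunk_bounds(10, 4)
print(bounds)  # [(0, 4), (4, 8), (8, 10)]
```

For the documented example with 20 samples and `chunk_size=4`, this yields five chunks of four samples each, so each returned Dask array is built from five delayed blocks.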

Last update: 2023-12-21
Created: 2023-12-21