
Functional

hvp(func, params, vec, reverse_only=True)

Computes the Hessian-vector product (HVP) for a given function at given parameters, i.e.

\[\nabla_{\theta} \nabla_{\theta} f (\theta)\cdot v\]

This function operates in one of two modes: reverse-mode autodiff only, or a combination of forward- and reverse-mode autodiff.

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `func` | `Callable[[TorchTensorContainerType], Tensor]` | The scalar-valued function for which the HVP is computed. | *required* |
| `params` | `TorchTensorContainerType` | The parameters at which the HVP is computed. | *required* |
| `vec` | `TorchTensorContainerType` | The vector with which the Hessian is multiplied. | *required* |
| `reverse_only` | `bool` | Whether to use only reverse-mode autodiff (`True`, default) or both forward- and reverse-mode autodiff (`False`). | `True` |

| Returns | Description |
| --- | --- |
| `TorchTensorContainerType` | The HVP of the function at the given parameters with the given vector. |

Example:

```python
>>> def f(z): return torch.sum(z**2)
>>> u = torch.ones(10, requires_grad=True)
>>> v = torch.ones(10)
>>> hvp_vec = hvp(f, u, v)
>>> assert torch.allclose(hvp_vec, torch.full((10,), 2.0))
```

Source code in `src/pydvl/influence/torch/functional.py`

```python
def hvp(
    func: Callable[[TorchTensorContainerType], torch.Tensor],
    params: TorchTensorContainerType,
    vec: TorchTensorContainerType,
    reverse_only: bool = True,
) -> TorchTensorContainerType:
    r"""
    Computes the Hessian-vector product (HVP) for a given function at given parameters, i.e.

    \[\nabla_{\theta} \nabla_{\theta} f (\theta)\cdot v\]

    This function can operate in two modes, either reverse-mode autodiff only or both
    forward- and reverse-mode autodiff.

    Args:
        func: The scalar-valued function for which the HVP is computed.
        params: The parameters at which the HVP is computed.
        vec: The vector with which the Hessian is multiplied.
        reverse_only: Whether to use only reverse-mode autodiff
            (True, default) or both forward- and reverse-mode autodiff (False).

    Returns:
       The HVP of the function at the given parameters with the given vector.

    Example:
    ```python
    >>> def f(z): return torch.sum(z**2)
    >>> u = torch.ones(10, requires_grad=True)
    >>> v = torch.ones(10)
    >>> hvp_vec = hvp(f, u, v)
    >>> assert torch.allclose(hvp_vec, torch.full((10, ), 2.0))
    ```
    """

    output: TorchTensorContainerType

    if reverse_only:
        _, vjp_fn = vjp(grad(func), params)
        output = vjp_fn(vec)[0]
    else:
        output = jvp(grad(func), (params,), (vec,))[1]

    return output
```
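The example above passes a plain tensor for `params`. The rest of this module calls `hvp` with a dictionary of named parameters instead; the following is a minimal sketch of that form (the toy function `f` and the key `"w"` are illustrative assumptions, not from the source):

```python
import torch

from pydvl.influence.torch.functional import hvp

def f(params):
    # Scalar-valued function of a dict of tensors (hypothetical example).
    return torch.sum(params["w"] ** 2)

params = {"w": torch.ones(3, requires_grad=True)}
vec = {"w": torch.ones(3)}

# The Hessian of sum(w**2) is 2*I, so the HVP equals 2*vec.
result = hvp(f, params, vec)
assert torch.allclose(result["w"], torch.full((3,), 2.0))
```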

batch_hvp_gen(model, loss, data_loader, reverse_only=True)

Generates a sequence of batch Hessian-vector product (HVP) computations for the provided model, loss function, and data loader. If \(f_i\) is the model's loss on the \(i\)-th batch and \(\theta\) the model parameters, this is the sequence of callable matrix-vector products for the matrices

\[\nabla_{\theta}\nabla_{\theta}f_i(\theta), \quad i=1,\dots, \text{num_batches} \]

i.e. iterating over the data_loader, yielding partial function calls for calculating HVPs.

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Module` | The PyTorch model for which the HVP is calculated. | *required* |
| `loss` | `Callable[[Tensor, Tensor], Tensor]` | The loss function used to calculate the gradient and HVP. | *required* |
| `data_loader` | `DataLoader` | PyTorch DataLoader object containing the dataset for which the HVP is calculated. | *required* |
| `reverse_only` | `bool` | Whether to use only reverse-mode autodiff (`True`, default) or both forward- and reverse-mode autodiff (`False`). | `True` |

| Yields | Description |
| --- | --- |
| `Callable[[Tensor], Tensor]` | Partial functions `H_{batch}(vec) = hvp(model, loss, inputs, targets, vec)` that, when called, compute the Hessian-vector product `H(vec)` for the given model and loss in a batch-wise manner, where `(inputs, targets)` come from one batch. |

Source code in `src/pydvl/influence/torch/functional.py`

```python
def batch_hvp_gen(
    model: torch.nn.Module,
    loss: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
    data_loader: DataLoader,
    reverse_only: bool = True,
) -> Generator[Callable[[torch.Tensor], torch.Tensor], None, None]:
    r"""
    Generates a sequence of batch Hessian-vector product (HVP) computations for the provided model, loss function,
    and data loader. If \(f_i\) is the model's loss on the \(i\)-th batch and \(\theta\) the model parameters,
    this is the sequence of the callable matrix vector products for the matrices

    \[\nabla_{\theta}\nabla_{\theta}f_i(\theta), \quad i=1,\dots, \text{num_batches} \]

    i.e. iterating over the data_loader, yielding partial function calls for calculating HVPs.

    Args:
        model: The PyTorch model for which the HVP is calculated.
        loss: The loss function used to calculate the gradient and HVP.
        data_loader: PyTorch DataLoader object containing the dataset for which the HVP is calculated.
        reverse_only: Whether to use only reverse-mode autodiff
            (True, default) or both forward- and reverse-mode autodiff (False).

    Yields:
        Partial functions `H_{batch}(vec)=hvp(model, loss, inputs, targets, vec)` that when called,
            will compute the Hessian-vector product H(vec) for the given model and loss in a batch-wise manner, where
            (inputs, targets) coming from one batch.
    """

    for inputs, targets in iter(data_loader):
        batch_loss = batch_loss_function(model, loss, inputs, targets)
        model_params = dict(model.named_parameters())

        def batch_hvp(vec: torch.Tensor):
            return flatten_tensors_to_vector(
                hvp(
                    batch_loss,
                    model_params,
                    align_structure(model_params, vec),
                    reverse_only=reverse_only,
                ).values()
            )

        yield batch_hvp
```
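To make the yielded callables concrete, here is a minimal usage sketch; the toy model, loss, data, and batch size are assumptions, not part of the source:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch.functional import batch_hvp_gen

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss
loader = DataLoader(TensorDataset(torch.randn(32, 5), torch.randn(32, 1)), batch_size=8)

# One flat vector matching the total parameter count (5 weights + 1 bias).
v = torch.randn(sum(p.numel() for p in model.parameters()))

# Call each yielded function before advancing the generator; each one closes
# over the loss of the batch that produced it.
batch_products = [h(v) for h in batch_hvp_gen(model, loss, loader)]  # 4 tensors of shape (6,)
```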

empirical_loss_function(model, loss, data_loader)

Creates a function to compute the empirical loss of a given model on a given dataset. If we denote the model parameters with \( \theta \), the resulting function approximates:

\[f(\theta) = \frac{1}{N}\sum_{i=1}^N \operatorname{loss}(y_i, \operatorname{model}(\theta, x_i))\]

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Module` | The model for which the loss should be computed. | *required* |
| `loss` | `Callable[[Tensor, Tensor], Tensor]` | The loss function to be used. | *required* |
| `data_loader` | `DataLoader` | The data loader for iterating over the dataset. | *required* |

| Returns | Description |
| --- | --- |
| `Callable[[Dict[str, Tensor]], Tensor]` | A function that computes the empirical loss of the model on the dataset for given model parameters. |

Source code in `src/pydvl/influence/torch/functional.py`

```python
def empirical_loss_function(
    model: torch.nn.Module,
    loss: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
    data_loader: DataLoader,
) -> Callable[[Dict[str, torch.Tensor]], torch.Tensor]:
    r"""
    Creates a function to compute the empirical loss of a given model on a given dataset.
    If we denote the model parameters with \( \theta \), the resulting function approximates:

    \[f(\theta) = \frac{1}{N}\sum_{i=1}^N \operatorname{loss}(y_i, \operatorname{model}(\theta, x_i))\]

    Args:
        model: The model for which the loss should be computed.
        loss: The loss function to be used.
        data_loader: The data loader for iterating over the dataset.

    Returns:
        A function that computes the empirical loss of the model on the dataset for given model parameters.

    """

    def empirical_loss(params: Dict[str, torch.Tensor]):
        total_loss = to_model_device(torch.zeros((), requires_grad=True), model)
        total_samples = to_model_device(torch.zeros(()), model)

        for x, y in iter(data_loader):
            output = functional_call(
                model, params, (to_model_device(x, model),), strict=True
            )
            loss_value = loss(output, to_model_device(y, model))
            total_loss = total_loss + loss_value * x.size(0)
            total_samples += x.size(0)

        return total_loss / total_samples

    return empirical_loss
```
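As a usage sketch (the toy model and data below are assumptions), the returned function maps a parameter dictionary to a scalar loss:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch.functional import empirical_loss_function

model = torch.nn.Linear(3, 1)
loader = DataLoader(TensorDataset(torch.randn(16, 3), torch.randn(16, 1)), batch_size=4)

f = empirical_loss_function(model, torch.nn.functional.mse_loss, loader)
params = {k: p.detach() for k, p in model.named_parameters()}
print(f(params))  # scalar tensor: sample-weighted average loss over all 16 points
```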

batch_loss_function(model, loss, x, y)

Creates a function to compute the loss of a given model on a given batch of data, i.e. for the \(i\)-th batch \(B_i\)

\[\frac{1}{|B_i|}\sum_{x,y \in B_i} \operatorname{loss}(y, \operatorname{model}(\theta, x))\]

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Module` | The model for which the loss should be computed. | *required* |
| `loss` | `Callable[[Tensor, Tensor], Tensor]` | The loss function to be used. | *required* |
| `x` | `Tensor` | The input data for the batch. | *required* |
| `y` | `Tensor` | The true labels for the batch. | *required* |

| Returns | Description |
| --- | --- |
| `Callable[[Dict[str, Tensor]], Tensor]` | A function that computes the loss of the model on the batch for given model parameters. |

Source code in `src/pydvl/influence/torch/functional.py`

```python
def batch_loss_function(
    model: torch.nn.Module,
    loss: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
    x: torch.Tensor,
    y: torch.Tensor,
) -> Callable[[Dict[str, torch.Tensor]], torch.Tensor]:
    r"""
    Creates a function to compute the loss of a given model on a given batch of data, i.e. for the $i$-th batch $B_i$

    \[\frac{1}{|B_i|}\sum_{x,y \in B_i} \operatorname{loss}(y, \operatorname{model}(\theta, x))\]

    Args:
        model: The model for which the loss should be computed.
        loss: The loss function to be used.
        x: The input data for the batch.
        y: The true labels for the batch.

    Returns:
        A function that computes the loss of the model on the batch for given model parameters.
    """

    def batch_loss(params: Dict[str, torch.Tensor]):
        outputs = functional_call(
            model, params, (to_model_device(x, model),), strict=True
        )
        return loss(outputs, y)

    return batch_loss
```
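Since the returned function is purely functional in the parameters, it composes directly with `torch.func` transforms such as `grad`; a minimal sketch (the toy model and batch are assumptions):

```python
import torch
from torch.func import grad

from pydvl.influence.torch.functional import batch_loss_function

model = torch.nn.Linear(3, 1)
x, y = torch.randn(8, 3), torch.randn(8, 1)

f = batch_loss_function(model, torch.nn.functional.mse_loss, x, y)
params = {k: p.detach() for k, p in model.named_parameters()}

grads = grad(f)(params)  # dict of gradients with the same keys and shapes as params
```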

get_hvp_function(model, loss, data_loader, use_hessian_avg=True, reverse_only=True, track_gradients=False)

Returns a function that calculates the approximate Hessian-vector product for a given vector. If you want to compute the exact Hessian, i.e. pull all data into memory and perform a full gradient computation, use the function hvp.

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Module` | A PyTorch module representing the model whose loss function's Hessian is to be computed. | *required* |
| `loss` | `Callable[[Tensor, Tensor], Tensor]` | A callable that takes the model's output and target as input and returns the scalar loss. | *required* |
| `data_loader` | `DataLoader` | A DataLoader instance that provides batches of data for calculating the Hessian-vector product. Each batch is assumed to be a tuple whose first element is the model's input and whose second element is the target output. | *required* |
| `use_hessian_avg` | `bool` | If `True`, the returned function uses batch-wise Hessian computation via batch_loss_function and averages the results. If `False`, it backpropagates through the full empirical_loss_function, which is more accurate than averaging the batch Hessians but typically has a considerably higher memory usage. | `True` |
| `reverse_only` | `bool` | Whether to use only reverse-mode autodiff (`True`, default) or both forward- and reverse-mode autodiff (`False`). | `True` |
| `track_gradients` | `bool` | Whether to track gradients for the resulting tensor of the Hessian-vector products (`False`, default). | `False` |

| Returns | Description |
| --- | --- |
| `Callable[[Tensor], Tensor]` | A function that takes a single argument, a vector, and returns the product of the Hessian of the loss function with respect to the model's parameters and the input vector. |

Source code in `src/pydvl/influence/torch/functional.py`

```python
def get_hvp_function(
    model: torch.nn.Module,
    loss: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
    data_loader: DataLoader,
    use_hessian_avg: bool = True,
    reverse_only: bool = True,
    track_gradients: bool = False,
) -> Callable[[torch.Tensor], torch.Tensor]:
    """
    Returns a function that calculates the approximate Hessian-vector product for a given vector. If you want to
    compute the exact Hessian, i.e. pull all data into memory and perform a full gradient computation, use
    the function `hvp`.

    Args:
        model: A PyTorch module representing the model whose loss function's Hessian is to be computed.
        loss: A callable that takes the model's output and target as input and returns the scalar loss.
        data_loader: A DataLoader instance that provides batches of data for calculating the Hessian-vector product.
            Each batch from the DataLoader is assumed to return a tuple where the first element
            is the model's input and the second element is the target output.
        use_hessian_avg: If True, the returned function uses batch-wise Hessian computation via
            [batch_loss_function][pydvl.influence.torch.functional.batch_loss_function] and averages the results.
            If False, the function uses backpropagation on the full
            [empirical_loss_function][pydvl.influence.torch.functional.empirical_loss_function],
            which is more accurate than averaging the batch Hessians, but typically has a considerably higher memory usage.
        reverse_only: Whether to use only reverse-mode autodiff (True, default) or
            both forward- and reverse-mode autodiff (False).
        track_gradients: Whether to track gradients for the resulting tensor of the Hessian-vector
            products (False, default).

    Returns:
        A function that takes a single argument, a vector, and returns the product of the Hessian of the `loss`
            function with respect to the `model`'s parameters and the input vector.
    """

    params = {
        k: p if track_gradients else p.detach() for k, p in model.named_parameters()
    }

    def hvp_function(vec: torch.Tensor) -> torch.Tensor:
        v = align_structure(params, vec)
        empirical_loss = empirical_loss_function(model, loss, data_loader)
        return flatten_tensors_to_vector(
            hvp(empirical_loss, params, v, reverse_only=reverse_only).values()
        )

    def avg_hvp_function(vec: torch.Tensor) -> torch.Tensor:
        v = align_structure(params, vec)
        batch_hessians_vector_products: Iterable[torch.Tensor] = map(
            lambda x: x(v), batch_hvp_gen(model, loss, data_loader, reverse_only)
        )

        num_batches = len(data_loader)
        avg_hessian = to_model_device(torch.zeros_like(vec), model)

        for batch_hvp in batch_hessians_vector_products:
            avg_hessian += batch_hvp

        return avg_hessian / float(num_batches)

    return avg_hvp_function if use_hessian_avg else hvp_function
```
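A minimal usage sketch, with a toy model and data as assumptions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch.functional import get_hvp_function

model = torch.nn.Linear(4, 1)
loader = DataLoader(TensorDataset(torch.randn(40, 4), torch.randn(40, 1)), batch_size=10)

hvp_fn = get_hvp_function(model, torch.nn.functional.mse_loss, loader, use_hessian_avg=True)

v = torch.randn(sum(p.numel() for p in model.parameters()))
Hv = hvp_fn(v)  # flat tensor with the same shape as v
```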
