pydvl.influence.torch.functional
This module provides methods for efficiently computing tensors related to first
and second order derivatives of torch models, using functionality from
torch.func.
To indicate higher-order functions, i.e. functions which return functions,
we use the naming convention create_**_function.
In particular, the module contains functionality for
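- Sample, batch-wise and empirical loss functions: create_per_sample_loss_function, create_batch_loss_function, create_empirical_loss_function
- Per-sample gradient and Jacobian product functions: create_per_sample_gradient_function, create_per_sample_mixed_derivative_function, create_matrix_jacobian_product_function
- Hessian, low rank approximation of Hessian and Hessian vector products: hvp, create_hvp_function, create_batch_hvp_function, hessian, lanzcos_low_rank_hessian_approx, model_hessian_low_rank, randomized_nystroem_approximation, model_hessian_nystroem_approximation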
LowRankProductRepresentation
dataclass
hvp
hvp(
func: Callable[[Dict[str, Tensor]], Tensor],
params: Dict[str, Tensor],
vec: Dict[str, Tensor],
reverse_only: bool = True,
) -> Dict[str, Tensor]
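Computes the Hessian-vector product (HVP) for a given function at the given parameters, i.e.
\[ \nabla_{\theta} \nabla_{\theta} f(\theta) \cdot v, \]
where \(f\) is func, \(\theta\) is params and \(v\) is vec.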
This function can operate in two modes, either reverse-mode autodiff only or both forward- and reverse-mode autodiff.
PARAMETER | DESCRIPTION |
---|---|
func | The scalar-valued function for which the HVP is computed. |
params | The parameters at which the HVP is computed. |
vec | The vector with which the Hessian is multiplied. |
reverse_only | Whether to use only reverse-mode autodiff (True, default) or both forward- and reverse-mode autodiff (False). TYPE: bool |

RETURNS | DESCRIPTION |
---|---|
Dict[str, Tensor] | The HVP of the function at the given parameters with the given vector. |
Example
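The following is a minimal sketch of the two modes, written directly against torch.func rather than calling pydvl. The helper name hvp_sketch and the quadratic test function are assumptions made for illustration. For \(f(\theta) = \tfrac{1}{2}\theta^T A \theta\) with symmetric \(A\), the HVP is \(A v\), which both modes should reproduce.

```python
import torch
from torch.func import grad, jvp, vjp


def hvp_sketch(func, params, vec, reverse_only=True):
    """Hessian-vector product of a scalar function of a dict of tensors."""
    if reverse_only:
        # Reverse-over-reverse: pull `vec` back through the gradient function.
        _, vjp_fn = vjp(grad(func), params)
        return vjp_fn(vec)[0]
    # Forward-over-reverse: push `vec` forward through the gradient function.
    return jvp(grad(func), (params,), (vec,))[1]


A = torch.tensor([[2.0, 1.0], [1.0, 3.0]])  # symmetric, so the Hessian of f is A


def f(p):
    return 0.5 * p["w"] @ A @ p["w"]


params = {"w": torch.tensor([1.0, -1.0])}
vec = {"w": torch.tensor([0.5, 2.0])}
out_reverse = hvp_sketch(f, params, vec, reverse_only=True)
out_mixed = hvp_sketch(f, params, vec, reverse_only=False)
```

Both results are dictionaries with the same keys as params, matching the return type documented above.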
Source code in src/pydvl/influence/torch/functional.py
create_batch_hvp_function
create_batch_hvp_function(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
reverse_only: bool = True,
) -> Callable[[Dict[str, Tensor], Tensor, Tensor, Tensor], Tensor]
Creates a function to compute the Hessian-vector product (HVP) for a given model and loss function, where the Hessian information is computed for a provided batch.
This function takes a PyTorch model, a loss function, and an optional boolean parameter. It returns a callable that computes the Hessian-vector product for batches of input data and a given vector. Depending on the reverse_only parameter, the computation uses reverse-mode autodiff only or both forward- and reverse-mode autodiff.
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which the Hessian-vector product is to be computed. TYPE: Module |
loss | The loss function. It should take two torch.Tensor objects as input and return a torch.Tensor. |
reverse_only | If True, the Hessian-vector product is computed in reverse mode only. TYPE: bool |

RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor, Tensor], Tensor] | A function that takes a dictionary of model parameters and three torch.Tensor objects (input data, target data and a vector) and returns the Hessian-vector product of the loss on that batch with the vector. |
Example
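As a rough illustration of the higher-order pattern, the sketch below builds a batch HVP function from torch.func primitives. It is not pydvl's implementation: the name make_batch_hvp is hypothetical, and for brevity the vector is passed and returned as a parameter dictionary instead of a flat tensor. For a bias-free linear model with mean squared error, the Hessian is \(\tfrac{2}{N} X^T X\), which gives a simple check.

```python
import torch
from torch.func import functional_call, grad, vjp


def make_batch_hvp(model, loss):
    """Return a function (params, x, y, vec) -> HVP of the batch loss."""

    def batch_hvp(params, x, y, vec):
        def batch_loss(p):
            # Evaluate the model with explicit parameters p.
            return loss(functional_call(model, p, (x,)), y)

        # Reverse-mode only: vector-Jacobian product of the gradient function.
        _, vjp_fn = vjp(grad(batch_loss), params)
        return vjp_fn(vec)[0]

    return batch_hvp


torch.manual_seed(0)
model = torch.nn.Linear(2, 1, bias=False)
params = {k: v.detach() for k, v in model.named_parameters()}
x, y = torch.randn(8, 2), torch.randn(8, 1)
vec = {"weight": torch.tensor([[1.0, -2.0]])}
result = make_batch_hvp(model, torch.nn.functional.mse_loss)(params, x, y, vec)
```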
create_empirical_loss_function
create_empirical_loss_function(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
) -> Callable[[Dict[str, Tensor]], Tensor]
Creates a function to compute the empirical loss of a given model on a given dataset. If we denote the model parameters with \( \theta \), the resulting function approximates:
\[ f(\theta) = \frac{1}{N}\sum_{i=1}^N \operatorname{loss}(\operatorname{model}(\theta, x_i), y_i) \]
for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\), where \(N\) is the number of all elements provided by the data_loader.
PARAMETER | DESCRIPTION |
---|---|
model | The model for which the loss should be computed. TYPE: Module |
loss | The loss function to be used. |
data_loader | The data loader for iterating over the dataset. TYPE: DataLoader |

RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor]], Tensor] | A function that computes the empirical loss of the model on the dataset for given model parameters. |
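
One way to realize such a function is to accumulate batch losses weighted by batch size. The sketch below assumes a mean-reducing loss and batches of the form (input, target); make_empirical_loss is a hypothetical name, not the library function.

```python
import torch
from torch.func import functional_call
from torch.utils.data import DataLoader, TensorDataset


def make_empirical_loss(model, loss, data_loader):
    def empirical_loss(params):
        total, n_samples = 0.0, 0
        for x, y in data_loader:
            batch_loss = loss(functional_call(model, params, (x,)), y)
            # Undo the per-batch mean so unequal batch sizes are weighted correctly.
            total = total + batch_loss * x.shape[0]
            n_samples += x.shape[0]
        return total / n_samples

    return empirical_loss


torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
params = {k: v.detach() for k, v in model.named_parameters()}
x, y = torch.randn(10, 3), torch.randn(10, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=4)  # batches of 4, 4 and 2
value = make_empirical_loss(model, torch.nn.functional.mse_loss, loader)(params)
```

With this weighting the result agrees with the loss evaluated on the full dataset at once, even though the last batch is smaller.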
create_batch_loss_function
create_batch_loss_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor]
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor]
Creates a function to compute the loss of a given model on a given batch of data, i.e. the function
\[ f(\theta, x, y) = \frac{1}{N} \sum_{i=1}^N \operatorname{loss}(\operatorname{model}(\theta, x_i), y_i) \]
for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\), where \(N\) is the number of elements in the batch.

PARAMETER | DESCRIPTION |
---|---|
model | The model for which the loss should be computed. |
loss | The loss function to be used, which should be able to handle a batch dimension. |

RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor] | A function that computes the loss of the model on a batch for given model parameters. The model parameter input to the function must be a dict conforming to model.named_parameters(), i.e. the keys must be a subset of the parameter names and the corresponding tensor shapes must align. For the data input, the first dimension has to be the batch dimension. |
create_hvp_function
create_hvp_function(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
precompute_grad: bool = True,
use_average: bool = True,
reverse_only: bool = True,
track_gradients: bool = False,
) -> Callable[[Tensor], Tensor]
Returns a function that calculates the approximate Hessian-vector product for a given vector. To compute the exact Hessian-vector product instead, i.e. pulling all data into memory and performing a full gradient computation, use the function hvp.
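PARAMETER | DESCRIPTION |
---|---|
model | A PyTorch module representing the model whose loss function's Hessian is to be computed. TYPE: Module |
loss | A callable that takes the model's output and target as input and returns the scalar loss. |
data_loader | A DataLoader instance that provides batches of data for calculating the Hessian-vector product. Each batch from the DataLoader is assumed to return a tuple where the first element is the model's input and the second element is the target output. TYPE: DataLoader |
precompute_grad | If True, the full data gradient is precomputed and kept in memory, which can speed up the Hessian-vector product computation. Set this to False if you cannot afford to keep the full computation graph in memory. TYPE: bool |
use_average | If True, the Hessian-vector product is averaged over the mini-batches; if False, it is computed from the empirical loss over the entire dataset. TYPE: bool |
reverse_only | Whether to use only reverse-mode autodiff or both forward- and reverse-mode autodiff. Ignored if precompute_grad is True. TYPE: bool |
track_gradients | Whether to track gradients for the resulting tensor of the Hessian-vector products. TYPE: bool |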

RETURNS | DESCRIPTION |
---|---|
Callable[[Tensor], Tensor] | A function that takes a single argument, a vector, and returns the product of the Hessian of the loss function with respect to the model's parameters and the input vector. |
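
To illustrate the difference between averaging batch-wise HVPs and differentiating the loss over the whole dataset, here is a hedged sketch built on torch.func. The name make_hvp is hypothetical, the vector is a parameter dictionary instead of the flat tensor used by the documented function, and equal-sized batches with a mean-reducing loss are assumed, in which case both variants coincide.

```python
import torch
from torch.func import functional_call, grad, vjp
from torch.utils.data import DataLoader, TensorDataset


def make_hvp(model, loss, data_loader, use_average=True):
    params = {k: v.detach() for k, v in model.named_parameters()}

    def batch_hvp(x, y, vec):
        f = lambda p: loss(functional_call(model, p, (x,)), y)
        _, vjp_fn = vjp(grad(f), params)
        return vjp_fn(vec)[0]

    def averaged(vec):
        # One HVP per mini-batch, then an unweighted mean over batches.
        results = [batch_hvp(x, y, vec) for x, y in data_loader]
        return {k: sum(r[k] for r in results) / len(results) for k in vec}

    def full(vec):
        # Pull all data into memory and differentiate a single loss.
        xs, ys = zip(*data_loader)
        return batch_hvp(torch.cat(xs), torch.cat(ys), vec)

    return averaged if use_average else full


torch.manual_seed(0)
mse = torch.nn.functional.mse_loss
model = torch.nn.Linear(2, 1)
x, y = torch.randn(8, 2), torch.randn(8, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=4)
vec = {"weight": torch.tensor([[1.0, -2.0]]), "bias": torch.tensor([0.5])}
avg = make_hvp(model, mse, loader, use_average=True)(vec)
full = make_hvp(model, mse, loader, use_average=False)(vec)
```

The averaged variant only ever holds one batch in the autodiff graph, which is the memory trade-off the parameters above describe.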
hessian
hessian(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
use_hessian_avg: bool = True,
track_gradients: bool = False,
) -> Tensor
Computes the Hessian matrix for a given model and loss function.
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which the Hessian is computed. TYPE: Module |
loss | A callable that computes the loss. |
data_loader | DataLoader providing batches of input data and corresponding ground truths. TYPE: DataLoader |
use_hessian_avg | Flag to indicate whether the average Hessian across mini-batches should be computed. If False, the empirical loss across the entire dataset is used. TYPE: bool |
track_gradients | Whether to track gradients for the resulting tensor of the Hessian-vector products. TYPE: bool |

RETURNS | DESCRIPTION |
---|---|
Tensor | A tensor representing the Hessian matrix. The shape of the tensor will be (n_parameters, n_parameters), where n_parameters is the number of trainable parameters in the model. |
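
For small models the full matrix can be assembled explicitly. The sketch below uses torch.func.hessian on a single batch and stitches the per-parameter blocks into an (n_parameters, n_parameters) matrix; full_hessian is a hypothetical helper and the parameter ordering follows model.named_parameters(), which is an assumption of this example.

```python
import torch
from torch.func import functional_call, hessian


def full_hessian(model, loss, x, y):
    params = {k: v.detach() for k, v in model.named_parameters()}
    names = list(params)

    def f(p):
        return loss(functional_call(model, p, (x,)), y)

    # blocks[a][b] holds the second derivatives w.r.t. parameters a and b.
    blocks = hessian(f)(params)
    rows = [
        torch.cat(
            [blocks[a][b].reshape(params[a].numel(), params[b].numel()) for b in names],
            dim=1,
        )
        for a in names
    ]
    return torch.cat(rows, dim=0)


torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
x, y = torch.randn(8, 2), torch.randn(8, 1)
H = full_hessian(model, torch.nn.functional.mse_loss, x, y)
```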
create_per_sample_loss_function
create_per_sample_loss_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor]
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor]
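Generates a function to compute per-sample losses using PyTorch's vmap, i.e. the vector-valued function
\[ f(\theta, x, y) = (\operatorname{loss}(\operatorname{model}(\theta, x_1), y_1), \dots, \operatorname{loss}(\operatorname{model}(\theta, x_N), y_N)) \]
for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\), where \(N\) is the number of elements in the batch.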
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which per-sample losses will be computed. TYPE: Module |
loss | A callable that computes the loss. |

RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor] | A callable that computes the loss for each sample in the batch, given a dictionary of model parameters, the model's input, and the true values. The callable will return a tensor where each entry corresponds to the loss of the corresponding sample. |
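
A minimal sketch of the vmap pattern, assuming a model that accepts a leading batch dimension of size one; make_per_sample_loss is a hypothetical name used only here.

```python
import torch
from torch.func import functional_call, vmap


def make_per_sample_loss(model, loss):
    def single_loss(params, x, y):
        # Re-add a batch dimension of size 1 for the single sample.
        out = functional_call(model, params, (x.unsqueeze(0),))
        return loss(out, y.unsqueeze(0))

    # Share the parameters (None) and map over the batch axis of x and y.
    return vmap(single_loss, in_dims=(None, 0, 0))


torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
params = {k: v.detach() for k, v in model.named_parameters()}
x, y = torch.randn(5, 3), torch.randn(5, 1)
losses = make_per_sample_loss(model, torch.nn.functional.mse_loss)(params, x, y)
```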
create_per_sample_gradient_function
create_per_sample_gradient_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor]
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Dict[str, Tensor]]
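Generates a function to compute the per-sample gradient of the loss with respect to the model's parameters, i.e. the tensor-valued function
\[ f(\theta, x, y) = (\nabla_{\theta}\operatorname{loss}(\operatorname{model}(\theta, x_1), y_1), \dots, \nabla_{\theta}\operatorname{loss}(\operatorname{model}(\theta, x_N), y_N)) \]
for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\), where \(N\) is the number of elements in the batch.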
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which per-sample gradients will be computed. TYPE: Module |
loss | A callable that computes the loss. |

RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Dict[str, Tensor]] | A callable that takes a dictionary of model parameters, the model's input, and the labels. It returns a dictionary with the same keys as the model's named parameters. Each entry in the returned dictionary corresponds to the gradient of the corresponding model parameter for each sample in the batch. |
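
Composing grad with vmap gives per-sample gradients without a Python loop. The following sketch is an illustration under the same assumptions as a plain torch.func workflow; make_per_sample_grad is a hypothetical name. Each returned tensor has a leading batch dimension followed by the parameter's shape.

```python
import torch
from torch.func import functional_call, grad, vmap


def make_per_sample_grad(model, loss):
    def single_loss(params, x, y):
        out = functional_call(model, params, (x.unsqueeze(0),))
        return loss(out, y.unsqueeze(0))

    # grad differentiates w.r.t. the first argument (the parameter dict).
    return vmap(grad(single_loss), in_dims=(None, 0, 0))


torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
params = {k: v.detach() for k, v in model.named_parameters()}
x, y = torch.randn(5, 3), torch.randn(5, 1)
grads = make_per_sample_grad(model, torch.nn.functional.mse_loss)(params, x, y)
```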
create_matrix_jacobian_product_function
create_matrix_jacobian_product_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor], g: Tensor
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor]
Generates a function to compute the matrix-Jacobian product (MJP) of the per-sample loss with respect to the model's parameters, i.e. the function
\[ f(\theta, x, y) = g \, \big(\nabla_{\theta}\operatorname{loss}(\operatorname{model}(\theta, x_i), y_i)\big)_{i}^{T} \]
for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\).
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which the MJP will be computed. TYPE: Module |
loss | A callable that computes the loss. |
g | Matrix for which the product with the Jacobian will be computed. The shape of this matrix should be consistent with the shape of the Jacobian. TYPE: Tensor |

RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor] | A callable that takes a dictionary of model parameters, the model's input, and the labels. The callable returns the matrix-Jacobian product of the per-sample loss with respect to the model's parameters for the given matrix g. |
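
The product can be formed without materializing the per-sample gradients by taking one forward-mode directional derivative per row of g. The sketch below is one possible realization, not the library code: make_mjp and unflatten are hypothetical helpers, g is assumed to have shape (K, n_parameters) with columns ordered like model.named_parameters(), and a plain Python loop over rows is used for clarity.

```python
import torch
from torch.func import functional_call, jvp, vmap


def make_mjp(model, loss, g):
    def single_loss(params, x, y):
        out = functional_call(model, params, (x.unsqueeze(0),))
        return loss(out, y.unsqueeze(0))

    per_sample_loss = vmap(single_loss, in_dims=(None, 0, 0))

    def unflatten(row, params):
        # Split a flat vector into tensors shaped like the parameters.
        out, start = {}, 0
        for name, p in params.items():
            out[name] = row[start:start + p.numel()].reshape(p.shape)
            start += p.numel()
        return out

    def mjp(params, x, y):
        rows = []
        for row in g:
            # Directional derivative of all N per-sample losses along `row`.
            _, tangent_out = jvp(
                lambda p: per_sample_loss(p, x, y), (params,), (unflatten(row, params),)
            )
            rows.append(tangent_out)
        return torch.stack(rows)  # shape (K, N)

    return mjp


torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
params = {k: v.detach() for k, v in model.named_parameters()}
x, y = torch.randn(5, 3), torch.randn(5, 1)
g = torch.randn(2, 4)
out = make_mjp(model, torch.nn.functional.mse_loss, g)(params, x, y)
```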
create_per_sample_mixed_derivative_function
create_per_sample_mixed_derivative_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor]
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Dict[str, Tensor]]
Generates a function to compute the mixed derivatives of the per-sample loss with respect to the model parameters and the input, i.e. the function
\[ f(\theta, x, y) = \nabla_{\theta}\nabla_{x}\operatorname{loss}(\operatorname{model}(\theta, x), y) \]
for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\).
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which the mixed derivatives are computed. TYPE: Module |
loss | A callable that computes the loss. |

RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Dict[str, Tensor]] | A callable that takes a dictionary of model parameters, the model's input, and the labels. The callable returns the mixed derivatives of the per-sample loss with respect to the model's parameters and input. |
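
A possible way to obtain these mixed derivatives with torch.func is to take the gradient with respect to the input and then the Jacobian of that gradient with respect to the parameters, mapped over the batch. The helper name make_mixed_derivative and the resulting axis order (batch, input dimensions, parameter dimensions) are assumptions of this sketch.

```python
import torch
from torch.func import functional_call, grad, jacrev, vmap


def make_mixed_derivative(model, loss):
    def single_loss(params, x, y):
        out = functional_call(model, params, (x.unsqueeze(0),))
        return loss(out, y.unsqueeze(0))

    # Inner: gradient w.r.t. the input x (argnums=1).
    # Outer: Jacobian of that gradient w.r.t. the parameters (argnums=0).
    mixed = jacrev(grad(single_loss, argnums=1), argnums=0)
    return vmap(mixed, in_dims=(None, 0, 0))


torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
params = {k: v.detach() for k, v in model.named_parameters()}
x, y = torch.randn(5, 3), torch.randn(5, 1)
mixed = make_mixed_derivative(model, torch.nn.functional.mse_loss)(params, x, y)
```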
lanzcos_low_rank_hessian_approx
lanzcos_low_rank_hessian_approx(
hessian_vp: Callable[[Tensor], Tensor],
matrix_shape: Tuple[int, int],
hessian_perturbation: float = 0.0,
rank_estimate: int = 10,
krylov_dimension: Optional[int] = None,
tol: float = 1e-06,
max_iter: Optional[int] = None,
device: Optional[device] = None,
eigen_computation_on_gpu: bool = False,
torch_dtype: Optional[dtype] = None,
) -> LowRankProductRepresentation
Calculates a low-rank approximation of the Hessian matrix of a scalar-valued function using the implicitly restarted Lanczos algorithm, i.e.:
\[ H_{\text{approx}} = V D V^T \]
where \(D\) is a diagonal matrix with the top (in absolute value) rank_estimate eigenvalues of the Hessian and \(V\) contains the corresponding eigenvectors.
PARAMETER | DESCRIPTION |
---|---|
hessian_vp | A function that takes a vector and returns the product of the Hessian of the loss function with that vector. |
matrix_shape | The shape of the matrix represented by the Hessian-vector product. |
hessian_perturbation | Regularization parameter added to the Hessian-vector product for numerical stability. TYPE: float |
rank_estimate | The number of eigenvalues and corresponding eigenvectors to compute. Represents the desired rank of the Hessian approximation. TYPE: int |
krylov_dimension | The number of Krylov vectors to use for the Lanczos method. If not provided, it defaults to \( \min(\text{model.n_parameters}, \max(2 \times \text{rank_estimate} + 1, 20)) \). |
tol | The stopping criterion for the Lanczos algorithm, which stops when the difference in the approximated eigenvalue is less than tol. TYPE: float |
max_iter | The maximum number of iterations for the Lanczos method. If not provided, it defaults to \( 10 \cdot \text{model.n_parameters}\). |
device | The device to use for executing the Hessian-vector product. |
eigen_computation_on_gpu | If True, tries to execute the eigen pair approximation on the provided device via cupy implementation. Ensure that either your model is small enough, or you use a small rank_estimate to fit your device's memory. If False, the eigen pair approximation is executed on the CPU with scipy's wrapper to ARPACK. TYPE: bool |
torch_dtype | If not provided, the current torch default dtype is used for conversion to torch. |

RETURNS | DESCRIPTION |
---|---|
LowRankProductRepresentation | LowRankProductRepresentation instance that contains the top (up until rank_estimate) eigenvalues and corresponding eigenvectors of the Hessian. |
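
To make the factorization \(V D V^T\) concrete, the sketch below runs a plain Lanczos iteration with full reorthogonalization in torch. This is deliberately simpler than the implicitly restarted variant from ARPACK that the function above relies on, and it only needs a matrix-vector product. The name lanczos_low_rank, the fixed start vector and the breakdown threshold are assumptions of this example.

```python
import torch


def lanczos_low_rank(hessian_vp, n, rank, krylov_dim, dtype=torch.float64):
    q = torch.ones(n, dtype=dtype) / n ** 0.5  # fixed unit start vector (assumption)
    basis, alphas, betas = [q], [], []
    for j in range(krylov_dim):
        w = hessian_vp(basis[j])
        alphas.append(torch.dot(w, basis[j]))
        # Full reorthogonalization against all Lanczos vectors found so far.
        for b in basis:
            w = w - torch.dot(w, b) * b
        beta = torch.linalg.norm(w)
        if j == krylov_dim - 1 or beta < 1e-10:
            break
        betas.append(beta)
        basis.append(w / beta)
    # Assemble the tridiagonal matrix T = Q^T H Q.
    m = len(alphas)
    T = torch.zeros(m, m, dtype=dtype)
    for i in range(m):
        T[i, i] = alphas[i]
    for i in range(m - 1):
        T[i, i + 1] = T[i + 1, i] = betas[i]
    evals, evecs = torch.linalg.eigh(T)
    # Keep the eigenpairs with the largest absolute eigenvalues.
    top = torch.argsort(evals.abs(), descending=True)[:rank]
    V = torch.stack(basis, dim=1) @ evecs[:, top]
    return evals[top], V


torch.manual_seed(0)
M = torch.randn(6, 6, dtype=torch.float64)
A = M + M.T  # symmetric stand-in for a Hessian
D, V = lanczos_low_rank(lambda v: A @ v, n=6, rank=2, krylov_dim=6)
```

Here the Krylov dimension equals the matrix size, so the Ritz values match the exact eigenvalues up to rounding; with a smaller Krylov dimension they are only approximations.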
model_hessian_low_rank
model_hessian_low_rank(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
training_data: DataLoader,
hessian_perturbation: float = 0.0,
rank_estimate: int = 10,
krylov_dimension: Optional[int] = None,
tol: float = 1e-06,
max_iter: Optional[int] = None,
eigen_computation_on_gpu: bool = False,
precompute_grad: bool = False,
) -> LowRankProductRepresentation
Calculates a low-rank approximation of the Hessian matrix of the model's loss function using the implicitly restarted Lanczos algorithm, i.e.
\[ H_{\text{approx}} = V D V^T \]
where \(D\) is a diagonal matrix with the top (in absolute value) rank_estimate eigenvalues of the Hessian and \(V\) contains the corresponding eigenvectors.
PARAMETER | DESCRIPTION |
---|---|
model | A PyTorch model instance. The Hessian will be calculated with respect to this model's parameters. TYPE: Module |
loss | A callable that computes the loss. |
training_data | A DataLoader instance that provides the model's training data. Used in calculating the Hessian-vector products. TYPE: DataLoader |
hessian_perturbation | Optional regularization parameter added to the Hessian-vector product for numerical stability. TYPE: float |
rank_estimate | The number of eigenvalues and corresponding eigenvectors to compute. Represents the desired rank of the Hessian approximation. TYPE: int |
krylov_dimension | The number of Krylov vectors to use for the Lanczos method. If not provided, it defaults to min(model.n_parameters, max(2*rank_estimate + 1, 20)). |
tol | The stopping criterion for the Lanczos algorithm, which stops when the difference in the approximated eigenvalue is less than tol. TYPE: float |
max_iter | The maximum number of iterations for the Lanczos method. If not provided, it defaults to 10*model.n_parameters. |
eigen_computation_on_gpu | If True, tries to execute the eigen pair approximation on the provided device via cupy implementation. Ensure that either your model is small enough, or you use a small rank_estimate to fit your device's memory. If False, the eigen pair approximation is executed on the CPU with scipy's wrapper to ARPACK. TYPE: bool |
precompute_grad | If True, the full data gradient is precomputed and kept in memory, which can speed up the Hessian-vector product computation. Set this to False if you cannot afford to keep the full computation graph in memory. TYPE: bool |

RETURNS | DESCRIPTION |
---|---|
LowRankProductRepresentation | LowRankProductRepresentation instance that contains the top (up until rank_estimate) eigenvalues and corresponding eigenvectors of the Hessian. |
randomized_nystroem_approximation
randomized_nystroem_approximation(
mat_mat_prod: Union[Tensor, Callable[[Tensor], Tensor]],
input_dim: int,
rank: int,
input_type: dtype,
shift_func: Optional[Callable[[Tensor], Tensor]] = None,
mat_vec_device: device = torch.device("cpu"),
) -> LowRankProductRepresentation
Given a matrix vector product function (representing a symmetric positive definite matrix \(A\)), computes a random Nyström low rank approximation of \(A\) in factored form, i.e.
\[ A_{\text{nys}} = (A \Omega)(\Omega^T A \Omega)^{\dagger}(A \Omega)^T = U \Sigma U^T \]
where \(\Omega\) is a standard normal random matrix.
PARAMETER | DESCRIPTION |
---|---|
mat_mat_prod | A tensor or a callable representing the matrix vector product. |
input_dim | Dimension of the input for the matrix vector product. TYPE: int |
input_type | Data type of the inputs. TYPE: dtype |
rank | Rank of the approximation. TYPE: int |
shift_func | Optional function for computing the stabilizing shift in the construction of the randomized Nyström approximation. Defaults to \( \sqrt{\text{input_dim}} \cdot \varepsilon(\text{input_type}) \cdot \lVert A\Omega \rVert_2 \), where \(\varepsilon(\text{input_type})\) is the value of the machine precision corresponding to the data type. |
mat_vec_device | Device where the matrix vector product has to be executed. |

RETURNS | DESCRIPTION |
---|---|
LowRankProductRepresentation | Object containing \(U\) and \(\Sigma\). |
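
The construction can be sketched in a few lines of torch. The code below follows a commonly used stabilized form of the randomized Nyström method (orthonormalized test matrix, shifted sketch, Cholesky factor, SVD) and is an illustration under assumptions, not the library's source: nystroem_sketch is a hypothetical name, and the Frobenius norm is used for the default shift.

```python
import torch


def nystroem_sketch(mat_mat_prod, input_dim, rank, dtype=torch.float64):
    omega = torch.randn(input_dim, rank, dtype=dtype)
    omega, _ = torch.linalg.qr(omega)  # orthonormalize the random test matrix
    y = mat_mat_prod(omega)  # sketch Y = A @ Omega
    # Stabilizing shift; the Frobenius norm is an assumption of this sketch.
    shift = input_dim ** 0.5 * torch.finfo(dtype).eps * torch.linalg.norm(y)
    y = y + shift * omega  # sketch of A + shift * I
    chol = torch.linalg.cholesky(omega.T @ y)  # lower factor L of Omega^T Y
    # B = Y @ L^{-T}, so that B @ B^T = Y (Omega^T Y)^{-1} Y^T.
    b = torch.linalg.solve_triangular(chol.T, y, upper=True, left=False)
    u, s, _ = torch.linalg.svd(b, full_matrices=False)
    eigvals = torch.clamp(s ** 2 - shift, min=0.0)  # remove the shift again
    return u, eigvals


torch.manual_seed(0)
G = torch.randn(8, 3, dtype=torch.float64)
A = G @ G.T  # symmetric positive semi-definite, rank 3
U, S = nystroem_sketch(lambda m: A @ m, input_dim=8, rank=3)
```

Because the requested rank equals the true rank of the test matrix, \(U \Sigma U^T\) reproduces \(A\) up to numerical error; for a smaller rank the result is only an approximation.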
model_hessian_nystroem_approximation
model_hessian_nystroem_approximation(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
rank: int,
shift_func: Optional[Callable[[Tensor], Tensor]] = None,
) -> LowRankProductRepresentation
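Given a model, loss and a data_loader, computes a random Nyström low rank approximation of the corresponding Hessian matrix \(H\) in factored form, i.e.
\[ H_{\text{nys}} = (H \Omega)(\Omega^T H \Omega)^{\dagger}(H \Omega)^T = U \Sigma U^T \]
where \(\Omega\) is a standard normal random matrix.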
PARAMETER | DESCRIPTION |
---|---|
model | A PyTorch model instance. The Hessian will be calculated with respect to this model's parameters. TYPE: Module |
loss | A callable that computes the loss. |
data_loader | A DataLoader instance that provides the model's training data. Used in calculating the Hessian-vector products. TYPE: DataLoader |
rank | Rank of the approximation. TYPE: int |
shift_func | Optional function for computing the stabilizing shift in the construction of the randomized Nyström approximation. Defaults to \( \sqrt{\text{input_dim}} \cdot \varepsilon(\text{input_type}) \cdot \lVert A\Omega \rVert_2 \), where \(\varepsilon(\text{input_type})\) is the value of the machine precision corresponding to the data type. |

RETURNS | DESCRIPTION |
---|---|
LowRankProductRepresentation | Object containing \(U\) and \(\Sigma\). |