pydvl.influence.torch.functional
¶
This module provides methods for efficiently computing tensors related to first
and second order derivatives of torch models, using functionality from
torch.func.
To indicate higher-order functions, i.e. functions which return functions, we use the naming convention create_**_function.
In particular, the module contains functionality for

- Sample, batch-wise and empirical loss functions:
    - create_per_sample_loss_function
    - create_batch_loss_function
    - create_empirical_loss_function
- Per sample gradient and jacobian product functions:
    - create_per_sample_gradient_function
    - create_per_sample_mixed_derivative_function
    - create_matrix_jacobian_product_function
- Hessian, low rank approximation of Hessian and Hessian vector products:
    - hvp
    - create_hvp_function
    - create_batch_hvp_function
    - hessian
    - [model_hessian_low_rank][pydvl.influence.torch.functional.model_hessian_low_rank]
LowRankProductRepresentation
dataclass
¶
Dataclass bundling the factors of a low rank representation \(U \Sigma U^T\), i.e. the projection matrix \(U\) and the diagonal \(\Sigma\), as returned by the approximation routines below.
hvp
¶
hvp(
func: Callable[[Dict[str, Tensor]], Tensor],
params: Dict[str, Tensor],
vec: Dict[str, Tensor],
reverse_only: bool = True,
) -> Dict[str, Tensor]
Computes the Hessian-vector product (HVP) for a given function at the given parameters, i.e.

\[ (\theta, v) \mapsto \nabla^2_{\theta} f(\theta) \cdot v. \]

This function can operate in two modes: reverse-mode autodiff only, or both forward- and reverse-mode autodiff.
PARAMETER | DESCRIPTION |
---|---|
func | The scalar-valued function for which the HVP is computed. TYPE: Callable[[Dict[str, Tensor]], Tensor] |
params | The parameters at which the HVP is computed. TYPE: Dict[str, Tensor] |
vec | The vector with which the Hessian is multiplied. TYPE: Dict[str, Tensor] |
reverse_only | Whether to use only reverse-mode autodiff (True, default) or both forward- and reverse-mode autodiff (False). TYPE: bool DEFAULT: True |

RETURNS | DESCRIPTION |
---|---|
Dict[str, Tensor] | The HVP of the function at the given parameters with the given vector. |
Example
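A minimal sketch, assuming a simple quadratic function whose Hessian is known in closed form; the parameter key "w" is illustrative:

```python
import torch

from pydvl.influence.torch.functional import hvp

# f(theta) = sum_k w_k^2 has Hessian 2*I, so the HVP with v is 2*v.
def f(params: dict) -> torch.Tensor:
    return torch.sum(params["w"] ** 2)

params = {"w": torch.ones(4)}
vec = {"w": torch.ones(4)}

result = hvp(f, params, vec)
assert torch.allclose(result["w"], 2 * vec["w"])
```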
Source code in src/pydvl/influence/torch/functional.py
create_batch_hvp_function
¶
create_batch_hvp_function(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
reverse_only: bool = True,
) -> Callable[[Dict[str, Tensor], Tensor, Tensor, Tensor], Tensor]
Creates a function to compute Hessian-vector product (HVP) for a given model and loss function, where the Hessian information is computed for a provided batch.
It takes a PyTorch model and a loss function and returns a callable that computes the Hessian-vector product for batches of input data and a given vector. Depending on the reverse_only parameter, the computation uses either reverse-mode autodiff exclusively or a combination of forward- and reverse-mode autodiff.
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which the Hessian-vector product is to be computed. TYPE: Module |
loss | The loss function. It should take two torch.Tensor objects as input and return a torch.Tensor. TYPE: Callable[[Tensor, Tensor], Tensor] |
reverse_only | If True, the Hessian-vector product is computed in reverse mode only. TYPE: bool DEFAULT: True |
RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor, Tensor], Tensor] | A function that takes a dict of model parameters (conforming to model.named_parameters()), a batch of inputs, the corresponding targets, and a vector, and returns the Hessian-vector product for that batch. |
Example
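A minimal sketch, assuming the vector and the result are flat tensors of length n_parameters, consistent with the Callable[..., Tensor] signature above:

```python
import torch

from pydvl.influence.torch.functional import create_batch_hvp_function

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss

batch_hvp = create_batch_hvp_function(model, loss)

params = {k: p.detach() for k, p in model.named_parameters()}
x, y = torch.randn(8, 5), torch.randn(8, 1)
n_params = sum(p.numel() for p in model.parameters())  # 6: 5 weights + 1 bias
vec = torch.randn(n_params)

result = batch_hvp(params, x, y, vec)  # expected: flat tensor with n_params entries
```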
Source code in src/pydvl/influence/torch/functional.py
create_empirical_loss_function
¶
create_empirical_loss_function(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
) -> Callable[[Dict[str, Tensor]], Tensor]
Creates a function to compute the empirical loss of a given model on a given dataset. If we denote the model parameters with \( \theta \), the resulting function approximates

\[ f(\theta) = \frac{1}{N}\sum_{i=1}^{N} \operatorname{loss}(\operatorname{model}(\theta, x_i), y_i) \]

for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\), where \(N\) is the number of all elements provided by the data_loader.
PARAMETER | DESCRIPTION |
---|---|
model | The model for which the loss should be computed. TYPE: Module |
loss | The loss function to be used. TYPE: Callable[[Tensor, Tensor], Tensor] |
data_loader | The data loader for iterating over the dataset. TYPE: DataLoader |
RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor]], Tensor] | A function that computes the empirical loss of the model on the dataset for given model parameters. |
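Example

A minimal sketch, assuming a small linear model with mse_loss; the params dict mirrors model.named_parameters(), as required by the returned function:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch.functional import create_empirical_loss_function

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss
x, y = torch.randn(20, 5), torch.randn(20, 1)
data_loader = DataLoader(TensorDataset(x, y), batch_size=10)

empirical_loss = create_empirical_loss_function(model, loss, data_loader)
params = {k: p.detach() for k, p in model.named_parameters()}
print(empirical_loss(params))  # scalar tensor: average loss over all 20 samples
```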
Source code in src/pydvl/influence/torch/functional.py
create_batch_loss_function
¶
create_batch_loss_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor]
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor]
Creates a function to compute the loss of a given model on a given batch of data, i.e. the function

\[ f(\theta, x, y) = \frac{1}{N}\sum_{i=1}^{N} \operatorname{loss}(\operatorname{model}(\theta, x_i), y_i) \]

for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\), where \(N\) is the number of elements in the batch.

PARAMETER | DESCRIPTION |
---|---|
model | The model for which the loss should be computed. TYPE: Module |
loss | The loss function to be used, which should be able to handle a batch dimension. TYPE: Callable[[Tensor, Tensor], Tensor] |
RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor] | A function that computes the loss of the model on a batch for given model parameters. The model parameter input must be a dict conforming to model.named_parameters(), i.e. the keys must be a subset of the parameter names and the corresponding tensor shapes must align. For the data input, the first dimension has to be the batch dimension. |
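Example

A minimal sketch, assuming a small linear model with mse_loss:

```python
import torch

from pydvl.influence.torch.functional import create_batch_loss_function

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss

batch_loss = create_batch_loss_function(model, loss)

params = {k: p.detach() for k, p in model.named_parameters()}
x, y = torch.randn(8, 5), torch.randn(8, 1)  # first dimension is the batch dimension
print(batch_loss(params, x, y))  # scalar tensor: mean loss over the batch
```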
Source code in src/pydvl/influence/torch/functional.py
create_hvp_function
¶
create_hvp_function(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
precompute_grad: bool = True,
use_average: bool = True,
reverse_only: bool = True,
track_gradients: bool = False,
) -> Callable[[Tensor], Tensor]
Returns a function that calculates an approximate Hessian-vector product for a given vector. If you want the exact product, i.e. pulling all data into memory and backpropagating through the full empirical loss, use the function hvp instead.
PARAMETER | DESCRIPTION |
---|---|
model | A PyTorch module representing the model whose loss function's Hessian is to be computed. TYPE: Module |
loss | A callable that takes the model's output and target as input and returns the scalar loss. TYPE: Callable[[Tensor, Tensor], Tensor] |
data_loader | A DataLoader instance that provides batches of data for calculating the Hessian-vector product. Each batch from the DataLoader is assumed to return a tuple where the first element is the model's input and the second element is the target output. TYPE: DataLoader |
precompute_grad | If True, the full data gradient is precomputed and kept in memory, which can speed up the Hessian-vector product computation. Set this to False if you cannot afford to keep the full computation graph in memory. TYPE: bool DEFAULT: True |
use_average | If True, the returned function computes batch-wise Hessian-vector products via a batch loss function and averages the results. If False, it backpropagates through the empirical loss over the full dataset, which is more accurate than averaging batch Hessians but requires considerably more memory. TYPE: bool DEFAULT: True |
reverse_only | Whether to use only reverse-mode autodiff or both forward- and reverse-mode autodiff. Ignored if precompute_grad is True. TYPE: bool DEFAULT: True |
track_gradients | Whether to track gradients for the resulting tensor of the Hessian-vector products. TYPE: bool DEFAULT: False |
RETURNS | DESCRIPTION |
---|---|
Callable[[Tensor], Tensor] | A function that takes a single argument, a vector, and returns the product of the Hessian of the loss function with respect to the model's parameters and the input vector. |
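Example

A minimal sketch, assuming the input and output vectors are flat tensors of length n_parameters, consistent with the Callable[[Tensor], Tensor] return type:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch.functional import create_hvp_function

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss
x, y = torch.randn(20, 5), torch.randn(20, 1)
data_loader = DataLoader(TensorDataset(x, y), batch_size=10)

hvp_func = create_hvp_function(model, loss, data_loader, use_average=True)

n_params = sum(p.numel() for p in model.parameters())
vec = torch.randn(n_params)
print(hvp_func(vec).shape)  # expected: torch.Size([6]) for 5 weights + 1 bias
```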
Source code in src/pydvl/influence/torch/functional.py
hessian
¶
hessian(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
use_hessian_avg: bool = True,
track_gradients: bool = False,
restrict_to: Optional[Dict[str, Tensor]] = None,
) -> Tensor
Computes the Hessian matrix for a given model and loss function.
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which the Hessian is computed. TYPE: Module |
loss | A callable that computes the loss. TYPE: Callable[[Tensor, Tensor], Tensor] |
data_loader | DataLoader providing batches of input data and corresponding ground truths. TYPE: DataLoader |
use_hessian_avg | Flag to indicate whether the average Hessian across mini-batches should be computed. If False, the empirical loss across the entire dataset is used. TYPE: bool DEFAULT: True |
track_gradients | Whether to track gradients for the resulting tensor of the Hessian-vector products. TYPE: bool DEFAULT: False |
restrict_to | The parameters to restrict the second-order differentiation to, i.e. the corresponding sub-matrix of the Hessian. If None, the full Hessian is computed. TYPE: Optional[Dict[str, Tensor]] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
Tensor | A tensor representing the Hessian matrix. The shape of the tensor will be (n_parameters, n_parameters), where n_parameters is the number of trainable parameters in the model. |
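Example

A minimal sketch, assuming a small linear model with mse_loss:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch.functional import hessian

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss
x, y = torch.randn(20, 5), torch.randn(20, 1)
data_loader = DataLoader(TensorDataset(x, y), batch_size=10)

H = hessian(model, loss, data_loader)
print(H.shape)  # torch.Size([6, 6]): 5 weights + 1 bias
```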
Source code in src/pydvl/influence/torch/functional.py
gauss_newton
¶
gauss_newton(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
restrict_to: Optional[Dict[str, Tensor]] = None,
)
Compute the Gauss-Newton matrix, i.e.

$$ \sum_{i=1}^N \nabla_{\theta}\ell(m(x_i; \theta), y_i) \nabla_{\theta}\ell(m(x_i; \theta), y_i)^t,$$

for a loss function \(\ell\) and a model \(m\) with model parameters \(\theta\).
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model. TYPE: Module |
loss | A callable that computes the loss. TYPE: Callable[[Tensor, Tensor], Tensor] |
data_loader | A PyTorch DataLoader providing batches of input data and corresponding output data. TYPE: DataLoader |
restrict_to | The parameters to restrict the differentiation to, i.e. the corresponding sub-matrix of the Jacobian. If None, the full Jacobian is used. TYPE: Optional[Dict[str, Tensor]] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
| The Gauss-Newton matrix. |
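Example

A minimal sketch, assuming a small linear model with mse_loss; the expected shape mirrors that of hessian and is an assumption here:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch.functional import gauss_newton

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss
x, y = torch.randn(20, 5), torch.randn(20, 1)
data_loader = DataLoader(TensorDataset(x, y), batch_size=10)

G = gauss_newton(model, loss, data_loader)
print(G.shape)  # expected: torch.Size([6, 6]), analogous to hessian
```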
Source code in src/pydvl/influence/torch/functional.py
create_per_sample_loss_function
¶
create_per_sample_loss_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor]
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor]
Generates a function to compute per-sample losses using PyTorch's vmap, i.e. the vector-valued function

\[ f(\theta, x, y) = \left(\operatorname{loss}(\operatorname{model}(\theta, x_1), y_1), \dots, \operatorname{loss}(\operatorname{model}(\theta, x_N), y_N)\right) \]

for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\), where \(N\) is the number of elements in the batch.
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which per-sample losses will be computed. TYPE: Module |
loss | A callable that computes the loss. TYPE: Callable[[Tensor, Tensor], Tensor] |
RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor] | A callable that computes the loss for each sample in the batch, given a dictionary of model parameters, the model's input, and the targets. The callable returns a tensor where each entry corresponds to the loss of the corresponding sample. |
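Example

A minimal sketch, assuming a small linear model with mse_loss:

```python
import torch

from pydvl.influence.torch.functional import create_per_sample_loss_function

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss

per_sample_loss = create_per_sample_loss_function(model, loss)

params = {k: p.detach() for k, p in model.named_parameters()}
x, y = torch.randn(8, 5), torch.randn(8, 1)
print(per_sample_loss(params, x, y).shape)  # expected: torch.Size([8]), one loss per sample
```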
Source code in src/pydvl/influence/torch/functional.py
create_per_sample_gradient_function
¶
create_per_sample_gradient_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor]
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Dict[str, Tensor]]
Generates a function to compute the per-sample gradient of the loss with respect to the model's parameters, i.e. the tensor-valued function

\[ f(\theta, x, y) = \left(\nabla_{\theta}\operatorname{loss}(\operatorname{model}(\theta, x_1), y_1), \dots, \nabla_{\theta}\operatorname{loss}(\operatorname{model}(\theta, x_N), y_N)\right) \]

for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\), where \(N\) is the number of elements in the batch.
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which per-sample gradients will be computed. TYPE: Module |
loss | A callable that computes the loss. TYPE: Callable[[Tensor, Tensor], Tensor] |
RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Dict[str, Tensor]] | A callable that takes a dictionary of model parameters, the model's input, and the labels. It returns a dictionary with the same keys as the model's named parameters. Each entry in the returned dictionary corresponds to the gradient of the corresponding model parameter for each sample in the batch. |
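Example

A minimal sketch, assuming a small linear model with mse_loss; the leading batch dimension in each returned entry is an assumption based on the per-sample description above:

```python
import torch

from pydvl.influence.torch.functional import create_per_sample_gradient_function

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss

per_sample_grad = create_per_sample_gradient_function(model, loss)

params = {k: p.detach() for k, p in model.named_parameters()}
x, y = torch.randn(8, 5), torch.randn(8, 1)
grads = per_sample_grad(params, x, y)
# One gradient per sample and parameter, e.g. grads["weight"] is expected to
# have a batch dimension prepended to the parameter's shape.
print({k: tuple(v.shape) for k, v in grads.items()})
```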
Source code in src/pydvl/influence/torch/functional.py
create_matrix_jacobian_product_function
¶
create_matrix_jacobian_product_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor], g: Tensor
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor]
Generates a function to compute the matrix-Jacobian product (MJP) of the per-sample loss with respect to the model's parameters, i.e. the function

\[ f(\theta, x, y) = g \, J_{\theta}^{t}, \]

where \(J_{\theta}\) is the Jacobian of the per-sample losses \(\big(\operatorname{loss}(\operatorname{model}(\theta, x_i), y_i)\big)_{i=1}^{N}\) with respect to the model parameters, for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\).
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which the MJP will be computed. TYPE: Module |
loss | A callable that computes the loss. TYPE: Callable[[Tensor, Tensor], Tensor] |
g | Matrix for which the product with the Jacobian will be computed. The shape of this matrix should be consistent with the shape of the Jacobian. TYPE: Tensor |
RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Tensor] | A callable that takes a dictionary of model parameters, the model's input, and the labels. The callable returns the matrix-Jacobian product of the per-sample loss with respect to the model's parameters for the given matrix g. |
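Example

A minimal sketch, assuming rows of g live in parameter space (one row per query direction) so that the product pairs each row with each sample; both the shape of g and the output shape are assumptions here:

```python
import torch

from pydvl.influence.torch.functional import (
    create_matrix_jacobian_product_function,
)

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss

# One row of g per query direction in parameter space.
n_params = sum(p.numel() for p in model.parameters())
g = torch.randn(3, n_params)

mjp = create_matrix_jacobian_product_function(model, loss, g)

params = {k: p.detach() for k, p in model.named_parameters()}
x, y = torch.randn(8, 5), torch.randn(8, 1)
print(mjp(params, x, y).shape)  # expected: one value per row of g and per sample
```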
Source code in src/pydvl/influence/torch/functional.py
create_per_sample_mixed_derivative_function
¶
create_per_sample_mixed_derivative_function(
model: Module, loss: Callable[[Tensor, Tensor], Tensor]
) -> Callable[[Dict[str, Tensor], Tensor, Tensor], Dict[str, Tensor]]
Generates a function to compute the mixed derivatives of the per-sample loss with respect to the model parameters and the input, i.e. the function

\[ f(\theta, x, y) = \nabla_{x}\nabla_{\theta}\operatorname{loss}(\operatorname{model}(\theta, x), y) \]

for a loss function \(\operatorname{loss}\) and a model \(\operatorname{model}\) with model parameters \(\theta\).
PARAMETER | DESCRIPTION |
---|---|
model | The PyTorch model for which the mixed derivatives are computed. TYPE: Module |
loss | A callable that computes the loss. TYPE: Callable[[Tensor, Tensor], Tensor] |
RETURNS | DESCRIPTION |
---|---|
Callable[[Dict[str, Tensor], Tensor, Tensor], Dict[str, Tensor]] | A callable that takes a dictionary of model parameters, the model's input, and the labels. The callable returns the mixed derivatives of the per-sample loss with respect to the model's parameters and input. |
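Example

A minimal sketch, assuming a small linear model with mse_loss; the exact ordering of parameter and input dimensions in each returned entry is not asserted here:

```python
import torch

from pydvl.influence.torch.functional import (
    create_per_sample_mixed_derivative_function,
)

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss

mixed = create_per_sample_mixed_derivative_function(model, loss)

params = {k: p.detach() for k, p in model.named_parameters()}
x, y = torch.randn(8, 5), torch.randn(8, 1)
derivs = mixed(params, x, y)
# One entry per named parameter, differentiated with respect to both the
# parameter and the input.
print({k: tuple(v.shape) for k, v in derivs.items()})
```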
Source code in src/pydvl/influence/torch/functional.py
randomized_nystroem_approximation
¶
randomized_nystroem_approximation(
mat_mat_prod: Union[Tensor, Callable[[Tensor], Tensor]],
input_dim: int,
rank: int,
input_type: dtype,
shift_func: Optional[Callable[[Tensor], Tensor]] = None,
mat_vec_device: device = torch.device("cpu"),
) -> LowRankProductRepresentation
Given a matrix-vector product function (representing a symmetric positive definite matrix \(A\)), computes a random Nyström low rank approximation of \(A\) in factored form, i.e.

\[ A \approx (A \Omega)(\Omega^T A \Omega)^{\dagger}(A \Omega)^T = U \Sigma U^T, \]

where \(\Omega\) is a standard normal random matrix.
PARAMETER | DESCRIPTION |
---|---|
mat_mat_prod | A tensor or callable implementing the matrix-matrix product with \(A\). TYPE: Union[Tensor, Callable[[Tensor], Tensor]] |
input_dim | Dimension of the input for the matrix-vector product. TYPE: int |
rank | Rank of the approximation. TYPE: int |
input_type | Data type of the inputs. TYPE: dtype |
shift_func | Optional function for computing the stabilizing shift in the construction of the randomized Nyström approximation; defaults to \( \sqrt{\text{input_dim}} \cdot \varepsilon(\text{input_type}) \cdot \|A\Omega\|_2 \), where \(\varepsilon(\text{input_type})\) is the value of the machine precision corresponding to the data type. TYPE: Optional[Callable[[Tensor], Tensor]] DEFAULT: None |
mat_vec_device | Device on which the matrix-vector product is executed. TYPE: device DEFAULT: torch.device("cpu") |
RETURNS | DESCRIPTION |
---|---|
LowRankProductRepresentation | Object containing the factors \(U\) and \(\Sigma\). |
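Example

A minimal sketch, approximating a small synthetic symmetric positive definite matrix:

```python
import torch

from pydvl.influence.torch.functional import randomized_nystroem_approximation

# Build a small symmetric positive definite matrix A to approximate.
dim, rank = 50, 10
B = torch.randn(dim, dim)
A = B @ B.T + dim * torch.eye(dim)

low_rank = randomized_nystroem_approximation(
    lambda v: A @ v, input_dim=dim, rank=rank, input_type=torch.float32
)
# low_rank bundles the factors U and Sigma of the approximation.
```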
Source code in src/pydvl/influence/torch/functional.py
model_hessian_nystroem_approximation
¶
model_hessian_nystroem_approximation(
model: Module,
loss: Callable[[Tensor, Tensor], Tensor],
data_loader: DataLoader,
rank: int,
shift_func: Optional[Callable[[Tensor], Tensor]] = None,
) -> LowRankProductRepresentation
Given a model, loss and a data_loader, computes a random Nyström low rank approximation of the corresponding Hessian matrix \(H\) in factored form, i.e.

\[ H \approx (H \Omega)(\Omega^T H \Omega)^{\dagger}(H \Omega)^T = U \Sigma U^T, \]

where \(\Omega\) is a standard normal random matrix.
PARAMETER | DESCRIPTION |
---|---|
model | A PyTorch model instance. The Hessian will be calculated with respect to this model's parameters. TYPE: Module |
loss | A callable that computes the loss. TYPE: Callable[[Tensor, Tensor], Tensor] |
data_loader | A DataLoader instance that provides the model's training data. Used in calculating the Hessian-vector products. TYPE: DataLoader |
rank | Rank of the approximation. TYPE: int |
shift_func | Optional function for computing the stabilizing shift in the construction of the randomized Nyström approximation; defaults to \( \sqrt{\text{input_dim}} \cdot \varepsilon(\text{input_type}) \cdot \|A\Omega\|_2 \), where \(\varepsilon(\text{input_type})\) is the value of the machine precision corresponding to the data type. TYPE: Optional[Callable[[Tensor], Tensor]] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
LowRankProductRepresentation | Object containing the factors \(U\) and \(\Sigma\). |
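Example

A minimal sketch, assuming a small linear model with mse_loss:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch.functional import (
    model_hessian_nystroem_approximation,
)

model = torch.nn.Linear(5, 1)
loss = torch.nn.functional.mse_loss
x, y = torch.randn(20, 5), torch.randn(20, 1)
data_loader = DataLoader(TensorDataset(x, y), batch_size=10)

low_rank = model_hessian_nystroem_approximation(model, loss, data_loader, rank=5)
# low_rank bundles the factors U and Sigma of the Hessian approximation.
```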
Source code in src/pydvl/influence/torch/functional.py
operator_nystroem_approximation
¶
operator_nystroem_approximation(
operator: "TensorOperator",
rank: int,
shift_func: Optional[Callable[[Tensor], Tensor]] = None,
)
Given an operator (representing a symmetric positive definite matrix \(A\)), computes a random Nyström low rank approximation of \(A\) in factored form, i.e.

\[ A \approx (A \Omega)(\Omega^T A \Omega)^{\dagger}(A \Omega)^T = U \Sigma U^T, \]

where \(\Omega\) is a standard normal random matrix.
PARAMETER | DESCRIPTION |
---|---|
operator | The operator to approximate. TYPE: TensorOperator |
rank | Rank of the approximation. TYPE: int |
shift_func | Optional function for computing the stabilizing shift in the construction of the randomized Nyström approximation; defaults to \( \sqrt{\text{input_dim}} \cdot \varepsilon(\text{input_type}) \cdot \|A\Omega\|_2 \), where \(\varepsilon(\text{input_type})\) is the value of the machine precision corresponding to the data type. TYPE: Optional[Callable[[Tensor], Tensor]] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
| Object containing the factors \(U\) and \(\Sigma\). |
Source code in src/pydvl/influence/torch/functional.py
operator_spectral_approximation
¶
operator_spectral_approximation(
operator: "TensorOperator",
rank: int = 10,
krylov_dimension: Optional[int] = None,
tol: float = 1e-06,
max_iter: Optional[int] = None,
eigen_computation_on_gpu: bool = False,
)
Calculates a low-rank approximation of an operator \(H\) using the implicitly restarted Lanczos algorithm, i.e.

\[ H \approx V D V^T, \]

where \(D\) is a diagonal matrix with the top (in absolute value) rank eigenvalues of \(H\) and \(V\) contains the corresponding eigenvectors.
PARAMETER | DESCRIPTION |
---|---|
operator | The operator to approximate. TYPE: TensorOperator |
rank | The number of eigenvalues and corresponding eigenvectors to compute. Represents the desired rank of the approximation. TYPE: int DEFAULT: 10 |
krylov_dimension | The number of Krylov vectors to use for the Lanczos method. If not provided, it defaults to \( \min(d, \max(2 \cdot \text{rank} + 1, 20)) \), where \(d\) is the dimension of the operator. TYPE: Optional[int] DEFAULT: None |
tol | The stopping criterion for the Lanczos algorithm, which stops when the difference in the approximated eigenvalues is less than tol. TYPE: float DEFAULT: 1e-06 |
max_iter | The maximum number of iterations for the Lanczos method. If not provided, it defaults to \( 10 \cdot d \), where \(d\) is the dimension of the operator. TYPE: Optional[int] DEFAULT: None |
eigen_computation_on_gpu | If True, tries to execute the eigen-pair approximation on the provided device via a cupy implementation. Ensure that either your model is small enough, or you use a small rank, to fit in your device's memory. If False, the eigen-pair approximation is executed on the CPU with scipy's wrapper to ARPACK. TYPE: bool DEFAULT: False |
RETURNS | DESCRIPTION |
---|---|
| A LowRankProductRepresentation instance that contains the top (up to rank) eigenvalues and corresponding eigenvectors of the operator. |
Source code in src/pydvl/influence/torch/functional.py