Torch differentiable
Contains methods for differentiating a PyTorch model. Most of them compute matrix-vector products, and several invert the Hessian-vector product, i.e. solve \(Hx = b\) for \(x\). These are used to calculate the influence of a training point on the model.
References
- Koh, P.W., Liang, P., 2017. Understanding Black-box Predictions via Influence Functions. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1885–1894. PMLR.
- Agarwal, N., Bullins, B., Hazan, E., 2017. Second-Order Stochastic Optimization for Machine Learning in Linear Time. In: Journal of Machine Learning Research, Vol. 18, pp. 1–40. JMLR.
TorchTwiceDifferentiable(model, loss)
Bases: TwiceDifferentiable[torch.Tensor]
Wraps a torch.nn.Module and a loss function and provides methods to compute gradients and second derivatives of the loss with respect to the model parameters.
PARAMETER | DESCRIPTION |
---|---|
model |
A (differentiable) function.
TYPE:
|
loss |
A differentiable scalar loss \( L(\hat{y}, y) \), mapping a prediction and a target to a real value. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
parameters: List[torch.Tensor]
property
num_params: int
property
Get the number of parameters of model f.
RETURNS | DESCRIPTION |
---|---|
int
|
Number of parameters.
TYPE:
|
grad(x, y, create_graph=False)
Calculates the gradient of the loss with respect to the model parameters.
PARAMETER | DESCRIPTION |
---|---|
x |
A matrix [NxD] representing the features \( x_i \).
TYPE:
|
y |
A matrix [NxK] representing the target values \( y_i \).
TYPE:
|
create_graph |
If True, the resulting gradient tensor can be used for further differentiation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Tensor
|
An array [P] with the gradients of the model. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
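As an illustration of the quantity described above, the following is a minimal sketch in plain PyTorch, not the library's implementation. The helper name `flat_grad`, the linear model, and the MSE loss are assumptions made for the example.

```python
import torch


def flat_grad(model, loss, x, y, create_graph=False):
    """Gradient of loss(model(x), y) w.r.t. all trainable parameters, flattened to shape [P].

    Hypothetical helper for illustration only.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    value = loss(model(x), y)
    grads = torch.autograd.grad(value, params, create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])


torch.manual_seed(0)
model = torch.nn.Linear(3, 2)  # P = 3 * 2 + 2 = 8 parameters
x, y = torch.randn(5, 3), torch.randn(5, 2)

g = flat_grad(model, torch.nn.functional.mse_loss, x, y)
# With create_graph=True the gradient stays attached to the autograd graph,
# so it can be differentiated a second time.
g_graph = flat_grad(model, torch.nn.functional.mse_loss, x, y, create_graph=True)
```

Setting `create_graph=True` is what makes second-order quantities such as Hessian-vector products reachable from the returned tensor.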
hessian(x, y)
Calculates the explicit Hessian of the loss with respect to the model parameters, given data \(x\) and \(y\).
PARAMETER | DESCRIPTION |
---|---|
x |
A matrix [NxD] representing the features \(x_i\).
TYPE:
|
y |
A matrix [NxK] representing the target values \(y_i\).
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Tensor
|
A tensor representing the hessian of the loss with respect to the model parameters. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
mvp(grad_xy, v, backprop_on, *, progress=False)
staticmethod
Calculates second-order derivatives of the model along the directions v. Which second-order derivative is computed is selected through the backprop_on argument.
PARAMETER | DESCRIPTION |
---|---|
grad_xy |
An array [P] holding the gradients of the model parameters with respect to input
\(x\) and labels \(y\), where P is the number of parameters of the model.
It is typically obtained through
TYPE:
|
v |
An array ([DxP] or even one-dimensional [D]) which multiplies the matrix, where D is the number of directions.
TYPE:
|
progress |
If True, progress will be printed.
TYPE:
|
backprop_on |
Tensor used in the second backpropagation (the first one is defined via grad_xy).
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Tensor
|
A matrix representing the implicit matrix-vector product of the model along the given directions.
The output shape is [DxM], with M being the number of elements of |
Source code in src/pydvl/influence/torch/torch_differentiable.py
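The double-backpropagation idea behind such a product can be sketched as follows. This is an illustrative stand-in, not the library's code: it assumes that backprop_on holds the model parameters, in which case the result is a Hessian-vector product. The name `hvp_sketch` and the quadratic toy loss are hypothetical.

```python
import torch


def hvp_sketch(grad_xy, v, backprop_on):
    """Differentiate <grad_xy, v> w.r.t. backprop_on.

    If backprop_on are the parameters that grad_xy was taken with respect to,
    this equals H v without ever forming H.
    """
    z = (grad_xy * v.detach()).sum()
    out = torch.autograd.grad(z, backprop_on, retain_graph=True)
    return torch.cat([o.reshape(-1) for o in out])


# Toy loss 0.5 * w^T A w with symmetric A, whose Hessian is exactly A.
A = torch.tensor([[2.0, 1.0], [1.0, 3.0]])
w = torch.tensor([1.0, -1.0], requires_grad=True)
loss = 0.5 * w @ A @ w

# First backpropagation; create_graph=True keeps the graph for the second one.
(grad_w,) = torch.autograd.grad(loss, w, create_graph=True)

v = torch.tensor([1.0, 2.0])
Hv = hvp_sketch(grad_w, v, [w])  # should match A @ v
```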
LowRankProductRepresentation
dataclass
TorchTensorUtilities
Bases: TensorUtilities[Tensor, TorchTwiceDifferentiable]
einsum(equation, *operands)
staticmethod
Sums the product of the elements of the input operands along dimensions specified using a notation based on the Einstein summation convention.
Source code in src/pydvl/influence/torch/torch_differentiable.py
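For readers unfamiliar with the notation, here is a short sketch using `torch.einsum` directly. The tensors are made up; the equation `"ia,ja->ij"` computes all pairwise dot products between the rows of two matrices, which is one plausible use when combining per-sample gradient-like vectors.

```python
import torch

left = torch.tensor([[1.0, 2.0], [3.0, 4.0]])                # shape [2 x P]
right = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # shape [3 x P]

# Sum over the shared index a; the result has one entry per (i, j) pair.
pairwise = torch.einsum("ia,ja->ij", left, right)            # shape [2 x 3]
```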
cat(a, **kwargs)
staticmethod
Concatenates a sequence of tensors into a single torch tensor.
stack(a, **kwargs)
staticmethod
Stacks a sequence of tensors into a single torch tensor.
unsqueeze(x, dim)
staticmethod
Add a singleton dimension at a specified position in a tensor.
PARAMETER | DESCRIPTION |
---|---|
x |
A PyTorch tensor.
TYPE:
|
dim |
The position at which to add the singleton dimension. Zero-based indexing.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Tensor
|
A new tensor with an additional singleton dimension. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
lanzcos_low_rank_hessian_approx(hessian_vp, matrix_shape, hessian_perturbation=0.0, rank_estimate=10, krylov_dimension=None, tol=1e-06, max_iter=None, device=None, eigen_computation_on_gpu=False, torch_dtype=None)
Calculates a low-rank approximation of the Hessian matrix of a scalar-valued function using the implicitly restarted Lanczos algorithm, i.e.:
\[ H_{\text{approx}} = V D V^T \]
where \(D\) is a diagonal matrix with the top (in absolute value) rank_estimate eigenvalues of the Hessian and \(V\) contains the corresponding eigenvectors.
PARAMETER | DESCRIPTION |
---|---|
hessian_vp |
A function that takes a vector and returns the product of the Hessian of the loss function with that vector. |
matrix_shape |
The shape of the matrix, represented by the hessian vector product. |
hessian_perturbation |
Regularization parameter added to the Hessian-vector product for numerical stability.
TYPE:
|
rank_estimate |
The number of eigenvalues and corresponding eigenvectors to compute. Represents the desired rank of the Hessian approximation.
TYPE:
|
krylov_dimension |
The number of Krylov vectors to use for the Lanczos method. If not provided, it defaults to \( \min(\text{model.num_parameters}, \max(2 \times \text{rank_estimate} + 1, 20)) \). |
tol |
The stopping criteria for the Lanczos algorithm, which stops when
the difference in the approximated eigenvalue is less than
TYPE:
|
max_iter |
The maximum number of iterations for the Lanczos method. If not provided, it defaults to \( 10 \cdot \text{model.num_parameters}\). |
device |
The device to use for executing the hessian vector product. |
eigen_computation_on_gpu |
If True, tries to execute the eigen pair approximation on the provided device via cupy implementation. Ensure that either your model is small enough, or you use a small rank_estimate to fit your device's memory. If False, the eigen pair approximation is executed on the CPU with scipy's wrapper to ARPACK.
TYPE:
|
torch_dtype |
If not provided, the current torch default dtype is used for conversion to torch.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
LowRankProductRepresentation
|
A LowRankProductRepresentation instance that contains the top (up until rank_estimate) eigenvalues and corresponding eigenvectors of the Hessian. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
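To make the shape of the result concrete, the sketch below builds a rank-2 approximation from the eigenpairs of largest magnitude. It deliberately swaps in a dense eigendecomposition (`torch.linalg.eigh`) for the Lanczos iteration, so it only illustrates what \(D\) and \(V\) contain, not how the function computes them. The helper `top_k_eigen` and the toy matrix are hypothetical.

```python
import torch


def top_k_eigen(H, k):
    """Return the k eigenvalues of largest magnitude and their eigenvectors.

    Dense stand-in for illustration; a Lanczos method would only need
    matrix-vector products with H.
    """
    eigvals, eigvecs = torch.linalg.eigh(H)
    idx = torch.argsort(eigvals.abs(), descending=True)[:k]
    return eigvals[idx], eigvecs[:, idx]


H = torch.diag(torch.tensor([5.0, -4.0, 0.1]))
D, V = top_k_eigen(H, k=2)           # D has shape [2], V has shape [3 x 2]
H_approx = V @ torch.diag(D) @ V.T   # the small eigenvalue 0.1 is dropped
```

Note that selection is by absolute value, so the negative eigenvalue -4 is kept in preference to 0.1.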
model_hessian_low_rank(model, training_data, hessian_perturbation=0.0, rank_estimate=10, krylov_dimension=None, tol=1e-06, max_iter=None, eigen_computation_on_gpu=False)
Calculates a low-rank approximation of the Hessian matrix of the model's loss function using the implicitly restarted Lanczos algorithm, i.e.
\[ H_{\text{approx}} = V D V^T \]
where \(D\) is a diagonal matrix with the top (in absolute value) rank_estimate eigenvalues of the Hessian and \(V\) contains the corresponding eigenvectors.
PARAMETER | DESCRIPTION |
---|---|
model |
A PyTorch model instance that is twice differentiable, wrapped into
TYPE:
|
training_data |
A DataLoader instance that provides the model's training data. Used in calculating the Hessian-vector products.
TYPE:
|
hessian_perturbation |
Optional regularization parameter added to the Hessian-vector product for numerical stability.
TYPE:
|
rank_estimate |
The number of eigenvalues and corresponding eigenvectors to compute. Represents the desired rank of the Hessian approximation.
TYPE:
|
krylov_dimension |
The number of Krylov vectors to use for the Lanczos method. If not provided, it defaults to min(model.num_parameters, max(2*rank_estimate + 1, 20)). |
tol |
The stopping criteria for the Lanczos algorithm, which stops when the difference
in the approximated eigenvalue is less than
TYPE:
|
max_iter |
The maximum number of iterations for the Lanczos method. If not provided, it defaults to 10*model.num_parameters. |
eigen_computation_on_gpu |
If True, tries to execute the eigen pair approximation on the provided device via a cupy implementation. Ensure that either your model is small enough, or you use a small rank_estimate to fit your device's memory. If False, the eigen pair approximation is executed on the CPU with scipy's wrapper to ARPACK.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
LowRankProductRepresentation
|
A LowRankProductRepresentation instance that contains the top (up until rank_estimate) eigenvalues and corresponding eigenvectors of the Hessian. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
solve_linear(model, training_data, b, hessian_perturbation=0.0)
Given a model and training data, it finds x such that \(Hx = b\), with \(H\) being the model hessian.
PARAMETER | DESCRIPTION |
---|---|
model |
A model wrapped in the TwiceDifferentiable interface.
TYPE:
|
training_data |
A DataLoader containing the training data.
TYPE:
|
b |
A vector or matrix, the right hand side of the equation \(Hx = b\).
TYPE:
|
hessian_perturbation |
Regularization of the hessian.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
InverseHvpResult
|
Instance of InverseHvpResult, having an array that solves the inverse problem, i.e. it returns \(x\) such that \(Hx = b\), and a dictionary containing information about the solution. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
solve_batch_cg(model, training_data, b, hessian_perturbation=0.0, *, x0=None, rtol=1e-07, atol=1e-07, maxiter=None, progress=False)
Given a model and training data, it uses conjugate gradient to calculate the inverse of the Hessian-vector product. More precisely, it finds x such that \(Hx = b\), with \(H\) being the model Hessian. For more info, see Wikipedia.
PARAMETER | DESCRIPTION |
---|---|
model |
A model wrapped in the TwiceDifferentiable interface.
TYPE:
|
training_data |
A DataLoader containing the training data.
TYPE:
|
b |
A vector or matrix, the right hand side of the equation \(Hx = b\).
TYPE:
|
hessian_perturbation |
Regularization of the hessian.
TYPE:
|
x0 |
Initial guess for hvp. If None, defaults to b. |
rtol |
Maximum relative tolerance of result.
TYPE:
|
atol |
Absolute tolerance of result.
TYPE:
|
maxiter |
Maximum number of iterations. If None, defaults to 10*len(b). |
progress |
If True, display progress bars.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
InverseHvpResult
|
Instance of InverseHvpResult, having a matrix of shape [NxP] with each row being a solution of \(Hx = b\), and a dictionary containing information about the convergence of CG, one entry for each row of the matrix. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
solve_cg(hvp, b, *, x0=None, rtol=1e-07, atol=1e-07, maxiter=None)
Conjugate gradient solver for the Hessian vector product.
PARAMETER | DESCRIPTION |
---|---|
hvp |
A callable Hvp, operating with tensors of size N. |
b |
A vector or matrix, the right hand side of the equation \(Hx = b\).
TYPE:
|
x0 |
Initial guess for hvp. |
rtol |
Maximum relative tolerance of result.
TYPE:
|
atol |
Absolute tolerance of result.
TYPE:
|
maxiter |
Maximum number of iterations. If None, defaults to 10*len(b). |
RETURNS | DESCRIPTION |
---|---|
InverseHvpResult
|
Instance of InverseHvpResult, with a vector x, solution of \(Hx = b\), and a dictionary containing information about the convergence of CG. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
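The textbook conjugate gradient iteration for \(Hx = b\) can be sketched as below. It is written with plain Python lists so that every arithmetic step is visible; it is not the library's implementation. The name `cg_sketch` and the exact stopping rule (residual norm below `max(atol, rtol * ||b||)`) are assumptions, while the defaults for `x0` and `maxiter` follow the parameter descriptions above.

```python
def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cg_sketch(hvp, b, x0=None, rtol=1e-7, atol=1e-7, maxiter=None):
    """Solve H x = b for symmetric positive definite H, given only v -> H v."""
    maxiter = 10 * len(b) if maxiter is None else maxiter
    x = list(b) if x0 is None else list(x0)
    r = [bi - hi for bi, hi in zip(b, hvp(x))]  # residual b - H x
    p = list(r)                                  # first search direction
    rs = _dot(r, r)
    tol = max(atol, rtol * _dot(b, b) ** 0.5)    # assumed stopping rule
    for _ in range(maxiter):
        if rs ** 0.5 <= tol:
            break
        Hp = hvp(p)
        alpha = rs / _dot(p, Hp)                 # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, Hp)]
        rs_new = _dot(r, r)
        # New direction, conjugate to the previous ones with respect to H.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


H = [[4.0, 1.0], [1.0, 3.0]]


def hvp(v):
    return [_dot(row, v) for row in H]


x = cg_sketch(hvp, [1.0, 2.0])  # exact solution is [1/11, 7/11]
```

Only products with \(H\) are needed, which is why the solver takes a callable rather than a matrix.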
solve_lissa(model, training_data, b, hessian_perturbation=0.0, *, maxiter=1000, dampen=0.0, scale=10.0, h0=None, rtol=0.0001, progress=False)
Uses LISSA, Linear time Stochastic Second-Order Algorithm, to iteratively approximate the inverse Hessian. More precisely, it finds x s.t. \(Hx = b\), with \(H\) being the model's second derivative wrt. the parameters. This is done with the update
\[ H^{-1}_{j+1} b = b + (I - d)\, H^{-1}_j b - \frac{H\, H^{-1}_j b}{s}, \]
where \(I\) is the identity matrix, \(d\) is a dampening term and \(s\) a scaling factor that are applied to help convergence. For details, see (Koh and Liang, 2017) and the original paper (Agarwal et al., 2017).
PARAMETER | DESCRIPTION |
---|---|
model |
A model wrapped in the TwiceDifferentiable interface.
TYPE:
|
training_data |
A DataLoader containing the training data.
TYPE:
|
b |
A vector or matrix, the right hand side of the equation \(Hx = b\).
TYPE:
|
hessian_perturbation |
Regularization of the hessian.
TYPE:
|
maxiter |
Maximum number of iterations.
TYPE:
|
dampen |
Dampening factor, defaults to 0 for no dampening.
TYPE:
|
scale |
Scaling factor, defaults to 10.
TYPE:
|
h0 |
Initial guess for hvp. |
rtol |
Tolerance to use for early stopping.
TYPE:
|
progress |
If True, display progress bars.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
InverseHvpResult
|
Instance of InverseHvpResult, with a matrix of shape [NxP] with each row being a solution of \(Hx = b\), and a dictionary containing information about the accuracy of the solution. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
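A deterministic sketch of a dampened, scaled LiSSA-style recursion is given below, under several assumptions. It assumes the update \(h \leftarrow b + (1 - d)\,h - Hh/s\), uses the exact Hessian-vector product instead of products estimated on random batches, starts from \(h_0 = b\), and stops when the iterate changes by less than `rtol` relative to its size. For \(d = 0\) the fixed point of this recursion is \(s H^{-1} b\), so the final iterate is divided by `scale`. The name `lissa_sketch` is hypothetical and the code uses plain Python lists for readability.

```python
def lissa_sketch(hvp, b, maxiter=1000, dampen=0.0, scale=10.0, rtol=1e-4):
    """Approximate x with H x = b by iterating h <- b + (1 - d) h - H h / s."""
    h = list(b)  # assumed initial guess h0 = b
    for _ in range(maxiter):
        Hh = hvp(h)
        h_new = [
            bi + (1.0 - dampen) * hi - Hhi / scale
            for bi, hi, Hhi in zip(b, h, Hh)
        ]
        change = max(abs(n - o) for n, o in zip(h_new, h))
        h = h_new
        # Assumed early-stopping rule: relative change of the iterate.
        if change <= rtol * max(abs(hi) for hi in h):
            break
    return [hi / scale for hi in h]


H = [[4.0, 1.0], [1.0, 3.0]]


def hvp(v):
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]


x = lissa_sketch(hvp, [1.0, 2.0], rtol=1e-8)  # exact solution is [1/11, 7/11]
```

The recursion only converges when the eigenvalues of \((1 - d) I - H/s\) lie strictly inside the unit circle, which is why a sufficiently large `scale` matters.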
solve_arnoldi(model, training_data, b, hessian_perturbation=0.0, *, rank_estimate=10, krylov_dimension=None, low_rank_representation=None, tol=1e-06, max_iter=None, eigen_computation_on_gpu=False)
Solves the linear system Hx = b, where H is the Hessian of the model's loss function and b is the given right-hand side vector. It employs the implicitly restarted Arnoldi method for computing a partial eigen decomposition, which is used for the inversion, i.e.
\[ x = V D^{-1} V^T b \]
where \(D\) is a diagonal matrix with the top (in absolute value) rank_estimate eigenvalues of the Hessian and \(V\) contains the corresponding eigenvectors.
PARAMETER | DESCRIPTION |
---|---|
model |
A PyTorch model instance that is twice differentiable, wrapped into TorchTwiceDifferentiable. The Hessian will be calculated with respect to this model's parameters.
TYPE:
|
training_data |
A DataLoader instance that provides the model's training data. Used in calculating the Hessian-vector products.
TYPE:
|
b |
The right-hand side vector in the system Hx = b.
TYPE:
|
hessian_perturbation |
Optional regularization parameter added to the Hessian-vector product for numerical stability.
TYPE:
|
rank_estimate |
The number of eigenvalues and corresponding eigenvectors to compute. Represents the desired rank of the Hessian approximation.
TYPE:
|
krylov_dimension |
The number of Krylov vectors to use for the Lanczos method. Defaults to min(model's number of parameters, max(2 times rank_estimate + 1, 20)). |
low_rank_representation |
An instance of LowRankProductRepresentation containing a previously computed low-rank representation of the Hessian. If provided, all other parameters are ignored; otherwise, a new low-rank representation is computed using provided parameters.
TYPE:
|
tol |
The stopping criteria for the Lanczos algorithm.
Ignored if
TYPE:
|
max_iter |
The maximum number of iterations for the Lanczos method.
Ignored if |
eigen_computation_on_gpu |
If True, tries to execute the eigen pair approximation on the model's device via a cupy implementation. Ensure the model size or rank_estimate is appropriate for device memory. If False, the eigen pair approximation is executed on the CPU by the scipy wrapper to ARPACK.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
InverseHvpResult
|
Instance of InverseHvpResult, having the solution vector x that satisfies the system \(Ax = b\), where \(A\) is a low-rank approximation of the Hessian \(H\) of the model's loss function, and an instance of LowRankProductRepresentation, which represents the approximation of H. |
Source code in src/pydvl/influence/torch/torch_differentiable.py
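Once the top eigenpairs are available, the inversion itself is cheap. The sketch below applies \(V D^{-1} V^T\) to a right-hand side; it obtains the eigenpairs from a dense `torch.linalg.eigh` call purely for illustration, where an Arnoldi or Lanczos method would be used in practice. The helper name `low_rank_solve` and the toy matrix are hypothetical.

```python
import torch


def low_rank_solve(eigvals, eigvecs, b):
    """Compute x = V D^{-1} V^T b for eigenvalues [k] and eigenvectors [P x k]."""
    return eigvecs @ ((eigvecs.T @ b) / eigvals)


H = torch.diag(torch.tensor([5.0, -4.0, 0.1]))
b = torch.tensor([5.0, 4.0, 1.0])

# Dense stand-in for the partial eigendecomposition: keep the 2 eigenpairs
# of largest magnitude.
eigvals, eigvecs = torch.linalg.eigh(H)
idx = torch.argsort(eigvals.abs(), descending=True)[:2]
x = low_rank_solve(eigvals[idx], eigvecs[:, idx], b)
```

The component of `b` along the discarded eigenvector contributes nothing to `x`, so the result solves the system for the low-rank approximation rather than for \(H\) itself.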
Created: 2023-09-02