General
This module contains influence calculation functions for general models, as introduced in (Koh and Liang, 2017) [1].
References
[1] Koh, P.W., Liang, P., 2017. Understanding Black-box Predictions via Influence Functions. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1885–1894. PMLR.
InfluenceType
compute_influence_factors(model, training_data, test_data, inversion_method, *, hessian_perturbation=0.0, progress=False, **kwargs)
Calculates influence factors of a model for training and test data.
Given a test point \(z_{test} = (x_{test}, y_{test})\), a loss \(L(z_{test}, \theta)\) (with \(\theta\) the parameters of the model) and the Hessian of the model \(H_{\theta}\), influence factors are defined as:
\[s_{test} = H_{\theta}^{-1} \operatorname{grad}_{\theta} L(z_{test}, \theta).\]
They are used for efficient influence calculation. This method first (implicitly) calculates the Hessian and then (explicitly) finds the influence factors for the model using the given inversion method. The parameter hessian_perturbation is used to regularize the inversion of the Hessian. For more info, refer to (Koh and Liang, 2017) [1], paragraph 3.
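As a rough illustration, the NumPy sketch below computes influence factors for a least-squares linear model, where the Hessian is available in closed form: each factor is the solution of a linear system with the (perturbed) Hessian and the gradient of the test loss. It is not pyDVL's implementation. The function name `influence_factors_linear`, the toy data, and the direct solve are assumptions made for this example.

```python
import numpy as np


def influence_factors_linear(theta, x_train, x_test, y_test, hessian_perturbation=0.0):
    """Influence factors for the loss L(z, theta) = 0.5 * (x @ theta - y) ** 2.

    Hypothetical helper for illustration only; not part of pyDVL.
    Returns an array of shape (N, D): one factor per test sample.
    """
    n_train, dim = x_train.shape
    # Hessian of the mean training loss: (1 / n) * X^T X, plus the perturbation.
    hessian = x_train.T @ x_train / n_train + hessian_perturbation * np.eye(dim)
    # Per-sample test gradients: residual * x, shape (N, D).
    residuals = x_test @ theta - y_test
    test_grads = residuals[:, None] * x_test
    # Solve H s = grad for every test sample instead of forming H^{-1}.
    return np.linalg.solve(hessian, test_grads.T).T


rng = np.random.default_rng(0)
x_train = rng.normal(size=(50, 3))
x_test = rng.normal(size=(4, 3))
y_test = rng.normal(size=4)
theta = rng.normal(size=3)

factors = influence_factors_linear(
    theta, x_train, x_test, y_test, hessian_perturbation=0.01
)
print(factors.shape)  # (4, 3)
```

Solving the linear system rather than inverting the Hessian is a choice made for this sketch; for large models an iterative inversion method would typically take its place.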
PARAMETER | DESCRIPTION |
---|---|
model | A model wrapped in the TwiceDifferentiable interface. |
training_data | DataLoader containing the training data. |
test_data | DataLoader containing the test data. |
inversion_method | Name of the method for computing inverse-Hessian-vector products. |
hessian_perturbation | Regularization of the Hessian. |
progress | If True, display progress bars. |
RETURNS | DESCRIPTION |
---|---|
array | An array of size (N, D) containing the influence factors for each dimension (D) and test sample (N). |
Source code in src/pydvl/influence/general.py
compute_influences_up(model, input_data, influence_factors, *, progress=False)
Given the model, the training points, and the influence factors, this function calculates the influences using the up-weighting method.
The procedure involves two main steps:
1. Calculating the gradient of the loss with respect to the model parameters for each training sample (\(\operatorname{grad}_{\theta} L\), where \(L\) is the loss of a single point and \(\theta\) are the parameters of the model).
2. Multiplying each gradient with the influence factors.
For a detailed description of the methodology, see section 2.1 of (Koh and Liang, 2017) [1].
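The two steps can be sketched in NumPy for a least-squares linear model. This is a minimal illustration under stated assumptions, not pyDVL's code: `influences_up_linear` is a hypothetical helper, and the influence factors are random stand-ins rather than the output of a real Hessian inversion.

```python
import numpy as np


def influences_up_linear(theta, x_train, y_train, influence_factors):
    """Multiply per-sample training gradients with pre-computed influence factors.

    Hypothetical helper for the loss 0.5 * (x @ theta - y) ** 2; not part of pyDVL.
    influence_factors has shape (N, D); the result has shape (N, M).
    """
    # Step 1: gradient of the loss w.r.t. theta for each training sample, shape (M, D).
    residuals = x_train @ theta - y_train
    train_grads = residuals[:, None] * x_train
    # Step 2: inner product of every influence factor with every gradient.
    return influence_factors @ train_grads.T


rng = np.random.default_rng(1)
theta = rng.normal(size=3)
x_train = rng.normal(size=(6, 3))
y_train = rng.normal(size=6)
influence_factors = rng.normal(size=(2, 3))  # stand-in for real factors

values = influences_up_linear(theta, x_train, y_train, influence_factors)
print(values.shape)  # (2, 6)
```

Koh and Liang define the up-weighting influence with a leading minus sign. The sketch leaves the sign out, so apply whichever convention your analysis expects.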
PARAMETER | DESCRIPTION |
---|---|
model | A model that implements the TwiceDifferentiable interface. |
input_data | DataLoader containing the samples for which the influence will be calculated. |
influence_factors | Array containing pre-computed influence factors. |
progress | If set to True, progress bars will be displayed during computation. |
RETURNS | DESCRIPTION |
---|---|
TensorType | An array of shape [NxM], where N is the number of influence factors, and M is the number of input samples. |
compute_influences_pert(model, input_data, influence_factors, *, progress=False)
Calculates the influence values based on the influence factors and training samples using the perturbation method.
The process involves two main steps:
1. Calculating the gradient of the loss with respect to the model parameters for each training sample (\(\operatorname{grad}_{\theta} L\), where \(L\) is the loss of the model for a single data point and \(\theta\) are the parameters of the model).
2. Using the method TwiceDifferentiable.mvp to efficiently compute the product of the influence factors and \(\operatorname{grad}_x \operatorname{grad}_{\theta} L\).
For a detailed methodology, see section 2.2 of (Koh and Liang, 2017) [1].
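For a least-squares linear model the mixed derivative \(\operatorname{grad}_x \operatorname{grad}_{\theta} L\) has a closed form, which makes the contraction easy to sketch in NumPy. The helper `influences_pert_linear` below is hypothetical and materializes the mixed derivative explicitly. It does not call TwiceDifferentiable.mvp and is not pyDVL's implementation. The influence factors are random stand-ins.

```python
import numpy as np


def influences_pert_linear(theta, x_train, y_train, influence_factors):
    """Contract influence factors with the mixed derivative grad_x grad_theta L.

    Hypothetical helper for the loss 0.5 * (x @ theta - y) ** 2; not part of pyDVL.
    influence_factors has shape (N, D); the result has shape (N, M, P), with P == D here.
    """
    dim = theta.shape[0]
    residuals = x_train @ theta - y_train  # shape (M,)
    # grad_theta L = residual * x, hence
    # d(grad_theta L)_i / dx_j = x_i * theta_j + residual * delta_ij.
    mixed = np.einsum("mi,j->mij", x_train, theta) + residuals[:, None, None] * np.eye(dim)
    # Sum over the parameter index i; keep the feature index j.
    return np.einsum("ni,mij->nmj", influence_factors, mixed)


rng = np.random.default_rng(2)
theta = rng.normal(size=3)
x_train = rng.normal(size=(5, 3))
y_train = rng.normal(size=5)
influence_factors = rng.normal(size=(2, 3))  # stand-in for real factors

values = influences_pert_linear(theta, x_train, y_train, influence_factors)
print(values.shape)  # (2, 5, 3)
```

Building the full mixed-derivative tensor is only practical for tiny models. A matrix-vector product routine avoids it by never forming the tensor explicitly.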
PARAMETER | DESCRIPTION |
---|---|
model | A model that implements the TwiceDifferentiable interface. |
input_data | DataLoader containing the samples for which the influence will be calculated. |
influence_factors | Array containing pre-computed influence factors. |
progress | If set to True, progress bars will be displayed during computation. |
RETURNS | DESCRIPTION |
---|---|
TensorType | A 3D array with shape [NxMxP], where N is the number of influence factors, M is the number of input samples, and P is the number of features. |
compute_influences(differentiable_model, training_data, *, test_data=None, input_data=None, inversion_method=InversionMethod.Direct, influence_type=InfluenceType.Up, hessian_regularization=0.0, progress=False, **kwargs)
Calculates the influence of each input data point on the specified test points.
This method operates in two primary stages:
1. Computes the influence factors for all test points with respect to the model and its training data.
2. Uses these factors to derive the influences over the complete set of input data.
The influence calculation relies on the twice-differentiable nature of the provided model.
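The two stages can be sketched generically once the Hessian and the per-sample gradients are available as arrays. The function `two_stage_influences` and its array-based signature are assumptions made for illustration. pyDVL works with models and DataLoaders instead, and this sketch only covers an up-weighting-style product.

```python
import numpy as np


def two_stage_influences(hessian, test_grads, input_grads):
    """Hypothetical two-stage pipeline; not pyDVL's implementation.

    hessian: (D, D), test_grads: (N, D), input_grads: (M, D).
    Returns an (N, M) array of up-weighting-style influence values.
    """
    # Stage 1: influence factors, one per test point, by solving H s = grad.
    factors = np.linalg.solve(hessian, test_grads.T).T
    # Stage 2: combine the factors with the gradients of the input samples.
    return factors @ input_grads.T


hessian = np.array([[2.0, 0.0], [0.0, 4.0]])
test_grads = np.array([[2.0, 4.0]])
input_grads = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

print(two_stage_influences(hessian, test_grads, input_grads))  # [[1. 1. 2.]]
```

Separating the stages means the expensive Hessian solve is done once per test point and then reused for every input sample.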
PARAMETER | DESCRIPTION |
---|---|
differentiable_model | A twice-differentiable model bundled with its corresponding loss. |
training_data | DataLoader instance supplying the training data. This data is used to compute the Hessian matrix of the model's loss. |
test_data | DataLoader instance with the test samples. Defaults to None. |
input_data | DataLoader instance holding the samples whose influences need to be computed. Defaults to None. |
inversion_method | An enumeration value determining the approach for inverting matrices or computing inverse operations; see InversionMethod in the inversion module. Defaults to InversionMethod.Direct. |
progress | A boolean indicating whether progress bars should be displayed during computation. |
influence_type | Determines the methodology for computing influences. Valid choices include 'up' (for up-weighting) and 'perturbation'. For details, see (Koh and Liang, 2017) [1]. |
hessian_regularization | A lambda value used in Hessian regularization. The regularized Hessian, \( H_{reg} \), is computed as \( H + \lambda \times I \), where \( I \) is the identity matrix and \( H \) is the unmodified Hessian. This regularization is typically used for more complex models to ensure that the Hessian remains positive definite. |
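RETURNS | DESCRIPTION |
---|---|
TensorType | The shape of this array varies based on the influence_type: [NxM] for up-weighting and [NxMxP] for perturbation, as described for compute_influences_up and compute_influences_pert. |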
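To make the role of hessian_regularization concrete, the sketch below builds a rank-deficient Hessian, for which no inverse exists, and shows that adding \( \lambda \times I \) shifts every eigenvalue up by \( \lambda \). The helper name `regularized_hessian` and the toy data are assumptions for illustration, not pyDVL code.

```python
import numpy as np


def regularized_hessian(hessian, hessian_regularization):
    """Return H + lambda * I (hypothetical helper, for illustration only)."""
    return hessian + hessian_regularization * np.eye(hessian.shape[0])


# Two identical feature columns make X^T X singular.
x = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
hessian = x.T @ x / len(x)

lam = 0.1
eig_plain = np.linalg.eigvalsh(hessian)
eig_reg = np.linalg.eigvalsh(regularized_hessian(hessian, lam))

print(eig_plain.min())  # approximately 0: the plain Hessian is singular
print(eig_reg.min())    # approximately lam: the regularized Hessian is positive definite
```

A larger \( \lambda \) makes the inversion better conditioned but moves the result further from the one given by the true Hessian, so the value is a trade-off rather than a free choice.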
Created: 2023-09-02