The influence function¶
Warning
The code in the package pydvl.influence is experimental. The package structure and basic API are bound to change before v1.0.0.
The influence function (IF) is a method to quantify the effect (influence) that each training point has on the parameters of a model, and by extension on any function thereof. In particular, it allows us to estimate how much each training sample affects the error on a test point, making the IF useful for understanding and debugging models.
Alas, the influence function relies on some assumptions that can make its application difficult. Another drawback is that it requires computing the inverse of the Hessian of the model wrt. its parameters, which is intractable for large models like deep neural networks. Much of the recent research tackles this issue using approximations, like a Neumann series (Agarwal et al., 2017), with the most successful solution using a low-rank approximation that iteratively finds increasing eigenspaces of the Hessian (Schioppa et al., 2022).
pyDVL implements several methods for the efficient computation of the IF for machine learning. In the examples we document some of the difficulties that can arise when using the IF.
Construction¶
First introduced in the context of robust statistics in (Hampel, 1974), the IF was popularized in the context of machine learning in (Koh and Liang, 2017).
Following their formulation, consider an input space \(\mathcal{X}\) (e.g. images) and an output space \(\mathcal{Y}\) (e.g. labels). Let's take \(z_i = (x_i, y_i)\), for \(i \in \{1,...,n\}\) to be the \(i\)-th training point, and \(\theta\) to be the (potentially highly) multi-dimensional parameters of a model (e.g. \(\theta\) is a big array with all of a neural network's parameters, such as weights and biases). We will denote with \(L(z, \theta)\) the loss of the model for point \(z\) when the parameters are \(\theta\).
To train a model, we typically minimize the loss over all \(z_i\), i.e. the optimal parameters are
\[\hat{\theta} = \arg \min_\theta \frac{1}{n} \sum_{i=1}^n L(z_i, \theta).\]
In practice, lack of convexity means that one doesn't really obtain the minimizer of the loss, and the training is stopped when the validation loss stops decreasing.
For notational convenience, let's define
\[\hat{\theta}_{-z} = \arg \min_\theta \frac{1}{n} \sum_{z_i \ne z} L(z_i, \theta),\]
i.e. \(\hat{\theta}_{-z}\) are the model parameters that minimize the total loss when \(z\) is not in the training dataset.
In order to compute the impact of each training point on the model, we would need to calculate \(\hat{\theta}_{-z}\) for each \(z\) in the training dataset, thus re-training the model at least \(n\) times (more if model training is stochastic). This is computationally very expensive, especially for big neural networks. To circumvent this problem, we can instead calculate a first-order approximation of \(\hat{\theta}_{-z}\) around \(\hat{\theta}\). This can be done through a single backpropagation pass and without re-training the full model.
pyDVL supports two ways of computing the empirical influence function, namely up-weighting of samples and perturbation influences.
Approximating the influence of a point¶
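Let's define
\[\hat{\theta}_{\epsilon, z} = \arg \min_\theta \frac{1}{n} \sum_{i=1}^n L(z_i, \theta) + \epsilon L(z, \theta),\]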
which is the optimal \(\hat{\theta}\) when we up-weight \(z\) by an amount \(\epsilon \gt 0\).
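From a classical result (a simple derivation is available in Appendix A of (Koh and Liang, 2017)), we know that
\[\frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon} \Big|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta}),\]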
where \(H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^n \nabla_\theta^2 L(z_i, \hat{\theta})\) is the Hessian of \(L\). These quantities are also known as influence factors.
Importantly, notice that this expression is only valid when \(\hat{\theta}\) is a minimum of \(L\); otherwise \(H_{\hat{\theta}}\) is not guaranteed to be positive definite and may not be invertible. In machine learning full convergence is rarely achieved, so direct Hessian inversion is often not possible. Approximations are needed that circumvent the problem of inverting the Hessian of the model in all those (frequent) cases where it is not positive definite.
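The influence of training point \(z\) on test point \(z_{\text{test}}\) is defined as
\[\mathcal{I}(z, z_{\text{test}}) = L(z_{\text{test}}, \hat{\theta}_{-z}) - L(z_{\text{test}}, \hat{\theta}).\]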
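Notice that \(\mathcal{I}\) is higher for points \(z\) which positively impact the model score, since the loss is higher when they are excluded from training. In practice, one needs to rely on the following infinitesimal approximation:
\[\mathcal{I}_{up}(z, z_{\text{test}}) = - \frac{d L(z_{\text{test}}, \hat{\theta}_{\epsilon, z})}{d \epsilon} \Big|_{\epsilon=0}.\]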
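Using the chain rule and the results calculated above, we get
\[\mathcal{I}_{up}(z, z_{\text{test}}) = \nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top \ H_{\hat{\theta}}^{-1} \ \nabla_\theta L(z, \hat{\theta}).\]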
All the resulting factors are gradients of the loss wrt. the model parameters \(\hat{\theta}\). They can be computed through one or more backpropagation passes.
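To make the up-weighting approximation concrete, here is a minimal NumPy sketch on a toy least-squares problem. It does not use pyDVL: the loss \(L(z, \theta) = \frac{1}{2}(x^\top \theta - y)^2\), the synthetic data, and all names in it are assumptions made for illustration. It uses the sign convention of this section, in which a positive influence means the training point reduces the test loss, and compares \(\nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta})\) with brute-force leave-one-out retraining.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_true + 0.5 * rng.normal(size=n)

# Test point with a deliberately large residual, so the first-order term dominates.
x_test = np.ones(d)
y_test = x_test @ theta_true + 1.0


def fit(X, y):
    """Least-squares minimizer of the mean loss 0.5 * (x @ theta - y) ** 2."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def loss(x, y, theta):
    return 0.5 * (x @ theta - y) ** 2


def grad(x, y, theta):
    """Gradient of the per-sample loss with respect to theta."""
    return (x @ theta - y) * x


theta_hat = fit(X, y)
hessian = X.T @ X / n  # (1/n) * sum of the per-sample Hessians x x^T

# Inverse-Hessian-vector product for the test gradient, computed once.
factor = np.linalg.solve(hessian, grad(x_test, y_test, theta_hat))

# Up-weighting influence of every training point on the test loss.
train_grads = (X @ theta_hat - y)[:, None] * X  # shape (n, d)
influence_up = train_grads @ factor  # shape (n,)

# Reference values by brute force: retrain once per left-out point.
base_loss = loss(x_test, y_test, theta_hat)
loo = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    loo[i] = loss(x_test, y_test, fit(X[mask], y[mask])) - base_loss

# Removing a point corresponds to epsilon = -1/n, hence the 1/n factor.
correlation = np.corrcoef(influence_up / n, loo)[0, 1]
print(f"correlation with leave-one-out retraining: {correlation:.4f}")
```

For a convex problem like this one the two quantities agree closely. For neural networks, where \(\hat{\theta}\) is not an exact minimizer, the agreement is typically much weaker.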
Perturbation definition of the influence score¶
How would the loss of the model change if, instead of up-weighting an individual point \(z\), we were to up-weight only a single feature of that point? Given \(z = (x, y)\), we can define \(z_{\delta} = (x+\delta, y)\), where \(\delta\) is a vector of zeros except for a 1 in the position of the feature we want to up-weight. In order to approximate the effect of modifying a single feature of a single point on the model score we can define
\[\hat{\theta}_{\epsilon, z_{\delta}, -z} = \arg \min_\theta \frac{1}{n} \sum_{i=1}^n L(z_i, \theta) + \epsilon L(z_{\delta}, \theta) - \epsilon L(z, \theta).\]
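Similarly to what was done above, we up-weight point \(z_{\delta}\), but then we also remove the up-weighting for all the features that are not modified by \(\delta\). From the calculations in the previous section, it is then easy to see that
\[\frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon} \Big|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_\theta \Big( L(z_{\delta}, \hat{\theta}) - L(z, \hat{\theta}) \Big),\]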
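and, if the feature space is continuous, as \(\delta \to 0\) we can write
\[\frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon} \Big|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \ \nabla_x \nabla_\theta L(z, \hat{\theta}) \, \delta + o(\delta).\]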
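The influence of each feature of \(z\) on the loss of the model can therefore be estimated through the following quantity:
\[\mathcal{I}_{pert}(z, z_{\text{test}}) = - \lim_{\delta \to 0} \ \frac{1}{\delta} \frac{d L(z_{\text{test}}, \hat{\theta}_{\epsilon, z_{\delta}, -z})}{d \epsilon} \Big|_{\epsilon=0},\]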
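which, using the chain rule and the results calculated above, is equal to
\[\mathcal{I}_{pert}(z, z_{\text{test}}) = \nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top \ H_{\hat{\theta}}^{-1} \ \nabla_x \nabla_\theta L(z, \hat{\theta}).\]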
The perturbation definition of the influence score is not straightforward to understand, but it has a simple interpretation: it tells how much the loss of the model changes when a certain feature of point \(z\) is up-weighted. A positive perturbation influence score indicates that the feature might have a positive effect on the accuracy of the model.
It is worth noting that the perturbation influence score is a very rough estimate of the impact of a point on the model's loss and it is subject to large approximation errors. It can nonetheless be used to build training-set attacks, as done in (Koh and Liang, 2017).
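As an illustration, the following NumPy sketch computes the perturbation influence \(\nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top H_{\hat{\theta}}^{-1} \nabla_x \nabla_\theta L(z, \hat{\theta})\) for one training point of a toy least-squares model. The model, the data and the helper names are assumptions made for this example and are not part of pyDVL. For the loss \(\frac{1}{2}(x^\top \theta - y)^2\) the mixed derivative has the closed form \(x \theta^\top + (x^\top \theta - y)\,\mathbb{I}\), and the result is checked against finite differences of the up-weighting influence with respect to the input features.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
X = rng.normal(size=(n, d))
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + 0.1 * rng.normal(size=n)
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
hessian = X.T @ X / n

x_test, y_test = rng.normal(size=d), 1.0


def grad_theta(x, y):
    """Gradient of 0.5 * (x @ theta - y) ** 2 with respect to theta, at theta_hat."""
    return (x @ theta_hat - y) * x


def influence_up(x, y, x_test, y_test):
    """Up-weighting influence of the training point (x, y) on the test point."""
    g_test = grad_theta(x_test, y_test)
    return g_test @ np.linalg.solve(hessian, grad_theta(x, y))


def influence_pert(x, y, x_test, y_test):
    """Perturbation influence: one value per input feature of (x, y)."""
    residual = x @ theta_hat - y
    # Mixed derivative: entry [j, k] is d(grad_theta L)_j / d x_k.
    mixed = np.outer(x, theta_hat) + residual * np.eye(len(x))
    g_test = grad_theta(x_test, y_test)
    return np.linalg.solve(hessian, g_test) @ mixed


x0, y0 = X[0], y[0]
pert = influence_pert(x0, y0, x_test, y_test)  # shape (d,)

# Sanity check: central finite differences of the up-weighting influence in x.
h = 1e-4
finite_diff = np.array([
    (influence_up(x0 + h * e, y0, x_test, y_test)
     - influence_up(x0 - h * e, y0, x_test, y_test)) / (2 * h)
    for e in np.eye(d)
])
print("perturbation influences:", pert)
print("finite differences:     ", finite_diff)
```

The check works because, with \(\hat{\theta}\) and \(H_{\hat{\theta}}\) held fixed, the perturbation influence is the gradient of the up-weighting influence with respect to the features of the training point.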
Computation¶
The main abstraction of the library for influence calculation is `InfluenceFunctionModel`. On implementations of this abstraction, you can call the method `influences` to compute influences.
pyDVL provides implementations for use with PyTorch models in pydvl.influence.torch. For detailed information on available implementations see the documentation in InfluenceFunctionModel.
Given a pre-trained PyTorch model and a loss, a basic example would look like:
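```python
from torch.utils.data import DataLoader
from pydvl.influence.torch import DirectInfluence

# `model` is the pre-trained torch model and `loss` the loss function.
training_data_loader = DataLoader(...)
infl_model = DirectInfluence(model, loss)
infl_model = infl_model.fit(training_data_loader)
# Influence of the training points (x, y) on the test points (x_test, y_test).
influences = infl_model.influences(x_test, y_test, x, y)
```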
Warning
Compared to the mathematical definitions above, we switch the ordering of \(z\) and \(z_{\text{test}}\), in order to make the input ordering consistent with the dimensions of the resulting tensor. More concretely, if the first dimension of \(z_{\text{test}}\) is \(N\) and that of \(z\) is \(M\), the resulting tensor is of shape \(N \times M\).
A large positive influence indicates that training point \(j\) tends to improve the performance of the model on test point \(i\), and vice versa, a large negative influence indicates that training point \(j\) tends to worsen the performance of the model on test point \(i\).
Hessian regularization¶
As discussed above, training in machine learning rarely converges to a global minimum of the loss. Despite good apparent convergence, \(\hat{\theta}\) might be located in a region with flat curvature or close to a saddle point. In particular, the Hessian might have vanishing eigenvalues, making its direct inversion impossible. Certain methods, such as the Arnoldi method, are robust against these problems, but most are not.
To circumvent this problem, many approximate methods can be implemented. The simplest adds a small perturbation term to the Hessian, i.e. \(H_{\hat{\theta}} + \lambda \mathbb{I}\), with \(\mathbb{I}\) being the identity matrix.
```python
from torch.utils.data import DataLoader
from pydvl.influence.torch import DirectInfluence

training_data_loader = DataLoader(...)
# `regularization` is the value of lambda added to the diagonal of the Hessian.
infl_model = DirectInfluence(model, loss, regularization=0.01)
infl_model = infl_model.fit(training_data_loader)
```
This standard trick ensures that the eigenvalues of \(H_{\hat{\theta}} + \lambda \mathbb{I}\) are bounded away from zero (provided \(H_{\hat{\theta}}\) is positive semi-definite) and therefore the matrix is invertible. In order for this regularization not to corrupt the outcome too much, the parameter \(\lambda\) should be as small as possible while still allowing a reliable inversion of \(H_{\hat{\theta}} + \lambda \mathbb{I}\).
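The effect of the perturbation term can be seen on a small example. The sketch below uses only NumPy and a made-up least-squares Hessian that is singular because one feature is duplicated; the data and variable names are assumptions for illustration, not pyDVL internals. Adding \(\lambda \mathbb{I}\) shifts every eigenvalue up by \(\lambda\), which makes the linear solve well defined.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
# The third feature duplicates the first, so the Hessian below is singular.
X = np.column_stack([x1, rng.normal(size=n), x1])
hessian = X.T @ X / n

lam = 0.01
regularized = hessian + lam * np.eye(3)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eig_plain = np.linalg.eigvalsh(hessian)
eig_reg = np.linalg.eigvalsh(regularized)
print("smallest eigenvalue without regularization:", eig_plain[0])
print("smallest eigenvalue with regularization:   ", eig_reg[0])

# With the shift in place, the inverse-Hessian-vector product is well defined.
gradient = rng.normal(size=3)
factor = np.linalg.solve(regularized, gradient)
```

The price is a bias: directions with curvature much smaller than \(\lambda\) are scaled by roughly \(1/\lambda\) rather than by the inverse of their true curvature.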
Block-diagonal approximation¶
This implementation is capable of using a block-diagonal approximation. The full matrix is approximated by a block-diagonal version, which reduces both the time and memory consumption. The blocking structure can be specified via the `block_structure` parameter.
The `block_structure` parameter can either be a `BlockMode` enum (which provides layer-wise or parameter-wise blocking) or a custom block structure defined by an ordered dictionary with the keys being the block identifiers (arbitrary strings) and the values being lists of parameter names contained in the block.
```python
from collections import OrderedDict

from torch.utils.data import DataLoader
from pydvl.influence.torch import DirectInfluence, BlockMode

training_data_loader = DataLoader(...)
# layer-wise block-diagonal approximation
infl_model = DirectInfluence(model, loss,
                             regularization=0.1,
                             block_structure=BlockMode.LAYER_WISE)

# custom block-diagonal structure: block identifier -> parameter names
block_structure = OrderedDict((
    ("custom_block1", ["0.weight", "1.bias"]),
    ("custom_block2", ["1.weight", "0.bias"]),
))
infl_model = DirectInfluence(model, loss,
                             regularization=0.1,
                             block_structure=block_structure)
infl_model = infl_model.fit(training_data_loader)
```
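To apply a different regularization to each block, pass a dictionary keyed by block identifier instead of a scalar:
```python
# one regularization value per block of the custom block structure
regularization = {
    "custom_block1": 0.1,
    "custom_block2": 0.2,
}
infl_model = DirectInfluence(model, loss,
                             regularization=regularization,
                             block_structure=block_structure)
infl_model = infl_model.fit(training_data_loader)
```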
Accordingly, if you choose a layer-wise or parameter-wise structure (by providing `BlockMode.LAYER_WISE` or `BlockMode.PARAMETER_WISE` for `block_structure`) the keys must be the layer names or parameter names, respectively.
You can retrieve the block-wise influence information from the methods with suffix `_by_block`. By default, `block_structure` is set to `BlockMode.FULL`, and in this case these methods will return a dictionary with the empty string being the only key.
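Numerically, a block-diagonal approximation amounts to discarding the off-diagonal blocks of the matrix and solving one smaller linear system per block. The following NumPy sketch illustrates the idea on a random positive definite stand-in for the Hessian. The block layout, the block names and the matrix are hypothetical, and the code is not how pyDVL implements the feature.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical parameter layout: two blocks of sizes 3 and 2.
blocks = {"block_a": slice(0, 3), "block_b": slice(3, 5)}

A = rng.normal(size=(50, 5))
hessian = A.T @ A / 50 + 0.1 * np.eye(5)  # dense, positive definite stand-in
gradient = rng.normal(size=5)

# Solve each diagonal block independently; off-diagonal blocks are ignored.
factor_by_block = {
    name: np.linalg.solve(hessian[idx, idx], gradient[idx])
    for name, idx in blocks.items()
}
approx_factor = np.concatenate(list(factor_by_block.values()))

# The exact solve uses the full matrix and is costlier for large models.
exact_factor = np.linalg.solve(hessian, gradient)
print("block-diagonal:", approx_factor)
print("full matrix:   ", exact_factor)
```

Solving \(k\) systems of size \(p/k\) is much cheaper than one system of size \(p\), at the cost of ignoring interactions between parameters in different blocks.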
Gauss-Newton approximation¶
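In the computation of the influence values, the inversion of the Hessian can be replaced by the inversion of the Gauss-Newton matrix
\[G_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^n \nabla_\theta L(z_i, \hat{\theta}) \nabla_\theta L(z_i, \hat{\theta})^\top,\]
so the computed values are of the form
\[\nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top \ G_{\hat{\theta}}^{-1} \ \nabla_\theta L(z, \hat{\theta}).\]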
The parameter `second_order_mode` is used to configure this approximation.
```python
from torch.utils.data import DataLoader
from pydvl.influence.torch import DirectInfluence, BlockMode, SecondOrderMode

training_data_loader = DataLoader(...)
# layer-wise blocks, per-layer regularization, Gauss-Newton instead of the Hessian
infl_model = DirectInfluence(model, loss,
                             regularization={"layer_1": 0.1, "layer_2": 0.2},
                             block_structure=BlockMode.LAYER_WISE,
                             second_order_mode=SecondOrderMode.GAUSS_NEWTON)
infl_model = infl_model.fit(training_data_loader)
```
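For intuition, here is a small NumPy sketch of this approximation on a toy least-squares model. It assumes that the Gauss-Newton matrix is taken to be the average outer product of the per-sample loss gradients; that form, the data and all names below are assumptions made for illustration and do not show pyDVL's implementation. Such a matrix is positive semi-definite by construction and only needs first-order gradients.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 300, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, 0.0, -1.0]) + 0.3 * rng.normal(size=n)
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Per-sample gradients of 0.5 * (x @ theta - y) ** 2 at theta_hat, shape (n, d).
grads = (X @ theta_hat - y)[:, None] * X

# Assumed form: average outer product of the per-sample gradients.
gauss_newton = grads.T @ grads / n

# A small regularization keeps the solve stable if the matrix is near-singular.
lam = 0.1
regularized = gauss_newton + lam * np.eye(d)

# Reuse one training gradient as a stand-in for a test gradient.
g_test = grads[0]
values = grads @ np.linalg.solve(regularized, g_test)  # shape (n,)
print("smallest eigenvalue:", np.linalg.eigvalsh(gauss_newton)[0])
```

Because no second derivatives are required, this is cheaper to assemble than the Hessian, but the two matrices generally differ, so the resulting values are an approximation to the Hessian-based ones.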
Perturbation influences¶
The method of empirical influence computation can be selected with the parameter `mode`:
```python
from pydvl.influence import InfluenceMode

influences = infl_model.influences(x_test, y_test, x, y,
                                   mode=InfluenceMode.Perturbation)
```
The result is a tensor with at least three dimensions. The first two dimensions are the same as in the `mode=InfluenceMode.Up` case, i.e. one row per test point and one column per training point. The remaining dimensions are the same as the number of input features in the data. Therefore, each entry in the tensor represents the influence of each feature of each training point on each test point.
Influence factors¶
The influence factors (refer to the previous section for a definition) are typically the most computationally demanding part of influence calculation. They can be obtained by calling the `influence_factors` method, saved, and later used for influence calculation on different subsets of the training dataset.
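```python
# expensive step: computed once per set of test points
influence_factors = infl_model.influence_factors(x_test, y_test)
# cheap step: reuse the factors for any subset (x, y) of the training data
influences = infl_model.influences_from_factors(influence_factors, x, y)
```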
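Why caching the factors pays off can be seen in a plain NumPy sketch on a toy least-squares model. The helper below only borrows the name `influences_from_factors` for readability; it is a local stand-in written for this example, not the pyDVL method, and the data is synthetic. The linear solve is done once per test point, after which the influences on any subset of training points reduce to a matrix product of shape \(N \times M\).

```python
import numpy as np

rng = np.random.default_rng(5)
n_train, n_test, d = 50, 4, 3
theta_true = np.array([1.0, 2.0, -1.0])
X = rng.normal(size=(n_train, d))
y = X @ theta_true + 0.2 * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ theta_true + 0.2 * rng.normal(size=n_test)

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
hessian = X.T @ X / n_train


def per_sample_grads(X, y):
    """Gradients of 0.5 * (x @ theta - y) ** 2 at theta_hat, one row per sample."""
    return (X @ theta_hat - y)[:, None] * X


# Expensive step, done once: one inverse-Hessian-vector product per test point.
factors = np.linalg.solve(hessian, per_sample_grads(X_test, y_test).T).T  # (N, d)


def influences_from_factors(factors, X, y):
    """Cheap step: product with the gradients of the chosen training points."""
    return factors @ per_sample_grads(X, y).T  # (N, M)


full = influences_from_factors(factors, X, y)              # all training points
subset = influences_from_factors(factors, X[:10], y[:10])  # first ten only
print(full.shape, subset.shape)
```

Rows index test points and columns index training points, matching the ordering convention described earlier on this page.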
1. Agarwal, N., Bullins, B., Hazan, E., 2017. Second-Order Stochastic Optimization for Machine Learning in Linear Time. JMLR 18, 1–40.
2. Schioppa, A., Zablotskaia, P., Vilar, D., Sokolov, A., 2022. Scaling Up Influence Functions. Proc. AAAI Conf. Artif. Intell. 36, 8179–8186. https://doi.org/10.1609/aaai.v36i8.20791
3. Hampel, F.R., 1974. The Influence Curve and Its Role in Robust Estimation. J. Am. Stat. Assoc. 69, 383–393. https://doi.org/10.2307/2285666
4. Koh, P.W., Liang, P., 2017. Understanding Black-box Predictions via Influence Functions, in: Proceedings of the 34th International Conference on Machine Learning. Presented at the International Conference on Machine Learning, PMLR, pp. 1885–1894.