pydvl.value.oob.oob
References
Kwon and Zou. Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value. In: ICML 2023.
compute_data_oob
compute_data_oob(
u: Utility,
*,
n_est: int = 10,
max_samples: float = 0.8,
loss: Optional[LossFunction] = None,
n_jobs: Optional[int] = None,
seed: Optional[Seed] = None,
progress: bool = False
) -> ValuationResult
Computes Data-OOB (out-of-bag) values.
This implements the method described in (Kwon and Zou, 2023). It fits several copies of the base estimator provided through u.model in a bagging procedure. The value of a point is the average loss of the estimators that were not fit on it.
The definition uses the following notation:
\(w_{bj}\in \mathbb{Z}\) is the number of times the j-th datum \((x_j, y_j)\) is selected in the b-th bootstrap dataset.
\(T\) is a score function that represents the goodness of a weak learner \(\hat{f}_b\) at the i-th datum \((x_i, y_i)\).
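With this notation, the value of the i-th datum can be written out as below. The display is reconstructed from the definition in Kwon and Zou (2023) rather than taken from this page; \(B\) is assumed to denote the number of bootstrap estimators (n_est here) and \(\Theta_B\) the collection of the \(B\) fitted weak learners.

\[
\psi((x_i, y_i), \Theta_B) := \frac{\sum_{b=1}^{B} \mathbb{1}(w_{bi} = 0)\, T(y_i, \hat{f}_b(x_i))}{\sum_{b=1}^{B} \mathbb{1}(w_{bi} = 0)}
\]

The denominator counts the estimators for which the i-th datum was out-of-bag. If that count is zero, the ratio is undefined.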
n_est and max_samples must be tuned jointly so that every sample is out-of-bag at least once; otherwise the result may contain a NaN value for that datum.
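To get a feel for how n_est and max_samples interact, the following sketch computes how likely a single point is to never be out-of-bag. It assumes that each estimator draws int(max_samples * n) indices uniformly with replacement, which is an assumption about the bagging procedure and not something stated on this page. prob_never_oob is a hypothetical helper, not part of pyDVL.

```python
def prob_never_oob(n: int, n_est: int, max_samples: float) -> float:
    """Probability that a fixed point is in-bag for all n_est estimators.

    Assumes each estimator draws int(max_samples * n) indices uniformly
    with replacement from n points (an assumption, not a library contract).
    """
    m = max(1, int(max_samples * n))
    # Probability that none of the m draws hits the point, i.e. it is
    # out-of-bag for one given estimator.
    p_oob_once = (1.0 - 1.0 / n) ** m
    # The point yields a NaN only if it is in-bag for every estimator.
    return (1.0 - p_oob_once) ** n_est


if __name__ == "__main__":
    n = 1000
    for n_est in (10, 50, 100):
        p = prob_never_oob(n, n_est, max_samples=0.8)
        print(f"n_est={n_est:3d}: expected points never out-of-bag ~ {n * p:.3g}")
```

Under these assumptions, raising n_est or lowering max_samples both reduce the chance of NaN values.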
PARAMETER | DESCRIPTION |
---|---|
u | Utility object with model, data, and scoring function. TYPE: Utility |
n_est | Number of estimators used in the bagging procedure. TYPE: int, DEFAULT: 10 |
max_samples | The fraction of samples to draw to train each base estimator. TYPE: float, DEFAULT: 0.8 |
loss | A function taking the true labels and the model predictions, as (y_true, y_pred), and returning an array of point-wise errors. TYPE: Optional[LossFunction], DEFAULT: None |
n_jobs | The number of jobs to run in parallel in the bagging procedure, for both fit and predict. TYPE: Optional[int], DEFAULT: None |
seed | Either an instance of a numpy random number generator or a seed for it. TYPE: Optional[Seed], DEFAULT: None |
progress | If True, display a progress bar. TYPE: bool, DEFAULT: False |
RETURNS | DESCRIPTION |
---|---|
ValuationResult | Object with the data values. |
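The procedure described above can be sketched in plain NumPy. This is a toy illustration, not the library's implementation: the weak learner is a nearest-centroid classifier chosen only to keep the example self-contained (pyDVL takes the base estimator from u.model), the score is 1 for a correct prediction and 0 otherwise, and data_oob_sketch is a hypothetical name.

```python
import numpy as np


def data_oob_sketch(x, y, n_est=10, max_samples=0.8, seed=0):
    """Toy out-of-bag values with a nearest-centroid weak learner."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = max(1, int(max_samples * n))
    score_sum = np.zeros(n)   # summed scores over estimators where the point is OOB
    oob_count = np.zeros(n)   # number of estimators where the point is OOB
    classes = np.unique(y)
    for _ in range(n_est):
        idx = rng.integers(0, n, size=m)  # bootstrap indices, with replacement
        oob = np.ones(n, dtype=bool)
        oob[idx] = False
        xb, yb = x[idx], y[idx]
        # Fit: one centroid per class present in the bootstrap sample.
        present = np.array([c for c in classes if np.any(yb == c)])
        centroids = np.stack([xb[yb == c].mean(axis=0) for c in present])
        # Predict: label of the nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        pred = present[dists.argmin(axis=1)]
        score_sum[oob] += (pred[oob] == y[oob]).astype(float)
        oob_count[oob] += 1
    # Points that were never out-of-bag get NaN (0 / 0).
    with np.errstate(invalid="ignore", divide="ignore"):
        return score_sum / oob_count


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-3, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
    y = np.array([0] * 20 + [1] * 20)
    y[0] = 1  # deliberately mislabel one point
    values = data_oob_sketch(x, y, n_est=50)
    print("value of mislabeled point:", values[0])
    print("mean value of the rest:", np.nanmean(values[1:]))
```

In this toy setting the mislabeled point is expected to receive a much lower value than the correctly labeled ones, which is the behaviour the method relies on to flag low-quality data.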
Source code in src/pydvl/value/oob/oob.py
point_wise_accuracy
Point-wise 0-1 loss between two arrays
PARAMETER | DESCRIPTION |
---|---|
y_true | Array of true values (e.g. labels) |
y_pred | Array of estimated values (e.g. model predictions) |
RETURNS | DESCRIPTION |
---|---|
NDArray[T] | Array with point-wise 0-1 losses between labels and model predictions |
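A point-wise comparison of this kind can be sketched as follows. The description speaks of a 0-1 loss while the function name says accuracy; the sketch assumes the accuracy convention (1 where the prediction matches the label, 0 otherwise), which is an assumption and not confirmed on this page. point_wise_accuracy_sketch is a hypothetical name.

```python
import numpy as np


def point_wise_accuracy_sketch(y_true, y_pred):
    """Return 1.0 where prediction and label agree, 0.0 elsewhere."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return (y_true == y_pred).astype(float)


if __name__ == "__main__":
    print(point_wise_accuracy_sketch([0, 1, 1, 2], [0, 1, 0, 2]))
```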
Source code in src/pydvl/value/oob/oob.py
neg_l2_distance
Point-wise negative \(l_2\) distance between two arrays
PARAMETER | DESCRIPTION |
---|---|
y_true | Array of true values (e.g. labels) |
y_pred | Array of estimated values (e.g. model predictions) |
RETURNS | DESCRIPTION |
---|---|
NDArray[T] | Array with point-wise negative \(l_2\) distances between labels and model predictions |
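For regression targets, a score of this kind might look like the sketch below. It assumes scalar targets, for which the \(l_2\) distance between a label and a prediction reduces to the absolute difference; an implementation could equally use the squared difference, and this page does not say which one the library uses. neg_l2_distance_sketch is a hypothetical name.

```python
import numpy as np


def neg_l2_distance_sketch(y_true, y_pred):
    """Negative absolute difference per point (scalar targets assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Larger (closer to zero) means a better prediction, so the result
    # can be averaged as a score in the same way as an accuracy.
    return -np.abs(y_true - y_pred)


if __name__ == "__main__":
    print(neg_l2_distance_sketch([1.0, 2.0, 3.0], [1.0, 0.0, 5.5]))
```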