pydvl.valuation.utility.deepset
This module provides an implementation of DeepSet, from Zaheer et al. (2017)...
DeepSet uses a simple permutation-invariant architecture to learn embeddings for sets of points, see...
References
...
DeepSet
DeepSet(
input_dim: int,
phi_hidden_dim: int,
phi_output_dim: int,
rho_hidden_dim: int,
use_embedding: bool = False,
num_embeddings: int | None = None,
)
Bases: Module
Simple implementation of DeepSets to learn utility functions.
Given a set \(S = \{x_1, x_2, \dots, x_n\},\) DeepSet learns a representation of the set that is invariant to the order of its elements. The model consists of two networks. The first, \(\phi\), embeds each element, and the embeddings are summed into an aggregated representation:

\[
\Phi(S) = \sum_{x_i \in S} \phi(x_i),
\]

where \(\phi(x_i)\) is a learned embedding for data point \(x_i\). The second network, \(\rho\), predicts the output \(y\) from the aggregated representation:

\[
y = \rho(\Phi(S)).
\]
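Because the aggregation is a sum, the order in which the points are fed to the model cannot change the output. The following short derivation assumes the sum aggregation of Zaheer et al. (2017), \(\Phi(S) = \sum_{i=1}^{n} \phi(x_i)\). Take any permutation \(\pi\) of \(\{1, \dots, n\}\) and reindex with \(j = \pi(i)\), which is a bijection. This gives

\[
\sum_{i=1}^{n} \phi\bigl(x_{\pi(i)}\bigr) = \sum_{j=1}^{n} \phi(x_j) = \Phi(S)
\quad\Longrightarrow\quad
\rho\Bigl(\sum_{i=1}^{n} \phi\bigl(x_{\pi(i)}\bigr)\Bigr) = \rho(\Phi(S)) = y,
\]

so the prediction depends only on which points are in \(S\), not on their order.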
PARAMETER | DESCRIPTION | TYPE |
---|---|---|
`input_dim` | Dimension of each instance in the set, or dimension of the embedding if one is used. | `int` |
`phi_hidden_dim` | Number of hidden units in the phi network. | `int` |
`phi_output_dim` | Output dimension of the phi network. | `int` |
`rho_hidden_dim` | Number of hidden units in the rho network. | `int` |
`use_embedding` | Whether to map integer ids of the \(x_i\) to learned embeddings of dimension `input_dim`, instead of taking feature vectors as input. | `bool` |
`num_embeddings` | Number of unique \(x_i\) values (only needed if `use_embedding` is set). | `int \| None` |
Source code in src/pydvl/valuation/utility/deepset.py
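The dimension parameters above chain from one network into the next. In particular, `phi_output_dim` is both the width of \(\phi\)'s output and the width of \(\rho\)'s input, because the sum over the set preserves that dimension. The helper below, `deepset_layer_shapes`, is hypothetical and not part of pydvl. It assumes that \(\phi\) and \(\rho\) are each two-layer MLPs, which the parameter names suggest but this reference does not state. It lists the resulting `(in, out)` sizes.

```python
from __future__ import annotations


def deepset_layer_shapes(
    input_dim: int,
    phi_hidden_dim: int,
    phi_output_dim: int,
    rho_hidden_dim: int,
    use_embedding: bool = False,
    num_embeddings: int | None = None,
) -> dict[str, tuple[int, int]]:
    """Hypothetical helper: (in, out) sizes of each layer, ASSUMING two-layer MLPs."""
    shapes: dict[str, tuple[int, int]] = {}
    if use_embedding:
        # With an embedding, input_dim is the embedding width and the lookup
        # table needs one row per unique id.
        if num_embeddings is None:
            raise ValueError("num_embeddings is required when use_embedding=True")
        shapes["embedding"] = (num_embeddings, input_dim)
    shapes["phi.0"] = (input_dim, phi_hidden_dim)
    shapes["phi.1"] = (phi_hidden_dim, phi_output_dim)
    # The sum over set elements keeps phi_output_dim, which is rho's input width.
    shapes["rho.0"] = (phi_output_dim, rho_hidden_dim)
    shapes["rho.1"] = (rho_hidden_dim, 1)  # one scalar prediction per set
    return shapes


print(deepset_layer_shapes(8, 32, 16, 32))
print(deepset_layer_shapes(4, 32, 16, 32, use_embedding=True, num_embeddings=100))
```

The final `1` matches the `(batch_size, 1)` output of `forward`.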
forward
PARAMETER | DESCRIPTION | TYPE |
---|---|---|
`x` | If using an embedding, `x` has shape `(batch_size, set_size)` and holds integer ids. Otherwise, `x` has shape `(batch_size, set_size, input_dim)` and holds feature vectors. | `Tensor` |

RETURNS | DESCRIPTION |
---|---|
`Tensor` | Output tensor of shape `(batch_size, 1)`: the predicted \(y\) for each set. |
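The two input modes and the resulting shapes can be traced with a small NumPy sketch. This is not the pydvl module. It assumes that \(\phi\) and \(\rho\) are two-layer ReLU MLPs with random weights, and it models the embedding as a plain lookup table indexed by id. In both modes the set axis (axis 1) is summed away before \(\rho\), which is why permuting the set elements leaves the output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)


def init(d_in: int, d_hidden: int, d_out: int):
    """Random weights for a two-layer MLP (an assumption, not pydvl's layout)."""
    return (rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)), np.zeros(d_out))


def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP with a ReLU after the hidden layer, applied on the last axis."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


input_dim, phi_hidden, phi_out, rho_hidden, num_embeddings = 3, 8, 4, 8, 10
embedding = rng.normal(size=(num_embeddings, input_dim))  # one row per id
phi_params = init(input_dim, phi_hidden, phi_out)
rho_params = init(phi_out, rho_hidden, 1)


def forward(x: np.ndarray, use_embedding: bool = False) -> np.ndarray:
    if use_embedding:
        # (batch_size, set_size) ids -> (batch_size, set_size, input_dim)
        x = embedding[x]
    h = mlp(x, *phi_params)          # (batch_size, set_size, phi_out)
    pooled = h.sum(axis=1)           # (batch_size, phi_out): sum over the set
    return mlp(pooled, *rho_params)  # (batch_size, 1)


ids = np.array([[1, 4, 7], [2, 2, 9]])
feats = rng.normal(size=(2, 5, input_dim))
print(forward(ids, use_embedding=True).shape)  # (2, 1)
print(forward(feats).shape)  # (2, 1)
```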
SetDatasetRaw
SetDatasetRaw(
samples: dict[Sample, float],
training_data: Dataset,
dtype: dtype = float32,
device: device = "cpu",
)
Bases: Dataset
PARAMETER | DESCRIPTION | TYPE |
---|---|---|
`training_data` | The pydvl dataset from which the samples are drawn. | `pydvl.valuation.dataset.Dataset` |
__getitem__
__getitem__(idx: int)
Builds the tensor for the set with index `idx`.
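One way to picture a dataset built from a `dict[Sample, float]` is the hypothetical sketch below. It fixes an order for the `(sample, value)` pairs so that an integer `idx` selects one pair, and it gathers the feature rows of that sample's subset. `ToySample` and `ToySetDataset` are illustrative stand-ins, not pydvl classes. The sketch assumes that a sample is identified by a tuple of data indices. It returns NumPy arrays, while the real `SetDatasetRaw` builds tensors and may pad or convert dtypes differently.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ToySample:
    """Hypothetical stand-in for a pydvl Sample: only the subset of indices."""
    subset: tuple[int, ...]


class ToySetDataset:
    """Hypothetical sketch of a (sample -> utility) dataset for set models."""

    def __init__(self, samples: dict[ToySample, float], x: np.ndarray):
        self.x = x
        # Fix an order so that an integer index identifies one (sample, utility) pair.
        self.items = list(samples.items())

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> tuple[np.ndarray, float]:
        sample, utility = self.items[idx]
        # Gather the feature rows of the subset: shape (set_size, n_features).
        return self.x[list(sample.subset)], utility


x = np.arange(12, dtype=np.float32).reshape(4, 3)  # 4 points, 3 features each
samples = {ToySample((0, 2)): 0.7, ToySample((1, 2, 3)): 0.9}
ds = ToySetDataset(samples, x)
features, y = ds[1]
print(features.shape, y)  # (3, 3) 0.9
```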
DeepSetUtilityModel
DeepSetUtilityModel(
data: Dataset,
phi_hidden_dim: int,
phi_output_dim: int,
rho_hidden_dim: int,
lr: float = 0.001,
lr_step_size: int = 10,
lr_gamma: float = 0.1,
batch_size: int = 64,
num_epochs: int = 20,
device: str = "cpu",
dtype: dtype = float32,
progress: dict[str, Any] | bool = False,
)
Bases: UtilityModel
A utility model that uses a simple DeepSet architecture to learn utility functions.
PARAMETER | DESCRIPTION | TYPE |
---|---|---|
`data` | The pydvl dataset from which the samples are drawn. | `Dataset` |
`phi_hidden_dim` | Number of hidden units in the phi network. | `int` |
`phi_output_dim` | Output dimension of the phi network. | `int` |
`rho_hidden_dim` | Number of hidden units in the rho network. | `int` |
`lr` | Learning rate for the optimizer. | `float` |
`lr_step_size` | Step size for the learning rate scheduler. | `int` |
`lr_gamma` | Multiplicative factor for the learning rate scheduler. | `float` |
`batch_size` | Batch size for training. | `int` |
`num_epochs` | Number of epochs for training. | `int` |
`device` | Device to use for training. | `str` |
`dtype` | Data type to use for training. | `dtype` |
`progress` | Whether to display a progress bar during training. If a dictionary is provided, it is passed to … | `dict[str, Any] \| bool` |
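The names `lr_step_size` and `lr_gamma` match the step-decay schedule of PyTorch's `torch.optim.lr_scheduler.StepLR`, which multiplies the learning rate by `gamma` once every `step_size` epochs. The sketch below assumes that this schedule is used and that the scheduler steps once per epoch, which the table above does not confirm. Under those assumptions it computes the learning rate for each epoch with the signature defaults. It also counts optimizer steps, assuming that each epoch visits every sample and keeps the last partial batch.

```python
import math


def step_lr(lr: float, lr_step_size: int, lr_gamma: float, epoch: int) -> float:
    """Learning rate during 0-based `epoch` under StepLR-style step decay (assumed)."""
    # Decay by lr_gamma once every lr_step_size epochs.
    return lr * lr_gamma ** (epoch // lr_step_size)


def total_steps(n_samples: int, batch_size: int, num_epochs: int) -> int:
    """Optimizer steps if each epoch covers all samples, last partial batch included."""
    return num_epochs * math.ceil(n_samples / batch_size)


# Signature defaults: lr=0.001, lr_step_size=10, lr_gamma=0.1, num_epochs=20, batch_size=64
schedule = [step_lr(0.001, 10, 0.1, epoch) for epoch in range(20)]
print(schedule[0], schedule[10])
print(total_steps(n_samples=1000, batch_size=64, num_epochs=20))  # 320
```

With the defaults, the rate stays at `0.001` for epochs 0 to 9 and drops tenfold for epochs 10 to 19.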
fit
PARAMETER | DESCRIPTION |
---|---|
`samples` | A collection of utility samples. |
Returns:
predict
predict(samples: Collection[Sample]) -> NDArray
PARAMETER | DESCRIPTION | TYPE |
---|---|---|
`samples` | A collection of samples whose utility values are to be predicted. | `Collection[Sample]` |

RETURNS | DESCRIPTION |
---|---|
`NDArray` | An array of shape `(len(samples), 1)` with the predicted utility of each sample. |
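The `fit` / `predict` contract is: learn from subsets with known utilities, then return one predicted utility per requested subset as a column of shape `(len(samples), 1)`. The sketch below illustrates that contract with a deliberately simplified, hypothetical stand-in rather than `DeepSetUtilityModel` itself. In it, \(\phi\) is the identity, \(\rho\) is linear, and the fit is ordinary least squares instead of gradient-based training of neural networks. Subsets are plain tuples of indices rather than pydvl `Sample` objects. The toy utility is additive over points, so this sum-pooled linear model can recover it exactly.

```python
import numpy as np

# Toy data: 5 points with 2 features; the utility of a subset is, by construction,
# the sum of a hidden per-point value x[i] @ true_w (an assumption for this demo).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
true_w = np.array([1.5, -0.5])


def utility(subset: tuple[int, ...]) -> float:
    return float(sum(x[i] @ true_w for i in subset))


def pool(subsets: list[tuple[int, ...]]) -> np.ndarray:
    """Sum-pool the raw features of each subset: shape (len(subsets), n_features)."""
    return np.stack([x[list(s)].sum(axis=0) for s in subsets])


# "fit": least squares on the pooled features (phi = identity, rho = linear map).
train = [(0,), (1, 2), (0, 3, 4), (2, 4), (1, 3)]
targets = np.array([utility(s) for s in train])
w, *_ = np.linalg.lstsq(pool(train), targets, rcond=None)


def predict(subsets: list[tuple[int, ...]]) -> np.ndarray:
    """Predicted utilities as a column, shape (len(subsets), 1)."""
    return (pool(subsets) @ w).reshape(-1, 1)


preds = predict([(0, 1), (3,), (0, 1, 2, 3, 4)])
print(preds.shape)  # (3, 1)
```

If a caller needs a flat vector, `preds.ravel()` drops the trailing axis.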