pydvl.valuation.games
¶
This module provides several predefined games used in the literature 1 and, depending on the game, precomputed Shapley, Least-Core, and / or Banzhaf values, for benchmarking purposes.
The games are:
References¶
-
Castro, J., Gómez, D. and Tejada, J., 2009. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5), pp.1726-1730. ↩
AirportGame
¶
AirportGame(n_players: int = 100)
Bases: Game
Toy game that is used for testing and demonstration purposes.
An airport game defined in (Castro et al., 2009)1 Section 4.3
PARAMETER | DESCRIPTION |
---|---|
n_players
|
Number of players that participate in the game.
TYPE:
|
Source code in src/pydvl/valuation/games.py
AsymmetricVotingGame
¶
AsymmetricVotingGame(n_players: int = 51)
Bases: Game
Toy game that is used for testing and demonstration purposes.
An asymmetric voting game defined in (Castro et al., 2009)1 Section 4.2.
For this game the player set is \(N = \{1,\dots,51\}\) and the utility of a coalition is given by:
where \(w = [w_1,\dots, w_{51}]\) is a list of weights associated with each player.
PARAMETER | DESCRIPTION |
---|---|
n_players
|
Number of players that participate in the game.
TYPE:
|
Source code in src/pydvl/valuation/games.py
268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 |
|
DummyGameDataset
¶
Bases: Dataset
Dummy game dataset.
Initializes a dummy game dataset with n_players
and an optional description.
This class is used internally inside the Game class.
PARAMETER | DESCRIPTION |
---|---|
n_players
|
Number of players that participate in the game.
TYPE:
|
description
|
Optional description of the dataset.
TYPE:
|
Source code in src/pydvl/valuation/games.py
indices
property
¶
Index of positions in data.x_train.
Contiguous integers from 0 to len(Dataset).
names
property
¶
Names of each individual datapoint.
Used for reporting Shapley values.
__getstate__
¶
Prepare the object state for pickling replacing memmapped arrays with their file paths
Source code in src/pydvl/valuation/dataset.py
__setstate__
¶
Restore the object state from pickling.
Source code in src/pydvl/valuation/dataset.py
data
¶
Given a set of indices, returns the training data that refer to those indices, as a read-only tuple-like structure.
This is used mainly by subclasses of UtilityBase to retrieve subsets of the data from indices.
PARAMETER | DESCRIPTION |
---|---|
indices
|
Optional indices that will be used to select points from
the training data. If
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
RawData
|
If |
Source code in src/pydvl/valuation/dataset.py
data_indices
¶
Returns a subset of indices.
This is equivalent to using Dataset.indices[logical_indices]
but allows
subclasses to define special behaviour, e.g. when indices in Dataset
do not
match the indices in the data.
For Dataset
, this is a simple pass-through.
PARAMETER | DESCRIPTION |
---|---|
indices
|
A set of indices held by this object
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
NDArray[int_]
|
The indices of the data points in the data array. |
Source code in src/pydvl/valuation/dataset.py
feature
¶
Returns a slice for the feature with the given name.
Source code in src/pydvl/valuation/dataset.py
from_arrays
classmethod
¶
from_arrays(
X: ArrayT,
y: ArrayT,
train_size: float = 0.8,
random_state: int | None = None,
stratify_by_target: bool = False,
**kwargs: Any,
) -> tuple[Dataset, Dataset]
Constructs a Dataset object from X and
y arrays as returned by the make_*
functions in sklearn generated datasets.
Example
PARAMETER | DESCRIPTION |
---|---|
X
|
array of shape (n_samples, n_features) - either numpy array or PyTorch
TYPE:
|
y
|
array of shape (n_samples,) - must be same type as X
TYPE:
|
train_size
|
size of the training dataset. Used in
TYPE:
|
random_state
|
seed for train / test split
TYPE:
|
stratify_by_target
|
If stratification).¶
TYPE:
|
kwargs
|
Additional keyword arguments to pass to the
Dataset constructor. Use this to pass
e.g.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
tuple[Dataset, Dataset]
|
Object with the passed X and y arrays split across training and test sets. |
New in version 0.4.0
Changed in version 0.6.0
Added kwargs to pass to the Dataset constructor.
Changed in version 0.10.0
Returns a tuple of two Dataset objects.
New in version 0.11.0
Added support for PyTorch tensors.
Source code in src/pydvl/valuation/dataset.py
from_sklearn
classmethod
¶
from_sklearn(
data: Bunch,
train_size: int | float = 0.8,
random_state: int | None = None,
stratify_by_target: bool = False,
**kwargs,
) -> tuple[Dataset, Dataset]
Constructs two Dataset objects from a
sklearn.utils.Bunch, as returned by the load_*
functions in scikit-learn toy datasets.
Example
PARAMETER | DESCRIPTION |
---|---|
data
|
scikit-learn Bunch object. The following attributes are supported:
TYPE:
|
train_size
|
size of the training dataset. Used in |
the value is automatically set to the complement of the test size.
random_state: seed for train / test split
stratify_by_target: If True
, data is split in a stratified
fashion, using the target variable as labels. Read more in
scikit-learn's user guide.
kwargs: Additional keyword arguments to pass to the
Dataset constructor. Use this to
pass e.g. is_multi_output
.
RETURNS | DESCRIPTION |
---|---|
tuple[Dataset, Dataset]
|
Object with the sklearn dataset |
Changed in version 0.6.0
Added kwargs to pass to the Dataset constructor.
Changed in version 0.10.0
Returns a tuple of two Dataset objects.
Source code in src/pydvl/valuation/dataset.py
538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 |
|
logical_indices
¶
Returns the indices in this Dataset
for the given indices in the data
array.
This is equivalent to using Dataset.indices[data_indices]
but allows
subclasses to define special behaviour, e.g. when indices in Dataset
do not
match the indices in the data.
PARAMETER | DESCRIPTION |
---|---|
indices
|
A set of indices in the data array.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
NDArray[int_]
|
The abstract indices for the given data indices. |
Source code in src/pydvl/valuation/dataset.py
target
¶
Returns a slice or index for the target with the given name.
If targets are multidimensional (2D array), returns a tuple (slice(None), target_idx). If targets are 1D, returns just a slice(None).
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the target to retrieve.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
tuple[slice, int] | slice
|
For multi-output targets: tuple of (slice(None), target_idx) |
tuple[slice, int] | slice
|
For single-output targets: target_idx (usually 0) |
RAISES | DESCRIPTION |
---|---|
ValueError
|
If the target name is not found. |
Source code in src/pydvl/valuation/dataset.py
DummyGameUtility
¶
Bases: UtilityBase
Dummy game utility
This class is used internally inside the Game class.
PARAMETER | DESCRIPTION |
---|---|
score
|
Function to compute the score of a coalition. |
score_range
|
Minimum and maximum values of the score function. |
Source code in src/pydvl/valuation/games.py
training_data
property
¶
training_data: Dataset | None
Retrieves the training data used by this utility.
This property is read-only. In order to set it, use with_dataset().
__str__
¶
Returns a string representation of the utility. Subclasses should override this method to provide a more informative string
DummyModel
¶
Bases: SupervisedModel[NDArray, NDArray]
Dummy model class.
A dummy supervised model used for testing purposes only.
Game
¶
Bases: ABC
Base class for games
Any Game subclass has to implement the abstract _score
method
to assign a score to each coalition/subset and at least
one of shapley_values
, least_core_values
, or banzhaf_values
.
PARAMETER | DESCRIPTION |
---|---|
n_players
|
Number of players that participate in the game.
TYPE:
|
score_range
|
Minimum and maximum values of the |
ATTRIBUTE | DESCRIPTION |
---|---|
n_players |
Number of players that participate in the game.
|
data |
Dummy dataset object.
|
u |
Utility object with a dummy model and dataset.
|
Source code in src/pydvl/valuation/games.py
MinerGame
¶
MinerGame(n_players: int)
Bases: Game
Toy game that is used for testing and demonstration purposes.
Consider a group of n miners, who have discovered large bars of gold.
If two miners can carry one piece of gold, then the payoff of a coalition \(S\) is:
If there are more than two miners and there is an even number of miners, then the core consists of the single payoff where each miner gets 1/2.
If there is an odd number of miners, then the core is empty.
Taken from Wikipedia
PARAMETER | DESCRIPTION |
---|---|
n_players
|
Number of miners that participate in the game.
TYPE:
|
data_description
|
|
Source code in src/pydvl/valuation/games.py
MinimumSpanningTreeGame
¶
MinimumSpanningTreeGame(n_players: int = 100)
Bases: Game
Toy game that is used for testing and demonstration purposes.
A minimum spanning tree game defined in (Castro et al., 2009)1.
Let \(G = (N \cup \{0\},E)\) be a valued graph where \(N = \{1,\dots,100\}\), and the cost associated to an edge \((i, j)\) is:
A minimum spanning tree game \((N, c)\) is a cost game, where for a given coalition \(S \subset N\), \(v(S)\) is the sum of the edge cost of the minimum spanning tree, i.e. \(v(S)\) = Minimum Spanning Tree of the graph \(G|_{S\cup\{0\}}\), which is the partial graph restricted to the players \(S\) and the source node \(0\).
PARAMETER | DESCRIPTION |
---|---|
n_players
|
Number of players that participate in the game.
TYPE:
|
Source code in src/pydvl/valuation/games.py
ShoesGame
¶
Bases: Game
Toy game that is used for testing and demonstration purposes.
A shoes game defined in (Castro et al., 2009)1.
In this game, some players have a left shoe and others a right shoe.
The payoff (utility) of a coalition \(S\) is:
Where \(L\), respectively \(R\), is the set of players with left shoes, respectively right shoes. This means that the marginal contribution of a player with a left shoe to a coalition \(S\) is 1 if the number of players with a left shoe in \(S\) is strictly less than the number of players with a right shoe in \(S\), and 0 otherwise. Let player \(i\) have a left shoe, then:
The situation is analogous for players with a right shoe. In order to compute the Shapley or Banzhaf value for a player \(i\) with a left shoe, we need then the number of subsets \(S\) of \(D_{-i}\) such that \(\mid S \cap L \mid < \mid S \cap R \mid\). This number is given by the sum:
PARAMETER | DESCRIPTION |
---|---|
left
|
Number of players with a left shoe.
TYPE:
|
right
|
Number of players with a right shoe.
TYPE:
|
Source code in src/pydvl/valuation/games.py
banzhaf_values
cached
¶
banzhaf_values() -> ValuationResult
We use the fact that the marginal utility of a coalition S is 1 if |S ∩ L| < |S ∩ R| and 0 otherwise, and simply count those sets.
The solution for left or right shoes is symmetrical.
Source code in src/pydvl/valuation/games.py
shapley_values
cached
¶
shapley_values() -> ValuationResult
We use the fact that the marginal utility of a coalition S of size k is 1 if |S ∩ L| < |S ∩ R| and 0 otherwise, and compute Shapley values with the formula that iterates over subset sizes.
The solution for left or right shoes is symmetrical
Source code in src/pydvl/valuation/games.py
SymmetricVotingGame
¶
SymmetricVotingGame(n_players: int)
Bases: Game
Toy game that is used for testing and demonstration purposes.
A symmetric voting game defined in (Castro et al., 2009)1 Section 4.1
For this game the utility of a coalition is 1 if its cardinality is greater than num_samples/2, or 0 otherwise.
PARAMETER | DESCRIPTION |
---|---|
n_players
|
Number of players that participate in the game.
TYPE:
|