
pydvl.value.games

This module provides several predefined games and, depending on the game, the corresponding Shapley values, Least Core values, or both, for benchmarking purposes.

References


  1. Castro, J., Gómez, D. and Tejada, J., 2009. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5), pp.1726-1730. 

DummyGameDataset

DummyGameDataset(n_players: int, description: Optional[str] = None)

Bases: Dataset

Dummy game dataset.

Initializes a dummy game dataset with n_players and an optional description.

This class is used internally by the Game class.

PARAMETER DESCRIPTION
n_players

Number of players that participate in the game.

TYPE: int

description

Optional description of the dataset.

TYPE: Optional[str] DEFAULT: None

Source code in src/pydvl/value/games.py
def __init__(self, n_players: int, description: Optional[str] = None) -> None:
    x = np.arange(0, n_players, 1).reshape(-1, 1)
    nil = np.zeros_like(x)
    super().__init__(
        x,
        nil.copy(),
        nil.copy(),
        nil.copy(),
        feature_names=["x"],
        target_names=["y"],
        description=description,
    )
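The constructor above stores nothing but player indices. As a numpy-only sketch (it mimics the constructor and does not call pydvl; `n_players = 4` and the coalition `[0, 2]` are illustrative values), the arrays look like this, with x_train holding each player's index and every other array filled with zeros:

```python
import numpy as np

n_players = 4  # illustrative value

# Same construction as the constructor: one row per player, holding its index
x = np.arange(0, n_players, 1).reshape(-1, 1)
nil = np.zeros_like(x)
# Positional order used in the super().__init__ call: x_train, y_train, x_test, y_test
x_train, y_train, x_test, y_test = x, nil.copy(), nil.copy(), nil.copy()

# Selecting a coalition's rows recovers the player indices themselves
coalition = [0, 2]
players = x_train[coalition].ravel()
print(players)  # [0 2]
```

Since x_test only holds zeros, the dataset overrides get_test_data to return training rows instead.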

indices property

indices: NDArray[int_]

Index of positions in data.x_train.

Contiguous integers from 0 to len(Dataset) - 1.

data_names property

data_names: NDArray[object_]

Names of each individual datapoint.

Used for reporting Shapley values.

dim property

dim: int

Returns the number of dimensions of a sample.

get_training_data

get_training_data(
    indices: Optional[Iterable[int]] = None,
) -> Tuple[NDArray, NDArray]

Given a set of indices, returns the training data that refer to those indices.

This is used mainly by Utility to retrieve subsets of the data from indices. It is typically not needed in algorithms.

PARAMETER DESCRIPTION
indices

Optional indices that will be used to select points from the training data. If None, the entire training data will be returned.

TYPE: Optional[Iterable[int]] DEFAULT: None

RETURNS DESCRIPTION
Tuple[NDArray, NDArray]

If indices is not None, the selected x and y arrays from the training data. Otherwise, the entire dataset.

Source code in src/pydvl/utils/dataset.py
def get_training_data(
    self, indices: Optional[Iterable[int]] = None
) -> Tuple[NDArray, NDArray]:
    """Given a set of indices, returns the training data that refer to those
    indices.

    This is used mainly by [Utility][pydvl.utils.utility.Utility] to retrieve
    subsets of the data from indices. It is typically **not needed in
    algorithms**.

    Args:
        indices: Optional indices that will be used to select points from
            the training data. If `None`, the entire training data will be
            returned.

    Returns:
        If `indices` is not `None`, the selected x and y arrays from the
            training data. Otherwise, the entire dataset.
    """
    if indices is None:
        return self.x_train, self.y_train
    x = self.x_train[indices]
    y = self.y_train[indices]
    return x, y

from_sklearn classmethod

from_sklearn(
    data: Bunch,
    train_size: float = 0.8,
    random_state: Optional[int] = None,
    stratify_by_target: bool = False,
    **kwargs
) -> Dataset

Constructs a Dataset object from a sklearn.utils.Bunch, as returned by the load_* functions in scikit-learn toy datasets.

Example
>>> from pydvl.utils import Dataset
>>> from sklearn.datasets import load_iris
>>> dataset = Dataset.from_sklearn(load_iris())

PARAMETER DESCRIPTION
data

scikit-learn Bunch object. The following attributes are supported:

  • data: covariates.
  • target: target variables (labels).
  • feature_names (optional): the feature names.
  • target_names (optional): the target names.
  • DESCR (optional): a description.

TYPE: Bunch

train_size

Size of the training set, as a fraction of the whole dataset. Passed to train_test_split.

TYPE: float DEFAULT: 0.8

random_state

Seed for the train/test split.

TYPE: Optional[int] DEFAULT: None

stratify_by_target

If True, data is split in a stratified fashion, using the target variable as labels. Read more in scikit-learn's user guide.

TYPE: bool DEFAULT: False

kwargs

Additional keyword arguments to pass to the Dataset constructor. Use this to pass e.g. is_multi_output.

DEFAULT: {}

RETURNS DESCRIPTION
Dataset

Dataset object with the scikit-learn data split into training and test sets.

Changed in version 0.6.0

Added kwargs to pass to the Dataset constructor.

Source code in src/pydvl/utils/dataset.py
@classmethod
def from_sklearn(
    cls,
    data: Bunch,
    train_size: float = 0.8,
    random_state: Optional[int] = None,
    stratify_by_target: bool = False,
    **kwargs,
) -> "Dataset":
    """Constructs a [Dataset][pydvl.utils.Dataset] object from a
    [sklearn.utils.Bunch][], as returned by the `load_*`
    functions in [scikit-learn toy datasets](https://scikit-learn.org/stable/datasets/toy_dataset.html).

    ??? Example
        ```pycon
        >>> from pydvl.utils import Dataset
        >>> from sklearn.datasets import load_iris
        >>> dataset = Dataset.from_sklearn(load_iris())
        ```

    Args:
        data: scikit-learn Bunch object. The following attributes are supported:

            - `data`: covariates.
            - `target`: target variables (labels).
            - `feature_names` (**optional**): the feature names.
            - `target_names` (**optional**): the target names.
            - `DESCR` (**optional**): a description.
        train_size: size of the training dataset. Used in `train_test_split`
        random_state: seed for train / test split
        stratify_by_target: If `True`, data is split in a stratified
            fashion, using the target variable as labels. Read more in
            [scikit-learn's user guide](https://scikit-learn.org/stable/modules/cross_validation.html#stratification).
        kwargs: Additional keyword arguments to pass to the
            [Dataset][pydvl.utils.Dataset] constructor. Use this to pass e.g. `is_multi_output`.

    Returns:
        Object with the sklearn dataset

    !!! tip "Changed in version 0.6.0"
        Added kwargs to pass to the [Dataset][pydvl.utils.Dataset] constructor.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        data.data,
        data.target,
        train_size=train_size,
        random_state=random_state,
        stratify=data.target if stratify_by_target else None,
    )
    return cls(
        x_train,
        y_train,
        x_test,
        y_test,
        feature_names=data.get("feature_names"),
        target_names=data.get("target_names"),
        description=data.get("DESCR"),
        **kwargs,
    )

from_arrays classmethod

from_arrays(
    X: NDArray,
    y: NDArray,
    train_size: float = 0.8,
    random_state: Optional[int] = None,
    stratify_by_target: bool = False,
    **kwargs
) -> Dataset

Constructs a Dataset object from X and y numpy arrays as returned by the make_* functions in sklearn generated datasets.

Example
>>> from pydvl.utils import Dataset
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression()
>>> dataset = Dataset.from_arrays(X, y)

PARAMETER DESCRIPTION
X

numpy array of shape (n_samples, n_features)

TYPE: NDArray

y

numpy array of shape (n_samples,)

TYPE: NDArray

train_size

Size of the training set, as a fraction of the whole dataset. Passed to train_test_split.

TYPE: float DEFAULT: 0.8

random_state

Seed for the train/test split.

TYPE: Optional[int] DEFAULT: None

stratify_by_target

If True, data is split in a stratified fashion, using the y variable as labels. Read more in sklearn's user guide.

TYPE: bool DEFAULT: False

kwargs

Additional keyword arguments to pass to the Dataset constructor. Use this to pass e.g. feature_names or target_names.

DEFAULT: {}

RETURNS DESCRIPTION
Dataset

Object with the passed X and y arrays split across training and test sets.

New in version 0.4.0

Changed in version 0.6.0

Added kwargs to pass to the Dataset constructor.

Source code in src/pydvl/utils/dataset.py
@classmethod
def from_arrays(
    cls,
    X: NDArray,
    y: NDArray,
    train_size: float = 0.8,
    random_state: Optional[int] = None,
    stratify_by_target: bool = False,
    **kwargs,
) -> "Dataset":
    """Constructs a [Dataset][pydvl.utils.Dataset] object from X and y numpy arrays  as
    returned by the `make_*` functions in [sklearn generated datasets](https://scikit-learn.org/stable/datasets/sample_generators.html).

    ??? Example
        ```pycon
        >>> from pydvl.utils import Dataset
        >>> from sklearn.datasets import make_regression
        >>> X, y = make_regression()
        >>> dataset = Dataset.from_arrays(X, y)
        ```

    Args:
        X: numpy array of shape (n_samples, n_features)
        y: numpy array of shape (n_samples,)
        train_size: size of the training dataset. Used in `train_test_split`
        random_state: seed for train / test split
        stratify_by_target: If `True`, data is split in a stratified fashion,
            using the y variable as labels. Read more in [sklearn's user
            guide](https://scikit-learn.org/stable/modules/cross_validation.html#stratification).
        kwargs: Additional keyword arguments to pass to the
            [Dataset][pydvl.utils.Dataset] constructor. Use this to pass e.g. `feature_names`
            or `target_names`.

    Returns:
        Object with the passed X and y arrays split across training and test sets.

    !!! tip "New in version 0.4.0"

    !!! tip "Changed in version 0.6.0"
        Added kwargs to pass to the [Dataset][pydvl.utils.Dataset] constructor.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        X,
        y,
        train_size=train_size,
        random_state=random_state,
        stratify=y if stratify_by_target else None,
    )
    return cls(x_train, y_train, x_test, y_test, **kwargs)

get_test_data

get_test_data(
    indices: Optional[Iterable[int]] = None,
) -> Tuple[NDArray, NDArray]

Returns subsets of the training set instead of the test set, because the dummy test arrays of a game contain only zeros.

PARAMETER DESCRIPTION
indices

Indices into the training data. If None, the entire training data is returned.

TYPE: Optional[Iterable[int]] DEFAULT: None

RETURNS DESCRIPTION
Tuple[NDArray, NDArray]

Subset of the train data.

Source code in src/pydvl/value/games.py
def get_test_data(
    self, indices: Optional[Iterable[int]] = None
) -> Tuple[NDArray, NDArray]:
    """Returns the subsets of the train set instead of the test set.

    Args:
        indices: Indices into the training data.

    Returns:
        Subset of the train data.
    """
    if indices is None:
        return self.x_train, self.y_train
    x = self.x_train[indices]
    y = self.y_train[indices]
    return x, y
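The override makes the test split an alias of the training split. The sketch below reproduces that behaviour with a hypothetical `TinyGameData` class (not part of pydvl) so that it runs without the library:

```python
from typing import Iterable, Optional, Tuple

import numpy as np
from numpy.typing import NDArray


class TinyGameData:
    """Hypothetical stand-in for DummyGameDataset; not part of pydvl."""

    def __init__(self, n_players: int) -> None:
        # Players are identified by their row index, targets are all zero
        self.x_train = np.arange(n_players).reshape(-1, 1)
        self.y_train = np.zeros_like(self.x_train)

    def get_training_data(
        self, indices: Optional[Iterable[int]] = None
    ) -> Tuple[NDArray, NDArray]:
        if indices is None:
            return self.x_train, self.y_train
        idx = list(indices)  # accept any iterable, e.g. a set of players
        return self.x_train[idx], self.y_train[idx]

    # For a game, the "test" data is just the training data
    get_test_data = get_training_data


data = TinyGameData(5)
x, _ = data.get_test_data([1, 3])
print(x.ravel())  # [1 3]
```

Converting `indices` to a list first is a choice made for this sketch, so that any iterable (for example a set) can be used as an index.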

DummyModel

DummyModel()

Bases: SupervisedModel

Dummy model class.

A dummy supervised model used for testing purposes only.

Source code in src/pydvl/value/games.py
def __init__(self) -> None:
    pass

Game

Game(
    n_players: int,
    score_range: Tuple[float, float] = (-np.inf, np.inf),
    description: Optional[str] = None,
)

Bases: ABC

Base class for games.

Every Game subclass must implement the abstract _score method, which assigns a score to each coalition (subset of players), and at least one of shapley_values and least_core_values.

PARAMETER DESCRIPTION
n_players

Number of players that participate in the game.

TYPE: int

score_range

Minimum and maximum values of the _score method.

TYPE: Tuple[float, float] DEFAULT: (-inf, inf)

description

Optional string description of the dummy dataset that will be created.

TYPE: Optional[str] DEFAULT: None

ATTRIBUTE DESCRIPTION
n_players

Number of players that participate in the game.

data

Dummy dataset object.

u

Utility object with a dummy model and dataset.

Source code in src/pydvl/value/games.py
def __init__(
    self,
    n_players: int,
    score_range: Tuple[float, float] = (-np.inf, np.inf),
    description: Optional[str] = None,
):
    self.n_players = n_players
    self.data = DummyGameDataset(self.n_players, description)
    self.u = Utility(
        DummyModel(),
        self.data,
        scorer=Scorer(self._score, range=score_range),
        catch_errors=False,
        show_warnings=True,
    )
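To make the contract concrete, here is a self-contained sketch that mirrors it without pydvl's Utility or Scorer: a subclass only provides `_score` for a coalition, and exact Shapley values follow from enumerating all coalitions, \(\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,[v(S \cup \{i\}) - v(S)]\). The names `ToyGame`, `exact_shapley` and `AdditiveGame` are hypothetical.

```python
from abc import ABC, abstractmethod
from itertools import combinations
from math import factorial
from typing import FrozenSet, List, Sequence


class ToyGame(ABC):
    """Hypothetical stand-in for Game: players are 0..n_players-1."""

    def __init__(self, n_players: int) -> None:
        self.n_players = n_players

    @abstractmethod
    def _score(self, coalition: FrozenSet[int]) -> float:
        """Value v(S) of the coalition S."""

    def exact_shapley(self) -> List[float]:
        """Exact Shapley values by enumerating every coalition (small n only)."""
        n = self.n_players
        values = [0.0] * n
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for k in range(n):  # k = |S|, with S not containing i
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                for subset in combinations(others, k):
                    s = frozenset(subset)
                    values[i] += weight * (self._score(s | {i}) - self._score(s))
        return values


class AdditiveGame(ToyGame):
    """v(S) = sum of fixed per-player weights, so player i's value is w_i."""

    def __init__(self, weights: Sequence[float]) -> None:
        super().__init__(len(weights))
        self.weights = list(weights)

    def _score(self, coalition: FrozenSet[int]) -> float:
        return sum(self.weights[i] for i in coalition)


print(AdditiveGame([1.0, 2.0, 3.0]).exact_shapley())  # approximately [1.0, 2.0, 3.0]
```

Enumeration needs on the order of n · 2ⁿ utility calls, so it is only practical for a handful of players; the additive game is a convenient check because each player's Shapley value equals its own weight.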

SymmetricVotingGame

SymmetricVotingGame(n_players: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A symmetric voting game defined in (Castro et al., 2009)[1], Section 4.1.

For this game, the utility of a coalition is 1 if its cardinality is greater than n/2, where n = n_players, and 0 otherwise:

\[{ v(S) = \left\{\begin{array}{ll} 1, & \text{ if} \quad \mid S \mid > \frac{n}{2} \\ 0, & \text{ otherwise} \end{array}\right. }\]

PARAMETER DESCRIPTION
n_players

Number of players that participate in the game. Must be an even number.

TYPE: int

Source code in src/pydvl/value/games.py
def __init__(self, n_players: int) -> None:
    if n_players % 2 != 0:
        raise ValueError("n_players must be an even number.")
    description = "Dummy data for the symmetric voting game in Castro et al. 2009"
    super().__init__(
        n_players,
        score_range=(0, 1),
        description=description,
    )
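By symmetry and efficiency (v(N) = 1), every player's Shapley value in this game is 1/n. Castro et al. (2009) estimate Shapley values by averaging marginal contributions over randomly sampled permutations; the sketch below applies that idea to this utility, assuming players are labelled 0..n-1 and using the hypothetical helpers `symmetric_voting_utility` and `permutation_shapley` (not pydvl functions):

```python
import numpy as np


def symmetric_voting_utility(coalition_size: int, n_players: int) -> float:
    """v(S) = 1 if |S| > n/2, else 0 (depends only on |S|)."""
    return 1.0 if coalition_size > n_players / 2 else 0.0


def permutation_shapley(n_players: int, n_permutations: int, seed: int = 0) -> np.ndarray:
    """Monte Carlo Shapley estimate: average marginal contributions over random orders."""
    rng = np.random.default_rng(seed)
    totals = np.zeros(n_players)
    for _ in range(n_permutations):
        order = rng.permutation(n_players)
        prev = 0.0
        for position, player in enumerate(order):
            # Coalition = players before `player` in this order, plus `player` itself
            current = symmetric_voting_utility(position + 1, n_players)
            totals[player] += current - prev
            prev = current
    return totals / n_permutations


estimate = permutation_shapley(n_players=6, n_permutations=2000)
print(estimate)  # each entry should be close to 1/6
```

Because the utility depends only on |S|, exactly one player per permutation (the one that turns the coalition into a majority) receives a marginal contribution of 1, so the estimates always sum to 1.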

AsymmetricVotingGame

AsymmetricVotingGame(n_players: int = 51)

Bases: Game

Toy game that is used for testing and demonstration purposes.

An asymmetric voting game defined in (Castro et al., 2009)[1], Section 4.2.

For this game the player set is \(N = \{1,\dots,51\}\) and the utility of a coalition is given by:

\[{ v(S) = \left\{\begin{array}{ll} 1, & \text{ if} \quad \sum\limits_{i \in S} w_i > \sum\limits_{j \in N}\frac{w_j}{2} \\ 0, & \text{ otherwise} \end{array}\right. }\]

where \(w = [w_1,\dots, w_{51}]\) is a list of weights associated with each player.

PARAMETER DESCRIPTION
n_players

Number of players that participate in the game. Only n_players=51 is supported.

TYPE: int DEFAULT: 51

Source code in src/pydvl/value/games.py
def __init__(self, n_players: int = 51) -> None:
    if n_players != 51:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=51 but got {n_players=}."
        )
    description = "Dummy data for the asymmetric voting game in Castro et al. 2009"
    super().__init__(
        n_players,
        score_range=(0, 1),
        description=description,
    )

    ranges = [
        range(0, 1),
        range(1, 2),
        range(2, 3),
        range(3, 5),
        range(5, 6),
        range(6, 7),
        range(7, 9),
        range(9, 10),
        range(10, 12),
        range(12, 15),
        range(15, 16),
        range(16, 20),
        range(20, 24),
        range(24, 26),
        range(26, 30),
        range(30, 34),
        range(34, 35),
        range(35, 44),
        range(44, 51),
    ]

    ranges_weights = [
        45,
        41,
        27,
        26,
        25,
        21,
        17,
        14,
        13,
        12,
        11,
        10,
        9,
        8,
        7,
        6,
        5,
        4,
        3,
    ]
    ranges_values = [
        "0.08831",
        "0.07973",
        "0.05096",
        "0.04898",
        "0.047",
        "0.03917",
        "0.03147",
        "0.02577",
        "0.02388",
        "0.022",
        "0.02013",
        "0.01827",
        "0.01641",
        "0.01456",
        "0.01272",
        "0.01088",
        "0.009053",
        "0.00723",
        "0.005412",
    ]

    self.weight_table = np.zeros(self.n_players)
    exact_values = np.zeros(self.n_players)
    for r, w, v in zip(ranges, ranges_weights, ranges_values):
        self.weight_table[r] = w
        exact_values[r] = v

    self.exact_values = exact_values
    self.threshold = np.sum(self.weight_table) / 2
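The utility can be sketched directly from the weights above. The group sizes below are read off the `ranges` list and the weights off `ranges_weights`; `weighted_majority` is a hypothetical helper, not a pydvl function:

```python
import numpy as np

# Group sizes from `ranges` and per-group weights from `ranges_weights` above
group_sizes = [1, 1, 1, 2, 1, 1, 2, 1, 2, 3, 1, 4, 4, 2, 4, 4, 1, 9, 7]
group_weights = [45, 41, 27, 26, 25, 21, 17, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3]

weights = np.repeat(group_weights, group_sizes).astype(float)  # w_i for each player
threshold = weights.sum() / 2  # half of the total weight


def weighted_majority(coalition) -> float:
    """v(S) = 1 if the weights in S exceed half of the total weight, else 0."""
    return 1.0 if weights[list(coalition)].sum() > threshold else 0.0


print(len(weights), weights.sum())  # 51 538.0
print(weighted_majority(range(10)), weighted_majority(range(11)))  # 0.0 1.0
```

The ten heaviest players hold a weight of 259, just below the threshold of 269, so an eleventh player is needed to win.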

ShoesGame

ShoesGame(left: int, right: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A shoes game defined in (Castro et al., 2009)[1].

In this game, some players have a left shoe and others a right shoe. A single shoe is worth zero, while a pair (one left and one right shoe) is worth 1.

The payoff of a coalition \(S\) is:

\[{ v(S) = \min( \mid S \cap L \mid, \mid S \cap R \mid ) }\]

where \(L\) and \(R\) are the sets of players with left shoes and right shoes, respectively.
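The payoff only depends on how many left and right shoes a coalition holds. The class above does not state which player owns which shoe, so the sketch below assumes, hypothetically, that players 0..left-1 own a left shoe and the rest a right shoe; `shoes_value` is not a pydvl function:

```python
def shoes_value(coalition, left: int) -> int:
    """v(S) = min(|S ∩ L|, |S ∩ R|), assuming players 0..left-1 own a left
    shoe and all remaining players own a right shoe (hypothetical layout)."""
    members = set(coalition)
    n_left = sum(1 for p in members if p < left)
    n_right = len(members) - n_left
    return min(n_left, n_right)


left, right = 3, 2
print(shoes_value(range(left + right), left))  # 2: at most two complete pairs
print(shoes_value([0, 1, 2], left))  # 0: only left shoes
print(shoes_value([0, 3], left))  # 1: one left and one right shoe
```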

PARAMETER DESCRIPTION
left

Number of players with a left shoe.

TYPE: int

right

Number of players with a right shoe.

TYPE: int

Source code in src/pydvl/value/games.py
def __init__(self, left: int, right: int) -> None:
    self.left = left
    self.right = right
    n_players = self.left + self.right
    description = "Dummy data for the shoe game in Castro et al. 2009"
    max_score = n_players // 2
    super().__init__(n_players, score_range=(0, max_score), description=description)

AirportGame

AirportGame(n_players: int = 100)

Bases: Game

Toy game that is used for testing and demonstration purposes.

An airport game defined in (Castro et al., 2009)[1], Section 4.3.

PARAMETER DESCRIPTION
n_players

Number of players that participate in the game. Only n_players=100 is supported.

TYPE: int DEFAULT: 100

Source code in src/pydvl/value/games.py
def __init__(self, n_players: int = 100) -> None:
    if n_players != 100:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=100 but got {n_players=}."
        )
    description = "A dummy dataset for the airport game in Castro et al. 2009"
    super().__init__(n_players, score_range=(0, 100), description=description)
    ranges = [
        range(0, 8),
        range(8, 20),
        range(20, 26),
        range(26, 40),
        range(40, 48),
        range(48, 57),
        range(57, 70),
        range(70, 80),
        range(80, 90),
        range(90, 100),
    ]
    exact = [
        0.01,
        0.020869565,
        0.033369565,
        0.046883079,
        0.063549745,
        0.082780515,
        0.106036329,
        0.139369662,
        0.189369662,
        0.289369662,
    ]
    c = list(range(1, 11))  # one cost per group in `ranges`: 1, 2, ..., 10
    score_table = np.zeros(100)
    exact_values = np.zeros(100)

    for r, cost, v in zip(ranges, c, exact):
        score_table[r] = cost
        exact_values[r] = v

    self.exact_values = exact_values
    self.score_table = score_table
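The section does not spell out the cost function. Assuming the standard airport cost game, \(v(S) = \max_{i \in S} c_i\), with the ten groups in `ranges` having costs 1, 2, ..., 10 (an assumption, but one that reproduces the `exact` values hard-coded above), the Shapley values have a closed form: each cost increment \(c_k - c_{k-1}\) is shared equally by all players whose cost is at least \(c_k\). A sketch with the hypothetical helper `airport_shapley`:

```python
import numpy as np

# Group sizes implied by `ranges` in the constructor above
group_sizes = [8, 12, 6, 14, 8, 9, 13, 10, 10, 10]
# Assumption: the k-th group has cost k
costs = np.arange(1, 11, dtype=float)


def airport_shapley(group_sizes, costs) -> np.ndarray:
    """Closed-form Shapley values of the airport game v(S) = max_{i in S} c_i."""
    sizes = np.asarray(group_sizes)
    # Cost increments c_k - c_{k-1}, with c_0 = 0
    increments = np.diff(np.concatenate(([0.0], costs)))
    # Players whose cost is at least c_k: group k and all later groups
    remaining = sizes[::-1].cumsum()[::-1]
    # Each increment is split equally among those players, then accumulated
    per_group = np.cumsum(increments / remaining)
    return np.repeat(per_group, sizes)


phi = airport_shapley(group_sizes, costs)
exact = [0.01, 0.020869565, 0.033369565, 0.046883079, 0.063549745,
         0.082780515, 0.106036329, 0.139369662, 0.189369662, 0.289369662]
print(np.allclose(np.unique(phi), exact, atol=1e-8))  # True
print(round(phi.sum(), 9))  # 10.0, the cost of the grand coalition
```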

MinimumSpanningTreeGame

MinimumSpanningTreeGame(n_players: int = 100)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A minimum spanning tree game defined in (Castro et al., 2009)[1].

Let \(G = (N \cup \{0\},E)\) be a valued graph where \(N = \{1,\dots,100\}\), and the cost associated to an edge \((i, j)\) is:

\[{ c_{ij} = \left\{\begin{array}{lll} 1, & \text{ if} & i = j + 1 \text{ or } i = j - 1 \\ & & \text{ or } (i = 1 \text{ and } j = 100) \text{ or } (i = 100 \text{ and } j = 1) \\ 101, & \text{ if} & i = 0 \text{ or } j = 0 \\ \infty, & \text{ otherwise} \end{array}\right. }\]

A minimum spanning tree game \((N, c)\) is a cost game where, for a given coalition \(S \subset N\), \(v(S)\) is the total edge cost of the minimum spanning tree of \(G|_{S\cup\{0\}}\), the subgraph restricted to the players in \(S\) and the source node \(0\).
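For small coalitions, \(v(S)\) can be computed with scipy's `scipy.sparse.csgraph.minimum_spanning_tree`. This sketch follows the cost definition above, numbering players 1..100 and the source 0 as in the formula (the constructor below builds its own 0-based matrix). Because scipy treats zero entries of a dense matrix as missing edges, the infinite costs are encoded as 0. The helpers `edge_cost` and `mst_cost` are hypothetical:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

N_PLAYERS = 100
SOURCE_COST = 101.0


def edge_cost(i: int, j: int, n: int = N_PLAYERS) -> float:
    """Cost c_ij from the definition above; 0.0 stands for 'no edge' (infinite cost)."""
    if i == j:
        return 0.0  # no self-loops
    if i == 0 or j == 0:
        return SOURCE_COST  # every player is linked to the source
    if abs(i - j) == 1 or {i, j} == {1, n}:
        return 1.0  # neighbours on the ring 1, 2, ..., n and back to 1
    return 0.0


def mst_cost(coalition, n: int = N_PLAYERS) -> float:
    """v(S): total cost of the minimum spanning tree of G restricted to S and node 0."""
    nodes = [0] + sorted(coalition)
    k = len(nodes)
    adjacency = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            adjacency[a, b] = edge_cost(nodes[a], nodes[b], n)
    tree = minimum_spanning_tree(adjacency)
    return float(tree.sum())


print(mst_cost({1, 2, 3}))  # 103.0: two ring edges plus one source edge
print(mst_cost(range(1, N_PLAYERS + 1)))  # 200.0: 99 ring edges plus one source edge
```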

PARAMETER DESCRIPTION
n_players

Number of players that participate in the game. Only n_players=100 is supported.

TYPE: int DEFAULT: 100

Source code in src/pydvl/value/games.py
def __init__(self, n_players: int = 100) -> None:
    if n_players != 100:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=100 but got {n_players=}."
        )
    description = (
        "A dummy dataset for the minimum spanning tree game in Castro et al. 2009"
    )
    super().__init__(n_players, score_range=(0, np.inf), description=description)

    graph = np.zeros(shape=(self.n_players, self.n_players))

    for i in range(self.n_players):
        for j in range(self.n_players):
            if (
                i == j + 1
                or i == j - 1
                or (i == 1 and j == self.n_players - 1)
                or (i == self.n_players - 1 and j == 1)
            ):
                graph[i, j] = 1
            elif i == 0 or j == 0:
                graph[i, j] = 0
            else:
                graph[i, j] = np.inf
    assert np.all(graph == graph.T)

    self.graph = graph

MinerGame

MinerGame(n_players: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

Consider a group of n miners who have discovered large bars of gold.

If two miners can carry one piece of gold, then the payoff of a coalition \(S\) is:

\[{ v(S) = \left\{\begin{array}{ll} \mid S \mid / 2, & \text{ if } \mid S \mid \text{ is even} \\ ( \mid S \mid - 1)/2, & \text{ otherwise} \end{array}\right. }\]

If there are more than two miners and there is an even number of miners, then the core consists of the single payoff where each miner gets 1/2.

If there is an odd number of miners, then the core is empty.

Taken from the Wikipedia article "Core (game theory)".
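These core statements can be checked by brute force for small n. The hypothetical helper `is_in_core` below tests whether an allocation x is efficient (\(\sum_i x_i = v(N)\)) and gives every coalition at least \(v(S) = \lfloor \mid S \mid / 2 \rfloor\). It only checks single allocations, so the failing odd case illustrates, but does not prove, that the core is empty:

```python
from itertools import combinations
from typing import Sequence


def miner_value(size: int) -> int:
    """v(S) = |S|/2 if |S| is even, (|S|-1)/2 otherwise, i.e. floor(|S|/2)."""
    return size // 2


def is_in_core(x: Sequence[float], tol: float = 1e-9) -> bool:
    """Check efficiency and coalitional rationality of the allocation x."""
    n = len(x)
    if abs(sum(x) - miner_value(n)) > tol:
        return False  # not efficient
    for k in range(1, n + 1):
        for coalition in combinations(range(n), k):
            if sum(x[i] for i in coalition) < miner_value(k) - tol:
                return False  # this coalition would rather deviate
    return True


print(is_in_core([0.5] * 4))  # True: each miner gets 1/2
print(is_in_core([0.4] * 5))  # False: any pair gets 0.8 < 1
```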

PARAMETER DESCRIPTION
n_players

Number of miners that participate in the game. Must be greater than 2.

TYPE: int

Source code in src/pydvl/value/games.py
def __init__(self, n_players: int) -> None:
    if n_players <= 2:
        raise ValueError(f"n_players, {n_players}, should be > 2")
    description = "Dummy data for Miner Game taken from https://en.wikipedia.org/wiki/Core_(game_theory)"
    super().__init__(
        n_players,
        score_range=(0, n_players // 2),
        description=description,
    )