pydvl.value.games ¶

This module provides several predefined games and, depending on the game, the corresponding Shapley values, Least Core values or both of them, for benchmarking purposes.

References¶

Castro, J., Gómez, D. and Tejada, J., 2009. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5), pp.1726-1730.

DummyGameDataset ¶

DummyGameDataset(n_players: int, description: Optional[str] = None)

Bases: Dataset

Dummy game dataset.

Initializes a dummy game dataset with n_players and an optional description.

This class is used internally inside the Game class.

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int`
`description`	Optional description of the dataset. TYPE: `Optional[str]` DEFAULT: `None`

Source code in src/pydvl/value/games.py

def __init__(self, n_players: int, description: Optional[str] = None) -> None:
    x = np.arange(0, n_players, 1).reshape(-1, 1)
    nil = np.zeros_like(x)
    super().__init__(
        x,
        nil.copy(),
        nil.copy(),
        nil.copy(),
        feature_names=["x"],
        target_names=["y"],
        description=description,
    )

indices `property` ¶

indices: NDArray[int_]

Index of positions in data.x_train.

Contiguous integers from 0 to len(Dataset).

data_names `property` ¶

data_names: NDArray[object_]

Names of each individual datapoint.

Used for reporting Shapley values.

dim `property` ¶

dim: int

Returns the number of dimensions of a sample.

get_training_data ¶

get_training_data(
    indices: Optional[Iterable[int]] = None,
) -> Tuple[NDArray, NDArray]

Given a set of indices, returns the training data that refer to those indices.

This is used mainly by Utility to retrieve subsets of the data from indices. It is typically not needed in algorithms.

PARAMETER	DESCRIPTION
`indices`	Optional indices that will be used to select points from the training data. If `None`, the entire training data will be returned. TYPE: `Optional[Iterable[int]]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tuple[NDArray, NDArray]`	If `indices` is not `None`, the selected x and y arrays from the training data. Otherwise, the entire dataset.

Source code in src/pydvl/utils/dataset.py

def get_training_data(
    self, indices: Optional[Iterable[int]] = None
) -> Tuple[NDArray, NDArray]:
    """Given a set of indices, returns the training data that refer to those
    indices.

    This is used mainly by [Utility][pydvl.utils.utility.Utility] to retrieve
    subsets of the data from indices. It is typically **not needed in
    algorithms**.

    Args:
        indices: Optional indices that will be used to select points from
            the training data. If `None`, the entire training data will be
            returned.

    Returns:
        If `indices` is not `None`, the selected x and y arrays from the
            training data. Otherwise, the entire dataset.
    """
    if indices is None:
        return self.x_train, self.y_train
    x = self.x_train[indices]
    y = self.y_train[indices]
    return x, y

from_sklearn `classmethod` ¶

from_sklearn(
    data: Bunch,
    train_size: float = 0.8,
    random_state: Optional[int] = None,
    stratify_by_target: bool = False,
    **kwargs
) -> Dataset

Constructs a Dataset object from a sklearn.utils.Bunch, as returned by the load_* functions in scikit-learn toy datasets.

Example

>>> from pydvl.utils import Dataset
>>> from sklearn.datasets import load_diabetes
>>> dataset = Dataset.from_sklearn(load_diabetes())

PARAMETER	DESCRIPTION
`data`	scikit-learn Bunch object. The following attributes are supported: `data`: covariates. `target`: target variables (labels). `feature_names` (optional): the feature names. `target_names` (optional): the target names. `DESCR` (optional): a description. TYPE: `Bunch`
`train_size`	size of the training dataset. Used in `train_test_split` TYPE: `float` DEFAULT: `0.8`
`random_state`	seed for train / test split TYPE: `Optional[int]` DEFAULT: `None`
`stratify_by_target`	If `True`, data is split in a stratified fashion, using the target variable as labels. Read more in scikit-learn's user guide. TYPE: `bool` DEFAULT: `False`
`kwargs`	Additional keyword arguments to pass to the Dataset constructor. Use this to pass e.g. `is_multi_output`. DEFAULT: `{}`

RETURNS	DESCRIPTION
`Dataset`	Object with the sklearn dataset

Changed in version 0.6.0

Added kwargs to pass to the Dataset constructor.

Source code in src/pydvl/utils/dataset.py

@classmethod
def from_sklearn(
    cls,
    data: Bunch,
    train_size: float = 0.8,
    random_state: Optional[int] = None,
    stratify_by_target: bool = False,
    **kwargs,
) -> "Dataset":
    """Constructs a [Dataset][pydvl.utils.Dataset] object from a
    [sklearn.utils.Bunch][], as returned by the `load_*`
    functions in [scikit-learn toy datasets](https://scikit-learn.org/stable/datasets/toy_dataset.html).

    ??? Example
        ```pycon
        >>> from pydvl.utils import Dataset
        >>> from sklearn.datasets import load_diabetes
        >>> dataset = Dataset.from_sklearn(load_diabetes())
        ```

    Args:
        data: scikit-learn Bunch object. The following attributes are supported:

            - `data`: covariates.
            - `target`: target variables (labels).
            - `feature_names` (**optional**): the feature names.
            - `target_names` (**optional**): the target names.
            - `DESCR` (**optional**): a description.
        train_size: size of the training dataset. Used in `train_test_split`
        random_state: seed for train / test split
        stratify_by_target: If `True`, data is split in a stratified
            fashion, using the target variable as labels. Read more in
            [scikit-learn's user guide](https://scikit-learn.org/stable/modules/cross_validation.html#stratification).
        kwargs: Additional keyword arguments to pass to the
            [Dataset][pydvl.utils.Dataset] constructor. Use this to pass e.g. `is_multi_output`.

    Returns:
        Object with the sklearn dataset

    !!! tip "Changed in version 0.6.0"
        Added kwargs to pass to the [Dataset][pydvl.utils.Dataset] constructor.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        data.data,
        data.target,
        train_size=train_size,
        random_state=random_state,
        stratify=data.target if stratify_by_target else None,
    )
    return cls(
        x_train,
        y_train,
        x_test,
        y_test,
        feature_names=data.get("feature_names"),
        target_names=data.get("target_names"),
        description=data.get("DESCR"),
        **kwargs,
    )

from_arrays `classmethod` ¶

from_arrays(
    X: NDArray,
    y: NDArray,
    train_size: float = 0.8,
    random_state: Optional[int] = None,
    stratify_by_target: bool = False,
    **kwargs
) -> Dataset

Constructs a Dataset object from X and y numpy arrays as returned by the make_* functions in sklearn generated datasets.

Example

>>> from pydvl.utils import Dataset
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression()
>>> dataset = Dataset.from_arrays(X, y)

PARAMETER	DESCRIPTION
`X`	numpy array of shape (n_samples, n_features) TYPE: `NDArray`
`y`	numpy array of shape (n_samples,) TYPE: `NDArray`
`train_size`	size of the training dataset. Used in `train_test_split` TYPE: `float` DEFAULT: `0.8`
`random_state`	seed for train / test split TYPE: `Optional[int]` DEFAULT: `None`
`stratify_by_target`	If `True`, data is split in a stratified fashion, using the y variable as labels. Read more in sklearn's user guide. TYPE: `bool` DEFAULT: `False`
`kwargs`	Additional keyword arguments to pass to the Dataset constructor. Use this to pass e.g. `feature_names` or `target_names`. DEFAULT: `{}`

RETURNS	DESCRIPTION
`Dataset`	Object with the passed X and y arrays split across training and test sets.

New in version 0.4.0

Changed in version 0.6.0

Added kwargs to pass to the Dataset constructor.

Source code in src/pydvl/utils/dataset.py

@classmethod
def from_arrays(
    cls,
    X: NDArray,
    y: NDArray,
    train_size: float = 0.8,
    random_state: Optional[int] = None,
    stratify_by_target: bool = False,
    **kwargs,
) -> "Dataset":
    """Constructs a [Dataset][pydvl.utils.Dataset] object from X and y numpy arrays  as
    returned by the `make_*` functions in [sklearn generated datasets](https://scikit-learn.org/stable/datasets/sample_generators.html).

    ??? Example
        ```pycon
        >>> from pydvl.utils import Dataset
        >>> from sklearn.datasets import make_regression
        >>> X, y = make_regression()
        >>> dataset = Dataset.from_arrays(X, y)
        ```

    Args:
        X: numpy array of shape (n_samples, n_features)
        y: numpy array of shape (n_samples,)
        train_size: size of the training dataset. Used in `train_test_split`
        random_state: seed for train / test split
        stratify_by_target: If `True`, data is split in a stratified fashion,
            using the y variable as labels. Read more in [sklearn's user
            guide](https://scikit-learn.org/stable/modules/cross_validation.html#stratification).
        kwargs: Additional keyword arguments to pass to the
            [Dataset][pydvl.utils.Dataset] constructor. Use this to pass e.g. `feature_names`
            or `target_names`.

    Returns:
        Object with the passed X and y arrays split across training and test sets.

    !!! tip "New in version 0.4.0"

    !!! tip "Changed in version 0.6.0"
        Added kwargs to pass to the [Dataset][pydvl.utils.Dataset] constructor.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        X,
        y,
        train_size=train_size,
        random_state=random_state,
        stratify=y if stratify_by_target else None,
    )
    return cls(x_train, y_train, x_test, y_test, **kwargs)

get_test_data ¶

get_test_data(
    indices: Optional[Iterable[int]] = None,
) -> Tuple[NDArray, NDArray]

Returns a subset of the training set instead of the test set.

PARAMETER	DESCRIPTION
`indices`	Indices into the training data. TYPE: `Optional[Iterable[int]]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tuple[NDArray, NDArray]`	Subset of the train data.

Source code in src/pydvl/value/games.py

def get_test_data(
    self, indices: Optional[Iterable[int]] = None
) -> Tuple[NDArray, NDArray]:
    """Returns the subsets of the train set instead of the test set.

    Args:
        indices: Indices into the training data.

    Returns:
        Subset of the train data.
    """
    if indices is None:
        return self.x_train, self.y_train
    x = self.x_train[indices]
    y = self.y_train[indices]
    return x, y

DummyModel ¶

DummyModel()

Bases: SupervisedModel

Dummy model class.

A dummy supervised model used for testing purposes only.

Source code in src/pydvl/value/games.py

def __init__(self) -> None:
    pass

Game ¶

Game(
    n_players: int,
    score_range: Tuple[float, float] = (-np.inf, np.inf),
    description: Optional[str] = None,
)

Bases: ABC

Base class for games.

Every Game subclass must implement the abstract _score method, which assigns a score to each coalition (subset of players), and at least one of shapley_values and least_core_values.

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int`
`score_range`	Minimum and maximum values of the `_score` method. TYPE: `Tuple[float, float]` DEFAULT: `(-inf, inf)`
`description`	Optional string description of the dummy dataset that will be created. TYPE: `Optional[str]` DEFAULT: `None`

ATTRIBUTE	DESCRIPTION
`n_players`	Number of players that participate in the game.
`data`	Dummy dataset object.
`u`	Utility object with a dummy model and dataset.

Source code in src/pydvl/value/games.py

def __init__(
    self,
    n_players: int,
    score_range: Tuple[float, float] = (-np.inf, np.inf),
    description: Optional[str] = None,
):
    self.n_players = n_players
    self.data = DummyGameDataset(self.n_players, description)
    self.u = Utility(
        DummyModel(),
        self.data,
        scorer=Scorer(self._score, range=score_range),
        catch_errors=False,
        show_warnings=True,
    )

SymmetricVotingGame ¶

SymmetricVotingGame(n_players: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A symmetric voting game defined in (Castro et al., 2009)¹, Section 4.1.

For this game the utility of a coalition is 1 if its cardinality is greater than \(N/2\), where \(N\) is the number of players, and 0 otherwise.

\[{ v(S) = \left\{\begin{array}{ll} 1, & \text{ if} \quad \mid S \mid > \frac{N}{2} \\ 0, & \text{ otherwise} \end{array}\right. }\]
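As an illustration of the formula above, the following standalone sketch evaluates the utility and computes exact Shapley values by brute force for a small game. It does not use pyDVL; the function names `symmetric_voting_utility` and `brute_force_shapley` are hypothetical and exist only for this example. By symmetry, every player should receive \(1/N\).

```python
from fractions import Fraction
from itertools import permutations


def symmetric_voting_utility(coalition: frozenset, n_players: int) -> int:
    """v(S) = 1 if |S| > N/2, and 0 otherwise."""
    return 1 if len(coalition) > n_players / 2 else 0


def brute_force_shapley(n_players: int) -> list:
    """Average each player's marginal contribution over all orderings.

    Only feasible for very small games: there are N! orderings.
    """
    totals = [Fraction(0)] * n_players
    n_orderings = 0
    for ordering in permutations(range(n_players)):
        n_orderings += 1
        coalition = frozenset()
        for player in ordering:
            with_player = coalition | {player}
            gain = symmetric_voting_utility(with_player, n_players)
            gain -= symmetric_voting_utility(coalition, n_players)
            totals[player] += gain
            coalition = with_player
    return [total / n_orderings for total in totals]


values = brute_force_shapley(4)
print(values)  # every entry equals Fraction(1, 4)
```

With four players, a player is pivotal exactly when it arrives third in an ordering, which happens in a quarter of all orderings.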

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int`

Source code in src/pydvl/value/games.py

def __init__(self, n_players: int) -> None:
    if n_players % 2 != 0:
        raise ValueError("n_players must be an even number.")
    description = "Dummy data for the symmetric voting game in Castro et al. 2009"
    super().__init__(
        n_players,
        score_range=(0, 1),
        description=description,
    )

AsymmetricVotingGame ¶

AsymmetricVotingGame(n_players: int = 51)

Bases: Game

Toy game that is used for testing and demonstration purposes.

An asymmetric voting game defined in (Castro et al., 2009)¹ Section 4.2.

For this game the player set is \(N = \{1,\dots,51\}\) and the utility of a coalition is given by:

\[{ v(S) = \left\{\begin{array}{ll} 1, & \text{ if} \quad \sum\limits_{i \in S} w_i > \sum\limits_{j \in N}\frac{w_j}{2} \\ 0, & \text{ otherwise} \end{array}\right. }\]

where \(w = [w_1,\dots, w_{51}]\) is a list of weights associated with each player.
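The weighted-majority rule can be sketched without pyDVL as follows. The group sizes and weights mirror the `ranges` and `ranges_weights` tables in the source listing of this class; the names `GROUPS`, `weights` and `asymmetric_voting_utility` are hypothetical and used only for illustration.

```python
# (group size, weight) pairs, taken from the ranges and ranges_weights
# tables in the constructor of this class.
GROUPS = [
    (1, 45), (1, 41), (1, 27), (2, 26), (1, 25), (1, 21), (2, 17),
    (1, 14), (2, 13), (3, 12), (1, 11), (4, 10), (4, 9), (2, 8),
    (4, 7), (4, 6), (1, 5), (9, 4), (7, 3),
]

# Expand the groups into one weight per player (51 players in total).
weights = [weight for size, weight in GROUPS for _ in range(size)]
threshold = sum(weights) / 2


def asymmetric_voting_utility(coalition) -> int:
    """v(S) = 1 if the coalition's total weight exceeds half of all weight."""
    return 1 if sum(weights[i] for i in coalition) > threshold else 0


print(len(weights), sum(weights), threshold)  # 51 538 269.0
print(asymmetric_voting_utility(range(10)))   # 0: total weight 259
print(asymmetric_voting_utility(range(11)))   # 1: total weight 272
```

The ten heaviest players alone fall just short of the threshold, so adding one more player flips the outcome.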

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int` DEFAULT: `51`

Source code in src/pydvl/value/games.py

def __init__(self, n_players: int = 51) -> None:
    if n_players != 51:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=51 but got {n_players=}."
        )
    description = "Dummy data for the asymmetric voting game in Castro et al. 2009"
    super().__init__(
        n_players,
        score_range=(0, 1),
        description=description,
    )

    ranges = [
        range(0, 1),
        range(1, 2),
        range(2, 3),
        range(3, 5),
        range(5, 6),
        range(6, 7),
        range(7, 9),
        range(9, 10),
        range(10, 12),
        range(12, 15),
        range(15, 16),
        range(16, 20),
        range(20, 24),
        range(24, 26),
        range(26, 30),
        range(30, 34),
        range(34, 35),
        range(35, 44),
        range(44, 51),
    ]

    ranges_weights = [
        45,
        41,
        27,
        26,
        25,
        21,
        17,
        14,
        13,
        12,
        11,
        10,
        9,
        8,
        7,
        6,
        5,
        4,
        3,
    ]
    ranges_values = [
        "0.08831",
        "0.07973",
        "0.05096",
        "0.04898",
        "0.047",
        "0.03917",
        "0.03147",
        "0.02577",
        "0.02388",
        "0.022",
        "0.02013",
        "0.01827",
        "0.01641",
        "0.01456",
        "0.01272",
        "0.01088",
        "0.009053",
        "0.00723",
        "0.005412",
    ]

    self.weight_table = np.zeros(self.n_players)
    exact_values = np.zeros(self.n_players)
    for r, w, v in zip(ranges, ranges_weights, ranges_values):
        self.weight_table[r] = w
        exact_values[r] = v

    self.exact_values = exact_values
    self.threshold = np.sum(self.weight_table) / 2

ShoesGame ¶

ShoesGame(left: int, right: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A shoes game defined in (Castro et al., 2009)¹.

In this game, some players have a left shoe and others a right shoe. Single shoes have a worth of zero while pairs have a worth of 1.

The payoff of a coalition \(S\) is:

\[{ v(S) = \min( \mid S \cap L \mid, \mid S \cap R \mid ) }\]

where \(L\) and \(R\) are the sets of players with left and right shoes, respectively.
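A minimal standalone sketch of this payoff, assuming for illustration that players `0, ..., left - 1` hold left shoes and the remaining players hold right shoes (this ordering and the name `shoes_utility` are assumptions, not taken from the library):

```python
def shoes_utility(coalition: set, left: int) -> int:
    """v(S) = min(|S ∩ L|, |S ∩ R|).

    Assumes players with index < left own a left shoe and all
    other players own a right shoe.
    """
    n_left = sum(1 for player in coalition if player < left)
    n_right = len(coalition) - n_left
    return min(n_left, n_right)


# Three left-shoe owners (0, 1, 2) and two right-shoe owners (3, 4).
print(shoes_utility({0, 1, 2}, left=3))        # 0: no right shoe
print(shoes_utility({0, 1, 3}, left=3))        # 1: one complete pair
print(shoes_utility({0, 1, 2, 3, 4}, left=3))  # 2: limited by right shoes
```

The grand coalition is worth \(\min(\mid L \mid, \mid R \mid)\), so the scarcer side of the market bounds the total payoff.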

PARAMETER	DESCRIPTION
`left`	Number of players with a left shoe. TYPE: `int`
`right`	Number of players with a right shoe. TYPE: `int`

Source code in src/pydvl/value/games.py

def __init__(self, left: int, right: int) -> None:
    self.left = left
    self.right = right
    n_players = self.left + self.right
    description = "Dummy data for the shoe game in Castro et al. 2009"
    max_score = n_players // 2
    super().__init__(n_players, score_range=(0, max_score), description=description)

AirportGame ¶

AirportGame(n_players: int = 100)

Bases: Game

Toy game that is used for testing and demonstration purposes.

An airport game defined in (Castro et al., 2009)¹, Section 4.3.

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int` DEFAULT: `100`

Source code in src/pydvl/value/games.py

def __init__(self, n_players: int = 100) -> None:
    if n_players != 100:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=100 but got {n_players=}."
        )
    description = "A dummy dataset for the airport game in Castro et al. 2009"
    super().__init__(n_players, score_range=(0, 100), description=description)
    ranges = [
        range(0, 8),
        range(8, 20),
        range(20, 26),
        range(26, 40),
        range(40, 48),
        range(48, 57),
        range(57, 70),
        range(70, 80),
        range(80, 90),
        range(90, 100),
    ]
    exact = [
        0.01,
        0.020869565,
        0.033369565,
        0.046883079,
        0.063549745,
        0.082780515,
        0.106036329,
        0.139369662,
        0.189369662,
        0.289369662,
    ]
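    # One cost per group of players: 1, 2, ..., 10.
    c = list(range(1, 11))
    score_table = np.zeros(100)
    exact_values = np.zeros(100)

    # Assign each group its own scalar cost and its exact Shapley value.
    for r, cost, v in zip(ranges, c, exact):
        score_table[r] = cost
        exact_values[r] = v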

    self.exact_values = exact_values
    self.score_table = score_table
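The exact values listed above can be reproduced with the standard closed form for airport games, in which a coalition pays the largest cost among its members and each cost increment is shared equally by all players who need it. The sketch below is standalone and rests on two assumptions not stated in this page: that the ten groups have costs 1, ..., 10, and that the coalition cost is the maximum member cost. The names `group_sizes`, `group_costs` and `airport_shapley_by_group` are hypothetical.

```python
# Group sizes follow the ranges in the constructor; costs 1..10 are assumed.
group_sizes = [8, 12, 6, 14, 8, 9, 13, 10, 10, 10]
group_costs = list(range(1, 11))


def airport_shapley_by_group(sizes, costs):
    """Shapley value per group for an airport (max-cost) game.

    Each cost increment c_k - c_{k-1} is split equally among all
    players whose own cost is at least c_k.
    """
    remaining = sum(sizes)  # players whose cost is >= the current level
    previous_cost = 0
    running = 0.0
    values = []
    for size, cost in zip(sizes, costs):
        running += (cost - previous_cost) / remaining
        values.append(running)
        remaining -= size
        previous_cost = cost
    return values


shapley = airport_shapley_by_group(group_sizes, group_costs)
for value in shapley:
    print(f"{value:.9f}")
```

Summing the values over all 100 players recovers the cost of the grand coalition, which is 10 under these assumptions.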

MinimumSpanningTreeGame ¶

MinimumSpanningTreeGame(n_players: int = 100)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A minimum spanning tree game defined in (Castro et al., 2009)¹.

Let \(G = (N \cup \{0\},E)\) be a valued graph where \(N = \{1,\dots,100\}\), and the cost associated to an edge \((i, j)\) is:

\[{ c_{ij} = \left\{\begin{array}{lll} 1, & \text{ if} & i = j + 1 \text{ or } i = j - 1 \\ & & \text{ or } (i = 1 \text{ and } j = 100) \text{ or } (i = 100 \text{ and } j = 1) \\ 101, & \text{ if} & i = 0 \text{ or } j = 0 \\ \infty, & \text{ otherwise} \end{array}\right. }\]

A minimum spanning tree game \((N, c)\) is a cost game in which, for a given coalition \(S \subset N\), \(v(S)\) is the total edge cost of the minimum spanning tree of the graph \(G|_{S\cup\{0\}}\), the subgraph restricted to the players in \(S\) and the source node \(0\).
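To make the definition concrete, here is a standalone sketch that evaluates \(v(S)\) with Prim's algorithm on the cost function written above. It is not pyDVL's implementation. It assumes that edges touching the source node 0 always cost 101, taking precedence over the adjacency rule; the names `edge_cost`, `mst_cost` and `mst_utility` are hypothetical.

```python
import math

N_PLAYERS = 100


def edge_cost(i: int, j: int) -> float:
    """Edge cost c_ij; node 0 is the source, players are 1..N_PLAYERS."""
    if i == j:
        return math.inf  # no self-loops
    if i == 0 or j == 0:
        return 101.0  # assumption: source edges take precedence
    if abs(i - j) == 1 or {i, j} == {1, N_PLAYERS}:
        return 1.0  # neighbours on the ring of players
    return math.inf


def mst_cost(nodes: list) -> float:
    """Total weight of a minimum spanning tree (Prim's algorithm)."""
    # best[n] is the cheapest known edge linking n to the growing tree.
    best = {n: edge_cost(nodes[0], n) for n in nodes[1:]}
    total = 0.0
    while best:
        nxt = min(best, key=best.get)
        total += best.pop(nxt)
        for n in best:
            best[n] = min(best[n], edge_cost(nxt, n))
    return total


def mst_utility(coalition) -> float:
    """v(S): cost of connecting the coalition to the source node 0."""
    members = sorted(set(coalition))
    return mst_cost([0, *members]) if members else 0.0


print(mst_utility({1, 2, 3}))        # 103.0: one source edge, two ring edges
print(mst_utility({1, 3}))           # 202.0: two separate source edges
print(mst_utility(range(1, 101)))    # 200.0: one source edge, 99 ring edges
```

Each contiguous run of players on the ring needs one edge to the source, so splitting a coalition into more runs raises its cost.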

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int` DEFAULT: `100`

Source code in src/pydvl/value/games.py

def __init__(self, n_players: int = 100) -> None:
    if n_players != 100:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=100 but got {n_players=}."
        )
    description = (
        "A dummy dataset for the minimum spanning tree game in Castro et al. 2009"
    )
    super().__init__(n_players, score_range=(0, np.inf), description=description)

    graph = np.zeros(shape=(self.n_players, self.n_players))

    for i in range(self.n_players):
        for j in range(self.n_players):
            if (
                i == j + 1
                or i == j - 1
                or (i == 1 and j == self.n_players - 1)
                or (i == self.n_players - 1 and j == 1)
            ):
                graph[i, j] = 1
            elif i == 0 or j == 0:
                graph[i, j] = 0
            else:
                graph[i, j] = np.inf
    assert np.all(graph == graph.T)

    self.graph = graph

MinerGame ¶

MinerGame(n_players: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

Consider a group of n miners, who have discovered large bars of gold.

If two miners can carry one piece of gold, then the payoff of a coalition \(S\) is:

\[{ v(S) = \left\{\begin{array}{lll} \mid S \mid / 2, & \text{ if} & \mid S \mid \text{ is even} \\ ( \mid S \mid - 1)/2, & \text{ otherwise} \end{array}\right. }\]

If there are more than two miners and there is an even number of miners, then the core consists of the single payoff where each miner gets 1/2.

If there is an odd number of miners, then the core is empty.

Taken from Wikipedia
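The statements about the core can be checked numerically for small games. The following standalone sketch tests whether an allocation is in the core by enumerating all coalitions; `miner_utility` and `in_core` are hypothetical names used only here. It confirms that the equal split of 1/2 per miner is in the core for four miners, and that the equal split is not in the core for five miners (which is consistent with, but does not by itself prove, the core being empty).

```python
from fractions import Fraction
from itertools import combinations


def miner_utility(coalition_size: int) -> int:
    """v(S) = |S| / 2 if |S| is even, (|S| - 1) / 2 otherwise."""
    return coalition_size // 2


def in_core(allocation: list) -> bool:
    """Check efficiency and coalitional rationality for every coalition."""
    n = len(allocation)
    # Efficiency: the payoffs must add up to the value of the grand coalition.
    if sum(allocation) != miner_utility(n):
        return False
    # Rationality: no coalition may receive less than it can earn alone.
    for size in range(1, n):
        for coalition in combinations(range(n), size):
            if sum(allocation[i] for i in coalition) < miner_utility(size):
                return False
    return True


print(in_core([Fraction(1, 2)] * 4))  # True
print(in_core([Fraction(2, 5)] * 5))  # False: any pair gets 4/5 < 1
```

Exact fractions are used so that the efficiency check is not affected by floating-point rounding.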

PARAMETER	DESCRIPTION
`n_players`	Number of miners that participate in the game. TYPE: `int`

Source code in src/pydvl/value/games.py

def __init__(self, n_players: int) -> None:
    if n_players <= 2:
        raise ValueError(f"n_players, {n_players}, should be > 2")
    description = "Dummy data for Miner Game taken from https://en.wikipedia.org/wiki/Core_(game_theory)"
    super().__init__(
        n_players,
        score_range=(0, n_players // 2),
        description=description,
    )
