pydvl.valuation.games ¶

This module provides several predefined games used in the literature¹ and, depending on the game, precomputed Shapley, Least-Core, and/or Banzhaf values for benchmarking purposes.

The games are:

SymmetricVotingGame
AsymmetricVotingGame
ShoesGame
AirportGame
MinimumSpanningTreeGame
MinerGame

References¶

Castro, J., Gómez, D. and Tejada, J., 2009. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5), pp.1726-1730.

AirportGame ¶

AirportGame(n_players: int = 100)

Bases: Game

Toy game that is used for testing and demonstration purposes.

An airport game defined in (Castro et al., 2009)¹, Section 4.3.
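The precomputed values listed in the constructor can be cross-checked with a short standalone sketch. It assumes the standard airport-game setup, which this page does not spell out: the cost of a coalition is the largest cost among its members, and the ten player groups have costs 1 to 10. The group sizes are read off the `ranges` table; the function name `airport_shapley` is hypothetical and not part of pyDVL.

```python
# Assumptions (not stated on this page): coalition cost = max cost of its
# members, and the ten groups have costs 1..10. Group sizes come from `ranges`.
group_sizes = [8, 12, 6, 14, 8, 9, 13, 10, 10, 10]
group_costs = list(range(1, 11))


def airport_shapley(sizes, costs):
    """Shapley value per group for an airport game with ascending costs."""
    remaining = sum(sizes)  # players that still need the current cost increment
    previous_cost = 0
    value = 0.0
    values = []
    for size, cost in zip(sizes, costs):
        # Each cost increment is shared equally by all players who need it.
        value += (cost - previous_cost) / remaining
        values.append(value)
        remaining -= size
        previous_cost = cost
    return values


values = airport_shapley(group_sizes, group_costs)
print([round(v, 9) for v in values])
```

Under these assumptions the result agrees with the `exact` list in the constructor up to rounding, and the values summed over all 100 players equal the largest cost, 10.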

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int` DEFAULT: `100`

Source code in src/pydvl/valuation/games.py

def __init__(self, n_players: int = 100):
    if n_players != 100:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=100 but got {n_players=}."
        )
    super().__init__(n_players, score_range=(0, 100))
    ranges = [
        range(0, 8),
        range(8, 20),
        range(20, 26),
        range(26, 40),
        range(40, 48),
        range(48, 57),
        range(57, 70),
        range(70, 80),
        range(80, 90),
        range(90, 100),
    ]
    exact = [
        0.01,
        0.020869565,
        0.033369565,
        0.046883079,
        0.063549745,
        0.082780515,
        0.106036329,
        0.139369662,
        0.189369662,
        0.289369662,
    ]
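    # One cost per group of players: costs 1..10 for the ten groups.
    costs = list(range(1, 11))
    score_table = np.zeros(100)
    exact_values = np.zeros(100)

    # Assign each group its own cost and its precomputed exact value.
    for r, c, v in zip(ranges, costs, exact):
        score_table[r] = c
        exact_values[r] = v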

    self.exact_values = exact_values
    self.score_table = score_table

AsymmetricVotingGame ¶

AsymmetricVotingGame(n_players: int = 51)

Bases: Game

Toy game that is used for testing and demonstration purposes.

An asymmetric voting game defined in (Castro et al., 2009)¹ Section 4.2.

For this game the player set is \(N = \{1,\dots,51\}\) and the utility of a coalition is given by:

\[{ v(S) = \left\{\begin{array}{ll} 1, & \text{ if} \quad \sum\limits_{i \in S} w_i > \sum\limits_{j \in N}\frac{w_j}{2} \\ 0, & \text{ otherwise} \end{array}\right. }\]

where \(w = [w_1,\dots, w_{51}]\) is a list of weights associated with each player.
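As an illustration of the utility above, here is a minimal standalone sketch. The function name `weighted_vote` and the four weights are hypothetical; the class itself uses the 51 weights listed in its constructor.

```python
def weighted_vote(coalition, weights):
    """Return 1 if the coalition's weight strictly exceeds half the total, else 0."""
    threshold = sum(weights) / 2
    return 1 if sum(weights[i] for i in coalition) > threshold else 0


# Hypothetical 4-player example (total weight 10, threshold 5).
weights = [4, 3, 2, 1]
print(weighted_vote({0, 1}, weights))  # weight 7 exceeds 5
print(weighted_vote({1, 2}, weights))  # weight 5 does not strictly exceed 5
```

Note the strict inequality: a coalition holding exactly half of the total weight does not win.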

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int` DEFAULT: `51`

Source code in src/pydvl/valuation/games.py

def __init__(self, n_players: int = 51):
    if n_players != 51:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=51 but got {n_players=}."
        )
    super().__init__(n_players, score_range=(0, 1))

    ranges = [
        range(0, 1),
        range(1, 2),
        range(2, 3),
        range(3, 5),
        range(5, 6),
        range(6, 7),
        range(7, 9),
        range(9, 10),
        range(10, 12),
        range(12, 15),
        range(15, 16),
        range(16, 20),
        range(20, 24),
        range(24, 26),
        range(26, 30),
        range(30, 34),
        range(34, 35),
        range(35, 44),
        range(44, 51),
    ]

    ranges_weights = [
        45,
        41,
        27,
        26,
        25,
        21,
        17,
        14,
        13,
        12,
        11,
        10,
        9,
        8,
        7,
        6,
        5,
        4,
        3,
    ]
    ranges_values = [
        "0.08831",
        "0.07973",
        "0.05096",
        "0.04898",
        "0.047",
        "0.03917",
        "0.03147",
        "0.02577",
        "0.02388",
        "0.022",
        "0.02013",
        "0.01827",
        "0.01641",
        "0.01456",
        "0.01272",
        "0.01088",
        "0.009053",
        "0.00723",
        "0.005412",
    ]

    self.weight_table = np.zeros(self.n_players)
    exact_values = np.zeros(self.n_players)
    for r, w, v in zip(ranges, ranges_weights, ranges_values):
        self.weight_table[r] = w
        exact_values[r] = v

    self.exact_values = exact_values
    self.threshold = np.sum(self.weight_table) / 2

DummyGameDataset ¶

DummyGameDataset(n_players: int, description: str | None = None)

Bases: Dataset

Dummy game dataset.

Initializes a dummy game dataset with n_players and an optional description.

This class is used internally inside the Game class.

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int`
`description`	Optional description of the dataset. TYPE: `str \| None` DEFAULT: `None`

Source code in src/pydvl/valuation/games.py

def __init__(self, n_players: int, description: str | None = None) -> None:
    x = np.arange(0, n_players, 1).reshape(-1, 1)
    nil = np.zeros_like(x)
    super().__init__(
        x,
        nil.copy(),
        feature_names=["x"],
        target_names=["y"],
        description=description,
    )

indices `property` ¶

indices: NDArray[int_]

Index of positions in data.x_train.

Contiguous integers from 0 to len(Dataset).

n_features `property` ¶

n_features: int

Returns the number of dimensions of a sample.

n_targets `property` ¶

n_targets: int

Returns the number of target variables.

names `property` ¶

names: NDArray[str_]

Names of each individual datapoint.

Used for reporting Shapley values.

__getstate__ ¶

__getstate__()

Prepare the object state for pickling, replacing memory-mapped arrays with their file paths.

Source code in src/pydvl/valuation/dataset.py

def __getstate__(self):
    """Prepare the object state for pickling replacing memmapped arrays with
    their file paths"""
    state = self.__dict__.copy()

    if isinstance(self._x, np.memmap):
        state["_x"] = Path(self._x.filename)
    if isinstance(self._y, np.memmap):
        state["_y"] = Path(self._y.filename)
    return state

__setstate__ ¶

__setstate__(state)

Restore the object state from pickling.

Source code in src/pydvl/valuation/dataset.py

def __setstate__(self, state):
    """Restore the object state from pickling."""
    self.__dict__.update(state)
    self._x = _maybe_open_mmap(self._x, dtype=self._x_dtype, shape=self._x_shape)
    self._y = _maybe_open_mmap(self._y, dtype=self._y_dtype, shape=self._y_shape)

data ¶

data(
    indices: int | slice | Sequence[int] | NDArray[int_] | None = None,
) -> RawData

Given a set of indices, returns the training data that refer to those indices, as a read-only tuple-like structure.

This is used mainly by subclasses of UtilityBase to retrieve subsets of the data from indices.

PARAMETER	DESCRIPTION
`indices`	Optional indices that will be used to select points from the training data. If `None`, the entire training data will be returned. TYPE: `int \| slice \| Sequence[int] \| NDArray[int_] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`RawData`	If `indices` is not `None`, the selected x and y arrays from the training data. Otherwise, the entire dataset.

Source code in src/pydvl/valuation/dataset.py

def data(
    self, indices: int | slice | Sequence[int] | NDArray[np.int_] | None = None
) -> RawData:
    """Given a set of indices, returns the training data that refer to those
    indices, as a read-only tuple-like structure.

    This is used mainly by subclasses of
    [UtilityBase][pydvl.valuation.utility.base.UtilityBase] to retrieve subsets of
    the data from indices.

    Args:
        indices: Optional indices that will be used to select points from
            the training data. If `None`, the entire training data will be
            returned.

    Returns:
        If `indices` is not `None`, the selected x and y arrays from the
            training data. Otherwise, the entire dataset.
    """
    if indices is None:
        return RawData(self._x, self._y)
    if isinstance(indices, Integral):
        indices = [indices]  # type: ignore
    return RawData(self._x[indices], self._y[indices])

data_indices ¶

data_indices(
    indices: Sequence[int] | NDArray[int_] | slice | None = None,
) -> NDArray[int_]

Returns a subset of indices.

This is equivalent to using Dataset.indices[logical_indices] but allows subclasses to define special behaviour, e.g. when indices in Dataset do not match the indices in the data.

For Dataset, this is a simple pass-through.

PARAMETER	DESCRIPTION
`indices`	A set of indices held by this object TYPE: `Sequence[int] \| NDArray[int_] \| slice \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`NDArray[int_]`	The indices of the data points in the data array.

Source code in src/pydvl/valuation/dataset.py

def data_indices(
    self, indices: Sequence[int] | NDArray[np.int_] | slice | None = None
) -> NDArray[np.int_]:
    """Returns a subset of indices.

    This is equivalent to using `Dataset.indices[logical_indices]` but allows
    subclasses to define special behaviour, e.g. when indices in `Dataset` do not
    match the indices in the data.

    For `Dataset`, this is a simple pass-through.

    Args:
        indices: A set of indices held by this object

    Returns:
        The indices of the data points in the data array.
    """
    if indices is None:
        return self._indices
    if isinstance(indices, slice):
        return self._indices[indices]
    return cast(NDArray, self._indices[to_numpy(indices)])

feature ¶

feature(name: str) -> tuple[slice, int]

Returns a slice for the feature with the given name.

Source code in src/pydvl/valuation/dataset.py

def feature(self, name: str) -> tuple[slice, int]:
    """Returns a slice for the feature with the given name."""
    try:
        feature_idx = np.where(self.feature_names == name)[0][0]
        return np.index_exp[:, feature_idx]  # type: ignore
    except IndexError as e:
        raise ValueError(f"Feature {name} is not in {self.feature_names}") from e

from_arrays `classmethod` ¶

from_arrays(
    X: ArrayT,
    y: ArrayT,
    train_size: float = 0.8,
    random_state: int | None = None,
    stratify_by_target: bool = False,
    **kwargs: Any,
) -> tuple[Dataset, Dataset]

Constructs a Dataset object from X and y arrays as returned by the make_* functions in sklearn generated datasets.

Example

>>> from pydvl.valuation.dataset import Dataset
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression()
>>> dataset = Dataset.from_arrays(X, y)

PARAMETER	DESCRIPTION
`X`	array of shape (n_samples, n_features), either a NumPy array or a PyTorch tensor TYPE: `ArrayT`
`y`	array of shape (n_samples,) - must be same type as X TYPE: `ArrayT`
`train_size`	size of the training dataset. Used in `train_test_split` TYPE: `float` DEFAULT: `0.8`
`random_state`	seed for train / test split TYPE: `int \| None` DEFAULT: `None`
`stratify_by_target`	If `True`, data is split in a stratified fashion, using the y variable as labels. Read more in [sklearn's user guide](https://scikit-learn.org/stable/modules/cross_validation.html#stratification). TYPE: `bool` DEFAULT: `False`
`kwargs`	Additional keyword arguments to pass to the Dataset constructor. Use this to pass e.g. `feature_names` or `target_names`. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`tuple[Dataset, Dataset]`	Object with the passed X and y arrays split across training and test sets.

New in version 0.4.0

Changed in version 0.6.0

Added kwargs to pass to the Dataset constructor.

Changed in version 0.10.0

Returns a tuple of two Dataset objects.

New in version 0.11.0

Added support for PyTorch tensors.

Source code in src/pydvl/valuation/dataset.py

@classmethod
def from_arrays(
    cls,
    X: ArrayT,
    y: ArrayT,
    train_size: float = 0.8,
    random_state: int | None = None,
    stratify_by_target: bool = False,
    **kwargs: Any,
) -> tuple[Dataset, Dataset]:
    """Constructs a [Dataset][pydvl.valuation.dataset.Dataset] object from X and
    y arrays as returned by the `make_*` functions in [sklearn generated datasets](
    https://scikit-learn.org/stable/datasets/sample_generators.html).

    ??? Example
        ```pycon
        >>> from pydvl.valuation.dataset import Dataset
        >>> from sklearn.datasets import make_regression
        >>> X, y = make_regression()
        >>> dataset = Dataset.from_arrays(X, y)
        ```

    Args:
        X: array of shape (n_samples, n_features) - either numpy array or PyTorch
        tensor
        y: array of shape (n_samples,) - must be same type as X
        train_size: size of the training dataset. Used in `train_test_split`
        random_state: seed for train / test split
        stratify_by_target: If `True`, data is split in a stratified fashion,
            using the y variable as labels. Read more in [sklearn's user
            guide](https://scikit-learn.org/stable/modules/cross_validation.html
            #stratification).
        kwargs: Additional keyword arguments to pass to the
            [Dataset][pydvl.valuation.dataset.Dataset] constructor. Use this to pass
            e.g. `feature_names` or `target_names`.

    Returns:
        Object with the passed X and y arrays split across training and test sets.

    !!! tip "New in version 0.4.0"

    !!! tip "Changed in version 0.6.0"
        Added kwargs to pass to the [Dataset][pydvl.valuation.dataset.Dataset]
        constructor.

    !!! tip "Changed in version 0.10.0"
        Returns a tuple of two [Dataset][pydvl.valuation.dataset.Dataset] objects.

    !!! tip "New in version 0.11.0"
        Added support for PyTorch tensors.
    """
    if stratify_by_target:
        # Use our stratified_split_indices function for proper tensor handling
        train_indices, test_indices = stratified_split_indices(
            y, train_size=train_size, random_state=random_state
        )

        x_train, y_train = X[train_indices], y[train_indices]
        x_test, y_test = X[test_indices], y[test_indices]
    else:
        x_train, x_test, y_train, y_test = train_test_split(
            X,
            y,
            train_size=train_size,
            random_state=random_state,
        )

    return cls(x_train, cast(ArrayT, y_train), **kwargs), cls(
        x_test, cast(ArrayT, y_test), **kwargs
    )

from_sklearn `classmethod` ¶

from_sklearn(
    data: Bunch,
    train_size: int | float = 0.8,
    random_state: int | None = None,
    stratify_by_target: bool = False,
    **kwargs,
) -> tuple[Dataset, Dataset]

Constructs two Dataset objects from a sklearn.utils.Bunch, as returned by the load_* functions in scikit-learn toy datasets.

Example

>>> from pydvl.valuation.dataset import Dataset
>>> from sklearn.datasets import load_diabetes  # load_boston was removed in scikit-learn 1.2
>>> train, test = Dataset.from_sklearn(load_diabetes())

PARAMETER	DESCRIPTION
`data`	scikit-learn Bunch object. The following attributes are supported: `data`: covariates. `target`: target variables (labels). `feature_names` (optional): the feature names. `target_names` (optional): the target names. `DESCR` (optional): a description. TYPE: `Bunch`
`train_size`	Size of the training dataset, passed to `train_test_split`. Float values represent the fraction of the dataset to include in the training split and should be in (0, 1). An integer value sets the absolute number of training samples. In `train_test_split`, when `train_size` is `None` the value is automatically set to the complement of the test size. TYPE: `int \| float` DEFAULT: `0.8`
`random_state`	Seed for the train / test split. TYPE: `int \| None` DEFAULT: `None`
`stratify_by_target`	If `True`, data is split in a stratified fashion, using the target variable as labels. Read more in [scikit-learn's user guide](https://scikit-learn.org/stable/modules/cross_validation.html#stratification). TYPE: `bool` DEFAULT: `False`
`kwargs`	Additional keyword arguments to pass to the Dataset constructor. Use this to pass e.g. `is_multi_output`. DEFAULT: `{}`

RETURNS	DESCRIPTION
`tuple[Dataset, Dataset]`	Training and test Dataset objects built from the sklearn dataset.

Changed in version 0.6.0

Added kwargs to pass to the Dataset constructor.

Changed in version 0.10.0

Returns a tuple of two Dataset objects.

Source code in src/pydvl/valuation/dataset.py

@classmethod
def from_sklearn(
    cls,
    data: Bunch,
    train_size: int | float = 0.8,
    random_state: int | None = None,
    stratify_by_target: bool = False,
    **kwargs,
) -> tuple[Dataset, Dataset]:
    """Constructs two [Dataset][pydvl.valuation.dataset.Dataset] objects from a
    [sklearn.utils.Bunch][], as returned by the `load_*`
    functions in [scikit-learn toy datasets](
    https://scikit-learn.org/stable/datasets/toy_dataset.html).

    ??? Example
        ```pycon
        >>> from pydvl.valuation.dataset import Dataset
        >>> from sklearn.datasets import load_boston  # noqa
        >>> train, test = Dataset.from_sklearn(load_boston())
        ```

    Args:
        data: scikit-learn Bunch object. The following attributes are supported:

            - `data`: covariates.
            - `target`: target variables (labels).
            - `feature_names` (**optional**): the feature names.
            - `target_names` (**optional**): the target names.
            - `DESCR` (**optional**): a description.
        train_size: size of the training dataset. Used in `train_test_split`
            float values represent the fraction of the dataset to include in the
            training split and should be in (0,1). An integer value sets the
            absolute number of training samples.
    the value is automatically set to the complement of the test size.
        random_state: seed for train / test split
        stratify_by_target: If `True`, data is split in a stratified
            fashion, using the target variable as labels. Read more in
            [scikit-learn's user guide](
            https://scikit-learn.org/stable/modules/cross_validation.html
            #stratification).
        kwargs: Additional keyword arguments to pass to the
            [Dataset][pydvl.valuation.dataset.Dataset] constructor. Use this to
            pass e.g. `is_multi_output`.

    Returns:
        Object with the sklearn dataset

    !!! tip "Changed in version 0.6.0"
        Added kwargs to pass to the [Dataset][pydvl.valuation.dataset.Dataset]
        constructor.
    !!! tip "Changed in version 0.10.0"
        Returns a tuple of two [Dataset][pydvl.valuation.dataset.Dataset] objects.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        data.data,
        data.target,
        train_size=train_size,
        random_state=random_state,
        stratify=data.target if stratify_by_target else None,
    )
    return (
        cls(
            x_train,
            y_train,
            feature_names=data.get("feature_names"),
            target_names=data.get("target_names"),
            description=data.get("DESCR"),
            **kwargs,
        ),
        cls(
            x_test,
            y_test,
            feature_names=data.get("feature_names"),
            target_names=data.get("target_names"),
            description=data.get("DESCR"),
            **kwargs,
        ),
    )

logical_indices ¶

logical_indices(
    indices: Sequence[int] | NDArray[int_] | slice | None = None,
) -> NDArray[int_]

Returns the indices in this Dataset for the given indices in the data array.

This is equivalent to using Dataset.indices[data_indices] but allows subclasses to define special behaviour, e.g. when indices in Dataset do not match the indices in the data.

PARAMETER	DESCRIPTION
`indices`	A set of indices in the data array. TYPE: `Sequence[int] \| NDArray[int_] \| slice \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`NDArray[int_]`	The abstract indices for the given data indices.

Source code in src/pydvl/valuation/dataset.py

def logical_indices(
    self, indices: Sequence[int] | NDArray[np.int_] | slice | None = None
) -> NDArray[np.int_]:
    """Returns the indices in this `Dataset` for the given indices in the data
    array.

    This is equivalent to using `Dataset.indices[data_indices]` but allows
    subclasses to define special behaviour, e.g. when indices in `Dataset` do not
    match the indices in the data.

    Args:
        indices: A set of indices in the data array.

    Returns:
        The abstract indices for the given data indices.
    """
    if indices is None:
        return self._indices
    if isinstance(indices, slice):
        return self._indices[indices]
    return cast(NDArray, self._indices[to_numpy(indices)])

target ¶

target(name: str) -> tuple[slice, int] | slice

Returns a slice or index for the target with the given name.

If targets are multidimensional (2D array), returns a tuple (slice(None), target_idx). If targets are 1D, returns just a slice(None).

PARAMETER	DESCRIPTION
`name`	The name of the target to retrieve. TYPE: `str`

RETURNS	DESCRIPTION
`tuple[slice, int] \| slice`	For multi-output targets: tuple of (slice(None), target_idx)
`tuple[slice, int] \| slice`	For single-output targets: slice(None)

RAISES	DESCRIPTION
`ValueError`	If the target name is not found.

Source code in src/pydvl/valuation/dataset.py

def target(self, name: str) -> tuple[slice, int] | slice:
    """
    Returns a slice or index for the target with the given name.

    If targets are multidimensional (2D array), returns a tuple (slice(None),
    target_idx). If targets are 1D, returns just a slice(None).

    Args:
        name: The name of the target to retrieve.

    Returns:
        For multi-output targets: tuple of (slice(None), target_idx)
        For single-output targets: target_idx (usually 0)

    Raises:
        ValueError: If the target name is not found.
    """
    try:
        target_idx = np.where(self.target_names == name)[0][0]
        if self.n_targets == 1:
            return slice(None)
        else:
            return slice(None), target_idx
    except IndexError as e:
        raise ValueError(f"Target {name} is not in {self.target_names}") from e

DummyGameUtility ¶

DummyGameUtility(
    score: Callable[[NDArray], float], score_range: tuple[float, float]
)

Bases: UtilityBase

Dummy game utility

This class is used internally inside the Game class.

PARAMETER	DESCRIPTION
`score`	Function to compute the score of a coalition. TYPE: `Callable[[NDArray], float]`
`score_range`	Minimum and maximum values of the score function. TYPE: `tuple[float, float]`

Source code in src/pydvl/valuation/games.py

def __init__(
    self, score: Callable[[NDArray], float], score_range: tuple[float, float]
):
    self.score = score
    self.score_range = score_range

training_data `property` ¶

training_data: Dataset | None

Retrieves the training data used by this utility.

This property is read-only. In order to set it, use with_dataset().

__str__ ¶

__str__()

Returns a string representation of the utility. Subclasses should override this method to provide a more informative string.

Source code in src/pydvl/valuation/utility/base.py

def __str__(self):
    """Returns a string representation of the utility.
    Subclasses should override this method to provide a more informative string
    """
    return f"{self.__class__.__name__}"

DummyModel ¶

DummyModel()

Bases: SupervisedModel[NDArray, NDArray]

Dummy model class.

A dummy supervised model used for testing purposes only.

Source code in src/pydvl/valuation/games.py

def __init__(self):
    pass

Game ¶

Game(n_players: int, score_range: tuple[float, float] = (-inf, inf))

Bases: ABC

Base class for games

Any Game subclass has to implement the abstract _score method to assign a score to each coalition/subset and at least one of shapley_values, least_core_values, or banzhaf_values.
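The contract described above can be illustrated without pyDVL. The following is a standalone sketch of the pattern only: `ToyGame` and `UnanimityGame` are hypothetical names, and the argument passed to `_score` (a sequence of player indices) is an assumption made for this example rather than the library's documented signature.

```python
from abc import ABC, abstractmethod


class ToyGame(ABC):
    """Standalone stand-in for the pattern described above (not pyDVL's Game)."""

    def __init__(self, n_players):
        self.n_players = n_players

    @abstractmethod
    def _score(self, coalition):
        """Assign a score to a coalition, given as a sequence of player indices."""


class UnanimityGame(ToyGame):
    """Hypothetical game: a coalition scores 1 only if it contains every player."""

    def _score(self, coalition):
        return 1.0 if len(set(coalition)) == self.n_players else 0.0

    def shapley_values(self):
        # All players are interchangeable and v(N) = 1, so each player gets 1/n.
        return [1.0 / self.n_players] * self.n_players


game = UnanimityGame(4)
print(game._score([0, 1, 2, 3]), game._score([0, 1]))
print(game.shapley_values())
```

Because `_score` is abstract, instantiating `ToyGame` directly raises a `TypeError`, which mirrors the requirement that every subclass provides its own scoring rule.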

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int`
`score_range`	Minimum and maximum values of the `_score` method. TYPE: `tuple[float, float]` DEFAULT: `(-inf, inf)`

ATTRIBUTE	DESCRIPTION
`n_players`	Number of players that participate in the game.
`data`	Dummy dataset object.
`u`	Utility object with a dummy model and dataset.

Source code in src/pydvl/valuation/games.py

def __init__(
    self,
    n_players: int,
    score_range: tuple[float, float] = (-np.inf, np.inf),
):
    self.n_players = n_players
    self.data = DummyGameDataset(
        n_players=self.n_players,
        description=f"Dummy data for {self.__class__.__name__}",
    )
    self.u = DummyGameUtility(score=self._score, score_range=score_range)

MinerGame ¶

MinerGame(n_players: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

Consider a group of n miners, who have discovered large bars of gold.

If two miners can carry one piece of gold, then the payoff of a coalition \(S\) is:

\[{ v(S) = \left\{\begin{array}{lll} \mid S \mid / 2, & \text{ if} & \mid S \mid \text{ is even} \\ ( \mid S \mid - 1)/2, & \text{ otherwise} \end{array}\right. }\]

If there are more than two miners and there is an even number of miners, then the core consists of the single payoff where each miner gets 1/2.

If there is an odd number of miners, then the core is empty.
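The payoff above is simply the integer division \(\mid S \mid // 2\). The sketch below evaluates it and, as a sanity check, computes Shapley values by brute force over all player orderings for a small game. The helper names are hypothetical, and the enumeration is only feasible for a handful of players. Note that it computes Shapley values, not the core.

```python
from fractions import Fraction
from itertools import permutations


def miner_payoff(coalition):
    """Two miners carry one bar, so a coalition carries len(coalition) // 2 bars."""
    return len(coalition) // 2


def shapley_by_permutations(n_players, payoff):
    """Average each player's marginal contribution over all orderings."""
    totals = [Fraction(0)] * n_players
    orderings = list(permutations(range(n_players)))
    for order in orderings:
        members = set()
        for player in order:
            before = payoff(members)
            members.add(player)
            totals[player] += payoff(members) - before
    return [total / len(orderings) for total in totals]


values = shapley_by_permutations(4, miner_payoff)
print(values)
```

For four miners every player receives 1/2, which matches the single core payoff mentioned above for an even number of miners.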

Taken from Wikipedia

PARAMETER	DESCRIPTION
`n_players`	Number of miners that participate in the game. TYPE: `int`

Source code in src/pydvl/valuation/games.py

def __init__(self, n_players: int):
    if n_players <= 2:
        raise ValueError(f"n_players, {n_players}, should be > 2")
    super().__init__(n_players, score_range=(0, n_players // 2))

MinimumSpanningTreeGame ¶

MinimumSpanningTreeGame(n_players: int = 100)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A minimum spanning tree game defined in (Castro et al., 2009)¹.

Let \(G = (N \cup \{0\},E)\) be a valued graph where \(N = \{1,\dots,100\}\), and the cost associated to an edge \((i, j)\) is:

\[{ c_{ij} = \left\{\begin{array}{lll} 1, & \text{ if} & i = j + 1 \text{ or } i = j - 1 \\ & & \text{ or } (i = 1 \text{ and } j = 100) \text{ or } (i = 100 \text{ and } j = 1) \\ 101, & \text{ if} & i = 0 \text{ or } j = 0 \\ \infty, & \text{ otherwise} \end{array}\right. }\]

A minimum spanning tree game \((N, c)\) is a cost game, where for a given coalition \(S \subset N\), \(v(S)\) is the sum of the edge cost of the minimum spanning tree, i.e. \(v(S)\) = Minimum Spanning Tree of the graph \(G|_{S\cup\{0\}}\), which is the partial graph restricted to the players \(S\) and the source node \(0\).
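To make the definition concrete, the following standalone sketch computes \(v(S)\) with Prim's algorithm on the restricted graph. It is not the library's implementation. It assumes players are numbered 1 to 100 as in the formula, and that any edge touching the source node 0 costs 101; the names `edge_cost` and `mst_cost` are hypothetical.

```python
import math

N_PLAYERS = 100
SOURCE = 0


def edge_cost(i, j):
    """Edge cost c_ij; assumes edges to the source always cost 101."""
    if i == SOURCE or j == SOURCE:
        return 101
    if abs(i - j) == 1 or {i, j} == {1, N_PLAYERS}:
        return 1
    return math.inf


def mst_cost(coalition):
    """Cost of a minimum spanning tree on the coalition plus the source (Prim)."""
    nodes = {SOURCE, *coalition}
    in_tree = {SOURCE}
    total = 0
    while in_tree != nodes:
        # Pick the cheapest edge leaving the current tree.
        cost, node = min(
            (edge_cost(i, j), j) for i in in_tree for j in nodes - in_tree
        )
        total += cost
        in_tree.add(node)
    return total


print(mst_cost({1, 2, 3}))  # one link to the source plus two unit edges
print(mst_cost({1, 3}))     # players 1 and 3 are not adjacent
print(mst_cost({1, 100}))   # players 1 and 100 are adjacent on the cycle
```

Every player can always connect to the source directly, so the tree cost is finite for any coalition.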

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int` DEFAULT: `100`

Source code in src/pydvl/valuation/games.py

def __init__(self, n_players: int = 100) -> None:
    if n_players != 100:
        raise ValueError(
            f"{self.__class__.__name__} only supports n_players=100 but got {n_players=}."
        )
    super().__init__(n_players, score_range=(0, np.inf))

    graph = np.zeros(shape=(self.n_players, self.n_players))

    for i in range(self.n_players):
        for j in range(self.n_players):
            if (
                i == j + 1
                or i == j - 1
                or (i == 1 and j == self.n_players - 1)
                or (i == self.n_players - 1 and j == 1)
            ):
                graph[i, j] = 1
            elif i == 0 or j == 0:
                graph[i, j] = 0
            else:
                graph[i, j] = np.inf
    assert np.all(graph == graph.T)

    self.graph = graph

ShoesGame ¶

ShoesGame(left: int, right: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A shoes game defined in (Castro et al., 2009)¹.

In this game, some players have a left shoe and others a right shoe.

The payoff (utility) of a coalition \(S\) is:

\[{ U(S) = \min( \mid S \cap L \mid, \mid S \cap R \mid ) }\]

Where \(L\), respectively \(R\), is the set of players with left shoes, respectively right shoes. This means that the marginal contribution of a player with a left shoe to a coalition \(S\) is 1 if the number of players with a left shoe in \(S\) is strictly less than the number of players with a right shoe in \(S\), and 0 otherwise. Let player \(i\) have a left shoe, then:

\[{ U(S_{+i}) - U(S) = \left\{ \begin{array}{ll} 1, & \text{ if} \mid S \cap L \mid < \mid S \cap R \mid \\ 0, & \text{ otherwise} \end{array} \right. }\]

The situation is analogous for players with a right shoe. To compute the Shapley or Banzhaf value of a player \(i\) with a left shoe, we therefore need the number of subsets \(S\) of \(D_{-i}\) such that \(\mid S \cap L \mid < \mid S \cap R \mid\). Since \(D_{-i}\) contains \(| L | - 1\) left shoes and \(| R |\) right shoes, this number is given by the sum:

\[\sum^{| L | - 1}_{l = 0} \sum_{r > l}^{| R |} \binom{| L | - 1}{l} \binom{| R |}{r}.\]
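The count can be checked numerically. The sketch below evaluates the double sum with `math.comb` and compares it against a brute-force enumeration. Both functions take the numbers of left and right shoes held by the players other than \(i\). The function names and the example sizes are hypothetical.

```python
import math
from itertools import combinations


def count_by_formula(n_left, n_right):
    """Number of subsets with strictly fewer left shoes than right shoes."""
    return sum(
        math.comb(n_left, l) * math.comb(n_right, r)
        for l in range(n_left + 1)
        for r in range(l + 1, n_right + 1)
    )


def count_by_enumeration(n_left, n_right):
    """Same count, obtained by listing every subset of the players."""
    players = ["L"] * n_left + ["R"] * n_right
    count = 0
    for size in range(len(players) + 1):
        for subset in combinations(players, size):
            if subset.count("L") < subset.count("R"):
                count += 1
    return count


# Hypothetical game with 3 left and 2 right shoes: for a player i owning a
# left shoe, the remaining players hold 2 left and 2 right shoes.
others_left, others_right = 2, 2
n_subsets = count_by_formula(others_left, others_right)
banzhaf_left = n_subsets / 2 ** (others_left + others_right)
print(n_subsets, banzhaf_left)
```

Dividing the count by the number of subsets of the remaining players gives the Banzhaf value of a left-shoe player, which is the same normalisation by `2**m` used in `banzhaf_values`.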

PARAMETER	DESCRIPTION
`left`	Number of players with a left shoe. TYPE: `int`
`right`	Number of players with a right shoe. TYPE: `int`

Source code in src/pydvl/valuation/games.py

def __init__(self, left: int, right: int):
    self.left = left
    self.right = right
    n_players = self.left + self.right
    max_score = n_players // 2
    super().__init__(n_players, score_range=(0, max_score))

banzhaf_values `cached` ¶

banzhaf_values() -> ValuationResult

We use the fact that the marginal utility of a coalition S is 1 if |S ∩ L| < |S ∩ R| and 0 otherwise, and simply count those sets.

The solution for left or right shoes is symmetrical.

Source code in src/pydvl/valuation/games.py

@lru_cache
def banzhaf_values(self) -> ValuationResult:
    """
    We use the fact that the marginal utility of a coalition S is 1 if
    |S ∩ L| < |S ∩ R| and 0 otherwise, and simply count those sets.

    The solution for left or right shoes is symmetrical.
    """
    m = self.n_players - 1
    left_value = self.n_subsets_left(self.left - 1, self.right) / 2**m
    right_value = self.n_subsets_right(self.left, self.right - 1) / 2**m

    exact_values = np.array([left_value] * self.left + [right_value] * self.right)
    return ValuationResult(
        algorithm="exact_banzhaf",
        status=Status.Converged,
        indices=self.data.indices,
        values=exact_values,
        variances=np.zeros_like(self.data.data().x),
        counts=np.zeros_like(self.data.data().x),
    )

shapley_values `cached` ¶

shapley_values() -> ValuationResult

We use the fact that the marginal utility of a coalition S of size k is 1 if |S ∩ L| < |S ∩ R| and 0 otherwise, and compute Shapley values with the formula that iterates over subset sizes.

The solution for left or right shoes is symmetrical

Source code in src/pydvl/valuation/games.py

@lru_cache
def shapley_values(self) -> ValuationResult:
    """
    We use the fact that the marginal utility of a coalition S of size k is 1 if
    |S ∩ L| < |S ∩ R| and 0 otherwise, and compute Shapley values with the formula
    that iterates over subset sizes.

    The solution for left or right shoes is symmetrical
    """
    left_value = 0.0
    right_value = 0.0
    m = self.n_players - 1
    for k in range(m + 1):
        left_value += (
            1 / math.comb(m, k) * self.n_subsets_left(self.left - 1, self.right, k)
        )
        right_value += (
            1 / math.comb(m, k) * self.n_subsets_right(self.left, self.right - 1, k)
        )
    left_value /= self.n_players
    right_value /= self.n_players
    exact_values = np.array([left_value] * self.left + [right_value] * self.right)
    return ValuationResult(
        algorithm="exact_shapley",
        status=Status.Converged,
        indices=self.data.indices,
        values=exact_values,
        variances=np.zeros_like(self.data.data().x),
        counts=np.zeros_like(self.data.data().x),
    )

SymmetricVotingGame ¶

SymmetricVotingGame(n_players: int)

Bases: Game

Toy game that is used for testing and demonstration purposes.

A symmetric voting game defined in (Castro et al., 2009)¹, Section 4.1.

For this game the utility of a coalition is 1 if its cardinality is greater than \(N/2\), where \(N\) is the number of players (`n_players`), and 0 otherwise.

\[{ v(S) = \left\{\begin{array}{ll} 1, & \text{ if} \quad \mid S \mid > \frac{N}{2} \\ 0, & \text{ otherwise} \end{array}\right. }\]
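Since the utility depends only on the coalition size, the Shapley value of a player reduces to a sum over sizes. The sketch below is a standalone illustration with hypothetical function names, not the library's code. It uses the fact that all coalition sizes are equally likely in the Shapley weighting.

```python
from fractions import Fraction


def symmetric_vote(coalition_size, n_players):
    """Utility of any coalition of the given size."""
    return 1 if coalition_size > n_players / 2 else 0


def shapley_value(n_players):
    """Shapley value of a single player, summing over coalition sizes."""
    total = Fraction(0)
    for k in range(n_players):
        # Marginal contribution of joining a coalition of size k; it is the
        # same for every coalition of that size, and each size has weight 1/n.
        marginal = symmetric_vote(k + 1, n_players) - symmetric_vote(k, n_players)
        total += Fraction(marginal, n_players)
    return total


print(shapley_value(4))
print(shapley_value(6))
```

Only one coalition size produces a non-zero marginal contribution, so every player receives \(1/N\), as expected from symmetry and \(v(N) = 1\).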

PARAMETER	DESCRIPTION
`n_players`	Number of players that participate in the game. TYPE: `int`

Source code in src/pydvl/valuation/games.py

def __init__(self, n_players: int):
    if n_players % 2 != 0:
        raise ValueError("n_players must be an even number.")
    super().__init__(n_players, score_range=(0, 1))
