Skip to content

Owen

References


  1. Okhrati, R., Lipani, A., 2021. A Multilinear Sampling Algorithm to Estimate Shapley Values. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7992–7999. IEEE. 

owen_sampling_shapley(u, n_samples, max_q, *, method=OwenAlgorithm.Standard, n_jobs=1, config=ParallelConfig(), progress=False, seed=None)

Owen sampling of Shapley values as described in (Okhrati and Lipani, 2021)1.

This function computes a Monte Carlo approximation to

\[v_u(i) = \int_0^1 \mathbb{E}_{S \sim P_q(D_{\backslash \{i\}})} [u(S \cup \{i\}) - u(S)]\]

using one of two methods. The first one, selected with the argument mode = OwenAlgorithm.Standard, approximates the integral with:

\[\hat{v}_u(i) = \frac{1}{Q M} \sum_{j=0}^Q \sum_{m=1}^M [u(S^{(q_j)}_m \cup \{i\}) - u(S^{(q_j)}_m)],\]

where \(q_j = \frac{j}{Q} \in [0,1]\) and the sets \(S^{(q_j)}\) are such that a sample \(x \in S^{(q_j)}\) if a draw from a \(Ber(q_j)\) distribution is 1.

The second method, selected with the argument mode = OwenAlgorithm.Antithetic, uses correlated samples in the inner sum to reduce the variance:

\[\hat{v}_u(i) = \frac{1}{2 Q M} \sum_{j=0}^Q \sum_{m=1}^M [u(S^{(q_j)}_m \cup \{i\}) - u(S^{(q_j)}_m) + u((S^{(q_j)}_m)^c \cup \{i\}) - u((S^{( q_j)}_m)^c)],\]

where now \(q_j = \frac{j}{2Q} \in [0,\frac{1}{2}]\), and \(S^c\) is the complement of \(S\).

Note

The outer integration could be done instead with a quadrature rule.

PARAMETER DESCRIPTION
u

Utility object holding data, model and scoring function.

TYPE: Utility

n_samples

Numer of sets to sample for each value of q

TYPE: int

max_q

Number of subdivisions for q ∈ [0,1] (the element sampling probability) used to approximate the outer integral.

TYPE: int

method

Selects the algorithm to use, see the description. Either [OwenAlgorithm.Full][pydvl.value.shapley.owen.OwenAlgorithm] for \(q \in [0,1]\) or [OwenAlgorithm.Halved][pydvl.value.shapley.owen.OwenAlgorithm] for \(q \in [0,0.5]\) and correlated samples

TYPE: OwenAlgorithm DEFAULT: Standard

n_jobs

Number of parallel jobs to use. Each worker receives a chunk of the total of max_q values for q.

TYPE: int DEFAULT: 1

config

Object configuring parallel computation, with cluster address, number of cpus, etc.

TYPE: ParallelConfig DEFAULT: ParallelConfig()

progress

Whether to display progress bars for each job.

TYPE: bool DEFAULT: False

seed

Either an instance of a numpy random number generator or a seed for it.

TYPE: Optional[Seed] DEFAULT: None

RETURNS DESCRIPTION
ValuationResult

Object with the data values.

New in version 0.3.0

Changed in version 0.5.0

Support for parallel computation and enable antithetic sampling.

Source code in src/pydvl/value/shapley/owen.py
def owen_sampling_shapley(
    u: Utility,
    n_samples: int,
    max_q: int,
    *,
    method: OwenAlgorithm = OwenAlgorithm.Standard,
    n_jobs: int = 1,
    config: ParallelConfig = ParallelConfig(),
    progress: bool = False,
    seed: Optional[Seed] = None
) -> ValuationResult:
    r"""Owen sampling of Shapley values as described in
    (Okhrati and Lipani, 2021)<sup><a href="#okhrati_multilinear_2021">1</a></sup>.

    This function computes a Monte Carlo approximation to

    $$v_u(i) = \int_0^1 \mathbb{E}_{S \sim P_q(D_{\backslash \{i\}})}
    [u(S \cup \{i\}) - u(S)]$$

    using one of two methods. The first one, selected with the argument ``mode =
    OwenAlgorithm.Standard``, approximates the integral with:

    $$\hat{v}_u(i) = \frac{1}{Q M} \sum_{j=0}^Q \sum_{m=1}^M [u(S^{(q_j)}_m
    \cup \{i\}) - u(S^{(q_j)}_m)],$$

    where $q_j = \frac{j}{Q} \in [0,1]$ and the sets $S^{(q_j)}$ are such that a
    sample $x \in S^{(q_j)}$ if a draw from a $Ber(q_j)$ distribution is 1.

    The second method, selected with the argument ``mode =
    OwenAlgorithm.Antithetic``, uses correlated samples in the inner sum to
    reduce the variance:

    $$\hat{v}_u(i) = \frac{1}{2 Q M} \sum_{j=0}^Q \sum_{m=1}^M [u(S^{(q_j)}_m
    \cup \{i\}) - u(S^{(q_j)}_m) + u((S^{(q_j)}_m)^c \cup \{i\}) - u((S^{(
    q_j)}_m)^c)],$$

    where now $q_j = \frac{j}{2Q} \in [0,\frac{1}{2}]$, and $S^c$ is the
    complement of $S$.

    !!! Note
        The outer integration could be done instead with a quadrature rule.

    Args:
        u: [Utility][pydvl.utils.utility.Utility] object holding data, model
            and scoring function.
        n_samples: Numer of sets to sample for each value of q
        max_q: Number of subdivisions for q ∈ [0,1] (the element sampling
            probability) used to approximate the outer integral.
        method: Selects the algorithm to use, see the description. Either
            [OwenAlgorithm.Full][pydvl.value.shapley.owen.OwenAlgorithm] for
            $q \in [0,1]$ or
            [OwenAlgorithm.Halved][pydvl.value.shapley.owen.OwenAlgorithm] for
            $q \in [0,0.5]$ and correlated samples
        n_jobs: Number of parallel jobs to use. Each worker receives a chunk
            of the total of `max_q` values for q.
        config: Object configuring parallel computation, with cluster address,
            number of cpus, etc.
        progress: Whether to display progress bars for each job.
        seed: Either an instance of a numpy random number generator or a seed for it.

    Returns:
        Object with the data values.

    !!! tip "New in version 0.3.0"

    !!! tip "Changed in version 0.5.0"
        Support for parallel computation and enable antithetic sampling.

    """
    map_reduce_job: MapReduceJob[NDArray, ValuationResult] = MapReduceJob(
        u.data.indices,
        map_func=_owen_sampling_shapley,
        reduce_func=lambda results: reduce(operator.add, results),
        map_kwargs=dict(
            u=u,
            method=OwenAlgorithm(method),
            n_samples=n_samples,
            max_q=max_q,
            progress=progress,
        ),
        n_jobs=n_jobs,
        config=config,
    )

    return map_reduce_job(seed=seed)