Map reduce
This module contains a wrapper around joblib's Parallel() class that makes it easy to run map-reduce jobs.
Deprecation
This interface might be deprecated or changed in a future release before 1.0
MapReduceJob(inputs, map_func, reduce_func=identity, map_kwargs=None, reduce_kwargs=None, config=ParallelConfig(), *, n_jobs=-1, timeout=None)
Bases: Generic[T, R]
Takes an embarrassingly parallel function and runs it in n_jobs parallel jobs, splitting the data evenly into a number of chunks equal to the number of jobs.
Typing information for objects of this class requires the type of the inputs that are split for map_func and the type of its output.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | The input that will be split and passed to `map_func`. |
| `map_func` | Function that will be applied to the input chunks in each job. |
| `reduce_func` | Function that will be applied to the results of `map_func`. |
| `map_kwargs` | Keyword arguments that will be passed to `map_func`. |
| `reduce_kwargs` | Keyword arguments that will be passed to `reduce_func`. |
| `config` | Instance of ParallelConfig with cluster address, number of cpus, etc. |
| `n_jobs` | Number of parallel jobs to run. Does not accept 0. |
Example
A simple usage example with 2 jobs:
>>> from pydvl.parallel import MapReduceJob
>>> import numpy as np
>>> map_reduce_job: MapReduceJob[np.ndarray, np.ndarray] = MapReduceJob(
... np.arange(5),
... map_func=np.sum,
... reduce_func=np.sum,
... n_jobs=2,
... )
>>> map_reduce_job()
10
When passed a single object as input, it will be repeated for each job:
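A minimal sketch of this behavior, assuming each job receives its own copy of the input as just described (the lambda map function and the printed result follow from that assumption, not from a verified run): each of the 2 jobs maps the integer 5 to a one-element array, and the reduce step sums the partial results.

>>> map_reduce_job: MapReduceJob[int, np.ndarray] = MapReduceJob(
...     5,
...     map_func=lambda x: np.array([x]),
...     reduce_func=np.sum,
...     n_jobs=2,
... )
>>> map_reduce_job()
10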
Source code in src/pydvl/parallel/map_reduce.py
n_jobs: int *property writable*

Effective number of jobs according to the ParallelBackend instance in use.
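A short illustration of the property (a sketch: the resolved value depends on the backend and the machine, so only its sign is checked here; `job` is a hypothetical instance):

>>> job: MapReduceJob[np.ndarray, np.ndarray] = MapReduceJob(
...     np.arange(10), map_func=np.sum, n_jobs=-1
... )
>>> job.n_jobs > 0  # -1 resolves to the backend's effective job count
True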
__call__(seed=None)
Runs the map-reduce job.
| PARAMETER | DESCRIPTION |
|---|---|
| `seed` | Either an instance of a numpy random number generator or a seed for it. |

| RETURNS | DESCRIPTION |
|---|---|
| `R` | The result of the reduce function. |
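For instance, reusing the job from the example above (a sketch: that job is deterministic, so the seed does not affect the result and the call only demonstrates the signature):

>>> map_reduce_job(seed=42)
10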
Source code in src/pydvl/parallel/map_reduce.py