pydvl.parallel.map_reduce
¶
This module contains a wrapper around joblib's Parallel()
class that makes it
easy to run map-reduce jobs.
Deprecation
This interface might be deprecated or changed in a future release before 1.0
MapReduceJob
¶
MapReduceJob(
inputs: Union[Collection[T], T],
map_func: MapFunction[R],
reduce_func: ReduceFunction[R] = identity,
parallel_backend: Optional[ParallelBackend] = None,
config: Optional[ParallelConfig] = None,
*,
map_kwargs: Optional[Dict] = None,
reduce_kwargs: Optional[Dict] = None,
n_jobs: int = -1,
timeout: Optional[float] = None
)
Bases: Generic[T, R]
Takes an embarrassingly parallel fun and runs it in n_jobs
parallel
jobs, splitting the data evenly into a number of chunks equal to the number of jobs.
Typing information for objects of this class requires the type of the inputs
that are split for map_func
and the type of its output.
PARAMETER | DESCRIPTION |
---|---|
inputs |
The input that will be split and passed to
TYPE:
|
map_func |
Function that will be applied to the input chunks in each job.
TYPE:
|
reduce_func |
Function that will be applied to the results of
TYPE:
|
map_kwargs |
Keyword arguments that will be passed to |
reduce_kwargs |
Keyword arguments that will be passed to |
parallel_backend |
Parallel backend instance to use
for parallelizing computations. If
TYPE:
|
config |
(DEPRECATED) Object configuring parallel computation, with cluster address, number of cpus, etc.
TYPE:
|
n_jobs |
Number of parallel jobs to run. Does not accept 0
TYPE:
|
Example
A simple usage example with 2 jobs:
>>> from pydvl.parallel import MapReduceJob
>>> import numpy as np
>>> map_reduce_job: MapReduceJob[np.ndarray, np.ndarray] = MapReduceJob(
... np.arange(5),
... map_func=np.sum,
... reduce_func=np.sum,
... n_jobs=2,
... )
>>> map_reduce_job()
10
When passed a single object as input, it will be repeated for each job:
Source code in src/pydvl/parallel/map_reduce.py
n_jobs
property
writable
¶
n_jobs: int
Effective number of jobs according to the used ParallelBackend instance.
__call__
¶
__call__(seed: Optional[Union[Seed, SeedSequence]] = None) -> R
Runs the map-reduce job.
PARAMETER | DESCRIPTION |
---|---|
seed |
Either an instance of a numpy random number generator or a seed for it.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
R
|
The result of the reduce function. |