kliff.parallel#
- kliff.parallel.parmap1(f, X, *args, tuple_X=False, nprocs=2)[source]#
Parallelism over data.
This function mimics
multiprocessing.Pool.map
to allow extra arguments to be used for the functionf
.- Parameters
f (function) – The function that operates on the data.
X (list) – Data to be parallelized.
args (args) – Extra positional arguments needed by the function
f
.tuple_X (bool) – This depends on
X
. It should be set toTrue
if multiple arguments are parallelized and set toFalse
if only one argument is parallelized. SeeExample
below.nprocs (int) – Number of processors to use.
- Returns
A list of results, corresponding to
X
.- Return type
list
Note
The data is put into a job queue, a worker process gets a piece of the data to work on, the worker pushes the result back to the manager through another queue, and then get another piece of data until the job queue is empty. So, in principle, there will not be idle worker and it should be faster than
kliff.parallel.parmap2()
.Warning
This is implemented using
multiprocessing.Queue
, which requires the function``f`` to be picklable. If it is not the case (e.g. use KIM library functions), usekliff.parallel.parmap2()
that is based onmultiprocessing.Pipe
.Example
>>> def func(x, y, z=1): >>> return x+y+z >>> X = range(3) >>> Y = range(3) >>> parmap1(func, X, 1, nprocs=2) # [2,3,4] >>> parmap1(func, X, 1, 1, nprocs=2) # [2,3,4] >>> parmap1(func, zip(X, Y), tuple_X=True, nprocs=2) # [1,3,5] >>> parmap1(func, zip(X, Y), 1, tuple_X=True, nprocs=2) # [1,3,5]
- kliff.parallel.parmap2(f, X, *args, tuple_X=False, nprocs=2)[source]#
Parallelism over data.
This is to mimic
multiprocessing.Pool.map
, which requires the functionf
to be picklable. This function does not have this restriction and allows extra arguments to be used for the functionf
.- Parameters
f (function) – The function that operates on the data.
X (list) – Data to be parallelized.
args (args) – Extra positional arguments needed by the function
f
.tuple_X (bool) – This depends on
X
. It should be set toTrue
if multiple arguments are parallelized and set toFalse
if only one argument is parallelized. SeeExample
below.nprocs (int) – Number of processors to use.
- Returns
A list of results, corresponding to
X
.- Return type
list
Note
This function is implemented using
multiprocessing.Pipe
. The data is subdivided intonprocs
groups and then each group of data is distributed to a process. The results from each group are then assembled together. The data is shuffled to balance the load in each process. Seekliff.parallel.parmap1()
for another implementation that usesmultiprocessing.Queue
.Example
>>> def func(x, y, z=1): >>> return x+y+z >>> X = range(3) >>> Y = range(3) >>> parmap2(func, X, 1, nprocs=2) # [2,3,4] >>> parmap2(func, X, 1, 1, nprocs=2) # [2,3,4] >>> parmap2(func, zip(X, Y), tuple_X=True, nprocs=2) # [1,3,5] >>> parmap2(func, zip(X, Y), 1, tuple_X=True, nprocs=2) # [1,3,5]