kliff.parallel

kliff.parallel.parmap1(f, X, *args, tuple_X=False, nprocs=2)

Parallelism over data.

This function mimics multiprocessing.Pool.map, but also accepts extra positional arguments that are passed on to the function f.

Parameters
  • f (function) – The function that operates on the data.

  • X (list) – Data to be parallelized.

  • args (args) – Extra positional arguments needed by the function f.

  • tuple_X (bool) – Whether each element of X is a tuple of arguments. Set to True if multiple arguments are parallelized (e.g. X = zip(x, y)), and to False if only one argument is parallelized. See the Example below.

  • nprocs (int) – Number of processors to use.
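
The effect of tuple_X can be illustrated with a serial sketch. The helper serial_map below is hypothetical and not part of kliff; it only reproduces the calling convention implied by the Example in this section, assuming that each element of X is unpacked in front of the extra arguments when tuple_X=True.

```python
def serial_map(f, X, *args, tuple_X=False):
    """Serial stand-in (hypothetical) that shows only how f is called."""
    if tuple_X:
        # Each element of X is a tuple: unpack it, then append the extra args.
        return [f(*x, *args) for x in X]
    # Each element of X is a single argument, followed by the extra args.
    return [f(x, *args) for x in X]


def func(x, y, z=1):
    return x + y + z


X = range(3)
Y = range(3)
print(serial_map(func, X, 1))                        # func(x, 1)
print(serial_map(func, zip(X, Y), tuple_X=True))     # func(x, y)
print(serial_map(func, zip(X, Y), 1, tuple_X=True))  # func(x, y, 1)
```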

Returns

A list of results, corresponding to X.

Return type

list

Note

The data is put into a job queue. A worker process takes a piece of data from the queue, works on it, pushes the result back to the manager through a second queue, and then takes another piece of data, until the job queue is empty. So, in principle, no worker sits idle, and this function should be faster than kliff.parallel.parmap2().
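
The job-queue pattern described in this note can be sketched roughly as follows. This is a minimal illustration, not the kliff implementation: the names queue_map and _worker are hypothetical, the "fork" start method is an assumption (it is only available on POSIX systems), and there is no error handling.

```python
import multiprocessing as mp


def _worker(f, q_in, q_out, args):
    # Keep taking jobs until a sentinel (index None) arrives.
    while True:
        i, x = q_in.get()
        if i is None:
            break
        # Send the result back together with its index in X.
        q_out.put((i, f(x, *args)))


def queue_map(f, X, *args, nprocs=2):
    """Hypothetical queue-based map; a sketch, not kliff.parallel.parmap1."""
    ctx = mp.get_context("fork")  # assumption: POSIX; raises ValueError elsewhere
    q_in, q_out = ctx.Queue(), ctx.Queue()
    procs = [
        ctx.Process(target=_worker, args=(f, q_in, q_out, args))
        for _ in range(nprocs)
    ]
    for p in procs:
        p.start()

    # Fill the job queue, then add one sentinel per worker.
    njobs = 0
    for i, x in enumerate(X):
        q_in.put((i, x))
        njobs += 1
    for _ in procs:
        q_in.put((None, None))

    # Collect all results before joining, then restore the order of X.
    results = [q_out.get() for _ in range(njobs)]
    for p in procs:
        p.join()
    return [r for _, r in sorted(results, key=lambda pair: pair[0])]


def func(x, y, z=1):
    return x + y + z
```

Because each worker asks for a new job as soon as it finishes the previous one, a slow item delays only the worker that holds it.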

Warning

This function is implemented using multiprocessing.Queue, which requires the function f to be picklable. If that is not the case (e.g. when f uses KIM library functions), use kliff.parallel.parmap2(), which is based on multiprocessing.Pipe.
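
A quick way to find out whether an object can be pickled is to try it with the standard pickle module. The helper is_picklable below is a hypothetical convenience for illustration, not part of kliff, and it checks only the object passed in.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(is_picklable(len))          # a built-in function, pickled by reference
print(is_picklable(lambda x: x))  # a lambda has no importable name
```

Functions are pickled by reference to their importable name, so lambdas and functions defined inside other functions typically fail this check.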

Example

>>> from kliff.parallel import parmap1
>>> def func(x, y, z=1):
...     return x+y+z
>>> X = range(3)
>>> Y = range(3)
>>> parmap1(func, X, 1, nprocs=2)  # [2,3,4]
>>> parmap1(func, X, 1, 1, nprocs=2)  # [2,3,4]
>>> parmap1(func, zip(X, Y), tuple_X=True, nprocs=2)  # [1,3,5]
>>> parmap1(func, zip(X, Y), 1, tuple_X=True, nprocs=2)  # [1,3,5]

kliff.parallel.parmap2(f, X, *args, tuple_X=False, nprocs=2)

Parallelism over data.

This function mimics multiprocessing.Pool.map, which requires the function f to be picklable. parmap2 does not have this restriction, and it also accepts extra positional arguments that are passed on to f.

Parameters
  • f (function) – The function that operates on the data.

  • X (list) – Data to be parallelized.

  • args (args) – Extra positional arguments needed by the function f.

  • tuple_X (bool) – Whether each element of X is a tuple of arguments. Set to True if multiple arguments are parallelized (e.g. X = zip(x, y)), and to False if only one argument is parallelized. See the Example below.

  • nprocs (int) – Number of processors to use.

Returns

A list of results, corresponding to X.

Return type

list

Note

This function is implemented using multiprocessing.Pipe. The data is subdivided into nprocs groups and then each group of data is distributed to a process. The results from each group are then assembled together. The data is shuffled to balance the load in each process. See kliff.parallel.parmap1() for another implementation that uses multiprocessing.Queue.
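
The group-based scheme described in this note can be sketched roughly as follows. This is a minimal illustration, not the kliff implementation: the names pipe_map and _run_group are hypothetical, the way the data is split into groups is an assumption, and the "fork" start method is assumed as well (POSIX only).

```python
import multiprocessing as mp
import random


def _run_group(f, group, args, conn):
    # Evaluate one whole group and send (index, result) pairs back at once.
    conn.send([(i, f(x, *args)) for i, x in group])
    conn.close()


def pipe_map(f, X, *args, nprocs=2):
    """Hypothetical pipe-based map; a sketch, not kliff.parallel.parmap2."""
    ctx = mp.get_context("fork")  # assumption: POSIX; raises ValueError elsewhere

    # Remember the original index of each item, then shuffle to balance the load.
    pairs = list(enumerate(X))
    random.shuffle(pairs)
    groups = [pairs[k::nprocs] for k in range(nprocs)]

    procs, conns = [], []
    for group in groups:
        parent_end, child_end = ctx.Pipe()
        p = ctx.Process(target=_run_group, args=(f, group, args, child_end))
        p.start()
        child_end.close()  # the parent only reads from its own end
        procs.append(p)
        conns.append(parent_end)

    # Receive before joining, then restore the order of X.
    results = []
    for conn in conns:
        results.extend(conn.recv())
    for p in procs:
        p.join()
    return [r for _, r in sorted(results, key=lambda pair: pair[0])]


def func(x, y, z=1):
    return x + y + z
```

In this sketch each process receives its whole group up front, so a group that happens to contain slow items finishes late; shuffling makes that less likely but does not rule it out.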

Example

>>> from kliff.parallel import parmap2
>>> def func(x, y, z=1):
...     return x+y+z
>>> X = range(3)
>>> Y = range(3)
>>> parmap2(func, X, 1, nprocs=2)  # [2,3,4]
>>> parmap2(func, X, 1, 1, nprocs=2)  # [2,3,4]
>>> parmap2(func, zip(X, Y), tuple_X=True, nprocs=2)  # [1,3,5]
>>> parmap2(func, zip(X, Y), 1, tuple_X=True, nprocs=2)  # [1,3,5]

kliff.parallel.get_MPI_world_size()

kliff.parallel.get_context()