Differences between revisions 1 and 29 (spanning 28 versions)
Revision 1 as of 2007-01-14 01:50:05
Size: 3971
Editor: PaulBoddie
Comment: Initial lists of solutions.
Revision 29 as of 2008-03-26 17:36:29
Size: 7988
Editor: PaulBoddie
Comment: Added Tahoe.
Line 7: Line 7:
Some libraries, often to preserve some similarity with more familiar concurrency models (such as Python's threading API), employ parallel processing techniques which limit their relevance to SMP-based hardware, mostly due to the usage of process creation functions such as the UNIX fork system call. However, a technique called process migration may permit such libraries to be useful in certain kinds of computational clusters as well, notably single-system image cluster solutions ([http://openmosix.sourceforge.net/ OpenMosix] being one such example).
Some libraries, often to preserve some similarity with more familiar concurrency models (such as Python's threading API), employ parallel processing techniques which limit their relevance to SMP-based hardware, mostly due to the usage of process creation functions such as the UNIX fork system call. However, a technique called process migration may permit such libraries to be useful in certain kinds of computational clusters as well, notably [http://en.wikipedia.org/wiki/Single-system_image single-system image] cluster solutions ([http://sourceforge.net/projects/ssic-linux OpenSSI] and [http://openmosix.sourceforge.net/ OpenMosix] being examples).
Line 9: Line 9:
 * [http://www.python.org/pypi/parallel parallel/pprocess] - fork-based process creation with asynchronous channel-based communications
 * [http://www.parallelpython.com/ ppsmp] - process-based, job-oriented solution (''source code not available, has restrictive licence'')
 * [http://www.python.org/pypi/processing processing] - fork-based process creation (using threads on other platforms), implementing an API like the standard library's threading API and providing familiar objects such as queues and semaphores through the use of a manager process
 * [http://www.python.org/pypi/remoteD remoteD] - fork-based process creation with a dictionary-based communications paradigm
 * [http://lfw.org/python/delegate.html delegate] - fork-based process creation with pickled data sent through pipes
 * [http://www.honeypot.net/multi-processing-map-python forkmap] - fork-based process creation using a function resembling Python's built-in map function
 * [http://poshmodule.sourceforge.net/ POSH] Python Object Sharing is an extension module to Python that allows objects to be placed in shared memory. POSH allows concurrent processes to communicate simply by assigning objects to shared container objects. (''POSIX/UNIX/Linux only'')
 * [http://www.parallelpython.com/ pp] - process-based, job-oriented solution with cluster support (''Windows, Linux, Unix'')
 * [http://www.python.org/pypi/pprocess pprocess] (previously parallel/pprocess) - fork-based process creation with asynchronous channel-based communications employing pickled data [http://www.boddie.org.uk/python/pprocess/tutorial.html (tutorial)] (''currently only POSIX/UNIX/Linux, perhaps Cygwin'')
 * [http://www.python.org/pypi/processing processing] - process-based, using either fork on Unix or the subprocess module on Windows, implementing an API like the standard library's threading API and providing familiar objects such as queues and semaphores. Can use native semaphores, message queues, etc., or can use a manager process for sharing objects (''Unix and Windows'')
 * [http://www.python.org/pypi/remoteD remoteD] - fork-based process creation with a dictionary-based communications paradigm (''platform independent, according to PyPI entry'')
Line 18: Line 21:
Unlike SMP architectures and especially in contrast to thread-based concurrency, cluster (and grid) architectures offer high scalability due to the relative absence of shared resources, although this can make the programming paradigms seem somewhat alien to uninitiated developers. In this domain, some overlap with other distributed computing technologies may be observed.
Unlike SMP architectures and especially in contrast to thread-based concurrency, cluster (and grid) architectures offer high scalability due to the relative absence of shared resources, although this can make the programming paradigms seem somewhat alien to uninitiated developers. In this domain, some overlap with other distributed computing technologies may be observed (see DistributedProgramming for more details).
Line 20: Line 23:
 * [http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm#pycluster pycluster] - binding for the [http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ Cluster] software (apparently oriented towards bioinformatics tasks)
 * [http://seweb.se.wtb.tue.nl/~hat/batchlib.html batchlib] - a distributed computation system with automatic selection of processing services (''no longer developed'')
 * [http://seweb.se.wtb.tue.nl/~hat/execproxy.html exec_proxy] - a system for executing arbitrary programs and transferring files (''no longer developed'')
 * [http://mpi4py.scipy.org mpi4py] - MPI-based solution
 * [http://www.lindaspaces.com/products/NWS_overview.html NetWorkSpaces] appears to be a rebranding and rebinding of [http://www.lindaspaces.com/products/linda.html Lindaspaces] for Python
 * [http://code.google.com/p/papyros/ papyros] - lightweight master-slave based parallel processing. Clients submit jobs to a master object which is monitored by one or more slave objects that do the real work. Two main implementations are currently provided, one using multiple threads and one multiple processes in one or more hosts through [http://pyro.sourceforge.net/ Pyro].
 * [http://codespeak.net/py/current/doc/execnet.html py.execnet] - asynchronous execution of client-provided code fragments
Line 25: Line 33:
 * [http://www.cimec.org.ar/python/ "Resources for Parallel Computing in Python"]
   * [http://www.cimec.org.ar/python/mpi4py.html "MPI for Python"] - MPI-based solution
   * [http://www.cimec.org.ar/python/python.html "Parallelized Python Interpreter"] - interactive, parallelized version of the Python interpreter
 * [http://dirac.cnrs-orleans.fr/ScientificPython/ ScientificPython] - MPI and BSP-based solutions, as well as a Pyro-based master-slave process manager solution
 * [http://pynpvm.sourceforge.net/ pynpvm] - PVM-based solution for NumPy
 * [http://pyro.sourceforge.net/ Pyro] - PYthon Remote Objects, a distributed object system that takes care of network communication between your objects once you split them over different machines on the network
 * [http://www.cs.tut.fi/~ask/rthread/index.html rthread] - distributed execution of functions via SSH
 * [http://dirac.cnrs-orleans.fr/ScientificPython/ ScientificPython] contains three subpackages for parallel computing:
   * Scientific.DistributedComputing.MasterSlave implements a master-slave model in which a master process requests computational tasks that are executed by an arbitrary number of slave processes. The strong points are ease of use and the possibility to work with a varying number of slave processes. It is less suited for the construction of large, modular parallel applications. Ideal for parallel scripting. Uses [http://pyro.sourceforge.net/ "Pyro"]. (''works wherever Pyro works'')
   * Scientific.BSP is an object-oriented implementation of the [http://www.bsp-worldwide.org/ "Bulk Synchronous Parallel (BSP)"] model for parallel computing, whose main advantages over message passing are the impossibility of deadlocks and the possibility to evaluate the computational cost of an algorithm as a function of machine parameters. The Python implementation of BSP features parallel data objects, communication of arbitrary Python objects, and a framework for defining distributed data objects implementing parallelized methods. (''works on all platforms that have an MPI library or an implementation of BSPlib'')
   * Scientific.MPI is an interface to MPI that emphasizes the possibility to combine Python and C code, both using MPI. Contrary to pypar and pyMPI, it does not support the communication of arbitrary Python objects, being instead optimized for Numeric/NumPy arrays. (''works on all platforms that have an MPI library'')
 * [http://www.its.caltech.edu/~astraw/seppo.html seppo] - based on Pyro mobile code, providing a parallel map function which evaluates each iteration "in a different process, possibly in a different computer".
 * "[http://www.interactivesupercomputing.com/getpr.php?id=246 Star-P for Python] is an interactive parallel computing platform ..."
Line 32: Line 45:
 * [http://ganga.web.cern.ch/ganga/ Ganga] - an interface to the Grid that is being developed jointly by the ATLAS and LHCb experiments at CERN.
Line 38: Line 52:
 * [http://kosmosfs.sourceforge.net/ Kosmos Distributed File System] - has Python bindings
 * [http://allmydata.org/source/tahoe/trunk/docs/about.html Tahoe: a secure, decentralized, fault-tolerant filesystem]

Parallel Processing and Multiprocessing in Python

A number of Python-related libraries exist for the programming of solutions either employing multiple CPUs or multicore CPUs in a [http://en.wikipedia.org/wiki/Symmetric_multiprocessing symmetric multiprocessing (SMP)] or shared memory environment, or potentially huge numbers of computers in a cluster or grid environment. This page seeks to provide references to the different libraries and solutions available.

Symmetric Multiprocessing

Some libraries, often to preserve some similarity with more familiar concurrency models (such as Python's threading API), employ parallel processing techniques which limit their relevance to SMP-based hardware, mostly due to the usage of process creation functions such as the UNIX fork system call. However, a technique called process migration may permit such libraries to be useful in certain kinds of computational clusters as well, notably [http://en.wikipedia.org/wiki/Single-system_image single-system image] cluster solutions ([http://sourceforge.net/projects/ssic-linux OpenSSI] and [http://openmosix.sourceforge.net/ OpenMosix] being examples).

Advantages of such approaches include convenient process creation and the ability to share resources. Indeed, the fork system call permits efficient sharing of common read-only data structures on modern UNIX-like operating systems.
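
The fork-based sharing described above can be sketched with nothing but the standard library. The example below is a minimal illustration, not the implementation of any library listed on this page: the names `SHARED_TABLE` and `parallel_sum` are hypothetical, the data set is invented for the example, and the code assumes a POSIX system where `os.fork` is available. Each child inherits the table from the parent without it being sent explicitly, and returns its partial result as pickled data through a pipe.

```python
import os
import pickle

# Hypothetical read-only data set, built once in the parent before forking.
SHARED_TABLE = list(range(100_000))


def parallel_sum(workers=4):
    """Sum SHARED_TABLE using one forked child per slice (POSIX only)."""
    chunk = (len(SHARED_TABLE) + workers - 1) // workers
    children = []
    for i in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: the table is inherited from the parent, not re-sent.
            os.close(read_fd)
            try:
                part = sum(SHARED_TABLE[i * chunk:(i + 1) * chunk])
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(part, out)
            finally:
                # Exit immediately so the child never runs the parent's code.
                os._exit(0)
        # Parent: keep only the read end and move on to the next child.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as inp:
            total += pickle.load(inp)
        os.waitpid(pid, 0)  # reap the child to avoid leaving a zombie
    return total
```

Calling `parallel_sum(4)` forks four children and adds up their partial sums. In CPython, reference-count updates write to object headers, so some inherited pages may still be copied even when the data is only read.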

Cluster Computing

Unlike SMP architectures and especially in contrast to thread-based concurrency, cluster (and grid) architectures offer high scalability due to the relative absence of shared resources, although this can make the programming paradigms seem somewhat alien to uninitiated developers. In this domain, some overlap with other distributed computing technologies may be observed (see DistributedProgramming for more details).
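
The share-nothing style mentioned above can be imitated on a single machine as a rough sketch. The example below is a stand-in only: it uses the standard library's `multiprocessing` module rather than any cluster library, and the names `square`, `worker` and `run_jobs` are hypothetical. Workers hold no shared state and communicate with the master purely by messages, which is the part that carries over to cluster settings. The fork start method is requested explicitly, so the sketch assumes a POSIX system.

```python
import multiprocessing as mp


def square(n):
    """Hypothetical task standing in for real work."""
    return n * n


def worker(tasks, results):
    # A worker only receives and sends messages; None is the stop signal.
    for job_id, value in iter(tasks.get, None):
        results.put((job_id, square(value)))


def run_jobs(values, workers=3):
    """Distribute a list of values to worker processes and collect results in order."""
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method available
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(workers)]
    for proc in procs:
        proc.start()
    for job in enumerate(values):
        tasks.put(job)
    for _ in procs:
        tasks.put(None)  # one stop signal per worker
    # Drain all results before joining so no worker blocks on a full queue.
    collected = dict(results.get() for _ in values)
    for proc in procs:
        proc.join()
    return [collected[i] for i in range(len(values))]
```

Job identifiers travel with each message because results can arrive in any order; the master reorders them at the end.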

Grid Computing

Editorial Notes

The above lists should be arranged in ascending alphabetical order - please respect this when adding new frameworks or tools.

ParallelProcessing (last edited 2021-05-17 13:47:48 by MordicusEtCubitus)
