
According to Wikipedia, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.
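The definition above can be illustrated with a minimal sketch that uses only the standard library. It is not taken from any of the packages listed on this page; the names component, run_network and _END are hypothetical and chosen for illustration. Each process sees only its input and output queue, and the wiring is decided outside the processes.

```python
import queue
import threading

_END = object()  # sentinel marking the end of a stream (illustrative convention)


def component(func):
    """Wrap a per-item function as a black-box process with one input and one output port."""
    def process(inbox, outbox):
        while True:
            item = inbox.get()
            if item is _END:
                outbox.put(_END)  # pass the end-of-stream marker downstream
                return
            outbox.put(func(item))
    return process


def run_network(processes, source):
    """Connect processes in a line with queues, feed in `source`, and collect the output."""
    # One queue per connection; the processes never see how they are wired.
    queues = [queue.Queue() for _ in range(len(processes) + 1)]
    threads = [
        threading.Thread(target=proc, args=(queues[i], queues[i + 1]))
        for i, proc in enumerate(processes)
    ]
    for t in threads:
        t.start()
    for item in source:
        queues[0].put(item)
    queues[0].put(_END)

    results = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


double = component(lambda x: x * 2)
increment = component(lambda x: x + 1)

# The same unchanged components, reconnected in a different order,
# form a different application.
print(run_network([double, increment], [1, 2, 3]))  # [3, 5, 7]
print(run_network([increment, double], [1, 2, 3]))  # [4, 6, 8]
```

This sketch only handles linear networks and does no error handling (an exception inside a component would leave the network waiting); the packages below address branching, scheduling and failure recovery in their own ways.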

This page describes Python packages for FBP.

Flow Based Programming

To install Luigi: pip install luigi

  • Luigi - "Using a workflow manager like Luigi is in general helpful because it handles dependencies, it reduces the amount of boilerplate code that is necessary for parameters and error checking, it manages failure recovery and overall it forces us to follow a clear pattern when developing the data pipeline."
  • Pipeless - "A simple Python library for building a basic data pipeline."

To install pipeless: pip install pipeless

  • papy - "The papy package provides an implementation of the flow-based programming paradigm in Python that enables the construction and deployment of distributed workflows."

  • Orkan - "Orkan is a pipeline parallelization library, written in Python.

Making use of the multicore capabilities of ones machine in Python is often not as easy as it should be. Orkan aims to provide a plain API to utilize those underused CPUs of yours in cases you need some extra horse power for your computation."

To install pype: use the git repository at https://github.com/randomwalker/pype

  • Kamaelia - "In Kamaelia you build systems from simple components that talk to each other. This speeds development, massively aids maintenance and also means you build naturally concurrent software. It's intended to be accessible by any developer, including novices. It also makes it fun :)"

  • Ruffus - "The Ruffus python module provides automatic support for: Managing dependencies, Parallel jobs, Re-starting from arbitrary points, especially after errors, Display of the pipeline as a flowchart, Reporting"
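Several of the entries above mention managing dependencies and restarting after errors. The following is a rough sketch of that idea using only the standard library; it does not use the API of Luigi, Ruffus or any other package listed here, and the Task class and build function are hypothetical names invented for this example. A task counts as complete when its target file exists, so a rerun only executes the steps whose output is missing.

```python
import os
import tempfile


class Task:
    """Hypothetical task: a target file, an action that creates it, and upstream tasks."""

    def __init__(self, target, action, requires=()):
        self.target = target
        self.action = action
        self.requires = list(requires)

    def is_complete(self):
        # A task is considered done once its target file exists.
        return os.path.exists(self.target)


def build(task, log):
    """Run upstream tasks first, then this task, skipping anything already complete."""
    if task.is_complete():
        return
    for upstream in task.requires:
        build(upstream, log)
    task.action(task)
    log.append(os.path.basename(task.target))


def write_numbers(task):
    with open(task.target, "w") as f:
        f.write("1\n2\n3\n")


def write_total(task):
    # Read the upstream target and write the sum of its lines.
    with open(task.requires[0].target) as f:
        total_value = sum(int(line) for line in f)
    with open(task.target, "w") as f:
        f.write(str(total_value))


workdir = tempfile.mkdtemp()
numbers = Task(os.path.join(workdir, "numbers.txt"), write_numbers)
total = Task(os.path.join(workdir, "total.txt"), write_total, requires=[numbers])

first_run = []
build(total, first_run)    # runs numbers.txt, then total.txt

second_run = []
build(total, second_run)   # nothing to do: both targets already exist

os.remove(total.target)    # simulate a failed or lost final step
third_run = []
build(total, third_run)    # only total.txt is rebuilt

print(first_run, second_run, third_run)
```

Checking for the target file is the simplest possible completeness test; real workflow managers typically add scheduling, parameters and parallel execution on top of this pattern.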

Offline Resources

  • Pyleaf - "An ASCII-art based pipeline language for Python. Produces intuitive diagrams in simple text. Supports memoization, parallelization, caching, code-data consistency, HTML report generation."

  • Pype - "Pypes provides a scalable, standards based, extensible platform for building ETL solutions. Most commercial platforms have steep learning curves and try to generalize too much of the process. Pypes provides a simple yet powerful framework for designing custom data processing workflows using components you write. In turn, it takes care of scalability and scheduling semantics."

  • zFlow - "Flow-based Programming Library using Python generators, loosely based on J.P. Morrison's book of the same name."

  • PyF - "PyF is a python open source framework and platform dedicated to large data processing, mining, transforming, reporting and more."

  • Bein - "Bein is a workflow manager and miniature LIMS system built in the Bioinformatics and Biostatistics Core Facility of the EPFL. It fills the gap for the working scientist between the classical shell and big workflow managers like Galaxy and major LIMS systems like OpenBIS."

  • florun - "florun is a visual workflow editor and runner"

  • DAGPype - "DAGPype is a pipe-syntax framework for data processing from within scripts or the Python interactive interpreter"

FlowBasedProgramming (last edited 2018-01-05 12:11:33 by GeorgeLambert)
