Differences between revisions 1 and 28 (spanning 27 versions)
Revision 1 as of 2010-10-25 18:06:25
Size: 1591
Editor: 75-162-129-134
Revision 28 as of 2018-01-05 12:11:33
Size: 4674
 * [[http://pypes.org|Pype]] - "Pypes provides a scalable, standards based, extensible platform for building ETL solutions. Most commercial platforms have steep learning curves and try to generalize too much of the process. Pypes provides a simple yet powerful framework for designing custom data processing workflows using components you write. In turn, it takes care of scalability and scheduling semantics."
 * [[https://marcobonzanini.com/2015/10/24/building-data-pipelines-with-python-and-luigi/|Luigi]] - "Luigi is a Python tool for workflow management. It has been developed at Spotify, to help building complex data pipelines of batch jobs. Using a workflow manager like Luigi is in general helpful because it handles dependencies, it reduces the amount of boilerplate code that is necessary for parameters and error checking, it manages failure recovery and overall it forces us to follow a clear pattern when developing the data pipeline."

  ''To install Luigi'': '''''pip install luigi'''''

 * [[http://asperous.us/pipeless/|Pipeless]] - "A simple Python library for building a basic data pipeline."
  ''To install pipeless'': '''''pip install pipeless'''''
 * [[http://arvindn.livejournal.com/68137.html|another pype]] - A simpler module for chaining operations
 * [[http://www.thensys.com/?page_id=21|zFlow]] - "Flow-based Programming Library using Python generators, loosely based on J.P. Morrison's book of the same name."
 * [[http://pyfproject.org/|PyF]] - "PyF is a python open source framework and platform dedicated to large data processing, mining, transforming, reporting and more."

 * [[https://github.com/tobigue/Orkan|Orkan]] - "Orkan is a pipeline parallelization library, written in Python. Making use of the multicore capabilities of ones machine in Python is often not as easy as it should be. Orkan aims to provide a plain API to utilize those underused CPUs of yours in cases you need some extra horse power for your computation."

 * [[http://arvindn.livejournal.com/68137.html|another pype]] - A simpler module for chaining operations
  ''To install pype'': '''''use git repo: https://github.com/randomwalker/pype '''''


 * [[http://www.kamaelia.org/Home.html|Kamaelia]] - "In Kamaelia you build systems from simple components that talk to each other. This speeds development, massively aids maintenance and also means you build naturally concurrent software. It's intended to be accessible by any developer, including novices. It also makes it fun :)"

  ''To install Kamaelia'': '''''http://www.kamaelia.org/Home.html'''''


 * [[http://www.ruffus.org.uk/|Ruffus]] - "The Ruffus python module provides automatic support for: Managing dependencies, Parallel jobs, Re-starting from arbitrary points, especially after errors, Display of the pipeline as a flowchart, Reporting"

  ''To install Ruffus'': '''''sudo pip install ruffus --upgrade'''''

 * [[https://bitbucket.org/jslowery/zflow|zFlow]] - "ZFlow is a simplistic implementation of Flow-based Programming for Python as defined by J.P. Morrison. More information about Flow-based Programming can be found here: http://www.jpaulmorrison.com/fbp/ However, there are a few fundamental differences between ZFlow and the standard definition:

   * ZFlow does not support loops in the graph
   * ZFlow uses Python generators instead of asynchronous threads so port data flow works in a lazy, pulling way not by pushing."

 * [[https://github.com/garywu/pypedream|pypedream formerly DAGPype]] - "This is a Python framework for scientific data-processing and data-preparation DAG (directed acyclic graph) pipelines. It is designed to work well within Python scripts or IPython, provide an in-Python alternative for sed, awk, perl, and grep, and complement libraries such as NumPy/SciPy, SciKits, pandas, MayaVi, PyTables, and so forth. Those libraries process data once it has been assembled. This library is for flexible data assembly and quick exploration, or for aggregating huge data which cannot be reasonably assembled."

 * [[https://github.com/leplatrem/florun|florun]] - "florun is a visual workflow editor and runner"

 * [[https://github.com/mhcomm/pypeman|pypeman]] - "Pypeman is a minimalist but pragmatic ESB / ETL / EAI in python."

 * [[https://pypi.python.org/pypi/pyf|PyF]] - "PyF is a python open source framework and platform dedicated to large data processing, mining, transforming, reporting and more."

 * [[https://pypi.python.org/pypi/bein|Bein]] - "Bein is a workflow manager and miniature LIMS system built in the Bioinformatics and Biostatistics Core Facility of the EPFL. It fills the gap for the working scientist between the classical shell and big workflow managers like Galaxy and major LIMS systems like OpenBIS."

 

According to Wikipedia, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.
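The definition above can be illustrated with a minimal sketch in plain Python. It does not use any of the packages listed on this page; the names run_network, upper and exclaim are hypothetical and chosen only for illustration. It also runs the processes one after another in a single thread, whereas FBP implementations typically run processes concurrently. The point is only that the connections live outside the processes, so the same components can be rewired without being changed.

```python
from collections import deque


def upper(packet):
    """Black-box process: transforms a packet, knows nothing about its neighbours."""
    return packet.upper()


def exclaim(packet):
    """Another black-box process."""
    return packet + "!"


def run_network(processes, connections, source, inputs):
    """Send each input packet through the network described by `connections`.

    processes:   dict mapping a process name to a callable
    connections: list of (upstream, downstream) name pairs, defined
                 outside the processes themselves
    source:      name of the process that receives the input packets
    """
    downstream = {}
    for src, dst in connections:
        downstream.setdefault(src, []).append(dst)

    outputs = []
    queue = deque((source, packet) for packet in inputs)
    while queue:
        name, packet = queue.popleft()
        result = processes[name](packet)
        targets = downstream.get(name, [])
        if not targets:
            # No outgoing connection: this is a network output.
            outputs.append(result)
        for target in targets:
            queue.append((target, result))
    return outputs


processes = {"upper": upper, "exclaim": exclaim}

# Two different applications built from the same unchanged components.
net_a = [("upper", "exclaim")]
net_b = [("exclaim", "upper")]

result_a = run_network(processes, net_a, "upper", ["hello", "world"])
result_b = run_network(processes, net_b, "exclaim", ["hello"])
```

Only the connection lists differ between the two networks; neither upper nor exclaim is edited to change how the data flows.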

This page describes Python packages for FBP.

Flow Based Programming

  • Luigi - "Luigi is a Python tool for workflow management. It has been developed at Spotify, to help building complex data pipelines of batch jobs. Using a workflow manager like Luigi is in general helpful because it handles dependencies, it reduces the amount of boilerplate code that is necessary for parameters and error checking, it manages failure recovery and overall it forces us to follow a clear pattern when developing the data pipeline."

    • To install Luigi: pip install luigi

  • Pipeless - "A simple Python library for building a basic data pipeline."

    • To install pipeless: pip install pipeless

  • papy - "The papy package provides an implementation of the flow-based programming paradigm in Python that enables the construction and deployment of distributed workflows."

  • Orkan - "Orkan is a pipeline parallelization library, written in Python. Making use of the multicore capabilities of ones machine in Python is often not as easy as it should be. Orkan aims to provide a plain API to utilize those underused CPUs of yours in cases you need some extra horse power for your computation."

  • another pype - A simpler module for chaining operations

  • Kamaelia - "In Kamaelia you build systems from simple components that talk to each other. This speeds development, massively aids maintenance and also means you build naturally concurrent software. It's intended to be accessible by any developer, including novices. It also makes it fun :)"

  • Ruffus - "The Ruffus python module provides automatic support for: Managing dependencies, Parallel jobs, Re-starting from arbitrary points, especially after errors, Display of the pipeline as a flowchart, Reporting"

    • To install Ruffus: sudo pip install ruffus --upgrade

  • zFlow - "ZFlow is a simplistic implementation of Flow-based Programming for Python as defined by J.P. Morrison. More information about Flow-based Programming can be found here: http://www.jpaulmorrison.com/fbp/ However, there are a few fundamental differences between ZFlow and the standard definition:

    • ZFlow does not support loops in the graph
    • ZFlow uses Python generators instead of asynchronous threads so port data flow works in a lazy, pulling way not by pushing."

  • pypedream formerly DAGPype - "This is a Python framework for scientific data-processing and data-preparation DAG (directed acyclic graph) pipelines. It is designed to work well within Python scripts or IPython, provide an in-Python alternative for sed, awk, perl, and grep, and complement libraries such as NumPy/SciPy, SciKits, pandas, MayaVi, PyTables, and so forth. Those libraries process data once it has been assembled. This library is for flexible data assembly and quick exploration, or for aggregating huge data which cannot be reasonably assembled."

  • florun - "florun is a visual workflow editor and runner"

  • pypeman - "Pypeman is a minimalist but pragmatic ESB / ETL / EAI in python."

  • PyF - "PyF is a python open source framework and platform dedicated to large data processing, mining, transforming, reporting and more."

  • Bein - "Bein is a workflow manager and miniature LIMS system built in the Bioinformatics and Biostatistics Core Facility of the EPFL. It fills the gap for the working scientist between the classical shell and big workflow managers like Galaxy and major LIMS systems like OpenBIS."
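
The zFlow entry above describes data flowing through Python generators "in a lazy, pulling way". The following is a rough sketch of that idea using only standard Python generators. It is not zFlow's API; the functions numbers and square and the log list are hypothetical names used only to make the order of execution visible.

```python
def numbers(limit, log):
    """Source stage: yields integers, recording each one it produces."""
    for n in range(limit):
        log.append(("produce", n))
        yield n


def square(upstream, log):
    """Transform stage: pulls one item at a time from the upstream stage."""
    for n in upstream:
        log.append(("square", n))
        yield n * n


log = []
pipeline = square(numbers(1000, log), log)

# Building the pipeline runs nothing: no stage has executed yet.
log_before = list(log)

# Each next() call pulls exactly one item through every stage.
first_two = [next(pipeline), next(pipeline)]
```

After the two next() calls, first_two is [0, 1] and log holds only four entries, although the source could produce 1000 values. The consumer drives the computation, which is the opposite of a push model where the source runs ahead and hands data downstream.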

FlowBasedProgramming (last edited 2018-01-05 12:11:33 by GeorgeLambert)
