According to Wikipedia, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.
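The paradigm above can be sketched with the standard library alone: two black-box processes exchange messages over queues, and the connections between them are wired up externally. The component and port names below are illustrative, not taken from any particular FBP library:

```python
import queue
import threading

def doubler(inbox, outbox):
    # Black-box process: reads from its input port, writes to its output
    # port. It knows nothing about what sits on the other end of either.
    for item in iter(inbox.get, None):
        outbox.put(item * 2)
    outbox.put(None)  # propagate the end-of-stream sentinel downstream

def collector(inbox, results):
    # Sink process: drains its input port into a list.
    for item in iter(inbox.get, None):
        results.append(item)

# Connections are specified externally to the processes:
a_to_b = queue.Queue()
b_to_c = queue.Queue()
results = []

t1 = threading.Thread(target=doubler, args=(a_to_b, b_to_c))
t2 = threading.Thread(target=collector, args=(b_to_c, results))
t1.start()
t2.start()

for n in [1, 2, 3]:
    a_to_b.put(n)
a_to_b.put(None)  # end-of-stream

t1.join()
t2.join()
print(results)  # [2, 4, 6]
```

Because the wiring lives outside the components, the same `doubler` could be reconnected into a different network without changing its code, which is the point of the paradigm.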

This page describes Python packages for FBP.

Flow Based Programming

  • https://marcobonzanini.com/2015/10/24/building-data-pipelines-with-python-and-luigi/ - "Luigi is a Python tool for workflow management. It has been developed at Spotify, to help building complex data pipelines of batch jobs.

    • To install Luigi: pip install luigi Using a workflow manager like Luigi is in general helpful because it handles dependencies, it reduces the amount of boilerplate code that is necessary for parameters and error checking, it manages failure recovery and overall it forces us to follow a clear pattern when developing the data pipeline."
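What "handles dependencies" means in practice can be sketched with the standard library alone: tasks declare prerequisites, and the runner orders them topologically, skipping anything already done. This is a minimal illustration of the idea, not Luigi's actual API (Luigi tasks instead declare `requires()`, `output()`, and `run()` methods):

```python
def run_pipeline(tasks, deps, done):
    """tasks: name -> callable; deps: name -> list of prerequisite names;
    done: set of task names whose output already exists (these are skipped)."""
    order = []
    visiting = set()

    def visit(name):
        if name in done or name in order:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        visiting.add(name)
        for dep in deps.get(name, []):
            visit(dep)          # run prerequisites first
        visiting.remove(name)
        order.append(name)

    for name in tasks:
        visit(name)
    for name in order:
        tasks[name]()           # execute in dependency order
    return order

log = []
tasks = {
    "fetch": lambda: log.append("fetch"),
    "clean": lambda: log.append("clean"),
    "report": lambda: log.append("report"),
}
deps = {"clean": ["fetch"], "report": ["clean"]}
order = run_pipeline(tasks, deps, done=set())
print(order)  # ['fetch', 'clean', 'report']
```

A workflow manager layers failure recovery and parameter handling on top of this core: if `clean` fails, a re-run with `done={"fetch"}` resumes from the failed step instead of starting over.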

  • Pipeless - "A simple Python library for building a basic data pipeline."

    • To install pipeless: pip install pipeless

  • papy - "The papy package provides an implementation of the flow-based programming paradigm in Python that enables the construction and deployment of distributed workflows."

  • Orkan - "Orkan is a pipeline parallelization library, written in Python. Making use of the multicore capabilities of ones machine in Python is often not as easy as it should be. Orkan aims to provide a plain API to utilize those underused CPUs of yours in cases you need some extra horse power for your computation."

  • another pype - a simpler module for chaining operations.

  • Kamaelia - "In Kamaelia you build systems from simple components that talk to each other. This speeds development, massively aids maintenance and also means you build naturally concurrent software. It's intended to be accessible by any developer, including novices. It also makes it fun :)"

  • Ruffus - "The Ruffus python module provides automatic support for: Managing dependencies, Parallel jobs, Re-starting from arbitrary points, especially after errors, Display of the pipeline as a flowchart, Reporting"

    • To install Ruffus: sudo pip install ruffus --upgrade

  • zFlow - "ZFlow is a simplistic implementation of Flow-based Programming for Python as defined by J.P. Morrison. More information about Flow-based Programming can be found here: http://www.jpaulmorrison.com/fbp/ However, there are a few fundamental differences between ZFlow and the standard definition:

    • ZFlow does not support loops in the graph
    • ZFlow uses Python generators instead of asynchronous threads so port data flow works in a lazy, pulling way not by pushing."
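The pull-driven, generator-based style zFlow describes can be sketched with plain Python generators: each stage lazily pulls from the stage upstream, so nothing is computed until the sink asks for data. The stage names here are illustrative, not zFlow's API:

```python
def source(data):
    # Head of the pipeline: yields items only when asked.
    yield from data

def double(upstream):
    # Processing stage: pulls one item, transforms it, yields it.
    for item in upstream:
        yield item * 2

def take(upstream, n):
    # Sink-side stage: pulling only n items stops the whole pipeline early.
    for _ in range(n):
        yield next(upstream)

# "Wiring the ports" is just generator composition; the stages never
# name each other directly.
pipeline = take(double(source(range(100))), 3)
result = list(pipeline)
print(result)  # [0, 2, 4]
```

Note that `source` is handed `range(100)` but only three items are ever produced: the pull-based flow means downstream demand, not upstream supply, drives the computation.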
  • pypedream formerly DAGPype - "This is a Python framework for scientific data-processing and data-preparation DAG (directed acyclic graph) pipelines. It is designed to work well within Python scripts or IPython, provide an in-Python alternative for sed, awk, perl, and grep, and complement libraries such as NumPy/SciPy, SciKits, pandas, MayaVi, PyTables, and so forth. Those libraries process data once it has been assembled. This library is for flexible data assembly and quick exploration, or for aggregating huge data which cannot be reasonably assembled."

Offline Resources

  • Pypes - "Pypes provides a scalable, standards based, extensible platform for building ETL solutions. Most commercial platforms have steep learning curves and try to generalize too much of the process. Pypes provides a simple yet powerful framework for designing custom data processing workflows using components you write. In turn, it takes care of scalability and scheduling semantics."

  • PyF - "PyF is a python open source framework and platform dedicated to large data processing, mining, transforming, reporting and more."

  • Bein - "Bein is a workflow manager and miniature LIMS system built in the Bioinformatics and Biostatistics Core Facility of the EPFL. It fills the gap for the working scientist between the classical shell and big workflow managers like Galaxy and major LIMS systems like OpenBIS."

  • florun - "florun is a visual workflow editor and runner"

FlowBasedProgramming (last edited 2018-01-05 12:11:33 by GeorgeLambert)
