Scientific applications are known for generating huge amounts of data, which can be hard to manage. This page lists some of the tools available for interfacing with standard scientific file formats, as well as Python-specific tools for manipulating arrays and text files.

Interfaces to Standard Formats

  • Unidata NetCDF Interfaces

    The netCDF data file format stores large, uniform data arrays efficiently and avoids byte-order problems when moving binary data between different machines. It is well documented and looks like a good compromise between simplicity and generality.

    + NetCDF interface in ScientificPython - NetCDF interface that makes array variables look like NumPy arrays (KonradHinsen).

    + NetCDF interface - Interface to NetCDF portable data files (William Noon).

    + NetCDF interface - A Numeric-Python-aware NetCDF portable data file interface (Kyle Schalm).

    + PyNIO - A Numeric-based Python package that provides read and/or write access to a variety of data formats (NetCDF, HDF 4, GRIB) using an interface modelled on KonradHinsen's NetCDF interface.

    + PyPnetCDF - A Numeric-based Python package that provides read and/or write access to NetCDF files in a parallel environment, using MPI and an interface to the PnetCDF library. The objects PNetCDFVariable and PNetCDFFile are very similar to KonradHinsen's definitions, but PyPnetCDF implements parallel access in a way that is transparent and simple for the programmer.

    + netcdf4-python - Python/NumPy interface to the netCDF version 4 library. netCDF version 4 has many features not found in earlier versions of the library, such as hierarchical groups, zlib compression, multiple unlimited dimensions, and new data types. It is implemented on top of HDF5. This module implements many of the new features and can read and write netCDF files compatible with older versions of the library. The API is modelled after Scientific.IO.NetCDF and should be familiar to users of that module (Jeff Whitaker).

    + pupynere - a PUre PYthon NEtcdf REader, and now also a writer. Pupynere implements the NetCDF specification from scratch in pure Python and depends only on NumPy. It uses the same syntax as the Scientific.IO.NetCDF module and can both read and create NetCDF files.

  • PyPDB is an interface to the PDB Portable Data Format library, which is part of the PACT system (by the LLNL crew). It is available as part of the LLNLPython distribution.
  • HDF5 interfaces

    Interface to the HDF5 format (hierarchically organised datasets). HDF5 is a general-purpose library and file format for storing scientific data. It is more complex and powerful than NetCDF, and NetCDF-4 is built on top of it.

    + PyHL interface - A High Level Interface to the HDF5 File Format (Anders Henja and Daniel B. Michelson).

    + PyTables interface - HDF5 interface with full support for 64-bit data addressing and data indexing (Carabos Coop. V.).
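The byte-order remark about netCDF and the link between HDF5 and NetCDF-4 can be made concrete with a small standard-library sketch that guesses a file's format from its first bytes. It assumes the published signatures: netCDF classic files begin with `CDF\x01` (64-bit-offset files with `CDF\x02`), followed by the record count as a 4-byte big-endian integer, while HDF5 files, and therefore NetCDF-4 files, begin with the 8-byte HDF5 signature. The name `sniff_format` is hypothetical, and the sketch only checks offset 0, although HDF5 also allows the signature at offsets 512, 1024, 2048 and so on when a user block is present.

```python
import io
import struct

# Published file signatures (assumed; see the lead-in).
NETCDF_CLASSIC = b"CDF\x01"
NETCDF_64BIT = b"CDF\x02"
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_format(stream):
    """Guess the format of a binary stream from its first 8 bytes.

    Returns (format_name, numrecs); numrecs is decoded only for the
    netCDF classic and 64-bit-offset formats and is None otherwise.
    """
    head = stream.read(8)
    if head[:4] in (NETCDF_CLASSIC, NETCDF_64BIT) and len(head) == 8:
        # The classic netCDF formats store integers big-endian (">"),
        # so this decodes identically on little- and big-endian machines.
        (numrecs,) = struct.unpack(">i", head[4:8])
        kind = "netCDF classic" if head[3] == 1 else "netCDF 64-bit offset"
        return kind, numrecs
    if head == HDF5_SIGNATURE:
        return "HDF5 (possibly NetCDF-4)", None
    return "unknown", None


# Hand-made headers: a classic file with 3 records, and a bare HDF5 signature.
classic = sniff_format(io.BytesIO(NETCDF_CLASSIC + struct.pack(">i", 3)))
hdf5 = sniff_format(io.BytesIO(HDF5_SIGNATURE))
```

The interfaces listed above parse the rest of the header (dimensions, attributes, variables); this sketch only shows why a fixed byte order makes the files portable.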

Python-specific Tools

  • TableIO by Mike Miller.

    • "When I first started using Python, I wanted to read lots of numbers into NumPy arrays. This can be done with the standard Python file reading methods, but I found that to be prohibitively slow for largish data sets. So I wrote TableIO (_tableio.c and TableIO.py), which lets me start with a file containing a rectangular array of ASCII data (a 'table') and read it into Python so I can manipulate it. For example, if I have a file containing a table with 10 columns and 50 rows, I can use

        >>> d = TableIO.readTableAsArray(file)

      to get an array with shape (50,10). If I only want to read a few columns, say the first, ninth and tenth, I can use

        >>> [x, y, dy] = TableIO.readColumns(file, [0, 8, 9])

      to read the first column into the 1D array x and the ninth and tenth into y and dy."
  • Scientific.IO.FortranFormat by KonradHinsen.

    • "This module provides two classes that aid in reading and writing Fortran-formatted text files. Only a subset of formatting options is supported: A, D, E, F, G, I, and X formats, plus string constants for output. Repetition (e.g. 4I5 or 3(1X,A4)) is supported. Complex numbers are not supported; you have to treat real and imaginary parts separately."
  • numpyio by Travis Oliphant.

    • "Once compiled, numpyio is a loadable module that can be used in python for reading and writing arbitrary binary data to and from Numerical Python arrays. I work in Medical Imaging and often have large data sets to manipulate. I do much of my interactive data analysis with MATLAB; however, having only doubles to work with really puts a crimp on the sizes of the data sets I can manipulate. The fact that Numerical Python has more data types defined than doubles encouraged me to try it out. I have been very impressed with its speed and utility, but I needed some way to read large data sets from an arbitrary binary file into Numerical Python arrays. I didn't see any obvious way to do this so I wrote an extension module. Although there is not much documentation, having the sources available is ultimately better than documentation. But, as this is my first extension module, my style may not be elegant and I may not be using the correct APIs. Feel free to send me corrections."
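The TableIO quote describes two operations: reading a whitespace-separated rectangular table, and reading only selected (zero-based) columns. A pure-Python approximation is sketched below. It is not TableIO's code: `read_table` and `read_columns` are hypothetical names, the sketch returns plain lists rather than NumPy arrays, skipping `#` comment lines is an assumption, and it is exactly the kind of slow pure-Python reader that motivated TableIO.

```python
import io


def read_table(stream):
    """Parse a rectangular, whitespace-separated ASCII table into rows of floats.

    Blank lines and lines starting with '#' are skipped (an assumption).
    """
    rows = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(field) for field in line.split()])
    if rows and any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("table is not rectangular")
    return rows


def read_columns(stream, columns):
    """Return one list per requested zero-based column index."""
    rows = read_table(stream)
    return [[row[c] for row in rows] for c in columns]


# A 2-row, 10-column table; indices 0, 8 and 9 are the first, ninth and tenth columns.
text = "1 2 3 4 5 6 7 8 9 10\n11 12 13 14 15 16 17 18 19 20\n"
x, y, dy = read_columns(io.StringIO(text), [0, 8, 9])
```

With NumPy installed, `numpy.loadtxt(fname, usecols=(0, 8, 9), unpack=True)` performs the same column selection and returns arrays.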
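The fixed-width idea behind the Fortran-formatted text files handled by Scientific.IO.FortranFormat can be sketched in a few lines. This is not that module's API: `parse_format` and `read_fortran_line` are hypothetical names, only a flat, comma-separated list of A, I, F, E, D, G and X descriptors is handled (with repeat counts such as 3I5), and parenthesised groups like 3(1X,A4), which the quoted module does support, are rejected. The decimals part (the .3 in F8.3) is parsed but ignored, so fields without an explicit decimal point are not scaled as Fortran would scale them, and a blank numeric field raises ValueError here.

```python
import re

# One edit descriptor: optional repeat count, type letter, width, decimals.
# For X the leading number is the number of columns to skip (e.g. 2X).
_DESCRIPTOR = re.compile(r"^(\d*)([AIFEDGX])(\d*)(?:\.(\d+))?$")


def parse_format(fmt):
    """Expand a flat format such as '3I5,2X,F8.3' into (letter, width) pairs."""
    fields = []
    for item in fmt.upper().replace(" ", "").split(","):
        match = _DESCRIPTOR.match(item)
        if match is None:
            # Parenthesised groups like 3(1X,A4) end up here.
            raise ValueError("unsupported descriptor: %r" % item)
        count, letter, width, _decimals = match.groups()
        if letter == "X":
            fields.append(("X", int(count or 1)))
        elif not width:
            raise ValueError("missing width in %r" % item)
        else:
            fields.extend([(letter, int(width))] * int(count or 1))
    return fields


def read_fortran_line(line, fmt):
    """Slice a fixed-width line according to fmt and convert each field."""
    values = []
    pos = 0
    for letter, width in parse_format(fmt):
        text = line[pos:pos + width]
        pos += width
        if letter == "X":
            continue  # skipped columns
        if letter == "A":
            values.append(text)
        elif letter == "I":
            values.append(int(text))
        else:
            # F, E, D and G fields; Fortran may write exponents as D (1.5D+03).
            values.append(float(text.upper().replace("D", "E")))
    return values


# The %-format below produces a line laid out as I5,F8.3,2X,D10.3,A4.
line = "%5d%8.3f  %10s%4s" % (42, 3.14159, "1.500D+03", "ABCD")
values = read_fortran_line(line, "I5,F8.3,2X,D10.3,A4")
```

Slicing by column position rather than splitting on whitespace is the essential point: Fortran output often runs adjacent numbers together with no separating space.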
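numpyio's job, reading arbitrary binary data of a chosen type into an array, can be sketched with the standard `array` module. The helper `read_raw` is hypothetical and is not numpyio's interface; it assumes the stream holds `count` items of a single C type, given as an `array` typecode (here `'H'`, unsigned 16-bit, as an example of raw pixel data), written in a known byte order.

```python
import array
import io
import struct
import sys


def read_raw(stream, typecode, count, byteorder="little"):
    """Read `count` items of array typecode `typecode` from a binary stream.

    `byteorder` is the order the data was written in ("little" or "big");
    the values are byte-swapped if it differs from this machine's order.
    """
    values = array.array(typecode)
    nbytes = count * values.itemsize
    data = stream.read(nbytes)
    if len(data) != nbytes:
        raise EOFError("expected %d bytes, got %d" % (nbytes, len(data)))
    values.frombytes(data)
    if byteorder != sys.byteorder:
        values.byteswap()
    return values


# Three unsigned 16-bit values written big-endian, e.g. raw pixel data.
raw = io.BytesIO(struct.pack(">3H", 1, 2, 513))
pixels = read_raw(raw, "H", 3, byteorder="big")
```

For a file on disk, NumPy's `numpy.fromfile(path, dtype=">u2", count=3)` reads the same data directly into an array, with the byte order encoded in the dtype.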

NumericAndScientific/Formats (last edited 2008-11-15 14:00:18 by localhost)
