Revision 2 as of 2003-02-26 20:31:28


Processing And Analyzing Extremely Large Amounts Of Data In Python

Processing large amounts of data is essential in scientific fields such as CFD (Computational Fluid Dynamics), meteorology, astronomy, human genome sequencing, and high-energy physics, to name only a few. Existing relational or object-oriented databases are usually good solutions for applications in which multiple distributed clients need to access and update a large, centrally managed database (e.g., a financial trading system). However, they are not designed for efficient read-only queries on parts of objects, or even on single attributes, which is a requirement for processing data in many scientific fields such as those mentioned above.
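
The access pattern described above (reading a single attribute from many stored objects without loading the rest of each object) can be illustrated with a minimal, standard-library-only sketch. This is not PyTables or HDF5 code; the record layout, the file name, and the function names are hypothetical and chosen only for illustration.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: particle id (int32), energy (float64),
# pressure (float64). "<" selects little-endian with no padding.
RECORD = struct.Struct("<idd")
ENERGY = struct.Struct("<d")
ENERGY_OFFSET = 4  # bytes that precede the energy field inside one record


def write_records(path, rows):
    """Write every (id, energy, pressure) row as one packed binary record."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(RECORD.pack(*row))


def read_energies(path):
    """Read only the energy field of each record, skipping the other fields."""
    energies = []
    count = os.path.getsize(path) // RECORD.size
    with open(path, "rb") as f:
        for i in range(count):
            # Jump straight to the energy field of record i.
            f.seek(i * RECORD.size + ENERGY_OFFSET)
            energies.append(ENERGY.unpack(f.read(ENERGY.size))[0])
    return energies


def demo():
    """Store a few made-up rows and read back just their energies."""
    rows = [(i, i * 0.5, 100.0 - i) for i in range(5)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "readout.bin")
        write_records(path, rows)
        return read_energies(path)
```

Because every record has the same size, the position of any attribute can be computed directly, so a query for one field touches only the bytes of that field. A general-purpose database that materializes whole rows or objects cannot usually make the same shortcut.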

Presentation Notes

My talk will describe PyTables, a Python library that addresses this need by letting the end user easily manipulate scientific data tables and [http://www.pfdubois.com/numpy Numeric and numarray] Python objects in a persistent, hierarchical structure. The underlying hierarchical storage is built on the excellent [http://hdf.ncsa.uiuc.edu/HDF5 HDF5] library.

I will walk through the basic features of PyTables and demonstrate the use of the package in real-life scenarios. In addition, I will present some benchmarks showing that PyTables is competitive with other persistent databases in Python.

This presentation is currently [http://www.python.org/pycon/pycon-schedule.html scheduled] for 10 am on Friday, March 28th.


I would like to tailor my presentation as closely as I can to those attending.

So please add questions/suggestions below; for example:

