What is PyTables?

PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Pyrex), makes it a fast yet extremely easy-to-use tool for interactively processing and searching very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that data takes much less space (especially if on-the-fly compression is used) than in other solutions such as relational or object-oriented databases.
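As a rough illustration of the compression and searching described above, the following sketch writes a small compressed table and then queries it. The record layout (Reading), the node name (readings), the function name and the query condition are hypothetical choices made for this example, and it assumes a PyTables 3.x installation.

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):
    # Hypothetical record layout, chosen only for this illustration.
    sensor_id = tables.Int32Col()
    value = tables.Float64Col()


def write_and_query(path, n_rows=1000):
    """Write n_rows compressed records, then return the selected values."""
    # zlib at level 5 turns on transparent compression for this table.
    filters = tables.Filters(complevel=5, complib="zlib")
    with tables.open_file(path, mode="w", title="Demo file") as h5file:
        table = h5file.create_table(
            "/", "readings", Reading, "Sensor readings", filters=filters
        )
        row = table.row
        for i in range(n_rows):
            row["sensor_id"] = i % 10
            row["value"] = i / n_rows
            row.append()
        table.flush()

    with tables.open_file(path, mode="r") as h5file:
        table = h5file.root.readings
        # The condition string is evaluated by PyTables itself, so only
        # the matching rows are returned as a NumPy structured array.
        hits = table.read_where("(sensor_id == 3) & (value > 0.5)")
        return hits["value"]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    print(len(write_and_query(demo_path)), "matching rows")
```

The compression level and library are arbitrary here; other settings may suit real data better.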

Design goals

PyTables has been designed to fulfill the following requirements:

Allow you to structure your data in a hierarchical way.
Be easy to use. It implements a natural naming scheme that allows convenient access to the data.
Allow every cell in a dataset to be a multidimensional entity.
The speed of most I/O operations should be limited only by the underlying I/O subsystem.
Enable the end user to save large datasets efficiently, i.e. each byte of data on disk should be represented by one byte, plus a small fraction of overhead, when loaded in memory.
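The first three goals can be sketched in a few lines. The group, table and column names below (camera, frames, pixels) and the function name are made up for this example, which assumes PyTables 3.x and NumPy are installed. It is only one possible way to lay out such a file.

```python
import os
import tempfile

import numpy as np
import tables


class Frame(tables.IsDescription):
    # Hypothetical columns for illustration.
    frame_id = tables.Int32Col()
    pixels = tables.Float32Col(shape=(2, 3))  # each cell holds a 2x3 array


def build_tree(path):
    """Create /camera/frames and return (row count, shape of the pixels column)."""
    with tables.open_file(path, mode="w") as h5file:
        # Hierarchy: a group under the root, with a table inside the group.
        group = h5file.create_group("/", "camera", "Camera data")
        table = h5file.create_table(group, "frames", Frame, "Captured frames")
        row = table.row
        for i in range(4):
            row["frame_id"] = i
            row["pixels"] = np.full((2, 3), i, dtype="float32")
            row.append()
        table.flush()

        # Natural naming: nodes are reachable as Python attributes.
        frames = h5file.root.camera.frames
        return frames.nrows, frames.col("pixels").shape


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "tree.h5")
    print(build_tree(demo_path))
```

Reading the whole pixels column yields one array whose first axis runs over the rows and whose remaining axes are the shape of each cell.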

Where to find it

For more info, documentation and downloads of PyTables, please go to its official home page.
