
Clear message

The objective of a memory profiler is to help a developer determine where the memory is going: why the memory was allocated, and why it did not get released.

There are many approaches to this problem, and they involve various aspects, such as:

Sizing objects

One metric for memory consumption is certainly the number of allocated bytes; others are the number of allocated objects, and the amount of memory that is no longer allocated to any object yet has not been released back to the system (a.k.a. fragmentation).

To compute the number of allocated bytes, one could either track the memory allocations at the lowest level and sum them up, or track all objects and then compute the size each object takes in memory.
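As a rough sketch of the second approach, the objects known to the cyclic garbage collector can be walked and their shallow sizes summed with sys.getsizeof(). Note that gc.get_objects() only reports container objects and sys.getsizeof() does not follow references, so the figure is only an approximation of the true total.

    import gc
    import sys

    def approximate_allocated_bytes():
        """Very rough estimate: sum the shallow sizes of all objects
        tracked by the cyclic garbage collector.  Non-container objects
        (numbers, strings, ...) reachable only from those containers
        are not counted here."""
        total = 0
        for obj in gc.get_objects():
            try:
                total += sys.getsizeof(obj)
            except TypeError:
                # a few extension types do not report a size
                pass
        return total

    print("approximate bytes in GC-tracked objects:",
          approximate_allocated_bytes())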

Tracking memory

To determine the set of allocated blocks, again one may either track memory blocks at the lowest level, or try to keep lists of objects. Python already has support for the latter, in two forms: in a debug build, Python keeps a list of all allocated objects. In addition, even in a release build, Python keeps a list of all objects that participate in cyclic garbage collection, i.e. of all container objects. It also supports tracing references from one object to all referred-to objects, so that even non-container objects can be found.
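A minimal sketch of the garbage-collector approach: start from the container objects reported by gc.get_objects() and follow gc.get_referents() one level, which picks up most non-container objects (numbers, strings, and so on) that the containers refer to.

    import gc

    def all_reachable_objects():
        """Collect the container objects tracked by the GC plus everything
        they refer to directly, which catches most non-container objects."""
        seen = set()
        result = []
        for obj in gc.get_objects():
            if id(obj) not in seen:
                seen.add(id(obj))
                result.append(obj)
            for referent in gc.get_referents(obj):
                if id(referent) not in seen:
                    seen.add(id(referent))
                    result.append(referent)
        return result

    print(len(all_reachable_objects()), "objects found")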

These approaches all come with their own overheads; the latter one (using the cyclic GC) has the least overhead, as the Python interpreter has to perform this bookkeeping anyway. They also differ in what kinds of errors they can detect: allocations made by extension modules behind the back of Python's object machinery only become visible when memory is tracked at the allocator level.

In summary: if the suspected memory leak is not in an extension module, the cheap object-level tracking should be sufficient. If extension modules may contain errors, finding those errors requires the more expensive machinery of tracking allocations at the lowest level.

Classifying memory

With a given set of memory blocks, the question is what these blocks are used for (and then, whether they are still needed). Such classification can use various strategies, for example grouping objects by their type, by the module or code location that created them, or by the chain of references that keeps them alive.
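The simplest of these strategies, sketched below, groups the GC-tracked objects by type name and reports the most common ones; real tools refine this with per-type sizes, allocation sites, or referrer chains.

    import gc

    def objects_by_type(limit=10):
        """Count GC-tracked objects per type name and return the most common."""
        counts = {}
        for obj in gc.get_objects():
            name = type(obj).__name__
            counts[name] = counts.get(name, 0) + 1
        return sorted(counts.items(), key=lambda item: item[1],
                      reverse=True)[:limit]

    for name, count in objects_by_type():
        print(count, name)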

Tracking references

Python's memory management is "safe", in the sense that memory won't be released while it is still referenced (unless there is a bug in an extension module). So if memory remains allocated even though it should have been released, it's typically because the objects are still referenced.

Again, the garbage collector provides machinery to determine what references an object has. Getting useful results out of that is a tedious task without proper tool support, though.
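For instance, gc.get_referrers() lists the objects that still hold a reference to a suspect object, which is often the first step in answering why it was not freed; as hinted above, the raw output needs manual filtering, since frames and temporary containers show up as referrers too. The cache in the sketch below is purely illustrative.

    import gc

    def who_refers_to(obj):
        """Print a short description of every object that refers to obj."""
        for referrer in gc.get_referrers(obj):
            # Frames and temporary containers may also show up here;
            # real tools filter out such noise.
            print(type(referrer).__name__, repr(referrer)[:80])

    # Example: a "leak" caused by a module-level cache keeping data alive.
    _cache = {}

    def load(key):
        data = ["payload"] * 1000
        _cache[key] = data        # reference retained here
        return data

    block = load("report")
    del block                     # the list is still reachable through _cache
    who_refers_to(_cache["report"])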
