Revision 13 as of 2006-05-30 11:19:55

Clear message

Things that we tried but decided were not good ideas.

Using more aggressive calling conventions/inlining in ceval

The Py_LOCAL/Py_LOCAL_INLINE macros can be used instead of static to force the use of a more efficient C calling convention, on platforms that support that. We tried applying that to ceval, and saw small speedups on some platforms, and somewhat larger slowdowns on others.

List pre-allocation

Adding the ability to pre-allocate builtin list storage. At best we can speed up appends by about 7-8% for lists of 50-100 elements. For the large part the benefit is 0-2%. For lists under 20 elements, peformance is actually reduced when pre-allocating.

Out-thinking exceptions

CPython spends considerable time moving exception info around among thread states, frame objects, and the sys module. This code is complicated and under-documented. Patch 1145039 took a stab at reverse-engineering a weak invariant, and exploited it for a bit of speed. That worked fine so far as it went (and a variant was checked in), but it's likely more remains to be gotten. Alas, the tim-exc_sanity branch set up to try that consumed a lot of time fighting mysteries, and looks like it's more trouble than it's worth.

Singleton of StopIteration

As part of the new exceptions implementation, we tried making a singleton StopIteration instance. No speedup was detected. This is primarily due to most uses of StopIteration using the type object directly (ie "raise StopIteration" vs. "raise StopIteration()"). Even for a crafted test case where the instance use was forced there was no detectable change in speed.


Making a PyDict_GET_SIZE like PyTuple_GET_SIZE doesn't give a measurable improvement in pybench or pystone. This is likely because the compiler notices that those functions that use it have alreaday done NULL checks and frequently PyDict_Check so we aren't telling it anything it didn't already know.

Conversely changing all Py(Tuple|List)_GET_SIZE to point to plain Size has no measurable slowdown! Well, in the range of 0.5%, which may just be noise. Switching the #define to the real functions generates some spurious warnings because the regular methods expect PyObjects and not the more specific types.

Specializing Dictionaries

One man-day was spent trying to seperate the dicts used in namespaces (module.dict, instance/type/class dicts) the result was changing over a quarter of the PyDict* macros in the trunk to PySymdict (over half if you exclude Modules/). This was such a massive change it was abandoned after sprint Day1.

Switching to 64-bit ints on 32-bit platforms

Tested out an idea about switching Python "integer" types to use 64 bits instead of 32 bits on 32-bit platforms. In the end, it would have been a major pain to do, and while it would have resulted in 34% performance improvements for apps that use integers between 32 and 64 bits, it would have been around a 10% slow-down for apps that only use 32-bit numbers. Decided not to do it for those reasons.

Unable to edit the page? See the FrontPage for instructions.