Things that we tried but decided were not good ideas.

Using more aggressive calling conventions/inlining in ceval

The Py_LOCAL/Py_LOCAL_INLINE macros can be used instead of static to force the use of a more efficient C calling convention on platforms that support one. We tried applying this to ceval, and saw small speedups on some platforms and somewhat larger slowdowns on others.

List pre-allocation

Adding the ability to pre-allocate builtin list storage. At best we can speed up appends by about 7-8% for lists of 50-100 elements. For the most part the benefit is 0-2%. For lists under 20 elements, performance is actually reduced when pre-allocating.

Out-thinking exceptions

CPython spends considerable time moving exception info around among thread states, frame objects, and the sys module. This code is complicated and under-documented. Patch 1145039 took a stab at reverse-engineering a weak invariant, and exploited it for a bit of speed. That worked fine as far as it went (and a variant was checked in), but more can likely still be gained. Alas, the tim-exc_sanity branch set up to try that consumed a lot of time fighting mysteries, and looks like it's more trouble than it's worth.

Singleton of StopIteration

As part of the new exceptions implementation, we tried making a singleton StopIteration instance. No speedup was detected. This is primarily because most uses of StopIteration use the type object directly (i.e. "raise StopIteration" vs. "raise StopIteration()"). Even for a crafted test case where use of the instance was forced, there was no detectable change in speed.

GET_SIZE Macros

Making a PyDict_GET_SIZE macro like PyTuple_GET_SIZE doesn't give a measurable improvement in pybench or pystone. This is likely because the compiler notices that the functions that use it have already done NULL checks, and frequently PyDict_Check, so we aren't telling it anything it didn't already know.

Conversely, changing all Py(Tuple|List)_GET_SIZE uses to call the plain Size functions causes no measurable slowdown! Well, something in the range of 0.5%, which may just be noise. A few calls to the Tuple/List GET_SIZE macros are also in error; they work because ->ob_size is the right thing to get, but they aren't actually accessing Py(Tuple|List)Object variables (they may just need a cast). Switching the #define to the real functions generates some spurious warnings, because the regular functions expect a PyObject * and not the more specific types.

NeedForSpeed/Failures (last edited 2008-11-15 14:01:24 by localhost)
