Things that we tried but decided were not good ideas.
Using more aggressive calling conventions/inlining in ceval
The Py_LOCAL/Py_LOCAL_INLINE macros can be used in place of the "static" keyword to select a more efficient C calling convention on platforms that support one. We tried applying them to ceval and saw small speedups on some platforms, but somewhat larger slowdowns on others.
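For readers unfamiliar with the macros, here is a minimal sketch of the idea. It assumes simplified definitions; the real Py_LOCAL/Py_LOCAL_INLINE live in CPython's Include/pyport.h and vary by compiler and version. SKETCH_LOCAL, SKETCH_LOCAL_INLINE, add_small_ints and count_nonzero are hypothetical names. The point is that one source declaration can expand to a different calling convention on each compiler, so the same change can have different effects on different platforms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Py_LOCAL / Py_LOCAL_INLINE.
   The real macros in Include/pyport.h differ in detail. */
#if defined(_MSC_VER)
/* MSVC: ask for __fastcall, which passes some arguments in registers. */
#define SKETCH_LOCAL(type) static type __fastcall
#define SKETCH_LOCAL_INLINE(type) static __inline type __fastcall
#else
/* Other compilers: plain static, optionally marked inline. */
#define SKETCH_LOCAL(type) static type
#define SKETCH_LOCAL_INLINE(type) static inline type
#endif

/* A small ceval-style helper declared with the macro instead of "static". */
SKETCH_LOCAL_INLINE(long)
add_small_ints(long a, long b)
{
    return a + b;
}

/* A non-inline helper using the same mechanism. */
SKETCH_LOCAL(size_t)
count_nonzero(const long *values, size_t n)
{
    size_t count = 0;
    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (values[i] != 0)
            count++;
    }
    return count;
}
```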
Adding the ability to pre-allocate builtin list storage. At best this speeds up appends by about 7-8% for lists of 50-100 elements. For the most part the benefit is 0-2%, and for lists under 20 elements, performance actually drops when pre-allocating.
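To make "pre-allocating list storage" concrete, here is a toy sketch. It assumes a simple growable array of pointers, not CPython's actual listobject.c; toy_list and its functions are hypothetical. The growth rule is loosely modeled on the over-allocation CPython lists already perform on append. Because of that over-allocation, reserving capacity up front mainly saves a handful of realloc calls, which may be one reason the measured gains were small.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy growable array (not CPython's list implementation). */
typedef struct {
    void **items;
    size_t size;       /* number of slots in use */
    size_t allocated;  /* number of slots allocated */
    size_t resizes;    /* how many times realloc was called */
} toy_list;

/* Ensure room for at least `needed` items, over-allocating by about 1/8
   plus a small constant. Returns 0 on success, -1 on allocation failure. */
static int
toy_list_reserve(toy_list *l, size_t needed)
{
    if (needed <= l->allocated)
        return 0;
    size_t new_alloc = needed + (needed >> 3) + (needed < 9 ? 3 : 6);
    void **p = realloc(l->items, new_alloc * sizeof(void *));
    if (p == NULL)
        return -1;
    l->items = p;
    l->allocated = new_alloc;
    l->resizes++;
    return 0;
}

/* Append one item, growing the storage only when it is full. */
static int
toy_list_append(toy_list *l, void *item)
{
    if (toy_list_reserve(l, l->size + 1) < 0)
        return -1;
    l->items[l->size++] = item;
    return 0;
}

static void
toy_list_free(toy_list *l)
{
    free(l->items);
    l->items = NULL;
    l->size = l->allocated = 0;
}
```

Calling toy_list_reserve(&l, 100) before 100 appends performs a single realloc; appending without it performs several smaller ones.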
CPython spends considerable time moving exception info around among thread states, frame objects, and the sys module. This code is complicated and under-documented. Patch 1145039 took a stab at reverse-engineering a weak invariant and exploited it for a bit of speed. That worked fine as far as it went (and a variant was checked in), but more could likely be gained. Alas, the tim-exc_sanity branch set up to pursue that consumed a lot of time fighting mysteries, and looks like more trouble than it's worth.
Singleton of StopIteration
As part of the new exceptions implementation, we tried making a singleton StopIteration instance. No speedup was detected. This is primarily because most uses of StopIteration use the type object directly (i.e., "raise StopIteration" rather than "raise StopIteration()"). Even in a crafted test case that forced use of the instance, there was no detectable change in speed.
Making a PyDict_GET_SIZE macro like PyTuple_GET_SIZE doesn't give a measurable improvement in pybench or pystone. This is likely because the compiler notices that the functions using it have already done NULL checks, and frequently a PyDict_Check, so we aren't telling it anything it didn't already know.
Conversely, redefining PyTuple_GET_SIZE and PyList_GET_SIZE to call the plain PyTuple_Size and PyList_Size functions causes no measurable slowdown! Well, a slowdown in the range of 0.5%, which may just be noise. Switching the #defines to the real functions generates some spurious warnings, because those functions expect a PyObject * rather than the more specific object types.
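The two variants compared above can be sketched with a toy object model. It assumes a simplified layout and is not CPython's real headers; toy_object, toy_dict, TOY_DICT_GET_SIZE and toy_dict_size are hypothetical names. The macro is a single unchecked field read, in the style of PyTuple_GET_SIZE. The function takes the generic object type, checks it, and returns -1 on error, in the style of PyDict_Size. If the caller has already checked the type, the compiler can often make the two cost about the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object model, loosely mirroring CPython's layout:
   every object starts with a common header. */
typedef struct { int type_tag; } toy_object;              /* like PyObject     */
typedef struct { toy_object base; long size; } toy_dict;  /* like PyDictObject */

enum { TOY_DICT_TAG = 1 };

/* Unchecked macro: a plain field read with no type or NULL check. */
#define TOY_DICT_GET_SIZE(op) (((toy_dict *)(op))->size)

/* Checked function: accepts the generic object type, validates it,
   and returns -1 if it is not a toy_dict. */
static long
toy_dict_size(toy_object *op)
{
    if (op == NULL || op->type_tag != TOY_DICT_TAG)
        return -1;
    return ((toy_dict *)op)->size;
}
```

Because toy_dict_size takes a toy_object *, passing a toy_dict * without a cast draws an incompatible-pointer-type warning from C compilers. That mirrors the spurious warnings seen when the GET_SIZE macros were pointed at the real functions.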