Differences between revisions 62 and 63
Revision 62 as of 2006-05-22 15:06:18
Size: 11545
Editor: FredrikLundh
Comment:
Revision 63 as of 2006-05-22 15:45:38
Size: 11191
Comment:
Deletions are marked like this. Additions are marked like this.
Line 85: Line 85:
---- /!\ '''Edit conflict - other version:''' ----
Line 87: Line 86:

---- /!\ '''Edit conflict - your version:''' ----
 * Optimize methods in stringobject.c/unicodeobject.c

---- /!\ '''End of edit conflict''' ----
Line 104: Line 98:
---- /!\ '''Edit conflict - other version:''' ----
Line 111: Line 104:

---- /!\ '''Edit conflict - your version:''' ----

---- /!\ '''End of edit conflict''' ----
 
* Add itertools.imerge() and itertools.izip_longest()
* Add itertools.imerge() and itertools.izip_longest()

Sprint topics following the NeedForSpeed theme

You can add additional topics below. Please discuss the specific goals and possible approaches to these tasks!

CPython

  • CPython 2.5 is slower than CPython 2.4. See ["/Slowdown"].
  • Evaluate the PEPs for optimizing global and attribute lookups
  • Can floating point ops be sped-up by avoiding flag/exception checks at every step? Can some floating point ops be in-lined in ceval.c?
    • TimPeters: By default, CPython does no flag or exception checks on float ops -- the PyFPE_START_PROTECT and PyFPE_END_PROTECT macros normally have empty expansions. Inlining is possible but probably undesirable. Doing masses of fp ops in one gulp via the NumPy (or whatever it's called now) extension is the sanest approach.

  • Implement portions of the decimal module in C
    • RichardJones: Looks like this has a good chance of being done in SummerOfCode.

      RaymondHettinger: We can get them off to a good start by laying the groundwork (the C struct, some access macros, and implementing a couple of methods that they can use as a model).

      GeorgBrandl: I've created a skeleton C module since I was expecting to work on this.

  • Build-out struct module to support fast, high-volume binary conversions -- perhaps with regexp analogs such struct.compile() and struct.finditer()
  • Create a string subclass that provides lazy slicing without copying
    • FredrikLundh: Creating the class is easy, but integrating it into Python is harder (most code that handles e.g. 8-bit strings assume a PyString). For Py3K, it would be quite interesting to "instrument" a Python interpreter, mapping all PyString macros to functions, and gathering some kind of usage statistics.

      TimPeters: Just noting a subtlety: PyString objects are always NUL-terminated, so that passing them to random C libraries doesn't require copying the guts. String slices won't have that property.

      RaymondHettinger: I will search for a patch that Skip wrote for a possible implementation of str.partition() which returns a string subclass with pointers to the slices instead of a full copy. Any subsequent string ops lazily evaluate the slice into a full string.

  • Allow selective removal of unused features such as profiling support
    • SteveHolden: this looks promising enough that I'll try to get my teeth into it. I'll take suggestions for other features we might look at for conditional removal.

  • Faster parsing of strings and bytes into int, long, etc.
    • TimPeters: part of that is algorithms, and part is the sheer depth of the call stack. There's at least one patch pending to use faster algorithms for conversion of decimal strings to ints/longs. The platform C library converts decimal strings to floats.

      SeanReifschneider: If the string is less than 9 bytes, can we just call strtol? Would that help? I suspect the check may cost too much, but maybe something in between like special-casing short strings to call strtol?

  • Buffer for use with network I/O
    • Fredrik Lundh: Also see the stringobject comment below.

      SeanReifschneider: If we have buffering of network I/O, we can change readline() so that there is not a system-call for every character in the line. What about for file I/O?

  • Build-out the collections module for optimized data structures:
  • Create a 64 bit PyInt type (for 32 bit machines)

    • FredrikLundh: PyInt64, I hope? Or a configuration option? Or a polymorphic-under-the-hood PyLong type ?

      TimPeters: Guido would probably be happy if "short" Python ints were in fact 64 bits on all boxes; that's come up before.

  • Optimize methods in stringobject.c/unicodeobject.c
    • FredrikLundh: I'd like to work on refactoring the string method implementations into a "polymorphic" (SRE-style) support library. This would let us share source code between 8-bit and Unicode strings, and make it easier to reuse code also for future array/buffer/bytes types (etc).

      SeanReifschneider: I'd help with that. Adding rfind() to string was horribly painful because of the code duplication.

* Comparing 8-bit and Unicode performance (AndrewDalke, FredrikLundh)

  • 8-bit repeat is a lot faster than Unicode:
    • handle single-character strings separately (use Py_UNICODE_FILL)
    • instead of repeating the original string, repeat the string that has already been built (use Py_UNICODE_COPY instead of memcpy)

* Comparing 8-bit and Unicode performance (AndrewDalke, FredrikLundh)

  • 8-bit repeat is a lot faster than Unicode:
    • handle single-character strings separately (use Py_UNICODE_FILL)
    • instead of repeating the original string, repeat the string that has already been built (use Py_UNICODE_COPY instead of memcpy)

* Add itertools.imerge() and itertools.izip_longest()

  • Guido has a standing request to have threading.py written in C
    • ChristianTismer: I could imagine to try to create a C module for threading.py, but writing it in RPython instead of C. The RPython version would use a subset of Python, that still works from alone. In the presence of an installed PyPy release, configure could find this out and create a C extension, instead. This was from a discussion with Martin v.L. (who would love to come if he could) but needs some more input from the sprinters.

  • Revisit Armin's zombie frame idea for reducing function call overhead.
    • RaymondHettinger: IIRC, the unsolved problem was how to save partially constructed frames without impacting the performance of recursive functions.

  • Other ideas for speeding up function calls. For example, [http://www.python.org/sf/1479611 patch 1479611].

  • DONE See ListOfPerformanceRelatedPatches for the list. Are there any other patches in the [http://sourceforge.net/tracker/?group_id=5470&atid=305470 patch tracker] that are worth investigating? See ListOfPerformanceRelatedPatches for the list.

  • Improve gzip's readline performance (e.g. [http://www.python.org/sf/1281707 patch 1281707]).

  • Improve interpreter startup time, like in patch 921466.

Pure Python Projects

  • Improve [http://dada.perl.it/shootout/python.html language shootout] submissions

    • SteveHolden: while this might be useful for PR purposes, does it actually win us much in terms of improving performance of existing programs?

  • Improve the API for timeit.py. Guido thinks that much of the intelligence in the command-line interface should be shifted to importable functions.
  • Continue function call speed enhancements.
    • From Neal Norwitz on python-dev: I should probably check in my perf patch for speeding up function calls. It would be great if someone picked that up in the sprint and polished it off. The last version I posted should work in debug and normal modes, adds some simple profiling. i think the only thing that it didn't have that Martin requested was more inlining of PyCFunction_CallMethod(?). I'm not sure if that would be faster or not.

  • Update PythonSpeed/PerformanceTips

Twisted

  • Speed improvements to select and poll reactors
  • Reactor based on /dev/epoll
  • Better integration with psyco
  • Improvements against twisted benchmark

Psyco

  • Support for generator expressions
  • Support for nested scopes
  • Support for more dictionary operations
  • Speedup float arithmetic
  • Support for more built-ins (e.g. int(), long(), float(), etc.)
  • Upgrade for python 2.5
  • Better tools for profiling psyco-ness of application
  • Investigate usefulness of IVM (with aim to producing a more streamlined dispatch loop)
  • LLVM backend
  • Virtualized longs (for long longs)
  • Virtualized slots (Ability to cache getattribute() values)

Py3000

  • Make a wishlist for possible performance gains in Py3.0

Shared Memory Lib

* MartinBlais: I have had this idea in the back of my head for the longest time, to write a small C library to let non-related Python processes (i.e. not forked from one another) access a same shared memory location, even if it is just to share strings, in the manner of a shared dictionary. There is POSH, but it cannot do that, you have to fork from a single parent to use it, which means it won't work within certain contexts (i.e. apache prefork). There is memcached, but it uses sockets, I'm sure shmem would be faster (and it's applicable in the case where sharing only within the host is good enough). Now, I understand that it's not directly on the Python interpreter itself, but it would provide a path to making certain kinds of Python applications faster (could EWT benefit from this?), so I'm proposing it here anyway. I'm thinking of the following steps:

  • - Write it in C, with a thin Python layer just to convert the strings - Only allow the following operations: get, set, del, list keys - Only allow storing strings into it

(I'm a sucker, really, because that's mostly an excuse to play with shmem interfaces, which I haven't had the chance to do under Linux yet...)

NeedForSpeed/Goals (last edited 2008-11-15 13:59:37 by localhost)

Unable to edit the page? See the FrontPage for instructions.