Differences between revisions 26 and 27
Revision 26 as of 2010-04-17 00:16:30
Size: 8926
Editor: gad13-2-82-231-218-205
Comment: link PyCon presentation
Revision 27 as of 2012-08-02 23:51:38
Size: 8857
Editor: JimD
Comment: wiki restore 2013-01-23
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
## page was renamed from GIL
In CPython, the '''global interpreter lock''', or '''GIL''', is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
Line 4: Line 2:
CPython extensions must be GIL-aware in order to avoid defeating threads. For an explanation, see [[http://docs.python.org/api/threads.html|Global interpreter lock]]. In CPython, the '''global interpreter lock''', or '''GIL''', is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
Line 6: Line 4:
The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen ''outside'' the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.
Line 8: Line 5:
However [[http://www.dabeaz.com/python/GIL.pdf|the GIL degrades performance]] even when it is not a bottleneck.  Summarizing those slides: The system call overhead is significant, especially on multicore hardware.  Two threads calling a function may take twice as much time as a single thread calling the function twice.  The GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads.  And it prevents signals from being delivered. CPython extensions must be GIL-aware in order to avoid defeating threads. For an explanation, see [[http://docs.python.org/api/threads.html|Global interpreter lock]].


The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and [[NumPy|NumPy]] number crunching, happen ''outside'' the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.


However [[http://www.dabeaz.com/python/GIL.pdf|the GIL degrades performance]] even when it is not a bottleneck. Summarizing those slides: The system call overhead is significant, especially on multicore hardware. Two threads calling a function may take twice as much time as a single thread calling the function twice. The GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads. And it prevents signals from being delivered.
Line 12: Line 16:
Jython and IronPython have no GIL and can fully exploit multiprocessor systems.
Line 14: Line 17:
[Mention place of GIL in StacklessPython.] Jython and [[IronPython|IronPython]] have no GIL and can fully exploit multiprocessor systems.


[Mention place of GIL in [[StacklessPython|StacklessPython]].]
Line 19: Line 25:
Getting rid of the GIL is an occasional topic on the python-dev mailing list. No one has managed it yet. The following properties are all highly desirable for any potential GIL replacement; some are hard requirements.
Line 21: Line 26:
 * '''Simplicity.''' The proposal must be implementable and maintainable in the long run. Getting rid of the GIL is an occasional topic on the python-dev mailing list. No one has managed it yet. The following properties are all highly desirable for any potential GIL replacement; some are hard requirements.
Line 23: Line 28:
 * '''Concurrency.''' The point of eliminating the GIL would be to improve the performance of multithreaded programs. So any proposal must show that it actually does so in practice.
Line 25: Line 29:
 * '''Speed.''' The [[BDFL]] has said he will reject any proposal in this direction that slows down single-threaded programs. (citation needed)  Note that this is harder than it looks.  The existing reference count mechanism is very fast in the non-concurrent case, but means that almost any reference to an object is a modification (at least to the refcount); many concurrent GC algorithms assume that modifications are rare.  * '''Simplicity.''' The proposal must be implementable and maintainable in the long run.
 * '''Concurrency.''' The point of eliminating the GIL would be to improve the performance of multithreaded programs. So any proposal must show that it actually does so in practice.
 * '''Speed.'''
The [[BDFL|BDFL]] has said he will reject any proposal in this direction that slows down single-threaded programs. (citation needed) Note that this is harder than it looks. The existing reference count mechanism is very fast in the non-concurrent case, but means that almost any reference to an object is a modification (at least to the refcount); many concurrent GC algorithms assume that modifications are rare.
 * '''Features.''' The proposal must support existing CPython features including `__del__` and weak references.
  .
  `__del__` isn't thread-safe, becoming a large problem if any sort of locking becomes commonplace. My CPython fork will provide a safer and more reliable replacement. --Rhamphoryncus
Line 27: Line 36:
 * '''Features.''' The proposal must support existing CPython features including `__del__` and weak references.    .
   What's not thread-safe about `__del__`? --jorendorff
Line 29: Line 39:
    `__del__` isn't thread-safe, becoming a large problem if any sort of locking becomes commonplace. My CPython fork will provide a safer and more reliable replacement. --Rhamphoryncus     . The language doesn't say anything about what sort of atomicity any operation has. That dict operations are often thread-safe in CPython today is mostly to avoid low-level crashes.
    This is normally solved in threads by using locks, but `__del__` may be executed currently holding the same lock you want, resulting in a deadlock.
Line 31: Line 42:
        What's not thread-safe about `__del__`? --jorendorff     To fix this (without adding a memory model to the language) requires you run `__del__` in a dedicated system thread and require you to use locks (such as those provided by a monitor.) (Non-blocking algorithms are possible in assembly, but insanely overcomplicated from a Python perspect.) --Rhamphoryncus
Line 33: Line 44:
            The language doesn't say anything about what sort of atomicity any operation has. That dict operations are often thread-safe in CPython today is mostly to avoid low-level crashes.
Line 35: Line 45:
            This is normally solved in threads by using locks, but `__del__` may be executed currently holding the same lock you want, resulting in a deadlock.
Line 37: Line 46:
            To fix this (without adding a memory model to the language) requires you run `__del__` in a dedicated system thread and require you to use locks (such as those provided by a monitor.) (Non-blocking algorithms are possible in assembly, but insanely overcomplicated from a Python perspect.) --Rhamphoryncus
Line 39: Line 47:
 * '''API compatibility.''' The proposal should be source-compatible with the macros used by all existing CPython extensions (`Py_INCREF` and friends). See [[http://docs.python.org/api/countingRefs.html|Python/C API Reference Manual: Reference Counting]].  * '''API compatibility.''' The proposal should be source-compatible with the macros used by all existing CPython extensions (`Py_INCREF` and friends). See [[http://docs.python.org/api/countingRefs.html|Python/C API Reference Manual: Reference Counting]].
 * '''Prompt destruction''' (nice to have?). The existing reference-counting scheme destroys objects as soon as they become unreachable, except for objects in reference cycles. Those are collected later by Python's cycle collector. Some CPython programs depend on this, e.g. to close `file`s promptly, so it would be nice to keep this feature.
 * '''Ordered destruction''' (nice to have?). Barring cycles, Python currently always destroys an unreachable object ''X'' before destroying any other objects referenced by ''X''. This means all the object's attributes are still there when `__del__` runs. (Many garbage collection schemes don't guarantee this.)
  . I'd say this is necessary for Python. There's very little you can usefully do with a half-destroyed object. That which you can do, you could also do without being exposed to half-destroyed objects. --Rhamphoryncus
   .
   The [[http://docs.python.org/ref/customization.html|language reference]] doesn't require this. I doubt Jython or [[IronPython|IronPython]] provides it. --jorendorff
Line 41: Line 54:
 * '''Prompt destruction''' (nice to have?). The existing reference-counting scheme destroys objects as soon as they become unreachable, except for objects in reference cycles. Those are collected later by Python's cycle collector. Some CPython programs depend on this, e.g. to close `file`s promptly, so it would be nice to keep this feature.     . They seem deliberately vague. Java distinguishes finalized from non-finalized objects, and a single finalizer is ordered with regard to non-finalized objects. The catch is that it's not ordered with regard to other finalizers, so you need to program as if they may already be deleted. In practise this means avoiding finalizers unless absolutely necessary, and if necessary they must not depend on each other.
    CPython uses low-level finalizers extensively, so it must have stronger guarantees. Because of this, it refuses to delete cycles involving `__del__`. The interesting realization is that, although finalizers in java are run even in the face of reference cycles, they cannot be correct unless you eliminate finalizer cycles. Finding ways to tell the GC implementation of this key distinction lets you have your cake and eat it too. --Rhamphoryncus
Line 43: Line 57:
 * '''Ordered destruction''' (nice to have?). Barring cycles, Python currently always destroys an unreachable object ''X'' before destroying any other objects referenced by ''X''. This means all the object's attributes are still there when `__del__` runs. (Many garbage collection schemes don't guarantee this.)      .
     I don't subscribe to "it must have stronger guarantees". CPython uses C-level finalizers mostly for memory management and occasionally (as with `file` objects) for some non-order-dependent cleanup. --jorendorff
Line 45: Line 60:
    I'd say this is necessary for Python. There's very little you can usefully do with a half-destroyed object. That which you can do, you could also do without being exposed to half-destroyed objects. --Rhamphoryncus       .
      The implementation doesn't distinguish C-level finalizers from Python-level finalizers (except to refuse to delete cycles involving `__del__`), which is why it needs stronger guarantees. If you made `__del__` use a separate pass then you could loosen it to what Java provides. --Rhamphoryncus
Line 47: Line 63:
        The [[http://docs.python.org/ref/customization.html|language reference]] doesn't require this. I doubt Jython or IronPython provides it. --jorendorff
Line 49: Line 64:
            They seem deliberately vague. Java distinguishes finalized from non-finalized objects, and a single finalizer is ordered with regard to non-finalized objects. The catch is that it's not ordered with regard to other finalizers, so you need to program as if they may already be deleted. In practise this means avoiding finalizers unless absolutely necessary, and if necessary they must not depend on each other.
Line 51: Line 65:
            CPython uses low-level finalizers extensively, so it must have stronger guarantees. Because of this, it refuses to delete cycles involving `__del__`.
Line 53: Line 66:
            The interesting realization is that, although finalizers in java are run even in the face of reference cycles, they cannot be correct unless you eliminate finalizer cycles. Finding ways to tell the GC implementation of this key distinction lets you have your cake and eat it too. --Rhamphoryncus
Line 55: Line 67:
                I don't subscribe to "it must have stronger guarantees". CPython uses C-level finalizers mostly for memory management and occasionally (as with `file` objects) for some non-order-dependent cleanup. --jorendorff
Line 57: Line 68:
                    The implementation doesn't distinguish C-level finalizers from Python-level finalizers (except to refuse to delete cycles involving `__del__`), which is why it needs stronger guarantees. If you made `__del__` use a separate pass then you could loosen it to what Java provides. --Rhamphoryncus
Line 62: Line 72:
API compatibility is an especially difficult aspect of the problem. All concurrent memory management schemes we've found rely on one or more of the following techniques, all of which are incompatible with the existing Python/C API.
Line 64: Line 73:
 * '''Tracing.''' Most garbage collectors need to be able to start with an object and enumerate all the objects that it points to. The builtin CPython pointer-containing types, like `PyList` and `PyDict`, all have a `tp_traverse` method that can do this, but not all extension types have that method. API compatibility is an especially difficult aspect of the problem. All concurrent memory management schemes we've found rely on one or more of the following techniques, all of which are incompatible with the existing Python/C API.
Line 66: Line 75:
 * '''Write barriers.''' A write barrier is a small piece of code that executes whenever a pointer variable is modified. Alas, no matter how you hack the `Py_INCREF` etc. macros, you can't make a write barrier hook out of them. Even if you could, many schemes require a different write barrier for stack variables vs. global variables vs. object fields that point to other objects; nothing in the Python/C API makes that distinction.
Line 68: Line 76:
 * '''Exact stack information.''' Exact garbage collection schemes need to be able to mark all objects reachable from local C variables. To do this, some schemes need to know where such variables are located on the C stack (and/or registers)--something the Python/C API does not require extensions to track.  * '''Tracing.''' Most garbage collectors need to be able to start with an object and enumerate all the objects that it points to. The builtin CPython pointer-containing types, like `PyList` and `PyDict`, all have a `tp_traverse` method that can do this, but not all extension types have that method.
 * '''Write barriers.''' A write barrier is a small piece of code that executes whenever a pointer variable is modified. Alas, no matter how you hack the `Py_INCREF` etc. macros, you can't make a write barrier hook out of them. Even if you could, many schemes require a different write barrier for stack variables vs. global variables vs. object fields that point to other objects; nothing in the Python/C API makes that distinction.
 * '''Exact stack information.''' Exact garbage collection schemes need to be able to mark all objects reachable from local C variables. To do this, some schemes need to know where such variables are located on the C stack (and/or registers)--something the Python/C API does not require extensions to track.
Line 72: Line 83:
Another issue in this area is that existing C extensions depend on the GIL guarantees.  They assume that when extension code is called, all other threads are locked out.  If an extension does need to deal with a threaded environment, it explicitly opts in (by releasing the GIL).  Therefore any would-be GIL replacement must provide GIL-like guarantees by default.  Threading must remain opt-in for extensions.
Another issue in this area is that existing C extensions depend on the GIL guarantees. They assume that when extension code is called, all other threads are locked out. If an extension does need to deal with a threaded environment, it explicitly opts in (by releasing the GIL). Therefore any would-be GIL replacement must provide GIL-like guarantees by default. Threading must remain opt-in for extensions.
Line 76: Line 89:
Line 77: Line 91:
 * [[http://us.pycon.org/2010/conference/schedule/event/76|Understanding the Python GIL]]: Conference at PyCon 2010  * [[http://www.dabeaz.com/GIL/|Understanding the Python GIL]]: David Beazley at [[PyCon|PyCon]] 2010

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

CPython extensions must be GIL-aware in order to avoid defeating threads. For an explanation, see Global interpreter lock.

The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.

However the GIL degrades performance even when it is not a bottleneck. Summarizing those slides: The system call overhead is significant, especially on multicore hardware. Two threads calling a function may take twice as much time as a single thread calling the function twice. The GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads. And it prevents signals from being delivered.

Non-CPython implementations

Jython and IronPython have no GIL and can fully exploit multiprocessor systems.

[Mention place of GIL in StacklessPython.]

Eliminating the GIL

Getting rid of the GIL is an occasional topic on the python-dev mailing list. No one has managed it yet. The following properties are all highly desirable for any potential GIL replacement; some are hard requirements.

  • Simplicity. The proposal must be implementable and maintainable in the long run.

  • Concurrency. The point of eliminating the GIL would be to improve the performance of multithreaded programs. So any proposal must show that it actually does so in practice.

  • Speed. The BDFL has said he will reject any proposal in this direction that slows down single-threaded programs. (citation needed) Note that this is harder than it looks. The existing reference count mechanism is very fast in the non-concurrent case, but means that almost any reference to an object is a modification (at least to the refcount); many concurrent GC algorithms assume that modifications are rare.

  • Features. The proposal must support existing CPython features including __del__ and weak references.

    • __del__ isn't thread-safe, becoming a large problem if any sort of locking becomes commonplace. My CPython fork will provide a safer and more reliable replacement. --Rhamphoryncus

      • What's not thread-safe about __del__? --jorendorff

        • The language doesn't say anything about what sort of atomicity any operation has. That dict operations are often thread-safe in CPython today is mostly to avoid low-level crashes.

          This is normally solved in threads by using locks, but __del__ may be executed currently holding the same lock you want, resulting in a deadlock.

          To fix this (without adding a memory model to the language) requires you run __del__ in a dedicated system thread and require you to use locks (such as those provided by a monitor.) (Non-blocking algorithms are possible in assembly, but insanely overcomplicated from a Python perspect.) --Rhamphoryncus

  • API compatibility. The proposal should be source-compatible with the macros used by all existing CPython extensions (Py_INCREF and friends). See Python/C API Reference Manual: Reference Counting.

  • Prompt destruction (nice to have?). The existing reference-counting scheme destroys objects as soon as they become unreachable, except for objects in reference cycles. Those are collected later by Python's cycle collector. Some CPython programs depend on this, e.g. to close files promptly, so it would be nice to keep this feature.

  • Ordered destruction (nice to have?). Barring cycles, Python currently always destroys an unreachable object X before destroying any other objects referenced by X. This means all the object's attributes are still there when __del__ runs. (Many garbage collection schemes don't guarantee this.)

    • I'd say this is necessary for Python. There's very little you can usefully do with a half-destroyed object. That which you can do, you could also do without being exposed to half-destroyed objects. --Rhamphoryncus
      • The language reference doesn't require this. I doubt Jython or IronPython provides it. --jorendorff

        • They seem deliberately vague. Java distinguishes finalized from non-finalized objects, and a single finalizer is ordered with regard to non-finalized objects. The catch is that it's not ordered with regard to other finalizers, so you need to program as if they may already be deleted. In practise this means avoiding finalizers unless absolutely necessary, and if necessary they must not depend on each other.

          CPython uses low-level finalizers extensively, so it must have stronger guarantees. Because of this, it refuses to delete cycles involving __del__. The interesting realization is that, although finalizers in java are run even in the face of reference cycles, they cannot be correct unless you eliminate finalizer cycles. Finding ways to tell the GC implementation of this key distinction lets you have your cake and eat it too. --Rhamphoryncus

          • I don't subscribe to "it must have stronger guarantees". CPython uses C-level finalizers mostly for memory management and occasionally (as with file objects) for some non-order-dependent cleanup. --jorendorff

            • The implementation doesn't distinguish C-level finalizers from Python-level finalizers (except to refuse to delete cycles involving __del__), which is why it needs stronger guarantees. If you made __del__ use a separate pass then you could loosen it to what Java provides. --Rhamphoryncus

API compatibility in detail

API compatibility is an especially difficult aspect of the problem. All concurrent memory management schemes we've found rely on one or more of the following techniques, all of which are incompatible with the existing Python/C API.

  • Tracing. Most garbage collectors need to be able to start with an object and enumerate all the objects that it points to. The builtin CPython pointer-containing types, like PyList and PyDict, all have a tp_traverse method that can do this, but not all extension types have that method.

  • Write barriers. A write barrier is a small piece of code that executes whenever a pointer variable is modified. Alas, no matter how you hack the Py_INCREF etc. macros, you can't make a write barrier hook out of them. Even if you could, many schemes require a different write barrier for stack variables vs. global variables vs. object fields that point to other objects; nothing in the Python/C API makes that distinction.

  • Exact stack information. Exact garbage collection schemes need to be able to mark all objects reachable from local C variables. To do this, some schemes need to know where such variables are located on the C stack (and/or registers)--something the Python/C API does not require extensions to track.

It is barely credible that CPython might someday make tp_traverse mandatory for pointer-carrying types; adding support for write barriers or stack bookkeeping to the Python/C API seems extremely unlikely.

Another issue in this area is that existing C extensions depend on the GIL guarantees. They assume that when extension code is called, all other threads are locked out. If an extension does need to deal with a threaded environment, it explicitly opts in (by releasing the GIL). Therefore any would-be GIL replacement must provide GIL-like guarantees by default. Threading must remain opt-in for extensions.

Recent discussions

GlobalInterpreterLock (last edited 2020-12-22 21:57:53 by eriky)

Unable to edit the page? See the FrontPage for instructions.