Differences between revisions 1 and 11 (spanning 10 versions)
Revision 1 as of 2007-02-24 21:25:22
Size: 2299
Editor: 72-254-192-46
Comment:
Revision 11 as of 2007-02-26 21:15:43
Size: 5013
Editor: tmcb-u110-3N10E-CE1
Comment:
This pre-PEP proposes enhancing the buffer protocol in Python 3000 to implement the array interface (protocol).

This Wiki will serve as a place to develop the PEP until it is assigned a number and committed to the Python development tree.

Abstract

This PEP proposes redesigning the buffer API (PyBufferProcs function pointers) to improve the way Python allows memory sharing in Python 3.0.

In particular, it proposes eliminating the multiple-segment and character-buffer portions of the buffer API and adding function pointers that let an object share the multi-dimensional nature of its memory and describe what the memory contains.

Rationale

The buffer protocol allows different Python types to exchange a pointer to a sequence of internal buffers. This functionality is extremely useful for sharing large segments of memory between different high-level objects, but the current protocol is too limited and has the following issues:

  1. There is the little (never?) used "sequence-of-segments" option (bf_getsegcount)
  2. There is the apparently redundant character-buffer option (bf_getcharbuffer)
  3. There is no way for a consumer to tell the buffer-API-exporting object that it is "finished" with its view of the memory, and therefore no way for the exporting object to be sure that it is safe to reallocate the memory that it owns. (The array object reallocating its memory after sharing it with a buffer object that still held the original pointer led to the infamous buffer-object problem.)
  4. Memory is just a pointer with a length. There is no way to describe what's "in" the memory (float, int, C-structure, etc.)
  5. There is no shape information provided for the memory, although several array-like Python types could make use of a standard way to describe the shape interpretation of the memory (wxPython, GTK, PyQt, CVXOPT, PyVox, audio and video libraries, ctypes, NumPy).

General Proposal

  1. Get rid of the char-buffer and multiple-segment sections of the buffer-protocol.
  2. Unify the read/write versions of getting the buffer.
  3. Add a new function to the protocol that should be called when the consumer object is "done" with the view.
  4. Add a new function to allow the protocol to describe what is in memory (unifying what is currently done in struct and array)
  5. Add a new function to allow the protocol to share shape information

Specification

Change the PyBufferProcs structure to:

typedef struct {
     getbufferproc bf_getbuffer;
     releasebufferproc bf_releasebuffer;
     formatbufferproc bf_getbufferformat;
     shapebufferproc bf_getbuffershape;
} PyBufferProcs;

The signatures and purposes of these function pointers are provided here:

typedef Py_ssize_t (*getbufferproc)(PyObject *obj, void **buf, int writeable);

A pointer to the memory is returned in buf and the length of that memory buffer is the function return value. If writeable is 1, then a writeable buffer is needed, otherwise a read-only buffer is sufficient. A -1 is returned if an error occurs.

typedef int (*releasebufferproc)(PyObject *obj);

This function is called when a view of memory previously acquired from the object is no longer needed. It is up to the exporter of the API to make sure all views have been released before eliminating a reference to a previously returned pointer. It is up to consumers of the API to call this function on the object whose view is obtained when it is no longer needed. A -1 is returned on error and 0 on success.
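
The acquire/release discipline described above can be sketched in Python. The `ToyExporter` class and its `get_buffer`, `release_buffer`, and `resize` methods are hypothetical stand-ins for `bf_getbuffer` and `bf_releasebuffer`, not part of any real API. The sketch only assumes that an exporter counts outstanding views and refuses to reallocate while any remain.

```python
class ToyExporter:
    """Hypothetical exporter that tracks outstanding views of its memory."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._views = 0  # views handed out and not yet released

    def get_buffer(self):
        """Stand-in for bf_getbuffer: hand out the memory and its length."""
        self._views += 1
        return self._data, len(self._data)

    def release_buffer(self):
        """Stand-in for bf_releasebuffer: 0 on success, -1 on error."""
        if self._views == 0:
            return -1  # nothing was acquired, so nothing can be released
        self._views -= 1
        return 0

    def resize(self, new_size):
        """Reallocate only when no consumer still holds a view."""
        if self._views:
            raise RuntimeError("cannot reallocate: views are still outstanding")
        self._data = bytearray(new_size)


exporter = ToyExporter(16)
buf, length = exporter.get_buffer()

# While the view is outstanding, the exporter must not reallocate.
try:
    exporter.resize(32)
    resized_while_shared = True
except RuntimeError:
    resized_while_shared = False

# Once the consumer releases its view, reallocation is safe again.
status = exporter.release_buffer()
exporter.resize(32)
```

The counter is one possible way for an exporter to meet its obligation; the draft leaves the bookkeeping to each exporting type.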

typedef char *(*formatbufferproc)(PyObject *obj);

Get the format-string of the memory using the struct-string syntax (see below for additions to the struct-string syntax). If the implied size of this string is smaller than the length of the buffer then it is assumed that the string is repeated.
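
The repetition rule can be illustrated with the existing `struct` module, using only format codes it already supports. The helper `record_count` is hypothetical, and treating a buffer length that is not a whole multiple of the format size as an error is an assumption; the draft does not say what should happen in that case.

```python
import struct


def record_count(fmt, buffer_length):
    """Number of times fmt repeats across a buffer of buffer_length bytes."""
    item_size = struct.calcsize(fmt)
    if item_size == 0 or buffer_length % item_size:
        # Assumption: a partial trailing record is treated as an error.
        raise ValueError("buffer length is not a whole number of records")
    return buffer_length // item_size


# One record is a little-endian short followed by a float (2 + 4 bytes).
fmt = "<hf"
data = struct.pack("<hfhfhf", 1, 1.5, 2, 2.5, 3, 3.5)

count = record_count(fmt, len(data))
# iter_unpack applies the same format repeatedly over the whole buffer.
records = list(struct.iter_unpack(fmt, data))
```

Here the format describes 6 bytes and the buffer holds 18, so the format is taken to repeat three times.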

typedef PyObject *(*shapebufferproc)(PyObject *obj);

Return a 3-tuple of lists containing shape information: (shape, strides, offsets). The strides and offsets objects can be None if the memory is C-style contiguous with 0 offsets in each dimension.
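
A consumer could interpret the returned tuple roughly as follows. The functions `c_contiguous_strides` and `element_offset` are hypothetical names. The draft does not define `offsets` precisely, so the sketch assumes each entry is a byte offset added once for its dimension, which means all-zero offsets change nothing.

```python
def c_contiguous_strides(shape, item_size):
    """Byte strides for C-style (row-major) contiguous memory."""
    strides = []
    step = item_size
    # The last dimension varies fastest, so build the strides from the end.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return list(reversed(strides))


def element_offset(index, shape, item_size, strides=None, offsets=None):
    """Byte offset of the element at index from the start of the buffer."""
    if strides is None:
        # None means C-style contiguous: derive the strides from the shape.
        strides = c_contiguous_strides(shape, item_size)
    if offsets is None:
        # None means an offset of 0 in every dimension.
        offsets = [0] * len(shape)
    return sum(i * s + o for i, s, o in zip(index, strides, offsets))


# A 3 x 4 array of 8-byte items.
default_strides = c_contiguous_strides([3, 4], 8)
row_major = element_offset((1, 2), [3, 4], 8)
column_major = element_offset((1, 2), [3, 4], 8, strides=[8, 24])
```

Passing explicit strides, as in the last line, lets the same shape describe column-major or otherwise non-contiguous memory without copying it.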

Some C-API calls should also be made available.

All of these routines are optional (but the last three make no sense unless at least one of the first two is implemented).

Additions to the struct string-syntax

||Character||Description||
||'1'||bit (a preceding number states how many bits)||
||'g'||long double||
||'F'||complex float||
||'D'||complex double||
||'G'||complex long double||
||'&'||pointer to (prefix placed before another character)||
||'O'||pointer to Python Object||
||'X{}'||pointer to a function (function signature inside of {})||
||'c'||ucs-1 (latin-1) encoding||
||'u'||ucs-2||
||'w'||ucs-4||
||'('||begin nested structure||
||')'||end nested structure||

We should also allow an endian-specification inside the string so that it could change if needed.
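
The existing `struct` module accepts a byte-order character only as the first character of a format string. To show what a mid-string switch would mean for the bytes in memory, the sketch below packs each segment with its own byte order and concatenates the results. The helper `pack_segments` is hypothetical and is not a proposed syntax.

```python
import struct


def pack_segments(segments):
    """Pack (format, values) pairs back to back, each with its own byte order."""
    return b"".join(struct.pack(fmt, *values) for fmt, values in segments)


# The same 16-bit value 1, first little-endian and then big-endian.
mixed = pack_segments([("<H", (1,)), (">H", (1,))])
```

A single format string with an embedded endian marker would need to describe this same four-byte layout.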

Questions

ArrayInterface (last edited 2008-11-15 14:00:41 by localhost)
