Differences between revisions 2 and 9 (spanning 7 versions)
Revision 2 as of 2012-06-24 07:37:52
Size: 3789
Editor: JeffAllen
Comment: References and design
Revision 9 as of 2014-05-11 07:52:22
Size: 6966
Editor: JeffAllen
Comment: Spam
Line 3: Line 3:
This page proposes (later documents) a design for a Jython equivalent to the CPython buffer protocol. This page proposes (now begins to document) a design for a Jython equivalent to
the CPython buffer protocol. A good place to start is this
[[attachment:buffer_api_model.pdf | worked example (pdf)]]
that motivated the current design.
Line 7: Line 10:
== Introduction (Situation in June 2012) == == What is the Buffer API? ==

The Jython Buffer API is an interface you can use in Java when
accessing or implementing certain built-in types, or your own.
It is the basis of the type `memoryview` with which it shares
many features.
Line 15: Line 23:
modules. And it is the basis of the type `memoryview`. modules.
Line 17: Line 25:
Although the buffer API and memoryview are Python 3k
features, they were backported into CPython 2.7. In the C
buffer protocol, the exporting object gives consumers a
pointer to memory and some information about how it is
structured.
The capability is just as useful in Jython, and has been implemented
for version 2.7.
Line 23: Line 28:
Jython does not (yet) have an equivalent of buffer protocol
or support `memoryview`. There is a stub for each, but no
access to data through it. In the recent implementation of
`PyByteArray` absence of a buffer protocol implemented by
incoming arguments was a complicating factor. In CPython,
the majority of methods start by getting a buffer view of
their arguments: acceptable types are all those that
implement the API. Other types and modules present a similar
challenge in reaching 2.7 compliance. But we cannot directly
emulate the C buffer protocol in Java, since it does not
allow pointers to memory like:{{{#!cplusplus
    self_size = _getbuffer(self, &self_bytes);
    other_size = _getbuffer(other, &other_bytes);
 ...
    cmp = memcmp(self_bytes.buf, other_bytes.buf, minsize);
== Accessing an Object that has the Buffer API ==

Objects flag their willingness to provide a buffer by implementing the
following interface:
{{{#!java
public interface BufferProtocol {
    PyBuffer getBuffer(int flags);
}
Line 39: Line 37:
or casts like:{{{#!cplusplus
 ((float*)ap->ob_item)[i] = x;

The `flags` argument indicates the buffer characteristics wanted by the consumer,
and what kind of data organisation the consumer can cope with.
The return is an implementation of `PyBuffer` appropriate to the storage
organisation of the exporter. This interface is quite rich, but the Javadoc is
thorough. In fact, the interface is defined in two stages: a part that is
independent of the type of unit in the buffer, and a part that is definite that
the units are `byte`s.

The type-agnostic part of the interface is `PyBUF`:
{{{#!java
public interface PyBUF {

    boolean isReadonly();

    // Access to buffer (client responsible for indexing)
    int getNdim();
    int[] getShape();
    int getItemsize();
    int getLen();
    int[] getStrides();
    int[] getSuboffsets();
    boolean isContiguous(char order);

    // Constants taken from CPython object.h in v3.3
    static final int MAX_NDIM = 64;
    static final int WRITABLE = 0x0001;
    static final int SIMPLE = 0;
    static final int FORMAT = 0x0004;
    static final int ND = 0x0008;
    static final int STRIDES = 0x0010 | ND;
    static final int C_CONTIGUOUS = 0x0020 | STRIDES;
    static final int F_CONTIGUOUS = 0x0040 | STRIDES;
    static final int ANY_CONTIGUOUS = 0x0080 | STRIDES;
    static final int INDIRECT = 0x0100 | STRIDES;
    static final int CONTIG = ND | WRITABLE;
    static final int CONTIG_RO = ND;
    static final int STRIDED = STRIDES | WRITABLE;
    static final int STRIDED_RO = STRIDES;
    static final int RECORDS = STRIDES | WRITABLE | FORMAT;
    static final int RECORDS_RO = STRIDES | FORMAT;
    static final int FULL = INDIRECT | WRITABLE | FORMAT;
    static final int FULL_RO = INDIRECT | FORMAT;
    static final int NAVIGATION = SIMPLE | ND | STRIDES | INDIRECT;
    static final int IS_C_CONTIGUOUS = C_CONTIGUOUS & ~STRIDES;
    static final int IS_F_CONTIGUOUS = F_CONTIGUOUS & ~STRIDES;
    static final int CONTIGUITY = (C_CONTIGUOUS | F_CONTIGUOUS | ANY_CONTIGUOUS) & ~STRIDES;
}
Line 42: Line 86:
Yet, without something filling the role, core implementation
is made more difficult and always falls short of
compatibility.
Line 46: Line 87:
This raises two questions:
 * Can we create a buffer API that provides an equivalent facility in Java?
 * Can we go on from there to implement memoryview?
This page looks at the first of these, arguing for a
particular approach. In due course, it ought to change into
documentation of the approach, preserving the rationale of
the final design.
The byte-oriented part of the interface is `PyBuffer`. For the most part, you can
ignore the difference between `PyBUF` and `PyBuffer` and think of all the methods
as belonging to `PyBuffer`:
{{{#!java
public interface PyBuffer extends PyBUF, BufferProtocol {
Line 54: Line 93:
== References ==     // Access to buffer contents (index calculation done by buffer)
    byte byteAt(int index) throws IndexOutOfBoundsException;
    int intAt(int index) throws IndexOutOfBoundsException;
    void storeAt(byte value, int index) throws IndexOutOfBoundsException;
    byte byteAt(int... indices) throws IndexOutOfBoundsException;
    int intAt(int... indices) throws IndexOutOfBoundsException;
    void storeAt(byte value, int... indices) throws IndexOutOfBoundsException;
Line 56: Line 101:
 1. [[http://python.org/dev/peps/pep-3118/|PEP 3118]] Revising the buffer protocol
 1. [[http://docs.python.org/library/stdtypes.html#memoryview-type|memoryview type]] (in Python 2.7 docs)
 1. [[http://docs.python.org/py3k/library/stdtypes.html#memoryview-type|memoryview type]] (in Python 3 docs)
 1. [[http://docs.python.org/c-api/typeobj.html#buffer-object-structures|Buffer Object Structures]] (in Python 2.7 docs)
 1. [[http://docs.python.org/py3k/c-api/buffer.html|Buffer Protocol]] (in Python 3 docs)
 1. [[http://bugs.python.org/issue10181| CPython issue 10181]] Problems with Py_buffer management in `memoryobject.c` (and elsewhere?)
    // Bulk operations (index calculation done by buffer)
    void copyTo(byte[] dest, int destPos);
    void copyTo(int srcIndex, byte[] dest, int destPos, int length);
    void copyFrom(byte[] src, int srcPos, int destIndex, int length);
    void copyFrom(PyBuffer src);
Line 63: Line 107:
http://docs.python.org/py3k/c-api/buffer.html     // Releasing a buffer or getting another (or a slice)
    void release();
    boolean isReleased();
    @Override PyBuffer getBuffer(int flags); // from BufferProtocol
    public PyBuffer getBufferSlice(int flags, int start, int length);
    public PyBuffer getBufferSlice(int flags, int start, int length, int stride);
Line 65: Line 114:
    // Access to buffer (client responsible for indexing)
    public static class Pointer {
        public byte[] storage;
        public int offset;
        public Pointer(byte[] storage, int offset) {
            this.storage = storage;
            this.offset = offset;
        }
    }
Line 66: Line 124:
    PyBuffer.Pointer getBuf();
    PyBuffer.Pointer getPointer(int index);
    PyBuffer.Pointer getPointer(int... indices);
Line 67: Line 128:
== Design ==
=== Considerations ===
    // Interpreting the bytes
    String getFormat();
}
}}}
Notice that an object that implements `PyBuffer` must itself implement the
`BufferProtocol`. A buffer can give you a buffer, which may just be itself or
could be an independent object, depending on the implementation.
Line 70: Line 136:
 * We would benefit from something in the role of the CPython buffer API.
 * It should be as familiar as possible to people who know the CPython buffer API.
 * It should be a feasible basis for `memoryview`.
 * Java is not C:
   * some CPython buffer API idioms have no equivalent, e.g. {{{((float*)ap->ob_item)[i] = x;}}}
   * and some may be done in a better way (e.g. polymorphism in place of if statements, object lifetime).
 * 99% (maybe 100%) of core development only accesses the buffer as a one-dimensional array of bytes.
 * Some significant applications (NumPy, PIL) use the richer facilities (>1 dimension, strided access, indirection).
== Adding the Buffer API to an Object ==
Line 79: Line 138:
The core package org.python.core only defines the interfaces. Several `PyBuffer`
implementations that could be exported by object implementations are provided in
`org.python.core.buffer`. The place to start is with one of these basic types:
Line 80: Line 142:
=== Sketch interface to bytes === ||'''Buffer Type''' ||'''Suitable for ...'''||
||!SimpleBuffer ||read-only 1D array of bytes||
||!SimpleWritableBuffer || 1D array of bytes ||
||!SimpleStringBuffer ||Java String (representing bytes)||
Line 82: Line 147:
Any of these can be extended; for more sophisticated behaviour, consider
extending `BaseBuffer`.
Line 83: Line 150:
=== Provision for things other than bytes === == Converting from CPython ==
Line 85: Line 152:
=== Similarities with CPython ===
Line 86: Line 154:
=== Combined proposal ===  * The navigational arrays (`shape`, `strides`, `suboffsets`) and the `format` and `itemsize` are present with the same meanings.
 * The "request flags" have the same values and similar names: for example, `PyBUF.STRIDED` in place of `PyBUF_STRIDED`.
 * The discipline of matching `get` and `release` applies also in Jython.

=== Differences from CPython ===

 * `PyBuffer` is a Java interface: quantities that were struct members become `getXXX()` methods.
 * `PyBuffer` always supplies full information (`shape`, `strides`, `format`).
 * The different buffer organisations are expressed through different classes implementing the interface.
 * Library functions taking `PyObject` or `PyBuffer` arguments in CPython become methods on those types.
 * A `PyBuffer` manages the get-release accounting for exporters.
 * Wherever CPython uses a `char*` pointer, Jython references a buffer of bytes and an offset within it.

Buffer Protocol

This page proposes (now begins to document) a design for a Jython equivalent to the CPython buffer protocol. A good place to start is this worked example (pdf) that motivated the current design.

What is the Buffer API?

The Jython Buffer API is an interface you can use in Java when accessing or implementing certain built-in types, or your own. It is the basis of the type memoryview with which it shares many features.

In CPython, certain objects are based on an underlying memory array or buffer. The CPython designers judged it desirable to be able to access that buffer directly, without intermediate copying. CPython provides this at the C level in the form of the buffer protocol. This is used heavily in the implementation of some core types and standard library modules.

The capability is just as useful in Jython, and has been implemented for version 2.7.

Accessing an Object that has the Buffer API

Objects flag their willingness to provide a buffer by implementing the following interface:

    public interface BufferProtocol {
        PyBuffer getBuffer(int flags);
    }

The flags argument indicates the buffer characteristics wanted by the consumer, and what kind of data organisation the consumer can cope with. The return value is an implementation of PyBuffer appropriate to the storage organisation of the exporter. This interface is quite rich, but the Javadoc is thorough. In fact, the interface is defined in two stages: a part that is independent of the type of unit in the buffer, and a part that assumes the units are bytes.

The type-agnostic part of the interface is PyBUF:

    public interface PyBUF {

        boolean isReadonly();

        // Access to buffer (client responsible for indexing)
        int getNdim();
        int[] getShape();
        int getItemsize();
        int getLen();
        int[] getStrides();
        int[] getSuboffsets();
        boolean isContiguous(char order);

        // Constants taken from CPython object.h in v3.3
        static final int MAX_NDIM = 64;
        static final int WRITABLE = 0x0001;
        static final int SIMPLE = 0;
        static final int FORMAT = 0x0004;
        static final int ND = 0x0008;
        static final int STRIDES = 0x0010 | ND;
        static final int C_CONTIGUOUS = 0x0020 | STRIDES;
        static final int F_CONTIGUOUS = 0x0040 | STRIDES;
        static final int ANY_CONTIGUOUS = 0x0080 | STRIDES;
        static final int INDIRECT = 0x0100 | STRIDES;
        static final int CONTIG = ND | WRITABLE;
        static final int CONTIG_RO = ND;
        static final int STRIDED = STRIDES | WRITABLE;
        static final int STRIDED_RO = STRIDES;
        static final int RECORDS = STRIDES | WRITABLE | FORMAT;
        static final int RECORDS_RO = STRIDES | FORMAT;
        static final int FULL = INDIRECT | WRITABLE | FORMAT;
        static final int FULL_RO = INDIRECT | FORMAT;
        static final int NAVIGATION = SIMPLE | ND | STRIDES | INDIRECT;
        static final int IS_C_CONTIGUOUS = C_CONTIGUOUS & ~STRIDES;
        static final int IS_F_CONTIGUOUS = F_CONTIGUOUS & ~STRIDES;
        static final int CONTIGUITY = (C_CONTIGUOUS | F_CONTIGUOUS | ANY_CONTIGUOUS) & ~STRIDES;
    }
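
The request flags in the listing are bit masks that build on one another, so asking for STRIDES also asks for ND, and FULL_RO implies INDIRECT, STRIDES, ND and FORMAT. The following is a minimal sketch of that arithmetic. It copies a handful of the constants into a stand-alone class so that it runs without Jython on the classpath; the class name `BufferFlagsSketch` and the helper `requests` are assumptions made for illustration, not part of the Jython API.

```java
// Stand-alone copy of a few request flags from the PyBUF listing,
// so the bit arithmetic can be checked without Jython.
class BufferFlagsSketch {
    static final int WRITABLE = 0x0001;
    static final int FORMAT = 0x0004;
    static final int ND = 0x0008;
    static final int STRIDES = 0x0010 | ND;
    static final int INDIRECT = 0x0100 | STRIDES;
    static final int FULL_RO = INDIRECT | FORMAT;

    /** Hypothetical helper: true if every bit of 'feature' is set in 'flags'. */
    static boolean requests(int flags, int feature) {
        return (flags & feature) == feature;
    }

    public static void main(String[] args) {
        int flags = FULL_RO; // what a consumer might pass to getBuffer
        System.out.println(Integer.toHexString(STRIDES)); // prints 18
        System.out.println(requests(flags, STRIDES));     // prints true
        System.out.println(requests(flags, WRITABLE));    // prints false
    }
}
```

An exporter could use a test of this kind to decide, for example, whether a consumer expects to write to the buffer.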

The byte-oriented part of the interface is PyBuffer. For the most part, you can ignore the difference between PyBUF and PyBuffer and think of all the methods as belonging to PyBuffer:

    public interface PyBuffer extends PyBUF, BufferProtocol {

        // Access to buffer contents (index calculation done by buffer)
        byte byteAt(int index) throws IndexOutOfBoundsException;
        int intAt(int index) throws IndexOutOfBoundsException;
        void storeAt(byte value, int index) throws IndexOutOfBoundsException;
        byte byteAt(int... indices) throws IndexOutOfBoundsException;
        int intAt(int... indices) throws IndexOutOfBoundsException;
        void storeAt(byte value, int... indices) throws IndexOutOfBoundsException;

        // Bulk operations (index calculation done by buffer)
        void copyTo(byte[] dest, int destPos);
        void copyTo(int srcIndex, byte[] dest, int destPos, int length);
        void copyFrom(byte[] src, int srcPos, int destIndex, int length);
        void copyFrom(PyBuffer src);

        // Releasing a buffer or getting another (or a slice)
        void release();
        boolean isReleased();
        @Override PyBuffer getBuffer(int flags); // from BufferProtocol
        public PyBuffer getBufferSlice(int flags, int start, int length);
        public PyBuffer getBufferSlice(int flags, int start, int length, int stride);

        // Access to buffer (client responsible for indexing)
        public static class Pointer {
            public byte[] storage;
            public int offset;
            public Pointer(byte[] storage, int offset) {
                this.storage = storage;
                this.offset = offset;
            }
        }

        PyBuffer.Pointer getBuf();
        PyBuffer.Pointer getPointer(int index);
        PyBuffer.Pointer getPointer(int... indices);

        // Interpreting the bytes
        String getFormat();
    }

Notice that an object that implements PyBuffer must itself implement the BufferProtocol. A buffer can give you a buffer, which may just be itself or could be an independent object, depending on the implementation.
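
A consumer typically gets a buffer, reads through it, and releases it. The sketch below shows that pattern with try/finally so that release is called even if reading fails. To keep it runnable without Jython, it uses cut-down stand-ins: `MiniBuffer`, `MiniBufferProtocol` and `ArrayBuffer` are hypothetical names invented here, carrying only a few of the methods listed for PyBuffer and BufferProtocol. Real code would use the interfaces from org.python.core instead.

```java
class BufferConsumerSketch {

    /** Hypothetical cut-down stand-in for PyBuffer: only the calls used below. */
    interface MiniBuffer {
        int getLen();
        byte byteAt(int index) throws IndexOutOfBoundsException;
        void release();
        boolean isReleased();
    }

    /** Hypothetical stand-in for BufferProtocol. */
    interface MiniBufferProtocol {
        MiniBuffer getBuffer(int flags);
    }

    /** Read-only view of a byte array, for demonstration only. */
    static final class ArrayBuffer implements MiniBuffer {
        private final byte[] storage;
        private boolean released = false;

        ArrayBuffer(byte[] storage) { this.storage = storage; }

        public int getLen() { return storage.length; }
        public byte byteAt(int index) { return storage[index]; }
        public void release() { released = true; }
        public boolean isReleased() { return released; }
    }

    /** Sum the bytes of any exporter, always pairing getBuffer with release. */
    static int sum(MiniBufferProtocol exporter) {
        MiniBuffer view = exporter.getBuffer(0); // 0 plays the role of PyBUF.SIMPLE
        try {
            int total = 0;
            for (int i = 0; i < view.getLen(); i++) {
                total += view.byteAt(i);
            }
            return total;
        } finally {
            view.release(); // matched with the getBuffer call above
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        ArrayBuffer view = new ArrayBuffer(data);
        System.out.println(sum(flags -> view)); // prints 10
        System.out.println(view.isReleased());  // prints true
    }
}
```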

Adding the Buffer API to an Object

The core package org.python.core only defines the interfaces. Several PyBuffer implementations that could be exported by object implementations are provided in org.python.core.buffer. The place to start is with one of these basic types:

||Buffer Type ||Suitable for ...||
||SimpleBuffer ||read-only 1D array of bytes||
||SimpleWritableBuffer ||1D array of bytes||
||SimpleStringBuffer ||Java String (representing bytes)||

Any of these can be extended; for more sophisticated behaviour, consider extending BaseBuffer.
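
To give a feel for the exporter's side, here is a minimal sketch of an object that hands out views of its byte array and counts the ones still outstanding, refusing to resize while any exist (as CPython's bytearray does). It deliberately avoids the real classes in org.python.core.buffer, whose constructors are not described on this page: `ByteBox`, `MiniBuffer` and the `exports` counter are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class BufferExporterSketch {

    /** Hypothetical cut-down stand-in for a read-only PyBuffer. */
    interface MiniBuffer {
        int getLen();
        byte byteAt(int index);
        void release();
    }

    /** A toy exporter: owns a byte array and tracks outstanding views. */
    static final class ByteBox {
        private byte[] storage;
        private int exports = 0;

        ByteBox(byte[] initial) { this.storage = initial.clone(); }

        /** Plays the role of BufferProtocol.getBuffer(flags); flags are ignored here. */
        MiniBuffer getBuffer(int flags) {
            exports++;
            final byte[] s = storage;
            return new MiniBuffer() {
                private boolean released = false;
                public int getLen() { return s.length; }
                public byte byteAt(int index) { return s[index]; }
                public void release() {
                    // Count each view at most once, even if released twice.
                    if (!released) { released = true; exports--; }
                }
            };
        }

        /** Resizing is refused while any consumer still holds a view. */
        void resize(int newLength) {
            if (exports > 0) {
                throw new IllegalStateException("buffer is exported");
            }
            storage = Arrays.copyOf(storage, newLength);
        }

        int length() { return storage.length; }
    }

    public static void main(String[] args) {
        ByteBox box = new ByteBox(new byte[] {5, 6, 7});
        MiniBuffer view = box.getBuffer(0);
        try {
            box.resize(10);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
        view.release();
        box.resize(10);
        System.out.println(box.length()); // prints 10
    }
}
```

In this sketch the accounting lives in the exporter for brevity; in the Jython design described here, the buffer object takes on that bookkeeping.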

Converting from CPython

Similarities with CPython

  • The navigational arrays (shape, strides, suboffsets) and the format and itemsize are present with the same meanings.

  • The "request flags" have the same values and similar names: for example, PyBUF.STRIDED in place of PyBUF_STRIDED.

  • The discipline of matching get and release applies also in Jython.

Differences from CPython

  • PyBuffer is a Java interface: quantities that were struct members become getXXX() methods.

  • PyBuffer always supplies full information (shape, strides, format).

  • The different buffer organisations are expressed through different classes implementing the interface.

  • Library functions taking PyObject or PyBuffer arguments in CPython become methods on those types.

  • A PyBuffer manages the get-release accounting for exporters.

  • Wherever CPython uses a char* pointer, Jython references a buffer of bytes and an offset within it.
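
The last point can be illustrated with a rough Java counterpart of C's memcmp. Where C would compare through two char* pointers, the sketch compares two (storage, offset) pairs. The nested `Pointer` class has the same shape as PyBuffer.Pointer in the interface listing, but it is redeclared here so the example stands alone; the class name `PointerSketch` and the method `compare` are assumptions for illustration.

```java
class PointerSketch {

    /** Same shape as PyBuffer.Pointer: a byte array plus an offset into it. */
    static final class Pointer {
        final byte[] storage;
        final int offset;

        Pointer(byte[] storage, int offset) {
            this.storage = storage;
            this.offset = offset;
        }
    }

    /** Rough equivalent of C memcmp(a, b, n) on two (storage, offset) pairs. */
    static int compare(Pointer a, Pointer b, int n) {
        for (int i = 0; i < n; i++) {
            // Mask to compare as unsigned bytes, as memcmp does.
            int x = a.storage[a.offset + i] & 0xff;
            int y = b.storage[b.offset + i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Pointer p = new Pointer(new byte[] {'a', 'b', 'c', 'd'}, 1);
        Pointer q = new Pointer(new byte[] {'x', 'b', 'c', 'z'}, 1);
        System.out.println(compare(p, q, 2)); // prints 0
        System.out.println(compare(p, q, 3)); // prints -1
    }
}
```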

BufferProtocol (last edited 2014-06-14 14:38:51 by JeffAllen)