Background

This page considers the integration of the java.util collections interfaces into core Jython objects in going from Jython 2.1 to 2.2.

Jython 2.1 support for Java Collections integration is a one-way street. It's possible to make a Collection object act as a PyObject, but it's not possible to make a PyObject act as a Collection.

The integration of Collections into Jython happens through the CollectionProxy and CollectionProxy2 classes. They wrap the Collection instance with the appropriate proxy and delegate Jython calls to the Collection instance.

Going the other way fails. Take for example:

  >>> from java.util import ArrayList
  >>> a = ArrayList([1,2,3])
  Traceback (innermost last):
    File "<console>", line 1, in ?
  TypeError: java.util.ArrayList(): 1st arg can't be coerced to java.util.Collection or int

In this example the ArrayList constructor expects a java.util.Collection instance, but since PyList does not implement this interface a TypeError is raised. Because the Collections framework has been fundamental to Java since 1.2, the JythonDevelopmentTeam will address this issue. The implementation is currently being written by ClarkUpdike.

Design

There are two different approaches:

  1. Subclass the Abstract classes available in java.util for the Collection framework.
  2. Continue subclassing PyObject, with the additional work of implementing the appropriate interfaces.

Approach 2 offers the best integration options. Jython is primarily an implementation of Python, so implementing the data structures as they are in Python takes priority over the Java implementations. In addition, the keyword and index argument handling for method calls is already done. The interface implementations need only delegate to the appropriate PyObject instance's method for the same functionality.

  Jython Class   Extends      Implements
  PySequence     PyObject     List (Collection)
  PyArray        PySequence
  PyList         PySequence
  PyString       PySequence
  PyTuple        PySequence
  PyXRange       PySequence
  PySet          PyObject     Set
  PyDictionary   PyObject     Map
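
Approach 2 can be illustrated with a minimal Java sketch. Everything in it is hypothetical: SketchObject stands in for PyObject and SketchList for PyList, and neither reflects the real Jython source. For brevity it implements only java.util.Collection (the type the ArrayList constructor asks for) rather than the full List interface; each Collection method simply delegates to a Python-style method on the same object.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical stand-in for Jython's PyObject; not the real class.
class SketchObject {
}

// Hypothetical stand-in for PyList: it keeps its Python-style methods and
// implements Collection by delegating to them.
class SketchList extends SketchObject implements Collection<Object> {
    private Object[] items = new Object[0];

    // --- Python-style methods (the "native" behaviour) ---
    int len() { return items.length; }
    Object getitem(int i) { return items[i]; }
    void append(Object o) {
        items = Arrays.copyOf(items, items.length + 1);
        items[items.length - 1] = o;
    }
    int index(Object o) { // returns -1 when the value is absent
        for (int i = 0; i < items.length; i++) {
            if (Objects.equals(items[i], o)) return i;
        }
        return -1;
    }
    void delitem(int i) {
        Object[] next = new Object[items.length - 1];
        System.arraycopy(items, 0, next, 0, i);
        System.arraycopy(items, i + 1, next, i, items.length - i - 1);
        items = next;
    }

    // --- java.util.Collection methods, delegating to the methods above ---
    public int size() { return len(); }
    public boolean isEmpty() { return len() == 0; }
    public boolean contains(Object o) { return index(o) >= 0; }
    public boolean add(Object o) { append(o); return true; }
    public boolean remove(Object o) {
        int i = index(o);
        if (i < 0) return false;
        delitem(i);
        return true;
    }
    public void clear() { items = new Object[0]; }
    public Object[] toArray() { return Arrays.copyOf(items, items.length); }
    public <T> T[] toArray(T[] a) { return Arrays.asList(items).toArray(a); }
    public Iterator<Object> iterator() {
        return new Iterator<Object>() {
            private int pos = 0;
            public boolean hasNext() { return pos < len(); }
            public Object next() {
                if (!hasNext()) throw new NoSuchElementException();
                return getitem(pos++);
            }
        };
    }
    public boolean containsAll(Collection<?> c) {
        for (Object o : c) {
            if (!contains(o)) return false;
        }
        return true;
    }
    public boolean addAll(Collection<? extends Object> c) {
        for (Object o : c) append(o);
        return !c.isEmpty();
    }
    public boolean removeAll(Collection<?> c) {
        boolean changed = false;
        for (Object o : c) {
            while (remove(o)) changed = true;
        }
        return changed;
    }
    public boolean retainAll(Collection<?> c) {
        boolean changed = false;
        for (int i = len() - 1; i >= 0; i--) {
            if (!c.contains(getitem(i))) {
                delitem(i);
                changed = true;
            }
        }
        return changed;
    }
}
```

With such a class, new ArrayList<Object>(sketchList) accepts the instance directly, which is the call that fails for PyList in Jython 2.1.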

Discussion


ClarkUpdike Mar 2 2005 PM
Reflecting further on PyArray, we may want to consider not having it implement List. When I read the docs on the jarray and array modules at the respective jython and python websites, I find the following:

Performance might not have been a concern at all for jarray. But I know I have used jarray on occasion for that very reason. In any event, I'll presume the two reasons for their existence are, in descending priority:

I've been assuming that since PyArray is a sequence, it should also implement List. But the reasons not to do this are:

Anyway, that's why I think PyArray should not implement List, but should just grow the Python 2.2 array behavior and provide an easy bridge to a List via tolist(). As always, maybe I'm missing something obvious...

Here's what this design would look like:

PySequence <--+-- PySequenceList <---- (PyList, PyTuple)
              |
              +-- (PyArray, PyString, PyXRange)
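
The tolist() bridge suggested in this post could look roughly like the following Java sketch. SketchIntArray is a hypothetical stand-in for a PyArray holding ints, not the real class: it keeps primitive storage, does not implement List itself, and boxes elements only when a List copy is explicitly requested.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a PyArray of ints; not the real Jython class.
class SketchIntArray {
    private final int[] data; // primitive storage, no boxing

    SketchIntArray(int... values) {
        this.data = values.clone();
    }

    int get(int i) {
        return data[i];
    }

    // Bridge to the Collections world: copies and boxes every element once.
    List<Integer> tolist() {
        List<Integer> out = new ArrayList<Integer>(data.length);
        for (int v : data) {
            out.add(v);
        }
        return out;
    }
}
```

Because tolist() returns a copy, later changes to the List do not affect the array, and the boxing cost is paid only by callers that ask for a List.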


ClarkUpdike Mar 2 2005 AM
This is in response to <waiting for post to show up on sourceforge>. It is a work in progress--but feel free to comment. I've concentrated on the impact of implementing the List interface:

  `List`   java.util.List
  `list`   python list type (nice to be able to say 'type', eh?)

I've been thinking on how to accomplish the "Approach 2" design, which is basically a delegation model, in the sense that the subject classes will continue to subclass PyObject. PySequence does not have an "element data" field--which leaves the concrete classes to handle that. Here are some observations on the current 2.1 design:

  Jython Class   Extends      current element data field    Notes
  PySequence     PyObject     N/A
  PyArray        PySequence   Object data;                  data is set to primitive array or array of arbitrary class
  PyList         PySequence   protected PyObject[] list;
  PyString       PySequence   private String string;        jython depends heavily on interning of String
  PyTuple        PySequence   *public PyObject[] list;      list field is referenced directly by 8 classes in 14 methods
  PyXRange       PySequence   N/A                           int attributes start, stop, step (useless without PySequence.__iter__())
  PySet          PyObject     protected HashSet _set;       based on BrianZimmer's SetsModule
  PyDictionary   PyObject     protected Hashtable table;

My current thinking is that we should add a new branch to PySequence--let's call it PySequenceList. PySequence would remain as is, and PySequenceList would subclass PySequence and implement the java.util.List interface. PyString and PyXRange would subclass PySequence, while PyArray, PyList, and PyTuple would subclass PySequenceList. This seems appropriate because, although PyXRange and PyString technically fall under the description of List, their practical value as a List is nil. Am I missing something obvious here? Would anyone ever use PyXRange in Java? And PyString is auto-converted to a java.lang.String.

PySequence <--+-- PySequenceList <---- (PyArray, PyList, PyTuple)
              |
              +-- (PyString, PyXRange)

Not sure about PyList and PyTuple sharing an additional base class with a predefined element data field. Could do it that way, or could use delegation on a specialized element data class, or could leave it as-is (with copied code for certain methods--although the amount of copied code will increase).

One key decision is about the element data field in PyList and PyTuple. Currently, it's PyObject[]. This is efficient because it eliminates casting, but keeping it makes the List implementation difficult (take a look at the source for java.util.AbstractList), and all the System.arraycopy calls make append slow. If we were to switch it to an ArrayList, it would buy us easier collection integration, but would cost performance (who knows how much). There's also a "middle-of-the-road" approach, which is to use a specialized class to wrap a PyObject array and provide List-like behavior (I have some experience with doing this). This approach might also be used with PyArray.
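
As a rough illustration of the "middle-of-the-road" idea, the Java sketch below wraps a growable array in a small class that extends java.util.AbstractList. SketchArrayList is a hypothetical name, and Object[] is used here in place of PyObject[] so the example is self-contained. AbstractList supplies iterators, indexOf, equals and the rest once get, set, add(int, E), remove(int) and size are provided.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical wrapper: owns a growable array and exposes it as a List.
class SketchArrayList extends AbstractList<Object> {
    private Object[] data = new Object[8];
    private int size = 0;

    @Override
    public int size() {
        return size;
    }

    @Override
    public Object get(int i) {
        check(i, size);
        return data[i];
    }

    @Override
    public Object set(int i, Object o) {
        check(i, size);
        Object old = data[i];
        data[i] = o;
        return old;
    }

    @Override
    public void add(int i, Object o) {
        check(i, size + 1);
        // Double the capacity when full so repeated appends stay cheap.
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        System.arraycopy(data, i, data, i + 1, size - i);
        data[i] = o;
        size++;
        modCount++; // lets AbstractList iterators detect concurrent changes
    }

    @Override
    public Object remove(int i) {
        check(i, size);
        Object old = data[i];
        System.arraycopy(data, i + 1, data, i, size - i - 1);
        data[--size] = null; // drop the stale reference
        modCount++;
        return old;
    }

    private static void check(int i, int bound) {
        if (i < 0 || i >= bound) {
            throw new IndexOutOfBoundsException("index " + i);
        }
    }
}
```

A PyList-like class could hold one of these as its element data field and delegate both its Python methods and its List methods to it; whether that beats a plain ArrayList in practice would need measuring.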

PyArray requires some important decisions also. The collections methods are all Object based, so anything coming or going through these interfaces will require wrapping/boxing. If PyArray were to use an ArrayList and fully box everything from both Java and Jython, its performance (its main reason for existing?) would take a major hit. My thinking is that this is not a viable option. This means there could be some difficult code to write to implement List, unless we use the specialized class mentioned above (but typed to the particular primitive array types).
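
A specialized class "typed to the particular primitive array type" might be sketched as below for int[]. IntArrayView is a hypothetical name; the storage stays primitive, and boxing happens only when an element crosses the List interface.

```java
import java.util.AbstractList;

// Hypothetical fixed-size List view over an int[]; the array is shared, not copied.
class IntArrayView extends AbstractList<Integer> {
    private final int[] data;

    IntArrayView(int[] data) {
        this.data = data;
    }

    @Override
    public int size() {
        return data.length;
    }

    @Override
    public Integer get(int i) {
        return data[i]; // boxed only on the way out
    }

    @Override
    public Integer set(int i, Integer v) {
        int old = data[i];
        data[i] = v; // unboxed on the way in
        return old;
    }
}
```

Since add and remove are not overridden, AbstractList makes them throw UnsupportedOperationException, so the view is fixed-size; a version that supports growth would need the same array management as any other growable wrapper.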

Other general improvements:


CollectionsIntegration (last edited 2008-11-15 09:15:59 by localhost)