JythonDeveloperGuide/PortingPythonModulesToJython

Porting an existing Python module written in C into Java that Jython understands is a pretty straightforward task so it can serve as a good introduction to the Jython codebase. I'm going to explain how to go about porting the csv module here.

Declare your intention to implement the csv module on the Jython dev list so no one else starts working on it.
Add a new class org.python.modules._csv.java in src to mirror _csv.c from CPython.
Add "_csv" to the builtinModules array in org.python.modules.Setup
Run dist/Lib/test/test_csv.py. Everything will fail since none of the csv methods are implemented yet. Now pick one of the simpler tests and start adding methods to csv to get it to work. _csv.java will be an implementation of the stuff in _csv.c from Python. All of csv_methods from _csv.c needs to be implemented as static methods in _csv.java. You can get an idea of how it's done from _codecs.java and _codecs.c or any of the module implementations in org.python.modules and their corresponding C implementation. As you add the methods to _csv.java, Jython will pick up on them and parts of the tests will start working.
Keep adding pieces to _csv.java till the tests pass
Submit a patch to the tracker.
Revel in the glory of another implemented module

The table below contains modules implemented in C in Python that are missing in Jython. Feel free to grab one of them and get started. If none of them catch your fancy, run dist/Lib/test/regrtest.py. All of the skipped tests are for modules that are present in CPython but not in Jython, so they're fair game too. See the "Missing Modules" section of ../RegressionTestNotes for a list as of 20061123.

Module	Taken	Affected tests
csv	Y	test_csv
tarfile	N	test_tarfile
unicodedata	Y	test_unicodedata,test_codeccallback
heapq	N	test_heapq
mmap	Y	test_mmap

Comments

Why have you called the Java file _csv.java instead of just csv.java? Is there a convention here and if so, why don't all the classes follow the same convention.

pdrummond: Short answer: It's called _csv because that is what it's called in CPython!

Long answer: Hmmm. Will have to swat up on CPython's module naming conventions to answer this properly! I think CPython's general convention is "<name>module.c" so csv really should be "csvmodule.c" shouldn't it? I guess the underscore is necessary because there is a csv.py wrapper so the "csv" name is already taken, but I need to check this to be sure!

FrankWierzbicki: This is actually a CPython convention -- but one that is inconsistently applied -- the 2.6 version is making this more consistent. The convention is: for a given module x, there is usually an implementation in python called x.py and a c implementation called _x.c. Generally x.py tries to import _x.c at the beginning -- this way implementations like Jython can get a pure python implementation right away, but can also implement a faster version that we should call _x.java later. I'd like to add that I would recommend looking at the CPython implementation as you go -- there are almost always subtleties that the tests miss - I used to assume the tests where good enough but got burned a couple of times this way.

Will the test_csv.py file automatically appear after a build or do we have to implement this ourself?

pdrummond: test_csv.py will appear auto-magically after a build in dist/Lib/test (at least, it will in the 2.3 branch) and it is a CPython test script.