Differences between revisions 6 and 46 (spanning 40 versions)

These are the Google "Summer of Code" projects involving Python.

Note: if a project is listed as having two mentors, the first mentor listed is the primary mentor, and the second one is the back-up mentor.

Python Implementation of the Data Access Protocol

Student: Roberto Antonio Ferreira De Almeida

Mentor: Paul DuBois.

Web page: [http://opendap.oceanografia.org/ OPeNDAP/DODS module for Python]

Proposal

The Data Access Protocol (DAP) is a data transmission protocol designed specifically for science data. The protocol relies on the widely used HTTP and MIME standards, and provides data types to accommodate gridded data, relational data, and time series, as well as allowing users to define their own data types. The initiative is funded by NASA and has the support of several institutions. Hundreds of scientific datasets are available on the internet through DAP servers, which can be accessed remotely by DAP clients in a transparent and efficient way. Here I propose to develop a Python implementation of the protocol based on its latest specification. The proposed implementation will consist of a client module that will allow Python applications to access remote datasets, as well as a server for data stored in a variety of formats commonly used by the scientific community, including NetCDF and Matlab files.

Outcome

The proposal was completed as stated, and is already in use by some scientists. It can be used together with the Python-based CDAT tool or alone.

Bitten: A Python framework for collecting software metrics from automated builds

Student: Christopher Lenz

Mentors: Greg Wilson, Trent Mick.

Web page: [http://bitten.cmlenz.net Bitten]

Proposal

The goal of this work is to design and implement a distributed system for automated builds and continuous integration that allows the central collection and storage of software metrics generated during the build. The information collected this way needs to be structured and available in a machine-readable format, so that it can be analyzed, aggregated/correlated and presented after the build itself has completed.

Outcome

Lots of good code, very good time management and prioritization, and he's delivered a working system. (Lots still to be done, of course, but it's up and running.) See http://bitten.cmlenz.net for details.

OpenExVis - A Program Visualization Tool

Student: Tero Kuusela

Mentor: David Ascher.

Proposal

The goal is to write, in Python, a functional program visualization tool that can visualize Python code. With the visualization tool, one can write a program and see its execution visualized, which helps in understanding how the program works. This is especially useful for students learning how to program.

The project website, where you can also find the original proposal sent to Google, is at http://openexvis.sourceforge.net/ and the progress during Summer of Code is tracked in Tero's blog at http://www.teroajk.net/blog/ .

Outcome

David says: The project successfully animates several Python programs through instrumentation of the low-level hooks inside standard Python interpreters. The demonstration program works, and a screencast is available from the project's home page for people who don't have access to a Linux machine with the required dependencies installed. The project needs more work before it can be useful in an educational context, and there are probably features & bug fixes needed, but the foundational bits seem good!
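As a rough illustration of what "instrumentation of the low-level hooks" can look like, here is a minimal sketch built on the interpreter's `sys.settrace` hook. It is not OpenExVis code: the helper `trace_lines` and the sample function `total` are hypothetical names chosen for this example, and the source does not say which hooks the project actually uses.

```python
import sys


def trace_lines(func, *args):
    """Run func(*args) and record, for each executed line of func,
    its offset from the 'def' line and a snapshot of the local variables."""
    events = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow frames that execute the function under observation.
        if frame.f_code is not code:
            return None
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            events.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore whatever trace function was installed before.
        sys.settrace(previous)
    return result, events


def total(n):
    acc = 0
    for i in range(n):
        acc += i
    return acc


if __name__ == "__main__":
    result, events = trace_lines(total, 3)
    for offset, local_vars in events:
        print(offset, local_vars)
```

A visualization tool could replay such a list of events step by step, highlighting the current line and showing how the variables change.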

Wax

Student: Jason Gedge

Mentor: Hans Nowak

Proposal

This project consists of updating the Wax library for Python. Code will be updated, or even added, to further develop the Wax library. Also, a primary focus will be that of documentation, which Wax currently lacks.

Outcome

Many additions to Wax were released in these two months, including WaxRF (a system to load forms from XML, much like wxPython's XRC), a documentation viewing/generating tool, and a number of new controls. Some existing issues were also fixed (OverlaySizer, Wizard). More information at http://zephyrfalcon.org/weblog2/arch_e10_00810.html#e817.

Data Serving/Collection Framework in Python/WSGI

Student: Ho Chun Wei

Weblog: http://cwho.blogspot.com/

Mentor: Ian Bicking

Proposal

A framework based on bulk data serving/collection via the internet. Bulk data are in the form of files that could easily be several hundred MB (not surveys or simple POST data).

The client has a file repository that it wishes to sync to the server (a WSGI application). This server should be able to facilitate transfer via a number of protocols, including HTTP file transfer, HTTP form upload, FTP, Email.

This project is aimed not at yet another ad-hoc file transfer or p2p file-sharing program but as a persistent production setup for transferring data from data collection sites/areas to a server, possibly via internet through different methods to get through strict organizational firewalls and web admins.
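To make the "server as a WSGI application" idea concrete, here is a minimal sketch of a WSGI app that accepts a file via HTTP PUT and serves it back via GET. It is an assumption-laden illustration, not the project's code: `make_upload_app`, the in-memory `store` dictionary, and the `call` test helper are all hypothetical, and a production setup would stream large files to disk rather than hold them in memory.

```python
import io
from wsgiref.util import setup_testing_defaults


def make_upload_app(store):
    """Return a WSGI app that keeps uploaded bodies in the dict `store`,
    keyed by request path."""

    def app(environ, start_response):
        method = environ["REQUEST_METHOD"]
        path = environ.get("PATH_INFO", "/")
        if method == "PUT":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            store[path] = environ["wsgi.input"].read(length)
            start_response("201 Created", [("Content-Length", "0")])
            return [b""]
        if method == "GET" and path in store:
            body = store[path]
            start_response("200 OK", [
                ("Content-Type", "application/octet-stream"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        start_response("404 Not Found", [("Content-Length", "0")])
        return [b""]

    return app


def call(app, method, path, body=b""):
    """Invoke a WSGI app directly, without a network server."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(
        REQUEST_METHOD=method,
        PATH_INFO=path,
        CONTENT_LENGTH=str(len(body)),
    )
    environ["wsgi.input"] = io.BytesIO(body)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Because a WSGI application is just a callable, the same app could be mounted behind any WSGI-capable server; other transports such as FTP or email would need separate front ends feeding the same store.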

Outcome

I'd consider the code a solid beta. There's probably packaging and documentation work to be done, but that's the kind of thing that only user experience feedback will really accomplish well, IMHO. Also it's something where there's No One Right Way (yet!) to distribute web applications, so it'll evolve in time.

We changed projects at the start from his original proposal to something we both found more interesting/useful, a WebDAV server. There were good generic WebDAV tests already written (acceptance only, but picky acceptance tests).

Python Bayesian Network Toolbox

Student: Elliot Cohen

Mentor: James Tauber

Web page: [http://pbnt.berlios.de/ PBNT – Python Bayesian Network Toolbox]

Proposal

The understanding and use of Bayesian Belief Networks are becoming more and more widespread. As understanding develops and spreads out of the research community, there is a growing need for a simple-to-use, efficient, open source Bayesian Network Toolbox. Bayesian Networks have been used to study a wide array of areas, including ecological systems, medical diagnoses and financial modeling. Currently, tools to define and use Bayesian Networks are limited to expensive closed source libraries or open source libraries designed for too specific a domain. One package that does support many varieties of Bayesian Networks is Kevin Murphy's Full BNT, which supports both discrete and continuous probability distributions in static and dynamic Bayesian Networks.

For (almost) daily updates please see http://elliotpbnt.blogspot.com.

Outcome

Elliot Cohen has done a great job with the Python Bayes Network Toolbox. He will undoubtedly continue working on pbnt after the SoC but I've encouraged him to get it ready for an initial release to get it out to a wider audience.

I have high hopes that this package will be useful to people wanting to do real work on Bayesian Network based inference and learning and I'm excited that the existence of pbnt means Python becomes a natural choice for this sort of work.

Efficiently Analysing Data Polymorphism and Deducing Generics in Shedskin

Student: Mark Dufour

Mentors: Jeremy Hylton, Brett Cannon

Proposal

As part of my Master's Thesis, I am working on a Python-to-C++ compilation system, called Shedskin. Currently, it performs static type inference based on two techniques. The Cartesian Product Algorithm is used to handle parametric polymorphism (calling functions with different combinations of argument types); single-level class duplication, or 1CFA, is employed to handle data polymorphism (mostly polymorphic containers, such as list; in 1CFA, each allocation site gets its own class type, so we can analyze these (somewhat) precisely.) Run-time checks such as 'isinstance' are considered during inference. Further, short tuples are analyzed internally, which of course is especially important in case of Python.

Based on the statically determined type information, the compiler currently performs stack- and static pre-allocation (using a simple escape analysis, and the static call graph respectively) and unboxing. Further, it generates polymorphic inline caches or virtual calls when a singleton type set cannot be deduced.

Single-level class duplication is imprecise, because it only duplicates class types once for each allocation site, and allocation sites may be duplicated during analysis (as CPA possibly creates many templates for each function.) Extending it to N levels, or NCFA, would make the analysis terribly exponential and still not precise for deep polymorphism. For the summer of code, my main goal will be to efficiently and precisely handle data polymorphism up to arbitrary depths. I am currently looking into an iterative technique developed by John Plevyak. (Tiejun & Wang's technique is incomprehensible, and I don't see how the method used in Starkiller would work.) My other large goal will be to generate generics of appreciable complexity, based on the inferred types, i.e. to determine whether types may be uniformly parameterized, and to generate class and function templates. Finally, I will integrate an existing C++ garbage collector into the run-time system in order to clean up objects that could not be stack- or statically pre-allocated.
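The core idea behind the Cartesian Product Algorithm mentioned above can be sketched in a few lines: instead of analyzing a function once with sets of possible argument types, it is analyzed once per combination of concrete argument types. The sketch below is a toy, not Shedskin code; `cpa_templates` and `infer_add` are hypothetical names, and the typing rule for addition is deliberately simplistic.

```python
from itertools import product


def cpa_templates(arg_type_sets):
    """Expand per-argument type sets into monomorphic templates,
    one for each element of their Cartesian product."""
    combos = product(*arg_type_sets)
    return sorted(combos, key=lambda combo: [t.__name__ for t in combo])


def infer_add(template):
    """Toy result-type rule for `left + right` under one template.
    Returns None when the combination is treated as a type error."""
    left, right = template
    if left is right:
        return left
    if {left, right} == {int, float}:
        return float
    return None


if __name__ == "__main__":
    # A call site where each argument may be an int or a float.
    for template in cpa_templates([{int, float}, {int, float}]):
        names = ", ".join(t.__name__ for t in template)
        print(f"add({names}) -> {infer_add(template)}")
```

Each template has only singleton argument types, so the result type of every call can be determined precisely; the cost is that the number of templates grows with the product of the type-set sizes.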

Outcome

Plevyak's method has been implemented. Function and class templates of arbitrary complexity are generated. The Boehm garbage collector has been integrated. In total about 2000 lines have been added during the SoC. 123 (all) unit tests compile correctly, among which 6 larger non-trivial programs of between 100 and 200 lines. These are analyzed in a few seconds on a fast cpu, and become 10-50 times faster after compilation. Mission accomplished!

([http://mail.python.org/pipermail/summerofcode/2005-September/000179.html Announcement of first release])

Mailbox modification

Student: Gregory K. Johnson

Web page: http://gkj.freeshell.org/soc

Mentor: Andrew Kuchling

Proposal

I intend to rewrite the Python library's mailbox module to support mailbox modification. I will extend the module's API (e.g., mailboxes will sport dictionary-like mapping) and enhance certain existing functionality (e.g., message objects will maintain mailbox-format-specific attributes). Full backward compatibility will be maintained.
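The following sketch shows what dictionary-like mailbox access and format-specific message objects can look like. It uses the `mailbox` module shipped with current Python releases, on the assumption that its interface resembles what the proposal describes; it is not taken from the sandbox code. The helper name `demo` and the addresses are made up for the example, and real code should also call `lock()` and `unlock()` around modifications.

```python
import email.message
import mailbox


def demo(path):
    """Add, read back, and delete one message in an mbox file at `path`."""
    box = mailbox.mbox(path)
    try:
        msg = email.message.Message()
        msg["From"] = "student@example.org"
        msg["Subject"] = "hello"
        msg.set_payload("body\n")

        key = box.add(msg)          # returns the key of the new message
        box.flush()                 # write pending changes to disk

        stored = box[key]           # mapping-style lookup by key
        subject = stored["Subject"]
        is_mbox_message = isinstance(stored, mailbox.mboxMessage)

        del box[key]                # mapping-style deletion
        box.flush()
        remaining = len(box)
    finally:
        box.close()
    return subject, is_mbox_message, remaining
```

The `mboxMessage` class carries mbox-specific state (such as the "From " line and flags) on top of the ordinary message interface, which is one way to read "mailbox-format-specific attributes".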

Outcome

The modified version of the mailbox module is in the nondist/ tree of Python CVS (nondist/sandbox/mailbox). I think it's acceptable for inclusion in the standard library, and is now waiting for a second opinion from some other Python developer, to see if there's any disagreement. I expect this code will get into Python 2.5.

Interactive Python Notebooks

Students:

Toni Alatalo, [http://an.org/programming], document transformation functionality.
Tzanko Matev, graphical user interface.

Mentors: Fernando Perez and Robert Kern

Proposal

Original proposal: [http://ipython.scipy.org/google_soc/ipnb_google_soc.pdf].

Further details: [http://www.scipy.org/wikis/featurerequests/NoteBook]

Outcome

The students have successfully implemented a prototype of a GUI and the underlying machinery for document transformation. Currently the wxPython-based GUI accepts normal Python code and the extended set of IPython commands, normal text (not #comments, but regular text processed as a separate entity) and embedded figures. The document transformation infrastructure (XML based) can render these files into LaTeX, HTML or PDF, including mathematical notation.

The code is still considered alpha quality, but the basics are in place. We are in the process (as of 9/8/05) of cleaning things up to allow early testers to download it and play with it. Those willing to test out of the raw Subversion repository can do so by checking out the nbshell component:

svn co http://ipython.scipy.org/svn/ipython/nbshell/trunk nbshell

which includes instructions on the other pieces needed.

Those interested in following the development can do so either on the ipython-dev mailing list, or by browsing the Trac pages for IPython at:

http://projects.scipy.org/ipython/ipython

Porting _sre.c and arraymodule.c to Python

Student: Niklaus Haldimann

Weblog: http://ubique.ch/soc

Mentors: Armin Rigo, Samuele Pedroni

Proposal

I would like to create a port of the standard library modules "_sre" and "array" to pure Python. This will benefit alternative Python implementations like PyPy, Jython and IronPython. These projects all have to provide their own implementations of standard library modules written in C if they're not available in pure Python.
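To give a feel for what a pure-Python replacement of a C module involves, here is a deliberately tiny sketch of an array-like container that stores its items as packed bytes. It is not the student's implementation: the class name `PyArray` is hypothetical, only a few methods are shown, and it relies on `struct` format characters, which overlap with but are not identical to the `array` module's type codes.

```python
import struct


class PyArray:
    """Minimal typed sequence backed by a bytearray (native byte order)."""

    def __init__(self, typecode, values=()):
        self.typecode = typecode
        self.itemsize = struct.calcsize(typecode)
        self._buf = bytearray()
        for value in values:
            self.append(value)

    def append(self, value):
        # struct.pack enforces the item type and range.
        self._buf += struct.pack(self.typecode, value)

    def __len__(self):
        return len(self._buf) // self.itemsize

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("array index out of range")
        return struct.unpack_from(self.typecode, self._buf,
                                  index * self.itemsize)[0]

    def tobytes(self):
        return bytes(self._buf)
```

For a type code such as 'i', the packed bytes match what the C-implemented `array.array('i', ...)` produces, which is the kind of compatibility a drop-in replacement has to preserve.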

Outcome

Niklaus Haldimann has done everything planned and more on writing a pure Python implementation of the _sre (core of regular expressions) and array modules (http://codespeak.net/svn/user/nik). In addition to the base work, he made regular releases of the _sre module (three so far). This is a drop-in replacement for the equivalent C modules; for example, _sre.py could be used in projects that need to run on top of Python 2.3 or earlier, while using regexps that require the recursion-less approach introduced in 2.4 and supported by _sre.py.

He has also integrated the modules with PyPy (he was present at two PyPy sprints); in the case of the _sre module, he ported it to "interpreter-level", which means that in addition to integration, he made sure that the code is static enough to be automatically translatable to good C code, like the rest of PyPy. This should give an excellent write-once, translate-to-anything regular expression engine that might also be used as the basis for a Jython or IronPython module (or even back to a C extension module for CPython).

Object-Oriented File System Virtualisation

Student: Adam Kerz

Mentor: Trent Mick.

Proposal

Create an object oriented model of a file system in Python that can be used to interface many different resource types (with appropriate implementations).

Wax GUI for Python

Student: Abhishek Reddy

Mentor: Hans Nowak

Proposal

Wax requires work on four broad fronts. Firstly, support for several basic controls needs to be added, some of which are listed above. Secondly, the design of the whole module has to be reviewed, particularly focusing on the initialisation. Thirdly, there are teething problems with passing data between Wax and wxPython that must be looked at. Fourthly, documentation, presently lacking, needs to be written.

Outcome

In spite of the promising proposal, no work (code or documentation) was delivered. After the first initial contact, I have not heard from this student, for reasons still unknown. (Mentor list and Chris DiBona were notified.)

PyTrails

Student: Jennifer Dozar

Mentors: Cameron Laird, Andrew Kuchling

Web page: [http://seul.org/~jennifer/proj/pytrail/ PyTrails]

Proposal

I'm working on an extensible open source engine for implementing trail-style games such as [http://www.gamespot.com/gamespot/features/all/greatestgames/p-34.html Oregon Trail] or Amazon Trail. The primary goal is to produce a quality edutainment title that can be used free of cost. The secondary goal is to make it easy for other edutainment trail games to be created. PyTrails will be Python based and will use PyGame. The engine will allow following a branching map, including making stops to rest, hunt, or trade. Additional choices such as shopping and fording rivers may be available at special points. Each of these activities will be replaceable in other trail games, to allow for maximum flexibility.

mmpy -- A garbage collection tool kit in Python

Student: Carl Friedrich Bolz

Mentors: Samuele Pedroni, Armin Rigo

Proposal

The project aims at producing a framework for writing and evaluating garbage collectors in Python. The interfaces to the low level memory and to the object model will be general enough to make it usable for a wide range of projects in need of garbage collection as well as for teaching and research purposes. It will be designed with flexibility and modularity in mind to encourage component reuse. It aims at being directly useful for the PyPy project and translatable by its translation tools.
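As an illustration of the kind of collector such a framework might host, here is a toy mark-and-sweep collector over a simulated heap. Everything in it is an assumption made for the example: the `Heap` class, its `alloc` method, and `mark_sweep` are hypothetical names and do not reflect mmpy's actual interfaces.

```python
class Heap:
    """Simulated heap: each object is an id mapped to the ids it references."""

    def __init__(self):
        self.objects = {}
        self._next_id = 0

    def alloc(self, *refs):
        oid = self._next_id
        self._next_id += 1
        self.objects[oid] = list(refs)
        return oid


def mark_sweep(heap, roots):
    """Free every object not reachable from `roots`; return the freed ids."""
    # Mark phase: depth-first traversal from the roots.
    marked = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in marked:
            continue
        marked.add(oid)
        stack.extend(heap.objects[oid])

    # Sweep phase: drop everything that was not marked.
    freed = [oid for oid in heap.objects if oid not in marked]
    for oid in freed:
        del heap.objects[oid]
    return freed
```

Because reachability is computed from the roots, a self-referencing or cyclic group of objects is reclaimed as soon as nothing live points to it, which plain reference counting cannot do.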

Memory Profiler

Student: Nick Smallbone

Weblog: http://starship.python.net/crew/mwh/blog/nb.cgi/portal/nickblog

Mentors: Michael Hudson, Jeremy Hylton

URL: [http://pysizer.8325.org]

Proposal

I would like to apply to work over the summer on a Python memory profiler, as listed at CodingProjectIdeas.

To see how much work is involved in this, I've put together a prototype, which tries to enumerate all objects from a root, calculating the size of each object it finds.
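A prototype along the lines described, enumerating everything reachable from a root and summing object sizes, might look like the sketch below. It is not the student's code: `reachable_size` is a hypothetical helper, and it relies on `gc.get_referents` and `sys.getsizeof`, which report only what each object type chooses to expose.

```python
import gc
import sys
import types

# Objects of these kinds are shared infrastructure, not data owned by the root.
_SKIPPED = (type, types.ModuleType, types.FunctionType)


def reachable_size(root):
    """Return (total bytes, object count) for objects reachable from root."""
    seen = set()
    stack = [root]
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen or isinstance(obj, _SKIPPED):
            continue
        seen.add(id(obj))           # count shared objects only once
        total += sys.getsizeof(obj)
        stack.extend(gc.get_referents(obj))
    return total, len(seen)


if __name__ == "__main__":
    inner = [1, 2, 3]
    outer = [inner, inner]
    print(reachable_size(outer))
```

Tracking visited objects by identity matters twice: it prevents double counting of shared objects and keeps the traversal from looping forever on reference cycles.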

Outcome

[http://starship.python.net/crew/mwh/blog/nb.cgi/view/nickblog/2005/09/01/0 The student's assessment]

asyncIO

Student: Vladimir Sukhoy

Mentor: Mark Hammond

Proposal

The proposed goal is to bring cross-platform proactive I/O capabilities to Python. That will enable a whole new style of application development with Python in cases where I/O is a bottleneck.

Library releases available from: http://developer.berlios.de/project/showfiles.php?group_id=4124

Profile Replacement

Student: Floris Bruynooghe

Weblog: http://bruynooghe.blogspot.com/

Mentor: Brett Cannon

Proposal

[Original idea from ProfileReplacementProject page.]

The current profiler is not free according to the Debian Free Software Guidelines (http://bugs.debian.org/293932) and has been taken out of the main Debian distribution. This affects many users, as the profiler is integrated into other programs, such as ipython, which lose functionality without the profiler available.

The aim is to write a wrapper for hotshot that will act as a drop-in replacement for the profile module. hotshot was chosen as the base since it is much better tested than any newly written code would be. Secondly, an independent stats module will be written for hotshot so that loading of the data will be much faster. This module will then also have a 100% pstats-compatible wrapper.

When all of this is completed, and if time is left over, one of the things to investigate is whether it is possible to make hotshot thread-aware.
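The interface such a replacement has to reproduce is small: run a callable under the profiler, then hand the collected data to pstats for sorting and printing. The sketch below shows that shape using `cProfile` as a stand-in, because hotshot is no longer part of Python 3; it is not pyprof code, and `profile_call` and `work` are hypothetical names.

```python
import cProfile
import io
import pstats


def profile_call(func, *args):
    """Run func(*args) under a profiler and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # pstats accepts a profiler object and formats the collected timings.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


def work(n):
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(work, 1000)
    print(report)
```

A drop-in replacement only needs to keep this calling convention and the pstats-readable output stable; which profiler does the measuring underneath is an implementation detail.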

The project is registered as pyprof on savannah.nongnu.org: http://savannah.nongnu.org/projects/pyprof

Outcome

Floris seems to have met all the goals. While I (Brett) have not had a chance to do a thorough testing, Floris has a nice set of tests that seem to prove that he was successful.

PythonModulePackaging

Student: Vincenzo Di Massa

URL: http://udu.wiki.ubuntu.com/PythonModulePackaging

Mentor: Matthias Klose

Proposal

(an ubuntu python SoC project)

Create a mechanism for fully automated packaging of python modules based on an upstream release. Support different Python implementations and different versions of CPython (needed, when not all software can run with the latest/default python version when an Ubuntu release is going to happen).

Refactor Pirate (Python on Parrot)

Student: Curtis Hall

Mentor: Michal Wallace

URL: http://pirate.tangentcode.com/

Proposal

This project didn't get mentioned here before, probably because it was funded through the Perl Foundation.

This wasn't the actual proposal, but basically:

Pirate is the python-to-parrot compiler. Parrot is the virtual machine for perl 6, but will also support a variety of other languages. The goal for this project was to refactor pirate. Instead of a few giant classes to compile all of python in one pass, we now have trees of many smaller compilers (for example, a while loop compiler and an assignment compiler and so on). The idea is that these mini-compilers can compile similar concepts from a variety of languages (for example, while loops in perl or lua are pretty similar to a while loop in python, other than that tricky else clause).

Outcome

This refactoring was actually a huge project. Curt worked like crazy and finished a large chunk of it by the deadline and decided to keep working. Currently (Sep 15 2005) there are still a small number of test cases that are not passing, and he plans to have them working and posted on the site within the week.

-  ⇤ ← Revision 6 as of 2005-06-29 23:31:30 → 
  Size: 10796
  Editor: p54A95EB5
  Comment: Fix Bitten entry
+   ← Revision 46 as of 2005-11-14 14:04:52 → ⇥
  Size: 20799
  Editor: judes
  Comment:
 Line 1:
+These are the Google "Summer of Code" projects involving Python.

Note: if a project is listed as having two mentors, the first mentor listed is the ''primary'' mentor, and the second one is the ''back-up'' mentor.
-Line 3:
+Line 7:
-(Roberto Antonio Ferreira De Almeida)
+Student: Roberto Antonio Ferreira De Almeida

Mentor: Paul Du``Bois.

Web page: [http://opendap.oceanografia.org/ OPeNDAP/DODS module for Python]

== Proposal ==
-Line 20:
+Line 30:
-Mentor: Paul DuBois.
+== Outcome ==

The proposal was completed as stated, and is already in use by some scientists. It can be used together with the Python-based CDAT tool or alone.
-Line 25:
+Line 36:
-(Christopher Lenz)
+Student: Christopher Lenz

Mentors: Greg Wilson, Trent Mick.

Web page: [http://bitten.cmlenz.net Bitten]

== Proposal ==
-Line 29:
+Line 46:
-Mentors: Greg Wilson, Trent Mick.

= A Program Visualization Tool =

(Tero Kuusela)
+== Outcome ==

Lots of good code, very good time management and prioritization, and he's
delivered a working system.  (Lots still to be done, of course, but it's
up and running.)  See http://bitten.cmlenz.net for details.


= OpenExVis - A Program Visualization Tool =

Student: Tero Kuusela

Mentor: David Ascher.

== Proposal ==
-Line 41:
+Line 67:
-Mentor: David Ascher.



= Object-Oriented File System Virtualisation =

(Adam Kerz)

Create an object oriented model of a file system in Python that can be used to interface many different resource types (with appropriate implementations).

Mentor: Trent Mick.

= Wax GUI for Python =

(Abhishek Reddy)


Wax requires work on four broad fronts. Firstly, support for several
basic controls need to be added, some of which are listed
above. Secondly, the design of the whole module has to be reviewed,
particularly focusing on the initialisation. Thirdly, there are
teething problems with passing data between Wax and wxPython that must
be looked at. Fourthly, documentation, presently lacking, needs to be
written.

Mentor: Hans Nowak

= PyTrails =

(Jennifer Dozar)

I'm working on an extensible opensource engine for implementing
trail-style games such as Oregon Trail or Amazon Trail. The primary
goal is to produce a quality edutainment title that can be used free
of cost. The secondary goal is to make it easy for other edutainment
trail games to be created. PyTrails will be Python based and uses
PyGame. The engine will allow following a branching map including
making stops to rest, hunt, or trade. Additional choices such as
shopping and fording rivers may be available at special points. Each
of these activities will be replacable in other trail games as to
allow for maximum flexibility.

Mentors: Cameron Laird, Andrew Kuchling

= mmpy -- A garbage collection tool kit in Python =

(Carl Friedrich Bolz)

The project aims at producing a framework for writing and
evaluating garbage collectors in Python. The interfaces to
the low level memory and to the object model will be general
enough to make it usable for a wide range of projects in
need for garbage collection as well as for teaching and
research purposes. It will be designed with flexibility and
modularity in mind to encourage component reuse. It aims a
being directly useful for the PyPy project and translatable
by its translation tools.


Mentors: Samuele Pedroni, Armin Rigo

= Efficiently Analysing Data Polymorphism and Deducing Generics in Shedskin =

(Mark Dufour)

As part of my Master's Thesis, I am working on a Python-to-C++ compilation system, called Shedskin. Currently, it performs static type inference based on two techniques. The Cartesian Product Algorithm is used to handle parametric polymorphism (calling functions with different combinations of argument types); single-level class duplication, or 1CFA, is employed to handle data polymorphism (mostly polymorphic containers, such as list; in 1CFA, each allocation site gets its own class type, so we can analyze these (somewhat) precisely.) Run-time checks such as 'isinstance' are considered during inference. Further, short tuples are analyzed internally, which of course is especially important in case of Python. 

Based on the statically determined type information, the compiler currently performs stack- and static pre-allocation (using a simple escape analysis, and the static call graph respectively) and unboxing. Further, it generates polymorphic inline caches or virtual calls when a singleton type set cannot be deduced.
 
Single-level class duplication is imprecise, because it only duplicates class types once for each allocation site, and allocation sites may be duplicated during analysis (as CPA possibly creates many templates for each function.) Extending it to N levels, or NCFA, would make the analysis terribly exponential and still not precise for deep polymorphism. For the summer of code, my main goal will be to efficiently and precisely handle data polymorphism up to arbitrary depths. I am currently looking into an iterative technique developed by John Plevyak. (Tiejun & Wang's technique is incomprehensible, and I don't see how the method used in Starkiller would work.) My other large goal will be to generate generics of appreciable complexity, based on the inferred types, i.e. to determine whether types may be uniformly parameterized, and to generate class and function templates. Finally, I will integrate an existing C++ garbage collector into the run-time system in order to clean up objects that could not be stack- or statically pre-allocated. 


Mentors: Jeremy Hylton, Brett Cannon

= Mailbox modification =

(Gregory K. Johnson)

I intend to rewrite the Python library's mailbox module to support
mailbox modification. I will extend the module's API (e.g., mailboxes
will sport dictionary-like mapping) and enhance certain existing
functionality (e.g., message objects will maintain
mailbox-format-specific attributes). Full backward compatibility will
be maintained.

Mentor: Andrew Kuchling

= Memory Profiler =

(Nick Smallbone)

I would like to apply to work over the summer on a Python memory
profiler, as listed at CodingProjectIdeas.

To see how much work is involved in this, I've put together a
prototype, which tries to enumerate all objects from a root,
calculating the size of each object it finds.

Mentors: Michael Hudson, Jeremy Hylton
+The project website, where you can also find the original proposal sent to
Google, is at http://openexvis.sourceforge.net/ and the progress during
Summer of Code is tracked in Tero's blog at http://www.teroajk.net/blog/ .

== Outcome ==

David says: The project successfully animates several Python program through instrumentation of the low-level hooks inside standard Python interpreters.  The demonstration program works, and a screencast is avalable off of the project's home page for people who don't have access to a Linux machine with the required dependencies installed.  The project needs more work before it can be useful in an educational context, and there are probably features & bug fixes needed, but the foundational bits seem good!


= Wax =

Student: Jason Gedge

Mentors: Hans Nowak

== Proposal ==

This project consists of updating the Wax library for Python. Code
will be updated, or even added, to further develop the Wax
library. Also, a primary focus will be that of documentation, which
Wax currently lacks.

== Outcome ==

Many additions to Wax were released in these two months, including WaxRF (a system to load forms from XML, much like wxPython's XRC), a documentation viewing/generating tool, and a number of new controls. Some existing issues were also fixed (OverlaySizer, Wizard). More information at http://zephyrfalcon.org/weblog2/arch_e10_00810.html#e817.

= Data Serving/Collection Framework in Python/WSGI =

Student: Ho Chun Wei

Weblog: http://cwho.blogspot.com/

Mentors: Ian Bicking

== Proposal ==

A framework based on bulk data serving/collection via the
internet. Bulk data are in the form of files that could easily be
several hundred MB (not surveys or simple POST data).

The client has a file repository that it wishes to sync to the server
(a WSGI application). This server should be able to facilitate
transfer via a number of protocols, including HTTP file transfer, HTTP
form upload, FTP, Email.

This project is aimed not at yet another ad-hoc file transfer or p2p
file-sharing program but as a persistent production setup for
transferring data from data collection sites/areas to a server,
possibly via internet through different methods to get through strict
organizational firewalls and web admins.

== Outcome ==

I'd consider the code a 
solid beta.  There's probably packaging and documentation work to be 
done, but that's the kind of thing that only user experience feedback 
will really accomplish well, IMHO.  Also it's something where there's No 
One Right Way (yet!) to distribute web applications, so it'll evolve in 
time.

We changed projects at the start from his original proposal to something 
we both found more interesting/useful, a WebDAV server.  There were good 
generic WebDAV tests already written (acceptance only, but picky 
acceptance tests).
-Line 143:
+Line 134:
-(Elliot Cohen)
+Student: Elliot Cohen

Mentor: James Tauber

Web page: [http://pbnt.berlios.de/ PBNT – Python Bayesian Network Toolbox]

== Proposal ==
-Line 158:
+Line 155:
-Mentor: James Tauber
+For (almost) daily updates please see http://elliotpbnt.blogspot.com.

== Outcome ==

Elliot Cohen has done a great job with the Python Bayes Network  
Toolbox. He will undoubtedly continue working on pbnt after the SoC but I've  
encouraged him to get it ready for an initial release to get it out  
to a wider audience.

I have high hopes that this package will be useful to people wanting  
to do real work on Bayesian Network based inference and learning and  
I'm excited that the existence of pbnt means Python becomes a natural  
choice for this sort of work.

= Efficiently Analysing Data Polymorphism and Deducing Generics in Shedskin =

Student: Mark Dufour

Mentors: Jeremy Hylton, Brett Cannon

== Proposal ==

As part of my Master's Thesis, I am working on a Python-to-C++ compilation system, called Shedskin. Currently, it performs static type inference based on two techniques. The Cartesian Product Algorithm is used to handle parametric polymorphism (calling functions with different combinations of argument types); single-level class duplication, or 1CFA, is employed to handle data polymorphism (mostly polymorphic containers, such as list; in 1CFA, each allocation site gets its own class type, so we can analyze these (somewhat) precisely.) Run-time checks such as 'isinstance' are considered during inference. Further, short tuples are analyzed internally, which of course is especially important in case of Python. 

Based on the statically determined type information, the compiler currently performs stack- and static pre-allocation (using a simple escape analysis, and the static call graph respectively) and unboxing. Further, it generates polymorphic inline caches or virtual calls when a singleton type set cannot be deduced.
 
Single-level class duplication is imprecise, because it only duplicates class types once for each allocation site, and allocation sites may be duplicated during analysis (as CPA possibly creates many templates for each function.) Extending it to N levels, or NCFA, would make the analysis terribly exponential and still not precise for deep polymorphism. For the summer of code, my main goal will be to efficiently and precisely handle data polymorphism up to arbitrary depths. I am currently looking into an iterative technique developed by John Plevyak. (Tiejun & Wang's technique is incomprehensible, and I don't see how the method used in Starkiller would work.) My other large goal will be to generate generics of appreciable complexity, based on the inferred types, i.e. to determine whether types may be uniformly parameterized, and to generate class and function templates. Finally, I will integrate an existing C++ garbage collector into the run-time system in order to clean up objects that could not be stack- or statically pre-allocated. 

== Outcome ==

Plevyak's method has been implemented. Function and class templates of arbitrary complexity are generated. The Boehm garbage collector has been integrated. In total, about 2000 lines have been added during the SoC. All 123 unit tests compile correctly, including 6 larger non-trivial programs of between 100 and 200 lines. These are analyzed in a few seconds on a fast CPU, and become 10-50 times faster after compilation. Mission accomplished! :)

([http://mail.python.org/pipermail/summerofcode/2005-September/000179.html Announcement of first release])

= Mailbox modification =

Student: Gregory K. Johnson

Web page: http://gkj.freeshell.org/soc

Mentor: Andrew Kuchling

== Proposal ==

I intend to rewrite the Python library's mailbox module to support
mailbox modification. I will extend the module's API (e.g., mailboxes
will sport dictionary-like mapping) and enhance certain existing
functionality (e.g., message objects will maintain
mailbox-format-specific attributes). Full backward compatibility will
be maintained.
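As an illustration of what a dictionary-like mailbox with format-specific message attributes can look like, here is a minimal sketch using the `mailbox` module as it exists in current Python standard libraries. It assumes that API rather than the exact code delivered in the sandbox, and the file name and addresses are made up for the example.

```python
import mailbox
import os
import tempfile
from email.message import Message


def demo_mailbox(path):
    """Add, read, modify and remove a message through the mapping interface."""
    box = mailbox.mbox(path)          # created on disk if it does not exist
    box.lock()                        # lock before modifying the mailbox
    try:
        msg = Message()
        msg["From"] = "alice@example.org"
        msg["To"] = "bob@example.org"
        msg["Subject"] = "hello"
        msg.set_payload("First message.\n")

        key = box.add(msg)            # returns the key of the new message
        subjects = [box[k]["Subject"] for k in box.keys()]

        stored = box[key]             # an mboxMessage instance
        stored.set_flags("R")         # mbox-specific attribute: mark as read
        box[key] = stored             # replace the stored message
        flags = box[key].get_flags()

        box.remove(key)
        remaining = len(box)
        box.flush()                   # write pending changes to disk
    finally:
        box.unlock()
        box.close()
    return subjects, flags, remaining


with tempfile.TemporaryDirectory() as tmp:
    subjects, flags, remaining = demo_mailbox(os.path.join(tmp, "demo.mbox"))
```

Keys are opaque handles chosen by the mailbox, so code should use the value returned by `add()` rather than assume a particular numbering.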

== Outcome ==

The modified version of the mailbox module is in the nondist/ tree of Python CVS (nondist/sandbox/mailbox).  I think it's acceptable for inclusion in the standard library; it is now waiting for a second opinion from another Python developer, to see if there's any disagreement.  I expect this code will get into Python 2.5.

= Interactive Python Notebooks =

Students: 
  * Toni Alatalo, [http://an.org/programming], document transformation functionality.
  * Tzanko Matev, graphical user interface.

Mentors: Fernando Perez and Robert Kern

== Proposal ==

Original proposal: [http://ipython.scipy.org/google_soc/ipnb_google_soc.pdf].

Further details:
[http://www.scipy.org/wikis/featurerequests/NoteBook]

== Outcome ==

The students have successfully implemented a prototype of a GUI and the underlying machinery for document transformation.  Currently the wxPython-based GUI accepts normal Python code and the extended set of IPython commands, normal text (not #comments, but regular text processed as a separate entity) and embedded figures.  The XML-based document transformation infrastructure can render these files into LaTeX, HTML or PDF, including mathematical notation.

The code is still considered alpha quality, but the basics are in place.  We are in the process (as of 9/8/05) of cleaning things up to allow early testers to download it and play with it.  Those willing to test from the raw Subversion repository can do so by checking out the nbshell component:

svn co http://ipython.scipy.org/svn/ipython/nbshell/trunk nbshell

which includes instructions on the other pieces needed.

Those interested in following the development can do so either on the ipython-dev mailing list, or by browsing the Trac pages for IPython at:

http://projects.scipy.org/ipython/ipython


= Porting _sre.c and arraymodule.c to Python =

Student: Niklaus Haldimann

Weblog: http://ubique.ch/soc

Mentors: Armin Rigo, Samuele Pedroni

== Proposal ==

I would like to create a port of the standard library modules "_sre" and "array" to pure Python. This will benefit alternative Python implementations like PyPy, Jython and IronPython. These projects all have to provide their own implementations of standard library modules written in C if they're not available in pure Python.

== Outcome ==

Niklaus Haldimann has done everything planned and more on writing a pure
Python implementation of the _sre (core of regular expressions) and
array modules (http://codespeak.net/svn/user/nik).  In addition to the
base work, he made regular releases of the _sre module (three so far).
These are drop-in replacements for the equivalent C modules; for example,
_sre.py could be used in projects that need to run on top of Python 2.3
or earlier, while using regexps that require the recursion-less approach
introduced in 2.4 and supported by _sre.py.

He has also integrated the modules with PyPy (he was present at two PyPy
sprints); in the case of the _sre module, he ported it to
"interpreter-level", which means that in addition to integration, he
made sure that the code is static enough to be automatically
translatable to good C code, like the rest of PyPy.  This should give an
excellent write-once, translate-to-anything regular expression engine
that might also be used as the basis for a Jython or IronPython module
(or even back to a C extension module for CPython).


= Object-Oriented File System Virtualisation =

Student: Adam Kerz

Mentor: Trent Mick.

== Proposal ==

Create an object oriented model of a file system in Python that can be used to interface many different resource types (with appropriate implementations).

= Wax GUI for Python =

Student: Abhishek Reddy

Mentor: Hans Nowak

== Proposal ==

Wax requires work on four broad fronts. Firstly, support for several
basic controls needs to be added, some of which are listed
above. Secondly, the design of the whole module has to be reviewed,
particularly focusing on the initialisation. Thirdly, there are
teething problems with passing data between Wax and wxPython that must
be looked at. Fourthly, documentation, presently lacking, needs to be
written.

== Outcome ==

In spite of the promising proposal, no work (code or documentation) was delivered. After the initial contact, I have not heard from this student, for reasons still unknown. (Mentor list and Chris Di``Bona were notified.)

= PyTrails =

Student: Jennifer Dozar

Mentors: Cameron Laird, Andrew Kuchling

Web page: [http://seul.org/~jennifer/proj/pytrail/ PyTrails]

== Proposal ==

I'm working on an extensible open-source engine for implementing
trail-style games such as [http://www.gamespot.com/gamespot/features/all/greatestgames/p-34.html Oregon Trail] or Amazon Trail. The primary
goal is to produce a quality edutainment title that can be used free
of cost. The secondary goal is to make it easy for other edutainment
trail games to be created. PyTrails will be Python-based and will use
PyGame. The engine will allow following a branching map, including
making stops to rest, hunt, or trade. Additional choices such as
shopping and fording rivers may be available at special points. Each
of these activities will be replaceable in other trail games, so as to
allow for maximum flexibility.

= mmpy -- A garbage collection tool kit in Python =

Student: Carl Friedrich Bolz

Mentors: Samuele Pedroni, Armin Rigo

== Proposal ==

The project aims at producing a framework for writing and
evaluating garbage collectors in Python. The interfaces to
the low level memory and to the object model will be general
enough to make it usable for a wide range of projects in
need of garbage collection as well as for teaching and
research purposes. It will be designed with flexibility and
modularity in mind to encourage component reuse. It aims at
being directly useful for the PyPy project and translatable
by its translation tools.


= Memory Profiler =

Student: Nick Smallbone

Weblog: http://starship.python.net/crew/mwh/blog/nb.cgi/portal/nickblog

Mentors: Michael Hudson, Jeremy Hylton

URL: [http://pysizer.8325.org]

== Proposal ==

I would like to apply to work over the summer on a Python memory
profiler, as listed at CodingProjectIdeas.

To see how much work is involved in this, I've put together a
prototype, which tries to enumerate all objects from a root,
calculating the size of each object it finds.
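The approach described above, enumerating objects from a root and summing their sizes, can be sketched as follows. This is not the student's prototype; it is an illustrative assumption built on `gc.get_referents` and `sys.getsizeof` from the modern standard library, and the function names are hypothetical. Classes, modules and functions are skipped so that the walk stays within the data reachable from the root instead of wandering into the whole interpreter.

```python
import gc
import sys
import types

# Objects of these kinds are shared infrastructure, not data owned by the root.
_SKIPPED = (type, types.ModuleType, types.FunctionType, types.BuiltinFunctionType)


def reachable_sizes(root):
    """Map id(obj) -> shallow size in bytes for every object reachable from root."""
    sizes = {}
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in sizes or isinstance(obj, _SKIPPED):
            continue                      # already counted, or deliberately ignored
        sizes[id(obj)] = sys.getsizeof(obj)
        stack.extend(gc.get_referents(obj))
    return sizes


def total_size(root):
    """Total size of the object graph below root, counting shared objects once."""
    return sum(reachable_sizes(root).values())


shared = [0] * 10
nested = [shared, shared]                 # the same list referenced twice
cyclic = []
cyclic.append(cyclic)                     # a self-referencing list
```

Because visited objects are tracked by identity, `shared` contributes to `total_size(nested)` only once, and the walk terminates on `cyclic`.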

== Outcome ==

[http://starship.python.net/crew/mwh/blog/nb.cgi/view/nickblog/2005/09/01/0 The student's assessment]

= asyncIO =

Student: Vladimir Sukhoy

Mentor: Mark Hammond

== Proposal ==


Library releases available from: http://developer.berlios.de/project/showfiles.php?group_id=4124


= Profile Replacement =

Student: Floris Bruynooghe

Weblog: http://bruynooghe.blogspot.com/

Mentor: Brett Cannon

== Proposal ==

[Original idea from ProfileReplacementProject page.]

The current profiler is not free according to the Debian Free Software Guidelines (http://bugs.debian.org/293932) and has been taken out of the main Debian distribution.  This affects many users, as the profiler is integrated into other programs such as ipython, which lose functionality without the profiler available.

The aim is to write a wrapper for hotshot that will act as a drop-in replacement for the profile module.  hotshot was chosen as the base since it is much better tested than any newly written code would be.  Secondly, an independent stats module will be written for hotshot so that loading of the data will be much faster.  This module will then also have a 100% pstats-compatible wrapper.

If time is left over once all of this is completed, one of the things to investigate is whether it is possible to make hotshot thread-aware.

The project is registered as pyprof on savannah.nongnu.org: http://savannah.nongnu.org/projects/pyprof

== Outcome ==

Floris seems to have met all the goals.  While I (Brett) have not had a chance to do a thorough testing, Floris has a nice set of tests that seem to prove that he was successful.

= PythonModulePackaging =

Student: Vincenzo Di Massa

URL: http://udu.wiki.ubuntu.com/PythonModulePackaging

Mentor: Matthias Klose

== Proposal ==

'''(an ubuntu python SoC project)'''

Create a mechanism for fully automated packaging of Python modules based on an upstream release. Support different Python implementations and different versions of CPython (needed when not all software can run with the latest/default Python version at the time of an Ubuntu release).


= Refactor Pirate (Python on Parrot) =

Student: Curtis Hall

Mentor: Michal Wallace

URL: http://pirate.tangentcode.com/

== Proposal ==

This project didn't get mentioned here before, probably because it was funded through the Perl Foundation. :)

This wasn't the actual proposal, but basically: 

Pirate is the Python-to-Parrot compiler. Parrot is the virtual machine for Perl 6, but will also support a variety of other languages. The goal for this project was to refactor Pirate. Instead of a few giant classes to compile all of Python in one pass, we now have trees of many smaller compilers (for example, a while loop compiler and an assignment compiler and so on). The idea is that these mini-compilers can compile similar concepts from a variety of languages (for example, while loops in Perl or Lua are pretty similar to a while loop in Python, other than that tricky else clause).

== Outcome ==

This refactoring was actually a huge project. Curt worked like crazy and finished a large chunk of it by the deadline and decided to keep working. Currently (Sep 15 2005) there are still a small number of test cases that are not passing, and he plans to have them working and posted on the site within the week.
