Revision 2 as of 2006-10-06 11:37:13

A Restructured Standard Library

Although new language features have been introduced to Python continuously, and new modules have been added steadily to the standard library over the years, the structure of the library has remained relatively static throughout most of Python's lifetime. Those additions have nevertheless made it relatively difficult to select the appropriate library facilities, even for experienced developers. For example:

The simplicity of the Python standard library's layout, in comparison to the "aggressively hierarchical" layout of the standard Java APIs, was once a persuasive argument in Python's favour. But the large number of overlapping modules and packages now within the Python standard library has eroded its coherency relative to Java's proliferating APIs (see java.sun.com for details). It therefore seems appropriate to reorganise the library's layout in order to promote a more memorable and intuitive structure that can be more coherently documented.

A Note on Backward Compatibility

One argument against reorganising the standard library is that "if you ignore them, they won't bother you" - that is, the presence of many apparently haphazardly named modules is not a problem unless you need to import many of them. Fortunately, this observation can be used to work in favour of a reorganisation. The old module and package names can be retained alongside a new layout, so that existing software continues to work by importing modules via their old names. Improved documentation can then focus on the new layout, while reference material describing the old layout could also be provided to assist those working with older software. One disadvantage, however, might be the additional space required by two different library layouts.
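As a rough illustration of how an old name might be retained alongside a new one, the following sketch registers a single module object under two names in sys.modules. Both names used here (stringfile and legacy_stringio) are hypothetical, and the module is built in memory purely to keep the example self-contained; this is one possible mechanism, not a description of how any reorganisation would actually be implemented.

```python
import io
import sys
import types

# Hypothetical module belonging to a "new layout"; the name is illustrative only.
new_module = types.ModuleType("stringfile")
new_module.StringIO = io.StringIO
sys.modules["stringfile"] = new_module

# Retain a hypothetical old name by registering the same module object again.
sys.modules["legacy_stringio"] = new_module

# Existing software keeps importing via the old name; new software uses the new one.
import legacy_stringio
import stringfile

same_module = legacy_stringio is stringfile
buffer = legacy_stringio.StringIO("old name, same class")
```

Because both names refer to one module object in this sketch, the alias costs little at run time; a layout that duplicated files on disk would behave differently.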

Suggested Improvements

The following sections present observations about the current situation and possible recommendations for future editions of the standard library.

Activities, Grouping and Redundancy

The current standard library employs many modules as siblings at the top level of a relatively shallow namespace hierarchy. Many modules have been introduced to remedy, augment or partially replace existing modules, leading to problems of redundancy and incoherency.

Overlapping Module Groups

The following groups of modules exhibit overlapping functionality:

Modules in the above groups would be consolidated either within a single module or organised into a more intuitive package layout in a restructured standard library.

Functional Module Groups

The following groups of modules may intentionally provide similar functionality through different implementations, or may provide complementary functionality that belongs within a common "functional group":

Modules in the above groups would be placed in intuitively named packages, possibly with improved names.

Recommendations

Just as the current standard library documentation divides the modules into particular groups, albeit with only moderate success, the above functional groupings could be used to define package boundaries that are more useful in distinguishing between different activities. A cursory review of the above could suggest the following set of packages:

The names employed above may not be entirely suitable, and due to the ambiguity of certain category names, it might be appropriate to establish packages with certain names (eg. parsing) within other packages (eg. compiler), thus providing a level of context (eg. Python source code parsing, as opposed to HTML/SGML parsing).

Naming

The current standard library employs a number of naming conventions:

Recommendations

To make names easier to recall, they should follow a consistent naming scheme, arguably favouring descriptive names which mention the nature of the activity supported. We might decide to permit only lower-case characters, together with numbers (only where absolutely necessary), although this can often appear confusing with acronyms and word combinations (eg. stringio, cstringio). However, since the use of acronyms may potentially be relegated to the level of class names, we may at that level employ mixed-case class names, along with upper-case acronyms as apparently tolerated by [http://www.python.org/dev/peps/pep-0008/ PEP 8 "Style Guide for Python Code"]. Thus, StringIO.StringIO would not become stringio.StringIO, but perhaps something like stringfile.StringIO or something even more descriptive.
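One way to make such a scheme checkable is sketched below. The regular expression is an assumption about how "lower-case characters, together with numbers" might be expressed, and the function name follows_scheme is hypothetical; the module names tested are those mentioned in this document.

```python
import re

# Assumed rule: a lower-case letter first, then lower-case letters or digits only.
MODULE_NAME = re.compile(r"[a-z][a-z0-9]*")

def follows_scheme(name):
    """Return True if a module name matches the assumed lower-case scheme."""
    return MODULE_NAME.fullmatch(name) is not None

names = ["StringIO", "cStringIO", "stringio", "urllib2", "stringfile"]
results = {name: follows_scheme(name) for name in names}
```

A check of this kind says nothing about whether a name is descriptive, which remains a matter of judgement.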

Module Functionality

The diversity of module naming provides an "archaeological" guide to the accumulation processes operating within the standard library, yet more fundamental changes in style, recommended practices and techniques exist within the code of the modules themselves. Since the results of such differing implementation techniques manifest themselves as differently organised class hierarchies or interaction patterns, users of standard library modules must often master styles of usage which are unnecessarily complicated for the task at hand or which diverge from previously accepted abstractions for similar tasks.

However, for certain kinds of tasks it is appropriate to employ differing approaches and thus expose differing representations to users. For example, the choice of XML parsing module may involve trade-offs with respect to resource usage, convenience and performance, and no single approach is likely to satisfy the needs of all users.
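The difference in exposed representations can be seen by extracting the same information with two standard library XML modules. The sketch below uses xml.dom.minidom, which builds a complete tree in memory, and xml.sax, which delivers events to a handler; the sample document and the ModuleNames class are made up for illustration.

```python
import xml.dom.minidom
import xml.sax

DOC = "<lib><module name='urllib'/><module name='urllib2'/></lib>"

# Tree style: the whole document is held in memory and then queried.
dom = xml.dom.minidom.parseString(DOC)
dom_names = [e.getAttribute("name") for e in dom.getElementsByTagName("module")]

# Event style: a handler is notified of each element as it is parsed.
class ModuleNames(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "module":
            self.names.append(attrs.getValue("name"))

handler = ModuleNames()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names
```

The tree style is arguably more convenient for small documents, while the event style avoids holding the whole document in memory, so neither is obviously the single "right" interface.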

Styles of Organisation/Interaction in Modules

The following styles of class organisation or interaction patterns appear in the standard library:

The following styles of behaviour configuration are employed in the standard library:

Recommendations

Clearly, a diversity of patterns, mechanisms and styles is necessary to provide different approaches to particular tasks (as noted above). However, the revision of certain approaches and the subsequent "archaeological" accumulation of modules suggests that contributors have not been able to settle, at least initially, on a style widely regarded as being satisfactory to many standard library users.

An interesting example of evolving styles, as well as a number of peculiarities in the APIs provided, can be found in the urllib and urllib2 modules. Here, a moderately simple initial API has evolved into a more complicated (and presumably more powerful) subsequent API. Despite the conveniences provided by "loading up" the configured objects with specific handler functionality in advance, an alternative might involve "flattening" the style of interactions by having users process responses explicitly using separate objects or functions.
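The contrast between the two styles might be sketched as follows. This uses urllib.request, the present-day descendant of urllib2, and a data: URL so that no network access is needed; the ContentTypeRecorder handler and the content_type_of function are hypothetical names invented for this illustration, and the sketch is only one reading of what "flattening" could mean.

```python
import urllib.request

URL = "data:text/plain,hello"  # a data: URL keeps the sketch offline

# "Loaded up" style: behaviour is attached to the opener in advance.
class ContentTypeRecorder(urllib.request.BaseHandler):
    """Hypothetical handler that records the content type of data: responses."""

    def __init__(self):
        self.seen = []

    def data_response(self, request, response):
        # Invoked by the opener for every response to a data: request.
        self.seen.append(response.headers.get_content_type())
        return response

recorder = ContentTypeRecorder()
opener = urllib.request.build_opener(recorder)
opener.open(URL).close()
loaded_result = recorder.seen

# "Flattened" style: the caller processes the response explicitly.
def content_type_of(response):
    return response.headers.get_content_type()

response = urllib.request.urlopen(URL)
flat_result = content_type_of(response)
response.close()
```

In the first style the processing happens implicitly whenever the opener is used; in the second, the same step is visible at the call site, at the cost of the caller having to remember to perform it.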

Editorial Notes

This is currently a draft, featuring a number of points that should be discussed rather than being interpreted as a final opinion or a final set of recommendations. -- PaulBoddie
