A Restructured Standard Library

Although many new language features have been introduced to Python, and many new modules have been added to the standard library over the years, the structure of the standard library has remained relatively static throughout most of Python's lifetime. Meanwhile, new additions to the library have made the selection of appropriate library facilities relatively difficult, even for experienced developers. For example:

A persuasive argument for Python was once the simplicity of its standard library's layout in comparison to, for example, the "aggressively hierarchical" layout of the standard Java APIs. But a large number of overlapping modules and packages within the Python standard library has eroded the coherency it once had relative to Java's proliferating APIs (see java.sun.com for details). It therefore seems appropriate to reorganise the library's layout in order to promote a more memorable and intuitive structure that can be more coherently documented.

A Note on Backward Compatibility

One argument against reorganising the standard library is that "if you ignore them, they won't bother you": the presence of many apparently haphazardly named modules is not a problem unless you need to import many of them. Fortunately, this observation can be used to work in favour of a reorganisation. The old module and package names can be retained alongside a new layout, so that existing software continues to work by importing modules via their old names. Improved documentation can focus on the new layout, and reference material describing the old layout could also be provided to assist those working with older software. One disadvantage might be the additional space required by two different library layouts. Another disadvantage, more serious than the first, is the case where a top-level module is replaced by one with the same name but with different functionality; strict backward compatibility measures would be necessary to avoid migration issues in such cases.
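As a rough illustration of how old names might be retained alongside a new layout, the following sketch registers legacy names as aliases in sys.modules. The names old_json and old_textwrap are hypothetical legacy names, and json and textwrap merely stand in for relocated modules so that the sketch runs on a current Python 3; this is one possible mechanism under those assumptions, not a proposal for the actual implementation.

```python
import importlib
import sys

# Hypothetical mapping from legacy names to their locations in a new layout.
# "json" and "textwrap" stand in for relocated modules so the sketch runs today.
LEGACY_ALIASES = {
    "old_json": "json",
    "old_textwrap": "textwrap",
}


def install_legacy_aliases(aliases=LEGACY_ALIASES):
    """Register each legacy name in sys.modules as an alias for its new module."""
    for old_name, new_name in aliases.items():
        module = importlib.import_module(new_name)
        # setdefault leaves alone any module already registered under the old name
        sys.modules.setdefault(old_name, module)


install_legacy_aliases()

# Existing software keeps its old import statement; the import system finds
# the alias in sys.modules, so no file called old_json.py needs to exist.
import old_json
import json

print(old_json is json)
```

Because both names refer to the same module object, this approach avoids keeping two copies of the code, although it does nothing for the harder case where an old name is reused for different functionality.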

Potential Areas of Improvement

The following sections present observations about the current situation and possible recommendations for future editions of the standard library.

Activities, Grouping and Redundancy

The current standard library employs many modules as siblings at the top level of a relatively shallow namespace hierarchy. Many modules have been introduced to remedy, augment or partially replace existing modules, leading to problems of redundancy and incoherency. However, a policy of preserving APIs which resemble "system", "native" or "platform" APIs has also been maintained, leading to the provision of numerous functions and abstractions in modules such as os, select, mmap, errno, getopt, and so on.

Overlapping Module Groups

The following groups of modules exhibit overlapping functionality:

Modules in the above groups would be consolidated either within a single module or organised into a more intuitive package layout in a restructured standard library.

Functional Module Groups

The following groups of modules may intentionally provide similar functionality through different implementations, or may provide complementary functionality that belongs within a common "functional group":

Modules in the above groups would be placed in intuitively named packages, possibly with improved names.

Recommendations

Just as the current standard library documentation divides the modules into particular groups, albeit with only moderate success, the above functional groupings could be used to define package boundaries that are more useful in distinguishing between different activities. A cursory review of the above could suggest the following set of packages:

The names employed above may not be entirely suitable, and due to the ambiguity of certain category names, it might be appropriate to establish packages with certain names (eg. parsing) within other packages (eg. compiler), thus providing a level of context (eg. Python source code parsing, as opposed to HTML/SGML parsing). One inspiration for top-level category names could be the MIME media type hierarchy which uses such names as application, audio, image, text, and so on, although enthusiasm for replicating that hierarchy would need to be restrained in areas beyond file type handling.

The issue remains of providing access to "system", "native" or "platform" APIs, especially since developers with a systems programming background may wish to make use of such APIs in preference to others, possibly to implement other abstractions or to maintain compatibility with (or resemblance to) other works. We may decide to retain a package for such APIs and not to remove them entirely from the standard library, despite the duplication of functionality that this might suggest.

Naming

The current standard library employs a number of naming conventions:

Recommendations

In order to simplify the recollection process, names should follow a consistent naming scheme, arguably favouring descriptive names which mention the nature of the activity supported. We might decide to permit only lower-case characters, together with numbers (only where absolutely necessary), although this can often appear confusing with acronyms and word combinations (eg. stringio, cstringio). However, since the use of acronyms may potentially be relegated to the level of class names, we may at that level employ mixed-case class names, along with upper-case acronyms as apparently tolerated by PEP 8 "Style Guide for Python Code". Thus, StringIO.StringIO would not become stringio.StringIO, but perhaps something like stringfile.StringIO or something even more descriptive.

In some situations it may be advisable to retain technical names instead of employing names which obscure the purpose of the module. For example, the base64 module refers to a specific kind of encoding, but any invented descriptive name for this module may prove to be verbose and yet fail to accurately communicate the same information.
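The lower-case naming rule discussed above could be checked mechanically. The following is a minimal sketch assuming the rule is "a lower-case letter followed by lower-case letters or digits"; the function name follows_scheme and the exact pattern are assumptions made for illustration, not part of any agreed convention.

```python
import re

# Assumed rule: starts with a lower-case letter, then lower-case letters or digits.
MODULE_NAME = re.compile(r"[a-z][a-z0-9]*")


def follows_scheme(name):
    """Return True if a module name satisfies the assumed lower-case rule."""
    return MODULE_NAME.fullmatch(name) is not None


# Names taken from the discussion above
for name in ("StringIO", "cStringIO", "stringfile", "base64"):
    print(name, follows_scheme(name))
```

A check like this only enforces the form of a name; whether a name such as stringfile is more descriptive than stringio remains a matter of judgement.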

Top-level Organisation

Currently, the standard library "owns" a number of top-level module names; indeed, since the library is relatively flat, a large number of module names are effectively reserved. Here is some anecdotal evidence of the confusion caused when standard library names are inadvertently chosen for other things:

"Some care is required in picking a name for the application. I thought 'calendar' would be a good name for a test application - but it turns out that this conflicts with the Python calendar library." -- Comment on the Django tutorial

Recommendations

In a standard library organisation where no encapsulating top-level package exists (eg. std), care must be taken not to conflict with existing or likely independent package names. One potential conflict involves the config package, already registered in the Python Package Index. A solution may involve minimising the footprint of the library by creating as few packages as possible and by giving those packages distinct, meaningful, and possibly unlikely names.

Alternatively, an encapsulating top-level package could be chosen, with a name like one of the following suggestions:

Module Functionality

The diversity of module naming provides an "archaeological" guide to the accumulation processes operating within the standard library, yet more fundamental changes in style, recommended practices and techniques exist within the code of the modules themselves. Since the results of such differing implementation techniques manifest themselves as differently organised class hierarchies or interaction patterns, users of standard library modules must often master styles of usage which are unnecessarily complicated for the task at hand or which diverge from previously accepted abstractions for similar tasks.

However, for certain kinds of tasks it is appropriate to employ differing approaches and thus expose differing representations to users. For example, the choice of XML parsing module may involve trade-offs with respect to resource usage, convenience and performance, and no single approach is likely to satisfy the needs of all users.

Styles of Organisation/Interaction in Modules

The following styles of class organisation or interaction patterns appear in the standard library:

The following styles of behaviour configuration are employed in the standard library:

Recommendations

Clearly, a diversity of patterns, mechanisms and styles is necessary to provide different approaches to particular tasks (as noted above). However, the revision of certain approaches and the subsequent "archaeological" accumulation of modules suggest that contributors have not been able to settle, at least initially, on a style widely regarded as satisfactory by many standard library users.

An interesting example of evolving styles, as well as a number of peculiarities in the APIs provided, can be found in the urllib and urllib2 modules. Here, a moderately simple initial API has evolved into a more complicated (and presumably more powerful) subsequent API, but despite the conveniences provided in "loading up" the configured objects with specific handler functionality in advance, an alternative might involve "flattening" the style of interactions by having users process responses explicitly using separate objects or functions.
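The contrast between the two styles can be sketched using urllib.request, the Python 3 module that took over the role of urllib2. The RecordingHandler class is hypothetical and exists only for illustration, and a data: URL is used so that no network access is needed; this is a sketch of the general idea rather than a recommendation for either style.

```python
import urllib.request

URL = "data:text/plain,hello"  # data: URL, so no network access is needed


class RecordingHandler(urllib.request.BaseHandler):
    """Hypothetical handler that notes every data: response passing through."""

    def __init__(self):
        self.seen = []

    def data_response(self, request, response):
        # The opener calls <scheme>_response methods on its handlers
        self.seen.append(request.full_url)
        return response


# Style 1: "load up" the opener with handler objects in advance
recorder = RecordingHandler()
opener = urllib.request.build_opener(recorder)
with opener.open(URL) as response:
    handler_body = response.read()

# Style 2: the "flattened" alternative, processing the response explicitly
flat_log = []
with urllib.request.urlopen(URL) as response:
    flat_log.append(response.url)
    flat_body = response.read()

print(handler_body == flat_body, recorder.seen == flat_log)
```

In the first style the processing is hidden inside the configured opener, which is convenient once set up; in the second it is visible at the call site, at the cost of repeating it wherever responses are handled.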

Proposals

The most natural starting point for the definition of a restructured standard library is the package hierarchy itself. Taking the grouping recommendations into consideration, in order to identify broad categories, and taking the naming recommendations into account, we might define a more complete hierarchy:

Additional Categorisation

Here, additional categorisation is introduced in order to distinguish between categories in different contexts. For example, http packages appear in both the client and server top-level packages. Instead of dividing the previously identified http category in this way, we might have decided to preserve a single http package and divide it into client and server subpackages. However, as suggested above, we regard the client and server categorisations as being more important than one of many technologies that may be relevant to both of these categorisations.

Difficult Categories and Packages

Some categories may be established at the top level despite their nature suggesting a placement in some other category. For example, the email package could in certain respects be placed in either the archive or text packages, but since this might appear counterintuitive to different users of the package, a separate placement hopefully eliminates confusion and gives the package a deservedly more prominent status in the library.

Some modules or sections of functionality can be awkward to categorise. For example, the processing of URLs as attempted by the urlparse module could be placed in various networking categories or in some other category, since URIs/URLs are also used in contexts unrelated to networking and the Internet (eg. in XML namespaces and RDF identifiers). A compromise may therefore be necessary, placing a proposed url module in the net top-level package, for example.

Some categories are difficult to justify from the selection of modules available. For example, an io package (input/output) may contain all modules involving things like streams and files, but it may be the case that such modules belong elsewhere, or perhaps the name of such a top-level package should be more suggestive, such as files or streams, even if such names are arguably too specific or start to cover other topics: a files package might include file metadata processing, too.

System Packages

As noted above, a special "system" ("native" or "platform") package could be established. Care should be taken, however, to avoid filling such a package with other packages that really ought to be disassembled, reorganised or recategorised.

Runtime Packages

The current standard library has a sys package along with other auxiliary packages which affect the behaviour of the runtime system. It is suggested that these packages be available under the runtime top-level package. Although sys might be a name more compatible with existing programmers' expectations, if that name were to be preserved, the system package would need to take one of the other proposed names instead.

Editorial Notes

This is currently a draft, featuring a number of points that should be discussed rather than being interpreted as a final opinion or a final set of recommendations. -- PaulBoddie

Open Issues

  1. Should there be a top-level package representing the entire standard distribution, e.g., "std", as in "from std.database import anydbm"? -- SkipMontanaro

    • I think the proposed hierarchy (which is obviously tentative) should be at the top level, although the risk of name collisions with independent packages and the issue of how "selfish" the standard library should be ought to be worked out. -- PaulBoddie

    • It's not obvious to me that putting all these packages at the top level will improve matters significantly. On the one hand, people have been living with the current flat standard module namespace and using names that avoid collisions. By creating a bunch of top-level packages that avoid the current names you run the risk of stomping on names others have selected for their own use (text, config, audio, image, database all seem like names people might have chosen). By giving Python its own one package namespace you avoid most of that. -- SkipMontanaro

    • It's a trade-off to avoid Java-style deep hierarchies: we could move the xml package up out of the text package, but would we then want it under std? Perhaps a conservative set of names is best: put client and server under net, for example, but reserve things like audio and image. I suspect that generic standard library names are least likely to collide with published packages (and only config from the suggested names seems to do so), and there's an argument that the standard library provides definitive answers: the archive package is the archive package for Python. Would prefixing such solutions with std not add "conceptual boilerplate"? -- PaulBoddie

    • Would it make any sense to have a few top-level slots for people's packages? Things like "pypi" and "local" or "optlib", "ext" and "user" (or "spkg" only), so that non-std packages (which I assume will be less used than the standard lib) carry the burden of deep hierarchies. It would also allow for easy hooks for some "separated Python environments" functionality and would extend the organization down to site-packages (e.g. PIL and numpy are de facto "optlib", my own dirty scripts are "local" or "trash"). And if someone really needs some special module in the top-level, a simple mv hack should be enough. -- ajaksu

    • The danger of having user, local, trash and other subjective packages is that as soon as you promote your work to something non-dirty, you have to go and change all the imports. And people are likely to forget to do this when releasing their code, too. Of course, definitive naming in the Java world involves strict hierarchies which use Internet domains (apart from Sun's stuff, understandably, and Oracle's stuff, not so understandably), but this just produces long branches. I guess people would just have to do due diligence before choosing names, but this isn't a different situation to what we have now. Still, "non-functional" categories at the top-level is an interesting idea. -- PaulBoddie

  2. Should client and server be under net? Would this be too much of the obvious? -- PaulBoddie

  3. Should encodings be under text or merged with text? -- PaulBoddie

  4. Is the io package too weak? Things like XML parsers do I/O and they're probably best in the text package (or their own top-level package). -- PaulBoddie

    • I question whether the xml package belongs in text. Lots of people use XML for stuff other than HTML on steroids. -- SkipMontanaro

    • Yes, it's something which is potentially important enough to have its own category. -- PaulBoddie

  5. Are the io package and files package too close in purpose? Could things like tempfile (with a nicer API) be part of the files package? -- PaulBoddie

CodingProjectIdeas/StandardLibrary/RestructuredStandardLibrary (last edited 2008-11-15 14:00:36 by localhost)
