Distutils Version comparison

Draft final proposal

From Trent Mick,

At the second distutils open space we agreed on the following format for a "rational version":

 N.N[.N]+[abc]N[.N]+[.(dev|post)N+]

Some examples probably make it clearer:

>>> from verlib import RationalVersion as V
>>> (V('1.0a1')
...  < V('1.0a2.dev456')
...  < V('1.0a2')
...  < V('1.0a2.1.dev456')  # e.g. need to do a quick post release on 1.0a2
...  < V('1.0a2.1')
...  < V('1.0b1.dev456')
...  < V('1.0b2')
...  < V('1.0c1.dev456')
...  < V('1.0c1')
...  < V('1.0.dev456')
...  < V('1.0')
...  < V('1.0.post456'))
True

The trailing ".dev123" is for pre-releases. The ".post123" is for post-releases, which apparently are used by a number of projects out there (e.g. Twisted). For example, *after* a "1.2.0" release there might be a "1.2.0-r678" release. We used "post" instead of "r" because "r" is ambiguous as to whether it indicates a pre- or post-release.
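The ordering in the doctest above can be reproduced with a small sort key. This is only a minimal sketch and not the attached verlib.py: the name rational_key and the regular expression are assumptions made for illustration, and the sketch skips normalisation such as treating 1.0 and 1.0.0 as equal.

```python
import re

# Hypothetical pattern for N.N[.N]+[abc]N[.N]+[.(dev|post)N+]; not taken from verlib.py.
_VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)+)"                    # 1.0, 1.2.3, ...
    r"(?:(?P<pre>[abc])(?P<prenum>\d+(?:\.\d+)*))?"   # a1, b2, c1, a2.1
    r"(?:\.(?P<tag>dev|post)(?P<tagnum>\d+))?$"       # .dev456 or .post456
)


def rational_key(s):
    """Return a tuple that sorts version strings in the order shown above."""
    m = _VERSION_RE.match(s)
    if m is None:
        raise ValueError("not a rational version: %r" % s)
    release = tuple(int(n) for n in m.group("release").split("."))
    if m.group("pre"):
        pre = (m.group("pre"),) + tuple(int(n) for n in m.group("prenum").split("."))
    else:
        pre = ("f",)  # assumed marker for a final release; sorts after 'a', 'b', 'c'
    # dev sorts before the untagged version, post sorts after it
    tag_rank = {"dev": -1, None: 0, "post": 1}[m.group("tag")]
    return (release, pre, (tag_rank, int(m.group("tagnum") or 0)))
```

With this key, sorted(versions, key=rational_key) puts the twelve versions from the doctest in the order listed there.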

The attached verlib.py has the parsing and comparison (using __lt__, __eq__, et al. on the RationalVersion class). There are a lot of doctests in the main docstring with examples.

There is also a suggest_rational_version(s) method that can be used to suggest a rational version string for a lot of version strings that are close. Using this suggest method I managed to get 81% of the versions currently on PyPI (from Martin's list) to match the RationalVersion scheme -- which I think is pretty good.

Implementation : verlib.py

Proposals

Larry Hastings

Sorry for the crazy moon proposal here, but I have example code that I think even works. (Though I'm not operating on enough sleep, so sorry 'cause I already know the code is crappy.) lch.version.py

The basic idea:

I doubt y'all will go for this. But I thought you should at least consider supporting a more flexible format. People like to express their version numbers a wide variety of ways, and most folks could find a way to map their personal weird approach to something this approach would make consistent.

(Setuptools implements almost exactly what you've described, except that it produces tuples that can be compared by the standard comparison operators; see pkg_resources.parse_version() --PJE)

Larry Hastings: Round 2--Fight!

You want strict? I can do strict. Peep dis, my homies. First we define a "version number".

Now we can define a "version string":

Though I think we should relent and allow dashes to be equivalent to periods. But I'm not going to fight about it.

For comparison purposes, a version string should be converted to a "version tuple". If we have a free hand to specify this, I suggest:

For example:

When you compare two version tuples, for each sub-tuple:

@Larry: See 'Trent Mick' section below for code that now does this (with the exception of allowing the '~' and 'r' aliases for 'dev'). --TrentMick

Erik LaBianca

Just throwing this out there since it's a little different from what was discussed earlier, but I think has some merit:

* Versions are a series of integers, e.g. 0.0.1, 1.0.0, or 12.5.7.9. Versions define the "intended API level" of the software in question.

Pre-release versions are denoted by a version, followed by a string (I suggest "pre" but it doesn't really matter), followed by a series of integers, separated by periods if needed. Pre-releases are important because while they may be an implementation of a future API version, they are likely to be buggy or incomplete.

* For instance 1.0.0pre2.1 > 1.0.0pre2 > 1.0.0pre1 < 1.0.0.

* Or for the case of daily builds, 1.0.0pre1 > 1.0.0pre0.20090327 < 1.0.0

This obviously trades away the flexibility of roll-your-own naming entirely, but makes up for it by defining a standard that leaves enough flexibility to represent most cases easily. It is easy to parse and explain, extensible, and able to cover most use cases aside from that of "backported bug fixes". I believe that eliminating words will be a net positive because it eliminates any complaints along the lines of "you included alpha, beta, and rc but where's pre-release or testing?!".
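A minimal sketch of how such versions could be compared, assuming "pre" as the marker string. The function name pre_key is made up for illustration and is not part of any proposal on this page.

```python
def pre_key(version, marker="pre"):
    """Sort key for 'N.N.N' and 'N.N.NpreN.N' style versions (illustrative only)."""
    release, sep, pre = version.partition(marker)
    release_ints = tuple(int(n) for n in release.split("."))
    if sep:
        # A pre-release sorts before the final release with the same numbers.
        return (release_ints, 0, tuple(int(n) for n in pre.split(".")))
    return (release_ints, 1, ())
```

Because the key is a plain tuple of integers, the daily-build case needs no special handling: the date is just one more integer in the pre-release series.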

Trent Mick

Code for the version format discussed at the second distutils open space (Sat evening): verlib.py

Run the script to run all its doctests.

Some stats against the list of current PyPI versions that MvL provided:

-- matches against current PyPI versions
count: 4975
RationalVersion1 matches: 1986 (39.92%)
RationalVersion2 matches: 2386 (47.96%)
-- with some naive cleaning up of PyPI versions (e.g. '1.0-alpha1' -> '1.0a1')
cleaned RationalVersion1 matches: 2499 (50.23%)
cleaned RationalVersion2 matches: 3003 (60.36%)

A link from RubyGems that might be interesting:

* Version Policy: http://rubygems.org/read/chapter/7

Georg Brandl

I just spoke with Holger Krekel and we discussed an important point: when we reject a version number on upload, we should provide at least examples for valid version numbers, or even better, if the version can be automatically normalized like 0.5-alpha1 to 0.5a1, suggest that instead. Just saying "it's irrational" or pointing to the PEP (which will probably contain much more than version numbering specs) is not user-friendly.

Just noting this here so that it doesn't get lost.

[I've added that to my list. I already have some code for this when I was calculating some stats on the current pypi versions list. --TrentMick]
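A rough sketch of what such a rejection message could look like. This is not the code from verlib.py: the names suggest and rejection_message, the alias table and the validity pattern are all assumptions made for illustration.

```python
import re

# Hypothetical spelling table; a real one would be driven by the PyPI survey.
_ALIASES = {"alpha": "a", "beta": "b", "rc": "c"}
# Assumed reading of the rational version format discussed at the top of the page.
_VALID = re.compile(r"^\d+(\.\d+)+([abc]\d+(\.\d+)*)?(\.(dev|post)\d+)?$")


def suggest(version):
    """Return a cleaned-up candidate for `version`, or None if no guess applies."""
    candidate = version.strip().lower()
    for word, letter in _ALIASES.items():
        # '0.5-alpha1', '0.5.alpha1' and '0.5alpha1' all become '0.5a1'
        candidate = re.sub(r"[-._]?%s[-._]?(?=\d)" % word, letter, candidate)
    return candidate if _VALID.match(candidate) else None


def rejection_message(version):
    """Build the text shown to an uploader whose version was rejected."""
    hint = suggest(version)
    if hint is not None:
        return "Invalid version %r; did you mean %r?" % (version, hint)
    return "Invalid version %r; valid examples: 1.0, 1.0a1, 1.0.post456" % version
```

When no normalisation applies, the message falls back to listing valid examples rather than only pointing at the PEP.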

Tom Crawley

The setuptools versioning scheme is described here. It is pretty similar to the schemes which have been proposed so far. The setuptools versioning scheme should meet most of our needs and has found wide acceptance in the community. Neither of the two current distutils versioning schemes is widely used. The setuptools versioning scheme is geared to the needs of developers, who are the primary audience of distutils, and provides a useful scheme for controlling software releases.

Third-party packagers each have their own version schemes with their own features and idiosyncrasies. We cannot accommodate every third-party packager within the Python versioning scheme. We should focus primarily on the needs of developers, as without developer buy-in the scheme will not be adopted. We can accommodate third parties by allowing inclusion of packager-specific version numbers with the metadata that is distributed with the Python application distribution. This would look something like:

version = 1.0.pre1

[rpm]
version = 1.0

This would enable a version numbering scheme per packaging scheme and would also allow for extensibility as new packaging schemes are developed. The information in the tags can be processed downstream by packaging organisations.
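As a sketch of how a downstream tool might read such metadata, here is one way to do it with the standard configparser module. The function name version_for and the [python] section name are assumptions: the top-level version line has no section header, so one is prepended before parsing.

```python
import configparser

METADATA = """\
version = 1.0.pre1

[rpm]
version = 1.0
"""


def version_for(packager, text=METADATA):
    """Return the packager-specific version, falling back to the Python one."""
    parser = configparser.ConfigParser()
    # Hypothetical [python] header so the section-less first line can be parsed.
    parser.read_string("[python]\n" + text)
    if parser.has_option(packager, "version"):
        return parser.get(packager, "version")
    return parser.get("python", "version")
```

A packager without its own section simply sees the developer's version, so new packaging schemes can be added without touching existing metadata.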

The idea put forward by Larry Hastings is an excellent method for converting pre-release and post-release tags to a purely numeric scheme. It could be included as an example of tag scheme conversion from the setuptools versioning scheme to a completely numeric packaging scheme.

Dan Callahan

Rough sketch of an idea:

A version is a series of "."-separated numeric fields. The introduction of a non-numeric character anywhere in the version string marks the version as a pre-release.

For sorting: Compare numeric fields as tuples of integers. For equivalent numeric versions, prefer ones that do not have any trailing non-numeric fields.

In essence: '3.1' == '3.1.0' > '3.1.rc' > '3.1.0.funtime32'

This simply and flexibly handles pre-release versioning. Post-releases are handled by incrementing the least significant numeric field.
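A minimal sketch of such a sort key. The name prerelease_key is made up for illustration, and so is the tie-break: the rules above do not say how two pre-releases with the same numeric fields compare, so this sketch assumes plain string comparison of the annotation, which happens to agree with the example ordering.

```python
import re


def prerelease_key(version):
    """Sort key: numeric fields first, then 'has no annotation', then the annotation."""
    m = re.match(r"(\d+(?:\.\d+)*)(.*)$", version)
    if m is None:
        raise ValueError("version must start with a number: %r" % version)
    numbers = [int(n) for n in m.group(1).split(".")]
    while numbers and numbers[-1] == 0:
        numbers.pop()  # so that '3.1' and '3.1.0' compare equal
    annotation = m.group(2).lstrip(".")
    # False sorts before True, so annotated (pre-release) versions come first.
    return (tuple(numbers), annotation == "", annotation)
```

Post-releases need no extra logic here: '3.1.1' simply has a larger numeric tuple than '3.1'.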

This is, in essence, the inverse of LooseVersion in distutils -- it handles pre-releases instead of post-releases, but still has room for "non-standard" annotations.

Matthias Klose

Pointing to the version numbering scheme used by Debian, which has been in production use for many years. Some points to note:

Examples:

  1.0 < 1.0.0 < 1.0.1 < 1.1 < 1.10

  2.1 < 1:1.0

  3.1~~svn20090328 < 3.1~alpha1 < 3.1
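The examples above can be checked with a simplified sketch of the Debian comparison rules: an optional "epoch:" prefix is compared first, then the rest is compared as alternating non-digit and digit runs, with '~' sorting before everything, even the end of the string. The function names are made up for illustration, and the sketch deliberately does not split off a "-revision" suffix the way dpkg does.

```python
import re
from functools import cmp_to_key


def _order(ch):
    """Rank one character: '~' first, then end of string, letters, everything else."""
    if ch == "~":
        return -1
    if ch == "":
        return 0
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_part(a, b):
    while a or b:
        # Leading non-digit runs are compared character by character.
        pa, pb = re.match(r"\D*", a).group(), re.match(r"\D*", b).group()
        for i in range(max(len(pa), len(pb))):
            ca, cb = _order(pa[i:i + 1]), _order(pb[i:i + 1])
            if ca != cb:
                return -1 if ca < cb else 1
        a, b = a[len(pa):], b[len(pb):]
        # Leading digit runs are compared as integers (empty counts as 0).
        da, db = re.match(r"\d*", a).group(), re.match(r"\d*", b).group()
        na, nb = int(da or 0), int(db or 0)
        if na != nb:
            return -1 if na < nb else 1
        a, b = a[len(da):], b[len(db):]
    return 0


def _split_epoch(version):
    epoch, sep, rest = version.partition(":")
    return (int(epoch), rest) if sep else (0, version)


def debian_compare(v1, v2):
    """Return -1, 0 or 1, like a classic cmp() function."""
    (e1, r1), (e2, r2) = _split_epoch(v1), _split_epoch(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _compare_part(r1, r2)


debian_key = cmp_to_key(debian_compare)
```

The key can be passed to sorted() directly, e.g. sorted(versions, key=debian_key).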

Marc-Andre Lemburg

The approach I'd like to see is simple and avoids all complicated and error prone parsing of string versions:

Package management software should always use the tuple version for version comparison and display the string version to the user.

Package repositories must provide a way to:

Ideally, they should display the version string to the user and allow searches based on these as well.

@MAL: One of the issues is being able to specify dependencies (e.g. in the current "setup_requires" or whatever). For example: "simplejson > 3.0.1". That means either a deterministic translation btwn version tuple and version string is necessary OR dependencies need to specify version *tuples*. The latter might be painful. --TrentMick

@Trent: Right, dependency definitions will need to use the tuple version, probably using a version matching function or instance, e.g. requires=PythonPackage('simplejson', supported_versions=((3,0), (3,1))). Note that just specifying a minimum version is likely not going to provide a robust setup, e.g. "simplejson > 3.0.1" would also match simplejson 4.0, but that may have a completely incompatible interface. -- MarcAndreLemburg
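To make the idea concrete, here is a sketch of what such a matching object might look like. Only the PythonPackage name and the supported_versions argument come from the comment above; the matches method and the prefix-matching semantics are assumptions made for illustration.

```python
class PythonPackage:
    """Hypothetical dependency spec that matches on version tuple prefixes."""

    def __init__(self, name, supported_versions):
        self.name = name
        self.supported_versions = tuple(tuple(v) for v in supported_versions)

    def matches(self, name, version):
        """True if `version` (a tuple) starts with one of the supported prefixes."""
        if name != self.name:
            return False
        version = tuple(version)
        return any(version[:len(prefix)] == prefix
                   for prefix in self.supported_versions)
```

With requires = PythonPackage('simplejson', supported_versions=((3, 0), (3, 1))), version (3, 0, 5) matches while (4, 0) does not, which is the robustness a bare "simplejson > 3.0.1" lacks.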

Brian Sutherland

This is more a statement of what we want/don't want than a specific implementation:

Toshio Kuratomi

Something that could be important is specifying a meaning for the version numbers (especially the first three version numbers). Blessing something like Enthought's Versioning would have several advantages.

Kay Schluehr

I basically like the initial versioning scheme proposed by Trent Mick and would just stick to it. However, I wouldn't read too much into it. Version numbers are an incredibly poor way to express compatibility issues; they serve as a simplistic heuristic at best, and one that is open to interpretation.

The WP article about versioning also mentions political and psychological implications of using version numbers. Remember that Django was long considered unstable due to not having reached v1.0. Better stick to static metadata for more reliable information and to DAGs for modeling dependencies. OO did the latter about 30 years ago. It's time to grow up.

Distutils/VersionComparison (last edited 2009-05-11 08:52:26 by dslb-088-064-058-218)
