Differences between revisions 4 and 5
Revision 4 as of 2004-02-29 19:19:37
Size: 9649
Editor: adsl-215-103-120
Comment:
Revision 5 as of 2004-02-29 19:47:59
Size: 9926
Editor: ayunami
Comment:

Distutils-based Dependency Support

Vision

We want to create a means whereby a package author can create a distutils-based package distribution that can automatically download and install the other distutils-based packages that the author's package requires. This mechanism must not require that a depended-on package explicitly support it. In other words, a newly created package should be able to depend on any existing distutils-based source distribution. (Being able to use binary distributions to satisfy dependencies is also desirable, but support of all possible binary formats should not be a sprint goal.) The means created should work with Python 2.2, must not require a modified distutils, and must be distributable by the packager so that the end user sees only the standard setup.py-based installation process.

Whatever mechanism is used to denote and process dependencies should be isolated as a library for reuse by other tool efforts, such as user-friendly package management tools. However, the package management tools themselves aren't the sprint goal, because these can be written independently. The "simplest thing that could possibly work" here is to give package authors a tool they can use to make their dependencies easy to install.

Intended Uses

  • break up monolithic systems (PEAK, Zope, Twisted) into smaller package sets with dependencies
  • allow dependencies on other systems (e.g. Twisted using PyProtocols, PEAK using zope.publisher, etc.)
  • support painless install for end users (single command to download and install "everything needed") even at the cost of a little pain for the packager(s).

Non-Goals

  • This is not an effort to develop an ultimate metadata format, repository, or best way to sign code
  • This is not a bitch session about how distutils and PyPI aren't CPAN
  • PEP 262 should be orthogonal to this effort, since we will be using standard distutils to do installations.
  • "uninstall" is out of scope

Implementation Notes

What would be the "simplest thing that could possibly work"? We could have a function to download and install a source distribution from a given URL, and setup.py could simply call that function once for each dependency.
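As a rough illustration of that "simplest thing", here is a minimal sketch in modern Python (not the Python 2.2 the sprint targets). The function name install_from_url, the workdir argument, and the injectable run parameter are all hypothetical, and the sketch assumes a .tar.gz source distribution with a single top-level directory containing setup.py.

```python
import os
import subprocess
import sys
import tarfile
import urllib.request


def install_from_url(url, workdir, run=subprocess.check_call):
    """Download a .tar.gz source distribution and run "setup.py install".

    `run` is injectable (an assumption of this sketch) so that the command
    can be inspected or logged instead of executed.
    """
    os.makedirs(workdir, exist_ok=True)
    archive = os.path.join(workdir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive)

    # Assumes the usual sdist layout: one top-level directory with setup.py.
    # Only extract archives you trust; member paths are not checked here.
    with tarfile.open(archive) as tar:
        top_level = tar.getnames()[0].split("/")[0]
        tar.extractall(workdir)

    source_dir = os.path.join(workdir, top_level)
    run([sys.executable, "setup.py", "install"], cwd=source_dir)
    return source_dir
```

A setup.py would call this once per dependency before calling setup(), which is exactly the naive approach whose shortcomings are discussed next.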

But that wouldn't work very well if someone ran setup.py more than once, and would be wasteful of time and bandwidth if the dependencies were already installed. So, we could have the setup script check if each dependency was already installed, before installing it. That would prevent the redundant installation if the script was re-run, or if the dependencies were already present.
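The "check before installing" step might look something like the sketch below. It uses importlib from modern Python, which did not exist in Python 2.2, and the names is_installed and install_missing are hypothetical. The check only asks whether a module can be located; it says nothing about whether the installed version is new enough.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located on sys.path.

    For a dotted name the parent package is imported as a side effect,
    so this is only "import-free" for top-level modules.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def install_missing(dependencies, install):
    """Call install(url) for every dependency whose module is absent.

    `dependencies` maps a module name to the URL of its source
    distribution; `install` is any callable taking that URL.
    Returns the module names that had to be installed.
    """
    installed = []
    for module_name, url in dependencies.items():
        if not is_installed(module_name):
            install(url)
            installed.append(module_name)
    return installed
```

Re-running the setup script then costs only a few lookups when everything is already present.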

This would not work very well, however, in situations where the user was not installing to the standard location, or indeed was using any non-standard options at all. Duplicating the determination of this information wouldn't be a good idea. And simply copying sys.argv to pass to a subprocess won't work either, because the parent script might be running, say, an "sdist" command, which would produce very silly results when applied to a dependency.

So, dependency installation needs to become a part of the distutils command structure, via the custom commands mechanism. Specifically, we could create a get_dependencies command and add it to a custom version of the build command. In this way, we would download and install dependencies only during building. We need this to be done before building anything else, because the dependency might contain C header files that the parent package needs in order to compile extensions.

But there's still a problem: the build command doesn't have all of the knowledge that the install command would have about the target installation location and so on. So, a get_dependencies during build could only download, extract, and build the dependencies. It would not be able to install them without manual intervention, which means that it could not build a parent package that needs a dependency to be installed first (for example, to find its C header files).

Of course, the scenario of depending on C header files probably isn't all that common. So, perhaps the simplest thing to do would be to:

  1. explicitly designate build-time dependencies, as distinct from run-time dependencies, and/or
  2. trick the parent package's build process to use header files from the build/ directory of the dependency

Dependencies that are "build-time" would then stop the build process of the parent if the dependency is not installed.

So, now it seems we would have at least two custom commands: build_dependencies and install_dependencies. We might also have fetch_dependencies and extract_dependencies commands, but it would probably be better to make fetching and extraction be responsibilities of objects representing the dependencies, thereby making it easier to have specialized installation mechanisms. (For example, someone could potentially write a dependency class to install a needed non-Python library or tool.)

Where would dependencies be downloaded? Built? Perhaps we could create a deps/ subdirectory, similar to the build/ subdirectory. Dependent packages would be downloaded there, and a subdirectory created for each one. Extraction would take place in that subdirectory. So, if we were building PyProtocols as a dependency, it would be downloaded as deps/PyProtocols-0.9.2.tar.gz, and it would be extracted in deps/PyProtocols-0.9.2. This would result in a setup.py being located at deps/PyProtocols-0.9.2/PyProtocols-0.9.2/setup.py. While the double level of directories would be redundant for distutils packages, the extra directory would make the directory structure safe for extracting non-distutils dependencies as well.
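The layout described above can be computed from the download URL alone. The following is a small sketch; the function name dependency_paths, the suffix list, and the example.com URL in the usage note are assumptions made for illustration.

```python
import os

# Archive suffixes this sketch knows how to strip (an assumed list).
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".zip")


def dependency_paths(url, deps_dir="deps"):
    """Return (archive_path, extract_dir) for a dependency URL.

    The archive is stored directly under deps/, and it is extracted
    into a subdirectory named after the archive minus its suffix.
    """
    filename = url.rstrip("/").split("/")[-1]
    for suffix in ARCHIVE_SUFFIXES:
        if filename.endswith(suffix):
            base = filename[:-len(suffix)]
            break
    else:
        raise ValueError("unrecognized archive type: %r" % filename)
    return os.path.join(deps_dir, filename), os.path.join(deps_dir, base)
```

For a URL ending in PyProtocols-0.9.2.tar.gz, this yields deps/PyProtocols-0.9.2.tar.gz and deps/PyProtocols-0.9.2 (with the platform's path separator), so a typical source distribution's setup.py ends up one directory further down.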

So how do we run the subcommands? There's a distutils.core.run_setup() function that could be used to run the child setup scripts, after first changing to the subdirectory. However, if a dependency expects to import anything special in its setup script (i.e. it depends on sys.path including the setup directory), it will not work. This strongly suggests that running dependencies' setup scripts in a subprocess is the best way to ensure complete compatibility with all existing distributed packages.

So, can we just pass our sys.argv to the child process? Tentatively, yes. There are only two ways this could foul up the child's configuration:

  1. files or directories were specified on the command line using relative paths
  2. the user edited the parent package's setup.cfg to specify options.

Both of these issues could be fixed by regenerating a custom command line from the actual finalized command options for the parent distribution. But it's not clear whether that's worth it. During the sprint, we may just want to use the parent sys.argv, and isolate it in a get_setup_argv() method for future enhancement.
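A sketch of how that isolation might look follows. Only get_setup_argv is named in these notes; child_setup_command and run_child_setup are hypothetical helpers, and the sketch makes no attempt to fix the relative-path or setup.cfg problems listed above.

```python
import os
import subprocess
import sys


def get_setup_argv(argv=None):
    """Return the arguments to hand to a child setup.py.

    For now this is just the parent's arguments minus the script name.
    It is the single place to change if the command line is later
    regenerated from the parent's finalized options.
    """
    if argv is None:
        argv = sys.argv
    return list(argv[1:])


def child_setup_command(setup_dir, argv=None):
    """Build the (command, working directory) pair for a child setup.py."""
    command = [sys.executable, "setup.py"] + get_setup_argv(argv)
    return command, os.path.abspath(setup_dir)


def run_child_setup(setup_dir, argv=None):
    """Run the dependency's setup.py in a subprocess; return its exit code."""
    command, cwd = child_setup_command(setup_dir, argv)
    return subprocess.call(command, cwd=cwd)
```

Running the child with its own directory as the working directory keeps any imports its setup script performs relative to that directory working as the dependency's author expected.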

Dependency and Distribution Objects

Dependency objects should be able to:

  • Check whether they are installed/up-to-date (probably via a function supplied to the constructor, or in the common case some metadata like the name of a module or module contents to import, or perhaps the module location of a version string to check, and the expected version). What about edge cases, where importing a module (particularly an extension) can have gnarly side-effects? Do we build an easy way to spawn a Python interpreter and acquire the result of this check, or do we force module authors to fix things such that importing them is safer?
  • Select an appropriate "distribution" of the dependency (probably by selecting the first distribution that can be used on this platform)
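One possible answer to the side-effects question is to run the check in a separate interpreter. The sketch below is an assumption about how that could look, written for modern Python; the function name check_in_subprocess and its parameters are hypothetical, and the version comparison is a plain string equality test rather than a real version ordering.

```python
import subprocess
import sys

# Code run in the child interpreter: import the module named in argv[1]
# and, if an attribute name is given in argv[2], print its value.
_CHILD_CODE = (
    "import importlib, sys\n"
    "module = importlib.import_module(sys.argv[1])\n"
    "if len(sys.argv) > 2:\n"
    "    print(getattr(module, sys.argv[2]))\n"
)


def check_in_subprocess(module_name, version_attr=None, expected=None):
    """Return True if the module imports cleanly in a child interpreter.

    If `version_attr` is given, the attribute's printed value must also
    equal str(expected). Any import side effects stay in the child.
    """
    args = [sys.executable, "-c", _CHILD_CODE, module_name]
    if version_attr is not None:
        args.append(version_attr)
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        return False
    if version_attr is None:
        return True
    return result.stdout.strip() == str(expected)
```

The cost is one interpreter start-up per check, and a module that prints to stdout while being imported would confuse the attribute comparison.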

Distribution objects should be able to:

  • Know whether they can be used on the current platform

    (For example, a win32.exe distribution should know it can only be used on a Win32 machine, and a tar.gz distribution should know it needs a working zlib to be extracted. And it should be able to explain why it can't be used, so that the dependency can explain why it couldn't find a suitable distribution if no other distribution suffices.)

  • Download themselves to a designated location, optionally verifying size and checksum

  • Extract themselves to a designated location

  • Install themselves, given some parameters (???)
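The optional size and checksum verification after a download could be as small as the following sketch. The function name verify_file is hypothetical, and SHA-256 is chosen here only as an assumption; these notes do not say which checksum algorithm would be used.

```python
import hashlib
import os


def verify_file(path, expected_size=None, expected_sha256=None):
    """Return True if the file matches every expectation that was given.

    Either check is skipped when its expected value is None.
    """
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    if expected_sha256 is not None:
        digest = hashlib.sha256()
        with open(path, "rb") as stream:
            # Read in chunks so large archives need not fit in memory.
            for chunk in iter(lambda: stream.read(65536), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_sha256.lower():
            return False
    return True
```

A distribution object would call this right after downloading and refuse to extract an archive that fails the check.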

For convenience's sake, it should be possible to make a single call to create a dependency with a single distribution. For example, one might simply provide a list of URLs to the Dependency constructor that specify the distribution(s). The dependency would convert any non-distribution objects in the list by looking up file suffixes in a mapping to determine the distribution class to create.
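To make the object model concrete, here is a sketch of the suffix lookup and the "first usable distribution" selection. Every class and method name below is hypothetical, the Distribution class here is unrelated to distutils.core.Distribution, and downloading, extraction and installation are left out.

```python
import sys


class Distribution:
    """One downloadable form of a dependency (not the distutils class)."""

    def __init__(self, url):
        self.url = url

    def unusable_reason(self):
        """Return None if usable here, else a human-readable reason."""
        return None

    def is_usable(self):
        return self.unusable_reason() is None


class SourceTarball(Distribution):
    def unusable_reason(self):
        try:
            import zlib  # noqa: F401  (needed to unpack .tar.gz)
        except ImportError:
            return "zlib is not available to extract .tar.gz files"
        return None


class Win32Installer(Distribution):
    def unusable_reason(self):
        if sys.platform != "win32":
            return "win32.exe installers only run on Win32"
        return None


# Assumed suffix-to-class mapping used to convert plain URLs.
SUFFIX_TO_CLASS = {".win32.exe": Win32Installer, ".tar.gz": SourceTarball}


class Dependency:
    def __init__(self, name, distributions):
        self.name = name
        self.distributions = [self._coerce(d) for d in distributions]

    @staticmethod
    def _coerce(item):
        if isinstance(item, Distribution):
            return item
        for suffix, cls in SUFFIX_TO_CLASS.items():
            if item.endswith(suffix):
                return cls(item)
        raise ValueError("no distribution class for %r" % item)

    def select_distribution(self):
        """Return the first usable distribution, or explain why none is."""
        reasons = []
        for dist in self.distributions:
            reason = dist.unusable_reason()
            if reason is None:
                return dist
            reasons.append("%s: %s" % (dist.url, reason))
        raise LookupError("no usable distribution for %s (%s)"
                          % (self.name, "; ".join(reasons)))
```

With this shape, a package author could write something like Dependency("PyProtocols", [url]) with one URL string and never mention a distribution class directly.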

Finally, by subclassing distutils.core.Distribution to add a dependencies attribute, we'll be able to supply a dependencies keyword to setup() in the parent distribution.

Sprinting Strategies/Notes

  • How can we create tests? What will we test? Unit testing of Dependency, Distribution, et al. will probably be easy, but integration testing of the distutils-connected parts may be rather "interesting", to say the least.
  • We should probably plan ahead what scenarios we'd like to have working, e.g.:
    • Have PEAK separately download and install PyProtocols and ZConfig, instead of bundling them
    • Tinker with the mechanize.browser distribution so it can download and install ClientCookie, ClientForm, and so on.
    • Something with binary install for win32

Committed Participants

  • Anthony Baxter
  • Fred Drake
  • Bob Ippolito
  • John Landahl

Schedule

TBD; Fred suggests Monday/Tuesday

Distutils/Proposals/DependencySupport (last edited 2009-01-31 16:00:31 by 61)
