Differences between revisions 1 and 2
Revision 1 as of 2008-10-11 21:48:49
Size: 4335
Editor: tarek
Comment:
Revision 2 as of 2008-10-12 14:45:02
Size: 4532
Editor: 158
Comment: Grammatical edits. Clarify terminology somewhat. Improve HTML of mirror page.

Mirroring infrastructure in PyPI

  • PEP: 374
  • Title: Mirroring infrastructure in PyPI
  • Author: Tarek Ziadé
  • Discussions-To: Catalog SIG
  • Status: Draft
  • Python-Version: 2.6

Abstract

This PEP describes a mirroring infrastructure for PyPI.

Motivation

PyPI hosts over 4000 projects and is used daily by people building applications. Systems like easy_install and Buildout in particular make intensive use of PyPI.

For people making extensive use of PyPI, it can act as a single point of failure. People have started to set up mirrors, both private and public. These are active mirrors, which means that they browse PyPI themselves to stay in sync.

The motivation of this PEP is to set up a registration mechanism in PyPI that lists all the public PyPI mirrors, and to provide an event-based system in which all mirrors are informed via XML-RPC when a package has been uploaded, modified, or removed, so that they can then sync themselves.

This PEP describes:

  • Mirror listing and registering
  • Ping mechanism

Mirror listing and registering

A new HTML page will be added at http://pypi.python.org/mirrors that can be browsed like the simple index. This page presents the mirrors as a list of links.

Each link is the URL of a mirror's simple index. The page will look like this:

<html>
  <head><title>PyPI mirrors</title></head>
  <body>
  
    <h1>PyPI mirrors</h1>
    
    <p>
    If you want to register a new mirror, send an email
    to catalog-SIG@python.org with:
    </p>

    <ol>
        <li> The URL of your mirror.</li>
        <li> The name and email of the maintainer.</li>
        <li> The URL to ping when PyPI is updated.</li>
    </ol>

    <p>
    Registration is done manually. To become a
    mirror, you need to strictly follow the package index
    API defined here:
    </p>
    
    <p>
    http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
    </p>
    
    <ul id="mirror-links">
        <li><a href="http://example.com/pypi">Mirror #1</a></li>
        <li><a href="http://example2.com/pypi">Mirror #2</a></li>
    </ul>

  </body>
</html>

When a mirror is proposed on the mailing list, it is manually added to a dedicated SQL table in the PyPI application.

The mirror list page is a simple HTML page that can be browsed by any tool that wants to get a list of registered mirrors.
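
As an illustration of how a tool could read that page, here is a minimal sketch that extracts the mirror URLs from the links inside the mirror-links list. It is not part of the proposal: the names MirrorListParser, parse_mirrors and SAMPLE_PAGE are hypothetical, the sample is a shortened copy of the page shown above, and the code uses the Python 3 standard library (html.parser) rather than the Python 2.6 targeted by this PEP.

```python
from html.parser import HTMLParser


class MirrorListParser(HTMLParser):
    """Collect the href of every link found inside <ul id="mirror-links">."""

    def __init__(self):
        super().__init__()
        self.in_mirror_list = False
        self.mirrors = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and attrs.get("id") == "mirror-links":
            self.in_mirror_list = True
        elif tag == "a" and self.in_mirror_list and attrs.get("href"):
            self.mirrors.append(attrs["href"])

    def handle_endtag(self, tag):
        # The sample page has no nested lists, so any </ul> ends the mirror list.
        if tag == "ul":
            self.in_mirror_list = False


def parse_mirrors(html_text):
    """Return the mirror URLs listed in the given page source."""
    parser = MirrorListParser()
    parser.feed(html_text)
    parser.close()
    return parser.mirrors


# Shortened copy of the example page; a real tool would download the page first.
SAMPLE_PAGE = """
<html>
  <body>
    <h1>PyPI mirrors</h1>
    <ul id="mirror-links">
        <li><a href="http://example.com/pypi">Mirror #1</a></li>
        <li><a href="http://example2.com/pypi">Mirror #2</a></li>
    </ul>
  </body>
</html>
"""

print(parse_mirrors(SAMPLE_PAGE))
```

Links that appear outside the mirror-links list, such as a link to the package index API documentation, are ignored by this sketch.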

Ping mechanism

Every time a package is uploaded, removed, or modified at PyPI, the server will ping all the registered mirrors via XML-RPC to tell them that something has changed.

The XML-RPC request will use the following pseudo-code:

    >>> import xmlrpclib
    >>> from socket import setdefaulttimeout, timeout
    >>> MIRROR_TIMEOUT = 2  # seconds
    >>> setdefaulttimeout(MIRROR_TIMEOUT)
    >>> # mirror_urls, removed_names, modified_names and log_failure
    >>> # are assumed to be defined by the caller
    >>> for mirror_url in mirror_urls:
    ...     mirror = xmlrpclib.ServerProxy(mirror_url)
    ...     try:
    ...         mirror.removed_packages(removed_names)
    ...         mirror.modified_packages(modified_names)
    ...     except (timeout, xmlrpclib.ProtocolError):
    ...         log_failure(mirror_url)

  • The removed_names and modified_names are the distutils names of the module distributions.
  • The MIRROR_TIMEOUT prevents PyPI from slowing down every time it calls the mirrors. The mirrors are responsible for doing the synchronisation and must do it asynchronously from that call, which has to remain a ping for the sake of performance. Each mirror decides what to do with the information it gets from PyPI.
  • Every time a mirror fails, the failure is logged in the SQL database, where we keep a "reliability ratio". The ratio starts at 1.0 and decreases after each failure. It is reset to 1.0 every 100 calls. If the ratio goes down to 0.75 during these 100 calls, the mirror is declared unreliable, and a mail is sent to the distutils mailing list. The mirror will then eventually be removed if the problem persists.
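
The reliability bookkeeping described in the last point could be sketched as follows. This is only an illustration under stated assumptions: the PEP does not say by how much the ratio decreases per failure, so the sketch assumes 1/100 per failure, which means 0.75 corresponds to 25 failed calls in a window of 100. The class name MirrorReliability is hypothetical, the state is kept in memory rather than in the SQL database, and the code is written for Python 3.

```python
class MirrorReliability:
    """Track the reliability ratio of one mirror over windows of 100 calls."""

    WINDOW = 100       # calls per window (from the PEP)
    THRESHOLD = 0.75   # ratio at which the mirror is declared unreliable (from the PEP)

    def __init__(self):
        self.calls = 0
        self.failures = 0

    @property
    def ratio(self):
        # Assumption: each failure lowers the ratio by 1/WINDOW.
        return (self.WINDOW - self.failures) / self.WINDOW

    def record(self, success):
        """Record one ping and return True if the mirror is now unreliable."""
        if self.calls == self.WINDOW:
            # A full window has elapsed: reset the ratio to 1.0.
            self.calls = 0
            self.failures = 0
        self.calls += 1
        if not success:
            self.failures += 1
        return self.ratio <= self.THRESHOLD
```

In the ping loop, log_failure(mirror_url) would be the natural place to call record(False) for the mirror concerned, and to send the mail to the mailing list when it returns True.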

Costs

Someone has to manage the list of mirrors. This work should not take too much time. I am willing to be that maintainer if the people that maintain the server don't have the time, or don't trust me.

Implementation

XXX

Third-party applications

A sample module will be written and published to demonstrate how a mirror can implement the XML-RPC APIs in Python.
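
Until that sample module exists, the following minimal sketch shows one way a mirror could expose the two methods called by the ping, removed_packages and modified_packages, while keeping the call itself cheap. It is not the sample module announced above. The queue, the sync_worker and make_server helpers, the host and port, and the True return value are all assumptions made for illustration, and the code uses the Python 3 module xmlrpc.server (named SimpleXMLRPCServer in Python 2).

```python
import queue
import threading
from xmlrpc.server import SimpleXMLRPCServer

# Names received from PyPI wait here until the worker picks them up,
# so the XML-RPC call can return immediately.
pending = queue.Queue()


def removed_packages(names):
    """Called by PyPI with the names of removed distributions."""
    for name in names:
        pending.put(("removed", name))
    return True  # XML-RPC needs a return value; True is an assumption


def modified_packages(names):
    """Called by PyPI with the names of uploaded or modified distributions."""
    for name in names:
        pending.put(("modified", name))
    return True


def sync_worker(handle):
    """Process queued names forever, calling handle(action, name) for each."""
    while True:
        action, name = pending.get()
        try:
            handle(action, name)
        except Exception as exc:
            # Keep the worker alive if one synchronisation fails.
            print("sync failed for %s: %s" % (name, exc))
        finally:
            pending.task_done()


def make_server(host="localhost", port=8000):
    """Build an XML-RPC server exposing the two ping methods."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(removed_packages)
    server.register_function(modified_packages)
    return server
```

A mirror would start sync_worker in a daemon thread (threading.Thread(target=sync_worker, args=(handler,), daemon=True)) with its own synchronisation function as handler, then call make_server().serve_forever(). Because the handlers only enqueue names, the ping returns well within the MIRROR_TIMEOUT, and the actual synchronisation happens asynchronously.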

Mirroring infrastructure (last edited 2009-02-15 18:16:54 by PaulBoddie)
