Differences between revisions 3 and 4
Revision 3 as of 2008-10-12 14:47:34
Size: 4528
Editor: 158
Comment: use ints instead of a float for the mirror reliability counter feature
Revision 4 as of 2008-10-12 16:01:58
Size: 3846
Editor: tarek
Comment:
Line 60: Line 60:
        <li> The url to ping when PyPI is updated.</li>
Line 74: Line 73:
        <li><a href="http://example.com/pypi">Mirror #1</a></li>
        <li><a href="http://example2.com/pypi">Mirror #2</a></li>
        <li>
          <a class="mirrorLink" href="http://example.com/pypi">Mirror #1</a> |
          <a class="mirrorPage" href="http://pypi.python.org/mirrors/1">more infos</a>
        </li>
Line 88: Line 89:
== Ping mechanism ==
Each mirror gets its own page URL at PyPI with extra information.
The name of that page is the ID of the mirror in the PyPI system.
Line 90: Line 92:
Every time a package is uploaded, removed or modified at PyPI, the server
pings all the registered mirrors via XML-RPC to tell
them that something has changed.

The XML-RPC request will use the following pseudo-code:
This page contains extra information about the mirror, like its freshness
date:
Line 97: Line 96:
<html>
  <head><title>PyPI mirror</title></head>
  <body>
  
    <h1>PyPI mirror Mirror #1</h1>
    
    <div>Freshness date: <span id="freshness">Sun May 20 15:23:01 2007</span></div>
  </body>
</html>
Line 98: Line 106:
    >>> import xmlrpclib
    >>> from socket import setdefaulttimeout, timeout
    >>> MIRROR_TIMEOUT = 2  # seconds
    >>> setdefaulttimeout(MIRROR_TIMEOUT)
    >>> for mirror_url in mirror_urls:
    ...     mirror = xmlrpclib.ServerProxy(mirror_url)
    ...     try:
    ...         mirror.removed_packages(removed_names)
    ...         mirror.modified_packages(modified_names)
    ...     except (timeout, xmlrpclib.ProtocolError):
    ...         log_failure(mirror_url)
Line 112: Line 109:
 * The removed_names and modified_names are the distutils names of the
 module distributions.
Line 115: Line 110:
 * The MIRROR_TIMEOUT prevents PyPI from slowing down every time
 it calls the mirrors. The mirrors are responsible for doing the
 synchronisation job and must do it asynchronously from that
 call, which has to remain a ping for the sake of performance.
 Each mirror decides what to do with the information it gets from
 PyPI.
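A mirror-side handler that returns quickly and defers the real work could look like the minimal sketch below. It is written for Python 3 and is only an illustration: the method names removed_packages and modified_packages come from the pseudo-code above, while the queue, the worker thread, the `synced` list and `run_demo` are hypothetical helpers introduced here.

```python
import queue
import threading

# Hypothetical work queue: names received from PyPI wait here until the
# background worker processes them.
pending = queue.Queue()
synced = []  # stand-in for real synchronisation work, for illustration only


def removed_packages(names):
    """Ping handler: record the removed names and return immediately."""
    for name in names:
        pending.put(("removed", name))
    return True


def modified_packages(names):
    """Ping handler: record the modified names and return immediately."""
    for name in names:
        pending.put(("modified", name))
    return True


def worker():
    """Drain the queue outside of the ping call (the slow part)."""
    while True:
        item = pending.get()
        if item is None:  # sentinel used to stop the worker
            break
        synced.append(item)  # a real mirror would fetch or delete files here


def run_demo():
    """Simulate two pings and return what the worker processed."""
    thread = threading.Thread(target=worker)
    thread.start()
    removed_packages(["old.package"])
    modified_packages(["some.package", "other.package"])
    pending.put(None)
    thread.join()
    return list(synced)
```

In a real mirror the two handlers could be exposed through the standard library's SimpleXMLRPCServer with register_function; that wiring is left out so the sketch stays runnable without a network.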
Package indexes that are not mirrors of PyPI are not added to the
mirror list in PyPI.
Line 122: Line 113:
 * Every time a mirror fails, the failure is logged in the SQL database
 where we keep a "reliability ratio". It starts at 100
 and decreases after each failure. It is reset to 100
 every 100 calls. If the ratio goes below 75 during these 100
 calls, the mirror is declared unreliable and a
 mail is sent to the distutils mailing list. The mirror will
 eventually be removed if the problem persists.
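The counting rule in the bullet above can be sketched as a small integer counter. This is an illustration under assumptions, not the PyPI implementation: the text does not say how much each failure subtracts, so the sketch assumes one point per failure, and it assumes the "unreliable" flag stays set once raised. The class and attribute names are hypothetical.

```python
class MirrorReliability:
    """Integer reliability counter for one mirror (illustrative only)."""

    START = 100      # counter value at the start of each window
    WINDOW = 100     # number of calls after which the counter is reset
    THRESHOLD = 75   # below this value the mirror is declared unreliable

    def __init__(self):
        self.counter = self.START
        self.calls = 0
        self.unreliable = False

    def record_call(self, failed):
        """Record one ping; return True once the mirror is unreliable."""
        if failed:
            self.counter -= 1  # assumption: one point per failure
            if self.counter < self.THRESHOLD:
                self.unreliable = True  # here a mail would be sent
        self.calls += 1
        if self.calls == self.WINDOW:
            # A new window of 100 calls starts with a fresh counter.
            self.counter = self.START
            self.calls = 0
        return self.unreliable
```

With these assumptions a mirror is flagged on its 26th failure inside a single window of 100 calls.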
However, they can themselves provide the same mirror list mechanism
for their own mirrors.

== Freshness date ==

CPAN uses a freshness date system where each mirror's last synchronisation date is made available.

For PyPI, a 'Last-Modified' header can be added to the root page of the index.

Other mirrors can also maintain this header.

Mirroring infrastructure in PyPI

  • PEP: 374
  • Title: Mirroring infrastructure in PyPI
  • Author: Tarek Ziadé
  • Discussions-To: Catalog SIG
  • Status: Draft
  • Python-Version: 2.6

Abstract

This PEP describes a mirroring infrastructure for PyPI.

Motivation

PyPI hosts over 4000 projects and is used daily by people building applications. Systems like easy_install and Buildout in particular make intensive use of PyPI.

For people making extensive use of PyPI, it can act as a single point of failure. People have started to set up mirrors, both private and public. These are active mirrors, which means that they browse PyPI to stay in sync.

The motivation of this PEP is to set up a registration mechanism in PyPI that lists all the public PyPI mirrors, and to provide an event-based system in which all mirrors are informed via RPC when a package has been uploaded, modified, or removed, so they can eventually sync themselves.

This PEP describes:

  • Mirror listing and registering
  • Ping mechanism

Mirror listing and registering

A new HTML page will be added at http://pypi.python.org/mirrors that can be browsed like the simple index. This page gives a list of the mirrors through a list of links.

Each link is the URL of a mirror's simple index. The page will look like this:

<html>
  <head><title>PyPI mirrors</title></head>
  <body>
  
    <h1>PyPI mirrors</h1>
    
    <p>
    If you want to register a new mirror, send an email
    to catalog-SIG@python.org with:
    </p>

    <ol>
        <li> The url of your mirror.</li>
        <li> The name and email of the maintainer.</li>
    </ol>

    <p>
    Registration is done manually. To become a
    mirror, you need to strictly follow the package index
    API defined here:
    </p>
    
    <p>
    http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
    </p>
    
    <ul id="mirror-links">
        <li>
          <a class="mirrorLink" href="http://example.com/pypi">Mirror #1</a> |
          <a class="mirrorPage" href="http://pypi.python.org/mirrors/1">more infos</a>
        </li>
    </ul>

  </body>
</html>

When a mirror is proposed on the mailing list, it is manually added to a dedicated SQL table in the PyPI application.

The mirror list page is a simple HTML page that can be browsed by any tool that wants to get a list of registered mirrors.
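A tool reading the mirror list could extract the links with the standard library alone. The sketch below is written for Python 3 and relies only on the class names mirrorLink and mirrorPage from the sample page above; the parser class, the function name and the `SAMPLE` string are hypothetical, and no network access is attempted.

```python
from html.parser import HTMLParser


class MirrorListParser(HTMLParser):
    """Collect the index URL and info page URL of each listed mirror."""

    def __init__(self):
        super().__init__()
        self.mirrors = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("class") == "mirrorLink":
            # A new mirror entry starts with its simple index link.
            self.mirrors.append({"index": attrs.get("href"), "page": None})
        elif attrs.get("class") == "mirrorPage" and self.mirrors:
            # The info page link belongs to the most recent mirror.
            self.mirrors[-1]["page"] = attrs.get("href")


def parse_mirror_list(html):
    """Return a list of {'index': ..., 'page': ...} dictionaries."""
    parser = MirrorListParser()
    parser.feed(html)
    return parser.mirrors


# Excerpt of the sample page, used here instead of a live request.
SAMPLE = """
<ul id="mirror-links">
    <li>
      <a class="mirrorLink" href="http://example.com/pypi">Mirror #1</a> |
      <a class="mirrorPage" href="http://pypi.python.org/mirrors/1">more infos</a>
    </li>
</ul>
"""

mirrors = parse_mirror_list(SAMPLE)
```

Keying on the class attributes rather than on the position of the links keeps the sketch tolerant of extra markup on the page.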

Each mirror gets its own page URL at PyPI with extra information. The name of that page is the ID of the mirror in the PyPI system.

This page contains extra information about the mirror, like its freshness date:

<html>
  <head><title>PyPI mirror</title></head>
  <body>
  
    <h1>PyPI mirror Mirror #1</h1>
    
    <div>Freshness date: <span id="freshness">Sun May 20 15:23:01 2007</span></div>
  </body>
</html>
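Reading the freshness value from such a page might be done as in the following sketch (Python 3). It assumes that the date always uses the format shown in the sample and that the process runs with the default C locale, since the day and month names are parsed by name; the class and function names are hypothetical.

```python
from datetime import datetime
from html.parser import HTMLParser


class FreshnessParser(HTMLParser):
    """Capture the text of the element whose id is 'freshness'."""

    def __init__(self):
        super().__init__()
        self._in_span = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("id") == "freshness":
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False

    def handle_data(self, data):
        if self._in_span and self.text is None:
            self.text = data.strip()


def parse_freshness(html):
    """Return the freshness date as a naive datetime, or None if absent."""
    parser = FreshnessParser()
    parser.feed(html)
    if parser.text is None:
        return None
    # Format assumed from the sample: "Sun May 20 15:23:01 2007"
    return datetime.strptime(parser.text, "%a %b %d %H:%M:%S %Y")
```

The sample date carries no time zone, so the result is a naive datetime; a client comparing mirrors would have to agree on a zone separately.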

Package indexes that are not mirrors of PyPI are not added to the mirror list in PyPI.

However, they can themselves provide the same mirror list mechanism for their own mirrors.

Freshness date

CPAN uses a freshness date system where each mirror's last synchronisation date is made available.

For PyPI, a 'Last-Modified' header can be added to the root page of the index.

Other mirrors can also maintain this header.
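As a rough illustration of how such a header could be produced and compared, the sketch below (Python 3) formats a timestamp as an HTTP date and computes how old an index is from the header value. The function names are hypothetical, and the header value is passed in as a string rather than fetched from a server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(when):
    """Format an aware UTC datetime as an HTTP date string."""
    return format_datetime(when, usegmt=True)


def index_age(header_value, now):
    """Return the time elapsed since the header's date, as a timedelta."""
    return now - parsedate_to_datetime(header_value)


# Example values only: the timestamp reuses the date from the sample page.
stamp = datetime(2007, 5, 20, 15, 23, 1, tzinfo=timezone.utc)
header = last_modified_header(stamp)
age = index_age(header, datetime(2007, 5, 21, 15, 23, 1, tzinfo=timezone.utc))
```

A client could compare the age of each mirror with that of PyPI itself to pick the freshest source.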

Costs

Someone has to manage the list of mirrors. This work should not take too much time. I am willing to be that maintainer if the people that maintain the server don't have the time, or don't trust me.

Implementation

XXX

Third-party applications

A sample module will be written and published to demonstrate how a mirror can implement the XML-RPC APIs in Python.

Mirroring infrastructure (last edited 2009-02-15 18:16:54 by PaulBoddie)
