Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2014-10-03 09:38:03
Size: 1701
Editor: PaulMoore
Comment:
Revision 4 as of 2018-07-10 16:44:09
Size: 1793
Editor: EWDurbin
Comment:

PyPI APIs: Simple, JSON, XMLRPC.

For up-to-date documentation, see https://warehouse.readthedocs.io/api-reference/legacy/

You can get a list of all the distributions available on PyPI from the URL

    https://pypi.python.org/simple/

This returns an HTML page containing a list of links to the individual distribution pages.

If you wish to retrieve information about the download files available for a specific distribution, you may use

    https://pypi.python.org/simple/<distribution_name>/

This returns an HTML page containing a list of links to the actual downloadable files, and to other URLs registered by the project. The distribution name should be in canonical form (all lowercase, with each run of underscores, periods, and dashes replaced by a single dash), but there is a redirect from the name as specified by the project to the canonical name (and from names without a trailing slash to the version with a trailing slash). To minimise network round trips, use the canonical name.
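
As a minimal sketch of the canonical form, the following applies the name normalization rule given in PEP 503: lowercase the name and collapse each run of `-`, `_`, and `.` into a single dash. The helper names `canonical_name` and `simple_url` are illustrative assumptions, not part of any PyPI client library.

```python
import re

# Base URL of the simple index, as used elsewhere on this page.
SIMPLE_INDEX = 'https://pypi.python.org/simple/'


def canonical_name(name):
    """Normalize a distribution name (rule as given in PEP 503)."""
    # Collapse runs of '-', '_' and '.' into one dash, then lowercase.
    return re.sub(r'[-_.]+', '-', name).lower()


def simple_url(name, simple_index=SIMPLE_INDEX):
    """Build the per-distribution URL (hypothetical helper).

    The trailing slash is included so the request should not need
    a redirect.
    """
    return simple_index + canonical_name(name) + '/'
```

For example, `simple_url('Zope.Interface')` yields `https://pypi.python.org/simple/zope-interface/`.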

The following code can be used to extract the URLs from the simple API pages:

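    from xml.etree import ElementTree
    from urllib.request import urlopen

    def get_distributions(simple_index='https://pypi.python.org/simple/'):
        """Return the names of all distributions listed on the index page."""
        # ElementTree is an XML parser, so this assumes the page is well-formed XML.
        with urlopen(simple_index) as f:
            tree = ElementTree.parse(f)
        # The text of each <a> element is a distribution name.
        return [a.text for a in tree.iter('a')]

    def scrape_links(dist, simple_index='https://pypi.python.org/simple/'):
        """Return the href of every link on the page for distribution `dist`."""
        with urlopen(simple_index + dist + '/') as f:
            tree = ElementTree.parse(f)
        return [a.attrib['href'] for a in tree.iter('a')]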
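
The `ElementTree` approach only works when the page happens to be well-formed XML; `ElementTree.parse` raises `ParseError` on HTML that is not. The sketch below shows one possible alternative based on the standard library's `html.parser`, which tolerates unclosed tags. The names `AnchorCollector` and `extract_links`, and the sample markup, are assumptions made for illustration; the markup is not a copy of a real PyPI page.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []          # finished (href, text) pairs
        self._href = None        # href of the <a> currently open
        self._text = []          # text chunks seen inside that <a>
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_anchor = True
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_anchor:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._in_anchor = False


def extract_links(html_text):
    """Return a list of (href, text) pairs found in html_text."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Illustrative markup with an unclosed <meta> tag, which an XML parser rejects.
sample = ('<!DOCTYPE html><html><head><meta name="x" content="1"></head>'
          '<body><a href="/simple/foo/">foo</a><br>'
          '<a href="/simple/bar/">bar</a></body></html>')
```

Here `extract_links(sample)` returns `[('/simple/foo/', 'foo'), ('/simple/bar/', 'bar')]`. To use it on a live page, read and decode the response from `urlopen` and pass the resulting string to `extract_links`.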

TODO: Add further details about links, rel=, #md5=, #egg=, links scraped from long_description and how/when to follow download links externally.

PyPISimple (last edited 2019-06-27 21:27:11 by SumanaHarihareswara)
