Revision 4 as of 2015-02-08 09:24:41
Python is often used to crawl the web. One useful application of crawling is finding dead links.
Theory
The basic way to check whether a link is dead is to send a HEAD request and look at the response status.
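A minimal sketch of such a check, using only the standard library. The function names (head_status, status_is_dead, is_dead), the 10 second timeout, and the rule that any status of 400 or above counts as dead are assumptions made for illustration, not part of any of the tools listed on this page.

```python
import http.client
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Send a HEAD request; return the HTTP status code, or None if unreachable.

    Hypothetical helper written for this page.
    """
    try:
        # Building the Request raises ValueError for malformed URLs,
        # so it lives inside the try block as well.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx answers arrive as exceptions that carry the status code.
        return error.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def status_is_dead(status):
    """Assumed rule: no answer at all, or any 4xx/5xx status, means dead."""
    return status is None or status >= 400


def is_dead(url, timeout=10.0):
    """Return True if the HEAD check suggests the link is dead."""
    return status_is_dead(head_status(url, timeout))
```

Some servers reject HEAD (for example with 405 Method Not Allowed) even though the page exists, so a more careful checker would probably retry such links with GET before reporting them.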
More advanced checking requires crawling the pages behind links that are alive, and skipping those that have already been visited.
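The crawl-and-skip idea can be sketched roughly as below. This is an illustration only: find_dead_links, LinkExtractor, the max_pages limit, and the caller-supplied fetch function (assumed to return the page HTML as a string, or None when the link is dead) are all hypothetical names, not taken from any existing crawler.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_dead_links(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; return the dead URLs found.

    fetch(url) is a caller-supplied function (an assumption of this sketch)
    that returns the page HTML as a string, or None if the link is dead.
    """
    visited = set()
    dead = []
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # already checked, skip it
        visited.add(url)
        html = fetch(url)
        if html is None:
            dead.append(url)
            continue  # nothing to crawl behind a dead link
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop "#fragment" parts so that
            # the same page is not queued under several spellings.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in visited:
                queue.append(absolute)
    return dead
```

As written, the sketch follows every link it finds, including links to other sites. A real checker would likely crawl only pages on the starting host and merely test external links without following them.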
Links to check
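PCrawler (http://math.nist.gov/~RPozo/ngraph/webcrawler.html) - NIST modular crawler, Public Domain, needs some love.
weblinkchecker.py (https://www.mediawiki.org/wiki/Manual:Pywikibot/weblinkchecker.py) - Wikipedia's Pywikibot link checker, MIT license.
LinkChecker (https://pypi.python.org/pypi/LinkChecker) - many features, GPL.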