== Links to check ==
 * [[http://math.nist.gov/~RPozo/ngraph/webcrawler.html|PCrawler]] - NIST modular crawler, Public Domain, needs some love.
 * [[https://www.mediawiki.org/wiki/Manual:Pywikibot/weblinkchecker.py|weblinkchecker.py]] - link checker from Pywikibot, the bot framework used on Wikipedia and other MediaWiki sites, MIT license.
 * [[https://pypi.python.org/pypi/LinkChecker|LinkChecker]] - feature-rich link checker, GPL.

Python is often used to crawl the web. One useful application is finding dead links.

Theory

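The basic way to check whether a link is dead is to send an HTTP HEAD request, which returns only the response headers, and look at the result: a connection error or a 4xx/5xx status code usually means the link is broken.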
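A minimal sketch of the HEAD check using only the standard library's `urllib.request`. The function names `link_status` and `is_dead`, the 10-second timeout, and the rule that any 4xx/5xx status counts as dead are assumptions chosen for illustration, not a fixed convention.

```python
import http.client
import urllib.error
import urllib.request


def link_status(url, timeout=10):
    """Send an HTTP HEAD request and return the final status code.

    Returns None when no HTTP response arrives at all (DNS failure,
    refused connection, timeout, malformed URL).
    """
    try:
        # HEAD asks for the headers only, so the page body is not downloaded.
        # Request() raises ValueError for URLs it cannot parse.
        request = urllib.request.Request(url, method="HEAD")
        # urlopen follows redirects, so this is the status of the last response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; keep their status code.
        return err.code
    except (OSError, http.client.HTTPException, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # ValueError covers URLs urllib cannot parse.
        return None


def is_dead(url, timeout=10):
    """Assumed rule: no response, or any 4xx/5xx status, means a dead link."""
    status = link_status(url, timeout)
    return status is None or status >= 400
```

Some servers answer HEAD with `405 Method Not Allowed` even though the page exists, so a practical checker may retry such links with a GET request before reporting them as dead.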

More advanced checkers also crawl the pages behind live links to discover further links, skipping URLs that have already been visited.
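The crawling step can be sketched as a breadth-first search with a `visited` set, again using only the standard library. Everything here is an assumption for illustration: the names `LinkExtractor`, `fetch` and `find_dead_links`, the `max_pages` limit, the choice to follow links only on the start URL's host (external links are checked but not crawled), and decoding pages as UTF-8. Pages are fetched with GET rather than HEAD because extracting links needs the body. The hypothetical `get_page` parameter lets a fake fetcher be swapped in for testing.

```python
import urllib.error
import urllib.parse
import urllib.request
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url, timeout=10):
    """GET a page; return (status, html), with status None on network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if response.headers.get_content_type() != "text/html":
                return response.status, ""  # alive, but nothing to parse
            # Assumption: pages are UTF-8; undecodable bytes are replaced.
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except (OSError, ValueError):
        return None, ""


def find_dead_links(start_url, get_page=fetch, max_pages=100):
    """Breadth-first crawl from start_url; return the URLs that look dead."""
    host = urllib.parse.urlsplit(start_url).netloc
    queue = deque([start_url])
    visited = set()
    dead = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # already checked via another page
        visited.add(url)
        status, html = get_page(url)
        if status is None or status >= 400:
            dead.append(url)
            continue
        if urllib.parse.urlsplit(url).netloc != host:
            continue  # external page: checked, but its links are not followed
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is visited once.
            link = urllib.parse.urldefrag(urllib.parse.urljoin(url, href))[0]
            if urllib.parse.urlsplit(link).scheme in ("http", "https") and link not in visited:
                queue.append(link)
    return dead
```

Breadth-first order checks the links closest to the start page first, which matters when `max_pages` cuts the crawl short. A polite crawler would also honour `robots.txt` (the standard library provides `urllib.robotparser`) and pause between requests.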

DeadLinks (last edited 2015-02-08 09:24:41 by techtonik)
