
Description

A World Wide Web crawler (robot) program written in Python.

Information

Deployment Platforms

  • Tested on Windows 95/98/NT/2000 and Mandrake Linux 9.0.

How it spins its web

  • HarvestMan uses a new threading model based on Python threads to achieve very fast yet highly customizable downloads of websites on the Internet. It can also be used to download files from intranet servers.

    HarvestMan is a truly multithreaded web crawler that makes full use of the threading support in the Python language. It is the first web crawler written in Python, and it is open source.
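The threading model described above can be pictured as a pool of worker threads sharing one queue of URLs. The following is only a minimal sketch under our own assumptions, not HarvestMan's actual implementation: `crawl` and the `fetch_links` hook (a callable that downloads a page and returns the links found on it) are hypothetical names, and `num_threads` stands in for the user-configurable thread count.

```python
import queue
import threading
from typing import Callable, List


def crawl(start_url: str,
          fetch_links: Callable[[str], List[str]],
          num_threads: int = 4) -> List[str]:
    """Visit every page reachable from start_url using num_threads workers.

    fetch_links is a hypothetical hook (not HarvestMan's API): given a URL,
    it downloads the page and returns the links found on it.
    """
    work = queue.Queue()          # URLs waiting to be fetched
    seen = {start_url}            # URLs already queued, to avoid repeats
    visited: List[str] = []       # URLs processed, in the order they finished
    lock = threading.Lock()       # guards `seen` and `visited`
    work.put(start_url)

    def worker() -> None:
        while True:
            url = work.get()
            if url is None:       # sentinel: no more work for this thread
                work.task_done()
                return
            try:
                links = fetch_links(url)
            except Exception:
                links = []        # treat a failed download as a dead end
            with lock:
                visited.append(url)
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        work.put(link)  # queued before task_done, so join() waits for it
            work.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    work.join()                   # blocks until every queued URL is processed
    for _ in threads:
        work.put(None)            # one stop sentinel per worker
    for t in threads:
        t.join()
    return visited
```

Because each worker queues newly found links before marking its own URL as done, `work.join()` only returns once everything reachable has been processed; the `None` sentinels then shut the workers down cleanly.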

Features

  • Fully multithreaded
  • Number of threads configurable by the user
  • Support for the Robots Exclusion Protocol (robots.txt)
  • Filtering of URLs using regular expressions
  • Filtering of server names using regular expressions
  • Control over downloads by specifying the fetch depth
  • Limit on the number of files downloaded
  • Configurable timeout for individual threads
  • Control over download speed by adjusting the thread and depth options
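Several of the features listed above (robots exclusion, URL and server-name filtering, fetch depth and file-count limits) amount to a single "may this URL be fetched?" check. The following is a minimal sketch under our own assumptions, not HarvestMan's configuration format or API: `CrawlPolicy`, its field names, and the `allow`/`record_download` methods are hypothetical. Robots exclusion is handled by the standard library's `urllib.robotparser`.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlPolicy:
    """Hypothetical filter combining several HarvestMan-style options."""
    url_pattern: Optional[str] = None     # regex a URL must match to be fetched
    server_pattern: Optional[str] = None  # regex the host name must match
    max_depth: int = 3                    # links deeper than this are skipped
    max_files: int = 100                  # stop after this many downloads
    robots_txt: str = ""                  # contents of the site's robots.txt
    user_agent: str = "HarvestMan"        # name checked against robots.txt
    downloaded: int = 0                   # files fetched so far (not thread-safe)

    def __post_init__(self) -> None:
        self._robots = RobotFileParser()
        self._robots.parse(self.robots_txt.splitlines())

    def allow(self, url: str, depth: int) -> bool:
        """Return True if `url`, found at link depth `depth`, may be fetched."""
        if depth > self.max_depth or self.downloaded >= self.max_files:
            return False
        if self.url_pattern and not re.search(self.url_pattern, url):
            return False
        host = urlparse(url).hostname or ""
        if self.server_pattern and not re.search(self.server_pattern, host):
            return False
        return self._robots.can_fetch(self.user_agent, url)

    def record_download(self) -> None:
        """Count one completed download toward max_files."""
        self.downloaded += 1
```

A worker thread would call `allow(url, depth)` before downloading and `record_download()` afterwards; in a multithreaded crawler the `downloaded` counter would need to be protected by a lock, just like any other shared state.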

Who should use it

  • HarvestMan is written for desktop users and is ideally suited to Python hackers and learners.

Taxonomy

  • Harvestman is the name of a kind of small, spider-like arachnid found in parts of North America, where it is also called "daddy longlegs". Since this program functions as a "spider" and thrives by "harvesting" links from the Internet, the name "HarvestMan" seemed very apt.

    Species: HarvestMan Genus: Web-spiderae

HarvestMan (last edited 2014-05-31 20:10:33 by MarcAndreLemburg)
