#pragma section-numbers off
=== Description ===
A web crawler (robot) program written in Python.

=== Information ===

   [[http://freecode.com/projects/harvestman|"Freecode Project Page"]]

   [[http://harvestman.freezope.org/|"HarvestMan Home Page"]] (link no longer available)

   version:: 1.4 (''<<Date(2005-05-27T00:00:00)>>'')
   licence:: GNU GPL
   Python versions:: 2.2, 2.3, 2.4

   Platforms:: Any platform supported by Python
   Binaries:: None

=== How it spins its web ===
   HarvestMan uses Python threads to download websites from the
   Internet quickly while remaining highly customizable. It can also
   download files from intranet servers.
  
   It is the first multithreaded, open-source web crawler written
   in Python.
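
The threading model described above can be illustrated with a minimal sketch. This is not HarvestMan's own code (HarvestMan targets Python 2.2 to 2.4); it is written for Python 3 using only the standard library, and the names `crawl_concurrently`, `fetch`, and `num_threads` are hypothetical. The `fetch` callable is passed in so that the example runs without network access.

```python
import queue
import threading


def crawl_concurrently(urls, fetch, num_threads=4):
    """Fetch every URL in `urls` with a fixed pool of worker threads.

    `fetch` is any callable that takes a URL and returns its content.
    It is an assumption of this sketch, not part of HarvestMan.
    """
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    results = {}
    lock = threading.Lock()  # protects `results` across threads

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, let the thread exit
            try:
                content = fetch(url)
            except Exception as exc:  # record the failure and keep going
                content = exc
            with lock:
                results[url] = content

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For real downloads, `fetch` could be something like `lambda url: urllib.request.urlopen(url, timeout=10).read()`. Raising or lowering `num_threads` is one way to trade download speed against load on the server.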

=== Features ===

    * Fully multithreaded
    * User-configurable number of threads
    * Support for the robots exclusion protocol
    * Filtering of URLs using regular expressions
    * Filtering of server names using regular expressions
    * Download control by fetch depth
    * Limit on the number of files downloaded
    * Timeout for individual threads
    * Download speed control through the thread and depth options
    * HTTP, FTP, and HTTPS support, including servers on a LAN
    * XML project files which can be re-read
    * Smart reconnection
    * Support for proxies and firewalls
    * File limits and server limits
    * Projects browser page
    * Command-line and config file support
    * Usable as a program or as a web-spider module
    * Object-oriented architecture
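
Several of the features above (robots exclusion, regular-expression filters for URLs and server names, and depth limits) amount to a decision about whether a given URL should be fetched. The sketch below shows one way such a check could be combined. It is an illustration in Python 3 using the standard library, not HarvestMan's implementation; `make_url_filter` and all of its parameter names are hypothetical.

```python
import re
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def make_url_filter(robots_txt, exclude_url_patterns=(),
                    exclude_server_patterns=(), max_depth=None,
                    user_agent="ExampleBot"):
    """Return a function deciding whether a URL may be fetched.

    `robots_txt` is the text of a robots.txt file. All names here
    are assumptions made for this sketch.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    url_filters = [re.compile(p) for p in exclude_url_patterns]
    server_filters = [re.compile(p) for p in exclude_server_patterns]

    def allowed(url, depth=0):
        # Depth limit: skip links found too far from the start page.
        if max_depth is not None and depth > max_depth:
            return False
        # Server-name filter: match against the host part only.
        host = urlparse(url).netloc
        if any(f.search(host) for f in server_filters):
            return False
        # URL filter: match against the full URL.
        if any(f.search(url) for f in url_filters):
            return False
        # Robots exclusion protocol: honour the Disallow rules.
        return robots.can_fetch(user_agent, url)

    return allowed
```

A crawler would call `allowed(url, depth)` before queuing each discovered link, so that excluded URLs never reach the download threads.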

=== Who should use it ===

    HarvestMan is written for the desktop user. It can also be used
    as an Internet spidering module. An API for external
    users is being written.

=== Taxonomy ===

    Species: HarvestMan
    Genus: (Internet) Spiders
     
=== Developers ===
     
    Anand B Pillai


HarvestMan (last edited 2014-05-31 20:10:33 by MarcAndreLemburg)
