Differences between revisions 16 and 27 (spanning 11 versions)
Revision 16 as of 2003-08-12 21:51:30
Size: 2460
Editor: 203-195-199-244
Comment:
Revision 27 as of 2014-05-31 18:27:15
Size: 307
Editor: 173
Comment:
#pragma section-numbers off
=== Description ===
A WWW crawler (robot) program written in Python.

=== Information ===

   [http://www.freshmeat.net/projects/harvestman/ "Freshmeat Project Page"]

   [http://sithara.freezope.org/harvestman/source/ "Source/Binaries Download Page"]

   version:: 1.1.2 (''[[Date(2003-08-13T00:00:00)]]'')
   licence:: OSL 1.1 (Open Software License Version 1.1)
   Python versions:: 2.2.2, 2.2.3
   Platforms :: Any platform supported by Python
   Binaries :: Available for Win32

=== How it spins its web ===
   HarvestMan uses Python threads to download web sites from the
   Internet quickly while remaining highly customizable. It can
   also be used to download files from intranet servers.
  
   HarvestMan is a fully multithreaded web crawler utility that
   makes full use of the threading support in the Python language.
  
   It is the first multithreaded, open-source web crawler written
   in Python.
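
The threading model described above can be illustrated with a minimal sketch. This is not HarvestMan's actual code: the names `FAKE_SITE`, `fetch_links` and `crawl` are hypothetical, and an in-memory dictionary stands in for real HTTP downloads so the example runs without a network. It only shows the general pattern of worker threads sharing a queue of URLs, with a configurable thread count and fetch depth.

```python
import queue
import threading

# Hypothetical in-memory "site": URL -> list of linked URLs.
# A real crawler would download each page and parse its links instead.
FAKE_SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c"],
    "http://example.com/b": ["http://example.com/c"],
    "http://example.com/c": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_SITE.get(url, [])


def crawl(start_url, num_threads=4, max_depth=2):
    """Crawl from start_url with worker threads; return the set of URLs seen."""
    tasks = queue.Queue()
    visited = {start_url}
    lock = threading.Lock()          # protects the shared visited set
    tasks.put((start_url, 0))

    def worker():
        while True:
            item = tasks.get()
            if item is None:         # sentinel: no more work
                tasks.task_done()
                return
            url, depth = item
            try:
                if depth < max_depth:        # do not expand pages at max depth
                    for link in fetch_links(url):
                        with lock:
                            if link in visited:
                                continue
                            visited.add(link)
                        tasks.put((link, depth + 1))
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    tasks.join()                     # wait until every queued URL is processed
    for _ in threads:
        tasks.put(None)              # one sentinel per worker
    for t in threads:
        t.join()
    return visited
```

Calling `crawl("http://example.com/", num_threads=2, max_depth=1)` would visit the start page and the pages it links to directly, but would not follow links found on those pages.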

=== Features ===

    * Fully Multithreaded
    * Number of threads configurable by user
    * Support for robots exclusion protocol
    * Filtering of URLs using regular expressions
    * Filtering of server names using regular expressions
    * Control download by specifying depth of fetching
    * Limit the number of files downloaded
    * Specify timeout for individual threads
    * Control download speed by changing thread/depth options
    * HTTP/FTP/HTTPS support and support for servers on a LAN
    * XML project files which can be re-read
    * Smart reconnection
    * Support for proxies/firewalls
    * File limits, server limits
    * Projects browser page
    * Command line/config file support
    * Use as a program or as a web-spider module
    * OO architecture
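
As a rough illustration of the robots exclusion and regular-expression filtering features listed above, the following sketch combines the two checks using only the Python standard library. It is an assumption about how such filtering could be done, not HarvestMan's own implementation; the patterns, the robots.txt text, the user agent name `ExampleBot` and the helper names `make_robots` and `allowed` are all hypothetical.

```python
import re
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical filter rules, chosen only for illustration.
URL_EXCLUDE = [re.compile(r"\.(zip|exe)$"), re.compile(r"/private/")]
SERVER_EXCLUDE = [re.compile(r"^ads\.")]

# Hypothetical robots.txt content; a real crawler would download it
# from the server being crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def make_robots(text):
    """Build a robots.txt parser from text already in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(url, robots, user_agent="ExampleBot"):
    """Return True if the URL passes the server, URL and robots.txt checks."""
    host = urlparse(url).netloc
    if any(pattern.search(host) for pattern in SERVER_EXCLUDE):
        return False                 # server name is filtered out
    if any(pattern.search(url) for pattern in URL_EXCLUDE):
        return False                 # URL is filtered out
    return robots.can_fetch(user_agent, url)
```

A crawler loop would call `allowed(url, robots)` before queuing each link, so excluded servers, excluded file types and paths disallowed by robots.txt are never fetched.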

=== Who should use it ===

    HarvestMan is written for the desktop user. It can also be used
    as an Internet spidering module. An API for external
    users is being written.

=== Taxonomy ===
    
    A harvestman is a small spider-like arachnid found in parts of North America,
    also called "daddy longlegs". Since this program functions as a "spider"
    and makes a living by "harvesting" links from the Internet, the name
    "HarvestMan" seemed very apt.

    Species: HarvestMan
    Genus: (Internet) Spiders
     
=== Developers ===
     
    "The Harvesters",

    Anand B Pillai,
    Nirmal K Chidambaram
    


HarvestMan (last edited 2014-05-31 20:10:33 by MarcAndreLemburg)
