Description
A WWW crawler (robot) program in Python.
Information
[http://members.lycos.co.uk/anandpillai "HarvestMan Homepage"] [http://www.freshmeat.net/projects/harvestman "Freshmeat Project Page"]
- version: 1.1.1 (2003-08-01)
- licence: OSL 1.1 (Open Software License Version 1.1)
- Python versions: tested on 2.2.2 and 2.2.3
Deployment Platforms
- Tested on Windows 95-98/NT/2000, Mandrakesoft Linux 9.0, 9.1.
How it spins its web
HarvestMan uses a new threading model, built on Python threads, to download websites from the Internet quickly while remaining highly customizable. It can also download files from intranet servers.
HarvestMan is a fully multithreaded web crawler that makes full use of Python's threading support. It is the first open-source web crawler written in Python.
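The queue-based threading model described above can be sketched with Python's standard `threading` and `queue` modules. This is a minimal sketch in modern Python 3 (3.9+), not HarvestMan's own code, which targeted Python 2.2. The `crawl()` function and its `fetch_links` callback are hypothetical names; the callback stands in for the download-and-parse step so the example runs without network access.

```python
import queue
import threading
from typing import Callable, Iterable, Optional


def crawl(start_url: str,
          fetch_links: Callable[[str], Iterable[str]],
          num_threads: int = 4) -> set[str]:
    """Visit every URL reachable from start_url using worker threads.

    fetch_links is a hypothetical callback returning the links found at a
    URL; a real crawler would download and parse the page there.
    """
    work: "queue.Queue[Optional[str]]" = queue.Queue()  # shared work queue
    seen = {start_url}                 # URLs already queued
    seen_lock = threading.Lock()       # guards `seen` across threads
    work.put(start_url)

    def worker() -> None:
        while True:
            url = work.get()
            if url is None:            # sentinel: shut this worker down
                work.task_done()
                return
            try:
                links = list(fetch_links(url))
            except Exception:
                links = []             # skip pages that fail to fetch
            for link in links:
                with seen_lock:        # check-and-add must be atomic
                    if link in seen:
                        continue
                    seen.add(link)
                work.put(link)         # queue new links before marking done
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    work.join()                        # returns when every queued URL is done
    for _ in threads:
        work.put(None)                 # one sentinel per worker
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    # A tiny in-memory "site" stands in for real HTTP fetches.
    site = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": [], "/c": ["/"]}
    print(sorted(crawl("/", lambda u: site.get(u, []))))  # ['/', '/a', '/b', '/c']
```

Newly found links are queued before the current item is marked done, so `work.join()` only returns once the whole reachable set has been processed; one `None` sentinel per worker then lets the threads exit cleanly. The `num_threads` parameter plays the role of the user-configurable thread count.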
Features
- Fully multithreaded
- Number of threads configurable by the user
- Support for the robots exclusion protocol
- Filtering of URLs using regular expressions
- Filtering of server names using regular expressions
- Control downloads by specifying the fetch depth
- Limit the number of files downloaded
- Specify timeouts for individual threads
- Control download speed by changing the thread/depth options
- HTTP/FTP/HTTPS support, and support for servers on a LAN
- XML project files which can be re-read
- Smart reconnection
- Queue based multithreading
- Support for proxies/firewalls
- File limits, server limits
- Projects browser page
- Command line/config file support
- Use as a program or as a web-spider module
- OO architecture
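Several of the filtering features listed above (robots exclusion, regular-expression filters on URLs and server names, depth and file-count limits) can be combined into a single check applied to each link before it is queued. The sketch below assumes a hypothetical `CrawlRules` class whose field names are illustrative only; it is not HarvestMan's configuration format. It uses the standard library's `urllib.robotparser` and parses robots.txt text passed in directly, whereas a real crawler would fetch the file, for example with `RobotFileParser.set_url()` and `read()`.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlRules:
    """Hypothetical per-project crawl rules; field names are illustrative."""
    max_depth: int = 2            # links followed beyond the start page
    max_files: int = 100          # stop once this many files are downloaded
    url_exclude: list[str] = field(default_factory=list)     # regexes matched against the URL
    server_include: list[str] = field(default_factory=list)  # regexes matched against the host
    robots_txt: str = ""          # robots.txt contents (empty means allow all)
    user_agent: str = "HarvestMan"

    def __post_init__(self) -> None:
        # Compile the patterns once instead of on every check.
        self._url_res = [re.compile(p) for p in self.url_exclude]
        self._host_res = [re.compile(p) for p in self.server_include]
        self._robots = RobotFileParser()
        self._robots.parse(self.robots_txt.splitlines())

    def allows(self, url: str, depth: int, files_fetched: int) -> bool:
        """Return True if url should be downloaded under these rules."""
        if depth > self.max_depth or files_fetched >= self.max_files:
            return False
        host = urlsplit(url).hostname or ""
        # If server filters are given, the host must match at least one.
        if self._host_res and not any(r.search(host) for r in self._host_res):
            return False
        if any(r.search(url) for r in self._url_res):
            return False
        # Finally, honour the robots exclusion protocol.
        return self._robots.can_fetch(self.user_agent, url)


if __name__ == "__main__":
    rules = CrawlRules(
        url_exclude=[r"\.zip$"],
        server_include=[r"(^|\.)example\.com$"],
        robots_txt="User-agent: *\nDisallow: /private/\n",
    )
    for u in ["http://www.example.com/index.html",
              "http://www.example.com/big.zip",
              "http://www.example.com/private/a.html",
              "http://other.org/index.html"]:
        print(u, rules.allows(u, depth=1, files_fetched=0))
```

Keeping the rules free of mutable state (the caller passes in the link's depth and the running download count) means one `CrawlRules` object can be shared by several worker threads without locking.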
Who should use it
HarvestMan is written for the desktop user and can be used by the average computer user.
Taxonomy
The harvestman is a small, long-legged, spider-like arachnid, known in parts of North America as the "daddy longlegs". Since this program works as a "spider" and also makes a living by "harvesting" links from the Internet, the name "HarvestMan" seemed very apt.
Species: HarvestMan
Genus: (Internet) Spiders