🕷️ Ethical Web Crawler v2.0
A GUI-based ethical web crawler that respects robots.txt rules and offers customizable crawling.

Example interface
✨ Features
- Rules Compliance: Automatic robots.txt checking
- Flexible Configuration:
  - Adjustable request delay
  - Customizable crawling depth
  - Maximum sites limit
- Smart Filtering (see the sketch after this list):
  - Domain and keyword blocking
  - Automatic content type detection (HTML, JSON, PDF, etc.)
- Intuitive Interface:
  - Real-time results visualization
  - Progress bar and statistics
- Multiple Export Formats: JSON, CSV, TXT, or URL copy
- Optimizations:
  - Asynchronous request handling
  - Bandwidth throttling
  - robots.txt caching
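For illustration, here is a minimal sketch of how domain/keyword filtering and content-type detection could work. The helper names, blocklist entries, and keywords below are assumptions made for this example, not the crawler's actual code.

```python
# Hypothetical helpers -- illustrative only, not the app's real implementation.
from urllib.parse import urlparse

import requests

BLOCKED_DOMAINS = {"ads.example.com"}    # example blocklist (assumption)
BLOCKED_KEYWORDS = {"login", "signup"}   # example keyword filter (assumption)

def is_allowed(url: str) -> bool:
    """Reject URLs whose domain or text matches a blocked entry."""
    if urlparse(url).netloc in BLOCKED_DOMAINS:
        return False
    return not any(word in url.lower() for word in BLOCKED_KEYWORDS)

def detect_content_type(url: str) -> str:
    """Classify a URL from its Content-Type header without fetching the body."""
    response = requests.head(url, allow_redirects=True, timeout=10)
    content_type = response.headers.get("Content-Type", "")
    for token, label in (("text/html", "HTML"),
                         ("application/json", "JSON"),
                         ("application/pdf", "PDF")):
        if token in content_type:
            return label
    return content_type or "unknown"
```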
🛠️ Installation
- Requirements:
  - Python 3.8+
  - Libraries: `tkinter` (bundled with most Python installations), `requests`, `urllib3`
- Installation:
  - `git clone https://github.com/Clementabcd/ethical-web-crawler.git`
  - `cd ethical-web-crawler`
  - `pip install -r requirements.txt`
- Launch:
  - `python ethical_web_crawler.py`
🚀 Usage
1. Enter starting URLs
2. Configure settings (delay, depth, etc.)
3. Add filters if needed
4. Click "Start"
5. Export results when crawling completes (JSON, CSV, or TXT; a sketch of the output follows below)
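As a rough idea of what an export might look like, the snippet below writes results to JSON and CSV. The field names are assumptions for this example, not the crawler's actual schema.

```python
# Illustrative export sketch -- field names are assumptions.
import csv
import json

results = [
    {"url": "https://example.com", "status": 200, "content_type": "HTML"},
]

# JSON export
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2, ensure_ascii=False)

# CSV export
with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "status", "content_type"])
    writer.writeheader()
    writer.writerows(results)
```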
🔍 Advanced Features
- Context Menu (sketch below): Right-click a result to:
  - Open in browser
  - Copy URL
  - Remove entry
- Statistics: Crawled data visualization
- Memory Management: Result limit controls
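For illustration, here is a minimal way such a right-click menu can be wired up in tkinter. Widget names, callbacks, and the sample URL are assumptions, not the app's actual code.

```python
# Illustrative tkinter context-menu sketch -- not the app's real implementation.
import tkinter as tk
import webbrowser

root = tk.Tk()
results_list = tk.Listbox(root)
results_list.insert(tk.END, "https://example.com")  # sample entry (assumption)
results_list.pack()

menu = tk.Menu(root, tearoff=0)

def open_selected():
    selection = results_list.curselection()
    if selection:
        webbrowser.open(results_list.get(selection[0]))

menu.add_command(label="Open in browser", command=open_selected)
menu.add_command(label="Copy URL",
                 command=lambda: root.clipboard_append(results_list.get(tk.ACTIVE)))
menu.add_command(label="Remove entry",
                 command=lambda: results_list.delete(tk.ACTIVE))

def show_menu(event):
    # Pop the menu up at the cursor position
    menu.tk_popup(event.x_root, event.y_root)

results_list.bind("<Button-3>", show_menu)  # right-click on most platforms
root.mainloop()
```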
🤝 Ethical Behavior
This crawler is designed to:
- Respect robots.txt by default (see the sketch after this list)
- Use configurable request delays
- Clearly identify its user-agent
- Avoid crawling filtered content
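A minimal sketch of what these checks boil down to, using the standard library's `urllib.robotparser`; the user-agent string and delay value here are assumptions, not the crawler's actual defaults.

```python
# Illustrative only: robots.txt check, polite delay, and explicit user-agent.
import time
from typing import Optional
from urllib import robotparser
from urllib.parse import urljoin, urlparse

import requests

USER_AGENT = "EthicalWebCrawler/2.0"  # assumed identifier string
REQUEST_DELAY = 1.0                   # seconds between requests (configurable)

def fetch_if_allowed(url: str) -> Optional[requests.Response]:
    base = "{0.scheme}://{0.netloc}/".format(urlparse(url))
    parser = robotparser.RobotFileParser(urljoin(base, "robots.txt"))
    parser.read()
    if not parser.can_fetch(USER_AGENT, url):
        return None                   # skip URLs disallowed by robots.txt
    time.sleep(REQUEST_DELAY)         # respect the configured crawl delay
    return requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
```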
📄 License
MIT License - see LICENSE
Built with Python and ❤️ - Contribute by opening issues or pull requests!