Robots_txt takes the URL of a page, retrieves the robots.txt file from the same site, and parses the rules defined in it to determine whether crawling that page is allowed.
Robots_txt also records the time at which each page of a site is crawled, so that the next time a page of the same site is crawled it honors the crawl delay and request rate limits declared in robots.txt.
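The class's own API is not documented here, but the same idea — parsing robots.txt rules to answer "may I fetch this URL?" and reading the declared crawl delay and request rate — can be sketched with Python's standard `urllib.robotparser`. The rules string and URLs below are made-up examples:

```python
from urllib.robotparser import RobotFileParser

# An example robots.txt body; a real crawler would first fetch it
# from the site root, e.g. https://example.com/robots.txt.
rules = """
User-agent: *
Disallow: /private/
Crawl-delay: 10
Request-rate: 1/5
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Check whether a given user agent may crawl a given URL.
print(parser.can_fetch("*", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://example.com/index.html"))         # True

# Read the politeness limits the site declares.
print(parser.crawl_delay("*"))    # 10 (seconds between requests)
print(parser.request_rate("*"))   # RequestRate(requests=1, seconds=5)
```

Enforcing the delay is then up to the crawler: store the timestamp of the last request per host and sleep until `crawl_delay` seconds have elapsed before the next one, which is the bookkeeping the description above attributes to Robots_txt.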
PHP 5.0 or higher