Robots_txt takes the URL of a page and retrieves the robots.txt file of the same site. It parses the robots.txt file and looks up the rules defined in it to determine whether crawling the page is allowed.
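To illustrate the lookup described above, here is a minimal Python sketch built on the standard library's `urllib.robotparser`. It is not the Robots_txt class's own API, which is PHP. The helper names `robots_url_for` and `is_allowed`, the user agent `MyBot`, and the example rules are hypothetical. The robots.txt content is passed in as a list of lines rather than downloaded, so the example runs offline.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL of the site that hosts page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the scheme + host (+ port).
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(robots_lines, user_agent: str, page_url: str) -> bool:
    """Parse already-fetched robots.txt lines and check whether page_url may be crawled."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # also marks the rules as loaded
    return parser.can_fetch(user_agent, page_url)


# Example rules (hypothetical): every agent is kept out of /private/.
ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 10",
]

page = "https://example.com/private/report.html"
print(robots_url_for(page))                                          # https://example.com/robots.txt
print(is_allowed(ROBOTS, "MyBot", page))                             # False
print(is_allowed(ROBOTS, "MyBot", "https://example.com/index.html"))  # True
```

In a real crawler the lines would come from fetching `robots_url_for(page)`. Parsing once per site and caching the parser avoids downloading robots.txt again for every page.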
Robots_txt also records the time at which each page is crawled. When another page of the same site is crawled later, it uses that time to check that the crawler honors the site's intended crawl delay and request rate limits.
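The crawl-delay bookkeeping can be sketched the same way. This Python helper is hypothetical: `CrawlDelayTracker` and `effective_delay` are illustrative names and are not part of Robots_txt. It stores the time of the last request per host and reports how long to wait before the next one. Two assumptions apply. First, Request-rate is read as "requests per seconds". Second, the stricter of Crawl-delay and Request-rate wins.

```python
import time
from urllib.parse import urlsplit


def effective_delay(crawl_delay=None, requests=None, seconds=None) -> float:
    """Combine Crawl-delay and Request-rate into one minimum gap in seconds.

    Assumption: the stricter (larger) of the two limits applies.
    """
    gap = float(crawl_delay or 0)
    if requests and seconds:
        gap = max(gap, seconds / requests)
    return gap


class CrawlDelayTracker:
    """Hypothetical helper that remembers when each site was last crawled."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the logic can be tested without sleeping
        self._last_crawl = {}    # host -> time of the previous request

    def seconds_to_wait(self, url: str, delay: float) -> float:
        """Return how long to wait before crawling url; 0.0 means it may be crawled now."""
        last = self._last_crawl.get(urlsplit(url).netloc)
        if last is None:
            return 0.0           # site not crawled yet
        return max(0.0, delay - (self._clock() - last))

    def record_crawl(self, url: str) -> None:
        """Store the time at which a page of this site was fetched."""
        self._last_crawl[urlsplit(url).netloc] = self._clock()


# Usage with a fake clock so the numbers are deterministic.
now = [100.0]
tracker = CrawlDelayTracker(clock=lambda: now[0])
delay = effective_delay(crawl_delay=10, requests=1, seconds=5)     # 10.0
print(tracker.seconds_to_wait("https://example.com/a", delay))     # 0.0
tracker.record_crawl("https://example.com/a")
now[0] = 104.0
print(tracker.seconds_to_wait("https://example.com/b", delay))     # 6.0
print(tracker.seconds_to_wait("https://other.org/x", delay))       # 0.0
```

The tracker is keyed by host because robots.txt limits apply per site. Pages on different sites therefore never delay each other.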
Requirements: PHP 5.0 or higher.