Robots_txt takes the URL of a page and retrieves the robots.txt file of the same site. It parses the file and looks up the rules defined in it to determine whether crawling that page is allowed.
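The lookup described above can be sketched in a few lines. This is not the class's own PHP API, which is not shown here; it is an illustration in Python using the standard library's `urllib.robotparser`. The function names `robots_url_for` and `is_allowed`, the `ExampleBot` user agent, and the sample rules are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url):
    """Build the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(robots_lines, user_agent, page_url):
    """Parse robots.txt lines and check whether user_agent may fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, page_url)


# Hypothetical robots.txt content, used here instead of a network fetch.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
""".splitlines()

if __name__ == "__main__":
    page = "https://example.com/private/report.html"
    print(robots_url_for(page))
    print(is_allowed(SAMPLE_ROBOTS, "ExampleBot", page))
```

In a real crawler the lines would come from fetching the URL returned by `robots_url_for`; parsing a fixed sample keeps the sketch runnable without network access.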
Robots_txt also stores the time at which a page is crawled. The next time a page of the same site is about to be crawled, it can check whether the request honors the site's intended crawl delay and request rate limits.
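One way to picture the timing check is a small per-site record of the last crawl time. The sketch below is a Python illustration, not the class's implementation: the `CrawlClock` class, the `min_interval` helper, and the reading of a request rate as a (requests, seconds) pair are assumptions for the example.

```python
import time
from urllib.parse import urlsplit


def min_interval(crawl_delay, request_rate):
    """Return the minimum number of seconds between two requests.

    crawl_delay is a number of seconds or None; request_rate is a
    (requests, seconds) pair or None. The stricter limit wins.
    """
    delay = float(crawl_delay) if crawl_delay is not None else 0.0
    if request_rate is not None:
        requests, seconds = request_rate
        delay = max(delay, seconds / requests)
    return delay


class CrawlClock:
    """Remember when each site was last crawled (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the logic can be tested
        self._last_crawl = {}    # host -> time of the last crawl

    def record_crawl(self, page_url):
        self._last_crawl[urlsplit(page_url).netloc] = self._clock()

    def seconds_to_wait(self, page_url, interval):
        """Seconds left before another page of the same site may be crawled."""
        last = self._last_crawl.get(urlsplit(page_url).netloc)
        if last is None:
            return 0.0
        return max(0.0, interval - (self._clock() - last))
```

A crawler would call `seconds_to_wait` before each request, sleep for that long if the result is positive, and call `record_crawl` once the page has been fetched.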
Requires PHP 5.0 or higher.