Robots_txt takes the URL of a page and retrieves the robots.txt file of the same site. It parses the file and looks up the rules defined in it to determine whether crawling the page is allowed.
Robots_txt also records the time at which each page of a site is crawled, so that the next time a page of the same site is requested it can honor the site's intended crawl delay and request rate limits.
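To illustrate the kind of lookup described above, here is a minimal sketch of parsing a robots.txt file and checking a path against its Allow/Disallow rules. The helper names (`parseRules`, `isAllowed`) and the simplified single-line user-agent handling are assumptions for illustration, not the actual Robots_txt class API.

```php
<?php
// Hypothetical helpers sketching the robots.txt check; not the Robots_txt API.

function parseRules(string $robotsTxt, string $userAgent): array
{
    $rules = [];        // list of ['allow' => bool, 'path' => string]
    $crawlDelay = 0;    // seconds a polite crawler should wait between requests
    $applies = false;   // inside a record whose User-agent matches us?

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));  // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Simplified: one User-agent line per record.
            $applies = ($value === '*' || stripos($userAgent, $value) !== false);
        } elseif ($applies && ($field === 'allow' || $field === 'disallow')) {
            if ($value !== '') {
                $rules[] = ['allow' => ($field === 'allow'), 'path' => $value];
            }
        } elseif ($applies && $field === 'crawl-delay') {
            $crawlDelay = (int) $value;
        }
    }
    return ['rules' => $rules, 'crawl_delay' => $crawlDelay];
}

function isAllowed(array $parsed, string $path): bool
{
    $allowed = true;   // robots.txt is permissive by default
    $bestLen = -1;
    foreach ($parsed['rules'] as $rule) {
        // The longest (most specific) matching path prefix wins.
        if (strpos($path, $rule['path']) === 0 && strlen($rule['path']) > $bestLen) {
            $bestLen = strlen($rule['path']);
            $allowed = $rule['allow'];
        }
    }
    return $allowed;
}
```

For example, with the rules `User-agent: *`, `Disallow: /private/` and `Crawl-delay: 10`, `isAllowed($parsed, '/private/data.html')` returns `false`, `isAllowed($parsed, '/public/index.html')` returns `true`, and `$parsed['crawl_delay']` is `10`.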
PHP 5.0 or higher