PHP Developer's Network : Network Member Forums

Class: Robots_txt



  • Supplied by:

  • Name: Andy Pieters
    Published packages: 1
    Country: United Kingdom
    Age: 30
    All time rank: 1592
    Week rank: 629

  • Innovation Award:

  • PHP Programming Innovation award nominee
    January 2008
    Number 8
    robots.txt is a file that sites place in their domain Web root to tell search engine crawlers, and Web robots in general, which pages should not be crawled.

    This class can parse a robots.txt file of a domain to determine whether a given page should be crawled or not.

    It is useful for implementing a friendly crawler that respects the wishes of site owners who do not want certain pages crawled by Web robot programs.

    Manuel Lemos
  • Groups:

  • Classes using PHP 5 specific features
    Search engines, crawling and indexing
  • Detailed description:

  • This class can be used to check whether a page may be crawled by looking at the robots.txt file of its site.

    It takes the URL of a page and retrieves the robots.txt file of the same site.

    The class parses the robots.txt file and looks up the rules defined in it to see whether the site allows crawling the intended page.
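    The rule lookup described above can be sketched roughly as follows. This is a minimal illustration, not the Robots_txt class itself: the function names `parseRobotsTxt` and `isPathAllowed` are hypothetical, the robots.txt text is passed in as a string rather than fetched from the site, wildcards are not handled, and user-agent names are matched exactly (with `*` as the fallback group). It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helpers for illustration; not the Robots_txt class API.

/** Parse robots.txt text into [user-agent => list of [type, pathPrefix]]. */
function parseRobotsTxt(string $text): array
{
    $rules = [];
    $agents = [];       // user-agents of the group currently being read
    $inRules = false;   // true once the current group has rule lines

    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if ($inRules) {             // a new group starts here
                $agents = [];
                $inRules = false;
            }
            $agent = strtolower($value);
            $agents[] = $agent;
            if (!isset($rules[$agent])) {
                $rules[$agent] = [];
            }
        } elseif ($field === 'allow' || $field === 'disallow') {
            $inRules = true;
            if ($value === '') {
                continue;               // an empty value matches nothing
            }
            foreach ($agents as $agent) {
                $rules[$agent][] = [$field, $value];
            }
        }
    }
    return $rules;
}

/** Decide whether $path may be crawled by $userAgent under $rules. */
function isPathAllowed(array $rules, string $userAgent, string $path): bool
{
    $group = $rules[strtolower($userAgent)] ?? $rules['*'] ?? [];

    $bestLength = -1;
    $allowed = true;    // no matching rule means crawling is allowed
    foreach ($group as [$type, $prefix]) {
        $length = strlen($prefix);
        if (strncmp($path, $prefix, $length) !== 0) {
            continue;
        }
        // Longest matching prefix wins; on a tie, Allow beats Disallow.
        if ($length > $bestLength || ($length === $bestLength && $type === 'allow')) {
            $bestLength = $length;
            $allowed = ($type === 'allow');
        }
    }
    return $allowed;
}

$robots = "User-agent: *\nDisallow: /private/\nAllow: /private/public.html\n";
$rules = parseRobotsTxt($robots);
var_dump(isPathAllowed($rules, 'MyCrawler', '/private/secret.html')); // bool(false)
var_dump(isPathAllowed($rules, 'MyCrawler', '/private/public.html')); // bool(true)
var_dump(isPathAllowed($rules, 'MyCrawler', '/index.html'));          // bool(true)
```

    In a real crawler the text would first be downloaded from `/robots.txt` on the page's host; that step is left out here so the sketch runs without network access.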

    The class also stores the time at which each page is crawled, so that the next time a page of the same site is about to be crawled it can check that the intended crawl delay and request rate limits are being honored.
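    One way to picture the crawl-delay bookkeeping is the sketch below. It is an assumption-laden illustration rather than the class's actual code: `CrawlTimer`, `record` and `waitTime` are hypothetical names, timestamps are kept in memory only (how the real class stores them is not stated here), and the delay is passed in as a number of seconds. A request rate limit would have to be converted to an equivalent delay first.

```php
<?php
// Hypothetical illustration; class and method names are not from Robots_txt.
final class CrawlTimer
{
    /** @var array<string, float> host => Unix time of the last request */
    private $lastRequest = [];

    /** Remember when $host was last requested. */
    public function record(string $host, float $now): void
    {
        $this->lastRequest[$host] = $now;
    }

    /** Seconds the crawler should still wait before requesting $host again. */
    public function waitTime(string $host, float $crawlDelay, float $now): float
    {
        if (!isset($this->lastRequest[$host])) {
            return 0.0;                 // host has not been crawled yet
        }
        $elapsed = $now - $this->lastRequest[$host];
        return max(0.0, $crawlDelay - $elapsed);
    }
}

$timer = new CrawlTimer();
$timer->record('example.com', 1000.0);
var_dump($timer->waitTime('example.com', 10.0, 1004.0)); // float(6)
var_dump($timer->waitTime('example.com', 10.0, 1020.0)); // float(0)
var_dump($timer->waitTime('example.org', 10.0, 1004.0)); // float(0)
```

    Passing the current time in as a parameter, instead of calling `microtime(true)` inside the method, keeps the sketch deterministic and easy to test.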
  • User ratings:

  • There are not enough user ratings to display for this class.
  • Applications that use this class:

  • No application links were specified for this class.
  • Files:

  • File                   Role    Description
    Robots.txt.class.php   Class   Core file
    README.txt             Doc.    Usage examples (accessible without login)

    Download all files: robots_txt.tar.gz robots_txt.zip

Copyright (c) Icontem 1999-2008 PHP Classes - PHP Class Scripts