The robots.txt file, located at the root of a website, allows the webmaster to define which paths are allowed or denied to search engine crawlers.
This class reads a robots.txt file and determines whether a given URL is allowed or denied, just as a search engine would.
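For reference, a minimal robots.txt file served at the site root (for example https://www.example.com/robots.txt) could look like this; these are standard directives, not specific to this class:

  User-agent: *
  Disallow: /private/
  Allow: /private/public/
  Crawl-delay: 10
  Sitemap: https://www.example.com/sitemap.xml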
Namespace Domframework
/** This class analyzes the provided robots.txt file content and gives access to the data configured for DomSearch. It can examine a URL against the robots.txt rules and return whether the URL is allowed to be used or not. The robots.txt file format is defined here:
https://www.rfc-editor.org/rfc/rfc9309.txt
http://www.robotstxt.org/norobots-rfc.txt
https://en.wikipedia.org/wiki/Robots_exclusion_standard
No property available
/** Return TRUE if the provided URL can be used according to the robots.txt definition, or FALSE if it is not the case
@param string $url The URL to check
@return boolean The result of the test
/** Load the robots.txt file content and analyze it
@param string $content The robots.txt file content to analyze
@param string $crawlerName The crawler name to use in the analysis
@return $this
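A minimal usage sketch of the two methods above. The class name and method names (setContent, URLAllow) are inferred from the docblocks and the matchRule description below; they are assumptions and may differ in the actual implementation:

  <?php
  // Assumed autoloading of the Domframework classes
  require "vendor/autoload.php";

  // Fetch the robots.txt file from the site root
  $content = file_get_contents ("https://www.example.com/robots.txt");

  // Analyze the content for a given crawler name (assumed method name)
  $robots = new \Domframework\Robotstxt ();
  $robots->setContent ($content, "DomSearch");

  // Check whether a URL may be crawled (assumed method name)
  if ($robots->URLAllow ("https://www.example.com/private/page.html"))
    echo "URL allowed\n";
  else
    echo "URL denied\n";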
/**
Return the allowed URLs
@return array $allow The array of allow rules
/**
Return the crawl delay
@return integer $crawldelay The Crawl-delay value defined in robots.txt
/**
Return the disallowed URLs
@return array $disallow The array of disallow rules
/**
Return the lines where an error occurred
The key of the array is the line number with the default
@return array The errors
/**
Return the host
@return string $host The Host string defined in robots.txt
/**
Return the matchRule
@return string $matchRule The rule that matched during the last URLAllow test
/**
Return the sitemap URLs
@return array $sitemap The array of sitemaps URL
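Continuing the sketch above, the parsed data can be read back after the analysis. The getter names below are assumptions derived from the @return variable names documented here (allow, disallow, crawldelay, errors, host, sitemap):

  // After the robots.txt content has been analyzed (see the sketch above)
  $allowRules    = $robots->allow ();      // array of allow rules
  $disallowRules = $robots->disallow ();   // array of disallow rules
  $delay         = $robots->crawldelay (); // integer crawl delay
  $host          = $robots->host ();       // Host string, if defined
  $sitemaps      = $robots->sitemap ();    // array of sitemap URLs
  $errors        = $robots->errors ();     // lines in error, keyed by line number

  foreach ($sitemaps as $url)
    echo "Sitemap: $url\n";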