This package provides all of the components to crawl a website and to build and write sitemap files.
An example console application that uses the library: dmoraschi/sitemap-app
Run the following command and provide the latest stable version (e.g. v1.0.0):
composer require dmoraschi/sitemap-common
or add the following to your composer.json file:
"dmoraschi/sitemap-common": "1.0.*"
Sitemap generator
Basic usage
$generator = new SiteMapGenerator(
new FileWriter($outputFileName),
new XmlTemplate()
);
Add a URL:
$generator->addUrl($url, $frequency, $priority);
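For example, with concrete values ('weekly' and the 0.0 to 1.0 priority scale follow the sitemaps.org protocol; the exact formats the library accepts are an assumption, so check its source):
$generator->addUrl('https://example.com/about', 'weekly', 0.8);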
Add a single SiteMapUrl object or array:
$siteMapUrl = new SiteMapUrl(
new Url($url), $frequency, $priority
);
$generator->addSiteMapUrl($siteMapUrl);
$generator->addSiteMapUrls([
$siteMapUrl, $siteMapUrl2
]);
Set the URLs of the sitemap via SiteMapUrlCollection:
$siteMapUrl = new SiteMapUrl(
new Url($url), $frequency, $priority
);
$collection = new SiteMapUrlCollection([
$siteMapUrl, $siteMapUrl2
]);
$generator->setCollection($collection);
Generate the sitemap:
$generator->execute();
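Putting the pieces together, here is a minimal end-to-end sketch of the generator. The class names are those shown above; the concrete URL, frequency, and priority values are illustrative:
$generator = new SiteMapGenerator(
    new FileWriter('sitemap.xml'),
    new XmlTemplate()
);

$generator->addSiteMapUrls([
    new SiteMapUrl(new Url('https://example.com/'), 'daily', 1.0),
    new SiteMapUrl(new Url('https://example.com/about'), 'monthly', 0.5),
]);

// Renders the collected URLs through the template and writes the file.
$generator->execute();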
Crawler
Basic usage
$crawler = new Crawler(
new Url($baseUrl),
new RegexBasedLinkParser(),
new HttpClient()
);
You can tell the Crawler not to visit certain URLs by adding policies. Below are the default policies provided by the library:
$crawler->setPolicies([
'host' => new SameHostPolicy($baseUrl),
'url' => new UniqueUrlPolicy(),
'ext' => new ValidExtensionPolicy(),
]);
// or
$crawler->setPolicy('host', new SameHostPolicy($baseUrl));
SameHostPolicy, UniqueUrlPolicy, and ValidExtensionPolicy are provided with the library; you can also define your own policies by implementing the Policy interface.
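For instance, a custom policy could skip URLs that carry a query string. The sketch below assumes Policy declares a single boolean check per URL; the method name isValid, its signature, and the string cast of Url are assumptions, so check the interface in the package for the actual contract:
class NoQueryStringPolicy implements Policy
{
    // Reject any URL that contains a query string.
    public function isValid(Url $url)
    {
        return strpos((string) $url, '?') === false;
    }
}

$crawler->setPolicy('no-query', new NoQueryStringPolicy());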
Calling the crawl function, the object will start from the base URL given in the constructor and crawl the web pages up to the specified depth, passed as an argument. The function returns an array of all the unique visited Url objects:
$urls = $crawler->crawl($deep);
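Since crawl returns the visited Urls, its output can feed the generator directly. Here is a sketch of the full crawl-and-generate flow, assuming each returned element is a Url accepted by SiteMapUrl and reusing illustrative frequency and priority values:
$urls = $crawler->crawl($deep);

// Turn every visited URL into a sitemap entry.
foreach ($urls as $url) {
    $generator->addSiteMapUrl(new SiteMapUrl($url, 'weekly', 0.5));
}

$generator->execute();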
You can also instruct the Crawler to collect custom data while visiting the web pages by adding Collectors to the main object:
$crawler->setCollectors([
'images' => new ImageCollector()
]);
// or
$crawler->setCollector('images', new ImageCollector());
And then retrieve the collected data:
$crawler->crawl($deep);
$imageCollector = $crawler->getCollector('images');
$data = $imageCollector->getCollectedData();
ImageCollector is provided by the library; you can define your own collector by implementing the Collector interface.
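As an illustration, a collector that records page titles might look like the sketch below. Apart from getCollectedData, which is shown above, the collect hook and its arguments are assumptions; check the Collector interface for the real contract:
class TitleCollector implements Collector
{
    private $titles = [];

    // Assumed hook: called by the crawler for each visited page.
    public function collect(Url $url, $html)
    {
        if (preg_match('/<title>(.*?)<\/title>/is', $html, $match)) {
            $this->titles[(string) $url] = trim($match[1]);
        }
    }

    public function getCollectedData()
    {
        return $this->titles;
    }
}

$crawler->setCollector('titles', new TitleCollector());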