The simplest way to start crawling is to use closure callbacks:
```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        echo "{$url}: {$response->status()}\n";
    })
    ->start();
```
The following callbacks are available:
```php
use GuzzleHttp\Exception\RequestException;
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlProgress;
use Spatie\Crawler\CrawlResponse;
use Spatie\Crawler\Enums\FinishReason;
use Spatie\Crawler\Enums\ResourceType;

Crawler::create('https://example.com')
    // Called before a URL is crawled.
    ->onWillCrawl(function (string $url, ?string $linkText, ?ResourceType $resourceType) {
    })
    // Called after a URL has been crawled successfully.
    ->onCrawled(function (string $url, CrawlResponse $response, CrawlProgress $progress) {
    })
    // Called when a request could not be completed.
    ->onFailed(function (string $url, RequestException $e, CrawlProgress $progress, ?string $foundOnUrl, ?string $linkText, ?ResourceType $resourceType) {
    })
    // Called once, when the crawl stops.
    ->onFinished(function (FinishReason $reason, CrawlProgress $progress) {
    })
    ->start();
```
Each callback (except onWillCrawl) receives a CrawlProgress object with live crawl statistics. See tracking progress for details.
The onFinished callback also receives a FinishReason enum that tells you why the crawl stopped.
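For example, the callbacks can be combined into a small crawl report. This is a minimal sketch using only the callback signatures shown above; the `$failedUrls` variable is illustrative, and extra callback parameters are simply omitted from the closures (PHP ignores arguments beyond those a closure declares):

```php
use GuzzleHttp\Exception\RequestException;
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlProgress;
use Spatie\Crawler\Enums\FinishReason;

$failedUrls = [];

Crawler::create('https://example.com')
    ->onFailed(function (string $url, RequestException $e) use (&$failedUrls) {
        // Remember every URL that could not be fetched.
        $failedUrls[] = $url;
    })
    ->onFinished(function (FinishReason $reason, CrawlProgress $progress) use (&$failedUrls) {
        // $reason->name is the enum case name, e.g. why the crawl stopped.
        echo "Crawl finished ({$reason->name}); " . count($failedUrls) . " URLs failed\n";
    })
    ->start();
```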
You can register multiple callbacks of the same type. They will all be called in the order they were added.
```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    // This callback runs first...
    ->onCrawled(function (string $url, CrawlResponse $response) {
    })
    // ...then this one, for the same crawled URL.
    ->onCrawled(function (string $url, CrawlResponse $response) {
    })
    ->start();
```