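When a URL is successfully crawled, your callback or observer receives a `CrawlResponse` object. This provides a friendlier API than the raw PSR-7 response:

    use Spatie\Crawler\CrawlResponse;

    $response->status();          // HTTP status code
    $response->body();            // response body
    $response->header('Name');    // value of a single header
    $response->headers();         // all response headers
    $response->dom();             // Symfony DomCrawler instance
    $response->isSuccessful();    // whether the status code indicates success
    $response->isRedirect();      // whether the status code indicates a redirect
    $response->foundOnUrl();      // URL of the page that linked to this one
    $response->linkText();        // text of the link that led here
    $response->depth();           // crawl depth of this URL
    $response->resourceType();    // type of resource that was fetched
    $response->transferStats();   // TransferStatistics, or null for faked responses
    $response->redirectHistory(); // URLs visited before reaching the final URL
    $response->wasRedirected();   // true when the redirect history is not empty

If you need access to the underlying PSR-7 response:

    $response->toPsrResponse();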
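As a rough illustration of how these accessors combine, the sketch below builds a one-line summary for a crawled URL. The `summarizeResponse()` helper is hypothetical (it is not part of the package), and it assumes that `status()` and `depth()` return integers and that `foundOnUrl()` can be null for the start URL.

```php
use Spatie\Crawler\CrawlResponse;

// Hypothetical helper, not part of the package.
// Assumes status() and depth() return integers and foundOnUrl() may be null.
function summarizeResponse(string $url, CrawlResponse $response): string
{
    $outcome = $response->isSuccessful() ? 'ok' : 'failed';
    $foundOn = $response->foundOnUrl() ?? '(start URL)';

    return sprintf(
        '[%d %s] %s (depth %d, found on %s)',
        $response->status(),
        $outcome,
        $url,
        $response->depth(),
        $foundOn
    );
}
```

Inside an `onCrawled` callback this could be used as `echo summarizeResponse($url, $response) . "\n";`.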
## Transfer statistics
Each response includes a `Spatie\Crawler\TransferStatistics` object with typed accessors for timing data and other transfer details:

    use Spatie\Crawler\Crawler;
    use Spatie\Crawler\CrawlResponse;

    Crawler::create('https://example.com')
        ->onCrawled(function (string $url, CrawlResponse $response) {
            $stats = $response->transferStats();

            if ($stats === null) {
                return; // faked responses carry no transfer statistics
            }

            $stats->transferTimeInMs();
            $stats->effectiveUri();
        })
        ->start();

The `transferStats()` method returns null for faked responses.

All timing methods return values in milliseconds. They return null when the stat is unavailable (for example, `tlsHandshakeTimeInMs()` will be null for plain HTTP requests).

    $stats = $response->transferStats();

    $stats->transferTimeInMs();              // total transfer time
    $stats->connectionTimeInMs();            // time to establish the connection
    $stats->dnsLookupTimeInMs();             // DNS lookup time
    $stats->tlsHandshakeTimeInMs();          // TLS handshake time (null for plain HTTP)
    $stats->timeToFirstByteInMs();           // time until the first byte arrived
    $stats->redirectTimeInMs();              // time spent on redirects
    $stats->effectiveUri();                  // effective URI of the transfer
    $stats->primaryIp();                     // primary IP address of the connection
    $stats->downloadSpeedInBytesPerSecond(); // download speed in bytes per second
    $stats->requestSizeInBytes();            // request size in bytes
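Because both `transferStats()` and the individual timing methods can return null, code that reports timings has to handle two layers of missing data. The following is a minimal sketch of one way to do that with PHP 8's nullsafe operator. The `formatMs()` and `describeTimings()` helpers are hypothetical, and the sketch assumes the timing values are numeric (int or float).

```php
use Spatie\Crawler\CrawlResponse;

// Hypothetical helper, not part of the package.
// Formats a timing value that may be null; assumes the value is numeric.
function formatMs(int|float|null $ms): string
{
    return $ms === null ? 'n/a' : number_format((float) $ms, 1, '.', '') . ' ms';
}

// Hypothetical helper. The nullsafe operator (?->) covers faked responses,
// where transferStats() itself returns null.
function describeTimings(CrawlResponse $response): string
{
    $stats = $response->transferStats();

    return sprintf(
        'total: %s, dns: %s, tls: %s',
        formatMs($stats?->transferTimeInMs()),
        formatMs($stats?->dnsLookupTimeInMs()),
        formatMs($stats?->tlsHandshakeTimeInMs())
    );
}
```

With this approach a plain HTTP request would report `tls: n/a` instead of failing, and a faked response would report `n/a` for every value.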
## Redirect history
When the crawler follows redirects (which is the default), you can inspect the redirect chain for any response:

    use Spatie\Crawler\Crawler;
    use Spatie\Crawler\CrawlResponse;

    Crawler::create('https://example.com')
        ->onCrawled(function (string $url, CrawlResponse $response) {
            if ($response->wasRedirected()) {
                echo "{$url} redirected through: " . implode(' → ', $response->redirectHistory()) . "\n";
            }
        })
        ->start();

The `redirectHistory()` method returns an array of URLs that were visited before reaching the final URL. The `wasRedirected()` method is a convenience that returns true when the redirect history is not empty.
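Since the history is a plain array, its length gives the number of redirects that were followed. As a sketch, a hypothetical `redirectChainWarning()` helper (not part of the package; the default threshold of 2 is an arbitrary assumption) could flag long chains:

```php
use Spatie\Crawler\CrawlResponse;

// Hypothetical helper, not part of the package.
// Returns a warning for chains longer than $maxHops, or null otherwise.
function redirectChainWarning(string $url, CrawlResponse $response, int $maxHops = 2): ?string
{
    // Each entry is a URL visited before the final one, so the count
    // equals the number of redirects that were followed.
    $hops = count($response->redirectHistory());

    if ($hops <= $maxHops) {
        return null;
    }

    return "{$url}: {$hops} redirects before the final URL";
}
```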
The crawler follows up to 5 redirects per request by default (Guzzle's built-in limit), which protects against infinite redirect loops. To change this limit, pass a custom `allow_redirects` option:

    use GuzzleHttp\RequestOptions;
    use Spatie\Crawler\Crawler;

    Crawler::create('https://example.com', [
        RequestOptions::ALLOW_REDIRECTS => [
            'max' => 10,
            'track_redirects' => true,
        ],
    ])->start();

Keep `track_redirects` set to true if you want `redirectHistory()` and `wasRedirected()` to work. To disable following redirects entirely, set `allow_redirects` to false.
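For the last case, the options array might look like the sketch below. The `$clientOptions` variable name is only illustrative. The array uses the plain string key, which is the value of `GuzzleHttp\RequestOptions::ALLOW_REDIRECTS`.

```php
// Client options that turn off redirect following entirely.
// 'allow_redirects' is the string value of RequestOptions::ALLOW_REDIRECTS.
$clientOptions = [
    'allow_redirects' => false,
];
```

The array would be passed as the second argument to `Crawler::create()`, in the same position as the options in the example above.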
## Using the DOM crawler
The `dom()` method returns a Symfony DomCrawler instance, which makes it easy to extract structured data from pages:

    use Spatie\Crawler\Crawler;
    use Spatie\Crawler\CrawlResponse;

    Crawler::create('https://example.com')
        ->onCrawled(function (string $url, CrawlResponse $response) {
            // text('') falls back to an empty string when no element matches
            $title = $response->dom()->filter('title')->text('');
            $h1 = $response->dom()->filter('h1')->text('');

            echo "{$url}: {$title} (h1: {$h1})\n";
        })
        ->start();
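Attribute values need slightly more care than text, because DomCrawler's `attr()` throws an exception when the node list is empty and no default is given. A minimal sketch, using a hypothetical `metaDescription()` helper that is not part of the package:

```php
use Spatie\Crawler\CrawlResponse;

// Hypothetical helper, not part of the package.
// attr() throws on an empty node list, so check count() first.
function metaDescription(CrawlResponse $response): ?string
{
    $node = $response->dom()->filter('meta[name="description"]');

    return $node->count() > 0 ? $node->attr('content') : null;
}
```

Pages without a description tag then yield null instead of an exception.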