
Using observers
===============


For more structured crawl handling, you can create observer classes instead of using closures. An observer must extend `Spatie\Crawler\CrawlObservers\CrawlObserver`:

```
namespace App;

use GuzzleHttp\Exception\RequestException;
use Spatie\Crawler\CrawlObservers\CrawlObserver;
use Spatie\Crawler\CrawlProgress;
use Spatie\Crawler\CrawlResponse;
use Spatie\Crawler\Enums\FinishReason;
use Spatie\Crawler\Enums\ResourceType;
use Spatie\Crawler\TransferStatistics;

class MyCrawlObserver extends CrawlObserver
{
    public function willCrawl(string $url, ?string $linkText, ?ResourceType $resourceType = null): void
    {
        // called before a URL is crawled
    }

    public function crawled(
        string $url,
        CrawlResponse $response,
        CrawlProgress $progress,
    ): void {
        // called when a URL has been successfully crawled
    }

    public function crawlFailed(
        string $url,
        RequestException $requestException,
        CrawlProgress $progress,
        ?string $foundOnUrl = null,
        ?string $linkText = null,
        ?ResourceType $resourceType = null,
        ?TransferStatistics $transferStats = null,
    ): void {
        // called when a URL could not be crawled
    }

    public function finishedCrawling(FinishReason $reason, CrawlProgress $progress): void
    {
        // called when the entire crawl is complete
    }
}
```

Pass the observer to the crawler:

```
Crawler::create('https://example.com')
    ->addObserver(new MyCrawlObserver())
    ->start();
```

The `crawled()` method receives a `CrawlProgress` object with live crawl statistics and a `CrawlResponse` that provides access to `foundOnUrl()`, `linkText()`, `resourceType()`, and other response data. See [tracking progress](/docs/crawler/v9/basic-usage/tracking-progress) for more on `CrawlProgress`.
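
For example, a `crawled()` implementation might record where each page was discovered. This is a minimal sketch: the `CrawlResponse` accessors (`foundOnUrl()`, `linkText()`, `resourceType()`) are the ones named above, while the `Log` facade is a Laravel-specific assumption; substitute whatever logging your application uses.

```
use Illuminate\Support\Facades\Log;
use Spatie\Crawler\CrawlProgress;
use Spatie\Crawler\CrawlResponse;

public function crawled(
    string $url,
    CrawlResponse $response,
    CrawlProgress $progress,
): void {
    // Record the crawled URL along with where it was found
    // and the anchor text of the link that led to it.
    Log::info("Crawled {$url}", [
        'found_on' => $response->foundOnUrl(),
        'link_text' => $response->linkText(),
    ]);
}
```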

The `crawlFailed()` method also includes `$foundOnUrl`, `$linkText`, and `$resourceType` parameters since there is no `CrawlResponse` available for failed requests. The `$transferStats` parameter provides transfer timing data (connection time, total transfer time, etc.) for the failed request. This is useful for detecting timeouts: if `$transferStats->transferTimeInMs()` exceeds your threshold, the request likely timed out rather than returning an error.
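
A `crawlFailed()` implementation could apply that heuristic like this. The 30-second threshold is an assumption for illustration; tune it to the request timeout you configured on the crawler.

```
public function crawlFailed(
    string $url,
    RequestException $requestException,
    CrawlProgress $progress,
    ?string $foundOnUrl = null,
    ?string $linkText = null,
    ?ResourceType $resourceType = null,
    ?TransferStatistics $transferStats = null,
): void {
    // Assumed threshold: anything over 30 000 ms is treated as a timeout
    // rather than a genuine error response from the server.
    if ($transferStats !== null && $transferStats->transferTimeInMs() > 30_000) {
        echo "Likely timeout for {$url} (found on {$foundOnUrl})\n";

        return;
    }

    echo "Failed {$url}: {$requestException->getMessage()}\n";
}
```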

Using multiple observers
------------------------

You can add multiple observers. They will all be notified of every crawl event. The `addObserver()` method accepts multiple observers at once:

```
Crawler::create('https://example.com')
    ->addObserver(new LoggingObserver(), new MetricsObserver())
    ->start();
```

You can also chain separate `addObserver()` calls:

```
Crawler::create('https://example.com')
    ->addObserver(new LoggingObserver())
    ->addObserver(new MetricsObserver())
    ->start();
```

You can also combine observers with closure callbacks. Both will be called:

```
Crawler::create('https://example.com')
    ->addObserver(new MyObserver())
    ->onCrawled(function (string $url, CrawlResponse $response) {
        // this will also be called alongside the observer
    })
    ->start();
```
