Tracking progress
=================

The crawler provides real-time progress tracking through the `CrawlProgress` object and reports why a crawl stopped through the `FinishReason` enum.

CrawlProgress
-------------

Every `onCrawled`, `onFailed`, and `onFinished` callback receives a `CrawlProgress` object with the following properties:

```
use Spatie\Crawler\CrawlProgress;

// Available on every CrawlProgress instance:
$progress->urlsCrawled;   // int (number of URLs successfully crawled)
$progress->urlsFailed;    // int (number of URLs that failed)
$progress->urlsProcessed; // int (urlsCrawled + urlsFailed)
$progress->urlsFound;     // int (total URLs added to the queue)
$progress->urlsPending;   // int (URLs not yet processed)
```

Here's an example that logs progress during a crawl:

```
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlProgress;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response, CrawlProgress $progress) {
        echo "[{$progress->urlsProcessed}/{$progress->urlsFound}] {$url}\n";
    })
    ->start();
```
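
Since `urlsFound` counts every URL added to the queue, it will presumably keep growing while the crawler discovers new links, so a ratio computed mid-crawl is a moving estimate, not a fixed percentage. The sketch below turns the two counters into a single progress line and guards against an empty queue. `formatProgress` is a hypothetical helper for illustration and is not part of the package; it takes plain integers so it runs without the crawler installed.

```php
/**
 * Formats two progress counters as "processed/found (percent%)".
 * Hypothetical helper, not part of spatie/crawler.
 */
function formatProgress(int $processed, int $found): string
{
    // Avoid dividing by zero before any URL has been queued.
    $percent = $found > 0 ? intdiv($processed * 100, $found) : 0;

    return sprintf('%d/%d (%d%%)', $processed, $found, $percent);
}

echo formatProgress(25, 200), "\n"; // prints "25/200 (12%)"
```

Inside an `onCrawled` callback, it could be called as `formatProgress($progress->urlsProcessed, $progress->urlsFound)`.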

FinishReason
------------

The `start()` method returns a `FinishReason` enum that tells you why the crawl stopped:

```
use Spatie\Crawler\Crawler;
use Spatie\Crawler\Enums\FinishReason;

$reason = Crawler::create('https://example.com')
    ->limit(100)
    ->start();

// match is an expression, so capture its result instead of discarding it.
$message = match ($reason) {
    FinishReason::Completed => 'All URLs have been crawled',
    FinishReason::CrawlLimitReached => 'Stopped because the crawl limit was reached',
    FinishReason::TimeLimitReached => 'Stopped because the time limit was reached',
    FinishReason::Interrupted => 'Stopped by a signal (SIGINT/SIGTERM)',
};

echo $message . "\n";
```
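
In a command-line script, the returned reason can also decide the process exit code. The following is a minimal sketch under stated assumptions: `exitCodeFor` is a hypothetical helper, and the exit codes are arbitrary choices for illustration. It accepts any PHP enum and matches on the case name, using the four case names listed above, so it runs without the package installed.

```php
/**
 * Maps a finish reason to a process exit code.
 * Hypothetical helper; the exit codes are illustrative, not prescribed by the package.
 */
function exitCodeFor(\UnitEnum $reason): int
{
    // Every PHP enum case exposes its case name through ->name.
    return match ($reason->name) {
        'Completed' => 0,
        'CrawlLimitReached', 'TimeLimitReached' => 1,
        // 130 is the conventional shell status for SIGINT (128 + 2);
        // it is used here for any interruption.
        'Interrupted' => 130,
        // Any case name not listed above is treated as a generic failure.
        default => 2,
    };
}
```

A script could then end with `exit(exitCodeFor($reason));`. In a project where the package is installed, matching directly on the `FinishReason` cases, as in the example above, is the more idiomatic choice.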

The `onFinished` callback also receives the `FinishReason`:

```
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlProgress;
use Spatie\Crawler\Enums\FinishReason;

Crawler::create('https://example.com')
    ->limit(100)
    ->onFinished(function (FinishReason $reason, CrawlProgress $progress) {
        echo "Crawl finished: {$reason->value}\n";
        echo "Crawled {$progress->urlsCrawled} URLs, {$progress->urlsFailed} failed\n";
    })
    ->start();
```

Using progress in observers
---------------------------

Observer classes receive `CrawlProgress` and `FinishReason` as method arguments, for example in `crawled()` and `finishedCrawling()`:

```
use Spatie\Crawler\CrawlObservers\CrawlObserver;
use Spatie\Crawler\CrawlProgress;
use Spatie\Crawler\CrawlResponse;
use Spatie\Crawler\Enums\FinishReason;

class ProgressLogger extends CrawlObserver
{
    public function crawled(
        string $url,
        CrawlResponse $response,
        CrawlProgress $progress,
    ): void {
        echo "[{$progress->urlsProcessed}/{$progress->urlsFound}] {$url}\n";
    }

    public function finishedCrawling(FinishReason $reason, CrawlProgress $progress): void
    {
        echo "Done ({$reason->value}): {$progress->urlsCrawled} crawled, {$progress->urlsFailed} failed\n";
    }
}
```
