By default, the crawler continues until it has crawled every page it can find. This can cause problems in constrained environments, such as serverless platforms with strict execution time limits.
## Crawl depth
You can limit how deep the crawler goes using the depth() method.
```php
use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->depth(2)
    ->start();
```
A depth of 0 means only the start URL will be crawled; a depth of 1 adds the pages the start URL links to, and so on.
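For example, a depth of 0 restricts the crawl to the start URL itself:

```php
use Spatie\Crawler\Crawler;

// Only https://example.com itself is fetched; none of its links are followed.
Crawler::create('https://example.com')
    ->depth(0)
    ->start();
```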
## Crawl and time limits
The crawl behavior can be controlled with these options:

- limit(): the maximum number of URLs to crawl across all executions
- limitPerExecution(): the maximum number of URLs to crawl during the current execution
- timeLimit(): the maximum execution time in seconds across all executions
- timeLimitPerExecution(): the maximum execution time in seconds for the current execution

When any of these limits is reached, the crawler stops and returns a FinishReason from start(). See tracking progress for details.
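As a minimal sketch of inspecting that return value (the exact FinishReason cases are listed in the tracking progress section):

```php
use Spatie\Crawler\Crawler;

$reason = Crawler::create('https://example.com')
    ->limit(5)
    ->start();

// Every PHP enum case exposes its case name via the ->name property,
// so this prints which condition stopped the crawl.
echo $reason->name;
```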
## Using the total crawl limit
The limit() method allows you to limit the total number of URLs to crawl, no matter how often you call the crawler.
```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\Enums\FinishReason;

$queue = <your queue implementation>;

// Crawls 5 URLs, then stops; $reason is a FinishReason enum case.
$reason = Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(5)
    ->start();

// Doesn't crawl any further: the total limit has already been reached.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(5)
    ->start();
```
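Both calls above receive the same $queue instance; the shared queue is presumably what carries the crawl state across executions. As a sketch, assuming the package's ArrayCrawlQueue implementation (the exact class and namespace may differ between versions):

```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlQueues\ArrayCrawlQueue; // assumed namespace

// A single in-memory queue reused across executions in the same process;
// for crawls spread over separate requests you would need a queue backed
// by persistent storage instead.
$queue = new ArrayCrawlQueue();

Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(5)
    ->start();
```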
## Using the current crawl limit
The limitPerExecution() method limits how many URLs are crawled in a single execution. This is especially useful when crawling across multiple requests. The following code processes 5 pages per execution, with no limit on the total number of pages crawled.
```php
use Spatie\Crawler\Crawler;

$queue = <your queue implementation>;

// Crawls 5 URLs, then stops.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limitPerExecution(5)
    ->start();

// Crawls the next 5 URLs from the same queue.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limitPerExecution(5)
    ->start();
```
## Using time limits
The timeLimit() method sets the maximum execution time across all executions. The timeLimitPerExecution() method sets the maximum execution time for a single crawl. Both accept a value in seconds.
```php
use Spatie\Crawler\Crawler;

// Stops crawling after 60 seconds in total, across all executions.
$reason = Crawler::create('https://example.com')
    ->timeLimit(60)
    ->start();

$queue = <your queue implementation>;

// Stops the current execution after 30 seconds; a later execution
// can pick up the same queue where this one left off.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->timeLimitPerExecution(30)
    ->start();
```
## Combining limits
All limits can be combined to control the crawler:
```php
use Spatie\Crawler\Crawler;

$queue = <your queue implementation>;

// First execution: crawls 5 URLs (the per-execution limit).
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(10)
    ->limitPerExecution(5)
    ->start();

// Second execution: crawls 5 more URLs, reaching the total limit of 10.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(10)
    ->limitPerExecution(5)
    ->start();

// Third execution: doesn't crawl anything, as the total limit has been reached.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(10)
    ->limitPerExecution(5)
    ->start();
```
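Time and count limits can also be mixed. Since the crawler stops when any limit is reached, a sketch like this caps a single execution at 5 URLs or 30 seconds, whichever comes first:

```php
use Spatie\Crawler\Crawler;

// Stops after 5 URLs or 30 seconds, whichever limit is hit first.
Crawler::create('https://example.com')
    ->limitPerExecution(5)
    ->timeLimitPerExecution(30)
    ->start();
```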