Version v9

Concurrency & throttling
========================

Concurrency
-----------

To speed up a crawl, the package crawls 10 URLs concurrently by default. You can change this number with the `concurrency` method.

```
use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->concurrency(1) // crawl URLs one by one
    ->start();
```

Request delay
-------------

By default, there is no delay between requests. If you crawl too aggressively, the server may rate limit you. To pause after every request, use the `delay` method. The value is expressed in milliseconds.

```
use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->delay(150) // wait 150ms after every page
    ->start();
```

Throttling
----------

For more control over request pacing, you can use a throttle. A throttle is a class that implements `Spatie\Crawler\Throttlers\Throttle`. When a throttle is set, it takes precedence over the `delay` method.

### Fixed delay

The `FixedDelayThrottle` works like `delay()`, but it is an object that you can pass around and configure independently.

```
use Spatie\Crawler\Crawler;
use Spatie\Crawler\Throttlers\FixedDelayThrottle;

Crawler::create('https://example.com')
    ->throttle(new FixedDelayThrottle(delayMs: 150))
    ->start();
```

### Adaptive throttle

The `AdaptiveThrottle` adjusts the delay based on how fast the server responds. When the server is slow, the crawler backs off. When it speeds up, the delay decreases. You can configure minimum and maximum bounds.

```
use Spatie\Crawler\Crawler;
use Spatie\Crawler\Throttlers\AdaptiveThrottle;

Crawler::create('https://example.com')
    ->throttle(new AdaptiveThrottle(
        minDelayMs: 50,
        maxDelayMs: 5000,
    ))
    ->start();
```

The delay is calculated as an exponential moving average with a smoothing factor of 0.5: each new delay is `(currentDelay + latency) / 2`, clamped to the configured bounds.
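
To make the arithmetic concrete, here is a minimal sketch of that update rule. It is not the package's implementation: `nextDelayMs` is a hypothetical helper, the latencies are made up, and it assumes the latency is expressed in milliseconds and that the first delay starts at the minimum bound.

```php
// Hypothetical helper, not part of spatie/crawler. It applies the update
// rule quoted above and clamps the result to the configured bounds.
// Assumption: latency is expressed in milliseconds, like the delays.
function nextDelayMs(float $currentDelayMs, float $latencyMs, float $minDelayMs, float $maxDelayMs): float
{
    $averaged = ($currentDelayMs + $latencyMs) / 2;

    return max($minDelayMs, min($maxDelayMs, $averaged));
}

// Start at the minimum delay (an assumption) and feed in made-up latencies.
$delayMs = 50.0;
$history = [];

foreach ([400.0, 400.0, 20000.0, 10.0] as $latencyMs) {
    $delayMs = nextDelayMs($delayMs, $latencyMs, 50.0, 5000.0);
    $history[] = $delayMs;
}

// $history is now [225.0, 312.5, 5000.0, 2505.0]
print_r($history);
```

The third value shows the clamp at work: a 20 second response would push the average above 10 seconds, but the delay stops at `maxDelayMs`. Because each step halves the distance to the latest latency, a single fast response afterwards already brings the delay down to about 2.5 seconds.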

### Custom throttle

You can create your own throttle by implementing the `Throttle` interface:

```
use Spatie\Crawler\Throttlers\Throttle;

class MyThrottle implements Throttle
{
    public function sleep(): void
    {
        // Called after each response. Pause here.
    }

    public function recordResponseTime(float $seconds): void
    {
        // Called with the transfer time of each response.
    }
}
```
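
As an illustration, the sketch below fills in both methods with a throttle that waits a multiple of the last recorded response time. `ProportionalThrottle`, its constructor parameters, and the `delayMs()` helper are hypothetical names chosen for this example, and the order in which the crawler calls the two methods is not assumed. The stand-in interface at the top only exists so the snippet can run without the package installed.

```php
use Spatie\Crawler\Throttlers\Throttle;

// Stand-in so this sketch also runs without the package installed.
// With spatie/crawler autoloaded, the real interface is used instead.
if (! interface_exists(Throttle::class)) {
    interface ThrottleStandIn
    {
        public function sleep(): void;

        public function recordResponseTime(float $seconds): void;
    }

    class_alias(ThrottleStandIn::class, Throttle::class);
}

// Hypothetical throttle: pause for `factor` times the last response time.
final class ProportionalThrottle implements Throttle
{
    private float $lastResponseSeconds = 0.0;

    public function __construct(
        private float $factor = 2.0,
        private int $minDelayMs = 0,
        private int $maxDelayMs = 5000,
    ) {
    }

    public function recordResponseTime(float $seconds): void
    {
        // Remember the most recent transfer time, given in seconds.
        $this->lastResponseSeconds = $seconds;
    }

    // Helper that is not part of the interface: the pause sleep() will take.
    public function delayMs(): int
    {
        $delayMs = (int) round($this->lastResponseSeconds * 1000 * $this->factor);

        return max($this->minDelayMs, min($this->maxDelayMs, $delayMs));
    }

    public function sleep(): void
    {
        // usleep() expects microseconds.
        usleep($this->delayMs() * 1000);
    }
}
```

An instance can then be passed to the crawler in the same way as the built-in throttles, for example `->throttle(new ProportionalThrottle(factor: 1.5))`. Keeping the calculation in `delayMs()` makes the pacing logic easy to test without actually sleeping.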

Default scheme
--------------

By default, URLs without a scheme are prefixed with `https`. You can change this using the `defaultScheme` method.

```
use Spatie\Crawler\Crawler;

Crawler::create('example.com')
    ->defaultScheme('http')
    ->start();
```
