Preventing content from being indexed | laravel-site-search | Spatie

 Preventing content from being indexed
=====================================

Your site probably displays a lot of information that should not be indexed, such as your menu structure or footer. Some pages may not need to be indexed at all.

Using CSS selectors
-------------------

In the `ignore_content_by_css_selector` key of the `site-search` config file, you can specify CSS selectors of elements that should not be indexed. By default, the content of a `nav` element will not be put in the index (but all URLs inside it will still be crawled).

Additionally, any element with a `data-no-index` attribute will not be indexed. In the following example, the sentence "This will not be indexed." is left out of the index.

```html
<html>
    <head>
        <title>This is my page</title>
    </head>
    <body>
        <nav>
            This is the content of the nav. It should not be indexed.
        </nav>

        <h1>This is the H1</h1>

        <p>This is the content</p>

        <div data-no-index>
            This will not be indexed.
        </div>
    </body>
</html>
```
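As a rough sketch, the `ignore_content_by_css_selector` key could be extended as shown here. The file path `config/site-search.php` assumes you have published the config file, and the `footer` and `.cookie-banner` selectors are hypothetical examples, not package defaults.

```php
// config/site-search.php (excerpt)
return [
    'ignore_content_by_css_selector' => [
        // Keep any entries that are already present in your published config.
        'nav',

        // Hypothetical additions for illustration only.
        'footer',
        '.cookie-banner',
    ],
];
```

Content inside elements that match any of these selectors would be skipped, while the links inside them would still be crawled.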

Using the config file
---------------------

In the `ignore_content_on_urls` key of the `site-search` config file, you may specify URLs whose content should not be indexed. All links on these URLs will still be followed and crawled.
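A minimal sketch of this setting follows. The listed entries are hypothetical, and writing them as paths relative to the site root is an assumption; check the comments in your published config file for the exact format it expects.

```php
// config/site-search.php (excerpt)
return [
    'ignore_content_on_urls' => [
        // Hypothetical overview pages that mostly list links to other content.
        // Their own content is skipped, but the links on them are still followed.
        '/blog',
        '/tags',
    ],
];
```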

Using headers
-------------

If a response from your site contains a header whose name is listed in the `do_not_index_content_headers` key of the `site-search` config file, that page will not be indexed.
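The following excerpt sketches how such a header name might be registered. The header name `site-search-do-not-index` is an assumption used for illustration; use whichever names appear in your own config file.

```php
// config/site-search.php (excerpt)
return [
    'do_not_index_content_headers' => [
        // Assumed header name; any response carrying a header with this
        // name would be excluded from the index.
        'site-search-do-not-index',
    ],
];
```

In a Laravel application, a response could carry that header via something like `response($content)->header('site-search-do-not-index', 'true')`. Only the header name matters according to the description above, so the value is arbitrary here.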

Query string handling
---------------------

By default, query strings are stripped from URLs during indexing. This means that `example.com/post` and `example.com/post?utm_source=newsletter` will be treated as the same page, preventing duplicate entries in your search index.

This normalization happens both when deciding whether to index a URL and when storing the URL in the index. URLs with query strings will still be crawled (links are still followed), but the indexed URL will always be the clean version without any query parameters.
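To make the effect of this normalization concrete, here is a small standalone sketch in plain PHP. The function `stripQueryString` is a hypothetical helper written for this illustration; it is not part of laravel-site-search and does not show how the package implements normalization internally.

```php
<?php

// Hypothetical helper, not part of laravel-site-search. It only illustrates
// what "stripping the query string" means for a URL.
function stripQueryString(string $url): string
{
    $parts = parse_url($url);

    if ($parts === false) {
        // Seriously malformed URL: return it unchanged rather than guess.
        return $url;
    }

    $scheme = isset($parts['scheme']) ? $parts['scheme'] . '://' : '';
    $host = $parts['host'] ?? '';
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path = $parts['path'] ?? '';

    // The query string (and any fragment) is deliberately left out.
    return $scheme . $host . $port . $path;
}

$clean = stripQueryString('https://example.com/post');
$tracked = stripQueryString('https://example.com/post?utm_source=newsletter');

// Both URLs collapse to the same value, so they count as one page.
var_dump($clean === $tracked);
```

With this kind of normalization, both URLs map to `https://example.com/post`, which is why only one entry ends up in the index.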

If you need to preserve query strings for certain URLs, you can create a custom search profile that overrides the `normalizeUrl` method.

Using a search profile
----------------------

A search profile is a class that determines which pages get crawled and what content gets indexed. Learn more in [Using a search profile](https://spatie.be/docs/laravel-site-search/v3/basic-usage/using-a-search-profile).
