A search profile determines which pages get crawled and what content gets indexed. In the site-search config file, you'll see in the `default_profile` key that `Spatie\SiteSearch\Profiles\DefaultSearchProfile::class` is used by default.
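For reference, the relevant part of the config file might look roughly like this (a minimal sketch, not the full config file shipped by the package):

```php
// config/site-search.php (excerpt; other options omitted)
return [
    /*
     * The search profile that determines which pages get crawled
     * and which pages get indexed.
     */
    'default_profile' => Spatie\SiteSearch\Profiles\DefaultSearchProfile::class,

    // ...
];
```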
This default profile will instruct the indexing process:

- to crawl each page of your site
- to only index pages whose response has a `200` status code
- to not index a page if its response has a `site-search-do-not-index` header (see the example below)

By default, the crawling process will respect the `robots.txt` of your site.
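To illustrate that last rule: you can keep a page out of the index by adding the `site-search-do-not-index` header to its response. Here's a hypothetical Laravel middleware that does just that (the class name and namespace are made up for this example):

```php
namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\Response;

class DoNotIndexInSiteSearch
{
    public function handle(Request $request, Closure $next): Response
    {
        $response = $next($request);

        // The default search profile skips any page whose response
        // carries this header.
        $response->headers->set('site-search-do-not-index', 'true');

        return $response;
    }
}
```

Apply it to any route that should stay out of the search index, for example `Route::get('/secret', SecretController::class)->middleware(DoNotIndexInSiteSearch::class)`.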
A search profile is also responsible for determining which indexer will be used for a certain page. An indexer is responsible for determining the title, content, description, ... of a page. By default, `Spatie\SiteSearch\Indexers\DefaultIndexer` will be used. To learn more about indexers and how to customize them, head over to the section on indexers.
## Creating your own search profile
If you want to customize the crawling and indexing behavior, you can either extend `Spatie\SiteSearch\Profiles\DefaultSearchProfile` or create your own class that implements the `Spatie\SiteSearch\Profiles\SearchProfile` interface. This is what that interface looks like:
```php
namespace Spatie\SiteSearch\Profiles;

use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;
use Spatie\Crawler\Crawler;
use Spatie\SiteSearch\Indexers\Indexer;

interface SearchProfile
{
    public function shouldCrawl(UriInterface $url, ResponseInterface $response): bool;

    public function shouldIndex(UriInterface $url, ResponseInterface $response): bool;

    public function useIndexer(UriInterface $url, ResponseInterface $response): ?Indexer;

    public function configureCrawler(Crawler $crawler): void;
}
```
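As a sketch of what a custom profile could look like, here's a hypothetical profile that extends the default one and only crawls pages under a `/docs` path (the class name, namespace, and path check are purely illustrative):

```php
namespace App\SiteSearch;

use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;
use Spatie\SiteSearch\Profiles\DefaultSearchProfile;

class DocsOnlySearchProfile extends DefaultSearchProfile
{
    public function shouldCrawl(UriInterface $url, ResponseInterface $response): bool
    {
        // Defer to the default profile's own decision first ...
        if (! parent::shouldCrawl($url, $response)) {
            return false;
        }

        // ... and additionally only crawl pages that live under /docs.
        return str_starts_with($url->getPath(), '/docs');
    }
}
```

To make the package use such a profile, point the `default_profile` key in the site-search config file to your own class.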