When a site is crawled, each page is passed to a search profile. When that search profile determines that a page should be indexed, the URL and response for that page are passed to an indexer. The indexer's job is to extract the page title, the h1, the description, the content, and anything else that should be put in the site index.
By default, Spatie\SiteSearch\Indexers\DefaultIndexer is used. This indexer makes a best effort to determine the page title, description, and content of your page.
The DefaultIndexer's implementation of entries() extracts the text content of your page. The database driver consolidates all entries for a URL into a single row, so each page results in one search record.
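To make the indexer's job more concrete, here is a minimal plain-PHP sketch that pulls the title, h1, meta description, and body text out of an HTML response using PHP's bundled DOM extension. It is an illustration only, not DefaultIndexer's actual logic, and the function name extractPageSummary is hypothetical.

```php
<?php

// Hypothetical helper: pull the fields an indexer typically returns
// out of raw HTML, using only PHP's bundled DOM extension.
function extractPageSummary(string $html): array
{
    $dom = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // loadHTML() rejects an empty string, so substitute an empty document.
    $dom->loadHTML($html !== '' ? $html : '<html></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Trimmed text of the first node matching $query, or null when nothing matches.
    $firstText = function (string $query) use ($xpath): ?string {
        $node = $xpath->query($query)->item(0);

        return $node === null ? null : trim($node->textContent);
    };

    $body = $firstText('//body');

    return [
        'pageTitle' => $firstText('//title'),
        'h1' => $firstText('//h1'),
        'description' => $firstText('//meta[@name="description"]/@content'),
        // Collapse runs of whitespace so the body reads as one block of text.
        'content' => $body === null ? null : preg_replace('/\s+/', ' ', $body),
    ];
}
```

A real indexer also has to cope with character encodings and boilerplate such as navigation and footers, which this sketch ignores.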
If the results yielded by DefaultIndexer are not good enough for your content, you can create a custom indexer. An indexer is any class that implements Spatie\SiteSearch\Indexers\Indexer. Here's what that interface looks like:
```php
namespace Spatie\SiteSearch\Indexers;

use Carbon\CarbonInterface;

interface Indexer
{
    public function pageTitle(): ?string;

    public function h1(): ?string;

    public function entries(): array;

    public function dateModified(): ?CarbonInterface;

    public function description(): ?string;

    public function extra(): array;

    public function url(): string;
}
```
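In most cases, it's easiest to extend the DefaultIndexer and override only the methods whose output you want to change:

```php
use Spatie\SiteSearch\Indexers\DefaultIndexer;

class YourIndexer extends DefaultIndexer
{
    // Override only the methods whose output you want to change.
}
```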
To use your custom indexer, specify its class name in the default_indexer key of the site-search config file.
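For example, assuming the published config file lives at config/site-search.php (the usual location for a Laravel package config; the path is an assumption here), registering the App\Services\Search\Indexer class from the examples on this page might look like this:

```php
<?php

// config/site-search.php (assumed location; only the relevant key is shown)
return [
    // ... keep the other keys from your published config file ...

    'default_indexer' => App\Services\Search\Indexer::class,
];
```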
Here's an example of a custom indexer, used at freek.dev, that removes the site-name suffix from page titles.
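```php
namespace App\Services\Search;

use Spatie\SiteSearch\Indexers\DefaultIndexer;

class Indexer extends DefaultIndexer
{
    public function pageTitle(): ?string
    {
        $title = parent::pageTitle();

        // parent::pageTitle() may return null; passing null to str_replace() is deprecated.
        if ($title === null) {
            return null;
        }

        // Remove the blog-wide suffix that every page title carries.
        return str_replace(
            " - Freek Van der Herten's blog on PHP, Laravel and JavaScript",
            '',
            $title
        );
    }
}
```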
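Note that str_replace() removes the phrase wherever it appears in the title, not only at the end. If you only want to strip a trailing suffix, a small helper can check the end of the string first. This is a sketch; the function name stripTitleSuffix and the example suffix used with it are hypothetical.

```php
<?php

// Hypothetical helper: remove $suffix only when the title ends with it,
// so the same phrase elsewhere in the title is left untouched.
function stripTitleSuffix(?string $title, string $suffix): ?string
{
    // An empty suffix would otherwise match and truncate the whole title.
    if ($title === null || $suffix === '' || !str_ends_with($title, $suffix)) {
        return $title;
    }

    return substr($title, 0, -strlen($suffix));
}
```

Inside the indexer, pageTitle() could then simply return stripTitleSuffix(parent::pageTitle(), ' - Your site name').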
Here's an example of a custom indexer that strips the query parameters from the URL.
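```php
namespace App\Services\Search;

use Spatie\SiteSearch\Indexers\DefaultIndexer;

class Indexer extends DefaultIndexer
{
    public function url(): string
    {
        // Start from the URL string the default indexer reports.
        $parsed = parse_url(parent::url());

        // Leave malformed or host-less URLs untouched.
        if ($parsed === false || ! isset($parsed['host'])) {
            return parent::url();
        }

        $port = isset($parsed['port']) ? ':' . $parsed['port'] : '';

        // Rebuild the URL without its query string (and fragment), keeping any port.
        return ($parsed['scheme'] ?? 'https') . '://' . $parsed['host'] . $port . ($parsed['path'] ?? '/');
    }
}
```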
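Stripping every query parameter also merges pages whose content does depend on the query string, such as paginated listings, under a single URL. If that matters for your site, one option is to remove only tracking parameters. A minimal sketch, assuming tracking parameters are the ones whose name starts with utm_ (a hypothetical rule; adjust it for your site), using a hypothetical stripTrackingParameters function:

```php
<?php

// Hypothetical helper: drop only tracking parameters (assumed here to be
// those whose name starts with "utm_") and the fragment; keep the rest in order.
function stripTrackingParameters(string $url): string
{
    // Split off the fragment first, then the query string.
    [$withoutFragment] = explode('#', $url, 2);
    $parts = explode('?', $withoutFragment, 2);

    if (count($parts) === 1) {
        return $withoutFragment;
    }

    [$base, $query] = $parts;

    // Work on the raw "name=value" pairs so their encoding is preserved as-is.
    $kept = array_filter(
        explode('&', $query),
        fn (string $pair): bool => $pair !== '' && !str_starts_with(strtolower($pair), 'utm_'),
    );

    return $kept === [] ? $base : $base . '?' . implode('&', $kept);
}
```

Called from url(), this would become return stripTrackingParameters(parent::url());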
Here's an example of a custom indexer that uses the page's canonical URL, when one is declared, as the URL.
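```php
namespace App\Services\Search;

use InvalidArgumentException;
use Spatie\SiteSearch\Indexers\DefaultIndexer;

class Indexer extends DefaultIndexer
{
    public function url(): string
    {
        try {
            // Read the href of the first <link rel="canonical"> element.
            $canonical = $this->domCrawler->filter('link[rel="canonical"]')->first()->attr('href');
        } catch (InvalidArgumentException) {
            // DomCrawler throws when the node list is empty, i.e. there is no canonical link.
            $canonical = null;
        }

        // Fall back to the crawled URL when no (non-empty) canonical URL is declared.
        if (! $canonical) {
            return parent::url();
        }

        return $canonical;
    }
}
```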
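The href of a canonical link can also be root-relative (for example /blog/my-post), in which case the snippet above returns it unchanged. Here is a plain-PHP sketch of resolving such hrefs against the crawled page URL, using PHP's DOM extension instead of DomCrawler. The canonicalUrl function is hypothetical and deliberately handles only absolute and root-relative hrefs.

```php
<?php

// Hypothetical helper: return the canonical URL declared in $html, resolving
// root-relative hrefs against $pageUrl; fall back to $pageUrl otherwise.
function canonicalUrl(string $html, string $pageUrl): string
{
    // loadHTML() rejects an empty string.
    if ($html === '') {
        return $pageUrl;
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = (new DOMXPath($dom))->query('//link[@rel="canonical"]/@href')->item(0);
    $href = $node === null ? '' : trim($node->nodeValue);

    if ($href === '') {
        return $pageUrl;
    }

    // Absolute URL such as "https://example.com/post": use it directly.
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href;
    }

    // Root-relative URL such as "/post": prefix the page's scheme, host and port.
    $page = parse_url($pageUrl);
    if (str_starts_with($href, '/') && !str_starts_with($href, '//') && isset($page['host'])) {
        $port = isset($page['port']) ? ':' . $page['port'] : '';

        return ($page['scheme'] ?? 'https') . '://' . $page['host'] . $port . $href;
    }

    // Path-relative and protocol-relative hrefs are not handled; keep the page URL.
    return $pageUrl;
}
```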