When a site is crawled, each page is fed to a search profile. When the search profile determines that a page should be indexed, the URL and response for that page are passed to an indexer. The indexer's job is to extract the page title, the h1, the description, the content, and any other data that should be put in the site index.
By default, the Spatie\SiteSearch\Indexers\DefaultIndexer is used. This indexer makes a best effort to determine the page title, description, and content of your page.
The entries() implementation of DefaultIndexer splits your content into pieces a few sentences long. We do this to keep the record size within the limits of Meilisearch.
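To give a feel for this kind of chunking, here is a minimal sketch. It is not the package's actual implementation: chunkSentences() is a hypothetical helper, the sentence-splitting rule is deliberately naive, and the chunk size of three sentences is an arbitrary choice.

```php
<?php

/**
 * Hypothetical helper, not part of spatie/laravel-site-search:
 * split text into chunks of at most $sentencesPerChunk sentences.
 */
function chunkSentences(string $text, int $sentencesPerChunk = 3): array
{
    // Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // Group the sentences and join each group back into one string.
    return array_map(
        fn (array $group): string => implode(' ', $group),
        array_chunk($sentences, $sentencesPerChunk)
    );
}

$chunks = chunkSentences('One. Two! Three? Four. Five.', 2);
// $chunks is ['One. Two!', 'Three? Four.', 'Five.']
```

Smaller chunks keep each record short, at the cost of producing more records per page.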
If the results yielded by DefaultIndexer are not good enough for your content, you can create a custom indexer. An indexer is any class that implements Spatie\SiteSearch\Indexers\Indexer. Here's what that interface looks like.
namespace Spatie\SiteSearch\Indexers;

use Carbon\CarbonInterface;
use Psr\Http\Message\UriInterface;

interface Indexer
{
    public function pageTitle(): ?string;

    public function h1(): ?string;

    public function entries(): array;

    public function dateModified(): ?CarbonInterface;

    public function extra(): array;

    public function url(): UriInterface;
}
In most cases, it's probably easiest to extend the DefaultIndexer:
class YourIndexer extends Spatie\SiteSearch\Indexers\DefaultIndexer
{
}
To use your custom indexer, specify its class name in the default_indexer key of the site-search config file.
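As a rough sketch, the relevant entry could look like the following. The file path config/site-search.php follows the usual Laravel convention and is an assumption here, as is the App\Services\Search\Indexer class name, which is borrowed from the examples on this page.

```php
<?php

// config/site-search.php (assumed path; all other keys omitted)
return [
    // Assumed class name: point this at your own indexer class.
    'default_indexer' => App\Services\Search\Indexer::class,
];
```

Add or change only this key in your existing config file rather than replacing the whole file.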
Here's an example of a custom indexer used at freek.dev that removes the site name suffix from the page title.
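namespace App\Services\Search;

use Spatie\SiteSearch\Indexers\DefaultIndexer;

class Indexer extends DefaultIndexer
{
    public function pageTitle(): ?string
    {
        $title = parent::pageTitle();

        // Passing null to str_replace() is deprecated since PHP 8.1.
        if ($title === null) {
            return null;
        }

        // Strip the site-wide suffix so only the post title is indexed.
        return str_replace(
            " - Freek Van der Herten's blog on PHP, Laravel and JavaScript",
            '',
            $title
        );
    }
}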
Here's an example of a custom indexer that strips the query parameters from the URL.
namespace App\Services\Search;

use Psr\Http\Message\UriInterface;
use Spatie\SiteSearch\Indexers\DefaultIndexer;

class Indexer extends DefaultIndexer
{
    public function url(): UriInterface
    {
        return $this->url->withQuery('');
    }
}
Here's an example of a custom indexer that uses the canonical URL as the URL when the page declares one.
namespace App\Services\Search;

use GuzzleHttp\Psr7\Uri;
use Psr\Http\Message\UriInterface;
use Spatie\SiteSearch\Indexers\DefaultIndexer;

class Indexer extends DefaultIndexer
{
    public function url(): UriInterface
    {
        // Look up the href of the page's <link rel="canonical"> element.
        $canonical = attempt(fn () => $this->domCrawler->filter('link[rel="canonical"]')->first()->attr('href'));

        // No canonical link found: fall back to the default URL.
        if (! $canonical) {
            return parent::url();
        }

        return new Uri($canonical);
    }
}