Indexing in SearchPress
When you first activate SearchPress, you need to index your content. That is to say, you need to send all your content to Elasticsearch. There are two ways to do this: via the WordPress Admin or via WP-CLI.
SearchPress provides a tool to sync content under Tools → SearchPress. This tool will sync content using the WordPress cron system in batches of 500 posts, roughly once per minute. If your site has 20,000 posts, that should take 40-45 minutes to complete. For a site that size, you should consider using WP-CLI, which will be considerably faster.
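The 40-45 minute figure follows from dividing the post count by the batch size, since each cron run handles one batch. As a rough sketch of that arithmetic (the function name is hypothetical and not part of SearchPress, and it assumes exactly one cron run per minute):

```php
<?php
/**
 * Estimate how many minutes a cron-based sync needs.
 * Hypothetical helper: assumes one batch per cron run and one run per minute.
 */
function example_estimate_cron_sync_minutes( int $total_posts, int $batch_size = 500 ): int {
	if ( $batch_size < 1 ) {
		throw new InvalidArgumentException( 'Batch size must be at least 1.' );
	}
	// Each cron run indexes one batch, so the run count is the minute count.
	return (int) ceil( $total_posts / $batch_size );
}

echo example_estimate_cron_sync_minutes( 20000 ), "\n";      // 40
echo example_estimate_cron_sync_minutes( 20000, 250 ), "\n"; // 80
```

Real runs will take somewhat longer than this lower bound, because WordPress cron only fires when the site receives traffic.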
If you need to increase or decrease the bulk indexing size, you can do so in your theme (or a plugin) using the following example:
add_action(
	'searchpress_loaded',
	function() {
		// Decrease cron bulk indexing size to 250.
		SP_Sync_Meta()->bulk = 250;
	}
);
If you have WP-CLI installed, SearchPress automatically registers a command to perform various tasks like syncing your content. This is the preferred method for syncing content if you have more than a thousand posts in your site.
To remove any extant data, add the mapping from scratch, and run a full sync on your content, the command would be:
wp searchpress index --flush --put-mapping
This will delete the current index if there is one (--flush), add the mapping (--put-mapping), and index all content. This will sync your content in batches of 2,000 by default, but you can control the batch size by setting the bulk flag, e.g. --bulk=1000. We found 2,000 to be a good default, but you may find that increasing or decreasing this will produce better results for your specific content. Too many at a time and you might have size issues (either in memory in PHP or the size of the payload going to ES); too few at a time and the HTTP overhead might prevent indexing from being as fast as it could be. The command will report on its progress and try to estimate how much longer is left based on the overall average velocity.
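To make the "overall average velocity" estimate concrete, here is a minimal sketch of how such a figure can be computed. This is an illustration only, not SearchPress's actual code, and the function name is hypothetical:

```php
<?php
/**
 * Estimate the seconds left in a sync from the average velocity so far.
 * Hypothetical helper for illustration; returns null until progress exists.
 */
function example_estimate_seconds_remaining( int $indexed, int $total, float $elapsed_seconds ): ?float {
	if ( $indexed <= 0 || $elapsed_seconds <= 0 ) {
		return null; // No velocity to measure yet.
	}
	$remaining = max( 0, $total - $indexed );
	// remaining / (indexed / elapsed), rearranged to avoid a rounding step.
	return $remaining * $elapsed_seconds / $indexed;
}

// 6,000 of 20,000 posts indexed after 90 seconds.
echo example_estimate_seconds_remaining( 6000, 20000, 90.0 ), "\n"; // 210
```

Because the estimate averages over the whole run, it reacts slowly if later batches are much larger or smaller than earlier ones.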
Domain
SearchPress uses the current domain (specifically, get_site_url()) when interacting with Elasticsearch. When using WP-CLI, and depending on your host/wp-config/etc., if you don't explicitly set the --url flag, it's possible that the domain will not be the same in WP-CLI as it is when you visit the live site.
When a post or attachment is created or updated, it is automatically synced to Elasticsearch. When it is trashed/deleted, it is automatically removed from the Elasticsearch index.
SearchPress prepares a post for indexing by populating an object of the class SP_Post. The SP_Post object is very flexible and allows you to alter any of the data prior to indexing.
Early versions of SearchPress indexed almost all post meta with the post. Starting with the 0.4 release, SearchPress indexes only the post meta that it is explicitly told to index, and only in the data types that a site's developer plans to use. The principal reasons for this change are performance and preventing a "mappings explosion".
Data type casting will only be attempted for a key if the opt-in callback specifies that type for the key in question (see example below for the full list of possible types). However, the data type will still only be indexed if the type casting is successful. For example, attempting to index the meta value "WordPress" as a long would fail, since it is not a numeric value. This failure is silent, for better or worse, but type casting is overall quite forgiving.
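The "WordPress" as a long case can be pictured with a small sketch. This illustrates the idea of a silent, forgiving cast; it is not SearchPress's actual casting code, and the function name is hypothetical:

```php
<?php
/**
 * Try to cast a meta value to a long (integer).
 * Hypothetical helper: returns null when the value is not numeric,
 * which stands in for "this type is silently skipped".
 */
function example_cast_to_long( $value ): ?int {
	// Only numeric values can be represented as a long.
	if ( ! is_numeric( $value ) ) {
		return null;
	}
	return (int) $value;
}

var_dump( example_cast_to_long( 'WordPress' ) ); // NULL
var_dump( example_cast_to_long( '42' ) );        // int(42)
```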
If a meta key is allowed to be indexed, the meta value will always be indexed as an unanalyzed string (post_meta.*.raw) and that type need not be specified. This is primarily for compatibility with ES_WP_Query, which depends on that key in EXISTS queries, among others.
add_filter(
	'sp_post_allowed_meta',
	function( $allowed_meta ) {
		// Tell SearchPress to index 'some_meta_key' post meta when encountered.
		$allowed_meta['some_meta_key'] = [
			'value',    // Index as an analyzed string.
			'boolean',  // Index as a boolean value.
			'long',     // Index as a "long" (integer).
			'double',   // Index as a "double" (floating point number).
			'date',     // Index as a GMT date-only value in the format Y-m-d.
			'datetime', // Index as a GMT datetime value in the format Y-m-d H:i:s.
			'time',     // Index as a GMT time-only value in the format H:i:s.
		];
		return $allowed_meta;
	}
);
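In practice you would usually request only the one or two types you intend to query for each key, rather than all seven. A sketch of what that might look like follows; the meta keys and the function name are hypothetical, so substitute the keys your site actually stores:

```php
<?php
/**
 * Opt a few meta keys into indexing, each with only the types needed.
 * The keys below are hypothetical examples.
 */
function example_allow_event_meta( array $allowed_meta, $post_id = 0 ): array {
	$allowed_meta['event_start']  = [ 'datetime' ]; // Range queries on start time.
	$allowed_meta['ticket_price'] = [ 'double' ];   // Numeric comparisons.
	$allowed_meta['is_featured']  = [ 'boolean' ];  // True/false filtering.
	return $allowed_meta;
}

// Register only when WordPress is loaded, so the function also runs standalone.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'sp_post_allowed_meta', 'example_allow_event_meta', 10, 2 );
}
```

Keeping the type list short per key is in line with the goal of limiting the number of mapped fields.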
Here are some of the actions and filters available for customization in SP_Post:
Filters:
- sp_post_allowed_meta: Filter which post meta should be indexed and the data types it should be cast to.
  - array $meta: The meta to be indexed, as 'meta_key' => (array) $data_types. See above docs on how to index post meta.
  - integer $post_id: The ID of the current post.
  - Returns: array
- sp_post_pre_index: Filter the data that will be sent to Elasticsearch immediately before converting it to JSON.
  - array $data: The data to be indexed. See Mapping in SearchPress for a rundown of the data.
  - Returns: array
- sp_post_should_be_indexed: Determines if this post should be indexed or not.
  - boolean $should_be_indexed: Defaults to true. If you don't want this post to be indexed, return false.
  - SP_Post $sp_post: The current SP_Post object.
  - Returns: boolean
- sp_post_index_filtered_data: Should the title and content be filtered prior to indexing?
  - boolean $should_filter_data: Defaults to false. If true, the title and post_content will be filtered as if they were being displayed on the front end of the site.
  - Returns: boolean
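As a rough illustration of two of the filters above, the sketch below tidies one field before indexing and turns on front-end filtering of the title and content. The function name is hypothetical, and the 'post_title' key is an assumption about the shape of $data; check Mapping in SearchPress for the actual structure.

```php
<?php
/**
 * Trim stray whitespace from the title before it is sent to Elasticsearch.
 * 'post_title' is an assumed key in the indexed data array.
 */
function example_normalize_title( array $data ): array {
	if ( isset( $data['post_title'] ) && is_string( $data['post_title'] ) ) {
		$data['post_title'] = trim( $data['post_title'] );
	}
	return $data;
}

// Register only when WordPress is loaded, so the function also runs standalone.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'sp_post_pre_index', 'example_normalize_title' );
	// __return_true is a WordPress core helper that always returns true.
	add_filter( 'sp_post_index_filtered_data', '__return_true' );
}
```

Enabling sp_post_index_filtered_data makes indexing do more work per post, so it may be worth measuring sync speed before and after.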