Auto Indexing With Indexers

dburry edited this page Nov 5, 2012 · 2 revisions

Getting Started

Suppose you have a model named FooThing that is all set up with index definitions, and you can build its index fine using the rake task.

First, make sure this model includes the IndexedSearch::Index module:

 # app/models/foo_thing.rb
 class FooThing < ActiveRecord::Base

   include IndexedSearch::Index

   # ...

 end

Then, generate an indexer in app/indexers/foo_thing_indexer.rb using this Rails generator:

 $ rails generate indexed_search:indexer foo_thing

That’s it! The model will now reindex itself automatically whenever it changes, with no further rake tasks to run. Most of the time nothing more is needed (but you can do more… keep reading).

What Are “Indexers”?

At their heart, indexers are regular Rails observers, except that the standard ActiveRecord::Observer class has been extended with short predefined after_create, after_update, and after_destroy callbacks that provide reindexing by default.

The default generators create small files in the app/indexers directory that glue the indexers to your models. Those small files also act as stubs where you can add application-specific optimizations and extensions.
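To make the idea concrete, here is a minimal plain-Ruby sketch of a base class whose default callbacks reindex a record, with a thin subclass standing in for a generated stub. Every name in it (SketchIndexer, SketchModel, SketchFooIndexer) is hypothetical, and the assumption that all three callbacks simply call update_search_index is mine. It is not the gem's actual source.

```ruby
# Hypothetical stand-in for an observer base class with default callbacks.
# Illustration only: this is NOT how the gem itself is implemented.
class SketchIndexer
  def after_create(record)
    reindex(record)
  end

  def after_update(record)
    reindex(record)
  end

  def after_destroy(record)
    reindex(record)
  end

  private

  # Assumed default: every callback asks the record to refresh its index.
  def reindex(record)
    record.update_search_index
  end
end

# Hypothetical stub model that only logs calls to update_search_index.
class SketchModel
  attr_reader :log

  def initialize
    @log = []
  end

  def update_search_index
    @log << :update_search_index
  end
end

# A generated stub would be a thin subclass that inherits the defaults
# and overrides only the callbacks that need custom behavior.
class SketchFooIndexer < SketchIndexer
end

model = SketchModel.new
indexer = SketchFooIndexer.new
indexer.after_create(model)
indexer.after_update(model)
indexer.after_destroy(model)
p model.log
```

Overriding a single callback in the subclass leaves the other two defaults in place, which is the pattern the examples on this page rely on.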

Making Auto Indexing More Efficient

The Getting Started example creates a simple indexer that reindexes each model row as it’s changed (or added, or deleted). But it does so when any attribute changes, even attributes the index never uses. This is inefficient, especially if the model has many attributes that have nothing to do with the index.

So, in order to refine the indexing, we can edit the indexer created in the previous example:

 # app/indexers/foo_thing_indexer.rb
 class FooThingIndexer < ApplicationIndexer
   observe FooThing

   # override what happens when a foo record is changed
   def after_update(foo)
     # modify to only update for things used in search_index_info (not unrelated changes):
     # note: foo.<attribute>_changed? doesn't seem to work right with null values... (rails bug or feature?)
     if foo.name_was != foo.name || foo.description_was != foo.description || foo.abstract_was != foo.abstract
       foo.update_search_index
     # if an attribute is only used by search_priority, then this is much more efficient than a full reindex:
     elsif foo.public != foo.public_was
       foo.update_search_priority
     end
   end

 end

The default behavior is usually fine for creating and destroying records, so we leave those callbacks alone in this example.

Auto Indexing When Associated Attributes Change

Now suppose the index for a given model uses not only its own attributes but also attributes from a related model. For example, suppose your models are set up like this:

 # app/models/foo_thing.rb
 class FooThing < ActiveRecord::Base
   has_many :bar_things

   def search_index_info
     [
       # ...
       [bar_things.collect(&:name),   10]
     ]
   end

   # ...

 end

 # app/models/bar_thing.rb
 class BarThing < ActiveRecord::Base
   belongs_to :foo_thing
 end

Now the problem is that when BarThing objects change, their associated FooThing objects should be reindexed. This does not happen by default.

To fix it, first generate an indexer for BarThing too:

 $ rails generate indexed_search:indexer bar_thing

Then modify its default behavior to index the other associated model, instead of itself:

 # app/indexers/bar_thing_indexer.rb
 class BarThingIndexer < ApplicationIndexer
   observe BarThing

   def after_update(bar)
     # if the relationship itself changed, update old and/or new one
     if bar.foo_thing_id_was != bar.foo_thing_id
       bar.foo_thing.update_search_index unless bar.foo_thing_id.nil?
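       # find_by_id returns nil for a missing row (find would raise ActiveRecord::RecordNotFound instead)
       if ! bar.foo_thing_id_was.nil? && ! (old_foo = FooThing.find_by_id(bar.foo_thing_id_was)).nil?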
         old_foo.update_search_index
       end
     # otherwise if just the name changed, update current one if there is one
     elsif bar.name_was != bar.name && ! bar.foo_thing_id.nil?
       bar.foo_thing.update_search_index
     end
   end

   def after_create(bar)
     bar.foo_thing.update_search_index unless bar.foo_thing_id.nil?
   end

   def after_destroy(bar)
     bar.foo_thing.update_search_index unless bar.foo_thing_id.nil?
   end

 end
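The branching in after_update above boils down to one question: which parent rows now have a stale index? The following plain-Ruby sketch isolates that decision so it can be read (and tested) without a database. The function name and its arguments are hypothetical and not part of the gem.

```ruby
# Returns the ids of the parent rows whose index is stale after a child
# row changes. Hypothetical helper for illustration only.
def parent_ids_to_reindex(old_parent_id, new_parent_id, name_changed)
  if old_parent_id != new_parent_id
    # The relationship itself moved: the old and the new parent are both
    # affected, skipping whichever side is nil.
    [old_parent_id, new_parent_id].compact
  elsif name_changed && !new_parent_id.nil?
    # Same parent, but an attribute used by the parent's index changed.
    [new_parent_id]
  else
    # Nothing the parent's index depends on was touched.
    []
  end
end

p parent_ids_to_reindex(1, 2, false)   # child moved between parents
p parent_ids_to_reindex(nil, 2, false) # child newly attached
p parent_ids_to_reindex(2, 2, true)    # child renamed in place
p parent_ids_to_reindex(2, 2, false)   # unrelated change
```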

DRYing Out Your Code

If you find yourself doing a lot of similar things over and over, feel free to add common indexer code to your ApplicationIndexer class at app/indexers/application_indexer.rb. That’s what it’s there for! :)

It is quite normal to use this file as your indexers grow.
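As one possible example of shared code, the repeated "did any of these attributes change?" comparison could be pulled into a helper. The sketch below is plain Ruby so it runs on its own. The module ChangeDetection, the method any_changed?, and the stub classes are all hypothetical names, and it assumes only that records expose the <attribute>_was readers used elsewhere on this page.

```ruby
# Hypothetical helper that could live in ApplicationIndexer.
module ChangeDetection
  # True if any of the named attributes differs from its previous value.
  # Compares values directly, mirroring the foo.name_was != foo.name style
  # used in the examples on this page.
  def any_changed?(record, *attributes)
    attributes.any? do |attribute|
      record.public_send("#{attribute}_was") != record.public_send(attribute)
    end
  end
end

# Stub standing in for a model with dirty tracking (illustration only).
StubFoo = Struct.new(:name, :name_was, :description, :description_was)

# Stand-in for ApplicationIndexer so the sketch runs without Rails.
class SketchApplicationIndexer
  include ChangeDetection
end

indexer = SketchApplicationIndexer.new
renamed = StubFoo.new("new", "old", "same", "same")
untouched = StubFoo.new("same", "same", nil, nil)

p indexer.any_changed?(renamed, :name, :description)
p indexer.any_changed?(untouched, :name, :description)
```

With a helper like this, an after_update override could shrink to a single readable condition per reindexing action.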

Temporarily Turning Off Auto Indexing For Large Updates

Sometimes you’re updating so many rows that it would be more efficient to skip reindexing for a while and reindex them all when you’re done. This might be the case, for example, in a mass import rake task.

This is easy. Just wrap your long-running task in a without_indexing block, then call update_search_index on the model class once it finishes, like so:

 # lib/tasks/import_foo_things.rake
 task :import_foo_things => :environment do
   FooThing.without_indexing do
     # ... do your long import code here
   end
   FooThing.update_search_index
 end

Note that if you do this, you should make sure any custom callbacks you’ve written in your indexers wrap their work in an unless no_indexing? block, like the default ones do:

 # app/indexers/foo_thing_indexer.rb
 class FooThingIndexer < ApplicationIndexer
   observe FooThing

   def after_update(foo)
     unless foo.no_indexing?
       if foo.name_was != foo.name || foo.description_was != foo.description
         foo.update_search_index
       end
     end
   end

 end
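To show why the guard matters, here is a plain-Ruby sketch of how a without_indexing block and a no_indexing? flag could fit together. It is an assumption-laden illustration, not the gem's real implementation: the module SketchIndexingSwitch and the class SketchThing are hypothetical, and the flag is stored as a simple class-level instance variable.

```ruby
# Hypothetical mixin showing one way a "pause indexing" switch could work.
# Illustration only: NOT the gem's actual implementation.
module SketchIndexingSwitch
  # Runs the block with the flag set, then restores the previous value
  # even if the block raises.
  def without_indexing
    previous = @no_indexing
    @no_indexing = true
    yield
  ensure
    @no_indexing = previous
  end

  def no_indexing?
    @no_indexing ? true : false
  end
end

class SketchThing
  extend SketchIndexingSwitch

  # Instances ask their class, so a callback can call record.no_indexing?
  def no_indexing?
    self.class.no_indexing?
  end
end

thing = SketchThing.new
inside = nil
SketchThing.without_indexing { inside = thing.no_indexing? }

p inside               # flag as seen from within the block
p thing.no_indexing?   # flag after the block has finished
```

A custom callback that skips the no_indexing? check never consults this flag, so it would keep reindexing row by row during the import and defeat the purpose of the block.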

In some cases you might also find it faster to run delete_search_index and create_search_index instead of update_search_index, because that way it simply nukes and writes a new index, instead of doing a lot of reading and comparing.

But if you do this, your site should probably be offline (i.e. in maintenance mode), so that users aren’t confused by search mysteriously not returning expected results sometimes. It’s also very important to make sure no indexer gets triggered on a given row before create_search_index processes it, or it will blindly create bogus duplicate index entries. That’s why this is not the usual way. Usually it’s better to take a certain percentage longer and keep the site running.