Indexing and Ranking Tips

dburry edited this page Jul 10, 2012 · 2 revisions

With great power comes great responsibility. And potentially much confusion, since there are so many knobs and wheels you can adjust in this gem. Let’s discuss some philosophies of what to index and how to rank that we’ve learned over the years.

Index as much related data as possible

The more data there is in the index, the more likely it is that a user’s search will match something, and the more results they will get. Lots of results is a good thing: it gives the user plenty of options, even if their search is somewhat vague.

The important thing is the ranking. Use the ranking settings to make sure the most important matches are at the top of the list on the first page. Proper ranking is what makes “lots of results” a good thing.

Of course, any data that’s truly “private” you probably don’t want to index… This gem doesn’t implement any sort of role-based segregation of the index data (though you can exempt certain records entirely from being indexed, while allowing others).

Range of ranking numbers to use

While the upper limit is that of an integer, keep in mind that internally this gem combines ranks using addition and multiplication in several different ways to compute a final ranking score for each query, so there’s no need to start with really big numbers. Here’s what I’ve found useful for individual model attributes:

  • Titles and names: 50-80

  • Keywords, group and category names: 10-30

  • Short abstracts, related info: 3-10

  • Longer full text descriptions: 1-3

Note that you can often use the first paragraph or sentence of a longer text as a stand-in for a short abstract, since well-written text tends to open with a summary.
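As a rough illustration of the first-paragraph trick, here is a minimal sketch in plain Ruby. The helper names first_paragraph and first_sentence are hypothetical and not part of this gem; the idea is that you would index their output as a separate, higher-ranked attribute alongside the full text.

```ruby
# Hypothetical helpers (not part of the gem) for faking a short abstract.

# Returns everything up to the first blank line.
def first_paragraph(text)
  text.to_s.strip.split(/\n\s*\n/, 2).first.to_s.strip
end

# Returns the first sentence of the first paragraph, or the whole
# paragraph when no sentence-ending punctuation is found.
def first_sentence(text)
  paragraph = first_paragraph(text)
  # Lazily match up to the first ., ! or ? that is followed by
  # whitespace or the end of the string (so "1.5" is not split).
  match = paragraph.match(/\A.*?[.!?](?=\s|\z)/m)
  match ? match[0] : paragraph
end
```

The sentence detection is deliberately naive (abbreviations such as “e.g.” will cut it short), which is usually good enough for a ranking hint.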

Rank models with very few rows higher than models with many

If all models were ranked equally, models with very few rows would get “swamped” by models with many rows. Therefore I’ve found it best to bring the smaller ones out a bit more, even if their data is not actually more important.
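One way to pick how much to “bring out” a small model is to derive a multiplier from row counts. The sketch below is only an assumption for illustration: the logarithmic formula, the suggested_model_boost name and the max_boost cap are not from this gem, and you would still feed the result into whatever model-level rank setting you use.

```ruby
# Hypothetical helper (not part of the gem): suggests a rank multiplier
# for a model based on how small it is relative to the largest model.
# Each tenfold difference in row count adds 1.0, capped at max_boost.
def suggested_model_boost(rows, largest_rows, max_boost: 3.0)
  # No boost for empty models or for the largest model itself.
  return 1.0 if rows <= 0 || largest_rows <= rows

  boost = 1.0 + Math.log10(largest_rows.to_f / rows)
  [boost, max_boost].min
end
```

For example, with 100,000 articles and 1,000 categories, the categories would get a multiplier of about 3, while the articles stay at 1. The cap keeps a tiny model from dominating every result page.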

Using a hard-coded model/row-based rank

The search_priority is useful when you just want to say “hey, this row is definitely more important than that one.” Reasons for doing this vary, such as bringing out newer, more active, or “better” content. It only influences a row’s search result rank; it does not guarantee an order.

It’s also useful when setting a priority in a Sitemaps file (www.sitemaps.org/protocol.html). In fact the range of 0.0 - 1.0 was chosen just to make that easier.
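To show how the shared 0.0 - 1.0 range helps, here is a minimal sketch that turns a priority value into a Sitemaps url entry. The sitemap_url_entry helper is hypothetical and not part of this gem; it assumes you already have the row’s URL and its search_priority value on hand.

```ruby
# Entity escapes required by the Sitemaps protocol for URL values.
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

# Hypothetical helper (not part of the gem): builds one <url> element,
# reusing a 0.0 - 1.0 search priority as the sitemap <priority>.
def sitemap_url_entry(loc, search_priority)
  # Clamp defensively so out-of-range values still yield a valid file.
  priority = search_priority.to_f.clamp(0.0, 1.0)
  escaped  = loc.to_s.gsub(/[&<>"']/, XML_ESCAPES)
  "<url><loc>#{escaped}</loc><priority>#{format('%.1f', priority)}</priority></url>"
end
```

Because both values use the same scale, no conversion is needed beyond formatting to one decimal place.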