Proximity Searches
Proximity searches are searches that take into account the positioning of words in relation to each other.
This gem does NOT currently support proximity searches, nor does its ranking directly consider how close words are to each other.
The index currently only stores whether a word exists in a given record or not (i.e. a boolean index). It does not store which field the word is in, how close it is to any other word within the same field, etc. Storing that would require a much bigger, more complicated index.
Note: someday this gem may support different indexing algorithms (including ones that record word positions); I’m not against it.
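To make the difference concrete, here is a minimal sketch of a boolean index next to a positional one. It is not this gem's code: the method names (`build_boolean_index`, `build_positional_index`, `adjacent?`) and the sample records are hypothetical, and the tokenizer is a plain `\w+` scan chosen only for illustration.

```ruby
require "set"

# Boolean index: word => set of record ids that contain the word.
def build_boolean_index(records)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  records.each do |id, text|
    text.downcase.scan(/\w+/).each { |word| index[word] << id }
  end
  index
end

# Positional index: word => { record id => [positions of the word] }.
# It has to keep one entry per occurrence, which is why it grows much larger.
def build_positional_index(records)
  index = Hash.new { |hash, word| hash[word] = Hash.new { |h, id| h[id] = [] } }
  records.each do |id, text|
    text.downcase.scan(/\w+/).each_with_index do |word, position|
      index[word][id] << position
    end
  end
  index
end

# True when `second` appears immediately after `first` in the given record.
def adjacent?(positional_index, id, first, second)
  return false unless positional_index.key?(first) && positional_index.key?(second)

  first_positions = positional_index[first].fetch(id, [])
  second_positions = positional_index[second].fetch(id, [])
  first_positions.any? { |position| second_positions.include?(position + 1) }
end

# Made-up sample records.
records = {
  1 => "the quick brown fox",
  2 => "the brown dog was quick"
}

boolean_index = build_boolean_index(records)
positional_index = build_positional_index(records)

# The boolean index can only say that both records contain both words.
matches = boolean_index["quick"] & boolean_index["brown"]
puts matches.to_a.sort.inspect

# Only the positional index can tell that the words are adjacent in record 1.
puts adjacent?(positional_index, 1, "quick", "brown")
puts adjacent?(positional_index, 2, "quick", "brown")
```

The boolean index keeps one small set per word, while the positional index keeps a position for every occurrence of every word, which is the size and complexity cost described above.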
But never fear… ranking to the rescue! Along with the boolean of whether a word exists or not, the index stores an integer score that tells the system how important the word is in the record. This score is based on which field the word is in, how frequently it was found, etc. This means that most of the time you do not need proximity searches; this scoring optimization will be “good enough.”
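As a rough illustration of that idea, the sketch below gives each word an integer score from the field it appears in and how often it appears, then ranks records by the sum of the scores of the query words. The field weights, the method names (`score_words`, `rank`), and the sample records are assumptions made for this example, not the gem's actual scoring formula.

```ruby
# Returns word => integer score for one record.
# Each occurrence of a word adds the weight of the field it was found in.
def score_words(record, weights)
  scores = Hash.new(0)
  record.each do |field, text|
    weight = weights.fetch(field, 1)
    text.to_s.downcase.scan(/\w+/).each { |word| scores[word] += weight }
  end
  scores
end

# Keeps only records that contain every query word (the boolean part),
# then orders them by total score, highest first.
def rank(records, query, weights)
  words = query.downcase.scan(/\w+/)
  scored = records.map do |id, record|
    scores = score_words(record, weights)
    next unless words.all? { |word| scores.key?(word) }

    [id, words.sum { |word| scores[word] }]
  end
  scored.compact.sort_by { |id, total| [-total, id] }
end

# Hypothetical weights: a word in the title counts ten times more than in the body.
weights = { title: 10, body: 1 }

# Made-up sample records.
records = {
  1 => { title: "Ruby search", body: "a search gem" },
  2 => { title: "Cooking", body: "search for ruby red tomatoes, ruby ruby" }
}

puts rank(records, "ruby search", weights).inspect
# => [[1, 21], [2, 4]]
```

Record 2 mentions "ruby" more often, but record 1 still ranks first because both words appear in its heavily weighted title. No word positions are needed to get that ordering.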
There are still cases where you may need proximity searches:
- When you need phrase searches. Examples include things that you might memorize verbatim, such as religious texts or poetry, that you can’t otherwise search through as easily by subject.
- When you need to rank higher based on how close the matching words are to each other. For example, if you have truly long texts that have lots of words (like whole chapters or books all in one field), then you might need this kind of ranking to distinguish them. However, such long texts should be broken up into smaller pieces before being put on the web, and this search engine, being built upon the Rails web application framework, is primarily designed for the web.
Ranking alone tends to be enough in these cases:
- When you have a rich set of fields for each record to assign different ranks. Then the ranking system really shines, and makes proximity searches a lot less important, often to the point where you will not miss them.
- When important words can be parsed out of large blocks of text somehow, such as headings at different levels. This is essentially a way of obtaining a bunch of important keywords for the document.
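One possible way to pull such keywords out of a large block of text is sketched below. It assumes the text uses Markdown-style `#` headings; the method name `heading_keywords` and the per-level weights are hypothetical. The resulting words could then be stored in their own highly ranked field.

```ruby
# Collects the words found in Markdown-style headings and weights them by
# heading level, so words in top-level headings count the most.
def heading_keywords(text, level_weights = { 1 => 8, 2 => 4, 3 => 2 })
  keywords = Hash.new(0)
  text.each_line do |line|
    match = line.match(/\A(#+)\s+(.*)/)
    next unless match

    # Heading levels without an explicit weight fall back to 1.
    weight = level_weights.fetch(match[1].length, 1)
    match[2].downcase.scan(/\w+/).each { |word| keywords[word] += weight }
  end
  keywords
end

# Made-up sample document.
document = <<~TEXT
  # Proximity Searches
  Some body text about ranking.
  ## Phrase searches
  More body text.
TEXT

puts heading_keywords(document).inspect
```

Body text is ignored entirely here; only heading words are kept, so a long document is reduced to a short, weighted keyword list.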