Issue when using auto_query() with stemming #146

Open
ecstaticpeon opened this issue Mar 17, 2015 · 4 comments

Comments

@ecstaticpeon

I have an index in which some items contain the word "voyage" and others contain "voyager". When I search for "voyage" using auto_query(), the backend returns the items containing "voyager" first, although one would expect the items containing "voyage" to come first. With filter(), however, the ordering appears correct (i.e. "voyage" first, then "voyager").

After doing some investigation, it looks like the query returned by XapianSearchQuery.build_query() is different depending on whether auto_query() or filter() is used:

from haystack.query import SearchQuerySet

search_query_set = SearchQuerySet()

search_query_set.auto_query(u'voyage')
# build_query() returns `Xapian::Query(Zvoyag:(pos=1))`
# Results are "voyager" first, then "voyage".

search_query_set.filter(content=u'voyage')
# build_query() returns `Xapian::Query((Zvoyag OR voyage))`
# Results are "voyage" first, then "voyager".

Looking at XapianSearchQuery._filter_contains(), which is called when using filter(), the docstring specifies that the search is done on both the stemmed and un-stemmed term: "Splits the sentence in terms and join them with OR, using stemmed and un-stemmed."
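To make that docstring concrete, here is a minimal pure-Python sketch of the "stemmed OR un-stemmed" idea. It is not the xapian-haystack implementation: `toy_stem` and `contains_query` are hypothetical names, the stemmer is a crude suffix stripper standing in for Xapian's real stemmer, and the output is just a string that mimics the description shown above.

```python
def toy_stem(term):
    """Hypothetical stand-in for a real stemmer: strips one common suffix."""
    for suffix in ("er", "e", "s"):
        # Keep at least three characters so very short words are left alone.
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def contains_query(sentence):
    """Pair each term with its stem as (Z<stem> OR term), joined with OR."""
    parts = []
    for term in sentence.lower().split():
        parts.append("(Z%s OR %s)" % (toy_stem(term), term))
    return " OR ".join(parts)


print(contains_query("voyage"))  # prints "(Zvoyag OR voyage)"
```

Because the un-stemmed term is part of the query, a document containing exactly "voyage" matches on two terms, while a document containing only "voyager" matches on the stem alone.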

Shouldn't using auto_query() end up using both stemmed and un-stemmed terms as well?

Versions used:

Xapian: 1.3.2
xapian-haystack: 3e86112 (from 12 January 2015).

@ecstaticpeon
Author

Ignore the ordering issue; it is actually related to our index. The question remains, though: shouldn't using auto_query() end up using both stemmed and un-stemmed terms as well?

@jorgecarleitao
Collaborator

Thanks for using Xapian-Haystack and for reporting this here.

In principle I agree with the consistency you mentioned. However, I'm not sure this is what we want, since auto_query() receives a query, not a term. E.g. what would be the stemmed version of Hello OR bye OR che*rs?

See here what keywords it accepts.

@ecstaticpeon
Author

Thanks for making Xapian-Haystack :)

As far as I understand, the query will be split into terms, and stemming will then be applied to each term where applicable?
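A rough sketch of what I mean, in plain Python. This is only an illustration of the idea and makes no claim about how Xapian's query parser or xapian-haystack actually behaves: `OPERATORS`, `toy_stem` and `expand_query` are hypothetical names, and the stemmer is a toy suffix stripper. Operators and wildcard terms are passed through untouched; every other term is expanded to its stemmed and un-stemmed forms.

```python
OPERATORS = {"AND", "OR", "NOT"}  # assumed operator keywords, for illustration


def toy_stem(term):
    """Hypothetical stand-in for a real stemmer: strips one common suffix."""
    for suffix in ("er", "e", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def expand_query(query):
    """Expand plain terms to (Z<stem> OR term); keep operators and wildcards."""
    out = []
    for token in query.split():
        if token in OPERATORS or "*" in token:
            out.append(token)  # not a plain term, so stemming does not apply
        else:
            lowered = token.lower()
            out.append("(Z%s OR %s)" % (toy_stem(lowered), lowered))
    return " ".join(out)


print(expand_query("Hello OR bye OR che*rs"))
# prints "(Zhello OR hello) OR (Zbye OR bye) OR che*rs"
```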

@jorgecarleitao
Collaborator

I'm not sure the query is split into terms by Xapian-Haystack. In this line, the "term" is prepared by haystack and sent to the backend to be interpreted (self.backend.parse_query(query)). We just prepend field_name:%s to the term when the query is made on a specific field.

Can you point out where in the code it is split by terms?
