Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial boolean support ARXIVNG-1971 #234

Merged
merged 18 commits into from
Jun 11, 2019

Conversation

JaimieMurdock
Copy link
Contributor

This PR depends on the earlier PR adding query support: #233

Launch the service using the same instructions as for the previous PR.

This adds support for boolean operators. For example:

Known Limitations

The fields must be different - you cannot filter twice on the same field as the same boolean operator will apply to all terms with the same field. For example, "all:none AND all:universe OR all:physics" will result in the following FieldedSearchTerm objects:

  • FieldedSearchTerm('OR', 'all', 'none')
  • FieldedSearchTerm('OR', 'all', 'universe')
  • FieldedSearchTerm('OR', 'all', 'physics')

Even without parentheticals, that's probably not the intent of the query. At the very least we'd expect:

  • FieldedSearchTerm('AND', 'all', 'none')
  • FieldedSearchTerm('AND', 'all', 'universe')
  • FieldedSearchTerm('OR', 'all', 'physics')

Proper design is complicated by the fact that booleans are applied as operands on each term, rather than as a binary operator between two terms. This design, however, might better reflect the elastic search practices, which would generate a query using + and - operands for the terms, which essentially defaults to an OR, whereas we default to an AND. This has the semantics of our query as +none +universe physics, whereas the elasticsearch defaults would be none +universe physics. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_boolean_operators

We need to settle on a consistent convention across the API. I personally vote for preserving the AND default, because I think that's what people expect when searching our database. They want all the terms included unless they specify an OR. This is the default in the current search UI as well.

@JaimieMurdock JaimieMurdock self-assigned this Mar 7, 2019
@@ -0,0 +1,30 @@
# Notes for Boolean search implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for capturing this information here!

Copy link
Contributor

@erickpeirson erickpeirson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion it seems like cramming this into the Fielded paradigm that we used for the UI is a square-peg-round-hole situation. We arrived at something like

Term = Tuple[str,str]
Expression = Union[Term, Tuple[Operator, Term]]
Phrase = Tuple[Union[Expression, Phrase], Operator, Union[Expression, Phrase]]

for a stripped-down domain representation, possible attached to a class like

class ClassicAPIQuery:
    phrase: Phrase
    order: Optional[str]
    size: int
    page_start: int

that supports pagination in whatever way the classic API requires

@JaimieMurdock JaimieMurdock mentioned this pull request Apr 3, 2019
@JaimieMurdock JaimieMurdock merged commit f74a4db into task/ARXIVNG-1898 Jun 11, 2019
@cbf66 cbf66 deleted the task/ARXIVNG-1971 branch August 21, 2024 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants