Feature/#258 search query and id list #269

alefnula · 2020-02-14T18:58:40Z

Implemented a new parser for the classic API query using lark-parser (it's faster, more flexible, and has a nicer syntax than PLY).

Rewrote all the tests and added tests for Elasticsearch query generation.

I still have to test this a little more, but as far as I have tested, everything works as it should.

mhl10 · 2020-02-17T01:36:46Z

search/domain/base.py


   # Disjunct query with an unary not.
-   phrase = (('au', 'del_maestro'), 'OR', ('ANDNOT', ('ti', 'checkerboard')))
+   phrase = (


I think this is much easier to read than before 👍
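
For readers unfamiliar with the nested-tuple form of the old phrase in the diff above, here is a minimal sketch of how such a structure can be walked recursively. The `render` function and the `OPERATORS` set are hypothetical helpers for illustration only and are not part of this pull request. The sketch assumes three shapes: (field, value) for a term, (operator, phrase) for a unary operator, and (phrase, operator, phrase) for a binary one.

```python
# Hypothetical helper, not from the PR: renders a nested phrase tuple as text.
OPERATORS = {"AND", "OR", "ANDNOT"}  # assumed operator names


def render(phrase) -> str:
    """Recursively render a phrase tuple as a readable query string."""
    if len(phrase) == 3:
        # Binary form: (left phrase, operator, right phrase).
        left, op, right = phrase
        return f"({render(left)} {op} {render(right)})"
    head, tail = phrase
    if head in OPERATORS:
        # Unary form: (operator, phrase).
        return f"{head} {render(tail)}"
    # Term form: (field, value).
    return f"{head}:{tail}"


# The old-style phrase from the diff.
PHRASE = (("au", "del_maestro"), "OR", ("ANDNOT", ("ti", "checkerboard")))
print(render(PHRASE))
```

The positional tuples force the reader to infer each node's role from its length and first element, which is likely part of why the restructured version reads more easily.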

mhl10 · 2020-02-17T01:43:01Z

search/domain/classic_api/query_parser.py


-from search.domain.base import Phrase, Operator, Field, Term
+class QueryTransformer(Transformer):
+    def string(self, tokens):


A class docstring would be helpful here

mhl10 · 2020-02-17T02:20:14Z

search/domain/classic_api/query_parser.py


-    # Cast field to Field enum.
+QUERY_PARSER = Lark(


super nice!

mhl10 · 2020-02-17T02:52:20Z

search/services/index/api_classic/classic_search.py

+    # Filter id_list if necessary.
+    if query.id_list:
+        # Separate versioned and unversioned papers.
+        paper_ids = [id for id in query.id_list if "v" not in id]


This unfortunately won't work for a subset of the old-style arXiv identifiers (e.g. solv-int/9704004), because the archive name itself contains a "v". Something like this should be ok:

    import re

    # A version suffix is "v" followed by a positive integer at the end of the id.
    r1 = re.compile(r".*v[1-9]\d*$")
    paper_ids = []
    paper_id_vs = []
    for id in query.id_list:
        if r1.search(id):
            paper_id_vs.append(id)
        else:
            paper_ids.append(id)
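
To make the edge case concrete, here is a small self-contained sketch that compares the substring check from the diff with a version-suffix regex. The function name `split_ids` and the sample identifiers other than solv-int/9704004 are assumptions made for illustration; this is not the code that was merged.

```python
import re

# Matches a trailing version suffix such as "v1" or "v12".
VERSION_SUFFIX = re.compile(r"v[1-9]\d*$")


def split_ids(id_list):
    """Return (unversioned, versioned) lists, preserving input order."""
    unversioned, versioned = [], []
    for paper_id in id_list:
        if VERSION_SUFFIX.search(paper_id):
            versioned.append(paper_id)
        else:
            unversioned.append(paper_id)
    return unversioned, versioned


ids = ["solv-int/9704004", "1234.5678v2", "solv-int/9704004v1", "0704.0001"]

# The substring check treats the unversioned old-style id as versioned.
naive_unversioned = [i for i in ids if "v" not in i]

unversioned, versioned = split_ids(ids)
print(naive_unversioned)
print(unversioned, versioned)
```

Anchoring the pattern at the end of the string is what keeps the "v" inside an archive name such as solv-int from being mistaken for a version marker.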

mhl10

Kudos, and thanks! Although I didn't dive into the lark-parser documentation in great depth just yet, I think this is much clearer than the original code. Good call on using that over ply. I think this will be easier to maintain.

All the queries I tested worked as expected.

I made some very minor comments and pushed a tiny commit to remove the generator element.

alefnula · 2020-02-17T12:35:53Z

I spent some time comparing EBNF parsers because I got a little stuck with PLY (simple things were simple, but I got stuck on some of the more complex stuff). Then I found Lark, which is more actively developed, has more GitHub stars, is considerably faster, and uses a lot less memory. I really don't know why I hadn't heard of it earlier, but doing it in Lark was a breeze. So I think it's a win-win situation.

There is only one thing that I want to fix. But I need a suggestion. What should an empty string parse to? all:""?

mhl10 · 2020-02-17T14:38:27Z

There is only one thing that I want to fix. But I need a suggestion. What should an empty string parse to? all:""?

If search_query is empty or any of the field terms are empty, I would just short-circuit the request to the search backend and return something like the legacy version does, with no entries and a 200 status. It's a bit odd, but it would be compatible, I suppose.
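
A minimal sketch of that short-circuit, under stated assumptions: `classic_search`, `HTTP_OK`, and the `backend` callable are hypothetical names, and the return shape (entries, status) is invented for illustration. It only covers the empty `search_query` case, not empty field terms such as all:"".

```python
from typing import Callable, List, Optional, Tuple

HTTP_OK = 200


def classic_search(
    search_query: Optional[str],
    backend: Callable[[str], List[dict]],
) -> Tuple[List[dict], int]:
    """Return (entries, status) without calling the backend for empty queries."""
    if search_query is None or not search_query.strip():
        # Mirror the legacy behaviour described above: no entries, 200 status.
        return [], HTTP_OK
    return backend(search_query), HTTP_OK


# Stand-in backend that records the queries it receives.
calls: List[str] = []


def fake_backend(query: str) -> List[dict]:
    calls.append(query)
    return [{"id": "1234.5678"}]


print(classic_search("", fake_backend))
print(classic_search("all:electron", fake_backend))
```

Checking before the parser runs also sidesteps the question of what an empty string should parse to.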

alefnula · 2020-02-17T14:39:10Z

OK will do that.

alefnula · 2020-02-17T16:15:27Z

I just have to add the edge cases where no query string is provided and then this pull request is ready.

Co-Authored-By: Martin Lessmeister <[email protected]>

alefnula requested a review from mhl10 February 14, 2020 18:58

mhl10 reviewed Feb 17, 2020

mhl10 approved these changes Feb 17, 2020

alefnula and others added 7 commits February 20, 2020 10:30

Refactor query parser test cases for easier readability.

1f1ef30

Implemented new parsing of classic query string

c1606c2

disable generator entry

41059a2

Update search/domain/classic_api/query_parser.py

fc7911f

Co-Authored-By: Martin Lessmeister <[email protected]>

Fix paper_id version matching.

7b44512

Add documentation.

c39a18e

Empty query handling.

fc700ef

alefnula force-pushed the feature/#258-search-query-and-id-list branch from 511c906 to fc700ef Compare February 20, 2020 10:30

alefnula merged commit 342ff00 into develop Feb 20, 2020

bmaltzan deleted the feature/#258-search-query-and-id-list branch April 1, 2021 17:33
