-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/#258 search query and id list #269
Conversation
|
||
# Disjunct query with an unary not. | ||
phrase = (('au', 'del_maestro'), 'OR', ('ANDNOT', ('ti', 'checkerboard'))) | ||
phrase = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is much easier to read than before 👍
|
||
from search.domain.base import Phrase, Operator, Field, Term | ||
class QueryTransformer(Transformer): | ||
def string(self, tokens): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A class docstring would be helpful here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
|
||
# Cast field to Field enum. | ||
QUERY_PARSER = Lark( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nice!
# Filter id_list if necessary. | ||
if query.id_list: | ||
# Separate versioned and unversioned papers. | ||
paper_ids = [id for id in query.id_list if "v" not in id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately won't work for a subset of the old-style arXiv identifiers (e.g. solv-int/9704004
). Something like this should be ok:
import re
r1 = re.compile(".*v[1-9]\d*$")
paper_ids = []
paper_id_vs = []
for id in query.id_list:
paper_id_vs.append(id) if r1.search(id) else paper_ids.append(id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kudos, and thanks! Although I didn't dive into the lark-parser
documentation in great depth just yet, I think this is much clearer than the original code. Good call on using that over ply
. I think this will be easier to maintain.
All the queries I tested worked as expected.
I made some very minor comments and pushed a tiny commit to remove the generator
element.
I've spent some time comparing the EBNF parsers since I got a little bit stuck with PLY (simple things were simple but then I got stuck with some of more complex stuff). And then I found LARK which is more actively developed, has more github stars, is pretty much faster and uses a lot less memory. I really don't know why I haven't heard of it earlier, but doing it in LARK was a breeze. So I think it's a win win situation. There is only one thing that I want to fix. But I need a suggestion. What should an empty string parse to? |
If |
OK will do that. |
I just have to add the edge cases where no query string is provided and then this pull request is ready. |
Co-Authored-By: Martin Lessmeister <[email protected]>
511c906
to
fc700ef
Compare
Implemented a new parser for classic api query using
lark-parser
(it's faster and more flexible and has nicer syntax than ply).Rewrote all the tests and added tests for ElasticSearch query generation.
I still have to test this a little bit, but as far as I did until now everything works as it should be.