Enhance elastic search mapping for documents #139

Closed
sebdeleze opened this issue Feb 10, 2020 · 3 comments · Fixed by #286

Comments

@sebdeleze
Contributor

sebdeleze commented Feb 10, 2020

* Use AND instead of OR as the default operator everywhere (simple query of ES).
* Search has to handle diacritics, as in RERO ILS.
* User search: a keyword search should be possible (currently a user cannot be found by last name alone) > check the field analyzer.
* Ranking by relevance (might not be necessary once AND replaces OR).
* Full-text search disabled.
@sebdeleze sebdeleze added 3 and removed 2 labels Jul 23, 2020
@sebdeleze sebdeleze added 5 and removed 3 labels Aug 24, 2020
@sebdeleze
Contributor Author

@pronguen I think the OR operator is already implemented by default for all searches. For instance, in this document https://sonardev.test.rero.ch/api/documents/168, you can do a search with terms appearing in different fields and the document is returned https://sonardev.test.rero.ch/global/search/documents?q=Testing%202019&page=5&size=10 (last record of the page).
Maybe I misunderstood something...

@pronguen
Contributor

Yes, for now we have OR, but we want AND as the default operator.

Example: https://sonardev.test.rero.ch/global/search/documents?q=Testing%20AND%202019&page=1&size=10

  • it matches the request "testing 2019" better, with only 2 results
  • moreover, the document titled "Testing" should rank first. Could you check the default sort by relevance?
  • I don't understand why the other document appears in the results, as it does not contain "2019"
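
To make the AND/OR difference concrete, here is a minimal Python sketch. It assumes the search goes through Elasticsearch's `query_string` query, which accepts a `default_operator` of `"AND"` or `"OR"`; the helper names `build_query` and `toy_match` are hypothetical and are not taken from the project's code. `toy_match` only imitates the boolean logic on plain text and does not reproduce Elasticsearch analysis or scoring.

```python
def build_query(user_input, default_operator="AND"):
    """Build a query_string clause (hypothetical helper, not project code)."""
    if default_operator not in ("AND", "OR"):
        raise ValueError("default_operator must be 'AND' or 'OR'")
    return {
        "query_string": {
            "query": user_input,
            # Applied between terms when the user types no explicit operator.
            "default_operator": default_operator,
        }
    }


def toy_match(text, user_input, default_operator="AND"):
    """Imitate AND/OR term matching on plain text (illustration only)."""
    words = set(text.lower().split())
    terms = user_input.lower().split()
    hits = [term in words for term in terms]
    return all(hits) if default_operator == "AND" else any(hits)


query = build_query("Testing 2019")
doc_without_year = "Testing the search"
```

With OR, `toy_match(doc_without_year, "Testing 2019", "OR")` is true even though the text lacks "2019"; with AND it is false, which is the behaviour requested above.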

@mmo
Collaborator

mmo commented Sep 7, 2020

  • OK: AND instead of OR
  • OK: diacritics replacement, e.g. universitaeten / universitaten -> universitäten
  • OK: User search
  • Ranking by relevance: more data needed for testing
  • OK: Full text search
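
The diacritics item can be illustrated with a small Python sketch of the folding idea: index-time and query-time terms are reduced to the same form, so all three spellings meet. This is only an approximation written for illustration; the function `fold` and its digraph rules are assumptions, and the actual behaviour depends on the analyzer configured in Elasticsearch.

```python
import re
import unicodedata


def fold(term):
    """Reduce a term to a diacritic-free form (illustrative, not the ES analyzer)."""
    # Decompose characters so that "ä" becomes "a" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", term.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumed, simplified German digraph rules: ae -> a, oe -> o, ue -> u.
    stripped = stripped.replace("ae", "a").replace("oe", "o")
    # Leave "ue" alone after a vowel or "q" (e.g. "queue").
    return re.sub(r"(?<![aeiouq])ue", "u", stripped)


variants = ["universitäten", "universitaeten", "universitaten"]
folded = {fold(v) for v in variants}
```

Here `folded` contains a single value, so a query typed with any of the three spellings would compare equal to the others after folding.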

sebdeleze pushed a commit that referenced this issue Sep 7, 2020
For a better user experience, a series of enhancements has been applied in this PR.

* Creates a custom Docker image for Elasticsearch with the ICU filter plugin.
* Improves mappings for all record types.
* Changes the way records are sorted to put the best scores first.
* Changes the way records are sorted to put the most recent first when no query is specified.
* Sets a default analyzer in the Elasticsearch template.
* Adds a default search factory applied to all record types.
* Adds a property explaining how Elasticsearch sets the score for a hit, if the `debug` parameter is set.
* Creates a custom query parser for documents, to avoid searching in full text by default.
* Closes #139.
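
As a rough sketch of how the sorting, `debug`, and default-analyzer points could fit together, the Python below builds an Elasticsearch request body and an index settings fragment. It is not the code from the PR: the function `build_search_body`, the `_created` field name, and the exact analyzer definition are assumptions. The `explain` flag, `_score` sorting, and the `icu_tokenizer` / `icu_folding` components (from the analysis-icu plugin) are standard Elasticsearch features.

```python
def build_search_body(query=None, debug=False):
    """Assemble a search body (hypothetical helper, not the PR's code)."""
    if query:
        # A query was given: best scores first.
        body = {"query": query, "sort": [{"_score": {"order": "desc"}}]}
    else:
        # No query: most recent records first ("_created" is an assumed field).
        body = {"query": {"match_all": {}}, "sort": [{"_created": {"order": "desc"}}]}
    if debug:
        # Ask Elasticsearch to explain how each hit's score was computed.
        body["explain"] = True
    return body


# Assumed index settings: an analyzer named "default" applies to every text
# field that does not declare its own analyzer.
INDEX_SETTINGS = {
    "analysis": {
        "analyzer": {
            "default": {
                "type": "custom",
                "tokenizer": "icu_tokenizer",
                "filter": ["icu_folding"],
            }
        }
    }
}

with_query = build_search_body({"query_string": {"query": "Testing 2019"}}, debug=True)
without_query = build_search_body()
```

`with_query` sorts by `_score` and carries `explain`, while `without_query` falls back to `match_all` sorted by the assumed date field.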

Co-Authored-by: Sébastien Délèze <[email protected]>