Searching with decomposed unicode characters #1184

Closed
Vainonen opened this issue Aug 13, 2021 · 4 comments · Fixed by #1239

Comments

@Vainonen
Contributor

At which URL did you encounter the problem?

http://finto.fi/finaf/fi/page/000185275

What steps will reproduce the problem?

  1. Copy a keyword used in search from a text with decomposed unicode characters

What is the expected output? What do you see instead?

The search does not find the matching prefLabel if a character that is decomposed in the keyword is precomposed in the vocabulary.

@osma osma added the bug label Aug 16, 2021
@osma osma added this to the Next Tasks milestone Aug 16, 2021
@kouralex
Contributor

Could not reproduce. Please give a more thorough example.

@joelit
Contributor

joelit commented Sep 7, 2021

Same here; I couldn't reproduce the reported bug, but I did get close to it. Searching with Etholén, Hans or Etholen, Hans produced the desired search result. However, when I copied and pasted the authorized search term for the entity, including the trailing white space, and searched with that, there were no search results.

@kouralex
Contributor

That is a good theory and a good catch, @joelit! Could there have been a trailing white space, @Vainonen?

@joelit joelit self-assigned this Nov 9, 2021
@joelit
Contributor

joelit commented Nov 9, 2021

This can be reproduced by creating a decomposed Unicode search string, for example:
uconv -x any-nfd <<<'Etholén, Hans'

Using the precomposed search term, we get a single result:
https://finto.fi/finaf/fi/search?clang=fi&q=Ethol%C3%A9n%2C+Hans

Using the decomposed search term, we get no results:
https://finto.fi/finaf/fi/search?clang=fi&q=Ethole%CC%81n%2C+Hans
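To see why the two URLs differ, here is a small Python sketch (not part of the project, only an illustration using the standard library) that decomposes the same name and percent-encodes both forms. The variable names are made up for this example.

```python
import unicodedata
from urllib.parse import quote_plus

# Precomposed form: "é" is the single code point U+00E9.
precomposed = "Ethol\u00e9n, Hans"

# Decomposed (NFD) form: "e" followed by the combining acute accent U+0301.
decomposed = unicodedata.normalize("NFD", precomposed)

# The strings look the same on screen but differ in length and content.
print(len(precomposed), len(decomposed))  # 13 14
print(precomposed == decomposed)          # False

# Percent-encoding gives the two query strings used in the URLs above.
print(quote_plus(precomposed))  # Ethol%C3%A9n%2C+Hans
print(quote_plus(decomposed))   # Ethole%CC%81n%2C+Hans
```

The search backend receives different byte sequences for the two queries, so a plain string comparison against a precomposed label fails for the decomposed one.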

This could be solved in the text index with an appropriate filter, but maybe it's better to normalize the search terms in the ConceptSearchParameters.
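The normalization idea could look roughly like the sketch below. It is written in Python purely for illustration and is not the code that was merged; `normalize_search_term` is a hypothetical helper name, and the assumption is that the vocabulary labels are stored in precomposed (NFC) form.

```python
import unicodedata


def normalize_search_term(term: str) -> str:
    """Return the term in NFC form (hypothetical helper).

    Assumes the labels in the vocabulary are precomposed, so composing
    the query makes both sides use the same code points.
    """
    return unicodedata.normalize("NFC", term)


label = "Ethol\u00e9n, Hans"   # precomposed label as stored in the vocabulary
query = "Ethole\u0301n, Hans"  # decomposed query, e.g. pasted from another text

print(query == label)                         # False
print(normalize_search_term(query) == label)  # True
```

Normalizing on the query side leaves the text index untouched, which may be why it was suggested as the simpler option here. In PHP, the `Normalizer` class of the intl extension provides a comparable operation.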


4 participants