Alphabetical order issues in sorting #171
Comments
Original comment by John Shepherdson (GitHub: john-shepherdson). See also #154.
Original comment by John Shepherdson (GitHub: john-shepherdson). I'll fix the label issues (via the linked issue), but I cannot fix the behavioural ones, which will have to wait for the next maintenance phase.
Original comment by Taina Jääskeläinen. I have raised an issue with the service providers whose titles begin with brackets or with single or double quotation marks.
Original comment by John Shepherdson (GitHub: john-shepherdson). The Elasticsearch config file for each language points to the default stopword list for that language (where one is available): czech, danish, german, greek, english, finnish, french, hungarian, italian, dutch, norwegian, portuguese, swedish. Elasticsearch provides the following predefined list of stopword languages:
So no stopword lists are available for Estonian, Slovak or Slovenian.
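For illustration only, a minimal Python sketch of per-language index settings that reference Elasticsearch's predefined stopword lists through the `stop` token filter (lists are named with surrounding underscores, e.g. `_english_`), falling back to `_none_` (no stopword removal) for languages without a list. The filter and analyzer names (`title_stop`, `title_analyzer`) and the helper functions are hypothetical; this is not the project's actual config file.

```python
# Languages named in the comment above as having a default Elasticsearch stopword list.
LANGS_WITH_STOPWORDS = {
    "czech", "danish", "german", "greek", "english", "finnish", "french",
    "hungarian", "italian", "dutch", "norwegian", "portuguese", "swedish",
}


def stop_filter_for(language: str) -> dict:
    """Return a `stop` token filter definition for one language's index (hypothetical helper)."""
    if language in LANGS_WITH_STOPWORDS:
        # Predefined lists are referenced by name wrapped in underscores, e.g. "_english_".
        return {"type": "stop", "stopwords": f"_{language}_"}
    # No predefined list (e.g. estonian here): "_none_" disables stopword removal.
    return {"type": "stop", "stopwords": "_none_"}


def analysis_settings(language: str) -> dict:
    """Build the `analysis` part of the index settings for one language."""
    return {
        "analysis": {
            "filter": {"title_stop": stop_filter_for(language)},
            "analyzer": {
                "title_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "title_stop"],
                }
            },
        }
    }


for lang in ("english", "estonian"):
    print(lang, analysis_settings(lang)["analysis"]["filter"]["title_stop"])
```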
Original comment by John Shepherdson (GitHub: john-shepherdson).
1 - fixed via #154
2 - TODO (see also https://github.com/cessda/cessda.metadata.office/issues/55 and https://github.com/cessda/cessda.metadata.office/issues/56)
3 - fixed via #204
Original comment by Taina Jääskeläinen. Adding sub-issue number 4: looking at Z-A sorting, it seems that if a title starts with a lowercase letter rather than a capital, it is sorted out of order. Could the system treat lowercase and uppercase letters alike? Sometimes a title needs to start with a lowercase letter, for instance "elderLUCID: London UCL Older adults' clear speech in interaction database", where elderLUCID is the database name.
Original comment by John Shepherdson (GitHub: john-shepherdson). @matthew-morris-cessda Are you able to fix this? If so, please self-assign.
Original comment by Taina Jääskeläinen. https://github.com/cessda/cessda.metadata.office/issues/56 is fixed and closed.
Original comment by Matthew Morris (GitHub: matthew-morris-cessda). I've found the root cause. Computers represent letters as numbers; for example, the letter G is represented by the number 71. Lowercase letters are represented by larger numbers (e.g. g is represented by the number 103), and Elasticsearch sorts by these numbers by default, so every title starting with a lowercase letter sorts after all titles starting with a capital. This has been fixed as of cessda/cessda.cdc.osmh-indexer.cmm@8940f35, but a reindex is required for the fix to take effect.
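A small Python sketch of the root cause described above: plain string sorting compares code points, so "elderLUCID..." lands after every capitalised title, while a case-folded sort key puts it among the E's. The helper `sort_titles` and the titles other than elderLUCID are hypothetical examples, not taken from the indexer.

```python
def sort_titles(titles, descending=False):
    """Sort titles alphabetically, ignoring letter case (hypothetical helper).

    str.casefold maps 'E' and 'e' to the same key, so a title such as
    'elderLUCID' sorts among the E's instead of after every capitalised title.
    """
    return sorted(titles, key=str.casefold, reverse=descending)


titles = [
    "Zebra survey",  # hypothetical title
    "elderLUCID: London UCL Older adults' clear speech in interaction database",
    "Ageing study",  # hypothetical title
]

print(ord("G"), ord("g"))                       # 71 103
print(sorted(titles, reverse=True)[0])          # code-point Z-A order: the elderLUCID title comes first
print(sort_titles(titles, descending=True)[0])  # case-insensitive Z-A order: Zebra survey
```

In Elasticsearch itself, a common way to get the same effect is to sort on a `keyword` field with a normalizer that applies the `lowercase` filter; the comment does not say which approach commit 8940f35 takes. Case-folding alone is also not locale-aware: in code-point order ä (U+00E4) sorts before å (U+00E5), whereas the Swedish alphabet ends å, ä, ö, so correct Swedish ordering needs locale-aware collation (for example via ICU).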
Original comment by Matthew Morris (GitHub: matthew-morris-cessda). [link to pull request removed]
Original comment by Matthew Morris (GitHub: matthew-morris-cessda).
Original comment by John Shepherdson (GitHub: john-shepherdson). Checked using the Swedish alphabet.
Original report on BitBucket by Taina Jääskeläinen.
Alphabetical ordering by titles: some issues.