Highlighting of phrase search results seems to be applied for individual words of the phrase #101
Comments
Hello, @Conal-Tuohy. The list consists of the search words entered on the web page, the current result on the web page, and how we want to fix them.
Search field: (a phrase including symbols)
As I see it, there are a few different issues here:
Latin stemming
It would be possible to patch Solr to be Latin-aware and automatically extract the stems of Latin words, which would allow for fuzzier searches. This would require writing a Latin "Stemmer" component in Java and configuring Solr to use it. I know Java and I could write such a thing, but I would definitely need assistance with Latin grammar because I don't know Latin. It would be many hours' work, though, for sure.
Wild-card and phrasal search
The app uses Solr's facet search API. The app accepts the field values posted from the website's HTML form, and it uses an XSLT stylesheet to convert those values into a JSON object which it sends to Solr as an HTTP POST, and then uses another stylesheet to convert Solr's response back to HTML. The stylesheet which formats the request as JSON is https://github.com/IUBLibTech/newton_chymistry/blob/master/xslt/search-parameters-to-solr-request.xsl and the search parameters are passed to it in XML like this:
<c:param-set>
<c:param name="text" value="aqua fortis"/>
<c:param name="symbol" value="♂ IRON (MARS)"/>
</c:param-set>
The stylesheet produces a JSON document which specifies a search query for anything at all (i.e.
Highlighting
What's happening with the hit highlighting is that Solr's response includes a set of what it calls "snippets"; these are phrases or sentences in which the searched words appear. Each snippet represents a "hit", and within each snippet, the words that the user searched for are also marked up. The snippets are displayed in full on the search results page, with the keywords in each snippet highlighted. If you click through to the HTML page, the same snippets are retrieved from Solr and used to highlight the HTML. Each snippet is highlighted, and the keywords are also highlighted distinctly. The stylesheet which does the hit-highlighting also inserts
I think it'd be a good idea to defer any work on the hit-highlighting until the phrase searching is replaced with word searching and fuzzy searching is enabled, because that will change the kind of results you can get. It should be a separate issue.
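The request-building step described in this comment (form parameters arrive as a c:param-set and are turned into a JSON request for Solr) can be sketched roughly as follows. This is only an illustration in Python, not the actual XSLT: the namespace bound to the c: prefix, the `*:*` match-all query, and the choice to put each field into a quoted `filter` entry are all assumptions, and the function names are made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the c: prefix is bound to the XProc step namespace.
C_NS = "http://www.w3.org/ns/xproc-step"

PARAMS_XML = f"""
<c:param-set xmlns:c="{C_NS}">
  <c:param name="text" value="aqua fortis"/>
  <c:param name="symbol" value="♂ IRON (MARS)"/>
</c:param-set>
"""


def read_params(xml_text):
    """Collect the name/value pairs from a c:param-set document."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.findall(f"{{{C_NS}}}param")}


def build_request(params):
    """Build a hypothetical Solr JSON request (not the real stylesheet's output)."""
    filters = []
    for name, value in params.items():
        if not value:
            continue  # skip form fields the user left empty
        # Escape backslashes and double quotes so the value stays one quoted phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        filters.append(f'{name}:"{escaped}"')
    # Assumption: match everything, then narrow the results with one filter per field.
    return {"query": "*:*", "filter": filters}


request = build_request(read_params(PARAMS_XML))
print(json.dumps(request, ensure_ascii=False, indent=2))
```

Wrapping each value in double quotes, as this sketch does, is what would make Solr treat a multi-word value as a phrase; dropping the quotes would turn it into a word search.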
@Conal-Tuohy As for Latin stemming, since I used to be a Java web developer and I know basic Latin, I might be able to work on it. If I do, I would like to ask you for some reference material on this technology. As for wildcard and phrasal search, I am going to try what you suggest and see the difference. As for highlighting, okay, we can work on it after the two issues above are done. If I have further questions, I will ask again.
@tubesoft since I am looking into the phrase search issue myself, for @jawalsh, maybe you ought to wait a bit? Regarding lemmatizing Latin in Solr, I don't actually know much about it myself, to be honest. The best I can do is refer you to https://lucene.apache.org/solr/guide/8_1/language-analysis.html
@Conal-Tuohy Okay, as for the Latin issues, I will do some research on how to implement it.
Progress report: I've fixed the "phrasal search" part of this issue in the Swinburne website, but I haven't merged the fix with chymistry yet. I've been almost entirely off work for a couple of weeks because of a virus, and I'm just getting back to it.
Thank you, Con!
Here is an update! I added the Latin stemmer plugins found in this GitHub repository to my local Solr, and I modified the filters property in Then, after re-indexing, the search seems to recognize Latin words. For example, when I search for "omnis", the results show its various inflected forms.
As we discussed in the previous meeting (Feb 16), I first thought that an OR search would be executed when we enter a phrase like Aqua fortis in the search form. However, I found that the webapp actually executes a phrasal search! As evidence, when I search for "Aqua fortis" (without double quotes in the form), I get 17 results.
I also executed the query
http://localhost:8983/solr/chymistry/select?q=text%3A%22Aqua%20fortis%22
(or text:"Aqua fortis") directly on Solr. Here the phrase was explicitly double-quoted, which triggers a phrasal search in Solr. I also got 17 results!
On the other hand, when I executed
text:Aqua fortis
(without double quotes), I got 83 results. Judging from these facts, when we enter a phrase in the search form, a phrasal search seems to be executed even without double quotes.
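For anyone who wants to repeat the comparison, here is a small Python sketch that builds both request URLs. It only constructs the URLs and sends nothing; the helper name `select_url` is made up for this example, and the base URL is the local one used above. As far as I understand Solr's standard query parser, in the unquoted form the `text:` prefix applies only to `Aqua`, while `fortis` is matched against the default field, which would explain the larger result count.

```python
from urllib.parse import quote

# Local Solr core from the queries above; adjust host and core name as needed.
SELECT_URL = "http://localhost:8983/solr/chymistry/select"


def select_url(query):
    """Return the /select URL for a raw query string, percent-encoding every reserved character."""
    return f"{SELECT_URL}?q={quote(query, safe='')}"


phrase_query = 'text:"Aqua fortis"'   # quoted: phrasal search on the text field
unquoted_query = "text:Aqua fortis"   # unquoted: only "Aqua" is tied to the text field

print(select_url(phrase_query))
print(select_url(unquoted_query))
```

The first URL printed is the same one quoted above; the second differs only in the missing %22 pairs.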
So I assume the issue to think about is how to highlight the result. The search result highlights the phrases that are hit as well as the individual words of the phrase. I might have to raise at the meeting whether only the exact phrase should be highlighted, but I personally think highlighting should be limited to the exact phrase, since it is the result of a phrasal search.
If we decide to modify the highlighting results, we might have to ask for @Conal-Tuohy's help!
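To make the two highlighting behaviours concrete, here is a small Python sketch. It is not the site's stylesheet, only an illustration of the difference: the function names, the sample snippet, and the use of `<mark>` as the highlight markup are all assumptions for this example.

```python
import re


def highlight_words(snippet, phrase):
    """Mark every occurrence of any individual word of the phrase (the current behaviour)."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<mark>\1</mark>", snippet)


def highlight_phrase(snippet, phrase):
    """Mark only occurrences of the whole phrase, with its words adjacent and in order."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b(" + r"\s+".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<mark>\1</mark>", snippet)


snippet = "Dissolve it in aqua fortis, then add more aqua."
print(highlight_words(snippet, "Aqua fortis"))
print(highlight_phrase(snippet, "Aqua fortis"))
```

With the word-level version the stray "aqua" at the end of the snippet is marked as well; with the phrase-level version only "aqua fortis" is.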