Replace default responses from XML by JSON #348
Comments
@gforcada I'd be open to refactoring c.solr to use JSON instead of XML in general. I went a bit in that direction and I'd like to share my findings. Because Solr queries in c.solr were too slow (seconds instead of milliseconds), at some point I started to write an endpoint that does a raw Solr query and returns the raw Solr results. This came out of a longer discussion with the 4tw folks, who went in a similar direction with ftw.solr. When I compared the performance of this raw Solr approach, I found that as soon as I started converting the results (in Python), things became slow. I did not investigate this further and did not do any performance measurements. I was after a raw Solr query anyway, because I was tired of not being able to use Solr directly and of relying on abstraction layers. Therefore my gut feeling is that moving from XML to JSON won't give us a significant boost, though I could be mistaken. At kitconcept we are still evaluating different approaches, which is why we created kitconcept.solr. collective.solr does a lot more, and a raw Solr query does not really fit the expectations people might have of collective.solr. In any case, if you gather some profiling data on this topic, I'd love to see it.
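A raw query endpoint of the kind described above could start from something like the following: build a /select URL that asks Solr for JSON through its wt parameter, and hand the response bytes back without decoding or parsing them. This is a minimal sketch, not the actual endpoint mentioned in the comment; the base URL, the core name plone and the function names build_raw_query_url and fetch_raw are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_raw_query_url(solr_base: str, core: str, params: dict) -> str:
    """Build a Solr /select URL, asking for JSON unless wt is already set."""
    query = dict(params)  # copy so the caller's dict is not mutated
    query.setdefault("wt", "json")
    # doseq=True turns list values (e.g. several fq filters) into repeated keys
    return f"{solr_base.rstrip('/')}/{core}/select?{urlencode(query, doseq=True)}"


def fetch_raw(url: str, timeout: float = 10.0) -> bytes:
    """Return Solr's response body as bytes, without decoding or parsing it."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical local Solr instance and core name; only builds the URL.
    url = build_raw_query_url(
        "http://localhost:8983/solr",
        "plone",
        {"q": "Title:foo", "fq": ["portal_type:Document", "review_state:published"], "rows": 10},
    )
    print(url)
```

Passing wt explicitly (for example wt=xml) overrides the JSON default, so the same helper also covers the "let me choose the format" case.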
@tisto There are a number of different libraries available for deserializing JSON in Python, and it is probably worth a bit of investigation to profile the different options. The naive approach in Python of creating an empty list and then adding items to it as you parse the JSON is bound to be slow, because Python has to keep reallocating the memory available for the list. In the case of the view you linked, it looks like it comes close to not needing to deserialize the raw results in Python at all. If that could be avoided (by just embedding the serialized string from Solr in the response), it would save quite a bit of effort.
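Embedding the serialized string could look roughly like this: Python serializes only a small envelope and splices Solr's JSON body into it as bytes, so the (possibly large) payload is never parsed. A minimal sketch, assuming Solr returns valid UTF-8 JSON (as it does with wt=json); the function name wrap_raw_solr_body and the envelope keys query and solr are hypothetical.

```python
import json


def wrap_raw_solr_body(raw_solr_body: bytes, query: str) -> bytes:
    """Embed Solr's JSON body verbatim inside a small JSON envelope.

    Only the envelope is serialized by Python; the Solr payload is
    copied as bytes and never parsed.
    """
    # json.dumps escapes the query safely; strip its closing brace so the
    # raw Solr body can be appended as one more member of the same object.
    head = json.dumps({"query": query})[:-1]
    return (head + ', "solr": ').encode("utf-8") + raw_solr_body + b"}"
```

The trade-off is that nothing validates or filters the Solr payload on its way out, so fields that must not be exposed have to be excluded in the Solr query itself (for example with the fl parameter).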
What I remember (though it was a few years ago) from analyzing where the time was spent processing the XML responses from Solr is that it went into converting a string to a … One option would be to make everything lazy (though that probably complicates things 😅), so that parsing a response would only load the JSON data and return a mock interface; only when a brain is actually accessed would we convert that JSON data into a proper brain...
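The lazy idea can be sketched as a sequence that parses the JSON body once but builds a brain only when an item is accessed, caching it afterwards. A minimal sketch under that assumption; LazyResults and the make_brain callback are hypothetical stand-ins, not collective.solr's real brain classes.

```python
from __future__ import annotations

import json
from collections.abc import Sequence
from typing import Any, Callable


class LazyResults(Sequence):
    """Search results that parse the JSON once but build brains on demand."""

    def __init__(self, raw_json: bytes | str, make_brain: Callable[[dict], Any]):
        response = json.loads(raw_json)["response"]
        self.num_found = response["numFound"]
        self._docs = response["docs"]      # plain dicts, cheap to keep around
        self._make_brain = make_brain      # hypothetical brain factory
        self._brains: dict[int, Any] = {}  # cache of already-built brains

    def __len__(self) -> int:
        return len(self._docs)

    def __getitem__(self, index: int | slice):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        if index not in self._brains:
            # The expensive conversion happens only here, on first access.
            self._brains[index] = self._make_brain(self._docs[index])
        return self._brains[index]
```

Iteration and len() come for free from collections.abc.Sequence, so code that only renders the first few results pays only for those brains.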
@davisagli yeah, that was the whole idea of this "raw Solr" approach. I was shocked at how much slower things became when I tried to mess around with the response. The Solr response format is very well documented and can be transformed into whatever is necessary on the front end. Personally, I think this is the way to go in the future. @gforcada thanks for sharing this finding, it makes a lot of sense. I guess in the end it all depends on whether you are using Python or JavaScript to render your results. In a "Classic" environment it makes sense to optimize in Python; in a JS-frontend environment I think it makes a lot of sense to let go of the transformation in the backend and just rely on the frontend (where you can lazy-load or render things as well). Of course, nothing prevents you from using the REST-API-based approach in Classic as well. :)
For quite a few major releases now, Solr has allowed you to specify the format in which you want to receive the results. collective.solr always asks for XML, and it has a quite involved parser.

At work we noticed that we get notifications about slow Solr requests, and when we actually looked into it, it is not that Solr (the server) is slow to send the response, but rather that collective.solr takes quite some time to process the received XML response, and the notification we get is fired after the response has been processed.

JSON responses might be much more straightforward to process, or even the python format, which returns a dictionary-like structure. Most probably the JSON version is faster to parse, but we should get numbers... 🤷🏾

Would it be an option to either change it completely, or allow specifying/configuring the format in which one wants to get the responses?
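To start getting such numbers, a rough micro-benchmark could compare standard-library parsing of the same synthetic Solr-like response in both formats. This is a sketch under stated assumptions: the documents are synthetic, the XML only loosely mimics Solr's result/doc/str/date layout, and it times plain xml.etree.ElementTree and json parsing, not collective.solr's own parser or its type conversions, so profiling against real responses would still be needed. All function names are hypothetical.

```python
from __future__ import annotations

import json
import timeit
import xml.etree.ElementTree as ET


def synthetic_docs(n: int) -> list[dict[str, str]]:
    """Hypothetical documents standing in for real Solr hits (string fields only)."""
    return [
        {"UID": f"uid-{i}", "Title": f"Document {i}", "modified": "2020-01-01T00:00:00Z"}
        for i in range(n)
    ]


def to_solr_json(docs: list[dict[str, str]]) -> str:
    """Serialize docs roughly the way Solr's JSON writer lays out a result."""
    return json.dumps({"response": {"numFound": len(docs), "start": 0, "docs": docs}})


def to_solr_xml(docs: list[dict[str, str]]) -> str:
    """Serialize docs roughly the way Solr's XML writer lays out a result."""
    root = ET.Element("response")
    result = ET.SubElement(root, "result", name="response", numFound=str(len(docs)), start="0")
    for doc in docs:
        doc_el = ET.SubElement(result, "doc")
        for key, value in doc.items():
            field = ET.SubElement(doc_el, "date" if key == "modified" else "str", name=key)
            field.text = value
    return ET.tostring(root, encoding="unicode")


def parse_json_docs(body: str) -> list[dict[str, str]]:
    return json.loads(body)["response"]["docs"]


def parse_xml_docs(body: str) -> list[dict[str, str]]:
    root = ET.fromstring(body)
    return [{field.get("name"): field.text for field in doc} for doc in root.iter("doc")]


def compare(n: int = 1000, number: int = 10, repeat: int = 5) -> dict[str, float]:
    """Best-of-`repeat` seconds for parsing the same n docs `number` times."""
    docs = synthetic_docs(n)
    json_body, xml_body = to_solr_json(docs), to_solr_xml(docs)
    return {
        "json": min(timeit.repeat(lambda: parse_json_docs(json_body), number=number, repeat=repeat)),
        "xml": min(timeit.repeat(lambda: parse_xml_docs(xml_body), number=number, repeat=repeat)),
    }


if __name__ == "__main__":
    print(compare())
```

Both parsers return the same list of dicts, so any difference in the timings comes from the format alone; converting values (dates, numbers) afterwards would add cost on top in either case.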