-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement buffer recycling for CharacterReader #1800
Conversation
Good stuff, thanks! I may move some things around (not sure if the recycler should be a public contract API, etc). Enjoy your vacation :) |
The next big thing is improving the efficiency of the InputStream based APIs
Yields
Which memory wise is terrible compared to the strign version... I'd be better of creating the String prior parsing rather than supplying the imputStream.... |
|
(Force pushed to solve the conflict) |
Hello @jhy , any status on this work? Also would it make sense to add a JVM system property to turn off the pooling behavior if not desired by the end user? |
Hello @jhy What can I do to help moving on on this topic? |
My apologies for the radio silience @chibenwa, I was pulled away from this. I have a refactoring of this to make the recyclers more generic, and will push an update for your review shortly. |
Apologies for the force-push, but I wanted to bring the branch current to HEAD. Here's a WIP of the refactoring to use a generic pool approach. I haven't profiled it much yet. I'd be glad for your review and profiling. A couple points:
|
(Am getting some build failures which are from the FuzzFixes tests -- some are timing out. Those have been sensitive to the CPU allocation on the workers so am not sure if it's just GitHub actions running a bit slower, or if these changes caused any slowness. I don't believe it's the latter, as my local perf tests show improvements, but we need to dig in.) |
Perf tests show this performs better when parsing larger docs; small strings still get the speed boost as the buffer is recycled.
In CharacterReader. This removes the redundant buffer allocation, and simplifies the read. For small file reads (same as small string test), updated version is ~ 20% faster than original.
(The Windows builds are failing - looks like for HTTP fetches the connection length is not getting fully read in some case. Need to investigate.) |
Refactored so that it eats until a combinator is seen after non-combinator content, and returns it all. And corrected unit tests that were incorrectly relying on that behavior. Note that a leading combinator will combine against the root element, which is either the Document, or the context element. Fixes jhy#1707
OK, I've landed #2186 now. Thanks again for initiating this! |
Before
After
Conclusion
This changeset significantly improves the memory efficiency of JSOUP which turns into massive performance gains.