Implement buffer recycling for CharacterReader #1800

chibenwa · 2022-07-01T02:28:47Z

Before

Benchmark                                                  Mode  Cnt      Score      Error   Units
JMHBenchmark.benchmarkSmall                                avgt    5      7.382 ±    0.940   us/op
JMHBenchmark.benchmarkSmall:·gc.alloc.rate                 avgt    5   8572.806 ± 1082.658  MB/sec
JMHBenchmark.benchmarkSmall:·gc.alloc.rate.norm            avgt    5  72952.001 ±    0.001    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space        avgt    5   8776.474 ± 1059.021  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space.norm   avgt    5  74688.242 ±  767.418    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen           avgt    5      0.195 ±    0.016  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen.norm      avgt    5      1.664 ±    0.174    B/op
JMHBenchmark.benchmarkSmall:·gc.count                      avgt    5    523.000             counts
JMHBenchmark.benchmarkSmall:·gc.time                       avgt    5    761.000                 ms
JMHBenchmark.benchmarkMedium                               avgt    5     29.383 ±    1.479   us/op
JMHBenchmark.benchmarkMedium:·gc.alloc.rate                avgt    5   2531.961 ±  127.589  MB/sec
JMHBenchmark.benchmarkMedium:·gc.alloc.rate.norm           avgt    5  85824.002 ±    0.001    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space       avgt    5   2522.477 ±  106.173  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space.norm  avgt    5  85506.478 ± 2628.227    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen          avgt    5      0.207 ±    0.036  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen.norm     avgt    5      7.031 ±    1.401    B/op
JMHBenchmark.benchmarkMedium:·gc.count                     avgt    5    372.000             counts
JMHBenchmark.benchmarkMedium:·gc.time                      avgt    5    371.000                 ms

After

Benchmark                                                  Mode  Cnt      Score     Error   Units
JMHBenchmark.benchmarkSmall                                avgt    5      2.038 ±   0.102   us/op
JMHBenchmark.benchmarkSmall:·gc.alloc.rate                 avgt    5   3205.214 ± 160.609  MB/sec
JMHBenchmark.benchmarkSmall:·gc.alloc.rate.norm            avgt    5   7536.000 ±   0.001    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space        avgt    5   3175.262 ± 132.074  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space.norm   avgt    5   7465.807 ± 139.856    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen           avgt    5      0.194 ±   0.016  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen.norm      avgt    5      0.455 ±   0.038    B/op
JMHBenchmark.benchmarkSmall:·gc.count                      avgt    5    439.000            counts
JMHBenchmark.benchmarkSmall:·gc.time                       avgt    5    397.000                ms
JMHBenchmark.benchmarkMedium                               avgt    5     21.085 ±   1.112   us/op
JMHBenchmark.benchmarkMedium:·gc.alloc.rate                avgt    5    833.438 ±  44.777  MB/sec
JMHBenchmark.benchmarkMedium:·gc.alloc.rate.norm           avgt    5  20272.002 ±   0.001    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space       avgt    5    827.537 ±  70.651  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space.norm  avgt    5  20126.922 ± 783.745    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen          avgt    5      0.206 ±   0.016  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen.norm     avgt    5      5.021 ±   0.162    B/op
JMHBenchmark.benchmarkMedium:·gc.count                     avgt    5    268.000            counts
JMHBenchmark.benchmarkMedium:·gc.time                      avgt    5    193.000                ms

Conclusion

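This changeset significantly improves the memory efficiency of jsoup, which translates into large performance gains. For the small benchmark, allocation drops from about 72,952 B/op to 7,536 B/op and average time from 7.382 us/op to 2.038 us/op. For the medium benchmark, allocation drops from about 85,824 B/op to 20,272 B/op and average time from 29.383 us/op to 21.085 us/op.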
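For readers unfamiliar with the technique, the sketch below illustrates the general idea of buffer recycling: rather than allocating a fresh `char[]` for every parse, a thread keeps its last buffer and hands it out again. It is not the code from this changeset. The class name `CharBufferRecycler`, its methods, and the use of a `SoftReference` are assumptions made for illustration, and the buffer length simply mirrors the 32K figure mentioned in this thread.

```java
import java.lang.ref.SoftReference;

/** Hypothetical illustration of buffer recycling; not the class added by this PR. */
final class CharBufferRecycler {
    // Illustrative length only (the thread mentions a 32K char buffer).
    static final int BUFFER_SIZE = 32 * 1024;

    // One cached buffer per thread. The SoftReference lets the GC reclaim
    // the buffer under memory pressure instead of pinning it forever.
    private static final ThreadLocal<SoftReference<char[]>> CACHE = new ThreadLocal<>();

    /** Returns the cached buffer for this thread, or a new one if none is cached. */
    static char[] borrow() {
        SoftReference<char[]> ref = CACHE.get();
        if (ref != null) {
            char[] buf = ref.get();
            CACHE.remove(); // the caller now owns the buffer exclusively
            if (buf != null) {
                return buf;
            }
        }
        return new char[BUFFER_SIZE];
    }

    /** Hands a buffer back so the next borrow() on this thread can reuse it. */
    static void release(char[] buf) {
        if (buf != null && buf.length == BUFFER_SIZE) {
            CACHE.set(new SoftReference<>(buf));
        }
    }
}
```

A caller would pair `borrow()` with `release()` in a `try`/`finally` block. Note that a recycled buffer still holds the previous contents, so the reader must track how many chars are valid rather than relying on a zeroed array.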

jhy · 2022-07-01T02:40:31Z

Good stuff, thanks! I may move some things around (not sure if the recycler should be a public contract API, etc).

Enjoy your vacation :)

chibenwa · 2022-07-01T03:01:13Z

The next big thing is improving the efficiency of the InputStream-based APIs:

    @Benchmark
    public void benchmarkMediumInputStream(Blackhole bh) throws Exception{
        bh.consume(Jsoup.parse(new ByteArrayInputStream(CONTENT_MEDIUM_AS_BYTES), StandardCharsets.UTF_8.name(), ""));
    }

Yields

Benchmark                                                             Mode  Cnt       Score      Error   Units
JMHBenchmark.benchmarkMediumInputStream                               avgt    5      35.285 ±    1.339   us/op
JMHBenchmark.benchmarkMediumInputStream:·gc.alloc.rate                avgt    5    3504.566 ±  133.319  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.alloc.rate.norm           avgt    5  142656.003 ±    0.001    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Eden_Space       avgt    5    3497.228 ±  134.669  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Eden_Space.norm  avgt    5  142359.431 ± 3459.922    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Old_Gen          avgt    5       0.379 ±    0.356  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Old_Gen.norm     avgt    5      15.420 ±   13.903    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.count                     avgt    5     416.000             counts
JMHBenchmark.benchmarkMediumInputStream:·gc.time                      avgt    5     423.000                 ms

Memory-wise, this is terrible compared to the String version: about 142,656 B/op versus 20,272 B/op for benchmarkMedium above. I'd be better off creating the String prior to parsing rather than supplying the InputStream.
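A minimal sketch of that workaround, reading the stream into a `String` first and then using the String-based parse. The helper `StreamToString.readFully` is hypothetical and uses only the JDK. It assumes the charset is known up front, so it does not cover the case where the charset would otherwise have to be detected from the document.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper: drains an InputStream into a String before parsing. */
final class StreamToString {
    static String readFully(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // Copy until end of stream.
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Usage with jsoup (not compiled here, as it needs the jsoup dependency):
//   Document doc = Jsoup.parse(StreamToString.readFully(in, StandardCharsets.UTF_8), "");
```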

chibenwa · 2022-07-01T04:20:02Z

Benchmark                                                             Mode  Cnt       Score      Error   Units
JMHBenchmark.benchmarkMedium                                          avgt    5      20.314 ±    1.621   us/op
JMHBenchmark.benchmarkMedium:·gc.alloc.rate                           avgt    5     654.901 ±   51.181  MB/sec
JMHBenchmark.benchmarkMedium:·gc.alloc.rate.norm                      avgt    5   15344.002 ±    0.001    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space                  avgt    5     650.845 ±   57.968  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space.norm             avgt    5   15248.663 ±  495.411    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen                     avgt    5       0.193 ±    0.031  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen.norm                avgt    5       4.533 ±    0.927    B/op
JMHBenchmark.benchmarkMedium:·gc.count                                avgt    5     244.000             counts
JMHBenchmark.benchmarkMedium:·gc.time                                 avgt    5     174.000                 ms
JMHBenchmark.benchmarkMediumInputStream                               avgt    5      32.855 ±    1.372   us/op
JMHBenchmark.benchmarkMediumInputStream:·gc.alloc.rate                avgt    5    3633.766 ±  150.956  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.alloc.rate.norm           avgt    5  137728.003 ±    0.001    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Eden_Space       avgt    5    3632.474 ±  189.650  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Eden_Space.norm  avgt    5  137676.632 ± 2596.732    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Old_Gen          avgt    5       0.339 ±    0.368  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Old_Gen.norm     avgt    5      12.879 ±   14.070    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.count                     avgt    5     420.000             counts
JMHBenchmark.benchmarkMediumInputStream:·gc.time                      avgt    5     427.000                 ms
JMHBenchmark.benchmarkSmall                                           avgt    5       1.656 ±    0.035   us/op
JMHBenchmark.benchmarkSmall:·gc.alloc.rate                            avgt    5    1699.708 ±   36.050  MB/sec
JMHBenchmark.benchmarkSmall:·gc.alloc.rate.norm                       avgt    5    3248.000 ±    0.001    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space                   avgt    5    1681.110 ±   63.848  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space.norm              avgt    5    3212.422 ±   74.478    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen                      avgt    5       0.186 ±    0.015  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen.norm                 avgt    5       0.355 ±    0.034    B/op
JMHBenchmark.benchmarkSmall:·gc.count                                 avgt    5     337.000             counts
JMHBenchmark.benchmarkSmall:·gc.time                                  avgt    5     270.000                 ms

chibenwa · 2022-07-01T04:21:53Z

(Force pushed to solve the conflict)

chibenwa · 2022-09-13T09:46:42Z

Hello @jhy, any status on this work?

Also would it make sense to add a JVM system property to turn off the pooling behavior if not desired by the end user?
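Such a switch could look roughly like the sketch below. The property name `jsoup.bufferPooling.disabled` and the class `PoolingConfig` are hypothetical and exist nowhere in jsoup; only `Boolean.getBoolean`, which reads a JVM system property, is a real JDK call.

```java
/** Hypothetical opt-out switch for buffer pooling, driven by a JVM system property. */
final class PoolingConfig {
    // Hypothetical property name, e.g. set with -Djsoup.bufferPooling.disabled=true
    static final String DISABLE_PROPERTY = "jsoup.bufferPooling.disabled";

    /** Pooling stays on unless the property is present and equal to "true". */
    static boolean poolingEnabled() {
        return !Boolean.getBoolean(DISABLE_PROPERTY);
    }

    /** Allocates a fresh buffer when pooling is off, else defers to the pool. */
    static char[] acquire(int size, java.util.function.Supplier<char[]> pool) {
        return poolingEnabled() ? pool.get() : new char[size];
    }
}
```

The sketch re-reads the property on every call to keep it easy to test. A real implementation would more likely read it once at class initialization.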

chibenwa · 2022-11-14T05:20:26Z

Hello @jhy

What can I do to help move this topic forward?

jhy · 2023-01-06T07:14:36Z

My apologies for the radio silence @chibenwa, I was pulled away from this.

I have a refactoring of this to make the recyclers more generic, and will push an update for your review shortly.

jhy · 2023-01-07T03:03:45Z

Apologies for the force-push, but I wanted to bring the branch current to HEAD.

Here's a WIP of the refactoring to use a generic pool approach. I haven't profiled it much yet. I'd be glad for your review and profiling.

A couple of points:

- I temporarily disabled the benchmark just on committing these, but please re-enable for your testing. When we land this, I think I will remove the benchmark from the commit, and move out to a distinct repo (so that versions of jsoup can be compared more readily)
- I made the char buffer size constant vs allocating smaller ephemeral buffers if the input was smaller than the buffer size. I figure that because we are using a constant pool, it will be better to recycle somewhat larger objects than thrashing on smaller objects
- I want to make the DataUtil buffers use this too - I think I can just remove the BufferedReader (which internally allocates a new buffer) because the CharacterReader implements its own buffer. Two is redundant.
- It would be good if the small HTML test had more varied inputs. Otherwise the String Cache in CharacterReader is always going to have an artificially high hit rate, which might bias the performance tests.
- I don't know that we need to make these pools optional via a tuning configuration. I would prefer to just get it to work well for all circumstances. Just one less thing for the developer to worry about.
- In contrast to the last point however, I wonder if we should make the StringCache optional, or automatically switch it on/off depending on the platform (server or Android). I have found during perf testing that the overhead of maintaining it when not on Android is worse than the allocation cost.
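To make the "generic pool" idea concrete, here is a rough sketch of a per-thread pool that works for any object type and hands out constant-size buffers. It is not the refactoring pushed to this branch. `ThreadLocalPool`, `MAX_IDLE`, and the method names are hypothetical, and the cap of four idle objects per thread is an arbitrary assumption.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical generic per-thread object pool; not the implementation in this PR. */
final class ThreadLocalPool<T> {
    private static final int MAX_IDLE = 4; // arbitrary cap on idle objects per thread

    private final Supplier<T> factory;

    // Each thread gets its own stack of idle objects, so no locking is needed.
    private final ThreadLocal<ArrayDeque<T>> idle = new ThreadLocal<ArrayDeque<T>>() {
        @Override
        protected ArrayDeque<T> initialValue() {
            return new ArrayDeque<>();
        }
    };

    ThreadLocalPool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Takes an idle object if one is available, otherwise creates a new one. */
    T borrow() {
        T item = idle.get().pollFirst();
        return item != null ? item : factory.get();
    }

    /** Returns an object to this thread's stack; extras beyond the cap are dropped. */
    void release(T item) {
        ArrayDeque<T> stack = idle.get();
        if (stack.size() < MAX_IDLE) {
            stack.addFirst(item);
        }
    }
}
```

With a constant buffer size, a single pool such as `new ThreadLocalPool<>(() -> new char[32 * 1024])` can serve every reader, which is the trade-off described in the second point above: somewhat larger objects, but always reusable.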

jhy · 2023-01-07T03:13:54Z

(Am getting some build failures which are from the FuzzFixes tests -- some are timing out. Those have been sensitive to the CPU allocation on the workers so am not sure if it's just GitHub actions running a bit slower, or if these changes caused any slowness. I don't believe it's the latter, as my local perf tests show improvements, but we need to dig in.)

jhy · 2023-01-12T06:12:52Z

(The Windows builds are failing - looks like for HTTP fetches the connection length is not getting fully read in some case. Need to investigate.)

Refactored so that it eats until a combinator is seen after non-combinator content, and returns it all. And corrected unit tests that were incorrectly relying on that behavior. Note that a leading combinator will combine against the root element, which is either the Document, or the context element. Fixes jhy#1707

jhy · 2024-08-10T00:28:45Z

OK, I've landed #2186 now. Thanks again for initiating this!

chibenwa mentioned this pull request Jul 1, 2022

CharacterReader always allocate a 32 KB buffer that can even exceed document size #1773

Closed

jhy linked an issue Jul 1, 2022 that may be closed by this pull request

CharacterReader always allocate a 32 KB buffer that can even exceed document size #1773

Closed

jhy changed the title ~~ISSUE-1773 Implement buffer recycling for CharacterReader~~ Implement buffer recycling for CharacterReader Jul 1, 2022

chibenwa force-pushed the ISSUE-1773 branch from 82950d8 to d09a106 Compare July 1, 2022 04:21

Reuse buffers

dfb065a

General implementation of a threadlocal pool

a950665

jhy force-pushed the ISSUE-1773 branch from d09a106 to a950665 Compare January 7, 2023 02:37

Don't use withInitial, not in default Android level

f108026

jhy added 2 commits January 9, 2023 16:02

Moving char buffer size back to 32K

b699998

Perf tests show this performs better when parsing larger docs; small strings still get the speed boost as the buffer is recycled.

Removed need of BufferedReader

30e73c8

In CharacterReader. This removes the redundant buffer allocation, and simplifies the read. For small file reads (same as small string test), updated version is ~ 20% faster than original.
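As a rough illustration of reading without a `BufferedReader`, the sketch below fills a caller-owned `char[]` directly from any `Reader`. The class `DirectBufferFill` and its `fill` method are hypothetical and are not the code in this commit. The loop is needed because a single `Reader.read` call may return fewer chars than requested.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: fill our own buffer straight from the Reader. */
final class DirectBufferFill {
    /**
     * Reads until the buffer is full or the reader is exhausted.
     * Returns the number of chars placed in buf (0 at end of input).
     */
    static int fill(Reader reader, char[] buf) {
        int filled = 0;
        try {
            while (filled < buf.length) {
                int n = reader.read(buf, filled, buf.length - filled);
                if (n == -1) {
                    break; // end of input
                }
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return filled;
    }
}
```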

github-advanced-security bot found potential problems Jan 12, 2023

View reviewed changes

src/main/java/org/jsoup/parser/CharacterReader.java

         release.
         */
        @Deprecated
        public CharacterReader(Reader input, int sz) {

Check notice: Code scanning / CodeQL

Useless parameter

The parameter 'sz' is never used.

jhy added 5 commits October 28, 2023 18:45

Merge branch 'master' into ISSUE-1773

57475c8

Tidied up some tests with an assertion helper

1c57f30

Added the :is pseudo selector

ef8106e

General implementation of a threadlocal pool

1113cad

jhy mentioned this pull request Aug 2, 2024

Improve buffer management throughout the load/fetch and parse lifecycle #2186

Merged

jhy closed this Aug 10, 2024
