use sync.Pool for runner pooling #88

Open
wants to merge 1 commit into
base: master
from

Conversation


@Gusted Gusted commented Dec 24, 2024

As noted in the comment on putRunner, nothing currently limits the size
of the runner pool. The pool can therefore end up holding many runners
that were created during a burst but will likely never be used again.
Instead of implementing garbage collection in this code, move the
pooling to sync.Pool, which deallocates idle objects and so keeps the
pool as small as possible.

The pool is scoped per regexp, which means certain properties can be
reused for optimal performance.

The motivation for this change is that I'm seeing a lot of memory
(~300MiB) being held by these runners until the Go program is restarted,
which feels like suboptimal memory usage. With this change, runners
created in a short burst are gradually deallocated over time and no
longer hold memory indefinitely.

@dlclark
Owner

dlclark commented Dec 31, 2024

Thanks for this @Gusted -- what's the impact on CPU benchmarks around this change?

@Gusted
Author

Gusted commented Jan 2, 2025

It is quite noticeable in specific benchmarks, mostly because Match is no longer pooled. It cannot really be pooled now, as it is returned to the application using regexp2, and the application would then need to take care of putting it back in the pool. Requiring that for memory-efficient use of regexp2 feels like a breaking change. (The current code does look like an easy way to create a data race if the application uses the match after starting a new run with the same regexp, but this seems intentional?) It does feel like something that could be pooled if I moved the pool back to a per-regexp scope.

benchstat
                                           │    before    │                after-3                │
                                           │    sec/op    │    sec/op      vs base                │
Literal-12                                   351.1n ±  1%    589.7n ± 17%   +67.97% (p=0.002 n=6)
NotLiteral-12                                3.615µ ±  1%    3.913µ ±  1%    +8.26% (p=0.002 n=6)
MatchClass-12                                828.9n ±  1%   1039.5n ±  1%   +25.41% (p=0.002 n=6)
MatchClass_InRange-12                        811.4n ±  1%   1029.5n ±  1%   +26.88% (p=0.002 n=6)
AnchoredLiteralShortNonMatch-12              101.5n ±  1%    116.1n ±  1%   +14.33% (p=0.002 n=6)
AnchoredLiteralLongNonMatch-12               32.62n ±  2%    31.16n ±  1%    -4.48% (p=0.002 n=6)
AnchoredShortMatch-12                        214.5n ±  1%    498.9n ± 13%  +132.62% (p=0.002 n=6)
AnchoredLongMatch-12                         137.6n ±  1%    357.2n ±  3%  +159.69% (p=0.002 n=6)
OnePassShortA-12                             879.7n ±  1%   1545.5n ±  4%   +75.68% (p=0.002 n=6)
NotOnePassShortA-12                          890.3n ±  1%   1549.0n ±  5%   +73.99% (p=0.002 n=6)
OnePassShortB-12                             522.2n ±  1%    858.9n ±  7%   +64.48% (p=0.002 n=6)
NotOnePassShortB-12                          534.2n ±  1%    857.2n ±  4%   +60.45% (p=0.002 n=6)
OnePassLongPrefix-12                         227.0n ±  1%    464.4n ±  7%  +104.58% (p=0.002 n=6)
OnePassLongNotPrefix-12                      215.0n ±  1%    457.8n ±  3%  +112.93% (p=0.002 n=6)
MatchEasy0_32-12                             31.88n ±  1%    41.22n ±  0%   +29.31% (p=0.002 n=6)
MatchEasy0_1K-12                             145.7n ±  1%    165.8n ±  1%   +13.80% (p=0.002 n=6)
MatchEasy0_32K-12                            4.386µ ±  1%    4.411µ ±  1%    +0.57% (p=0.015 n=6)
MatchEasy0_1M-12                             138.0µ ±  1%    138.1µ ±  1%         ~ (p=0.589 n=6)
MatchEasy0_32M-12                            8.155m ±  0%    8.296m ±  1%    +1.73% (p=0.002 n=6)
MatchEasy1_32-12                             120.4n ±  1%    137.7n ±  1%   +14.28% (p=0.002 n=6)
MatchEasy1_1K-12                             4.130µ ±  0%    4.405µ ±  1%    +6.66% (p=0.002 n=6)
MatchEasy1_32K-12                            116.4µ ±  1%    117.9µ ±  1%    +1.33% (p=0.041 n=6)
MatchEasy1_1M-12                             3.786m ±  0%    3.774m ±  1%         ~ (p=0.065 n=6)
MatchEasy1_32M-12                            121.3m ±  0%    121.5m ±  1%         ~ (p=1.000 n=6)
MatchMedium_32-12                            238.5n ±  1%    239.3n ±  1%         ~ (p=0.461 n=6)
MatchMedium_1K-12                            11.95µ ±  1%    12.11µ ±  3%         ~ (p=0.093 n=6)
MatchMedium_32K-12                           403.6µ ±  1%    397.0µ ±  1%    -1.64% (p=0.004 n=6)
MatchMedium_1M-12                            12.96m ±  1%    12.87m ±  1%         ~ (p=0.132 n=6)
MatchMedium_32M-12                           415.6m ±  1%    408.3m ±  1%    -1.76% (p=0.004 n=6)
MatchHard_32-12                              15.40µ ±  1%    15.40µ ±  1%         ~ (p=0.732 n=6)
MatchHard_1K-12                              740.8µ ±  1%    734.3µ ±  1%    -0.89% (p=0.002 n=6)
MatchHard_32K-12                             25.76m ±  0%    25.23m ±  1%    -2.03% (p=0.002 n=6)
MatchHard_1M-12                              861.4m ±  0%    853.3m ±  1%    -0.93% (p=0.004 n=6)
MatchHard_32M-12                              27.70 ±  0%     27.25 ±  0%    -1.63% (p=0.002 n=6)
MatchHard1_32-12                             1.484µ ±  1%    1.700µ ±  2%   +14.56% (p=0.002 n=6)
MatchHard1_1K-12                             56.39µ ±  1%    55.99µ ±  1%    -0.71% (p=0.004 n=6)
MatchHard1_32K-12                            1.894m ±  1%    1.878m ±  1%         ~ (p=0.065 n=6)
MatchHard1_1M-12                             60.41m ±  2%    60.35m ±  1%         ~ (p=0.589 n=6)
MatchHard1_32M-12                             1.935 ±  1%     1.935 ±  1%         ~ (p=0.699 n=6)
Leading-12                                   12.88µ ±  0%    12.60µ ±  3%         ~ (p=0.065 n=6)
ShortSearch/serial-no-timeout-12             32.53n ±  0%    43.28n ±  1%   +33.04% (p=0.002 n=6)
ShortSearch/serial-fixed-timeout-12          33.58n ±  0%    44.32n ±  2%   +31.98% (p=0.002 n=6)
ShortSearch/serial-increasing-timeout-12     34.13n ± 29%    44.72n ± 14%   +31.03% (p=0.002 n=6)
ShortSearch/parallel-no-timeout-12           4.519n ±  2%    5.500n ± 11%   +21.70% (p=0.002 n=6)
ShortSearch/parallel-fixed-timeout-12        4.752n ±  1%    6.469n ± 15%   +36.12% (p=0.002 n=6)
ShortSearch/parallel-increasing-timeout-12   4.752n ±  2%    6.260n ± 29%   +31.75% (p=0.002 n=6)
ParserPrefixLongLen-12                       12.20m ±  1%    12.02m ±  2%    -1.40% (p=0.041 n=6)
geomean                                      13.69µ          16.49µ         +20.45%

                                           │    before     │               after-3                │
                                           │      B/s      │      B/s       vs base               │
MatchEasy0_32-12                             957.2Mi ±  1%   740.3Mi ±  0%  -22.65% (p=0.002 n=6)
MatchEasy0_1K-12                             6.545Gi ±  1%   5.751Gi ±  1%  -12.14% (p=0.002 n=6)
MatchEasy0_32K-12                            6.959Gi ±  1%   6.919Gi ±  1%   -0.57% (p=0.015 n=6)
MatchEasy0_1M-12                             7.077Gi ±  1%   7.072Gi ±  1%        ~ (p=0.589 n=6)
MatchEasy0_32M-12                            3.832Gi ±  0%   3.767Gi ±  1%   -1.70% (p=0.002 n=6)
MatchEasy1_32-12                             253.3Mi ±  1%   221.7Mi ±  1%  -12.46% (p=0.002 n=6)
MatchEasy1_1K-12                             236.4Mi ±  0%   221.7Mi ±  1%   -6.24% (p=0.002 n=6)
MatchEasy1_32K-12                            268.6Mi ±  1%   265.1Mi ±  1%   -1.30% (p=0.041 n=6)
MatchEasy1_1M-12                             264.1Mi ±  0%   265.0Mi ±  1%        ~ (p=0.065 n=6)
MatchEasy1_32M-12                            263.7Mi ±  0%   263.4Mi ±  1%        ~ (p=1.000 n=6)
MatchMedium_32-12                            128.0Mi ±  1%   127.5Mi ±  1%        ~ (p=0.461 n=6)
MatchMedium_1K-12                            81.73Mi ±  1%   80.66Mi ±  3%        ~ (p=0.093 n=6)
MatchMedium_32K-12                           77.42Mi ±  1%   78.71Mi ±  1%   +1.66% (p=0.004 n=6)
MatchMedium_1M-12                            77.17Mi ±  1%   77.70Mi ±  1%        ~ (p=0.132 n=6)
MatchMedium_32M-12                           76.99Mi ±  1%   78.38Mi ±  1%   +1.80% (p=0.004 n=6)
MatchHard_32-12                              1.984Mi ±  1%   1.984Mi ±  1%        ~ (p=0.924 n=6)
MatchHard_1K-12                              1.316Mi ±  1%   1.330Mi ±  1%   +1.09% (p=0.013 n=6)
MatchHard_32K-12                             1.211Mi ±  1%   1.240Mi ±  1%   +2.36% (p=0.002 n=6)
MatchHard_1M-12                              1.163Mi ±  1%   1.173Mi ±  1%   +0.82% (p=0.045 n=6)
MatchHard_32M-12                             1.154Mi ±  0%   1.173Mi ±  1%   +1.65% (p=0.002 n=6)
MatchHard1_32-12                             20.57Mi ±  1%   17.96Mi ±  2%  -12.70% (p=0.002 n=6)
MatchHard1_1K-12                             17.32Mi ±  1%   17.44Mi ±  1%   +0.72% (p=0.006 n=6)
MatchHard1_32K-12                            16.50Mi ±  1%   16.64Mi ±  1%        ~ (p=0.071 n=6)
MatchHard1_1M-12                             16.56Mi ±  2%   16.57Mi ±  1%        ~ (p=0.563 n=6)
MatchHard1_32M-12                            16.54Mi ±  1%   16.54Mi ±  1%        ~ (p=0.619 n=6)
ShortSearch/serial-no-timeout-12             2.863Gi ±  0%   2.152Gi ±  1%  -24.84% (p=0.002 n=6)
ShortSearch/serial-fixed-timeout-12          2.773Gi ±  0%   2.101Gi ±  1%  -24.23% (p=0.002 n=6)
ShortSearch/serial-increasing-timeout-12     2.729Gi ± 22%   2.083Gi ± 13%  -23.70% (p=0.002 n=6)
ShortSearch/parallel-no-timeout-12           20.61Gi ±  2%   16.94Gi ± 10%  -17.80% (p=0.002 n=6)
ShortSearch/parallel-fixed-timeout-12        19.60Gi ±  1%   14.40Gi ± 13%  -26.52% (p=0.002 n=6)
ShortSearch/parallel-increasing-timeout-12   19.60Gi ±  2%   14.89Gi ± 23%  -24.02% (p=0.002 n=6)
geomean                                      180.8Mi         168.1Mi         -7.05%

@Gusted
Author

Gusted commented Jan 9, 2025

The change is now much smaller and the performance is much better: the overhead of the atomic operations is a few ns, which results in an average performance drop of about 3.6% (geomean +3.57% sec/op).

goos: linux
goarch: amd64
pkg: github.com/dlclark/regexp2
cpu: AMD Ryzen 5 3600X 6-Core Processor             
                                           │    before    │               after-4               │
                                           │    sec/op    │    sec/op     vs base               │
Literal-12                                   351.1n ±  1%   370.8n ±  1%   +5.63% (p=0.002 n=6)
NotLiteral-12                                3.615µ ±  1%   3.911µ ±  1%   +8.20% (p=0.002 n=6)
MatchClass-12                                828.9n ±  1%   848.6n ±  3%   +2.38% (p=0.002 n=6)
MatchClass_InRange-12                        811.4n ±  1%   828.7n ±  6%   +2.13% (p=0.009 n=6)
AnchoredLiteralShortNonMatch-12              101.5n ±  1%   117.0n ±  1%  +15.21% (p=0.002 n=6)
AnchoredLiteralLongNonMatch-12               32.62n ±  2%   32.84n ±  1%        ~ (p=0.485 n=6)
AnchoredShortMatch-12                        214.5n ±  1%   220.7n ±  1%   +2.89% (p=0.002 n=6)
AnchoredLongMatch-12                         137.6n ±  1%   137.6n ±  2%        ~ (p=0.976 n=6)
OnePassShortA-12                             879.7n ±  1%   901.8n ±  1%   +2.52% (p=0.002 n=6)
NotOnePassShortA-12                          890.3n ±  1%   909.4n ±  1%   +2.15% (p=0.002 n=6)
OnePassShortB-12                             522.2n ±  1%   544.1n ±  2%   +4.19% (p=0.002 n=6)
NotOnePassShortB-12                          534.2n ±  1%   552.1n ±  1%   +3.35% (p=0.002 n=6)
OnePassLongPrefix-12                         227.0n ±  1%   239.1n ±  1%   +5.33% (p=0.002 n=6)
OnePassLongNotPrefix-12                      215.0n ±  1%   226.8n ±  5%   +5.49% (p=0.002 n=6)
MatchEasy0_32-12                             31.88n ±  1%   32.75n ±  1%   +2.71% (p=0.002 n=6)
MatchEasy0_1K-12                             145.7n ±  1%   153.8n ±  1%   +5.53% (p=0.002 n=6)
MatchEasy0_32K-12                            4.386µ ±  1%   4.441µ ±  1%   +1.28% (p=0.002 n=6)
MatchEasy0_1M-12                             138.0µ ±  1%   137.7µ ±  1%        ~ (p=1.000 n=6)
MatchEasy0_32M-12                            8.155m ±  0%   8.447m ±  6%   +3.58% (p=0.002 n=6)
MatchEasy1_32-12                             120.4n ±  1%   129.2n ±  2%   +7.22% (p=0.002 n=6)
MatchEasy1_1K-12                             4.130µ ±  0%   4.198µ ±  2%   +1.63% (p=0.002 n=6)
MatchEasy1_32K-12                            116.4µ ±  1%   117.2µ ±  0%        ~ (p=0.065 n=6)
MatchEasy1_1M-12                             3.786m ±  0%   3.801m ±  0%   +0.40% (p=0.015 n=6)
MatchEasy1_32M-12                            121.3m ±  0%   120.7m ±  1%   -0.55% (p=0.002 n=6)
MatchMedium_32-12                            238.5n ±  1%   243.2n ±  1%   +1.97% (p=0.004 n=6)
MatchMedium_1K-12                            11.95µ ±  1%   11.99µ ±  4%        ~ (p=1.000 n=6)
MatchMedium_32K-12                           403.6µ ±  1%   406.5µ ± 16%        ~ (p=0.699 n=6)
MatchMedium_1M-12                            12.96m ±  1%   12.74m ±  5%        ~ (p=0.132 n=6)
MatchMedium_32M-12                           415.6m ±  1%   409.4m ±  2%   -1.49% (p=0.041 n=6)
MatchHard_32-12                              15.40µ ±  1%   15.66µ ±  1%   +1.71% (p=0.009 n=6)
MatchHard_1K-12                              740.8µ ±  1%   743.6µ ±  1%        ~ (p=0.180 n=6)
MatchHard_32K-12                             25.76m ±  0%   25.47m ±  2%        ~ (p=0.065 n=6)
MatchHard_1M-12                              861.4m ±  0%   846.8m ±  1%   -1.69% (p=0.002 n=6)
MatchHard_32M-12                              27.70 ±  0%    27.35 ±  1%   -1.26% (p=0.002 n=6)
MatchHard1_32-12                             1.484µ ±  1%   1.502µ ±  1%   +1.25% (p=0.009 n=6)
MatchHard1_1K-12                             56.39µ ±  1%   55.92µ ±  1%   -0.82% (p=0.026 n=6)
MatchHard1_32K-12                            1.894m ±  1%   1.884m ±  5%        ~ (p=0.394 n=6)
MatchHard1_1M-12                             60.41m ±  2%   60.52m ±  2%        ~ (p=0.699 n=6)
MatchHard1_32M-12                             1.935 ±  1%    1.925 ±  1%        ~ (p=0.310 n=6)
Leading-12                                   12.88µ ±  0%   12.46µ ±  1%   -3.22% (p=0.002 n=6)
ShortSearch/serial-no-timeout-12             32.53n ±  0%   36.20n ±  1%  +11.26% (p=0.002 n=6)
ShortSearch/serial-fixed-timeout-12          33.58n ±  0%   39.28n ±  1%  +16.97% (p=0.002 n=6)
ShortSearch/serial-increasing-timeout-12     34.13n ± 29%   39.69n ± 17%  +16.31% (p=0.041 n=6)
ShortSearch/parallel-no-timeout-12           4.519n ±  2%   5.380n ±  9%  +19.06% (p=0.002 n=6)
ShortSearch/parallel-fixed-timeout-12        4.752n ±  1%   5.692n ±  8%  +19.78% (p=0.002 n=6)
ShortSearch/parallel-increasing-timeout-12   4.752n ±  2%   5.646n ± 14%  +18.83% (p=0.002 n=6)
ParserPrefixLongLen-12                       12.20m ±  1%   11.81m ±  1%   -3.13% (p=0.002 n=6)
geomean                                      13.69µ         14.18µ         +3.57%

                                           │    before     │               after-4                │
                                           │      B/s      │      B/s       vs base               │
MatchEasy0_32-12                             957.2Mi ±  1%   931.9Mi ±  1%   -2.64% (p=0.002 n=6)
MatchEasy0_1K-12                             6.545Gi ±  1%   6.203Gi ±  1%   -5.23% (p=0.002 n=6)
MatchEasy0_32K-12                            6.959Gi ±  1%   6.871Gi ±  1%   -1.26% (p=0.002 n=6)
MatchEasy0_1M-12                             7.077Gi ±  1%   7.092Gi ±  1%        ~ (p=1.000 n=6)
MatchEasy0_32M-12                            3.832Gi ±  0%   3.700Gi ±  5%   -3.45% (p=0.002 n=6)
MatchEasy1_32-12                             253.3Mi ±  1%   236.3Mi ±  2%   -6.72% (p=0.002 n=6)
MatchEasy1_1K-12                             236.4Mi ±  0%   232.7Mi ±  2%   -1.60% (p=0.002 n=6)
MatchEasy1_32K-12                            268.6Mi ±  1%   266.6Mi ±  0%        ~ (p=0.065 n=6)
MatchEasy1_1M-12                             264.1Mi ±  0%   263.1Mi ±  0%   -0.39% (p=0.015 n=6)
MatchEasy1_32M-12                            263.7Mi ±  0%   265.2Mi ±  1%   +0.56% (p=0.002 n=6)
MatchMedium_32-12                            128.0Mi ±  1%   125.5Mi ±  1%   -1.95% (p=0.004 n=6)
MatchMedium_1K-12                            81.73Mi ±  1%   81.46Mi ±  4%        ~ (p=1.000 n=6)
MatchMedium_32K-12                           77.42Mi ±  1%   76.89Mi ± 14%        ~ (p=0.699 n=6)
MatchMedium_1M-12                            77.17Mi ±  1%   78.52Mi ±  5%        ~ (p=0.132 n=6)
MatchMedium_32M-12                           76.99Mi ±  1%   78.16Mi ±  2%   +1.52% (p=0.041 n=6)
MatchHard_32-12                              1.984Mi ±  1%   1.950Mi ±  1%   -1.68% (p=0.011 n=6)
MatchHard_1K-12                              1.316Mi ±  1%   1.316Mi ±  1%        ~ (p=0.758 n=6)
MatchHard_32K-12                             1.211Mi ±  1%   1.230Mi ±  2%   +1.57% (p=0.045 n=6)
MatchHard_1M-12                              1.163Mi ±  1%   1.178Mi ±  1%   +1.23% (p=0.002 n=6)
MatchHard_32M-12                             1.154Mi ±  0%   1.173Mi ±  1%   +1.65% (p=0.002 n=6)
MatchHard1_32-12                             20.57Mi ±  1%   20.32Mi ±  1%   -1.23% (p=0.009 n=6)
MatchHard1_1K-12                             17.32Mi ±  1%   17.46Mi ±  1%   +0.83% (p=0.022 n=6)
MatchHard1_32K-12                            16.50Mi ±  1%   16.59Mi ±  5%        ~ (p=0.331 n=6)
MatchHard1_1M-12                             16.56Mi ±  2%   16.52Mi ±  2%        ~ (p=0.667 n=6)
MatchHard1_32M-12                            16.54Mi ±  1%   16.62Mi ±  1%        ~ (p=0.242 n=6)
ShortSearch/serial-no-timeout-12             2.863Gi ±  0%   2.573Gi ±  1%  -10.13% (p=0.002 n=6)
ShortSearch/serial-fixed-timeout-12          2.773Gi ±  0%   2.371Gi ±  2%  -14.51% (p=0.002 n=6)
ShortSearch/serial-increasing-timeout-12     2.729Gi ± 22%   2.346Gi ± 15%  -14.04% (p=0.041 n=6)
ShortSearch/parallel-no-timeout-12           20.61Gi ±  2%   17.31Gi ±  9%  -16.02% (p=0.002 n=6)
ShortSearch/parallel-fixed-timeout-12        19.60Gi ±  1%   16.36Gi ±  8%  -16.50% (p=0.002 n=6)
ShortSearch/parallel-increasing-timeout-12   19.60Gi ±  2%   16.50Gi ± 12%  -15.83% (p=0.002 n=6)
geomean                                      180.8Mi         174.4Mi         -3.57%
