metrics from VertxHttpServerMetrics grows until OOM happens #21790

Closed
janinko opened this issue Nov 29, 2021 · 17 comments

janinko commented Nov 29, 2021

Describe the bug

We have an application that reads a lot of data over the network. The application crashed several times because of Out Of Memory errors. Investigating the memory showed that two instances of io.quarkus.micrometer.runtime.registry.json.JsonDistributionSummary took 96% of the memory and over 5 days grew from 162.5 MiB to 355.7 MiB, while the rest of the app's memory usage grew by only 0.4 MiB.

The two JsonDistributionSummary instances were for http.server.bytes.read and http.server.bytes.written.

Expected behavior

The memory usage is constrained.

Actual behavior

The memory usage grows until an OOM error happens.

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

No response

GraalVM version (if different from Java)

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

janinko added the kind/bug label on Nov 29, 2021
quarkus-bot commented Nov 29, 2021

/cc @ebullient, @jmartisk

geoand (Contributor) commented Nov 29, 2021

Thanks for reporting @janinko.

Do you by any chance have information in your heap dump that shows which fields of JsonDistributionSummary were taking up all the memory?

ebullient (Member) commented

Which version of Quarkus are you using? There was a fix added to the Json registry a while ago to correct the depth of the ring buffer.

Note that the Json registry is optional, and stores/performs calculations that other registries don't. If you don't need it, I would turn it off.

janinko (Author) commented Nov 29, 2021

Quarkus version 2.4.2.Final

(screenshot attached)

geoand (Contributor) commented Nov 29, 2021

Any chance you can test with 2.5 and see if the problem persists?

janinko (Author) commented Nov 29, 2021

> Note that the Json registry is optional, and stores/performs calculations that other registries don't. If you don't need it, I would turn it off.

We are trying to turn this off with the config quarkus.micrometer.binder.http-server.enabled=false.

> Any chance you can test with 2.5 and see if the problem persists?

We can, but it will take some time for us to release and test the app with Quarkus 2.5. There is not as much data flow on our dev/stage instances, so we can only see the problem in production.

ebullient (Member) commented Nov 29, 2021

You misunderstood me. This isn't about turning off HTTP metrics; I am suggesting that you turn off the Json registry if you don't need it: quarkus.micrometer.export.json.enabled=false. This registry is not enabled by default, and it definitely brings a big memory requirement along with it.

If you need the json registry, please make sure your ring buffer is set to a small number:
quarkus.micrometer.export.json.buffer-length=3. The larger that number is, the more memory it will require.
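
To keep both suggestions in one place, here is a minimal application.properties sketch. The property names are the ones given above in this thread; pick whichever option applies, and treat the exact values as illustrative:

    # Option 1: disable the optional JSON registry entirely
    quarkus.micrometer.export.json.enabled=false

    # Option 2: keep the JSON registry, but make sure the ring buffer stays small
    quarkus.micrometer.export.json.buffer-length=3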

geoand (Contributor) commented Nov 30, 2021

> so we can only see the problem in production

Understood, but won't profiling your staging environment show an ever-increasing ring buffer size if the problem persists?

ebullient (Member) commented

I don't think anything has changed w.r.t. the Json registry recently; this is a knock-on effect of how the JsonRegistry works (it performs a lot more calculations and requires more storage). The buffer-length setting is the concern: the setting was added and the memory usage fixed in 2.1.0.Final, so any release before that could see this runaway memory usage because the ring buffer length was way too big.

geoand (Contributor) commented Nov 30, 2021

quarkus.micrometer.export.json.buffer-length=3 is already the default, isn't it? And it seems it has been for a few months.

ebullient (Member) commented

Yes. When we added the flag in June, we set the default. So if the version of Quarkus in use is pre-2.1.0.Final, the explosion of data with the Json exporter is a known issue that would be resolved by turning it off.

geoand (Contributor) commented Nov 30, 2021

This issue seems to be about Quarkus 2.4.2.Final

ebullient (Member) commented

Right. The amount of calculation the Json exporter does is not really fixable (aside from ensuring the ring buffer value is small, which it should be).

There are a few ways to deal with this:

  • Turn the Json exporter off. This is what I would do, as I don't recommend the Json exporter for production use because it holds so much in memory.
  • If you actually require it in production, use a deny MeterFilter with a MeterFilterConstraint to limit which metrics reach the Json exporter and constrain memory usage (see the sketch after this list).
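
As an illustration of the second option, here is a minimal sketch rather than an official recipe: it assumes the JSON registry class is io.quarkus.micrometer.runtime.registry.json.JsonMeterRegistry (the same package as the JsonDistributionSummary seen in the heap dump) and that denying the http.server.bytes.* summaries is acceptable for your application; verify the class name and the javax/jakarta namespaces against your Quarkus version.

    import javax.enterprise.inject.Produces;
    import javax.inject.Singleton;

    import io.micrometer.core.instrument.config.MeterFilter;
    import io.quarkus.micrometer.runtime.MeterFilterConstraint;
    import io.quarkus.micrometer.runtime.registry.json.JsonMeterRegistry;

    public class JsonRegistryFilters {

        // Deny the heavy byte-counting distribution summaries, but only for the JSON registry;
        // other registries (e.g. Prometheus) keep receiving these metrics.
        @Produces
        @Singleton
        @MeterFilterConstraint(applyTo = JsonMeterRegistry.class)
        public MeterFilter denyHttpByteSummariesForJson() {
            return MeterFilter.denyNameStartsWith("http.server.bytes");
        }
    }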

ebullient (Member) commented

Closing this. Recommendations for how to work around memory usage are given above.

gsmet (Member) commented Dec 30, 2021

@ebullient hmmm, if this thing is not usable in production, we should either document it or tweak it or implement a filter by default. But it's definitely not the behavior I would expect and I'm even surprised this can be so much of an issue.

gsmet (Member) commented Dec 31, 2021

What's a bit weird is that the size of the ring buffer is 1024, which is the old default, while it has been 3 since 2.0.0.Final (#17852).

gsmet (Member) commented Dec 31, 2021

I tried to debug the creation of the ring buffer and I get 3 for the size, which is what we expect in recent Quarkus versions.

@janinko I would be interested in having more information about your setup and how you end up with a ring buffer sized at 1024. I added a breakpoint in AbstractTimeWindowHistogram at the ringBuffer initialization.
