Replies: 4 comments
-
3.13.x is out of community support. The memory usage profile of your node is entirely dependent on the workload. Modern quorum queues have a certain footprint which is stable, and streams have a very minimal footprint. Classic queues v2 behave like lazy queues, meaning they keep data in memory for only a short period. Relevant doc guides:
And sorry to respond this way but quorum queues have been around since 2018.
-
@m0n5t3r start with collecting metrics and switching to CQv2 on RabbitMQ 4.0.2.
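As a starting point for the "collecting metrics" part, here is a minimal Python sketch that pulls a node's memory breakdown from the management HTTP API (`GET /api/nodes/{node}/memory`, which requires the management plugin) and ranks the largest categories. The function names, the base URL, the node name and the credentials are all hypothetical placeholders, and the exact set of categories in the response is assumed to vary between versions, so treat this as an illustration rather than a recommended tool. `rabbitmq-diagnostics memory_breakdown` reports similar data from the command line.

```python
import base64
import json
import urllib.parse
import urllib.request


def top_memory_categories(breakdown, limit=5):
    """Return the `limit` largest integer-valued categories, biggest first.

    Non-integer entries (nested totals, strategy names) are skipped.
    """
    sized = [
        (name, value)
        for name, value in breakdown.items()
        if isinstance(value, int) and not isinstance(value, bool)
    ]
    return sorted(sized, key=lambda item: item[1], reverse=True)[:limit]


def fetch_memory_breakdown(base_url, node, user, password):
    """Fetch one node's memory breakdown via the management HTTP API.

    Assumes the response is a JSON object with a top-level "memory" key.
    """
    url = f"{base_url}/api/nodes/{urllib.parse.quote(node, safe='')}/memory"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["memory"]
```

Calling `top_memory_categories(fetch_memory_breakdown(...))` during a deploy, on a node that is about to be OOM-killed, should show whether the growth is in connections, queue processes, binaries or something else, which is the kind of data the replies are asking for.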
-
…and Memory Footprint with Classic Queues (v2 specifically).
-
I'd add that you might be interested in setting
-
Community Support Policy
RabbitMQ version used
3.13.7
How is RabbitMQ deployed?
Debian package
Steps to reproduce the behavior in question
Hello all
After a recent update to 3.13.7 (from 3.10.something, going through 3.11 because I guess feature flags are a thing now...): 3 node cluster, all queues mirrored, AWS c6.xlarge instances, one queue has about 70k deferred messages from celery (which means the consumer holds them in memory without acknowledging them until the specified time) - after upgrading, one of the nodes started getting killed by OOM on every deploy (which necessarily restarts celery consumers and the application producers - in total about 300 connections). The nodes are accessed in DNS round robin.
After some further flailing, which involved bumping up the instance size and adding IOPS to the storage, it now looks like two of the instances explode on every deploy. Incidentally, the ones that aren't primaries of that one queue with the deferred messages seem to blow up now, basically going from 500-ish MB to $machine_ram in seconds and getting killed by OOM.
We're going to switch to quorum queues eventually, but 1) we just found out they exist, and 2) the version of celery we use doesn't support them; apart from going back to 3.10 (which is non-trivial on a production cluster), what options do we have?
I did find a discussion detailing a similar behavior, but it was rabbitmq 3.6 and he was told to upgrade...