Timeout on registering a metric #185

Open
jayashe opened this issue Jan 12, 2021 · 3 comments

jayashe commented Jan 12, 2021

Seeing a GenServer timeout via ensure_registered. What's weird is that the GenServer call ({:subscribe, ...}) should only be made when the metric hasn't been registered before, but this metric should already be registered, because we have plenty of data points for it within the same session. This starts happening when the system is under heavy load (and after it has been running for several hours).

Trace:

GenServer Elixometer.Updater terminating
** (stop) exited in: GenServer.call(Elixometer, {:subscribe, ["my_app_prefix", "timers", "namespace", "of", "my", "module", "function"]}, 5000)
** (EXIT) time out
(elixir) lib/gen_server.ex:924: GenServer.call/3
(elixometer) lib/elixometer.ex:372: Elixometer.ensure_registered/2
(elixometer) lib/updater.ex:108: Elixometer.Updater.do_update/2
(elixir) lib/enum.ex:765: Enum.-each/2-lists^foreach/1-0-/2
(elixir) lib/enum.ex:765: Enum.each/2
(elixometer) lib/updater.ex:45: Elixometer.Updater.handle_info/2
(stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:711: :gen_server.handle_msg/6

Any thoughts on what could cause this?

Collaborator

scohen commented Jan 12, 2021

Are you seeing any other crashes in your logs? Without more information, it's very hard to debug. Do you have any SASL or OTP reports?

How heavy is the load you're seeing? Elixometer depends on a single GenServer for subscriptions, which can be a bottleneck.
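One way to check whether that single GenServer is the bottleneck is to look at how many messages are waiting in its mailbox. The sketch below is only a rough diagnostic, assuming the process is registered under the name `Elixometer` (as the `GenServer.call(Elixometer, ...)` line in the trace suggests). `QueueCheck` is a hypothetical helper module, not part of Elixometer.

```elixir
# Hypothetical helper (not part of Elixometer): reports the mailbox size of a
# registered process so a backlog can be spotted under load.
defmodule QueueCheck do
  # Returns the number of pending messages, or nil if nothing is registered
  # under `name` (or the process has already exited).
  def queue_len(name) do
    case Process.whereis(name) do
      nil ->
        nil

      pid ->
        case Process.info(pid, :message_queue_len) do
          {:message_queue_len, len} -> len
          nil -> nil
        end
    end
  end
end

# Assumption: the subscription server is registered as `Elixometer`.
IO.inspect(QueueCheck.queue_len(Elixometer), label: "Elixometer queue length")
```

Run from a remote console during peak load, a value that keeps growing would point to the server falling behind. A result of `nil` would mean the process is not registered at that moment.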

Author

jayashe commented Jan 12, 2021

@scohen thanks for the quick response. We should have a full crash dump tomorrow that provides more details (assuming we see the crash again at peak load time). Will report back.

Collaborator

scohen commented Jan 12, 2021

The fact that you're seeing resubscriptions seems to indicate that Elixometer's GenServer crashed and is now getting backed up, which is why the GenServer.call timeout is being reached.

Just my guess, though. I've never seen it crash in production, and it should be able to handle tens to a hundred thousand messages a second.
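To illustrate the "backed up" scenario in isolation, here is a minimal sketch of how a call to a busy single GenServer exits with a timeout. `SlowServer`, the `{:work, ms}` message, and the short timings are all made up for the demonstration. This is not Elixometer's code, and the real call in the trace uses a 5000 ms timeout.

```elixir
# Hypothetical server used only to reproduce the failure mode: one process
# handles every call, so a slow call delays all the calls queued behind it.
defmodule SlowServer do
  use GenServer

  def init(state), do: {:ok, state}

  # Blocks the single server process for `ms` milliseconds before replying.
  def handle_call({:work, ms}, _from, state) do
    Process.sleep(ms)
    {:reply, :ok, state}
  end
end

{:ok, pid} = GenServer.start_link(SlowServer, nil)

# Keep the server busy for 300 ms so the next call has to wait in its mailbox.
spawn(fn -> GenServer.call(pid, {:work, 300}) end)
Process.sleep(50)

# This call allows only 100 ms, so it exits with a timeout while still queued.
result =
  try do
    GenServer.call(pid, {:work, 0}, 100)
  catch
    :exit, {:timeout, _} -> :timed_out
  end

IO.inspect(result, label: "second call")
```

The caller exits even though the server itself is healthy, which matches the `** (EXIT) time out` line in the trace: the timeout only says the reply did not arrive in time, not why.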
