-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Meet java.util.concurrent.RejectedExecutionException after running some days #913
Comments
Hello,
Your ""riemann executor agg completed rate"" seems to be always 0:
102514 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config -
#riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann
executor agg completed rate, :state ok, :description nil, :metric 0N,
:tags [riemann], :time 379907009 233/250, :ttl 120}
Are the events correctly forwarded to the other Riemann instance ?
2018-02-27 4:08 GMT+01:00 keyboardfann <[email protected]>:
… When I run the riemann after some days, it will start to throw the error: java.util.concurrent.RejectedExecutionException , I see another resolved issue #655 change the queue-size to 10000, I have to change the queue size to 10000 and max-pool-size to 1024 but after some days the alert will show again.
riemann version: 0.2.14
riemann.conf
; -*- mode: clojure; -*-
; vim: filetype=clojure
; non-streams related configuration is located here
(include "/etc/riemann/include")
(instrumentation {:interval 60 :enabled? true})
(let [index (index)]
; Inbound events will be passed to these streams:
(streams
(default :ttl 60
; Index all events immediately.
index
; Log expired events.
(expired
(fn [event] (info "expired" event)))))
(streams
;Output riemann internal metrics to log
(where (service #"^riemann ")
#(info %)
)
(by [:host :service]
(where (not (expired? event))
(smap (fn [events]
(dissoc
(assoc events :site (get (re-find (re-pattern "([A-Za-z0-9]*)[.](ops)[.]([a-z]*)") hostname) 1) :ttl 360 )
:okthreshold :criticalthreshold :departmentkey :alertformat :alerter :alertstate
:alertdescription :failure_message :step_message :sqlinstance_owner :is_job_running :original_host :original_service
:description :job_category :job_id :job_owner :failure_time :next_runtime :frequency :step_id :sqlinstanceowner
))
send_to_central_riemann))))
)
include/centralriemann.clj
; -*- mode: clojure; -*-
; vim: filetype=clojure
; Load the Riemann TCP client. Were going to use this client to send events downstream.
(require 'riemann.client)
; Forward events to the central riemann server
; 1. We define the downstream symbol ('send_to_central_riemann'). When that symbol is referenced events are passed into it.
; 2. Events first go into the batch stream. Every 100 events or 1/10th of a second events are batched and sent on.
; 3. The batched events are passed to the async-queue! stream.
; 4. The async-queue! stream passes the events to the forward stream which sends them to the central riemann server over 5555
(def send_to_central_riemann (batch 1000 1/10
(async-queue! :agg { :queue-size 10000
:core-pool-size 32
:max-pool-size 1024}
(forward
(riemann.client/tcp-client :host "10.x.x.x")))) ; Need to set it to TNG loadbalancer IP
riemann.log error log
102497 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server udp 0.0.0.0:5555 in latency 0.95, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 379907009233/250, :ttl 120}
102498 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server udp 0.0.0.0:5555 in latency 0.99, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 379907009233/250, :ttl 120}
102499 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server udp 0.0.0.0:5555 in latency 0.999, :state ok, :description nil, :metric nil, :tags [riemann], : time 379907009233/250, :ttl 120}
102500 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 conns, :state ok, :description nil, :metric 0, :tags [riemann], :time 379907009 233/250, :ttl 120}
102501 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out rate, :state ok, :description nil, :metric 0.0, :tags [riemann], :time 3799 07009233/250, :ttl 120}
102502 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in rate, :state ok, :description nil, :metric 0.0, :tags [riemann], :time 37990 7009233/250, :ttl 120}
102503 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.0, :state ok, :description nil, :metric nil, :tags [riemann], :ti me 379907009233/250, :ttl 120}
102504 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.5, :state ok, :description nil, :metric nil, :tags [riemann], :ti me 379907009233/250, :ttl 120}
102505 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.95, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 379907009233/250, :ttl 120}
102506 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.99, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 379907009233/250, :ttl 120}
102507 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.999, :state ok, :description nil, :metric nil, :tags [riemann], : time 379907009233/250, :ttl 120}
102508 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.0, :state ok, :description nil, :metric nil, :tags [riemann], :tim e 379907009233/250, :ttl 120}
102509 INFO [2018-02-26 14:53:56,933] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.5, :state ok, :description nil, :metric nil, :tags [riemann], :tim e 379907009233/250, :ttl 120}
102510 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.95, :state ok, :description nil, :metric nil, :tags [riemann], :ti me 379907009233/250, :ttl 120}
102511 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.99, :state ok, :description nil, :metric nil, :tags [riemann], :ti me 379907009233/250, :ttl 120}
102512 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.999, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 379907009233/250, :ttl 120}
102513 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg accepted rate, :state ok, :description nil, :metric 223500/30001, :tags [riemann], :time 379907009233/250, :ttl 120}
102514 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg completed rate, :state ok, :description nil, :metric 0N, :tags [riemann], :time 379907009 233/250, :ttl 120}
102515 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg rejected rate, :state ok, :description nil, :metric 0N, :tags [riemann], :time 3799070092 33/250, :ttl 120}
102516 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg queue capacity, :state ok, :description nil, :metric 10000, :tags [riemann], :time 379907 009233/250, :ttl 120}
102517 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg queue size, :state critical, :description nil, :metric 9991, :tags [riemann], :time 37990 7009233/250, :ttl 120}
102518 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg queue used, :state critical, :description nil, :metric 9991/10000, :tags [riemann], :time 379907009233/250, :ttl 120}
102519 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg threads active, :state ok, :description nil, :metric 246, :tags [riemann], :time 37990700 9233/250, :ttl 120}
102520 INFO [2018-02-26 14:53:56,934] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg threads alive, :state ok, :description nil, :metric 246, :tags [riemann], :time 379907009 233/250, :ttl 120}
...
...
102557 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server udp 0.0.0.0:5555 in latency 0.99, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 759814048467/500, :ttl 120}
102558 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server udp 0.0.0.0:5555 in latency 0.999, :state ok, :description nil, :metric nil, :tags [riemann], : time 759814048467/500, :ttl 120}
102559 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 conns, :state ok, :description nil, :metric 0, :tags [riemann], :time 759814048 467/500, :ttl 120}
102560 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out rate, :state ok, :description nil, :metric 0.0, :tags [riemann], :time 7598 14048467/500, :ttl 120}
102561 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in rate, :state ok, :description nil, :metric 0.0, :tags [riemann], :time 75981 4048467/500, :ttl 120}
102562 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.0, :state ok, :description nil, :metric nil, :tags [riemann], :ti me 759814048467/500, :ttl 120}
102563 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.5, :state ok, :description nil, :metric nil, :tags [riemann], :ti me 759814048467/500, :ttl 120}
102564 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.95, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 759814048467/500, :ttl 120}
102565 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.99, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 759814048467/500, :ttl 120}
102566 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 out latency 0.999, :state ok, :description nil, :metric nil, :tags [riemann], : time 759814048467/500, :ttl 120}
102567 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.0, :state ok, :description nil, :metric nil, :tags [riemann], :tim e 759814048467/500, :ttl 120}
102568 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.5, :state ok, :description nil, :metric nil, :tags [riemann], :tim e 759814048467/500, :ttl 120}
102569 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.95, :state ok, :description nil, :metric nil, :tags [riemann], :ti me 759814048467/500, :ttl 120}
102570 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.99, :state ok, :description nil, :metric nil, :tags [riemann], :ti me 759814048467/500, :ttl 120}
102571 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann server ws 0.0.0.0:5556 in latency 0.999, :state ok, :description nil, :metric nil, :tags [riemann], :t ime 759814048467/500, :ttl 120}
102572 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg accepted rate, :state ok, :description nil, :metric 152000/20001, :tags [riemann], :time 303925619387/200, :ttl 120}
102573 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg completed rate, :state ok, :description nil, :metric 0N, :tags [riemann], :time 303925619 387/200, :ttl 120}
102574 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg rejected rate, :state ok, :description nil, :metric 0N, :tags [riemann], :time 3039256193 <(303)%20925-6193> 87/200, :ttl 120}
102575 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg queue capacity, :state ok, :description nil, :metric 10000, :tags [riemann], :time 303925 619387/200, :ttl 120}
102576 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg queue size, :state critical, :description nil, :metric 10000, :tags [riemann], :time 3039 25619387/200, :ttl 120}
102577 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg queue used, :state critical, :description nil, :metric 1, :tags [riemann], :time 30392561 9387/200, :ttl 120}
102578 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg threads active, :state ok, :description nil, :metric 693, :tags [riemann], :time 30392561 9387/200, :ttl 120}
102579 INFO [2018-02-26 14:54:56,936] Thread-9 - riemann.config - #riemann.codec.Event{:host tcb0124.tcb.mgsops.com, :service riemann executor agg threads alive, :state ok, :description nil, :metric 693, :tags [riemann], :time 303925619 387/200, :ttl 120}
...
...
102599 java.util.concurrent.RejectedExecutionException: Task ***@***.*** rejected from ***@***.***[Running, pool size = 1024, active threads = 1024, queued tasks = 10 000, completed tasks = 8356055]
102600 at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
102601 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
102602 at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
102603 at riemann.service.ExecutorServiceService.execute(service.clj:173)
102604 at riemann.streams$execute_on$stream__7093.invoke(streams.clj:266)
102605 at riemann.streams$batch$flush__7957$fn__7963.invoke(streams.clj:1154)
102606 at riemann.streams$batch$flush__7957.invoke(streams.clj:1154)
102607 at riemann.streams$part_time_simple$tick__7301.invoke(streams.clj:610)
102608 at riemann.time.Once.run(time.clj:42)
102609 at riemann.time$run_tasks_BANG_$fn__5370$fn__5371.invoke(time.clj:154)
102610 at riemann.time$run_tasks_BANG_$fn__5370.invoke(time.clj:153)
102611 at riemann.time$run_tasks_BANG_.invokeStatic(time.clj:147)
102612 at riemann.time$run_tasks_BANG_.invoke(time.clj:142)
102613 at riemann.time$start_BANG_$fn__5386$fn__5387.invoke(time.clj:189)
102614 at clojure.lang.AFn.applyToHelper(AFn.java:152)
102615 at clojure.lang.AFn.applyTo(AFn.java:144)
102616 at clojure.core$apply.invokeStatic(core.clj:646)
102617 at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1881)
102618 at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1881)
102619 at clojure.lang.RestFn.invoke(RestFn.java:425)
102620 at clojure.lang.AFn.applyToHelper(AFn.java:156)
102621 at clojure.lang.RestFn.applyTo(RestFn.java:132)
102622 at clojure.core$apply.invokeStatic(core.clj:650)
102623 at clojure.core$bound_fn_STAR_$fn__4671.doInvoke(core.clj:1911)
102624 at clojure.lang.RestFn.invoke(RestFn.java:397)
102625 at clojure.lang.AFn.run(AFn.java:22)
102626 at java.lang.Thread.run(Thread.java:748)
102627 WARN [2018-02-26 14:55:39,096] riemann task 0 - riemann.streams - ***@***.*** threw
102628 java.util.concurrent.RejectedExecutionException: Task ***@***.*** rejected from ***@***.***[Running, pool size = 1024, active threads = 1024, queued tasks = 1 0000, completed tasks = 8356055]
102629 at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
102630 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
102631 at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
102632 at riemann.service.ExecutorServiceService.execute(service.clj:173)
102633 at riemann.streams$execute_on$stream__7093.invoke(streams.clj:266)
102634 at riemann.streams$batch$flush__7957$fn__7963.invoke(streams.clj:1154)
102635 at riemann.streams$batch$flush__7957.invoke(streams.clj:1154)
102636 at riemann.streams$part_time_simple$tick__7301.invoke(streams.clj:610)
102637 at riemann.time.Once.run(time.clj:42)
102638 at riemann.time$run_tasks_BANG_$fn__5370$fn__5371.invoke(time.clj:154)
102639 at riemann.time$run_tasks_BANG_$fn__5370.invoke(time.clj:153)
102640 at riemann.time$run_tasks_BANG_.invokeStatic(time.clj:147)
102641 at riemann.time$run_tasks_BANG_.invoke(time.clj:142)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#913>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEPrhL_Fu2K0tpgVVY9Sa-r8o2UuUz9Mks5tY3HKgaJpZM4SUSTw>
.
|
Hi @mcorbin, |
I dont know what happened. You had any other exceptions in the logs ? Btw, in your example code: (def send_to_central_riemann (batch 1000 1/10
(async-queue! :agg { :queue-size 10000
:core-pool-size 32
:max-pool-size 1024}
(forward
(riemann.client/tcp-client :host "10.x.x.x")))) ; Need to set it to loadbalancer IP a parenthesis is missing at the end. |
HI @mcorbin |
So when it happens, the Riemann on 10.x.x.x stop receiving events completely ? |
Hi @mcorbin, |
Excellent. Please re-open if your issue reappears. Thanks. |
Hi @jamtur01, it happen again, could you help to reopen it? Hi @mcorbin , And the riemann forwarder stats and other data seems can't send to center riemann and influxdb, so there is no metrics on the graph after the error happen. I think the forwarder stats on the graph is the Riemann forwarding the events, not receiver stats. |
I think you need to use a tool like visualvm and perform thread and heap dump and analyze the threads states on the frozen Riemann process. You can do it by adding on
Be careful, this example configure jmx without authentication/security (but you can also configure jmx with password auth/tls/...). You can then analyze your Riemann process using for example https://visualvm.github.io/download.html, to see the thread states and perform thread dumps. You can also do heap dump using jmap, with this command for example:
and then open it in visualvm. |
Let me try and see if I can find any issue. |
seems don't happen again, I close it. |
When I run the riemann after some days, it will start to throw the error: java.util.concurrent.RejectedExecutionException , I see another resolved issue #655 change the queue-size to 10000, I have to change the queue size to 10000 and max-pool-size to 1024 but after some days the alert will show again and the alert will run out of our disk space.
riemann version: 0.2.14
riemann.conf
include/centralriemann.clj
riemann.log error log
The text was updated successfully, but these errors were encountered: