interval between health checks is incorrect #9327
Comments
@monkeyDluffy6017 PTAL, thanks |
Does your application interface contain other logic? |
How many nodes are in the upstream? |
Are all nodes actually healthy? And how about the CPU load of APISIX? Note that the health check on the nodes is executed serially: |
All nodes are actually healthy, and APISIX is newly installed and working well. Wait, does "the health check upon the nodes is executed serially" refer to the interval for all nodes, not just one node? For example, if I configure three upstream nodes and the active health check interval is 10 seconds, does that mean a probe request is sent to one node every 10 seconds, rather than every node being probed every 10 seconds? |
If you set 10 secs, then the healthy/unhealthy check flow is like this: sleep 10 secs --> check node1 --> check node2 --> check node3 --> sleep 10 secs --> ... |
However, unhealthy and healthy checkers are running in parallel. |
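To make the timing described above concrete, here is a minimal, hypothetical Lua sketch of a serial active checker. It is not the actual lua-resty-healthcheck implementation; the node addresses, the probe function, and the 10-second/15-second intervals are placeholders taken from this discussion, and the snippet assumes it runs from init_worker_by_lua_block in an OpenResty config.

-- illustration only: one timer per checker, each walking its node list serially
-- (in the real checker the healthy and unhealthy node lists are separate)
local nodes = {
    { host = "127.0.0.1", port = 8081 },
    { host = "127.0.0.1", port = 8082 },
    { host = "127.0.0.1", port = 8083 },
}

-- a real checker would send the configured HTTP probe here; we only log
local function probe(host, port)
    ngx.log(ngx.INFO, "probing ", host, ":", port)
end

-- sleep <interval> secs --> check node1 --> check node2 --> check node3 --> sleep ...
local function check_loop(premature, interval)
    if premature then
        return
    end
    while not ngx.worker.exiting() do
        ngx.sleep(interval)
        for _, node in ipairs(nodes) do
            probe(node.host, node.port)
        end
    end
end

-- the healthy (10s) and unhealthy (15s) checkers each get their own timer,
-- so they run in parallel with each other while probing their nodes serially
assert(ngx.timer.at(0, check_loop, 10))
assert(ngx.timer.at(0, check_loop, 15))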
Yes, it's strange. Maybe I will try to reproduce it later. |
Okay, I'll wait for your reply. Thanks. |
Hello, is there any progress? |
@Sn0rt Please take this issue. |
/assign |
Thanks for your reply~ There is no error log output for this health check interval, and nothing in my access log either. Excuse me, which configuration do I need to modify so that a detailed error log or access log is written during the health check? Thanks. |
Can you try this config?
diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..f2796416 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,7 +136,7 @@ nginx_config: # config for render the template to generate n
# the "user" directive makes sense only if the master process runs with super-user privileges.
# if you're not root user,the default is current user.
error_log: logs/error.log
- error_log_level: warn # warn,error
+ error_log_level: info # warn,error
worker_processes: auto # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
enable_cpu_affinity: false # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
worker_rlimit_nofile: 20480 # the number of files a worker process can open, should be larger than worker_connections
diff --git a/conf/debug.yaml b/conf/debug.yaml
index 23c8d51a..268d744c 100644
--- a/conf/debug.yaml
+++ b/conf/debug.yaml
@@ -15,7 +15,7 @@
# limitations under the License.
#
basic:
- enable: false
+ enable: true
http_filter:
enable: false # enable or disable this feature
enable_header_name: X-APISIX-Dynamic-Debug # the header name of dynamic enable
Then try to reproduce this issue. |
Ok, I'll try it right away. |
@bin-53 Can you try again with the log level modified?
diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..07242889 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,7 +136,7 @@ nginx_config: # config for render the template to generate n
# the "user" directive makes sense only if the master process runs with super-user privileges.
# if you're not root user,the default is current user.
error_log: logs/error.log
- error_log_level: warn # warn,error
+ error_log_level: debug # warn,error
worker_processes: auto # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
enable_cpu_affinity: false # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
worker_rlimit_nofile: 20480 # the number of files a worker process can open, should be larger than worker_connections |
This is my error log file (sent via email reply), thanks. |
Sorry, I can't get the log file. I will provide a file instead; you can download it from the link below:
https://raw.githubusercontent.com/Sn0rt/lua-resty-healthcheck/sn0rt/try-fix-apisix-issues-9327/lib/resty/healthcheck.lua
The target path is $(APISIX_PATH)/deps//share/lua/5.1/resty/healthcheck.lua. More detail about this file can be found in the diff block below.
diff lib/resty/healthcheck.lua ../apisix/deps//share/lua/5.1/resty/healthcheck.lua
136,178d135
<
< -- cache timers in "init", "init_worker" phases so we use only a single timer
< -- and do not run the risk of exhausting them for large sets
< -- see https://github.com/Kong/lua-resty-healthcheck/issues/40
< -- Below we'll temporarily use a patched version of ngx.timer.at, until we're
< -- past the init and init_worker phases, after which we'll return to the regular
< -- ngx.timer.at implementation
< local ngx_timer_at do
< local callback_list = {}
<
< local function handler(premature)
< if premature then
< return
< end
<
< local list = callback_list
< callback_list = {}
<
< for _, args in ipairs(list) do
< local ok, err = pcall(args[1], ngx_worker_exiting(), unpack(args, 2, args.n))
< if not ok then
< ngx.log(ngx.ERR, "timer failure: ", err)
< end
< end
< end
<
< ngx_timer_at = function(...)
< local phase = ngx.get_phase()
< if phase ~= "init" and phase ~= "init_worker" then
< -- we're past init/init_worker, so replace this temp function with the
< -- real-deal again, so from here on we run regular timers.
< ngx_timer_at = ngx.timer.at
< return ngx.timer.at(...)
< end
<
< local n = #callback_list
< callback_list[n+1] = { n = select("#", ...), ... }
< if n == 0 then
< -- first one, so schedule the actual timer
< return ngx.timer.at(0, handler)
< end
< return true
< end
180,182d136
< end
<
<
321c275
< local _, terr = ngx_timer_at(0, run_fn_locked_target_list, self, fn)
---
> local _, terr = ngx.timer.at(0, run_fn_locked_target_list, self, fn)
576c530
< local _, terr = ngx_timer_at(0, run_mutexed_fn, self, ip, port, hostname, fn)
---
> local _, terr = ngx.timer.at(0, run_mutexed_fn, self, ip, port, hostname, fn) |
Ok, I'll give it a try. Thank you for your quick reply |
Excuse me, which official release does not have this problem? I am currently on apisix-2.15-alpine. |
It is recommended to use the latest release, but you should know that we cannot be sure this problem is absent from the latest release, because your problem has not yet been reproduced and it is not clear whether the method or the environment is at fault. |
I think it doesn't matter. I will inspect the code to seek the root cause and then think about this situation again. |
It can't be reproduced with only one worker process.
diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..dc04037b 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,8 +136,8 @@ nginx_config: # config for render the template to generate n
# the "user" directive makes sense only if the master process runs with super-user privileges.
# if you're not root user,the default is current user.
error_log: logs/error.log
- error_log_level: warn # warn,error
- worker_processes: auto # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
+ error_log_level: debug # warn,error
+ worker_processes: 1 # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
enable_cpu_affinity: false # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
worker_rlimit_nofile: 20480 # the number of files a worker process can open, should be larger than worker_connections
worker_shutdown_timeout: 240s # timeout for a graceful shutdown of worker processes
The root cause is here: https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L999. A PR for the fix is still in progress: https://github.com/Kong/lua-resty-healthcheck/pull/59/files |
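One way to read the observation above (the problem shows up with multiple workers but not with worker_processes: 1) is that each worker process ends up running its own active checker, so a node is probed once per interval per worker and the observed gap between probes shrinks to roughly interval / N. The sketch below only illustrates that effect and a simple worker-id guard; it is not the patch referenced in the linked PR, and the function names are hypothetical.

-- hypothetical sketch: restrict the active checks to a single worker so the
-- configured interval is observed regardless of worker_processes
local function start_active_checker(interval, probe)
    if ngx.worker.id() ~= 0 then
        -- other workers serve traffic and only read the shared health status
        return
    end

    local function loop(premature)
        if premature then
            return
        end
        probe()
        local ok, err = ngx.timer.at(interval, loop)  -- reschedule after <interval> secs
        if not ok then
            ngx.log(ngx.ERR, "failed to reschedule active check: ", err)
        end
    end

    local ok, err = ngx.timer.at(interval, loop)
    if not ok then
        ngx.log(ngx.ERR, "failed to schedule active check: ", err)
    end
end

With a guard like this, the 10-second interval from the report would be observed no matter how many worker processes are configured, which is consistent with the single-worker result above.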
The bug has been fixed in the APISIX 3.4.0 release. |
Fixed: #9590. I will close it. |
Description
Below is the configuration I set up for the health check:
My interface normally returns code 200. First, it should access the interface five times at intervals of 10 seconds, and then start accessing it once with the unhealthy configuration, at intervals of 15 seconds. However, my log prints are obviously inconsistent with this. What is wrong?
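The original configuration is not shown above; as a hedged reconstruction of an active health check with these intervals (10 seconds for healthy nodes, 15 seconds for unhealthy nodes), the snippet below writes it as the checks table accepted by lua-resty-healthcheck, which APISIX uses under the hood and whose upstream checks object follows the same schema. The checker name, shared dict, probe path, and success/failure thresholds are placeholders, not the reporter's values.

local healthcheck = require("resty.healthcheck")

local checker, err = healthcheck.new({
    name = "issue-9327-demo",      -- placeholder checker name
    shm_name = "healthcheck",      -- a lua_shared_dict declared in nginx.conf
    checks = {
        active = {
            type = "http",
            http_path = "/health", -- placeholder probe path
            healthy = {
                interval = 10,     -- probe healthy nodes every 10 seconds
                successes = 2,     -- placeholder threshold
            },
            unhealthy = {
                interval = 15,     -- probe unhealthy nodes every 15 seconds
                http_failures = 2, -- placeholder threshold
            },
        },
    },
})
if not checker then
    ngx.log(ngx.ERR, "failed to create health checker: ", err)
end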
Environment
APISIX version: 2.15-alpine
Operating system (uname -a):
OpenResty / Nginx version (openresty -V or nginx -V):
Server info (curl http://127.0.0.1:9090/v1/server_info):
LuaRocks version (luarocks --version):