interval between health checks is incorrect #9327
Comments
@monkeyDluffy6017 PTAL, thanks |
Does your application interface contain other logic? |
How many nodes are in the upstream? |
Are all nodes actually healthy? And how about the CPU load of APISIX? Note that the health check on the nodes is executed serially: |
All nodes are actually healthy, and APISIX is newly installed and working well. Wait, does "the health check upon the nodes is executed serially" refer to the interval for all nodes, not just one node? For example, if I configure three upstream nodes and the active health check interval is 10 seconds, does that mean a probe request is sent to one node every 10 seconds, rather than every node being probed every 10 seconds? |
If you set 10 secs, then the healthy/unhealthy check flow is like this: sleep 10 secs --> check node1 --> check node2 --> check node3 --> sleep 10 secs --> ... |
However, unhealthy and healthy checkers are running in parallel. |
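To make the timing described above concrete, here is a minimal, hypothetical Lua sketch of a serial active checker. It is not the actual lua-resty-healthcheck implementation; the node addresses, the probe function, and the 10-second/15-second intervals are placeholders taken from this discussion, and the snippet assumes it runs from init_worker_by_lua_block in an OpenResty config.

-- illustration only: one timer per checker, each walking its node list serially
-- (in the real checker the healthy and unhealthy node lists are separate)
local nodes = {
    { host = "127.0.0.1", port = 8081 },
    { host = "127.0.0.1", port = 8082 },
    { host = "127.0.0.1", port = 8083 },
}

-- a real checker would send the configured HTTP probe here; we only log
local function probe(host, port)
    ngx.log(ngx.INFO, "probing ", host, ":", port)
end

-- sleep <interval> secs --> check node1 --> check node2 --> check node3 --> sleep ...
local function check_loop(premature, interval)
    if premature then
        return
    end
    while not ngx.worker.exiting() do
        ngx.sleep(interval)
        for _, node in ipairs(nodes) do
            probe(node.host, node.port)
        end
    end
end

-- the healthy (10s) and unhealthy (15s) checkers each get their own timer,
-- so they run in parallel with each other while probing their nodes serially
assert(ngx.timer.at(0, check_loop, 10))
assert(ngx.timer.at(0, check_loop, 15))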
Yes, it's strange. Maybe I will try to reproduce it later. |
Okay, I'll wait for your reply. Thanks. |
Hello, is there any progress? |
@Sn0rt Please take this issue. |
/assign |
Thanks for your reply~ There is no error log output for this health check interval, and nothing in my access log either. Excuse me, which configuration do I need to modify so that a detailed error log or access log is written during the health check? Thanks. |
Can you try this config?
diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..f2796416 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,7 +136,7 @@ nginx_config: # config for render the template to generate n
# the "user" directive makes sense only if the master process runs with super-user privileges.
# if you're not root user,the default is current user.
error_log: logs/error.log
- error_log_level: warn # warn,error
+ error_log_level: info # warn,error
worker_processes: auto # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
enable_cpu_affinity: false # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
worker_rlimit_nofile: 20480 # the number of files a worker process can open, should be larger than worker_connections
diff --git a/conf/debug.yaml b/conf/debug.yaml
index 23c8d51a..268d744c 100644
--- a/conf/debug.yaml
+++ b/conf/debug.yaml
@@ -15,7 +15,7 @@
# limitations under the License.
#
basic:
- enable: false
+ enable: true
http_filter:
enable: false # enable or disable this feature
enable_header_name: X-APISIX-Dynamic-Debug # the header name of dynamic enable
Then try to reproduce this issue. |
Ok, I'll try it right away. |
@bin-53 Can you try again with the log level modified?
diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..07242889 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,7 +136,7 @@ nginx_config: # config for render the template to generate n
# the "user" directive makes sense only if the master process runs with super-user privileges.
# if you're not root user,the default is current user.
error_log: logs/error.log
- error_log_level: warn # warn,error
+ error_log_level: debug # warn,error
worker_processes: auto # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
enable_cpu_affinity: false # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
worker_rlimit_nofile: 20480 # the number of files a worker process can open, should be larger than worker_connections |
This is my error log file (sent via email reply), thanks. |
Sorry, I can't get the log file. I will provide a file instead; you can download it from the link below:
https://raw.githubusercontent.com/Sn0rt/lua-resty-healthcheck/sn0rt/try-fix-apisix-issues-9327/lib/resty/healthcheck.lua
The target path is $(APISIX_PATH)/deps//share/lua/5.1/resty/healthcheck.lua. More detail about this file can be found in the diff block below.
diff lib/resty/healthcheck.lua ../apisix/deps//share/lua/5.1/resty/healthcheck.lua
136,178d135
<
< -- cache timers in "init", "init_worker" phases so we use only a single timer
< -- and do not run the risk of exhausting them for large sets
< -- see https://github.com/Kong/lua-resty-healthcheck/issues/40
< -- Below we'll temporarily use a patched version of ngx.timer.at, until we're
< -- past the init and init_worker phases, after which we'll return to the regular
< -- ngx.timer.at implementation
< local ngx_timer_at do
< local callback_list = {}
<
< local function handler(premature)
< if premature then
< return
< end
<
< local list = callback_list
< callback_list = {}
<
< for _, args in ipairs(list) do
< local ok, err = pcall(args[1], ngx_worker_exiting(), unpack(args, 2, args.n))
< if not ok then
< ngx.log(ngx.ERR, "timer failure: ", err)
< end
< end
< end
<
< ngx_timer_at = function(...)
< local phase = ngx.get_phase()
< if phase ~= "init" and phase ~= "init_worker" then
< -- we're past init/init_worker, so replace this temp function with the
< -- real-deal again, so from here on we run regular timers.
< ngx_timer_at = ngx.timer.at
< return ngx.timer.at(...)
< end
<
< local n = #callback_list
< callback_list[n+1] = { n = select("#", ...), ... }
< if n == 0 then
< -- first one, so schedule the actual timer
< return ngx.timer.at(0, handler)
< end
< return true
< end
180,182d136
< end
<
<
321c275
< local _, terr = ngx_timer_at(0, run_fn_locked_target_list, self, fn)
---
> local _, terr = ngx.timer.at(0, run_fn_locked_target_list, self, fn)
576c530
< local _, terr = ngx_timer_at(0, run_mutexed_fn, self, ip, port, hostname, fn)
---
> local _, terr = ngx.timer.at(0, run_mutexed_fn, self, ip, port, hostname, fn) |
Ok, I'll give it a try. Thank you for your quick reply |
Excuse me, which official release does not have this problem? I am currently on apisix-2.15-alpine. |
It is recommended to use the latest release, but you should know that we cannot be sure this problem is absent from the latest release, because your problem has not yet been reproduced and it is not clear whether the method or the environment is at fault. |
I think it doesn't matter. I will inspect the code to seek the root cause and then think about this situation again. |
It can't be reproduced with only one worker process.
diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..dc04037b 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,8 +136,8 @@ nginx_config: # config for render the template to generate n
# the "user" directive makes sense only if the master process runs with super-user privileges.
# if you're not root user,the default is current user.
error_log: logs/error.log
- error_log_level: warn # warn,error
- worker_processes: auto # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
+ error_log_level: debug # warn,error
+ worker_processes: 1 # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
enable_cpu_affinity: false # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
worker_rlimit_nofile: 20480 # the number of files a worker process can open, should be larger than worker_connections
worker_shutdown_timeout: 240s # timeout for a graceful shutdown of worker processes
The root cause is here: https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L999. A PR for the fix is still in progress: https://github.com/Kong/lua-resty-healthcheck/pull/59/files |
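One way to read the observation above (the problem shows up with multiple workers but not with worker_processes: 1) is that each worker process ends up running its own active checker, so a node is probed once per interval per worker and the observed gap between probes shrinks to roughly interval / N. The sketch below only illustrates that effect and a simple worker-id guard; it is not the patch referenced in the linked PR, and the function names are hypothetical.

-- hypothetical sketch: restrict the active checks to a single worker so the
-- configured interval is observed regardless of worker_processes
local function start_active_checker(interval, probe)
    if ngx.worker.id() ~= 0 then
        -- other workers serve traffic and only read the shared health status
        return
    end

    local function loop(premature)
        if premature then
            return
        end
        probe()
        local ok, err = ngx.timer.at(interval, loop)  -- reschedule after <interval> secs
        if not ok then
            ngx.log(ngx.ERR, "failed to reschedule active check: ", err)
        end
    end

    local ok, err = ngx.timer.at(interval, loop)
    if not ok then
        ngx.log(ngx.ERR, "failed to schedule active check: ", err)
    end
end

With a guard like this, the 10-second interval from the report would be observed no matter how many worker processes are configured, which is consistent with the single-worker result above.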
The bug has been fixed in the APISIX 3.4.0 release. |
Fixed: #9590. I will close it. |
Description
Below is the configuration I set up for the health check:
My interface normally returns code 200. First, it should access the interface five times at intervals of 10 seconds, and then start accessing it once with the unhealthy configuration, at intervals of 15 seconds. However, my log prints are obviously inconsistent with this. What is wrong?
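The original configuration is not shown above; as a hedged reconstruction of an active health check with these intervals (10 seconds for healthy nodes, 15 seconds for unhealthy nodes), the snippet below writes it as the checks table accepted by lua-resty-healthcheck, which APISIX uses under the hood and whose upstream checks object follows the same schema. The checker name, shared dict, probe path, and success/failure thresholds are placeholders, not the reporter's values.

local healthcheck = require("resty.healthcheck")

local checker, err = healthcheck.new({
    name = "issue-9327-demo",      -- placeholder checker name
    shm_name = "healthcheck",      -- a lua_shared_dict declared in nginx.conf
    checks = {
        active = {
            type = "http",
            http_path = "/health", -- placeholder probe path
            healthy = {
                interval = 10,     -- probe healthy nodes every 10 seconds
                successes = 2,     -- placeholder threshold
            },
            unhealthy = {
                interval = 15,     -- probe unhealthy nodes every 15 seconds
                http_failures = 2, -- placeholder threshold
            },
        },
    },
})
if not checker then
    ngx.log(ngx.ERR, "failed to create health checker: ", err)
end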
Environment
APISIX version: 2.15-alpine
Operating system (uname -a):
OpenResty / Nginx version (openresty -V or nginx -V):
Server info (curl http://127.0.0.1:9090/v1/server_info):
LuaRocks version (luarocks --version):