interval between health checks is incorrect #9327

Closed
bin-53 opened this issue Apr 18, 2023 · 34 comments

Comments

@bin-53

bin-53 commented Apr 18, 2023

Description

Below is the configuration I set up for the health check:
image

My interface normally returns status code 200. I expected the health check to first access the interface five times at 10-second intervals (the healthy configuration), and then to switch to the unhealthy configuration and access it once every 15 seconds. However, my log output is clearly inconsistent with this. What is wrong?
image

Environment

  • APISIX version (run apisix version): 2.15-alpine
  • Operating system (run uname -a):
  • OpenResty / Nginx version (run openresty -V or nginx -V):
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
@bin-53
Author

bin-53 commented Apr 18, 2023

image

@tao12345666333
Member

@monkeyDluffy6017 PTAL, thanks

@tao12345666333
Member

Does your application interface contain other logic?

@bin-53
Author

bin-53 commented Apr 19, 2023

image

No, the interface is pretty simple.

@kingluo
Contributor

kingluo commented Apr 19, 2023

How many nodes are in the upstream?

@bin-53
Author

bin-53 commented Apr 19, 2023

There are two nodes in the upstream.
image

@bin-53
Author

bin-53 commented Apr 19, 2023

This is my route config:
{
  "uri": "/test/*",
  "name": "test-health",
  "priority": 1,
  "methods": [
    "GET",
    "POST",
    "PUT",
    "DELETE",
    "PATCH",
    "HEAD",
    "OPTIONS",
    "CONNECT",
    "TRACE"
  ],
  "upstream": {
    "nodes": [
      {
        "host": "10.4.16.12",
        "port": 9001,
        "weight": 1
      },
      {
        "host": "192.168.88.66",
        "port": 9001,
        "weight": 1
      }
    ],
    "retries": 1,
    "timeout": {
      "connect": 1,
      "send": 1,
      "read": 9
    },
    "type": "least_conn",
    "checks": {
      "active": {
        "concurrency": 10,
        "healthy": {
          "http_statuses": [
            404
          ],
          "interval": 10,
          "successes": 5
        },
        "http_path": "/test",
        "port": 9001,
        "timeout": 1,
        "type": "http",
        "unhealthy": {
          "http_failures": 10,
          "http_statuses": [
            506
          ],
          "interval": 15,
          "tcp_failures": 2,
          "timeouts": 3
        }
      }
    },
    "scheme": "http",
    "pass_host": "pass",
    "keepalive_pool": {
      "idle_timeout": 60,
      "requests": 1000,
      "size": 320
    },
    "retry_timeout": 1
  },
  "labels": {
    "1": "1"
  },
  "status": 1
}
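
When reading the route config above, it can help to check which of the two configured `http_statuses` lists a given response code falls into. The following is a minimal Python sketch, not APISIX or lua-resty-healthcheck code: `ACTIVE_CHECK` copies only the relevant fields from the config, and `classify_status` is a hypothetical helper. With this config, a 200 response appears in neither list; how the health checker treats such a code is not stated in this thread.

```python
# Hypothetical helper for inspecting the "checks.active" block of the route
# config; it only looks at the configured lists and does not model the checker.
ACTIVE_CHECK = {
    "healthy": {"http_statuses": [404], "interval": 10, "successes": 5},
    "unhealthy": {"http_statuses": [506], "interval": 15, "http_failures": 10},
}


def classify_status(active, status):
    """Return which configured http_statuses list (if any) contains `status`."""
    if status in active["healthy"]["http_statuses"]:
        return "healthy"
    if status in active["unhealthy"]["http_statuses"]:
        return "unhealthy"
    return "unlisted"


for code in (200, 404, 506):
    print(code, classify_status(ACTIVE_CHECK, code))
```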

@kingluo
Contributor

kingluo commented Apr 19, 2023

Are all nodes healthy actually? And how about the CPU load of APISIX?

Note that, the health check upon the nodes is executed serially:

https://github.com/api7/lua-resty-healthcheck/blob/0aa2cbdfae47c750552909762bf20d999e536e41/lib/resty/healthcheck.lua#L952-L961

@bin-53
Author

bin-53 commented Apr 19, 2023

All nodes are healthy, and APISIX is freshly installed and working well. Wait, does "the health check upon the nodes is executed serially" mean that the interval applies to all nodes together, not to each node? For example, if I configure three upstream nodes and an active health check interval of 10 seconds, does that mean a request is sent to probe one node every 10 seconds, rather than every node being probed every 10 seconds?

@bin-53
Author

bin-53 commented Apr 19, 2023

The strangest thing is that the number of checks and the intervals between them differ so much from the configuration:
image
image

image

@kingluo
Contributor

kingluo commented Apr 19, 2023

If you set 10 secs, then the healthy/unhealthy check flow is like this:

sleep 10 secs --> check node1 --> check node2 --> check node3 --> sleep 10 secs --> ...
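
The flow above can be sketched as a small simulation. This is a minimal Python sketch of the described loop (sleep, then check each node one after another), not the library's code; `serial_check_times` and the per-node check durations are assumptions made for illustration. It shows that, under this flow, the gap between two checks of the same node is the interval plus the time spent checking all nodes.

```python
def serial_check_times(interval, check_durations, rounds):
    """Simulate: sleep `interval`, then check every node serially.

    `check_durations` holds one assumed duration (seconds) per node.
    Returns, for each round, the start time of each node's check.
    """
    now = 0.0
    timeline = []
    for _ in range(rounds):
        now += interval              # sleep before the round
        starts = []
        for duration in check_durations:
            starts.append(now)       # this node's check starts now
            now += duration          # the next node waits for it to finish
        timeline.append(starts)
    return timeline


# Three nodes with made-up check durations of 0.2 s, 0.3 s and 0.1 s.
timeline = serial_check_times(10, [0.2, 0.3, 0.1], rounds=3)
for round_no, starts in enumerate(timeline, 1):
    print(round_no, [round(t, 1) for t in starts])
```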

@kingluo
Contributor

kingluo commented Apr 19, 2023

However, unhealthy and healthy checkers are running in parallel.
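
To illustrate what two checkers running in parallel look like on a timeline, here is a minimal Python sketch. It assumes two independent periodic loops, one with the healthy interval (10 s) and one with the unhealthy interval (15 s), and simply merges their tick times. The function names are hypothetical, and the sketch does not model which nodes each checker actually probes.

```python
def tick_times(interval, horizon):
    """Times (seconds) at which a periodic checker fires, up to `horizon`."""
    return list(range(interval, horizon + 1, interval))


def merged_ticks(healthy_interval, unhealthy_interval, horizon):
    """Merge the ticks of two independent checkers into one sorted timeline."""
    events = [(t, "healthy") for t in tick_times(healthy_interval, horizon)]
    events += [(t, "unhealthy") for t in tick_times(unhealthy_interval, horizon)]
    return sorted(events)


for t, checker in merged_ticks(10, 15, 60):
    print(t, checker)
```

Seen together, the ticks are not evenly spaced, which is one reason a combined log of both checkers may not show a single fixed interval.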

@kingluo
Contributor

kingluo commented Apr 19, 2023

Yes, it's strange. Maybe I will try to reproduce it later.

@bin-53
Author

bin-53 commented Apr 19, 2023

OK, I'll wait for your reply, thanks.

@bin-53
Author

bin-53 commented Apr 20, 2023

Hello, is there any progress?

@kingluo
Contributor

kingluo commented Apr 20, 2023

@Sn0rt Please take this issue.

@Sn0rt
Contributor

Sn0rt commented Apr 20, 2023

/assign

@Sn0rt
Contributor

Sn0rt commented Apr 21, 2023

@bin-53 Can you attach the error.log?

It can be found in the APISIX logs directory.

What is more certain is that this implementation does have a problem with inaccurate intervals.

@bin-53
Author

bin-53 commented Apr 21, 2023

Thanks for your reply. There is no error log output for these health checks, and nothing in my access log either. Which configuration do I need to modify so that I can get detailed error log or access log output during the health check? Thanks.

@Sn0rt
Contributor

Sn0rt commented Apr 21, 2023

Can you try this config?

diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..f2796416 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,7 +136,7 @@ nginx_config:                     # config for render the template to generate n
                                   # the "user" directive makes sense only if the master process runs with super-user privileges.
                                   # if you're not root user,the default is current user.
   error_log: logs/error.log
-  error_log_level:  warn          # warn,error
+  error_log_level:  info          # warn,error
   worker_processes: auto          # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
   enable_cpu_affinity: false      # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
   worker_rlimit_nofile: 20480     # the number of files a worker process can open, should be larger than worker_connections
diff --git a/conf/debug.yaml b/conf/debug.yaml
index 23c8d51a..268d744c 100644
--- a/conf/debug.yaml
+++ b/conf/debug.yaml
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 basic:
-  enable: false
+  enable: true
 http_filter:
   enable: false         # enable or disable this feature
   enable_header_name: X-APISIX-Dynamic-Debug # the header name of dynamic enable

Then try to reproduce the issue.

@bin-53
Author

bin-53 commented Apr 21, 2023

Ok, I'll try it right away

@bin-53
Author

bin-53 commented Apr 21, 2023

Hello, I have modified the configuration file, but there is still no log output during the health check.
image
image

apisix:
  node_listen: 9080              # APISIX listening port
  enable_ipv6: false
  ssl:
    enable: true
    enable_http2: true
    ssl_trusted_certificate: /usr/local/apisix/conf/cert/apisix.ca-bundle

  allow_admin:                  # http://nginx.org/en/docs/http/ngx_http_access_module.html#allow
    - 0.0.0.0/0              # We need to restrict ip access rules for security. 0.0.0.0/0 is for test.

  admin_key:
    - name: "admin"
      key: edd1c9f034335f136f87ad84b625c8f1
      role: admin                 # admin: manage all configuration data
                                  # viewer: only can view configuration data
    - name: "viewer"
      key: 4054f7cf07e344346cd3f287985e76a2
      role: viewer
  
  enable_control: true
  control:
    ip: "0.0.0.0"
    port: 9092

etcd:
  host:                           # it's possible to define multiple etcd hosts addresses of the same etcd cluster.
    - "http://etcd:2379"     # multiple etcd address
  prefix: "/apisix"               # apisix configurations prefix
  timeout: 30                     # 30 seconds

      
nginx_config:                     # config for render the template to generate nginx.conf
  error_log: logs/error.log
  error_log_level:  info   
  worker_processes: 1          # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
  stream:
    lua_shared_dict:
      internal-status: 10m
      plugin-limit-req: 10m
      plugin-limit-count: 10m
      prometheus-metrics: 10m
      plugin-limit-conn: 10m
      upstream-healthcheck: 10m
      worker-events: 10m
      lrucache-lock: 100m
      balancer-ewma: 10m
      balancer-ewma-locks: 10m
      balancer-ewma-last-touched-at: 10m
      plugin-limit-count-redis-cluster-slot-lock: 100m
      tracing_buffer: 10m
      plugin-api-breaker: 10m
      etcd-cluster-health-check: 10m
      discovery: 1m
      jwks: 1m
      introspection: 10m
      access-tokens: 1m
      ext-plugin: 1m
      tars: 1m
      cas-auth: 10m
#  http:
#    enable_access_log: true         # enable access log or not, default true
#    access_log: logs/access.log
#    access_log_format:     '$remote_addr - [$remote_addr] - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time  $upstream_addr $upstream_response_length $upstream_response_time $upstream_status  $host [$http_client_v] [$http_device_id] [$http_utm_source] [$http_platform_brand] [$scheme] '
    
plugin_attr:
  prometheus:
    export_addr:
      ip: "0.0.0.0"
      port: 9091

@Sn0rt
Contributor

Sn0rt commented Apr 21, 2023

@bin-53 Can you try again?

This time, change the log level to debug.

diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..07242889 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,7 +136,7 @@ nginx_config:                     # config for render the template to generate n
                                   # the "user" directive makes sense only if the master process runs with super-user privileges.
                                   # if you're not root user,the default is current user.
   error_log: logs/error.log
-  error_log_level:  warn          # warn,error
+  error_log_level:  debug          # warn,error
   worker_processes: auto          # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
   enable_cpu_affinity: false      # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
   worker_rlimit_nofile: 20480     # the number of files a worker process can open, should be larger than worker_connections

@bin-53
Author

bin-53 commented Apr 21, 2023 via email

@Sn0rt
Contributor

Sn0rt commented Apr 21, 2023

@bin-53 wrote (via email): "This is my error log file, thanks."

Sorry, I can't get the log file.

I will provide a healthcheck.lua file to override the existing healthcheck file.

You can download the file from the link below:

 https://raw.githubusercontent.com/Sn0rt/lua-resty-healthcheck/sn0rt/try-fix-apisix-issues-9327/lib/resty/healthcheck.lua

The target path is:

$(APISIX_PATH)/deps//share/lua/5.1/resty/healthcheck.lua

More details about this file can be found in the diff below.

diff lib/resty/healthcheck.lua ../apisix/deps//share/lua/5.1/resty/healthcheck.lua
136,178d135
<
< -- cache timers in "init", "init_worker" phases so we use only a single timer
< -- and do not run the risk of exhausting them for large sets
< -- see https://github.com/Kong/lua-resty-healthcheck/issues/40
< -- Below we'll temporarily use a patched version of ngx.timer.at, until we're
< -- past the init and init_worker phases, after which we'll return to the regular
< -- ngx.timer.at implementation
< local ngx_timer_at do
<   local callback_list = {}
<
<   local function handler(premature)
<     if premature then
<       return
<     end
<
<     local list = callback_list
<     callback_list = {}
<
<     for _, args in ipairs(list) do
<       local ok, err = pcall(args[1], ngx_worker_exiting(), unpack(args, 2, args.n))
<       if not ok then
<         ngx.log(ngx.ERR, "timer failure: ", err)
<       end
<     end
<   end
<
<   ngx_timer_at = function(...)
<     local phase = ngx.get_phase()
<     if phase ~= "init" and phase ~= "init_worker" then
<       -- we're past init/init_worker, so replace this temp function with the
<       -- real-deal again, so from here on we run regular timers.
<       ngx_timer_at = ngx.timer.at
<       return ngx.timer.at(...)
<     end
<
<     local n = #callback_list
<     callback_list[n+1] = { n = select("#", ...), ... }
<     if n == 0 then
<       -- first one, so schedule the actual timer
<       return ngx.timer.at(0, handler)
<     end
<     return true
<   end
180,182d136
< end
<
<
321c275
<     local _, terr = ngx_timer_at(0, run_fn_locked_target_list, self, fn)
---
>     local _, terr = ngx.timer.at(0, run_fn_locked_target_list, self, fn)
576c530
<     local _, terr = ngx_timer_at(0, run_mutexed_fn, self, ip, port, hostname, fn)
---
>     local _, terr = ngx.timer.at(0, run_mutexed_fn, self, ip, port, hostname, fn)
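
The override step described in this comment (replacing the installed healthcheck.lua with the downloaded one) can be sketched as below. This is a minimal Python sketch under stated assumptions: `override_file` is a hypothetical helper, the `.bak` backup naming is my own choice, and the paths in the usage comment are placeholders for wherever the file was downloaded and wherever APISIX is installed.

```python
import shutil
from pathlib import Path


def override_file(new_file, target):
    """Back up `target` next to itself as '<name>.bak', then copy `new_file` over it.

    Returns the backup path so the original can be restored later.
    """
    new_file, target = Path(new_file), Path(target)
    backup = target.with_name(target.name + ".bak")
    if target.exists():
        shutil.copy2(target, backup)   # keep the original for rollback
    shutil.copy2(new_file, target)     # install the replacement
    return backup


# Example usage (placeholder paths, adjust to your installation):
# override_file("healthcheck.lua",
#               "/path/to/apisix/deps/share/lua/5.1/resty/healthcheck.lua")
```

Restarting or reloading APISIX afterwards is probably needed for the new file to take effect; the thread does not say so explicitly, so treat that as an assumption.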

@bin-53
Author

bin-53 commented Apr 21, 2023

Ok, I'll give it a try. Thank you for your quick reply

@bin-53
Author

bin-53 commented Apr 23, 2023

Excuse me, which official release does not have this problem? I am currently on apisix-2.15-alpine.

@Sn0rt
Contributor

Sn0rt commented Apr 23, 2023

It is recommended to use the latest release. However, it is not certain whether this problem exists in the latest release, because your problem has not been reproduced yet, so it is unclear whether the usage or the environment is at fault.

@Sn0rt
Contributor

Sn0rt commented May 16, 2023

@bin-53

I have reproduced the issue.

image

To reproduce it, run two instances of the following process, listening on different ports:

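import logging
import sys
from datetime import datetime

from flask import Flask

app = Flask(__name__)

# Silence werkzeug's per-request access log so only the timestamps are printed.
logging.getLogger('werkzeug').setLevel(logging.ERROR)


@app.route('/test')
def test_endpoint():
    # Print the arrival time of each health check probe with millisecond precision.
    current_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]
    print("Current time:", current_time)
    return '', 200


@app.route('/test/a')
def echo_endpoint():
    return 'a', 200


if __name__ == '__main__':
    # Pass the port as the first argument (e.g. 5000 or 5001); Flask's default is 5000.
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 5000
    app.run(host='0.0.0.0', port=port)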

The APISIX route config:

{
  "uri": "/test/*",
  "name": "test-health",
  "methods": [
    "GET"
  ],
  "upstream": {
    "nodes": [
      {
        "host": "192.168.31.224",
        "port": 5000,
        "weight": 1
      },
      {
        "host": "192.168.31.224",
        "port": 5001,
        "weight": 1
      }
    ],
    "retries": 1,
    "timeout": {
      "connect": 1,
      "send": 1,
      "read": 9
    },
    "type": "least_conn",
    "checks": {
      "active": {
        "concurrency": 10,
        "healthy": {
          "http_statuses": [
            404
          ],
          "interval": 10,
          "successes": 5
        },
        "http_path": "/test",
        "port": 5000,
        "timeout": 1,
        "type": "http",
        "unhealthy": {
          "http_failures": 10,
          "http_statuses": [
            506
          ],
          "interval": 15,
          "tcp_failures": 2,
          "timeouts": 3
        }
      }
    },
    "scheme": "http",
    "pass_host": "pass"
  }
}
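
The Flask app above prints one "Current time:" line per probe, so the actual intervals can be read off its output. The following is a minimal Python sketch for doing that; `probe_intervals` is a hypothetical helper, and the timestamps in `sample` are made-up data, not output from this issue.

```python
from datetime import datetime

PREFIX = "Current time: "
FMT = "%Y-%m-%d %H:%M:%S.%f"


def probe_intervals(lines):
    """Return the seconds elapsed between consecutive 'Current time:' lines."""
    stamps = [
        datetime.strptime(line.strip()[len(PREFIX):], FMT)
        for line in lines
        if line.startswith(PREFIX)   # ignore any other output lines
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Made-up sample output for illustration only.
sample = [
    "Current time: 2023-05-16 10:00:00.000",
    "Current time: 2023-05-16 10:00:10.250",
    "Current time: 2023-05-16 10:00:13.000",
]
print(probe_intervals(sample))
```

Comparing the resulting list with the configured `interval` values makes irregular gaps easy to spot.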

@bin-53
Author

bin-53 commented May 16, 2023

My setup uses the same port but different upstream IPs. Is it the same thing?
In a normal production environment, the upstream nodes have to be deployed on different servers.

@Sn0rt
Contributor

Sn0rt commented May 16, 2023

I don't think it matters. I will inspect the code to find the root cause and then revisit this situation.

@Sn0rt
Contributor

Sn0rt commented May 17, 2023

It can't be reproduced with only one worker process.

diff --git a/conf/config-default.yaml b/conf/config-default.yaml
index 4f97adc4..dc04037b 100755
--- a/conf/config-default.yaml
+++ b/conf/config-default.yaml
@@ -136,8 +136,8 @@ nginx_config:                     # config for render the template to generate n
                                   # the "user" directive makes sense only if the master process runs with super-user privileges.
                                   # if you're not root user,the default is current user.
   error_log: logs/error.log
-  error_log_level:  warn          # warn,error
-  worker_processes: auto          # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
+  error_log_level:  debug          # warn,error
+  worker_processes: 1 # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
   enable_cpu_affinity: false      # disable CPU affinity by default, if APISIX is deployed on a physical machine, it can be enabled and work well.
   worker_rlimit_nofile: 20480     # the number of files a worker process can open, should be larger than worker_connections
   worker_shutdown_timeout: 240s   # timeout for a graceful shutdown of worker processes

The root cause is here: https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L999

How to raise a PR for the fix is still being worked out.

https://github.com/Kong/lua-resty-healthcheck/pull/59/files

@Sn0rt
Contributor

Sn0rt commented Jul 25, 2023

The bug has been fixed in the APISIX 3.4.0 release.

@AlinsRan
Contributor

AlinsRan commented Jul 25, 2023

fixed: #9590

I will close it.

@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Apache APISIX backlog Jul 25, 2023