Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"failed to release lock" error during active healthchecks #9221

Closed
1 task done
codemug opened this issue Aug 9, 2022 · 6 comments
Closed
1 task done

"failed to release lock" error during active healthchecks #9221

codemug opened this issue Aug 9, 2022 · 6 comments

Comments

@codemug
Copy link

codemug commented Aug 9, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

2.8.1

Current Behavior

I'm running Kong in declarative mode to expose a bunch of services. I've configured the following active healthcheck on all of them:

active:
  type: http
  httpPath: "/health"
  healthy:
    httpStatuses:
      - 200
      - 302
    interval: 10
    successes: 2
  timeout: 15
  unhealthy:
    httpStatuses:
      - 429
      - 404
      - 500
      - 501
      - 502
      - 503
      - 504
      - 505
    httpFailures: 8
    interval: 10
    tcpFailures: 8
    timeouts: 8

Every once in a while, an upstream target randomly becomes unhealthy and even after the actual upstream service becomes available, the status of the upstream target never returns to healthy.

In one of these instances, I've noticed the following logs in Kong proxy:

2022/08/09 12:04:18 [error] 1118#0: *136780853 [lua] healthcheck.lua:1235: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:my-service) failed to release lock 'lua-resty-healthcheck:0dc6f45b-8f8d-40d2-a504-473544ee190b:my-service:target_list_lock': unlocked, context: ngx.timer
2022/08/09 12:04:18 [error] 1118#0: *136780854 [lua] healthcheck.lua:1235: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:my-service) failed to release lock 'lua-resty-healthcheck:0dc6f45b-8f8d-40d2-a504-473544ee190b:my-service:target_list_lock': unlocked, context: ngx.timer
2022/08/09 12:04:18 [warn] 1115#0: *136781978 [lua] healthcheckers.lua:98: callback(): [healthchecks] balancer 0dc6f45b-8f8d-40d2-a504-473544ee190b:my-service reported health status changed to UNHEALTHY, context: ngx.timer

What kind of lock is this?

I've even checked the resource usage of Kong. It's allowed to use 2 CPUs and 4Gs of memory. The CPU usage remains at around 1% and the memory never crosses 1G.

Expected Behavior

The upstream target should be set to healthy by the healthchecker once the upstream service is available.

Steps To Reproduce

This occurs very randomly, I have tried reproducing it several times but to no avail.

Anything else?

No response

@surenraju-careem
Copy link

surenraju-careem commented Aug 10, 2022

Kong 2.8.1 uses lua-resty-healthcheck = 1.5.1. There is new minor version lua-resty-healthcheck = 1.5.2 which has lot of bug fixes related to handling locking.

Kong/lua-resty-healthcheck@1.5.1...1.5.2

Kong/lua-resty-healthcheck#113
Kong/lua-resty-healthcheck#114
Kong/lua-resty-healthcheck#116

lua-resty-healthcheck version 1.5.2 does not even have log statement failed to release reported on this issue.

Is there any plan to release(ex: 2.8.2) kong with lua-resty-healthcheck = 1.5.2?

@codemug
Copy link
Author

codemug commented Aug 15, 2022

I should add that my upstream has a single target registered with it. I'm suspecting that this has something to do with the default value of healthchecks.active.concurrency which is 10 by default. Maybe having 10 parallel timers on a single target is causing lock timeouts due to long response delays from the upstream.

@codemug
Copy link
Author

codemug commented Aug 17, 2022

Adding more info here, I think this is related to how I'm managing the routing configurations on my setup. I'm running Kong in declarative mode and I have a sidecar daemon that's doing a POST on the config endpoint every 5 seconds. I based this design on the ingress controller. From debug logs I'm seeing that due to this, the configuration cache is being flushed and refreshed every time a POST call is made which may also be resetting the healthchecks.
I'm investigating it more.

@codemug
Copy link
Author

codemug commented Aug 29, 2022

My above assumption was correct. Each new POST call will reset the healthchecks. Closing this ticket.

@codemug codemug closed this as completed Aug 29, 2022
@mcdullbloom
Copy link

My above assumption was correct. Each new POST call will reset the healthchecks. Closing this ticket.

I also have a ingress controller to change my usptream node, then health check object will be created again.And I got the "failed to release lock" log.
I still don't understand why healthchecks reset will lead the lock released failed.Can you explain it more detail?Thanks

@mcdullbloom
Copy link

I found the answer by debuging lua-resty-lock. There is a bug in pcall which has been fixed.

fix(lock) handle all resty.lock failure modes (Kong/lua-resty-healthcheck#112)

Kong 2.8.1 uses lua-resty-healthcheck = 1.5.1. There is new minor version lua-resty-healthcheck = 1.5.2 which has lot of bug fixes related to handling locking.

Kong/[email protected]

Kong/lua-resty-healthcheck#113 Kong/lua-resty-healthcheck#114 Kong/lua-resty-healthcheck#116

lua-resty-healthcheck version 1.5.2 does not even have log statement failed to release reported on this issue.

Is there any plan to release(ex: 2.8.2) kong with lua-resty-healthcheck = 1.5.2?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants