Suggestion: make the periodic lock time margin (currently 0.001) configurable #32

Closed
huntkalio opened this issue Sep 27, 2019 · 6 comments · Fixed by #59

Comments

@huntkalio

local ok, err = self.shm:add(key, true, interval - 0.001)

I tested lua-resty-healthcheck and found that sometimes only one check runs across two intervals (only one worker starts the health checker). This happens because the check finishes too quickly: it takes less than 0.001 seconds. So I suggest making the expiry time of the shm key configurable.

@Tieske
Member

Tieske commented Sep 28, 2019

We already do that:

local ok, err = self.shm:add(key, true, interval - 0.001)

If it is really 2 intervals, then did you experience a worker process failing, or an nginx reload?

Can you reproduce the behaviour?

@huntkalio
Author

huntkalio commented Sep 29, 2019

healthcheck.txt

I added some logging to healthcheck.lua for debugging. In the test, I created a checker with a 20-second interval (note: nginx and the upstream are on the same machine). I logged the time between calls to get_periodic_lock(), marked as time_between. I found that this time is sometimes less than 19.999 (20 - 0.001), so get_periodic_lock() returns false because the shm key still exists. The log is at the end of this comment.
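The timing described above can be replayed outside nginx. The following is a minimal sketch, not the library's code: `new_fake_shm`, `replay`, and the `margin` parameter are hypothetical names, and the fake dictionary only models `add()` with a TTL against a manually advanced clock. The real ngx.shared.DICT expiry handling may differ in detail.

```lua
-- Minimal stand-in for an ngx.shared.DICT, driven by a manual clock so the
-- timing can be replayed without nginx. Only add() with a TTL is modelled.
local function new_fake_shm()
  local shm = { now = 0, expires_at = {} }

  -- Succeeds only if the key is absent or its TTL has already elapsed.
  function shm:add(key, value, ttl)
    local expiry = self.expires_at[key]
    if expiry and self.now < expiry then
      return false, "exists"
    end
    self.expires_at[key] = self.now + ttl
    return true
  end

  return shm
end

-- Same shape as the line quoted in this issue: the lock lives for
-- (interval - margin) seconds. `margin` is an assumption of this sketch.
local function get_periodic_lock(shm, key, interval, margin)
  return shm:add(key, true, interval - margin)
end

-- Replays a list of timer firing times and records whether each one
-- obtained the lock.
local function replay(margin, fire_times)
  local shm = new_fake_shm()
  local acquired = {}
  for i, t in ipairs(fire_times) do
    shm.now = t
    acquired[i] = get_periodic_lock(shm, "period_lock:healthy", 20, margin)
  end
  return acquired
end

-- Seconds since the first run. The second timer fires 19.998999834061 s
-- after the first, which is one of the time_between values in the log.
local fire_times = { 0, 19.998999834061, 39.998999834061 }
local with_1ms = replay(0.001, fire_times)
local with_10ms = replay(0.01, fire_times)

for i, t in ipairs(fire_times) do
  print(string.format("t=%.3f  margin 0.001: %s  margin 0.01: %s",
    t, tostring(with_1ms[i]), tostring(with_10ms[i])))
end
```

In this model, a timer that fires 19.998999834061 s after the lock was taken still finds the key when the TTL is 19.999 s, so that run is skipped and the next check happens roughly 40 s after the previous one. With a TTL of 19.99 s the same timer obtains the lock.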

To solve this problem in my case, I simply changed 0.001 to 0.01, and it works for me. So I suggest making the 0.001 value configurable. I don't know what causes the problem, though; maybe the timer starts early?
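One way the suggestion could look is sketched below. This is only an illustration under stated assumptions: the option name `lock_margin`, the helper `periodic_lock_ttl`, and the validation rules are hypothetical and are not part of lua-resty-healthcheck's documented API.

```lua
-- The value that is hard-coded in the line quoted in this issue.
local DEFAULT_LOCK_MARGIN = 0.001

-- Hypothetical helper: computes the TTL for the periodic lock key from the
-- check interval and an optional, user-supplied margin (both in seconds).
local function periodic_lock_ttl(interval, opts)
  local margin = opts and opts.lock_margin
  if margin == nil then
    margin = DEFAULT_LOCK_MARGIN
  end
  -- Reject margins that would produce a zero or negative TTL.
  if type(margin) ~= "number" or margin < 0 or margin >= interval then
    return nil, "lock_margin must be a number in [0, interval)"
  end
  return interval - margin
end

-- Default behaviour is unchanged; a larger margin shortens the lock.
print(periodic_lock_ttl(20))
print(periodic_lock_ttl(20, { lock_margin = 0.01 }))
print(periodic_lock_ttl(20, { lock_margin = 25 }))
```

The returned value would then replace `interval - 0.001` in the `self.shm:add(key, true, ...)` call. Keeping 0.001 as the default means existing configurations behave as before.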

2019/09/29 11:08:54 [debug] 10071#0: *7 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 Got initial target list (0 targets)
2019/09/29 11:08:54 [debug] 10071#0: *7 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 timers started
2019/09/29 11:08:54 [debug] 10071#0: *7 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 Healthchecker started!
2019/09/29 11:08:54 [debug] 10071#0: *7 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 event: local cache cleared
2019/09/29 11:08:54 [debug] 10071#0: *7 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 event: target added '10.35.168.33:15700(10.35.168.33:15700)'
2019/09/29 11:08:54 [debug] 10071#0: *7 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 event: target status '10.35.168.33:15700(10.35.168.33:15700)' from 'false' to 'true'
2019/09/29 11:08:54 [debug] 10071#0: *11 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 add key 'lua-resty-healthcheck:feign-demo:period_lock:healthy', time_between: 1569726534.621
2019/09/29 11:08:54 [debug] 10071#0: *11 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 checking healthy targets: #1
2019/09/29 11:08:54 [debug] 10071#0: *11 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.621000051498 Checking 10.35.168.33:15700 10.35.168.33:15700 (currently healthy)
2019/09/29 11:08:54 [debug] 10071#0: *11 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.628999948502 Reporting '10.35.168.33:15700 (10.35.168.33:15700)' (got HTTP 404)
2019/09/29 11:08:54 [debug] 10071#0: *11 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726534.628999948502 checking healthy gctimer interval is :19.992000102997
2019/09/29 11:09:14 [debug] 10071#0: *58 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726554.621000051498 add key 'lua-resty-healthcheck:feign-demo:period_lock:healthy', time_between: 20
2019/09/29 11:09:14 [debug] 10071#0: *58 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726554.621000051498 checking healthy targets: #1
2019/09/29 11:09:14 [debug] 10071#0: *58 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726554.621000051498 Checking 10.35.168.33:15700 10.35.168.33:15700 (currently healthy)
2019/09/29 11:09:14 [debug] 10071#0: *58 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726554.625999927521 Reporting '10.35.168.33:15700 (10.35.168.33:15700)' (got HTTP 404)
2019/09/29 11:09:14 [debug] 10071#0: *58 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726554.625999927521 checking healthy gctimer interval is :19.995000123978
2019/09/29 11:09:34 [error] 10071#0: *105 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726574.619999885559 key 'lua-resty-healthcheck:feign-demo:period_lock:healthy': exists, time_between: 19.998999834061, context: ngx.timer
2019/09/29 11:09:34 [debug] 10071#0: *105 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726574.619999885559 checking healthy gctimer interval is :20
2019/09/29 11:09:54 [debug] 10071#0: *151 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726594.619999885559 add key 'lua-resty-healthcheck:feign-demo:period_lock:healthy', time_between: 39.998999834061
2019/09/29 11:09:54 [debug] 10071#0: *151 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726594.619999885559 checking healthy targets: #1
2019/09/29 11:09:54 [debug] 10071#0: *151 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726594.619999885559 Checking 10.35.168.33:15700 10.35.168.33:15700 (currently healthy)
2019/09/29 11:09:54 [debug] 10071#0: *151 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726594.622999906540 Reporting '10.35.168.33:15700 (10.35.168.33:15700)' (got HTTP 404)
2019/09/29 11:09:54 [debug] 10071#0: *151 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726594.622999906540 checking healthy gctimer interval is :19.996999979019
2019/09/29 11:10:14 [error] 10071#0: *195 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726614.618000030518 key 'lua-resty-healthcheck:feign-demo:period_lock:healthy': exists, time_between: 19.998000144958, context: ngx.timer
2019/09/29 11:10:14 [debug] 10071#0: *195 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726614.618000030518 checking healthy gctimer interval is :20
2019/09/29 11:10:34 [debug] 10071#0: *239 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726634.619999885559 add key 'lua-resty-healthcheck:feign-demo:period_lock:healthy', time_between: 40
2019/09/29 11:10:34 [debug] 10071#0: *239 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726634.619999885559 checking healthy targets: #1
2019/09/29 11:10:34 [debug] 10071#0: *239 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726634.619999885559 Checking 10.35.168.33:15700 10.35.168.33:15700 (currently healthy)
2019/09/29 11:10:34 [debug] 10071#0: *239 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726634.621000051498 Reporting '10.35.168.33:15700 (10.35.168.33:15700)' (got HTTP 404)
2019/09/29 11:10:34 [debug] 10071#0: *239 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726634.621000051498 checking healthy gctimer interval is :19.998999834061
2019/09/29 11:10:54 [debug] 10071#0: *286 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726654.621000051498 add key 'lua-resty-healthcheck:feign-demo:period_lock:healthy', time_between: 20.001000165939
2019/09/29 11:10:54 [debug] 10071#0: *286 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726654.621000051498 checking healthy targets: #1
2019/09/29 11:10:54 [debug] 10071#0: *286 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726654.621000051498 Checking 10.35.168.33:15700 10.35.168.33:15700 (currently healthy)
2019/09/29 11:10:54 [debug] 10071#0: *286 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726654.624000072479 Reporting '10.35.168.33:15700 (10.35.168.33:15700)' (got HTTP 404)
2019/09/29 11:10:54 [debug] 10071#0: *286 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726654.624000072479 checking healthy gctimer interval is :19.996999979019
2019/09/29 11:11:14 [error] 10071#0: *331 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726674.619999885559 key 'lua-resty-healthcheck:feign-demo:period_lock:healthy': exists, time_between: 19.998999834061, context: ngx.timer
2019/09/29 11:11:14 [debug] 10071#0: *331 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726674.619999885559 checking healthy gctimer interval is :20
2019/09/29 11:11:34 [debug] 10071#0: *377 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726694.621999979019 add key 'lua-resty-healthcheck:feign-demo:period_lock:healthy', time_between: 40.000999927521
2019/09/29 11:11:34 [debug] 10071#0: *377 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726694.621999979019 checking healthy targets: #1
2019/09/29 11:11:34 [debug] 10071#0: *377 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726694.621999979019 Checking 10.35.168.33:15700 10.35.168.33:15700 (currently healthy)
2019/09/29 11:11:34 [debug] 10071#0: *377 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726694.625999927521 Reporting '10.35.168.33:15700 (10.35.168.33:15700)' (got HTTP 404)
2019/09/29 11:11:34 [debug] 10071#0: *377 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726694.625999927521 checking healthy gctimer interval is :19.996000051498
2019/09/29 11:11:54 [debug] 10071#0: *427 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726714.621999979019 add key 'lua-resty-healthcheck:feign-demo:period_lock:healthy', time_between: 20
2019/09/29 11:11:54 [debug] 10071#0: *427 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726714.621999979019 checking healthy targets: #1
2019/09/29 11:11:54 [debug] 10071#0: *427 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726714.621999979019 Checking 10.35.168.33:15700 10.35.168.33:15700 (currently healthy)
2019/09/29 11:11:54 [debug] 10071#0: *427 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726714.622999906540 Reporting '10.35.168.33:15700 (10.35.168.33:15700)' (got HTTP 404)
2019/09/29 11:11:54 [debug] 10071#0: *427 [lua] healthcheck.lua:1023: log(): [healthcheck] (feign-demo) 1569726714.622999906540 checking healthy gctimer interval is :19.999000072479

edit: reformatted log

@Tieske
Member

Tieske commented Sep 30, 2019

The log you added covers only a single worker. Can you add the complete log? It might be that another worker ran the missing health checks instead of this one.

@huntkalio
Author

My nginx has two workers, but I only create the checker in one of them; the other worker has no checker.

@Tieske
Member

Tieske commented Sep 30, 2019

How did you do that? Can you show the code?

@huntkalio
Author

I only added a flag in my Lua code:
local IS_CREATE_CHECKER = ngx.worker.id() == 0

2 participants