Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(healthcheck) retry locking protected fn #37

Merged
merged 1 commit into from
Dec 13, 2019

Conversation

locao
Copy link
Contributor

@locao locao commented Nov 22, 2019

When ngx.sleep API is not available (e.g. in the log phase) it's not possible to lock using lua-resty-lock and functions that must run protected were failing.

This change adds a retry method that starts a new light thread that has access to ngx.sleep and will succeed to lock.

Fix Kong/kong#5137



-- Run the given function holding a lock on the target.
-- WARNING: the callback will run unprotected, so it should never
Copy link
Member

@kikito kikito Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this comment is needed. What's the advantage of invoking the callback in an unprotected way? I'd rather use pcall and remove this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! Fixed, thanks.

@hishamhm
Copy link
Contributor

hishamhm commented Dec 5, 2019

@locao I believe locking_target_list needs the same thing.

When ngx.sleep API is not available (e.g. in the log phase) it's not possible to
lock using lua-resty-lock and functions that must run protected were failing.

This change adds retry methods that start new light threads that have access
to ngx.sleep and will succeed to lock.

Fix Kong/kong#5137
@locao locao force-pushed the fix/lock_in_log_phase branch from 0ce06a2 to 8c6a96a Compare December 13, 2019 15:10
Copy link
Contributor

@hishamhm hishamhm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the test case there actually tests for the condition (we'd need multiple workers hitting that report_http_status call at the same time) so even if we could do it, a failure report would result in a flaky test at best, unless we could mock something that would force a delay within the locked region and then attempt a second request.

For what is worth, I ran the Kong balancer tests on my machine with this branch, but I don't think they exercise concurrency that much either.

@hishamhm hishamhm merged commit 14b894a into master Dec 13, 2019
@hishamhm hishamhm deleted the fix/lock_in_log_phase branch December 13, 2019 21:40
@dabue
Copy link

dabue commented Apr 23, 2020

The timer delays adding targets, so it will get an error if use the get_target_status function to get the status of a node before the targets assigned.
It's the best to use the get_target_status function by pcall, and use the default is_healthy status as result.

@dabue
Copy link

dabue commented Apr 23, 2020

the locking_target_list delays the initialization and update of the targets.

@Tieske
Copy link
Member

Tieske commented Apr 24, 2020

@dabue please do not respond to closed issues/PR's please open a new one if needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

error log
5 participants