
Watch getting triggered in consul reload #7446

Closed
akshatgit opened this issue Mar 13, 2020 · 16 comments · Fixed by #7526
Assignees
Labels
theme/config Relating to Consul Agent configuration, including reloading

Comments

@akshatgit

akshatgit commented Mar 13, 2020


Overview of the Issue

I have a service watch configured with the 'passingonly' parameter. Whenever I run consul reload on a box with the service running, the watch is triggered twice. A reload should not trigger the watch. Please let me know if I have misconfigured anything.

Reproduction Steps

Steps to reproduce this issue:

  1. Create a cluster with 15 client nodes and 3 server nodes
  2. Create a service. My service:
{
    "service": {
        "checks": [
            {
                "failures_before_critical": 3,
                "http": "http://localhost:80",
                "interval": "5s",
                "name": "HTTP Check",
                "success_before_passing": 1,
                "timeout": "1s"
            }
        ],
        "enable_tag_override": false,
        "id": "xxx",
        "name": "abc",
        "port": 80,
        "tags": [
            "xxx"
        ]
    }
}
  3. Create a watch. My watch:
{
   "watches": [
        {
            "type": "service",
            "service": "abc",
            "passingonly": true,
            "handler_type": "script",
            "args": ["/opt/blah.sh"]
        }
    ]
}
  4. Reload Consul on any box with the service running. The watch will be triggered twice.
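For reference, a script watch handler receives the watch result as JSON on stdin; for a service watch this is a list of health entries. A minimal Python handler (a sketch standing in for the /opt/blah.sh above; the field names follow Consul's /v1/health/service response shape) that logs each invocation makes duplicate triggers easy to spot:

```python
import json
import sys

def node_names(entries):
    """Extract the node names from a service watch payload.

    Each entry in the payload carries a "Node" object whose "Node"
    field is the node name (per Consul's health API response).
    """
    return sorted(e["Node"]["Node"] for e in entries)

if __name__ == "__main__":
    # Consul invokes the script handler with the watch result on stdin.
    entries = json.load(sys.stdin) or []
    # Print one line per invocation so spurious reload triggers show up
    # as consecutive lines with identical node lists.
    print("watch fired, nodes:", ",".join(node_names(entries)))
```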

Consul info for both Client and Server

Client info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 1
	services = 1
build:
	prerelease =
	revision = 2cf0a3c8
	version = 1.7.1
consul:
	acl = disabled
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 47
	max_procs = 8
	os = linux
	version = go1.13.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 5
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 210
	members = 21
	query_queue = 0
	query_time = 1
Server info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = 2cf0a3c8
	version = 1.7.1
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 3
	leader = false
	leader_addr = a.b.c.d:8300
	server = true
raft:
	applied_index = 174601
	commit_index = 174601
	fsm_pending = 0
	last_contact = 26.908676ms
	last_log_index = 174601
	last_log_term = 5
	last_snapshot_index = 163894
	last_snapshot_term = 3
	latest_configuration = [{Suffrage:Voter ID:383373b5-75d1-78a7-fad2-d36f2df0ea9e Address:172.19.61.162:8300} {Suffrage:Voter ID:4e89dd31-c200-5d1a-c891-138bbb188143 Address:172.19.31.212:8300} {Suffrage:Voter ID:a96e5d29-8fa6-5b15-9596-8e773226150a Address:172.19.32.63:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 5
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 123
	max_procs = 8
	os = linux
	version = go1.13.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 5
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 210
	members = 21
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 158
	members = 9
	query_queue = 0
	query_time = 1

Operating system and Environment details

Debian 10.1

@pierresouchay
Contributor

Might be fixed by #7345 => do you see the same behavior with Consul 1.5.2 or earlier?

@akshatgit
Author

Currently I'm using Consul 1.7.1, upgraded from 1.6.2. I have no information about 1.5.2.

@pierresouchay
Contributor

@akshatgit I added a unit test in #7449 ... from what I see, it no longer happens on Consul upstream (while this test fails often with 1.7.0, for instance), so it is probably the same bug as #7318 (i.e., a bug present since 1.5.3).

@akshatgit
Author

akshatgit commented Mar 13, 2020

Great work @pierresouchay!
Waiting for a new release to include this fix. Since this no longer happens upstream, we can close it. Thanks a lot for your help!

@akshatgit
Author

@pierresouchay,
The reload issue is fixed, but there is one more problem. When we stop and start Consul on a client running our service, we usually get three watch triggers: once when Consul is stopped, and twice when it is started again. This behavior is not consistent; sometimes the stop-start works perfectly and the watch is triggered only twice.
I believe it needs to be further looked into, hence re-opening the issue.

@akshatgit akshatgit reopened this Mar 17, 2020
@pierresouchay
Contributor

@akshatgit what is the exact payload of the 2nd notification (when you get 3 notifications)? Is it empty?

@akshatgit
Author

I have a script handler that looks for a change in the member list for the particular service when the watch is triggered. It reports '0' changes in the member list.
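The handler logic described here can be sketched as a simple set diff (a sketch with hypothetical names; it assumes the handler persists the previous member list between invocations, e.g. in a file):

```python
def member_changes(previous, current):
    """Count additions and removals between two member lists.

    Each argument is a list of node names taken from successive watch
    payloads. A spurious trigger (same data delivered again, e.g.
    around a reload) shows up as a result of 0.
    """
    prev, curr = set(previous), set(current)
    added = curr - prev      # nodes that appeared since last trigger
    removed = prev - curr    # nodes that disappeared since last trigger
    return len(added) + len(removed)
```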

@pierresouchay
Contributor

@akshatgit For me, restarting an agent that hosts a service can produce 3 changes:

  1. one for shutdown (Consul agent down)
  2. one for startup (Consul agent up, but service red, health check not run yet)
  3. one for the service being healthy and in sync

So, nothing really abnormal.
It is, however, possible that (1/2 or 2/3) are sent at the same time, so you might get just one notification, or even none when 1, 2, and 3 happen very close together and the initial state is the same as the final state.
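Because back-to-back notifications can carry identical data, one workaround on the handler side is to hash the payload and ignore repeats. This is a sketch, not part of Consul itself; a real handler invoked as a separate process each time would need to persist the last hash to disk rather than keep it in memory:

```python
import hashlib
import json

def make_deduper():
    """Return a callable that reports whether a payload is new.

    The callable hashes a canonical JSON encoding of the payload and
    compares it with the hash from the previous call, so duplicate
    notifications with identical data can be skipped.
    """
    last_hash = None

    def is_new(payload):
        nonlocal last_hash
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        changed = digest != last_hash
        last_hash = digest
        return changed

    return is_new
```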

@akshatgit
Author

If you look at my watch config, I have set "passingonly": true, so your point 2 will not happen. The watch should only get two notifications: one when Consul is stopped, and a second when it is started again and the service is healthy.
Let me know if there is an error in my understanding.

@pierresouchay
Contributor

@akshatgit I am curious, because I spent quite some time building a precise unit test in #7449 => I am able to run this test 200+ times without a single failure on Consul upstream.

Are you sure you are doing a consul reload and not a stop/start?

@akshatgit
Author

I think there is a misunderstanding: for consul reload this issue is fixed. I reported it again for stop-start.
Do you want to take the discussion offline?

Thanks,
Akshat

@pierresouchay
Contributor

@akshatgit Will probably be fixed by #7526

@hanshasselberg hanshasselberg self-assigned this Apr 1, 2020
@akshatgit
Author

@pierresouchay, I am not sure #7526 will fix the issue. The problem was the watch being triggered three times when another agent running the service was stopped and started. That PR makes sure the watch is shut down when the agent running the watch is stopped. Please correct me if I am wrong.

pierresouchay added a commit to pierresouchay/consul that referenced this issue Apr 3, 2020
This ensures no regression about hashicorp#7318

And ensure that hashicorp#7446 cannot happen anymore
@jsosulska jsosulska added the theme/config Relating to Consul Agent configuration, including reloading label Apr 7, 2020
@Sudhar287

Does this issue have any relation to #571?

@jessicahartono-okta

The reload issue seems to still be present in version 1.7.2. When you reload the agent on the instance that is running the watch, not the service being watched, the script is triggered.

hanshasselberg pushed a commit that referenced this issue May 20, 2020
…ul reload (#7449)

This ensures no regression about #7318
And ensure that #7446 cannot happen anymore
@Sudhar287

Is this going to solve #571 too? My code in #7616 is a low-level fix that avoids triggering the watches, whereas #7526 stops all the watches during the reload. I've also proposed a flag to let users enable or disable this behavior...

rboyer pushed a commit that referenced this issue Jun 1, 2020
…ul reload (#7449)

This ensures no regression about #7318
And ensure that #7446 cannot happen anymore