
consul-replicate sync empty data during raft leader election in the master DC #82

Closed
vaLski opened this issue Mar 15, 2018 · 8 comments

vaLski commented Mar 15, 2018

Hello.

We ran into a very strange issue with consul-replicate - three times now.

consul-replicate v0.3.1 (172f34b) - configured with **max_stale = "5m"**
consul v0.7.5

During an outage and raft leader election in the main data center, consul-replicate in the follower data centers synchronized blank or incomplete data from the main dc.

As a result, a huge amount of data was erased from the prefixes we were replicating in the follower data centers. We experienced this with a prefix holding ~150,000 subkeys/records.

UPDATE: My latest tests show that this issue can NOT be recreated if I configure consul-replicate without max_stale.

I suspect there might be something like a race condition, where the consul-replicate prefix watch is attached to a main dc follower that has not yet caught up with the master and whose kv db is still empty, resulting in a kv prefix watch event firing with blank data. In that case I am not really sure how it got past the following check:

// Get the prefix data
view, ok := r.get(prefix)
if !ok {
    log.Printf("[INFO] (runner) no data for %q", prefix.Dependency)
    doneCh <- struct{}{}
    return
}
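
For illustration, here is a minimal sketch (not actual consul-replicate code; the guard and its names such as pairs are hypothetical) of the kind of extra check I would have expected to stop an empty watch result from wiping a previously populated prefix:

// Hypothetical guard, not present in consul-replicate: if the source prefix
// comes back empty but we have successfully replicated data for it before,
// treat the result as suspect and skip this pass instead of deleting everything.
if len(pairs) == 0 && status.LastReplicated > 0 {
    log.Printf("[WARN] (runner) empty result for %q but LastReplicated=%d, skipping pass",
        prefix.Dependency, status.LastReplicated)
    doneCh <- struct{}{}
    return
}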

Here is a quick timeline of what happened, plus logs.

This was the last successful replication prior to the main dc outage:

20180313T084044.363696: 2018/03/13 08:40:44.363681 [INFO] (runner) replicated 0 updates, 23 deletes

The main data center goes down and consul-replicate in the follower data center starts to return errors. Some logs are skipped as they are identical.

20180313T084122.659085: 2018/03/13 08:41:22.659073 [WARN] (view) kv.list(pub@main-dc): Unexpected response code: 500 (retry attempt 1 after "10s")

The retry was successful and some data was replicated:

20180313T084251.126752: 2018/03/13 08:42:51.126734 [INFO] (runner) quiescence timers starting
20180313T084251.276830: 2018/03/13 08:42:51.276821 [INFO] (runner) quiescence minTimer fired
20180313T084251.276842: 2018/03/13 08:42:51.276827 [INFO] (runner) running
20180313T084252.059906: 2018/03/13 08:42:52.059891 [INFO] (runner) replicated 49 updates, 53 deletes
20180313T084259.630092: 2018/03/13 08:42:59.630080 [INFO] (runner) quiescence timers starting
20180313T084259.780182: 2018/03/13 08:42:59.780168 [INFO] (runner) quiescence minTimer fired
20180313T084259.780193: 2018/03/13 08:42:59.780179 [INFO] (runner) running

The main data center goes down again:

20180313T084409.232366: 2018/03/13 08:44:09.232265 [WARN] (view) kv.list(pub@main-dc): Unexpected response code: 500 (retry attempt 4 after "1m20s")
20180313T085144.446896: 2018/03/13 08:51:44.446804 [WARN] (view) kv.list(pub@main-us): Unexpected response code: 500 (retry attempt 6 after "5m20s")

The very next log line shows consul-replicate synchronizing empty data from the main data center, resulting in all keys from the pub prefix being deleted:

20180313T090040.647493: 2018/03/13 09:00:40.647411 [INFO] (runner) replicated 0 updates, 127719 deletes

Also, after the outage in the main data center was resolved, it refused to synchronize proper data from the main dc KV store again, due to the following code:

// Ignore if the modify index is old
if pair.ModifyIndex <= status.LastReplicated {
    log.Printf("[DEBUG] (runner) skipping because %q is already "+
        "replicated", key)
    continue
}

To force re-synchronization I had to recursively erase all entries under the following prefix:

consul kv delete -recurse service/consul-replicate/statuses
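
For anyone who ends up in the same state: before wiping the statuses you can inspect them first and confirm the stored indexes are ahead of the restored keys (just standard consul kv usage against the same path):

# The stored LastReplicated index will be higher than the ModifyIndex
# of the keys that reappeared after the outage.
consul kv get -recurse service/consul-replicate/statuses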

Not sure if adjusting the following options could have prevented this (syncing blank data) from happening. I am still unsure how to read those timers exactly. Do they mean "force replication every N seconds", or am I getting this wrong? How are they related to the cluster reaching a consistent state?

# This is the quiescence timers; it defines the minimum and maximum amount of
# time to wait for the cluster to reach a consistent state before replicating.
# This is useful to enable in systems that have a lot of flapping, because it
# will reduce the number of times a replication occurs.
wait {
  min = "5s"
  max = "10s"
}
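
Based on the tests mentioned in the update above, the only change that reliably avoided the blank sync on our side was dropping max_stale entirely. A sketch of the relevant part of our config, everything else unchanged:

# max_stale removed: reads for the source prefixes go to the dc1 leader
# instead of a possibly lagging or empty follower.
# max_stale = "5m"

wait {
  min = "5s"
  max = "10s"
}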

I am aware that the consul version we are using is a bit old, but I suspect the issue is more likely related to consul-replicate.


vaLski commented Mar 19, 2018

I managed to reproduce this on a test cluster as follows:

3 machines in dc1
3 machines in dc2

consul-replicate running on a consul server in dc2 is replicating 5 different prefixes from dc1 to dc2:

source = "pub@dc1"
source = "shared@dc1"
source = "priv@dc1"
source = "consul-template@dc1"
source = "event/input@dc1"

The pub prefix has ~120k records in it.

On a server in dc1, every 10 seconds I "touch" 9 keys and erase 1 random key:

while true; do
  cnt=0
  # Pick 10 random keys from the prefix; re-put the first 9 and delete the 10th.
  consul kv get -recurse pub | shuf | head -n 10 | sed 's/:/ /' | while read key value; do
    if [[ ${cnt} == 9 ]]; then
      consul kv delete "${key}"
    else
      consul kv put "${key}" "${value}"
    fi
    cnt=$((cnt+1))
  done
  sleep 10
done

In a separate console I am sending SIGKILL to all consul servers in dc1, which are immediately started again by the "perp" supervisor:

while true; do
  # SIGKILL consul on each dc1 server; perp restarts it immediately.
  for i in sof1 sof2 sof3; do
    ssh ${i} 'perpctl k consul' &
  done
  sleep 30
done

After several passes I see all keys under the pub/ prefix being deleted:

2018-03-19 12:12:55.432404 2018/03/19 12:12:55.432397 [INFO] (runner) replicated 0 updates, 127739 deletes

The interesting thing is that the data from the other prefixes is received and shown as already replicated, and is thus not marked for deletion:

2018-03-19 12:08:33.180186 2018/03/19 12:08:33.180182 [DEBUG] (runner) skipping because "event/input/cc77a17c-5e48-4614-995f-470dd5f4ed0b" is already replicated

However, it appears that consul-replicate is receiving blank data from the "pub" prefix watch and concluding that all records were erased from the master dc. As a result they are removed from the follower dc as well.

I am not sure whether the problem is consul-replicate not waiting long enough for the master dc cluster to reach a consistent state, triggering/receiving incomplete keyprefix watch data, or whether it is a problem with consul itself, which under certain unknown conditions sends blank data in response to keyprefix watches.
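
One way to check the consul side directly would be to issue a stale recurse read against an individual dc1 server right after it restarts and see whether it answers the list with an empty body (the hostname below is just a placeholder):

# Stale read of the prefix against a single dc1 server; an empty response here
# would mean that server itself is answering the keyprefix list with no data.
curl -s "http://dc1-server-1:8500/v1/kv/pub?recurse&stale"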

I am still not sure how blank data for our pub prefix got past this check, as it appears to be per-prefix:

// Get the last status
status, err := r.tStatus(prefix)
if err != nil {
    errCh <- fmt.Errorf("failed to read replication status: %s", err)
    return
}

Any assistance will be highly appreciated.


vaLski commented Mar 20, 2018

Still not sure if this is a consul issue or a consul-replicate issue.

I checked the consul changelog and found the following:

hashicorp/consul#2644

Do you think it might be related?

Currently I am in the process of upgrading my test cluster to the latest consul version so I can see whether upgrading makes this issue go away.

Any assistance will be highly appreciated.


vaLski commented Mar 20, 2018

Quick update on the issue.

This issue has been shown to be present even with the latest available consul version, 1.0.6, and raft version 1.

Steps to reproduce

  • create 3 servers in dc1
  • create 3 servers in dc2
  • start consul-replicate replicating a selected prefix from dc1 to dc2
  • push ~130,000-150,000 kv records into the prefix
  • start randomly reading 7 and erasing 3 records from the prefix every 10 seconds
  • at the same time, forcefully restart consul on all 3 nodes in the master dc1 every 30 seconds
  • test started 1521549485
  • data loss in DC2 at 1521553967
    2018-03-20 13:52:44.040452 2018/03/20 13:52:44.040433 [INFO] (runner) replicated 0 updates, 127118 deletes

vaLski added a commit to vaLski/consul-template that referenced this issue Aug 15, 2018
safels and safetree behave exactly like the native ls and tree with one exception: they will *refuse* to render the template if the KV prefix query returns blank/empty data.

This is especially useful for rendering mission-critical files that cannot tolerate ls/tree KV queries returning blank data.

safels and safetree work in stale mode just like their ancestors, but with extra safety on top.

The safels and safetree commands were born as an attempt to mitigate the issues described here:

  hashicorp#1131
  hashicorp/consul#3975
  hashicorp/consul-replicate#82
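
For reference, template usage would look roughly like the sketch below (the function name spelling follows the commit message above and may differ in a released consul-template):

{{ range safetree "pub" }}
{{ .Key }}: {{ .Value }}
{{ end }}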
freddygv pushed a commit to hashicorp/consul that referenced this issue Aug 23, 2018
@pierresouchay

Fixed by hashicorp/consul#4554


nitsh commented Jan 3, 2019

@pierresouchay We are facing a similar issue with consul 1.4.0. After the upgrade, consul-replicate started deleting keys once in a while. We run consul-replicate alongside one of the consul servers. We are trying to reproduce the issue as of now.


arecker commented Jan 3, 2019

@vaLski, just wanted to say 'thanks' for this thoroughly documented issue. It really helped us out.


vaLski commented Jan 4, 2019

@arecker Have you experienced the same issue again? It is supposed to be fixed in hashicorp/consul#4554. Since we upgraded our consul servers to the 1.2.3 release, which carries the mentioned patch, the problem went away. Thanks to @pierresouchay for the fix and for the assistance provided while tracking this down.

@nitsh Did you find the reason for the problem and a solution for it? Kindly share further details if so. If you are still facing this issue, run consul-replicate in trace mode and attach log snippets where we can see the errors and the debug output around the deletions. In my case I was seeing some 5xx class errors in the consul-replicate logs during the master-dc leader election, and shortly afterwards it deleted XXXXXXX keys, since consul-replicate was configured with the stale flag set and was syncing data from a raft follower whose raft log (data) was empty. That is supposed to be fixed already. However, since then consul-replicate in our setup is configured to NEVER use the stale flag, but rather to always query the raft leader in the parent DC.
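
For example, something along these lines should produce usable logs (a sketch only; the flag names are what I remember from the consul-replicate README, so double-check them, including whether your build accepts a trace level):

# Verbose logging, and max-stale set to 0 so list queries are not served
# from a possibly-empty follower (flag names are assumptions, verify locally).
consul-replicate \
  -log-level debug \
  -max-stale 0s \
  -prefix "pub@dc1"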


arecker commented Jan 4, 2019 via email

eikenb pushed a commit to hashicorp/consul-template that referenced this issue Sep 10, 2019
vaLski closed this as completed Feb 26, 2020