Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange panics on 1.6.2 #6892

Closed
imcitius opened this issue Dec 6, 2019 · 4 comments · Fixed by #6945
Closed

Strange panics on 1.6.2 #6892

imcitius opened this issue Dec 6, 2019 · 4 comments · Fixed by #6945
Labels
type/bug Feature does not function as expected

Comments

@imcitius
Copy link

imcitius commented Dec 6, 2019

Hello

I recently setup a new application cluster with consul 1.6.1.
The cluster is very same as few dozens of same clusters I manage, but I getting random consul cluster malfunctions.

Seeking into consul logs on servers side, I found these panics:

Dec 06 09:22:34 consul-0 consul[24512]: 2019/12/06 09:22:34 [INFO]  raft: aborting pipeline replication to peer {Voter a55e737c-0abb-1d5e-0273-c7b534b0b86f 192.168.124.186:8300}
Dec 06 09:22:34 consul-0 consul[24512]: panic: runtime error: index out of range
Dec 06 09:22:34 consul-0 consul[24512]: goroutine 169 [running]:
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/consul/logger.(*LogFile).pruneFiles(0xc00047d3b0, 0x0, 0x0)
Dec 06 09:22:34 consul-0 consul[24512]: /home/circleci/project/consul/logger/logfile.go:110 +0x231
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/consul/logger.(*LogFile).rotate(0xc00047d3b0, 0x2f42dd8, 0xc00047d418)
Dec 06 09:22:34 consul-0 consul[24512]: /home/circleci/project/consul/logger/logfile.go:88 +0x71
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/consul/logger.(*LogFile).Write(0xc00047d3b0, 0xc000700dc0, 0x8a, 0x282, 0x0, 0x0, 0x0)
Dec 06 09:22:34 consul-0 consul[24512]: /home/circleci/project/consul/logger/logfile.go:133 +0xc6
Dec 06 09:22:34 consul-0 consul[24512]: io.(*multiWriter).Write(0xc0006e01c0, 0xc000700dc0, 0x8a, 0x282, 0x0, 0xc0004cae00, 0x22bdf603)
Dec 06 09:22:34 consul-0 consul[24512]: /usr/local/go/src/io/multi.go:60 +0x87
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/go-hclog.(*writer).Flush(0xc00013f980, 0xbf72a55600000003, 0x4e94d6bc8c0a, 0x51db8e0)
Dec 06 09:22:34 consul-0 consul[24512]: /go/pkg/mod/github.com/hashicorp/[email protected]/writer.go:21 +0x13c
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/go-hclog.(*intLogger).Log(0xc0007574a0, 0xc000000003, 0xc00016e380, 0x67, 0x0, 0x0, 0x0)
Dec 06 09:22:34 consul-0 consul[24512]: /go/pkg/mod/github.com/hashicorp/[email protected]/intlogger.go:115 +0x13b
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/go-hclog.(*intLogger).Info(0xc0007574a0, 0xc00016e380, 0x67, 0x0, 0x0, 0x0)
Dec 06 09:22:34 consul-0 consul[24512]: /go/pkg/mod/github.com/hashicorp/[email protected]/intlogger.go:400 +0x65
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/raft.(*Raft).pipelineReplicate(0xc000250000, 0xc000bc8780, 0x0, 0x0)
Dec 06 09:22:34 consul-0 consul[24512]: /go/pkg/mod/github.com/hashicorp/[email protected]/replication.go:456 +0x894
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/raft.(*Raft).replicate(0xc000250000, 0xc000bc8780)
Dec 06 09:22:34 consul-0 consul[24512]: /go/pkg/mod/github.com/hashicorp/[email protected]/replication.go:183 +0x29d
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/raft.(*Raft).startStopReplication.func1()
Dec 06 09:22:34 consul-0 consul[24512]: /go/pkg/mod/github.com/hashicorp/[email protected]/raft.go:488 +0x33
Dec 06 09:22:34 consul-0 consul[24512]: github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc000250000, 0xc0001c6520)
Dec 06 09:22:34 consul-0 consul[24512]: /go/pkg/mod/github.com/hashicorp/[email protected]/state.go:146 +0x53
Dec 06 09:22:34 consul-0 consul[24512]: created by github.com/hashicorp/raft.(*raftState).goFunc
Dec 06 09:22:34 consul-0 systemd[1]: consul.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 06 09:22:34 consul-0 systemd[1]: Unit consul.service entered failed state.
Dec 06 09:22:34 consul-0 systemd[1]: consul.service failed.

I tried to upgrade cluster to 1.6.2, but its not helped.
I do not know how to reproduce it, because it is very random, 1-2 occurrences per day.
Could this be a bug?

Operating system and Environment details

Centos 7, amd64

@hanshasselberg hanshasselberg added the type/bug Feature does not function as expected label Dec 6, 2019
@hanshasselberg
Copy link
Member

Thank you for reporting, this looks serious, we will be looking into this!

@imcitius
Copy link
Author

imcitius commented Dec 8, 2019

Downgraded the cluster to 1.5.3 and got this instead (agent's log, but observed the same on servers):

Dec 08 20:51:47 iron-prod2-pgsql-05 consul[131103]: panic: runtime error: index out of range
Dec 08 20:51:47 iron-prod2-pgsql-05 consul[131103]: goroutine 182 [running]:
Dec 08 20:51:47 iron-prod2-pgsql-05 consul[131103]: github.com/hashicorp/consul/logger.(*LogFile).pruneFiles(0xc000225180, 0x0, 0x0)
Dec 08 20:51:47 iron-prod2-pgsql-05 consul[131103]: /home/circleci/project/consul/logger/logfile.go:110 +0x231

At the same moment, on another agent:

Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: panic: runtime error: index out of range
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: goroutine 78 [running]:
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: github.com/hashicorp/consul/logger.(*LogFile).pruneFiles(0xc0002a4f50, 0x0, 0x0)
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: /home/circleci/project/consul/logger/logfile.go:110 +0x231
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: github.com/hashicorp/consul/logger.(*LogFile).rotate(0xc0002a4f50, 0x2ecc2b8, 0xc0002a4fb8)
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: /home/circleci/project/consul/logger/logfile.go:88 +0x71
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: github.com/hashicorp/consul/logger.(*LogFile).Write(0xc0002a4f50, 0xc00028ed00, 0x58, 0x100, 0x0, 0x0, 0x0)
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: /home/circleci/project/consul/logger/logfile.go:133 +0xc6
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: io.(*multiWriter).Write(0xc00085e4e0, 0xc00028ed00, 0x58, 0x100, 0x511b6c0, 0x0, 0x0)
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: /usr/local/go/src/io/multi.go:60 +0x87
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: log.(*Logger).Output(0xc00022cf00, 0x2, 0xc0004560a0, 0x43, 0x0, 0x0)
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: /usr/local/go/src/log/log.go:172 +0x1fe
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: log.(*Logger).Printf(0xc00022cf00, 0x2dbfe7f, 0x16, 0xc00044dd88, 0x3, 0x3)
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: /usr/local/go/src/log/log.go:179 +0x7e
Dec 08 20:51:56 iron-prod2-pgsql-04 consul[54787]: github.com/hashicorp/serf/serf.(*Serf).handleNodeLeave(0xc0005b8f00, 0xc000978400)

@imcitius
Copy link
Author

imcitius commented Dec 8, 2019

I suppose, log rotation settings could be related to the issue, (due to logger.(*LogFile).pruneFiles), so here it is:

    "log_file": "/var/log/consul/consul.log",
    "log_level": "INFO",
    "log_rotate_bytes": 1073741824,
    "log_rotate_duration": "24h",
    "log_rotate_max_files": -1,

Now I'm trying change to

    "enable_syslog": true,
    "log_level": "INFO",

And see what happens then.

@hanshasselberg hanshasselberg self-assigned this Dec 13, 2019
hanshasselberg added a commit that referenced this issue Dec 13, 2019
Fixes #6892.

The [docs](https://www.consul.io/docs/agent/options.html#_log_rotate_max_files) are stating:

> -log-rotate-max-files - to specify the maximum number of older log
> file archives to keep. Defaults to 0 (no files are ever deleted). Set to
> -1 to disable rotation and discard all log files.

But the `-1` case was not implemented and led to a panic when being
used.
@hanshasselberg hanshasselberg removed their assignment Dec 18, 2019
hanshasselberg added a commit that referenced this issue Dec 18, 2019
* Handle discard all logfiles properly

Fixes #6892.

The [docs](https://www.consul.io/docs/agent/options.html#_log_rotate_max_files) are stating:

> -log-rotate-max-files - to specify the maximum number of older log
> file archives to keep. Defaults to 0 (no files are ever deleted). Set to
> -1 to disable rotation and discard all log files.

But the `-1` case was not implemented and led to a panic when being
used.

Co-Authored-By: Freddy <[email protected]>
@ghost
Copy link

ghost commented Jan 25, 2020

Hey there,

This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days.

If you are still experiencing problems, or still have questions, feel free to open a new one 👍.

@ghost ghost locked and limited conversation to collaborators Jan 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type/bug Feature does not function as expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants