panic: invalid page type #7471

Open
fcddk opened this issue Mar 20, 2020 · 7 comments
Labels
type/bug (Feature does not function as expected), type/crash (The issue description contains a golang panic and stack trace)

Comments

fcddk commented Mar 20, 2020

bootstrap_expect > 0: expecting 3 servers

==> Starting Consul agent...
           Version: 'v1.6.0-rc1 (71f98661d)'
           Node ID: 'cbdab978-4df9-b604-0553-5cd6a00cf812'
         Node name: 'consul-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 100.101.219.34 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

panic: invalid page type: 3404: 10

goroutine 1 [running]:
github.com/boltdb/bolt.(*Cursor).search(0xc0007d0108, 0x4f860f0, 0x4, 0x4, 0xd4c)
	/go/pkg/mod/github.com/boltdb/[email protected]/cursor.go:256 +0x354
github.com/boltdb/bolt.(*Cursor).seek(0xc0007d0108, 0x4f860f0, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc0007d01b8, 0x146aeda, ...)
	/go/pkg/mod/github.com/boltdb/[email protected]/cursor.go:159 +0x7e
github.com/boltdb/bolt.(*Bucket).CreateBucket(0xc00004c1d8, 0x4f860f0, 0x4, 0x4, 0xc0007d0228, 0x42c48f, 0xc0000fc300)
	/go/pkg/mod/github.com/boltdb/[email protected]/bucket.go:172 +0xf0
github.com/boltdb/bolt.(*Bucket).CreateBucketIfNotExists(0xc00004c1d8, 0x4f860f0, 0x4, 0x4, 0x0, 0x20, 0x29bb4c0)
	/go/pkg/mod/github.com/boltdb/[email protected]/bucket.go:206 +0x4d
github.com/boltdb/bolt.(*Tx).CreateBucketIfNotExists(...)
	/go/pkg/mod/github.com/boltdb/[email protected]/tx.go:115
github.com/hashicorp/raft-boltdb.(*BoltStore).initialize(0xc000105ec0, 0x0, 0x0)
	/go/pkg/mod/github.com/hashicorp/[email protected]/bolt_store.go:98 +0xba
github.com/hashicorp/raft-boltdb.New(0xc000376780, 0x19, 0x0, 0xc000376700, 0x19, 0x0, 0x0)
	/go/pkg/mod/github.com/hashicorp/[email protected]/bolt_store.go:81 +0xfe
github.com/hashicorp/raft-boltdb.NewBoltStore(...)
	/go/pkg/mod/github.com/hashicorp/[email protected]/bolt_store.go:60
github.com/hashicorp/consul/agent/consul.(*Server).setupRaft(0xc00054e380, 0x0, 0x0)
	/home/circleci/project/consul/agent/consul/server.go:630 +0xa36
github.com/hashicorp/consul/agent/consul.NewServerLogger(0xc0001ae000, 0xc0002bf4a0, 0xc00045c800, 0xc000452050, 0x0, 0x0, 0x0)
	/home/circleci/project/consul/agent/consul/server.go:431 +0xafe
github.com/hashicorp/consul/agent.(*Agent).Start(0xc0004146c0, 0x0, 0x0)
	/home/circleci/project/consul/agent/agent.go:380 +0x56e
github.com/hashicorp/consul/command/agent.(*cmd).run(0xc000125200, 0xc0000f8020, 0xc, 0xc, 0x0)
	/home/circleci/project/consul/command/agent/agent.go:280 +0xf5b
github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc000125200, 0xc0000f8020, 0xc, 0xc, 0xc000105b00)
	/home/circleci/project/consul/command/agent/agent.go:75 +0x4d
github.com/mitchellh/cli.(*CLI).Run(0xc00014af00, 0xc00014af00, 0x80, 0xc000105b80)
	/go/pkg/mod/github.com/mitchellh/[email protected]/cli.go:255 +0x1f1
main.realMain(0xc0000ce058)
	/home/circleci/project/consul/main.go:53 +0x393
main.main()
	/home/circleci/project/consul/main.go:20 +0x22
@ghost ghost added type/crash The issue description contains a golang panic and stack trace labels Mar 20, 2020
dnephin (Contributor) commented Mar 20, 2020

Hello fcddk, thank you for reporting this problem! From the stack trace it looks like this is a panic from boltdb. I took a look around and found the following:

  • There are some similar stack traces in the comments on "panic: cannot free page 0 or 1: 0" (#3771). Unfortunately, that issue does not have a resolution either.
  • A Google search for this panic message found similar reports in a number of other projects that also use boltdb.

I was not able to find a clear solution to the problem, but I found some information that might help. First, the issue seems to be related to the filesystem: one report suggested that the mount options used for the filesystem may have been causing the problem. Several other reports suggested that the problem was corrupt db files, and that removing the files fixed it.

If you run consul agent with a different value for -data-dir, do you encounter the same problem? If the filesystem that contains the data dir has any special mount options, you may want to try a different filesystem without those mount options.
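As a rough aid for reading the panic message in the original report, here is a minimal sketch. It assumes boltdb formats this panic as the page ID in decimal followed by the page flags in hexadecimal, and that the flag constants match those in boltdb's page.go (branch 0x01, leaf 0x02, meta 0x04, freelist 0x10). The helper names `parseInvalidPageType` and `describeFlags` are hypothetical and not part of any library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Page flag values, assumed to match boltdb's page.go.
const (
	branchPageFlag   = 0x01
	leafPageFlag     = 0x02
	metaPageFlag     = 0x04
	freelistPageFlag = 0x10
)

// describeFlags (hypothetical helper) names the page type for a flags value.
func describeFlags(flags uint16) string {
	switch flags {
	case branchPageFlag:
		return "branch"
	case leafPageFlag:
		return "leaf"
	case metaPageFlag:
		return "meta"
	case freelistPageFlag:
		return "freelist"
	default:
		return fmt.Sprintf("unknown (0x%x)", flags)
	}
}

// parseInvalidPageType (hypothetical helper) extracts the page ID (decimal)
// and page flags (hexadecimal) from "invalid page type: <id>: <flags>".
func parseInvalidPageType(msg string) (uint64, uint16, error) {
	const prefix = "invalid page type: "
	idx := strings.Index(msg, prefix)
	if idx < 0 {
		return 0, 0, fmt.Errorf("not an invalid-page-type message: %q", msg)
	}
	parts := strings.Split(strings.TrimSpace(msg[idx+len(prefix):]), ": ")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected message shape: %q", msg)
	}
	id, err := strconv.ParseUint(parts[0], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("page id: %w", err)
	}
	flags, err := strconv.ParseUint(parts[1], 16, 16)
	if err != nil {
		return 0, 0, fmt.Errorf("page flags: %w", err)
	}
	return id, uint16(flags), nil
}

func main() {
	id, flags, err := parseInvalidPageType("panic: invalid page type: 3404: 10")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("page %d has flags 0x%x (%s)\n", id, flags, describeFlags(flags))
}
```

Under those assumptions, `3404: 10` would mean the cursor reached page 3404 and found a freelist page where it expected a branch or leaf page. That reading would be consistent with a damaged or stale page reference in the db file, though it does not by itself say what caused the damage.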

I hope that helps. Please do let us know if it worked, and if you have any more questions!

@like-inspur

@dnephin I run a Consul cluster on Kubernetes with version 1.7.1 and hit the same problem, as shown below:
[screenshot not preserved]

@like-inspur

From https://github.com/boltdb/bolt/releases I also found that boltdb is no longer being updated. So if the problem originates in boltdb, who can fix boltdb's storage problem?

@jsosulska jsosulska added type/umbrella-☂️ Makes issue the "source of truth" for multiple requests relating to the same topic type/bug Feature does not function as expected labels Jun 15, 2020
@sarahhodne

I'm having a similar issue on my home Kubernetes cluster:

2020-07-03T21:41:41.722824738Z bootstrap_expect > 0: expecting 3 servers
2020-07-03T21:41:41.722871382Z ==> Starting Consul agent...
2020-07-03T21:41:41.722969222Z            Version: 'v1.8.0'
2020-07-03T21:41:41.722976474Z            Node ID: '8fa6500c-0eac-9d7e-2540-4986953195fd'
2020-07-03T21:41:41.722980032Z          Node name: 'consul-consul-server-0'
2020-07-03T21:41:41.722982969Z         Datacenter: 'home1' (Segment: '<all>')
2020-07-03T21:41:41.722986267Z             Server: true (Bootstrap: false)
2020-07-03T21:41:41.722989297Z        Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
2020-07-03T21:41:41.722993135Z       Cluster Addr: 10.38.0.9 (LAN: 8301, WAN: 8302)
2020-07-03T21:41:41.722996225Z            Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
2020-07-03T21:41:41.722999253Z 
2020-07-03T21:41:41.723002035Z ==> Log data will now stream in as it occurs:
2020-07-03T21:41:41.723008404Z 
2020-07-03T21:41:41.725907687Z panic: page 3949 already freed
2020-07-03T21:41:41.725917433Z 
2020-07-03T21:41:41.725920835Z goroutine 1 [running]:
2020-07-03T21:41:41.725923425Z github.com/boltdb/bolt.(*freelist).free(0xc000559560, 0x10d66a, 0x7f7975d3c000)
2020-07-03T21:41:41.725925512Z 	/go/pkg/mod/github.com/boltdb/[email protected]/freelist.go:121 +0x2a0
2020-07-03T21:41:41.725927679Z github.com/boltdb/bolt.(*Tx).Commit(0xc0003fec40, 0x51e0150, 0x4)
2020-07-03T21:41:41.725929682Z 	/go/pkg/mod/github.com/boltdb/[email protected]/tx.go:176 +0x1b5
2020-07-03T21:41:41.725931660Z github.com/hashicorp/raft-boltdb.(*BoltStore).initialize(0xc000553fa0, 0x0, 0x0)
2020-07-03T21:41:41.725934572Z 	/go/pkg/mod/github.com/hashicorp/[email protected]/bolt_store.go:105 +0x143
2020-07-03T21:41:41.725936675Z github.com/hashicorp/raft-boltdb.New(0xc000054f80, 0x19, 0x0, 0xc000054f00, 0x19, 0x0, 0xc0002adee8)
2020-07-03T21:41:41.725944660Z 	/go/pkg/mod/github.com/hashicorp/[email protected]/bolt_store.go:81 +0xf7
2020-07-03T21:41:41.725948074Z github.com/hashicorp/raft-boltdb.NewBoltStore(...)
2020-07-03T21:41:41.725950107Z 	/go/pkg/mod/github.com/hashicorp/[email protected]/bolt_store.go:60
2020-07-03T21:41:41.725952708Z github.com/hashicorp/consul/agent/consul.(*Server).setupRaft(0xc0003b8300, 0x0, 0x0)
2020-07-03T21:41:41.725954726Z 	/home/circleci/project/consul/agent/consul/server.go:702 +0xa4a
2020-07-03T21:41:41.725956784Z github.com/hashicorp/consul/agent/consul.NewServerLogger(0xc0000e0e00, 0x38ada20, 0xc00083d590, 0xc000582000, 0xc000499a40, 0x0, 0x0, 0x0)
2020-07-03T21:41:41.725959460Z 	/home/circleci/project/consul/agent/consul/server.go:499 +0x10a9
2020-07-03T21:41:41.725974130Z github.com/hashicorp/consul/agent.(*Agent).Start(0xc00039a000, 0x0, 0x0)
2020-07-03T21:41:41.725978917Z 	/home/circleci/project/consul/agent/agent.go:449 +0x7d7
2020-07-03T21:41:41.725996571Z github.com/hashicorp/consul/command/agent.(*cmd).run(0xc00029c000, 0xc00004c140, 0xf, 0x10, 0x0)
2020-07-03T21:41:41.726000697Z 	/home/circleci/project/consul/command/agent/agent.go:287 +0xf08
2020-07-03T21:41:41.726003909Z github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc00029c000, 0xc00004c140, 0xf, 0x10, 0xc000205d80)
2020-07-03T21:41:41.726007952Z 	/home/circleci/project/consul/command/agent/agent.go:76 +0x4d
2020-07-03T21:41:41.726011529Z github.com/mitchellh/cli.(*CLI).Run(0xc0002183c0, 0xc000218300, 0x80, 0xc0005a86a0)
2020-07-03T21:41:41.726014926Z 	/go/pkg/mod/github.com/mitchellh/[email protected]/cli.go:260 +0x1da
2020-07-03T21:41:41.726018112Z main.realMain(0xc0000a0058)
2020-07-03T21:41:41.726021454Z 	/home/circleci/project/consul/main.go:50 +0x397
2020-07-03T21:41:41.726024683Z main.main()
2020-07-03T21:41:41.726028355Z 	/home/circleci/project/consul/main.go:22 +0x22

I think this happened after a power outage. At least that's when I noticed it, but I haven't looked into my Consul setup in a little while, so it may have happened before then. I wonder if the bolt database got corrupted somehow?

@like-inspur

@sarahhodne yes, I have the same suspicion.

@jsosulska jsosulska added this to the Upcoming milestone Jul 28, 2020
@jsosulska jsosulska removed the type/umbrella-☂️ Makes issue the "source of truth" for multiple requests relating to the same topic label Aug 5, 2020
@jsosulska jsosulska removed this from the Upcoming milestone Aug 5, 2020
@jsosulska (Contributor)

Hello all!

To update this thread: I have created a top-level issue to track upgrading BoltDB to bbolt here. Please follow that work as a precursor to the issues mentioned here.

Thank you all for your patience!
