panic: invalid page type: 26: 10 #537

Closed
messnerdev opened this issue Jul 14, 2023 · 8 comments · Fixed by #539

Comments

@messnerdev

I had a power outage on a server running InfluxDB 2.7.1 which depends on bbolt 1.3.6.

After the server came back online, InfluxDB would panic after about 1 second with the logs and stack trace below. Please help, I have years' worth of data that is now inaccessible. I believe the bolt file is corrupt, but I have no idea how to go about repairing it.

2023-07-13T19:55:54.455287517Z  info    found existing boltdb file, skipping setup wrapper      {"system": "docker", "bolt_path": "/var/lib/influxdb2/influxd.bolt"}
2023-07-13T19:55:54.511065249Z  info    found existing boltdb file, skipping setup wrapper      {"system": "docker", "bolt_path": "/var/lib/influxdb2/influxd.bolt"}
ts=2023-07-13T19:55:54.584465Z lvl=info msg="Welcome to InfluxDB" log_id=0j0D1gM0000 version=v2.7.1 commit=407fa622e9 build_date=2023-04-28T13:24:27Z log_level=info
ts=2023-07-13T19:55:54.584499Z lvl=warn msg="nats-port argument is deprecated and unused" log_id=0j0D1gM0000
panic: invalid page type: 26: 10

goroutine 1 [running]:
go.etcd.io/bbolt.(*Cursor).search(0xc00179c650, {0x7f9d6f354cb8, 0x5, 0x5}, 0xc00179c5a8?)
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/cursor.go:250 +0x299
go.etcd.io/bbolt.(*Cursor).seek(0xc00179c650, {0x7f9d6f354cb8?, 0x0?, 0x0?})
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/cursor.go:159 +0x48
go.etcd.io/bbolt.(*Bucket).CreateBucket(0xc001e2e018, {0x7f9d6f354cb8, 0x5, 0x5})
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/bucket.go:167 +0xe6
go.etcd.io/bbolt.(*Bucket).CreateBucketIfNotExists(0x7f9d3fe06408?, {0x7f9d6f354cb8, 0x5, 0x5})
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/bucket.go:201 +0x31
go.etcd.io/bbolt.(*Tx).CreateBucketIfNotExists(...)
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:115
github.com/influxdata/influxdb/v2/bolt.(*Client).initializeID(0x0?, 0xc001e2e000)
        /root/project/bolt/id.go:22 +0x48
github.com/influxdata/influxdb/v2/bolt.(*Client).initialize.func1(0xc001e2e000)
        /root/project/bolt/bbolt.go:92 +0x45
go.etcd.io/bbolt.(*DB).Update(0x0?, 0xc00179c998)
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:741 +0x82
github.com/influxdata/influxdb/v2/bolt.(*Client).initialize(0xc001141c80?, {0x1f?, 0x1?})
        /root/project/bolt/bbolt.go:89 +0x39
github.com/influxdata/influxdb/v2/bolt.(*Client).Open(0xc001e0e180, {0x7f9d6df90bd0, 0xc0011fa050})
        /root/project/bolt/bbolt.go:77 +0x1f7
github.com/influxdata/influxdb/v2/cmd/influxd/launcher.(*Launcher).openMetaStores(0xc000d1e380, {0x7f9d6df90bd0, 0xc0011fa050}, 0xc0001b3880)
        /root/project/cmd/influxd/launcher/launcher.go:992 +0x1d4
github.com/influxdata/influxdb/v2/cmd/influxd/launcher.(*Launcher).run(0xc000d1e380, {0x7f9d6df90bd0?, 0xc0011fa000?}, 0xc0001b3880)
        /root/project/cmd/influxd/launcher/launcher.go:255 +0xe71
github.com/influxdata/influxdb/v2/cmd/influxd/launcher.cmdRunE.func1()
        /root/project/cmd/influxd/launcher/cmd.go:124 +0x2f1
github.com/influxdata/influxdb/v2/kit/cli.NewCommand.func1(0xc000182840?, {0x7f9d70741798?, 0x0?, 0x0?})
        /root/project/kit/cli/viper.go:54 +0x1e
github.com/spf13/cobra.(*Command).execute(0xc000182840, {0xc000052050, 0x0, 0x0})
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:842 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000182840)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:950 +0x39d
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:887
main.main()
        /root/project/cmd/influxd/main.go:61 +0x4b8
@ahrtr
Member

ahrtr commented Jul 14, 2023

It looks like the db file is corrupted. It's the same issue as #128.

We have a surgery command that can be used to repair a corrupted db, but there is no guarantee it will work. Unfortunately, there is no comprehensive usage guide yet (I will try to provide one later), and the command is intended for advanced users. There are a couple of examples for your reference.

What's the db size, and what's the underlying filesystem? I can try to fix the db for you if you can provide the file.
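
For readers trying to interpret the panic text itself, here is a minimal sketch of decoding a bbolt page header. It rests on several assumptions that are not stated in this thread: that the panic prints the page id followed by the page flags in hex, that the on-disk header is a little-endian (id uint64, flags uint16, count uint16, overflow uint32) record, that the flag values are branch=0x01, leaf=0x02, meta=0x04, freelist=0x10, and that the page size is 4096 bytes. The helper names are hypothetical.

```python
import struct

# Assumed bbolt page flag values (not confirmed anywhere in this thread).
PAGE_FLAGS = {0x01: "branch", 0x02: "leaf", 0x04: "meta", 0x10: "freelist"}

# Assumed header layout: id (uint64), flags (uint16), count (uint16), overflow (uint32).
HEADER = struct.Struct("<QHHI")


def decode_page_header(buf):
    """Decode the first 16 bytes of a page into a dict."""
    page_id, flags, count, overflow = HEADER.unpack_from(buf, 0)
    types = [name for bit, name in PAGE_FLAGS.items() if flags & bit]
    return {"id": page_id, "flags": flags, "types": types,
            "count": count, "overflow": overflow}


def read_page_header(path, page_id, page_size=4096):
    """Read one page header from a db file (page_size is an assumption)."""
    with open(path, "rb") as f:
        f.seek(page_id * page_size)
        return decode_page_header(f.read(HEADER.size))


def is_searchable(header):
    """A cursor can only descend through branch or leaf pages."""
    return bool(header["flags"] & 0x03)


if __name__ == "__main__":
    # Synthetic header matching the panic text "26: 10" under the assumptions above.
    sample = HEADER.pack(26, 0x10, 0, 0)
    print(decode_page_header(sample), is_searchable(decode_page_header(sample)))
```

Under these assumptions, "26: 10" would read as page 26 carrying the freelist flag (0x10), which is not a page type a cursor can search.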

@messnerdev
Author

The bolt file itself is 512KB; all the InfluxDB files together amount to 4.31GB in 3918 files across 1542 folders. I attached my bolt file (zipped). I'm not that familiar with how bolt is used within InfluxDB, or whether you need the whole directory.

Thanks for the surgery references @ahrtr. I will gladly buy you a coffee or otherwise compensate you for help with the repair, since I don't know much about bolt.
influxd.zip

@ahrtr
Member

ahrtr commented Jul 14, 2023

if you need the whole directory.

No, I only need the bbolt file.

  • What's the OS on which bbolt was running?
  • What's the filesystem on which the bbolt db was stored?

@messnerdev
Author

InfluxDB is running inside a Docker container based on Debian 11.7 slim; the host is running Ubuntu Server 22.04.2 with kernel 5.16.10-051610. All the InfluxDB files, including the .bolt file, are on a Synology NAS with a Btrfs file system.

The host mounts the Synology NAS over NFS, and Docker mounts a volume on that NFS host mount.

@ahrtr
Member

ahrtr commented Jul 18, 2023

Please read #539.

@messnerdev
Author

Thank you. I tried each of the 4 bolt files together with the supporting InfluxDB files as they were at the time of corruption.

  • root4 | Allowed InfluxDB to start up, but all my data and config were still inaccessible
  • root23 | Crashed with Error on startup "Failed creating new authorization store" log_id=0j7eMcD0000 error="unexpected error retrieving auth index; Err: bucket "authorizationindexv1": bucket not found"
  • root27 | Crashed with Fatal on startup "could not load existing scheduled runs" log_id=0j7eQfEW000 error="bucket "_tasks" not found"
  • root39 | Crashed with Error on startup, with error messages about missing metadata.

Root39 seems the most promising, as the startup log messages contain the names of my InfluxDB buckets, but it seems that whatever corrupt data was excised or dropped during my power failure was really important.

I've attached the InfluxDB startup logs for running with all 4 bolt files for reference:
repairedBoltFileInfluxStartupLogs.zip

I'd like to tip you for your assistance so far if you have a way to do that. Thank you very much.

@ahrtr
Member

ahrtr commented Jul 19, 2023

FYI.

$ ./bbolt page ~/tmp/etcd/bbolt/root_39.db 39
Page ID:    39
Page Type:  leaf
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 7

"authorizationindexv1": <pgid=0,seq=0>
"authorizationsv1": <pgid=18,seq=0>
"bucketindexv1": <pgid=0,seq=0>
"bucketsv1": <pgid=3,seq=0>
"checkindexv1": <pgid=0,seq=0>
"checksv1": <pgid=22,seq=0>
"dashboardcellviewsv1": <pgid=16,seq=0>

$ ./bbolt page ~/tmp/etcd/bbolt/root_39.db 3
Page ID:    3
Page Type:  leaf
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 4

"087ff6b6a9f6ecab": {"id":"087ff6b6a9f6ecab","orgID":"40435f03a52a8b61","type":0,"name":"telegraf","description":"","retentionPeriod":0,"shardGroupDuration":604800000000000,"createdAt":"2022-06-14T19:01:53.123193033Z","updatedAt":"2022-06-14T19:01:53.123193377Z"}
"c97f4a758da096c7": {"id":"c97f4a758da096c7","orgID":"40435f03a52a8b61","type":0,"name":"home_assistant","description":"","retentionPeriod":0,"shardGroupDuration":604800000000000,"createdAt":"2022-03-16T00:40:35.254755868Z","updatedAt":"2022-03-16T00:40:35.254755908Z"}
"cf5febf7b6ac44de": {"id":"cf5febf7b6ac44de","orgID":"40435f03a52a8b61","type":1,"name":"_tasks","description":"System bucket for task logs","retentionPeriod":259200000000000,"shardGroupDuration":86400000000000,"createdAt":"2022-03-16T00:40:35.25073548Z","updatedAt":"2022-03-16T00:40:35.25073552Z"}
"faa9f0a09b421ef9": {"id":"faa9f0a09b421ef9","orgID":"40435f03a52a8b61","type":1,"name":"_monitoring","description":"System bucket for monitoring logs","retentionPeriod":604800000000000,"shardGroupDuration":86400000000000,"createdAt":"2022-03-16T00:40:35.252631698Z","updatedAt":"2022-03-16T00:40:35.252631738Z"}
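
The values on page 3 are JSON records, so they can be inspected with ordinary tooling. The sketch below summarizes one record; it assumes that retentionPeriod and shardGroupDuration are nanosecond counts, that a retentionPeriod of 0 means no expiry, and that type 1 marks a system bucket (the last is inferred only from the two "System bucket" descriptions in the dump). The function names are hypothetical.

```python
import json
from datetime import timedelta


def ns_to_timedelta(ns):
    """Convert a nanosecond count to a timedelta (microsecond resolution)."""
    return timedelta(microseconds=ns // 1000)


def summarize_bucket(raw):
    """Summarize one bucket record as dumped by 'bbolt page'."""
    rec = json.loads(raw)
    retention = rec["retentionPeriod"]
    return {
        "name": rec["name"],
        # Assumption: type 1 denotes a system bucket.
        "system": rec["type"] == 1,
        # Assumption: 0 means the data never expires.
        "retention": None if retention == 0 else ns_to_timedelta(retention),
        "shard_group": ns_to_timedelta(rec["shardGroupDuration"]),
    }


if __name__ == "__main__":
    raw = ('{"id":"cf5febf7b6ac44de","orgID":"40435f03a52a8b61","type":1,'
           '"name":"_tasks","description":"System bucket for task logs",'
           '"retentionPeriod":259200000000000,'
           '"shardGroupDuration":86400000000000}')
    print(summarize_bucket(raw))
```

Read this way, the _tasks record has a 3-day retention and 1-day shard groups, while telegraf and home_assistant have 7-day shard groups and a retentionPeriod of 0.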

@ahrtr
Member

ahrtr commented Jul 23, 2023

@messnerdev Are you able to provide a summary of how InfluxDB uses bbolt?

  • What are the buckets and the bucket hierarchy?
  • Are the buckets stable, or are they frequently created and removed?
  • What kind of data is saved in each bucket?
  • What are the hard requirements on the data (e.g. the meta.db must exist)?

Note that it's hard to completely recover the db file; it may take a huge effort to make it work fully in InfluxDB. What I have done so far just repairs the "corrupted" db, to ensure it's a correct db file from bbolt's perspective (e.g. a valid B+ tree). But obviously it still doesn't meet InfluxDB's requirements on the data from the application's perspective.

If you really want to make it reliable, you should run a distributed system so that it can tolerate a single point of corruption, or at least have a regular backup plan.
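
A backup is only useful if the copy is itself readable. As a rough sanity check (far weaker than a full B+ tree validation), the sketch below inspects the two meta pages of a db file. It assumes that each meta record follows a 16-byte page header and consists of little-endian fields magic, version, pageSize, flags, root pgid, sequence, freelist pgid, high-water pgid, txid and checksum; that the magic is 0xED0CDAED and the version is 2; and that the checksum is FNV-1a 64 over the bytes preceding it. None of this is stated in the thread, and the function names are hypothetical.

```python
import struct

MAGIC = 0xED0CDAED        # assumed bbolt magic number
VERSION = 2               # assumed file format version
PAGE_HEADER_SIZE = 16     # assumed size of the page header
# magic, version, pageSize, flags, root, sequence, freelist, pgid, txid, checksum
META = struct.Struct("<IIIIQQQQQQ")


def fnv64a(data):
    """FNV-1a 64-bit hash."""
    h = 0xCBF29CE484222325
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def check_meta(page):
    """Return (ok, detail) for the bytes of one meta page."""
    raw = page[PAGE_HEADER_SIZE:PAGE_HEADER_SIZE + META.size]
    if len(raw) < META.size:
        return False, "page too short"
    fields = META.unpack(raw)
    magic, version, checksum = fields[0], fields[1], fields[9]
    if magic != MAGIC:
        return False, "bad magic"
    if version != VERSION:
        return False, "unexpected version"
    if checksum != fnv64a(raw[:META.size - 8]):
        return False, "checksum mismatch"
    return True, {"page_size": fields[2], "root": fields[4], "txid": fields[8]}


def check_file(path, page_size=4096):
    """Check meta pages 0 and 1 of a db file (page_size is an assumption)."""
    results = []
    with open(path, "rb") as f:
        for page_id in (0, 1):
            f.seek(page_id * page_size)
            results.append(check_meta(f.read(page_size)))
    return results
```

Calling check_file on a backup copy returns one (ok, detail) pair per meta page. A file with at least one valid meta page may still have damaged data pages, so this complements rather than replaces a real consistency check.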
