Skip to content
This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

frequent "misplaced bucket header" and "page already freed" errors #653

Closed
schollz opened this issue Jan 30, 2017 · 7 comments
Closed

frequent "misplaced bucket header" and "page already freed" errors #653

schollz opened this issue Jan 30, 2017 · 7 comments

Comments

@schollz
Copy link

schollz commented Jan 30, 2017

Hi, I've been using bolt quite successfully for awhile for my project, FIND.

Recent builds have been creating the following errors in one of my tests:

2017/01/30 16:47:32 [Recovery] panic recovered:
POST /track HTTP/1.1
Host: 127.0.0.1:8003
Accept: */*
Content-Length: 1089
Content-Type: application/json
Expect: 100-continue
User-Agent: curl/7.47.0


misplaced bucket header: 7265736f7572636573 -> 73766d44617461
/usr/local/go/src/runtime/panic.go:458 (0x47b6a3)
	gopanic: reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
/home/zns/work/src/github.com/boltdb/bolt/bucket.go:556 (0x5778c8)
	(*Bucket).spill: panic(fmt.Sprintf("misplaced bucket header: %x -> %x", []byte(name), k))
/home/zns/work/src/github.com/boltdb/bolt/tx.go:163 (0x584f55)
	(*Tx).Commit: if err := tx.root.spill(); err != nil {
/home/zns/work/src/github.com/boltdb/bolt/db.go:605 (0x57c96d)
	(*DB).Update: return t.Commit()
/home/zns/Documents/find/svm.go:124 (0x4429c9)
	dumpFingerprintsSVM: })
/home/zns/Documents/find/fingerprint.go:204 (0x4127c5)
	trackFingerprint: dumpFingerprintsSVM(group)
/home/zns/Documents/find/fingerprint.go:138 (0x41088a)
	trackFingerprintPOST: message, success, locationGuess, bayes, svm, rf := trackFingerprint(jsonFingerprint)
/home/zns/work/src/github.com/gin-gonic/gin/context.go:98 (0x58a81a)
	(*Context).Next: c.handlers[c.index](c)
/home/zns/work/src/github.com/gin-gonic/contrib/sessions/sessions.go:65 (0x62dbb9)
	Sessions.func1: c.Next()
/home/zns/work/src/github.com/gin-gonic/gin/context.go:98 (0x58a81a)
	(*Context).Next: c.handlers[c.index](c)
/home/zns/work/src/github.com/gin-gonic/gin/recovery.go:45 (0x59a8fa)
	RecoveryWithWriter.func1: c.Next()
/home/zns/work/src/github.com/gin-gonic/gin/context.go:98 (0x58a81a)
	(*Context).Next: c.handlers[c.index](c)
/home/zns/work/src/github.com/gin-gonic/gin/logger.go:77 (0x5999ef)
	LoggerWithWriter.func1: c.Next()
/home/zns/work/src/github.com/gin-gonic/gin/context.go:98 (0x58a81a)
	(*Context).Next: c.handlers[c.index](c)
/home/zns/work/src/github.com/gin-gonic/gin/gin.go:290 (0x590eae)
	(*Engine).handleHTTPRequest: context.Next()
/home/zns/work/src/github.com/gin-gonic/gin/gin.go:271 (0x590790)
	(*Engine).ServeHTTP: engine.handleHTTPRequest(c)
/usr/local/go/src/net/http/server.go:2202 (0x51b1cd)
	serverHandler.ServeHTTP: handler.ServeHTTP(rw, req)
/usr/local/go/src/net/http/server.go:1579 (0x517b37)
	(*conn).serve: serverHandler{c.server}.ServeHTTP(w, w.req)
/usr/local/go/src/runtime/asm_amd64.s:2086 (0x4ab001)
	goexit: BYTE	$0x90	// NOP

or sometimes it creates this error:

2017/01/30 16:53:14 [Recovery] panic recovered:
GET /calculate?group=testdb HTTP/1.1
Host: 127.0.0.1:8003
Accept: */*
User-Agent: curl/7.47.0


page 193 already freed
/usr/local/go/src/runtime/panic.go:443 (0x4840c9)
	gopanic: reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
/home/zns/work/src/github.com/boltdb/bolt/freelist.go:121 (0x5817fb)
	(*freelist).free: panic(fmt.Sprintf("page %d already freed", id))
/home/zns/work/src/github.com/boltdb/bolt/node.go:363 (0x58510a)
	(*node).spill: tx.db.freelist.free(tx.meta.txid, tx.page(node.pgid))
/home/zns/work/src/github.com/boltdb/bolt/bucket.go:570 (0x57861f)
	(*Bucket).spill: if err := b.rootNode.spill(); err != nil {
/home/zns/work/src/github.com/boltdb/bolt/bucket.go:537 (0x579012)
	(*Bucket).spill: if err := child.spill(); err != nil {
/home/zns/work/src/github.com/boltdb/bolt/tx.go:163 (0x589084)
	(*Tx).Commit: if err := tx.root.spill(); err != nil {
/home/zns/work/src/github.com/boltdb/bolt/db.go:605 (0x57ea3d)
	(*DB).Update: return t.Commit()
/home/zns/Downloads/find-2.1/svm.go:124 (0x448938)
	dumpFingerprintsSVM: })
/home/zns/Downloads/find-2.1/api.go:97 (0x402666)
	calculate: dumpFingerprintsSVM(strings.ToLower(group))
/home/zns/work/src/github.com/gin-gonic/gin/context.go:98 (0x58f5ba)
	(*Context).Next: c.handlers[c.index](c)
/home/zns/work/src/github.com/gin-gonic/contrib/sessions/sessions.go:65 (0x61d7f3)
	Sessions.func1: c.Next()
/home/zns/work/src/github.com/gin-gonic/gin/context.go:98 (0x58f5ba)
	(*Context).Next: c.handlers[c.index](c)
/home/zns/work/src/github.com/gin-gonic/gin/recovery.go:45 (0x5a25f1)
	RecoveryWithWriter.func1: c.Next()
/home/zns/work/src/github.com/gin-gonic/gin/context.go:98 (0x58f5ba)
	(*Context).Next: c.handlers[c.index](c)
/home/zns/work/src/github.com/gin-gonic/gin/logger.go:77 (0x5a1862)
	LoggerWithWriter.func1: c.Next()
/home/zns/work/src/github.com/gin-gonic/gin/context.go:98 (0x58f5ba)
	(*Context).Next: c.handlers[c.index](c)
/home/zns/work/src/github.com/gin-gonic/gin/gin.go:290 (0x596ae2)
	(*Engine).handleHTTPRequest: context.Next()
/home/zns/work/src/github.com/gin-gonic/gin/gin.go:271 (0x596717)
	(*Engine).ServeHTTP: engine.handleHTTPRequest(c)
/usr/local/go/src/net/http/server.go:2081 (0x51bfbe)
	serverHandler.ServeHTTP: handler.ServeHTTP(rw, req)
/usr/local/go/src/net/http/server.go:1472 (0x51886e)
	(*conn).serve: serverHandler{c.server}.ServeHTTP(w, w.req)
/usr/local/go/src/runtime/asm_amd64.s:1998 (0x4b4911)
	goexit: BYTE	$0x90	// NOP

It can be replicated the following way:

  1. Get latest dependencies with go get -u -v github.com/schollz/find
  2. git clone https://github.com/schollz/find.git && cd find && go build && ./find
  3. In another window, cd into the find/testing directory and run ./testdb.sh. It will insert things into the database, and then (possibly) suddenly fail when it tries to use the items from the database for calculations.
  4. If you want to try again, just do rm data/testdb.db from the find directory.

It doesn't seem to always fail, but it seems to fail about 7/10 times when I try this test on two systems with Linux (Ubuntu 16) + AMD64. It seems to fail 1/10 times on Linux (Ubuntu 14) + AMD64. . It seems to do this with the current master of boltdb (9145e04) with both go 1.6 and 1.7 and 1.8. I did try with earlier versions of Bolt (e.g. v1.2.0) and it works fine every time I run the test that earlier version.

@odeke-em
Copy link
Contributor

Hello there boltdb authors. Thank you for the hard work, my project has been using boltdb for almost 2 years, with no problems. However, in the past month , I've gotten numerous bug reports
and I kindly wanted to seek y'all help.

Bugs all related to boltdb, such as:

and the errors are quite cryptic to me so I haven't been able to help out debug the problems. I personally haven't yet experienced the mentioned problems but many of my users have.

Thank you.

@benbjohnson
Copy link
Member

@schollz I've tried running the test several times and I'm not getting an error. Can you use git bisect on bolt to figure out which commit is the issue?

I'm working on getting a larger, long running randomized test suite going for bolt that should hopefully catch these issues in the future.

@schollz
Copy link
Author

schollz commented Jan 31, 2017

@benbjohnson Thanks for looking into this! I've tried the bolt tests and they currently all pass on my offending system. I've also seen that the FIND bug doesn't seem to replicate on Debian + AMD64.

I ran git bisect between current master and v1.2.0 on my offending machine (Ubuntu16+AMD64) and got this report:

ca9f2088aab4fc9832e587655f0026875bddbf9b is the first bad commit
commit ca9f2088aab4fc9832e587655f0026875bddbf9b
Author: Ben Aldrich <[email protected]>
Date:   Wed Nov 16 02:08:01 2016 +0000

    replace unix implementation to be the same as solaris to fix an issue with glusterfs

:100644 100644 cad62dda1e38797eb9dd1a47953b5d21d479bd12 50cbbc0b6525f8ddad785228c6a74d0139a6ae12 M	bolt_unix.go

Here is the git bisect log:

git bisect log      
git bisect start
# bad: [9145e047707a32d3685f15d9fd97e1b25445ba1b] Merge pull request #651 from zweizeichen/master
git bisect bad 9145e047707a32d3685f15d9fd97e1b25445ba1b
# good: [c6ba97b89e0454fec9aa92e1d33a4e2c5fc1f631] v1.2.0
git bisect good c6ba97b89e0454fec9aa92e1d33a4e2c5fc1f631
# good: [1ff46c182dc45c853023928d13fc9970ef7fd72a] Merge pull request #590 from benbjohnson/vincent-petithory-compact-db
git bisect good 1ff46c182dc45c853023928d13fc9970ef7fd72a
# good: [315c65d4cf4f5278c73477a35fb1f9b08365d340] Merge pull request #621 from jcvernaleo/jcv_readme
git bisect good 315c65d4cf4f5278c73477a35fb1f9b08365d340
# good: [10c6e01e1f6eb5087f031a4ae7691b8dc52d34d5] Allow GC to reclaim completed transactions
git bisect good 10c6e01e1f6eb5087f031a4ae7691b8dc52d34d5
# bad: [ebe310138bbbe1e1d146197ab8b7fa772da38b1d] Merge pull request #625 from vrecan/FcntlFlock
git bisect bad ebe310138bbbe1e1d146197ab8b7fa772da38b1d
# good: [f57d45558ddeffb4309e971ab5b04f0c59364cbd] Merge pull request #628 from bep/patch-1
git bisect good f57d45558ddeffb4309e971ab5b04f0c59364cbd
# bad: [ca9f2088aab4fc9832e587655f0026875bddbf9b] replace unix implementation to be the same as solaris to fix an issue with glusterfs
git bisect bad ca9f2088aab4fc9832e587655f0026875bddbf9b
# first bad commit: [ca9f2088aab4fc9832e587655f0026875bddbf9b] replace unix implementation to be the same as solaris to fix an issue with glusterfs

@benbjohnson
Copy link
Member

@schollz I created a revert pull request here: #654

Can you do some thorough testing to see if that fixes your issue?

@schollz
Copy link
Author

schollz commented Feb 2, 2017

#654 Fixes my issue, maybe it works for you too @odeke-em ?

@schollz schollz closed this as completed Feb 2, 2017
@odeke-em
Copy link
Contributor

odeke-em commented Feb 3, 2017

@schollz and @benbjohnson the problem still persists for my users odeke-em/drive#862 (comment).
@ptman would you mind providing a stacktrace and perhaps open a new issue here? I personally have never encountered this issue but maybe that's because am on OSX, so I wouldn't have the proper context for filing the issue.

@ptman
Copy link

ptman commented Feb 3, 2017

Here's a stack trace. http://pastebin.com/LNdtPDys

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants