Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Int Overflow Bug Reoccurs - Please Add Better Testing #1182

Closed
bonedaddy opened this issue Jan 4, 2020 · 14 comments · Fixed by #1216
Closed

Int Overflow Bug Reoccurs - Please Add Better Testing #1182

bonedaddy opened this issue Jan 4, 2020 · 14 comments · Fixed by #1216
Assignees
Labels
good first issue These are simple issues that can be picked up by new contributors kind/enhancement Something could be better. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate or work on it.

Comments

@bonedaddy
Copy link

bonedaddy commented Jan 4, 2020

This is the second time in less than two months that an issue with integer overflow happens, breaking builds on 32-bit platforms. Given that this is the second occurrence of such a bug, I think testing needs to be improved to catch this, because if it has already happened twice in 2 months, it will probably happen a 3rd, and a 4th if testing isn't improved to catch this.

Previous bug report #1142, commit fixing the first time the bug happened fb0cdb8, and the most recent commit to fix the same issue 465f28a

@jarifibrahim jarifibrahim added kind/enhancement Something could be better. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate or work on it. labels Jan 6, 2020
@jarifibrahim
Copy link
Contributor

Thanks @bonedaddy . We should be able to add build options for 32-bit platforms to our travis.yml file

@bonedaddy
Copy link
Author

bonedaddy commented Jan 6, 2020

@jarifibrahim could that a look at https://github.com/mitchellh/gox it's how I handle this stuff and its been working pretty well.

@jarifibrahim jarifibrahim added the good first issue These are simple issues that can be picked up by new contributors label Jan 6, 2020
@jarifibrahim
Copy link
Contributor

@bonedaddy https://github.com/mitchellh/gox looks great but I think it might an overkill for us. I think if we just add a GOARCH=386 go build it should be enough :)

@anjannath
Copy link

@jarifibrahim So no need to add unit tests, just a build on i386 platform is enough for the test?

@jarifibrahim
Copy link
Contributor

Yes. Adding a GOARCH=386 go build would ensure we don't break badger for 32-bit platforms.

@andhe
Copy link

andhe commented Feb 3, 2020

I've uploaded badger to Debian a while ago, which means it gets built on a number of different architectures. Full logs available here (click red/green status column for full build logs):
https://buildd.debian.org/status/package.php?p=badger

As can be seen by cherry-picking the already committed fix, badger builds on 32bit architectures but testsuite has multiple failures and even crashes and even compilation failures. Maybe it's just the test-suite being broken and badger actually works on 32bit, but I'm scared to just disable the testsuite on all 32bit archs. Would be great if the test-suite also worked on 32 bit archs, so actually running it in your CI would IMHO be nice.

Here are four examples of failures:

src/github.com/dgraph-io/badger/db2_test.go:313:14: len argument too large in make([]byte)

$ cat -n db2_test.go  | grep '^ *313'
   313					v := make([]byte, 2<<30)

Note: 2<<30 == math.MaxInt32 + 1, but math.MaxInt32-1 seems to be the largest possible len argument allowed on GOARCH=386.
=== RUN   TestRotate
badger 2020/02/03 12:04:18 INFO: All 0 tables opened in 0s
badger 2020/02/03 12:04:19 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
badger 2020/02/03 12:04:19 INFO: All 0 tables opened in 0s
badger 2020/02/03 12:04:19 INFO: Replaying file id: 0 at offset: 0
badger 2020/02/03 12:04:19 INFO: Replay took: 16.72µs
--- FAIL: TestRotate (1.16s)
    rotate_test.go:53: 
        	Error Trace:	rotate_test.go:53
        	Error:      	Received unexpected error:
        	            	Map log file. Path=/tmp/badger-test546655184/000000.vlog. Error=cannot allocate memory
        	            	During db.vlog.open
        	            	github.com/dgraph-io/badger/y.Wrapf
        	            		/home/ah/badger-2.0.1/obj-arm-linux-gnueabi/src/github.com/dgraph-io/badger/y/error.go:82
        	            	github.com/dgraph-io/badger.Open
        	            		/home/ah/badger-2.0.1/obj-arm-linux-gnueabi/src/github.com/dgraph-io/badger/db.go:357
        	            	github.com/dgraph-io/badger/badger/cmd.TestRotate
        	            		/home/ah/badger-2.0.1/obj-arm-linux-gnueabi/src/github.com/dgraph-io/badger/badger/cmd/rotate_test.go:52
        	            	testing.tRunner
        	            		/usr/lib/go-1.13/src/testing/testing.go:909
        	            	runtime.goexit
        	            		/usr/lib/go-1.13/src/runtime/asm_arm.s:868
        	Test:       	TestRotate
FAIL

Note: options.go has the below snippet. The comment is right in that the value won't *overflow* on 32bit, but Linux won't allow this large mmap's on 32bit archs:

                // Nothing to read/write value log using standard File I/O
                // MemoryMap to mmap() the value log files
                // (2^30 - 1)*2 when mmapping < 2^31 - 1, max int32.
                // -1 so 2*ValueLogFileSize won't overflow on 32-bit systems.
                ValueLogFileSize: 1<<30 - 1,

=== RUN   TestTableIndex/multiple_keys
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x39b050]

goroutine 9 [running]:
testing.tRunner.func1(0x20b2280)
	/usr/lib/go-1.13/src/testing/testing.go:874 +0x360
panic(0x3f7dc8, 0x744668)
	/usr/lib/go-1.13/src/runtime/panic.go:679 +0x194
github.com/dgraph-io/badger/table.(*Table).KeyID(...)
	/home/ah/badger-2.0.1/obj-arm-linux-gnueabi/src/github.com/dgraph-io/badger/table/table.go:500
github.com/dgraph-io/badger/table.TestTableIndex.func2(0x20b2280)
	/home/ah/badger-2.0.1/obj-arm-linux-gnueabi/src/github.com/dgraph-io/badger/table/builder_test.go:85 +0xaac
testing.tRunner(0x20b2280, 0x467044)
	/usr/lib/go-1.13/src/testing/testing.go:909 +0xa8
created by testing.(*T).Run
	/usr/lib/go-1.13/src/testing/testing.go:960 +0x2ac
FAIL	github.com/dgraph-io/badger/table	0.290s

(FWIW, I guess the last one crashes because the tbl is dereferenced before require.NoError(t, err, "unable to open table") is used and OpenTable probably failed for some reason leaving tbl==nil. )

=== RUN   TestTableIterator/n=101
--- FAIL: TestTableIterator (0.17s)
    --- FAIL: TestTableIterator/n=99 (0.06s)
        table_test.go:108: 
            	Error Trace:	table_test.go:108
            	Error:      	Received unexpected error:
            	            	Invalid filename: 6836921090146300611.sst
            	            	github.com/dgraph-io/badger/table.OpenTable
            	            		/home/ah/badger-2.0.1/obj-arm-linux-gnueabi/src/github.com/dgraph-io/badger/table/table.go:192
            	            	github.com/dgraph-io/badger/table.TestTableIterator.func1
            	            		/home/ah/badger-2.0.1/obj-arm-linux-gnueabi/src/github.com/dgraph-io/badger/table/table_test.go:107
            	            	testing.tRunner
            	            		/usr/lib/go-1.13/src/testing/testing.go:909
            	            	runtime.goexit
            	            		/usr/lib/go-1.13/src/runtime/asm_arm.s:868
            	Test:       	TestTableIterator/n=99
[...]

Note: In table/table_test.go there's this line:
        filename := fmt.Sprintf("%s%s%d.sst", os.TempDir(), string(os.PathSeparator), rand.Int63())
... generating a filename based of a 63bit random number (see also path.Join), later passed into table/table.go ParseFileID function which does:
        id, err := strconv.Atoi(name)
... and on 32bit archs, a 63 bit random number won't fit into a (32bit) integer.

@andhe
Copy link

andhe commented Feb 3, 2020

I've created a patch that seems to make everything build and testsuite pass for me on (atleast) 32bit (arm).
See: https://salsa.debian.org/go-team/packages/badger/blob/master/debian/patches/fix-32bit.patch

Please scream at me if I'm messing something up in badger with my changes. Please feel free to apply my patch (as I don't think it reaches https://en.wikipedia.org/wiki/Threshold_of_originality so no copyright).

@luca-moser
Copy link

Any updates on this? Version v2.0.1 can not be compiled on 32 bit systems.

@jarifibrahim
Copy link
Contributor

jarifibrahim commented Feb 7, 2020

@andhe your patch looks fine to me. I have one question, why did you have to do 1<<28 instead of 1<<29 here https://salsa.debian.org/go-team/packages/badger/blob/master/debian/patches/fix-32bit.patch#L21 ?

@vardhanapoorv will send a PR to fix this.

@jarifibrahim
Copy link
Contributor

Hey @bonedaddy and @luca-moser, I've merged the fix for this issue #1216. Hopefully, the 32-bit overflow issue won't reappear :)

@andhe
Copy link

andhe commented Mar 8, 2020

Hi @jarifibrahim, sorry for not finding the time to get back to you earlier. I just wanted to report back that I've uploaded badger 2.0.2 to Debian (with all my previous local patches dropped) and the test suite doesn't pass (failing the Debian build).

For full build/test logs click the red box at https://buildd.debian.org/status/package.php?p=badger

My previous patches that actually both built and passed the testsuite on 32bit can be found at https://sources.debian.org/src/badger/2.0.1-4/debian/patches/

Note however that it seems like badger will still not actually be usable on 32bit because your allocating all free memory space when running badger standalone, means that when an application that includes badger does a single memory allocation before initializing badger means there's not enough memory space for badger anymore. (At least that's my guess, I haven't really found time to look closer at this.)

I spent a few minutes on trying to get garagemq (which uses badger) to work on 32bit but quickly gave up... My comments for that are at valinurovam/garagemq#26 (comment)

@jarifibrahim
Copy link
Contributor

Hey @andhe I see

/<<PKGBUILDDIR>>/obj-powerpc64-linux-gnu/src/github.com/dgraph-io/badger/skl/skl.go:170: error: undefined reference to 'github.com..z2fdgraph..z2dio..z2fristretto..z2fz.FastRand'

on https://buildd.debian.org/status/package.php?p=badger .

What version of go are you using? I've seen the missing fastrand issue on go 1.10 and befow.

fastrand is loaded from the standard go libray.

@jarifibrahim
Copy link
Contributor

Note however that it seems like badger will still not actually be usable on 32bit because your allocating all free memory space when running badger standalone, means that when an application that includes badger does a single memory allocation before initializing badger means there's not enough memory space for badger anymore. (At least that's my guess, I haven't really found time to look closer at this.)

I don't think I understand this correctly. Why does the application which uses badger consume all the memory?

@gonzojive
Copy link

I think this bug is back again:

./../go/pkg/mod/github.com/dgraph-io/badger/[email protected]/value.go:50:5: constant 4294967295 overflows int

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue These are simple issues that can be picked up by new contributors kind/enhancement Something could be better. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate or work on it.
Development

Successfully merging a pull request may close this issue.

7 participants