
gzip: fix memory allocs (buffers not returned to pool) #28

Merged (6 commits from fix-memory-allocs into klauspost:master, Feb 12, 2020)

Conversation

@wojas (Contributor) commented Feb 11, 2020

Fix allocation leaks during gzip writes. Due to incorrect use of the dstPool (unmatched Get and Put), large amounts of memory were temporarily allocated during writes and not put back into the pool.

This also removes some special handling code in compressCurrent that would recursively call itself for too large input buffers. This condition can never occur, because Write ensures that blocks are capped
and there is no other public interface that extends currentBuffer. The recursive call that slices the buffer would have made returning byte slices to the Pool dangerous, as we could have been returning the same
underlying buffer multiple times.

This also adds a test to check allocations per Write to prevent regressions. There is further room for improvement, but this was by far the biggest leak.

Closes #8

Additionally, this adds a go.mod for Go modules support.

(Note that tests broke with recent commit a8ba214).
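
To make the Get/Put pairing concrete, below is a minimal sketch of the discipline the fix enforces: every buffer taken from the pool is put back exactly once, and the data handed to the caller is never a slice of the pooled buffer. The pool, sizes, and helper function are illustrative placeholders, not the library's actual code.

package main

import (
	"bytes"
	"compress/gzip"
	"sync"
)

// dstPool stands in for the destination-buffer pool discussed above.
var dstPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 1<<20) },
}

// compressBlock takes one buffer from the pool and returns it exactly once.
// The result is copied out, so the pooled buffer is never aliased after Put.
func compressBlock(src []byte) ([]byte, error) {
	dst := dstPool.Get().([]byte)[:0]
	buf := bytes.NewBuffer(dst)
	gw := gzip.NewWriter(buf)
	if _, err := gw.Write(src); err != nil {
		dstPool.Put(dst)
		return nil, err
	}
	if err := gw.Close(); err != nil {
		dstPool.Put(dst)
		return nil, err
	}
	out := append([]byte(nil), buf.Bytes()...)
	dstPool.Put(buf.Bytes()[:0]) // return the (possibly grown) buffer, once
	return out, nil
}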

@klauspost (Owner) commented:

Thanks. I will review tomorrow.

Please take out the go.mod - I have not decided to go that route yet.

@wojas force-pushed the fix-memory-allocs branch from 8d889e8 to c548e59 on February 11, 2020 05:26
@wojas (Contributor, Author) commented Feb 11, 2020

Thanks for the fast response. I took out the go.mod.

@wojas (Contributor, Author) commented Feb 11, 2020

I found that the tests pass without -v, but fail with -v... Investigating.

The allocation test was incorrectly using Alloc (the current number of
allocated bytes) instead of TotalAlloc (the cumulative number of
allocated bytes), which would break the test if garbage collection
happened during the run.

This also raises the allowed allocation threshold to tolerate occasional
background allocations. I once saw 3kB instead of the usual <700B of
allocations, which is still low enough to detect pool usage issues.

In the panics, make clear that these can only happen due to races between
concurrent Write calls, which are unsafe. The alternative would be to
silently corrupt data.

Move the panic to the beginning of compressCurrent so that tests do not
deadlock after a panic.

Use a pool buffer for prevTail instead of slicing the current buffer,
which may have been returned to the pool by the time the tail is
accessed.

Skip the allocation tests when running with the race detector, because
the race detector allocates a lot of memory.

Rename the unreliable tests file.
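
As a rough illustration of the measurement described above, here is a sketch of a per-Write allocation check built on TotalAlloc. The writer, data sizes, iteration count, and 10240-byte threshold are placeholders (the standard compress/gzip writer stands in for the library's own Writer), so treat it as the shape of the test rather than the repository's actual code.

package main

import (
	"bytes"
	"compress/gzip"
	"runtime"
	"testing"
)

func TestWriteAllocations(t *testing.T) {
	var sink bytes.Buffer
	w := gzip.NewWriter(&sink) // stand-in for the concurrent Writer under test
	data := bytes.Repeat([]byte("T"), 100000)
	w.Write(data) // warm-up, so one-time setup allocations are excluded

	// TotalAlloc only ever grows, so a garbage collection during the loop
	// cannot skew the result the way Alloc (current live bytes) did.
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	const n = 50
	for i := 0; i < n; i++ {
		if _, err := w.Write(data); err != nil {
			t.Fatal(err)
		}
	}

	runtime.ReadMemStats(&after)
	perWrite := float64(after.TotalAlloc-before.TotalAlloc) / n
	t.Logf("allocated %.0f bytes per Write", perWrite)
	if perWrite > 10240 {
		t.Errorf("Write allocated too much memory per run (%.0f bytes), Pool used incorrectly?", perWrite)
	}
	w.Close()
}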
@wojas (Contributor, Author) commented Feb 11, 2020

This PR should be complete as-is. The remaining test failures are also present in master, and I cannot reproduce them locally in my branch, which is based on v1.2.1.

I suspect they are caused by a8ba214.

@klauspost (Owner) commented:

Thanks! I'll take a look. 👍

@klauspost (Owner) left a review comment:

Thanks for taking a look at it. It is some quite old code that I haven't touched in a while, so I also have some catching up to do.

gzip_norace_test.go: two review threads (outdated, resolved)
// but it's better than the 175846 bytes allocated before the pool release fix that this tests for.
// TODO: Further reduce allocations
if allocBytes > 10240 {
	t.Errorf("Write allocated too much memory per run (%.0f bytes), Pool used incorrectly?", allocBytes)
}
@klauspost (Owner):

For the failures it reports gzip_test.go:637: Write allocated too much memory per run (18446744073709436 bytes), Pool used incorrectly?

That is 18446.7TB - I think you would be hard pressed to allocate that much, so I suspect there is something wrong.

@wojas (Contributor, Author):

Are you sure this happened with the latest version? I noticed this in earlier test runs and 799eba1 fixes this. Before that commit I was using .Alloc instead of .TotalAlloc.

From the MemStats comments:

TotalAlloc increases as heap objects are allocated, but unlike Alloc and HeapAlloc, it does not decrease when objects are freed.

@klauspost (Owner):

Yeah, seems I mixed up the runs. Anyway, merging master should fix the writer.


// Read the final statistics
runtime.ReadMemStats(&memstats)
allocs := memstats.TotalAlloc - oldTotal
@klauspost (Owner):

I suspect the error is caused by oldTotal ending up larger than memstats.TotalAlloc, so the subtraction underflows and produces the crazy numbers.

Not really an expert on how that can happen.

@wojas (Contributor, Author):

See the previous comment.

w.SetConcurrency(1000, 1)
data := bytes.Repeat([]byte("T"), 100000) // varying block splits

const n = 1000
@klauspost (Owner):

Maybe:

Suggested change:

-const n = 1000
+n := 1000
+if testing.Short() {
+	n = 10
+}

(Or change the i++ to use bigger increments.)

Though you have this guarded nicely by tags.
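
For context, the usual way to keep a test file out of race-detector runs is the race build tag, which the Go toolchain defines automatically under -race. A sketch of such a guard follows; the file and package names are assumptions, not necessarily the repository's actual layout.

// +build !race

// gzip_norace_test.go (illustrative): the toolchain sets the race build
// tag when the race detector is enabled, so this file is compiled out
// under go test -race and its allocation checks never run there.
package pgzip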

@wojas (Contributor, Author):

This test tries to reproduce a race condition. I tried, but I cannot reproduce it with n = 10.

@klauspost (Owner):

Ah, OK, no problem.

@wojas force-pushed the fix-memory-allocs branch from a13a592 to 6b5bd9d on February 12, 2020 05:08
@klauspost merged commit c94cf00 into klauspost:master on Feb 12, 2020
@klauspost (Owner) commented:

Cool. I will give it some testing before the next release. Thanks a bunch!

Successfully merging this pull request may close: dictFlatePool not helping much (lots of new flate compressor allocations) (#8)