Only grow buffer exponentially when marked #129

Closed
Drvi wants to merge 1 commit

Drvi
Member

@Drvi Drvi commented Jan 9, 2023

While working on a custom streaming object for JuliaServices/CloudStore.jl#24, I noticed that when decoding a 500MB gzipped file, the buffers used by TranscodingStreams grow to an unreasonable size. Here are some stats captured inside the fillbuffer function (length, buffersize, marginsize):

[ Info: BUF1: len:  16.000 KiB | buf:     0 bytes | mar:  16.000 KiB *** BUF2: len:  16.000 KiB | buf:     0 bytes | mar:  16.000 KiB
[ Info: BUF1: len:  16.000 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  16.000 KiB | buf:  12.787 KiB | mar:     0 bytes
[ Info: BUF1: len:  16.000 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  20.787 KiB | buf:  17.891 KiB | mar:     0 bytes
[ Info: BUF1: len:  16.000 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  28.284 KiB | buf:  25.381 KiB | mar:     0 bytes
[ Info: BUF1: len:  16.000 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  39.522 KiB | buf:  36.634 KiB | mar:     0 bytes
[ Info: BUF1: len:  16.000 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  56.395 KiB | buf:  53.493 KiB | mar:     0 bytes
[ Info: BUF1: len:  20.422 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  81.690 KiB | buf:  77.928 KiB | mar:     0 bytes
[ Info: BUF1: len:  29.692 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 118.772 KiB | buf: 113.353 KiB | mar:     0 bytes
[ Info: BUF1: len:  43.185 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 172.738 KiB | buf: 164.926 KiB | mar:     0 bytes
[ Info: BUF1: len:  62.823 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 251.295 KiB | buf: 239.991 KiB | mar:     0 bytes
[ Info: BUF1: len:  91.409 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 365.639 KiB | buf: 349.496 KiB | mar:     0 bytes
[ Info: BUF1: len: 133.078 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 532.315 KiB | buf: 508.893 KiB | mar:     0 bytes
[ Info: BUF1: len: 193.762 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 775.050 KiB | buf: 740.910 KiB | mar:     0 bytes
[ Info: BUF1: len: 282.108 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:   1.102 MiB | buf:   1.054 MiB | mar:     0 bytes
[ Info: BUF1: len: 410.831 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:   1.605 MiB | buf:   1.535 MiB | mar:     0 bytes
[ Info: BUF1: len: 598.263 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:   2.337 MiB | buf:   2.236 MiB | mar:     0 bytes
[ Info: BUF1: len: 871.515 KiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:   3.404 MiB | buf:   3.258 MiB | mar:     0 bytes
[ Info: BUF1: len:   1.240 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:   4.960 MiB | buf:   4.752 MiB | mar:     0 bytes
[ Info: BUF1: len:   1.808 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:   7.232 MiB | buf:   6.928 MiB | mar:     0 bytes
[ Info: BUF1: len:   2.636 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  10.544 MiB | buf:  10.100 MiB | mar:     0 bytes
[ Info: BUF1: len:   3.843 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  15.371 MiB | buf:  14.726 MiB | mar:     0 bytes
[ Info: BUF1: len:   5.603 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  22.412 MiB | buf:  21.474 MiB | mar:     0 bytes
[ Info: BUF1: len:   8.170 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  32.680 MiB | buf:  31.331 MiB | mar:     0 bytes
[ Info: BUF1: len:  11.918 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  47.671 MiB | buf:  45.734 MiB | mar:     0 bytes
[ Info: BUF1: len:  17.392 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len:  69.569 MiB | buf:  66.754 MiB | mar:     0 bytes
[ Info: BUF1: len:  25.385 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 101.539 MiB | buf:  97.430 MiB | mar:     0 bytes
[ Info: BUF1: len:  37.050 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 148.199 MiB | buf: 142.209 MiB | mar:     0 bytes
[ Info: BUF1: len:  54.077 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 216.308 MiB | buf: 207.587 MiB | mar:     0 bytes
[ Info: BUF1: len:  78.935 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 315.741 MiB | buf: 303.233 MiB | mar:     0 bytes
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 461.104 MiB | buf: 426.272 MiB | mar:  17.465 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 656.824 MiB | buf: 409.055 MiB | mar: 230.552 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 737.467 MiB | buf: 391.848 MiB | mar: 328.412 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 374.625 MiB | mar: 368.733 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 357.407 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 340.186 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 322.973 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 305.760 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 288.546 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 271.325 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 254.117 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 236.907 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 219.697 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 202.751 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 185.847 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 168.948 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 152.041 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 135.130 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 118.233 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf: 101.325 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf:  84.415 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf:  67.501 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf:  50.591 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf:  33.677 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar:     0 bytes *** BUF2: len: 760.581 MiB | buf:  16.770 MiB | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar: 980.626 KiB *** BUF2: len: 760.581 MiB | buf:     0 bytes | mar: 385.956 MiB
[ Info: BUF1: len: 110.910 MiB | buf:     0 bytes | mar: 980.626 KiB *** BUF2: len: 760.581 MiB | buf:     0 bytes | mar: 385.956 MiB

So to decompress a 500MB file, we end up allocating about 870MB of buffers (BUF1 and BUF2 combined), which is wasteful and very slow for our prefetched downloading stream. With this change, the buffers don't change in size while decompressing the same file.

IIUC, #121 was introduced to help with cases where a significant part of the buffer is marked, so I restricted the growth to be proportional to the size of the marked region.

CC: @jakobnissen @quinnj

@jakobnissen
Collaborator

I think there is a logic bug somewhere else in the code. makemargin! ought to increase the buffer size only when the mark is set and it is therefore not possible to discard enough data to make room for the new data.
I will investigate what is happening and report back.
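
To make the intended behavior concrete, here is a minimal toy sketch: first reclaim consumed, unmarked bytes, and grow only if that still leaves too little room. This is not the TranscodingStreams implementation; `ToyBuffer`, `toy_makemargin!`, and the field layout are hypothetical names assumed for illustration.

```julia
# Toy model only: all names here are hypothetical, not the TranscodingStreams API.
mutable struct ToyBuffer
    data::Vector{UInt8}
    markpos::Int    # 0 means "no mark set"
    bufferpos::Int  # index of the next unread byte
    marginpos::Int  # index of the next free byte
end

ToyBuffer(n::Integer) = ToyBuffer(Vector{UInt8}(undef, n), 0, 1, 1)

# Number of free bytes at the end of the buffer.
marginsize(buf::ToyBuffer) = length(buf.data) - buf.marginpos + 1

# Ensure at least `minsize` free bytes at the end of the buffer.
function toy_makemargin!(buf::ToyBuffer, minsize::Integer)
    # Bytes before `keepfrom` are consumed and not protected by a mark.
    keepfrom = buf.markpos == 0 ? buf.bufferpos : min(buf.markpos, buf.bufferpos)
    if marginsize(buf) < minsize && keepfrom > 1
        # Step 1: reclaim space by shifting the live bytes to the front.
        nlive = buf.marginpos - keepfrom
        copyto!(buf.data, 1, buf.data, keepfrom, nlive)
        shift = keepfrom - 1
        buf.markpos > 0 && (buf.markpos -= shift)
        buf.bufferpos -= shift
        buf.marginpos -= shift
    end
    if marginsize(buf) < minsize
        # Step 2: grow only when discarding was not enough, and only by the shortfall.
        resize!(buf.data, buf.marginpos + minsize - 1)
    end
    return marginsize(buf)
end

# Fully consumed buffer without a mark: space is reclaimed, no growth.
unmarked = ToyBuffer(16)
unmarked.bufferpos = unmarked.marginpos = 17
toy_makemargin!(unmarked, 8)

# Same state, but a mark at position 1 pins all the data: the buffer must grow.
marked = ToyBuffer(16)
marked.bufferpos = marked.marginpos = 17
marked.markpos = 1
toy_makemargin!(marked, 8)
```

In this sketch the unmarked buffer stays at 16 bytes, while the marked one grows to 24 bytes because none of its data can be discarded.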

@jakobnissen
Collaborator

jakobnissen commented Jan 9, 2023

Fixed by #130. @Drvi, can you confirm that this fix stops the unneeded buffer growth in your use case?

With that fix, the buffer is not resized when running the following test code:

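julia> using CodecZlib  # provides GzipDecompressorStream

julia> open(GzipDecompressorStream, PATH_TO_LARGE_GZIP_FILE) do io
           while true
               buf = zeros(UInt8, 1024)
               # read up to 1 KiB at a time; stop once no more bytes are returned
               iszero(readbytes!(io, buf, 1024)) && break
           end
       end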

Yet it still grows buffers exponentially when needed.

@Drvi
Member Author

Drvi commented Jan 9, 2023

@jakobnissen Yes, your PR solved my problem! Thanks so much for the quick, proper fix :)

@Drvi Drvi closed this Jan 9, 2023
@jakobnissen
Collaborator

jakobnissen commented Jan 9, 2023

Okay, I'll merge the other one and tag a new release (if I can).
@quinnj It turns out I don't have merge rights to this repo. Can you merge #130 and tag a new release of TranscodingStreams?
