
Let Zlib::Reader, Gzip::Reader and Flate::Reader include IO::Buffered for optimized performance #6916

Merged
merged 1 commit into crystal-lang:master from opt/gzip-zlib-flate on Oct 18, 2018

Conversation

asterite
Member

@asterite asterite commented Oct 7, 2018

All of these readers wrap another IO from which they read. The problem is that methods like gets, which read until some character (usually a newline) is found, are much faster with input buffering: reading many bytes at once instead of going byte by byte (or worse, UTF-8 char by char). gets relies on a peek method, when available, to scan the upcoming bytes quickly, and the easiest way to provide that is to include IO::Buffered, which implements it.

(There might be a way to implement this without including IO::Buffered, for example by giving Zlib::Reader an output buffer similar to the one in IO::Buffered, decompressing into it and copying from it into the slice requested by read, but that would basically duplicate the work already done in IO::Buffered, so it's probably not worth it.)
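For context, this is roughly the shape such a reader takes when including the module (a simplified sketch, not the actual Flate::Reader code): IO::Buffered provides peek, gets and the buffer management, and the includer only implements the unbuffered_* primitives.

class SomeWrappingReader < IO
  include IO::Buffered

  def initialize(@io : IO)
  end

  # Called by IO::Buffered whenever its internal read buffer is empty.
  # A real implementation would decompress into `slice`; here we just
  # forward the read to show the structure.
  def unbuffered_read(slice : Bytes)
    @io.read(slice)
  end

  def unbuffered_write(slice : Bytes)
    raise IO::Error.new "Can't write to SomeWrappingReader"
  end

  def unbuffered_flush
    raise IO::Error.new "Can't flush SomeWrappingReader"
  end

  def unbuffered_rewind
    raise IO::Error.new "Can't rewind SomeWrappingReader"
  end

  def unbuffered_close
    @io.close
  end
end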

Note that this will require an additional 8KB of memory for each of these IOs, but that's fine because one usually doesn't deal with a lot of IOs at a time.

This optimization came about because the slowness was reported on IRC.

Also, checking what Python does: its gzip reader is buffered too (and I guess that's the case for Ruby as well).

(Additional research, if someone wants to tackle it: check whether other IOs in the std, notably HTTP responses, might need this optimization too.)

Benchmark follows:

require "gzip"
require "zlib"
require "flate"

FILENAME = "test.gzip"

def benchmark(type : T.class) forall T
  File.open(FILENAME, "w") do |file|
    T::Writer.open(file) do |writer|
      1_000_000.times do |i|
        writer.puts "This is line number #{i}"
      end
    end
  end

  i = 0
  puts "#{T}:"
  puts Time.measure {
    File.open(FILENAME) do |file|
      T::Reader.open(file) do |reader|
        reader.each_line do
          i += 1
        end
      end
    end
  }

  puts i
end

benchmark(Gzip)
benchmark(Zlib)
benchmark(Flate)

Old output:

Gzip:
00:00:03.052840054
1000000
Zlib:
00:00:02.439849829
1000000
Flate:
00:00:01.947586854
1000000

New output:

Gzip:
00:00:00.103666309
1000000
Zlib:
00:00:00.095306070
1000000
Flate:
00:00:00.090960507
1000000

Gzip: 29 times faster
Zlib: 25 times faster
Flate: 21 times faster

@bcardiff
Member

bcardiff commented Oct 7, 2018

Could composition of IOs be used to reach this performance?

It would be great to keep IO as orthogonal as possible, IMO.

If we chain two or more IOs that all happen to be buffered, I think we will end up with intermediate buffers that are not really used and just fill memory.

@asterite
Member Author

asterite commented Oct 7, 2018

@bcardiff Yes, for that we would need to remove the IO::Buffered module and turn it into a wrapper that you wrap other IOs with. It would also be a silent breaking change if File, for example, stopped buffering by default.

On the other hand, if you have a File and a Flate::Reader, the raw bytes are buffered by the File and the decompressed bytes are buffered by the Flate::Reader, so the two buffers hold different data and it might make sense (as in, it might be faster) to have both. I'm not sure, though; someone will have to benchmark that.
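To illustrate the composition idea (purely hypothetical: IO::BufferedWrapper below does not exist in the stdlib, it's just a name for the concept):

file = File.new("data.gz")
file.read_buffering = false            # let only the outermost layer do the buffering
gzip = Gzip::Reader.new(file)          # decompression layer, itself unbuffered in this model
io   = IO::BufferedWrapper.new(gzip)   # hypothetical wrapper adding the peek/gets read buffer on top
io.each_line { |line| puts line }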

@asterite
Member Author

asterite commented Oct 7, 2018

Actually, you can try it by setting file.read_buffering = false. It takes the same time, so yeah, having multiple buffers is redundant. That said, it's convenient for the user and only costs 8KB more per IO (that's a lot for a buffer, but usually not many IOs are open at the same time).
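For reference, the experiment is just the reading half of the benchmark above with the file's own buffering turned off (sketch):

File.open(FILENAME) do |file|
  file.read_buffering = false  # rely only on Gzip::Reader's buffer
  Gzip::Reader.open(file) do |reader|
    reader.each_line { }
  end
end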

@asterite
Member Author

asterite commented Oct 7, 2018

Actually, setting read_buffering = false on both the file and the Gzip::Reader gives the same time, so read_buffering is probably not working correctly...

@asterite
Member Author

asterite commented Oct 8, 2018

I just added another commit that fixes IO::Buffered#peek: it wasn't honoring the read_buffering setting.

With that, if I set read_buffering = false on the File, it becomes very, very slow. The reason is that:

  • you ask, say, Flate::Reader for a line
  • Flate::Reader is now buffered, so it will try to decompress 8KB instead of just one line
  • to do this, it first needs to read compressed bytes from the file. There's a check there to avoid reading more than we need, and if we can't peek we have to read byte by byte (because we don't implement flate in Crystal ourselves and rely on zlib), and that is slow. If the file is buffered we can peek and it's fast.

Anyway, I think adding buffering at each level is good. And, again, 8KB per IO is not a big deal, considering that the OS can't have that many file descriptors open at once.

@asterite
Member Author

asterite commented Oct 8, 2018

By the way, with this commit I also fixed rewind for these types, which previously didn't work.

@straight-shoota
Member

straight-shoota commented Oct 8, 2018

Yeah, it makes a difference to have multiple buffers in the IO chain. All potentially costly read operations should be buffered to speed things up.

Gzip::Reader#read, for example, issues a number of smaller reads on the wrapped IO. When that's a file, the file needs a buffer of its own, just as the Gzip::Reader itself benefits from one.

I've taken a quick look at the other IO implementations.

OpenSSL::DigestIO, Zip::ChecksumReader and Zip::ChecksumWriter could probably benefit from being buffered, because every read/write operation updates the digest.

HTTP content IOs look fine, though. They mostly pass through.

A buffer is only useful if there is a significant performance difference between, for example, 1000 calls to read_byte and a single read(Bytes.new(1000)).
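A rough way to measure that difference, reusing FILENAME from the benchmark above (just a sketch):

# many single-byte reads vs. one large slice read on a Gzip::Reader
File.open(FILENAME) do |file|
  Gzip::Reader.open(file) do |reader|
    puts Time.measure { 100_000.times { reader.read_byte } }
  end
end

File.open(FILENAME) do |file|
  Gzip::Reader.open(file) do |reader|
    puts Time.measure { reader.read_fully(Bytes.new(100_000)) }
  end
end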

@asterite
Member Author

asterite commented Oct 8, 2018

@straight-shoota Thank you for the research!

I think buffering is not needed for wrappers as long as they provide a good peek implementation, which is what gets and all the methods that scan for something while reading bytes rely on. Zip::ChecksumReader already has peek, but OpenSSL::DigestIO doesn't and could use one (maybe in another PR).
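For OpenSSL::DigestIO that could be a small delegating method along these lines (a sketch, not part of this PR; assuming the wrapped IO is stored in @io):

require "openssl"

class OpenSSL::DigestIO
  # Peeking doesn't consume bytes, so it doesn't need to update the digest;
  # the bytes are still fed to the digest when they are actually read.
  def peek
    @io.peek
  end
end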

@straight-shoota
Member

Relying on peek is an incomplete solution. It can eliminate the need for buffering in some situations. But when you have a known data format and just issue a number of smaller reads, you don't need to peek and it wouldn't help in any way.
Plus, not every IO necessarily supports peek at all.

Member

@sdogruyol sdogruyol left a comment

Thank you @asterite 👍

@RX14
Contributor

RX14 commented Oct 10, 2018

This is fine to implement now that we have read_buffering= available.

Contributor

@RX14 RX14 left a comment

And ditto for all the other impls

@@ -119,8 +125,12 @@ class Flate::Reader < IO
    end
  end

  def unbuffered_flush
    raise IO::Error.new "Can't flush Flate::Reader"
Contributor

Shouldn't this be: io.flush? It makes no sense to raise here.

Member Author

Well, it's a reader, but yeah, we can delegate

Contributor

Do other readers raise on flush? I thought the default semantics for flush were to silently do nothing when there's nothing to flush.

Member Author

Yes, other readers raise on flush.

@@ -130,6 +140,14 @@ class Flate::Reader < IO
    @io.close if @sync_close
  end

  def unbuffered_rewind
    raise IO::Error.new "Closed stream" if closed?
Contributor

check_open.

Member Author

Thanks! I updated the code.

Contributor

Why not just use a check_open call, like @RX14 suggested?

@RX14
Contributor

RX14 commented Oct 10, 2018

Reading from any IO::Buffered into a slice larger than 4096 bytes bypasses the buffering (safely) anyway, so IO::Buffered impls should compose well. Unfortunately Flate::Reader only ever reads 1 byte at a time or uses peek, which defeats that...

I'm not sure Gzip::Reader should be buffered, because it always delegates to flate_io. Same for Zlib::Reader. Either they should be unbuffered, or they should turn off buffering on their Flate::Readers.

@straight-shoota
Member

@RX14 Gzip and Zlib both delegate reading the entire slice to Flate, so yes, as long as they themselves are buffered, a Flate reader wrapped in such a way shouldn't need a read buffer as well.

@asterite
Member Author

@RX14 @straight-shoota Yeah, the buffer probably isn't needed for gzip and zlib, but if I don't include IO::Buffered it's slow. Maybe we'll have to see how to implement peek, but I have no idea how to do it. Feel free to play with that; if you can implement peek we can remove IO::Buffered.

@RX14
Contributor

RX14 commented Oct 10, 2018

With Zlib::Reader you can just forward peek to @flate_io.

With Gzip::Reader you also just need to forward peek, but handle peek.size == 0 specially, copying the contents of the read_bytes == 0 branch pretty much exactly. That branch can probably be split out into a read_next_header method.
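For Zlib::Reader the forwarding would be a one-liner (sketch, assuming the wrapped flate reader is stored in @flate_io):

def peek
  @flate_io.peek
end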

@asterite
Member Author

But:

  • Zlib::Reader has a section that reads a checksum, and that isn't returned by read.
  • Gzip::Reader must handle a header, and I'm not sure how to account for that in peek.

I think it's better for me to close this PR so you can work on all of these things, because I won't have time to update it with more changes.

@RX14
Contributor

RX14 commented Oct 10, 2018

@asterite notice that in Zlib::Reader there are only special cases when @flate_io.read returns zero, i.e. EOF, so they don't affect peek.

Gzip::Reader has a special case because @flate_io's EOF doesn't mean the reader's EOF, but you just have to loop over peek = @flate_io.peek and read_next_header until the peeked size is > 0, to avoid accidentally returning EOF from peek.
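Something along these lines (a sketch; read_next_header is the helper suggested above, assumed to return false when there are no more gzip members, and nil handling of @flate_io is omitted):

def peek
  loop do
    bytes = @flate_io.peek
    # nil (peek unsupported) or actual data: return it as-is
    return bytes unless bytes && bytes.empty?
    # empty peek: the current gzip member ended; start the next one,
    # otherwise report EOF (the empty slice) for real
    return bytes unless read_next_header
  end
end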

@straight-shoota
Member

No, Zlib::Reader and Gzip::Reader both need to include IO::Buffered. Their single-byte reads are much slower than reading a bigger slice.
But they can disable the buffer in the wrapped Flate::Reader, because when they are buffered themselves they only issue large slice reads to the Flate reader.
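Inside Gzip::Reader/Zlib::Reader that would look roughly like this (sketch; ivar names assumed):

@flate_io = Flate::Reader.new(@io)
@flate_io.read_buffering = false  # the outer reader is already buffered, so skip the inner buffer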

@asterite
Member Author

Ah, that makes sense. I'll try it later and update the code.

@RX14
Contributor

RX14 commented Oct 10, 2018

Might as well keep this PR as-is then because of the 4KiB rule.

Ultimately, this needs benchmarks to know what the right behaviour is.

@asterite
Member Author

Yes, I think the current PR is fine. Feel free to merge it.

@asterite
Member Author

@RX14 Ping. Other readers also raise on flush, or at least File::PReader does. We can maybe leave this discussion for another issue/PR.

@RX14
Contributor

RX14 commented Oct 18, 2018

I'm not sure what you mean regarding flush; I'm talking about this failing:

io.read_byte
# io now has a read buffer of size 8192, of which 8191 bytes are still unread
io.read_buffering = false
io.gets
# io.gets will return only a maximum of 8191 bytes, even if the next `\n` or `\r\n` is beyond that size.

@asterite
Member Author

For that I suggest removing read_buffering. As I said, it's only needed for secure random. For instance, you can't disable read buffering in Ruby.

@asterite
Member Author

Or we can fix IO::Buffered#gets to check read_buffering in addition to peek, but that's not really related to this PR. Or, yes, I can remove the "fix" for read_buffering; would that be OK?

@asterite asterite force-pushed the opt/gzip-zlib-flate branch from 06453a1 to a84c71d on October 18, 2018 12:06
@asterite
Member Author

Okay, I removed the read_buffering changes.

@RX14
Contributor

RX14 commented Oct 18, 2018

It's fine to remove that fix for now, but the bug still persists and should be reported in another issue.

@asterite asterite added this to the 0.27.0 milestone Oct 18, 2018
@asterite asterite merged commit e62a944 into crystal-lang:master Oct 18, 2018
    return if @closed
    @closed = true

    @flate_io.try &.close
    @io.close if @sync_close
  end

  def unbuffered_rewind
    raise IO::Error.new "Closed stream" if closed?
Contributor

ditto

    return if @closed
    @closed = true

    @flate_io.close
    @io.close if @sync_close
  end

  def unbuffered_rewind
    raise IO::Error.new "Closed stream" if closed?
Contributor

ditto

Member Author

Yeah, it was like that, but I had to roll back to a previous version and forgot to add it back.

Member Author

Someone can send a PR, or commit directly to master with it (I'm trying but I can't).

Contributor

#6958 :)
