
Let Zlib::Reader, Gzip::Reader and Flate::Reader include IO::Buffered for optimized performance #6916

Merged
merged 1 commit into crystal-lang:master from opt/gzip-zlib-flate on Oct 18, 2018

Conversation

asterite
Member

@asterite asterite commented Oct 7, 2018

All of these readers wrap another IO from which they read. The problem is that methods like gets, which read until some character (usually a newline) is found, are much faster with input buffering: reading many bytes at once instead of going byte by byte (or worse, UTF-8 char by char). gets relies on a peek method, when available, to scan the upcoming bytes quickly, and the easiest way to provide that is to include IO::Buffered, which implements it.

(There might be a way to implement this without including IO::Buffered, for example by giving Zlib::Reader an output buffer similar to the one in IO::Buffered, decompressing into it and copying from it into the slice requested by read, but that would basically duplicate the work already done in IO::Buffered, so it's probably not worth it.)
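For context, this is roughly the shape such a reader takes when including the module (a simplified sketch, not the actual Flate::Reader code): IO::Buffered provides peek, gets and the buffer management, and the includer only implements the unbuffered_* primitives.

class SomeWrappingReader < IO
  include IO::Buffered

  def initialize(@io : IO)
  end

  # Called by IO::Buffered whenever its internal read buffer is empty.
  # A real implementation would decompress into `slice`; here we just
  # forward the read to show the structure.
  def unbuffered_read(slice : Bytes)
    @io.read(slice)
  end

  def unbuffered_write(slice : Bytes)
    raise IO::Error.new "Can't write to SomeWrappingReader"
  end

  def unbuffered_flush
    raise IO::Error.new "Can't flush SomeWrappingReader"
  end

  def unbuffered_rewind
    raise IO::Error.new "Can't rewind SomeWrappingReader"
  end

  def unbuffered_close
    @io.close
  end
end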

Note that this will require an additional 8KB of memory for each of these IOs, but that's fine because one usually doesn't deal with a lot of IOs at a time.

This optimization came about because the slowness was reported on IRC.

Also, checking what Python does: its gzip reader is buffered too (and I guess that's the case for Ruby as well).

(Additional research, if someone wants to tackle it: check whether other IOs in the std, notably HTTP responses, might need this optimization too.)

Benchmark follows:

require "gzip"
require "zlib"
require "flate"

FILENAME = "test.gzip"

def benchmark(type : T.class) forall T
  File.open(FILENAME, "w") do |file|
    T::Writer.open(file) do |writer|
      1_000_000.times do |i|
        writer.puts "This is line number #{i}"
      end
    end
  end

  i = 0
  puts "#{T}:"
  puts Time.measure {
    File.open(FILENAME) do |file|
      T::Reader.open(file) do |reader|
        reader.each_line do
          i += 1
        end
      end
    end
  }

  puts i
end

benchmark(Gzip)
benchmark(Zlib)
benchmark(Flate)

Old output:

Gzip:
00:00:03.052840054
1000000
Zlib:
00:00:02.439849829
1000000
Flate:
00:00:01.947586854
1000000

New output:

Gzip:
00:00:00.103666309
1000000
Zlib:
00:00:00.095306070
1000000
Flate:
00:00:00.090960507
1000000

Gzip: 29 times faster
Zlib: 25 times faster
Flate: 21 times faster

@bcardiff
Member

bcardiff commented Oct 7, 2018

Could composition of IOs be used to reach this performance?

It would be great to keep IO as orthogonal as possible, IMO.

If we chain two or more IOs that all happen to be buffered, I think we will end up with intermediate buffers that are not really used and just fill memory.

@asterite
Member Author

asterite commented Oct 7, 2018

@bcardiff Yes, for that we would need to remove the IO::Buffered module and turn it into a wrapper that you wrap other IOs with. It would also be a silent breaking change if File, for example, stopped buffering by default.

On the other hand, if you have a File and a Flate::Reader, the raw bytes are buffered by the File and the decompressed bytes are buffered by the Flate::Reader, so the two buffers hold different data and it might make sense (as in, it might be faster) to have both. I'm not sure, though; someone will have to benchmark that.
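To illustrate the composition idea (purely hypothetical: IO::BufferedWrapper below does not exist in the stdlib, it's just a name for the concept):

file = File.new("data.gz")
file.read_buffering = false            # let only the outermost layer do the buffering
gzip = Gzip::Reader.new(file)          # decompression layer, itself unbuffered in this model
io   = IO::BufferedWrapper.new(gzip)   # hypothetical wrapper adding the peek/gets read buffer on top
io.each_line { |line| puts line }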

@asterite
Member Author

asterite commented Oct 7, 2018

Actually, you can try it by setting file.read_buffering = false. It takes the same time, so yeah, having multiple buffers is redundant. That said, it's convenient for the user and only costs 8KB more per IO (that's a lot for a buffer, but usually not many IOs are open at the same time).
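For reference, the experiment is just the reading half of the benchmark above with the file's own buffering turned off (sketch):

File.open(FILENAME) do |file|
  file.read_buffering = false  # rely only on Gzip::Reader's buffer
  Gzip::Reader.open(file) do |reader|
    reader.each_line { }
  end
end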

@asterite
Member Author

asterite commented Oct 7, 2018

Actually, setting read_buffering = false on both the file and the Gzip::Reader gives the same time, so read_buffering is probably not working correctly...

@asterite
Member Author

asterite commented Oct 8, 2018

I just added another commit that fixes IO::Buffered#peek: it wasn't honoring the read_buffering setting.

With that, if I set read_buffering = false on the File, it becomes very, very slow. The reason is that:

  • you ask, say, Flate::Reader for a line
  • Flate::Reader is now buffered, so it will try to decompress 8KB instead of just one line
  • to do this, it first needs to read compressed bytes from the file. There's a check there to avoid reading more than we need, and if we can't peek we have to read byte by byte (because we don't implement flate in Crystal ourselves and rely on zlib), and that is slow. If the file is buffered we can peek and it's fast.

Anyway, I think adding buffering at each level is good. And, again, 8KB per IO is not a big deal, considering that the OS can't have that many file descriptors open at once.

@asterite
Member Author

asterite commented Oct 8, 2018

By the way, with this commit I also fixed rewind for these types, which previously didn't work.

@straight-shoota
Member

straight-shoota commented Oct 8, 2018

Yeah, it makes a difference to have multiple buffers in the IO chain. All potentially costly read operations should be buffered to speed things up.

Gzip::Reader#read, for example, issues a number of smaller reads on the wrapped IO. When that's a file, the file needs a buffer of its own, just as the Gzip::Reader itself benefits from one.

I've taken a quick look at the other IO implementations.

OpenSSL::DigestIO, Zip::ChecksumReader and Zip::ChecksumWriter could probably benefit from being buffered, because every read/write operation updates the digest.

HTTP content IOs look fine, though. They mostly pass through.

A buffer is only useful if there is a significant performance difference between, for example, 1000 calls to read_byte and a single read(Bytes.new(1000)).
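A rough way to measure that difference, reusing FILENAME from the benchmark above (just a sketch):

# many single-byte reads vs. one large slice read on a Gzip::Reader
File.open(FILENAME) do |file|
  Gzip::Reader.open(file) do |reader|
    puts Time.measure { 100_000.times { reader.read_byte } }
  end
end

File.open(FILENAME) do |file|
  Gzip::Reader.open(file) do |reader|
    puts Time.measure { reader.read_fully(Bytes.new(100_000)) }
  end
end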

@asterite
Member Author

asterite commented Oct 8, 2018

@straight-shoota Thank you for the research!

I think buffering is not needed for wrappers as long as they provide a good peek implementation, which is what gets and all the methods that scan for something while reading bytes rely on. Zip::ChecksumReader already has peek, but OpenSSL::DigestIO doesn't and could use one (maybe in another PR).
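For OpenSSL::DigestIO that could be a small delegating method along these lines (a sketch, not part of this PR; assuming the wrapped IO is stored in @io):

require "openssl"

class OpenSSL::DigestIO
  # Peeking doesn't consume bytes, so it doesn't need to update the digest;
  # the bytes are still fed to the digest when they are actually read.
  def peek
    @io.peek
  end
end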

@straight-shoota
Member

Relying on peek is an incomplete solution. It can eliminate the need for buffering in some situations. But when you have a known data format and just issue a number of smaller reads, you don't need to peek and it wouldn't help in any way.
Plus, not every IO necessarily supports peek at all.

Member

@sdogruyol sdogruyol left a comment

Thank you @asterite 👍

@RX14
Contributor

RX14 commented Oct 10, 2018

This is fine to implement now that we have read_buffering= available.

Contributor

@RX14 RX14 left a comment

And ditto for all the other impls

@@ -119,8 +125,12 @@ class Flate::Reader < IO
    end
  end

  def unbuffered_flush
    raise IO::Error.new "Can't flush Flate::Reader"
Contributor

Shouldn't this be: io.flush? It makes no sense to raise here.

Member Author

Well, it's a reader, but yeah, we can delegate

Contributor

Do other readers raise on flush? I thought the default semantics for flush were to silently do nothing when there's nothing to flush.

Member Author

Yes, other readers raise on flush.

@@ -130,6 +140,14 @@ class Flate::Reader < IO
    @io.close if @sync_close
  end

  def unbuffered_rewind
    raise IO::Error.new "Closed stream" if closed?
Contributor

check_open.

Member Author

Thanks! I updated the code.

Contributor

Why not just use a check_open call, like @RX14 suggested?

@RX14
Contributor

RX14 commented Oct 10, 2018

Reading from any IO::Buffered into a slice larger than 4096 bytes bypasses the buffering (safely) anyway, so IO::Buffered impls should compose well. Unfortunately Flate::Reader only ever reads 1 byte at a time or uses peek, which defeats that...

I'm not sure Gzip::Reader should be buffered, because it always delegates to flate_io. Same for Zlib::Reader. Either they should be unbuffered, or they should turn off buffering on their Flate::Readers.

@straight-shoota
Member

@RX14 Gzip and Zlib both delegate reading the entire slice to Flate, so yes, as long as they themselves are buffered, a Flate reader wrapped in such a way shouldn't need a read buffer as well.

@asterite
Member Author

@RX14 @straight-shoota Yeah, the buffer probably isn't needed for gzip and zlib, but if I don't include IO::Buffered it's slow. Maybe we'll have to see how to implement peek, but I have no idea how to do it. Feel free to play with that; if you can implement peek we can remove IO::Buffered.

@RX14
Contributor

RX14 commented Oct 10, 2018

With Zlib::Reader you can just forward peek to @flate_io.

With Gzip::Reader you also just need to forward peek, but handle peek.size == 0 specially, copying the contents of the read_bytes == 0 branch pretty much exactly. That branch can probably be split out into a read_next_header method.
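For Zlib::Reader the forwarding would be a one-liner (sketch, assuming the wrapped flate reader is stored in @flate_io):

def peek
  @flate_io.peek
end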

@asterite
Member Author

But:

  • Zlib::Reader has a section that reads a checksum, and that isn't returned by read.
  • Gzip::Reader must handle a header, and I'm not sure how to account for that in peek.

I think it's better for me to close this PR so you can work on all of these things, because I won't have time to update it with more changes.

@RX14
Contributor

RX14 commented Oct 10, 2018

@asterite notice that in Zlib::Reader there are only special cases when @flate_io.read returns zero, i.e. EOF, so they don't affect peek.

Gzip::Reader has a special case because @flate_io's EOF doesn't mean the reader's EOF, but you just have to loop over peek = @flate_io.peek and read_next_header until the peeked size is > 0, to avoid accidentally returning EOF from peek.
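Something along these lines (a sketch; read_next_header is the helper suggested above, assumed to return false when there are no more gzip members, and nil handling of @flate_io is omitted):

def peek
  loop do
    bytes = @flate_io.peek
    # nil (peek unsupported) or actual data: return it as-is
    return bytes unless bytes && bytes.empty?
    # empty peek: the current gzip member ended; start the next one,
    # otherwise report EOF (the empty slice) for real
    return bytes unless read_next_header
  end
end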

@straight-shoota
Member

No, Zlib::Reader and Gzip::Reader both need to include IO::Buffered. Their single-byte reads are much slower than reading a bigger slice.
But they can disable the buffer in the wrapped Flate::Reader, because when they are buffered themselves they only issue large slice reads to the Flate reader.
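Inside Gzip::Reader/Zlib::Reader that would look roughly like this (sketch; ivar names assumed):

@flate_io = Flate::Reader.new(@io)
@flate_io.read_buffering = false  # the outer reader is already buffered, so skip the inner buffer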

@asterite
Member Author

Ah, that makes sense. I'll try it later and update the code.

@RX14
Contributor

RX14 commented Oct 10, 2018

Might as well keep this PR as-is then because of the 4KiB rule.

Ultimately, this needs benchmarks to know what the right behaviour is.

@asterite
Member Author

Yes, I think the current PR is fine. Feel free to merge it.

@asterite
Member Author

@RX14 Ping. Other readers also raise on flush, or at least File::PReader does. We can maybe leave this discussion for another issue/PR.

@RX14
Contributor

RX14 commented Oct 18, 2018

I'm not sure what you mean regarding flush; I'm talking about this failing:

io.read_byte
# io now has a read buffer of size 8192, of which 8191 bytes are still unread
io.read_buffering = false
io.gets
# io.gets will return only a maximum of 8191 bytes, even if the next `\n` or `\r\n` is beyond that size.

@asterite
Member Author

For that I suggest removing read_buffering. As I said, it's only needed for secure random. For instance, you can't disable read buffering in Ruby.

@asterite
Member Author

Or we can fix IO::Buffered#gets to check read_buffering in addition to peek, but that's not really related to this PR. Or, yes, I can remove the "fix" for read_buffering; would that be OK?

@asterite asterite force-pushed the opt/gzip-zlib-flate branch from 06453a1 to a84c71d on October 18, 2018 12:06
@asterite
Member Author

Okay, I removed the read_buffering changes.

@RX14
Contributor

RX14 commented Oct 18, 2018

It's fine to remove that fix for now, but the bug still persists and should be reported in another issue.

@asterite asterite added this to the 0.27.0 milestone Oct 18, 2018
@asterite asterite merged commit e62a944 into crystal-lang:master Oct 18, 2018
    return if @closed
    @closed = true

    @flate_io.try &.close
    @io.close if @sync_close
  end

  def unbuffered_rewind
    raise IO::Error.new "Closed stream" if closed?
Contributor

ditto

    return if @closed
    @closed = true

    @flate_io.close
    @io.close if @sync_close
  end

  def unbuffered_rewind
    raise IO::Error.new "Closed stream" if closed?
Contributor

ditto

Member Author

Yeah, it was like that, but I had to roll back to a previous version and forgot to add it back.

Member Author

Someone can send a PR, or commit directly to master with it (I'm trying but I can't).

Contributor

#6958 :)
