-
Notifications
You must be signed in to change notification settings - Fork 344
Pure managed implementation of System.IO.Compression.Brotli #1673
Comments
We've had a native implementation for Deflate and I totally understand the issues one runs into with making it portable everywhere. That being said, this might just be an artifact of this being out-of-band (i.e. delivered as a NuGet package). Eventually, we'd put Brotli where the other compression algorithms live (i.e. into the platform). At that point, it shouldn't matter too much as we'd probably add the code to the existing clrcompression native module. |
FWIW Google already has an MIT licensed managed C# decoder implementation in https://github.com/google/brotli/tree/master/csharp (though this is transpiled from Java so probably not ideal). |
@shiftylogic, any thoughts? |
I would also prefer to see a managed implementation used unless the native implementation ends up being significantly more efficient. We're trying to minimize our native dependencies as much as possible, and if we're going to carry something with us, I'd prefer it be in managed unless there's a strong perf reason otherwise. |
Had a quick chat with @joshfree. One of the reasons we went with native code for our deflate implementation was being able to easier pickup improvements made in the upstream version, which was written in C. The argument that I've heard is that we're not interested in building and maintaining a compression algorithm ourselves; it's mostly about how we expose them. So in that sense, having a managed implementation just because it's easier to maintain isn't the higher order bit, assuming we take the code usually from somewhere else anyway. Of course, I could imagine that going with managed code might also help us with performance, for instance by avoiding P/Invoke overhead or by leveraging newer features on our end (like SIMD). So I guess there is no clear cut answer... |
If we're relying on the .so/.dylib as part of each distro we target, then we're limited to which distro we can support, and hit the kind of platform differences we've faced with libcurl. If not, then we are building and maintaining, and responsible for servicing and the like. We might benefit from fixes made upstream, but it's still up to us to consume them and adapt then to our environment. Relying on existing native components has been a great way to bootstrap the platform, and I do not regret our doing so. But going forward we need to be very thoughtful about where and how we do so. It's possible this is a desirable place, of course, especially if there is a performance argument, I just don't want us to assume so. |
@stephentoub, I think we'd put the source code into |
We only distribute clrcompression.dll on Windows. We'd need to build it into something else on Unix, like System.IO.Compression.Native.so/dylib. But regardless, that's what I referred to above where I said "If not, then we are building and maintaining, and responsible for servicing and the like. We might benefit from fixes made upstream, but it's still up to us to consume them and adapt them to our environment." Doable? Of course. But no free lunch. |
Either way we're going to have to maintain and update an implementation manually as we can't rely on Brotli being already being available on the machine. So our choices are:
I doubt there will be an especially strong performance argument for choosing one of the above over the other considering any managed encoder written today would have the latest hotness, but we can't really know for sure until we actually write the managed encoder at which point it would be silly not to use it :) |
Personally, I prefer the managed approach despite the extra work required to write the encoder in C#. Having a native zlib implementation on Windows provided a great perf improvement over our old managed deflater, but man is it a hassle to keep up with. I've spent so much time resolving packaging errors, import issues, platform library availability problems, compiler version snafus, binscope failures, etc. etc. that I wonder where we'd be at perf-wise if that time went into modernizing (and correcting) the managed deflater/inflater instead. Probably still behind zlib-intel, but it's interesting to think about either way. |
@ianhays, you might be able to use https://github.com/master131/BrotliSharpLib to jump start a corefx implementation. |
That would certainly lower the gap in time cost between the two approaches. It's licensed under MIT so we could use that as a base for the encoder without needing too many modifications. |
Sounds good, a managed implementation would definitely be less hassle to integrate into all our platforms. |
Based on https://blogs.msdn.microsoft.com/dotnet/2018/05/07/announcing-net-core-2-1-rc-1/ |
@msmolka I just checked, and the most recent Preview1 package of Same for and |
So it will be in version 2.2. not 2.1 as mentioned on Microsoft blog? |
It's in 2.1. See |
But then how to use it from full framework? Also how to include it in library targeted for netstandard and full ? |
Resolves #11431 I put the native impl to Mono.Native, but probably it should be in a separate lib (`libmono-native.0.dylib` before: **81kb**, after: **898kb**) if Linker is not able to trim it if it's not used. Compiles and works on Windows (however, we don't build Mono.Native for Windows, but we can borrow `clrcompression.dll` from .net core) and macOS but requires changes in mono/corefx, I disabled xunit tests as those need 50mb of test data. PS: Brotli could be fully managed 😢 dotnet/corefxlab#1673 C# ports: - https://github.com/master131/BrotliSharpLib A quick benchmark (Compressing a 5mb file with max compression level=11) Mono JIT: 25sec Mono LLVM-AOT: 7.8sec .NET Core 2.2: 6.8sec .NET Core 3.0: 5.8sec .NET Core's BrotliStream (with native lib): 3.5sec - https://github.com/google/brotli/tree/master/csharp - official C# port. Based on Java sources, slow, outdated (2 years ago).
The native dependency for the Brotli encoder/decoder is a bit problematic for e.g. Xamarin. A managed implementation of the algorithm would be more portable.
Relevant Twitter convo: https://twitter.com/terrajobst/status/890690221301350401
The text was updated successfully, but these errors were encountered: