refactor: Reduce how much code gets monomorphized #1032
Conversation
Thanks for taking this on! It would be interesting to know why these changes reduce the llvm ir bloat.
@@ -58,66 +58,62 @@ where
    T: Encoder<Error = Status>,
    U: Stream<Item = Result<T::Item, Status>>,
{
    async_stream::stream! {
So if I understand correctly, async-stream causes LLVM IR bloat? Is this something we can fix in that crate? I personally prefer to maintain this code vs a combinator, but I accept that this may be an artifact of using this type of code.
It seems to bloat it a bit more than the combinator. I figured in this case the use of `async_stream` was a bit unnecessary and the combinator was clear enough. I tweaked it to use `map` instead, which removes the closure. Once `IntoFuture` gets stabilized in 1.64, `and_then` could be used without the closure as well (I think).
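A minimal sketch of the shape of that rewrite, with hypothetical names (`map_encoded`, `encode`) rather than tonic's actual encode path:

```rust
use futures_util::{Stream, StreamExt};

/// Hypothetical sketch (not tonic's actual code) of the combinator form
/// discussed above: instead of wrapping the source in an
/// `async_stream::stream! { ... }` block, which expands into a fresh
/// generic state machine per instantiation, the same shape can be written
/// with `map` and a plain function pointer.
fn map_encoded<S, T, U, E>(
    source: S,
    encode: fn(T) -> Result<U, E>,
) -> impl Stream<Item = Result<U, E>>
where
    S: Stream<Item = Result<T, E>>,
{
    // `Result::and_then` applies `encode` to `Ok` items only and passes
    // errors through unchanged, so nothing is captured beyond the
    // function pointer itself.
    source.map(move |item| item.and_then(encode))
}
```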
Every combination of types that these functions get called with gets a unique instantiation of binary code (monomorphization). So if we can shrink these generic functions, usually by moving code into less-generic or non-generic functions, rustc/LLVM can re-use those new functions across every generic instantiation.
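A small illustrative example of that extraction (not code from this PR; `load_config` is a hypothetical name): only the thin generic shim is instantiated per caller type, while the real work is compiled once.

```rust
use std::path::Path;

// The generic shell stays tiny: only the `as_ref` conversion is
// instantiated once per concrete `P`.
fn load_config<P: AsRef<Path>>(path: P) -> std::io::Result<String> {
    load_config_inner(path.as_ref())
}

// The non-generic inner function holds the bulk of the code and is
// compiled exactly once, shared by every instantiation of the outer one.
fn load_config_inner(path: &Path) -> std::io::Result<String> {
    std::fs::read_to_string(path)
}
```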
Did a couple more tweaks to re-use code when the output type is the same.
With the new changes (and using a different/newer compiler) we go from 47011 to 33329 lines of LLVM IR (-29%).
Nice! Seems like CI is failing?
Seems to be some weirdness around type inference and `Into` causing the downcasts to fail
let mut buf = BytesMut::with_capacity(BUFFER_SIZE);

let compression_encoding = if compression_override == SingleMessageCompressionOverride::Disable
{
    None
} else {
    compression_encoding
So this code used to be executed lazily. Not sure how much it matters in practice, but I wonder if we want to move this stuff down into the closure?
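A toy illustration of the eager-vs-lazy distinction being raised here (hypothetical names, not the code under review):

```rust
// Eager: the branch runs when the closure is constructed, whether or not
// the closure is ever called.
fn eager(disabled: bool, encoding: Option<&'static str>) -> impl Fn() -> Option<&'static str> {
    let encoding = if disabled { None } else { encoding };
    move || encoding
}

// Lazy: the branch runs only when the closure is actually invoked, which
// matches how the old code behaved.
fn lazy(disabled: bool, encoding: Option<&'static str>) -> impl Fn() -> Option<&'static str> {
    move || if disabled { None } else { encoding }
}
```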
I think for now this is fine, we can follow up if need be.
Thanks!
Motivation
`tonic` has a lot of generic code, which causes a lot of monomorphization and therefore bloated compile times and binaries.

Solution
By extracting code which does not need to be generic over a type parameter into non-generic (or less generic) functions, we can reduce how much code gets instantiated and passed to LLVM.
This gives a ~23% reduction in the output of LLVM IR (according to `cargo llvm-lines -p examples --bin streaming-client`) and reduces the compilation time of `cargo build --release --bin streaming-client` by ~1.5s (21s to 19.5s). (Measured by just switching branches between this one and master to force recompilations.)

Before
After
Closes #849 (supersedes it)