Move from protobuf to bincode for a wire format #85

Merged
alexcrichton merged 2 commits into mozilla:master from less-protobuf on Mar 27, 2017

Conversation

alexcrichton (Contributor) commented Mar 22, 2017

This commit migrates away from the `protobuf` crate to instead just working with
bincode on the wire as a serialization format. This is done by leveraging a few
different crates:

  • The `bincode` and `serde_derive` crates are used to define serialization
    for Rust structures as well as provide a bincode implementation.
  • The `tokio_io::codec::length_delimited` module implements framing via length
    prefixes to transform an asynchronous stream of bytes into a literal `Stream`
    of `BytesMut`.
  • The `tokio_serde_bincode` crate is then used to tie it all together, parsing
    these `BytesMut` as the request/response types of sccache.

Most of the changes here are related to moving away from the protobuf API
throughout the codebase (e.g. `has_foo` and `take_foo`) towards a more
rustic-ish API that just uses enums/structs. Overall it felt quite natural (as
one would expect) to just use the raw enum/struct values.

This may not be quite as performant as before, but that doesn't really matter for
sccache's use case, where perf is hugely dominated by actually compiling and
hashing, so I'm not too worried about that.

My personal motivation for this is twofold:

  1. Using `protobuf` was a little clunky throughout the codebase and definitely
     had some sharp edges that felt good to smooth out.
  2. There's currently what I believe to be a mysterious segfault and/or stray write
     happening in sccache, and I'm not sure where. The `protobuf` crate had a lot
     of `unsafe` code, and in lieu of actually auditing it I figured it'd be good
     to kill two birds with one stone. I have no idea if this fixes my segfault
     problem (I never could reproduce it) but I figured it's worth a shot.
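
For illustration, here is a minimal sketch of the enum/struct style this moves to, round-tripped through bincode. The field names and the bincode 1.x-style serialize/deserialize API are assumptions for the example, not the exact types in this PR:

```rust
use serde_derive::{Deserialize, Serialize};

// Simplified stand-ins for the request types; the real sccache enums have
// more variants and fields.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
enum Request {
    GetStats,
    Compile(Compile),
}

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Compile {
    exe: String,
    cwd: String,
    args: Vec<String>,
}

fn main() -> Result<(), bincode::Error> {
    let req = Request::Compile(Compile {
        exe: "/usr/bin/gcc".to_owned(),
        cwd: "/tmp".to_owned(),
        args: vec!["-c".to_owned(), "foo.c".to_owned()],
    });

    // Encode to a compact binary blob and decode it back; no has_foo/take_foo
    // accessors, just plain enums and structs.
    let bytes = bincode::serialize(&req)?;
    let decoded: Request = bincode::deserialize(&bytes)?;
    assert_eq!(req, decoded);
    Ok(())
}
```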

alexcrichton force-pushed the less-protobuf branch 2 times, most recently from daff2b4 to 51bd77b on March 23, 2017 at 02:10
luser (Contributor) commented Mar 23, 2017

Excellent! I've wanted to drop protobuf for serde for a while now (#20). I'm not sure that the specific wire format matters all that much in terms of perf, and being able to packet-sniff the traffic and read the JSON could prove useful.

The only thing I worried about with using JSON was the compile response, where a large part of the body is `Vec<u8>`. Presumably that will blow up to an enormous size in JSON? It still may not matter in the grand scheme of things.

alexcrichton (Author) commented:
Oh jeez, that's a very good point. It is true that swapping in a different serde format should be pretty easy; another possible option is bincode, which I believe Servo uses (but it would be harder to inspect on the wire).

I'll do some testing locally to see what happens.

alexcrichton (Author) commented:
Yeah, looks like b"foo" is serialized as [102,111,111], going from 3 bytes to 13. I'll switch to bincode!
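
(For reference, a quick way to reproduce that comparison; this assumes serde_json and a bincode 1.x-style API, and the exact bincode size depends on its length-prefix configuration:)

```rust
fn main() {
    // A small byte payload, like the stdout bytes in a compile response.
    let payload: Vec<u8> = b"foo".to_vec();

    // JSON renders each byte as a decimal number: "[102,111,111]" is 13 bytes.
    let as_json = serde_json::to_vec(&payload).unwrap();

    // bincode writes a length prefix followed by the raw bytes.
    let as_bincode = bincode::serialize(&payload).unwrap();

    println!("json: {} bytes, bincode: {} bytes", as_json.len(), as_bincode.len());
}
```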

alexcrichton changed the title from "Move from protobuf to JSON for a wire format" to "Move from protobuf to bincode for a wire format" on Mar 23, 2017
alexcrichton (Author) commented:
Ok switched over to bincode!

luser (Contributor) commented Mar 23, 2017

Wire inspection is not a deal breaker (given that we control both ends of the connection and already have logging), it was just a possible nice-to-have. :)

This is an abstraction of the Tokio stack now and doesn't have much impact on
sccache itself, but I figured it'd be good to update a few deps and pick up
recent versions of things!

Plus, I'd like to start using the length_delimited module soon...
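
That module handles the kind of 4-byte length-prefix framing the blocking client in this PR writes by hand (see the src/client.rs hunks below). As a rough, self-contained sketch of that wire layout, with made-up helper names:

```rust
use byteorder::{BigEndian, ByteOrder};
use std::io::{self, Read, Write};

// Write one frame: a 4-byte big-endian length, then the payload bytes.
fn write_frame<W: Write>(writer: &mut W, bytes: &[u8]) -> io::Result<()> {
    let mut len = [0; 4];
    BigEndian::write_u32(&mut len, bytes.len() as u32);
    writer.write_all(&len)?;
    writer.write_all(bytes)?;
    writer.flush()
}

// Read one frame back: the length prefix first, then exactly that many bytes.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len = [0; 4];
    reader.read_exact(&mut len)?;
    let mut buf = vec![0; BigEndian::read_u32(&len) as usize];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```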
alexcrichton force-pushed the less-protobuf branch 2 times, most recently from 75196f9 to 07c64aa on March 25, 2017 at 01:49
luser (Contributor) left a review:
I think you unintentionally removed the compile locally on EOF behavior you recently added, but aside from that this looks great! The code is so much more readable than it was with protobuf.

Feel free to merge after you fix that, I noted a couple of other nits you might want to fix as well.

src/client.rs Outdated
@@ -52,28 +45,29 @@ impl ServerConnection {
}

/// Send `request` to the server, read and return a `ServerResponse`.
luser (Contributor):
nit: the doc comment still mentions ServerResponse

src/client.rs Outdated
BigEndian::write_u32(&mut len, bytes.len() as u32);
self.writer.write_all(&len)?;
self.writer.write_all(&bytes)?;
self.writer.flush()?;
trace!("ServerConnection::request: sent request");
self.read_one_response()
}

/// Read a single `ServerResponse` from the server.
luser (Contributor):
same nit as above

is.read_raw_varint32()
});
let mut bytes = [0; 4];
self.reader.read_exact(&mut bytes)?;
luser (Contributor):
Would things look nicer overall if we used tokio here instead of blocking IO? Originally I used blocking IO in the client because the mio stuff was so complicated.

alexcrichton (Author):
Nah I think they'd basically be equivalent. As soon as we'd want to do anything else, though, such as timeouts, then we'd definitely want to use Tokio as it'd look quite a bit better.

The tokio version would basically just do the futures equivalent of this, using tokio_io combinators, and a little less ergonomically.
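
A sketch of what that futures equivalent could look like (illustrative only, built on tokio_io::io::read_exact and futures 0.1 combinators; not code from this PR):

```rust
use byteorder::{BigEndian, ByteOrder};
use futures::Future;
use std::io;
use tokio_io::io::read_exact;
use tokio_io::AsyncRead;

// Read one length-prefixed frame without blocking: first the 4-byte
// big-endian length, then exactly that many payload bytes.
fn read_frame<R>(reader: R) -> impl Future<Item = (R, Vec<u8>), Error = io::Error>
where
    R: AsyncRead,
{
    read_exact(reader, [0u8; 4]).and_then(|(reader, len)| {
        let len = BigEndian::read_u32(&len) as usize;
        read_exact(reader, vec![0u8; len])
    })
}
```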

trace!("request_compile: {:?}", compile);
req.set_compile(compile);
let req = Request::Compile(Compile {
exe: exe.as_ref().to_str().ok_or("bad exe")?.to_owned(),
luser (Contributor):
as a follow-up, we could now just use OsStr or OsString all the way for the values from the client commandline.

alexcrichton (Author):
Yeah I tried to do that but Serialize/Deserialize wasn't implemented for one of them :(

alexcrichton (Author):
Actually... I'll send a PR!

alexcrichton (Author):
Ok this is now tracked at serde-rs/serde#823
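
For reference, once serde grows those impls (it has since gained Serialize/Deserialize for OsString), the follow-up could look roughly like this sketch; the field names are illustrative:

```rust
use serde_derive::{Deserialize, Serialize};
use std::ffi::OsString;

// Keep the client's command-line values as OsString end to end instead of
// converting through str.
#[derive(Serialize, Deserialize, Debug)]
struct Compile {
    exe: OsString,
    cwd: OsString,
    args: Vec<OsString>,
}
```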

src/commands.rs Outdated
}
if response.has_retcode() {
let ret = response.get_retcode();
try!(stdout.write_all(&response.stdout));
luser (Contributor):
Switch these to ? while you're here?

src/commands.rs Outdated
}
}

// Currently the shutdown behavior of the remote sccache server
luser (Contributor):
I don't see that you replaced this behavior anywhere (compile locally on EOF from server). Presumably you do want to keep this in some form?

alexcrichton (Author):
Oops, updated!

src/server.rs Outdated
match status.code() {
Some(code) => res.retcode = Some(code),
None => res.signal = Some(get_signal(status)),
};
//TODO: sort out getting signal return on Unix
luser (Contributor):
You can remove this TODO that you fixed. :)

@@ -718,24 +704,25 @@ impl ServerStats {
fn to_cache_statistics(&self) -> Vec<CacheStatistic> {
luser (Contributor):
We could probably do better here now just by defining the CacheStats type to use the CacheStat type for its fields directly, but we can sort that out in a followup.
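
A rough sketch of the shape that follow-up might take (the field and extra variant names here are illustrative, not the actual sccache stats):

```rust
use serde_derive::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
enum CacheStat {
    Count(u64),
    String(String),
    Size(u64),
}

// Instead of flattening everything into a Vec<CacheStatistic>, the stats
// struct could carry typed fields directly.
#[derive(Serialize, Deserialize, Debug)]
struct CacheStats {
    cache_hits: CacheStat,
    cache_misses: CacheStat,
    cache_location: CacheStat,
}
```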

};
$vec.push(CacheStatistic {
name: String::from($name),
value: CacheStat::String(s),
luser (Contributor):
I probably should have added a CacheStat::Duration for clarity, eh?

alexcrichton merged commit a0b7cc1 into mozilla:master on Mar 27, 2017
alexcrichton deleted the less-protobuf branch on March 27, 2017 at 17:14