rust: define shared Commit type #4351

Merged
wchargin merged 19 commits into master from wchargin-rust-commit-type on Nov 21, 2020

Conversation

wchargin (Contributor)

Summary:
This patch introduces the *commit*, which is the link between loading
threads and serving threads. The commit structure is designed for
concurrency: many threads can read from it at once, and when the commit
is updated, a loader acquires exclusive access only to its own run. This
improves latency for both readers and writers.

The commit itself doesn’t have much logic; it’s mostly just the correct
composition of substructures, most notably the reservoir basins.

See also the RustBoard technical design doc (#4350) for discussion.

Test Plan:
There’s one method on `commit::TimeSeries`; it has tests.

wchargin-branch: rust-commit-type
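
To make the description concrete, here is a minimal sketch of the
two-level locking shape described above; the field names and the use of
`std::sync::RwLock` are assumptions for illustration, not necessarily
the exact definitions in this PR:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Illustrative stand-ins; the real key and per-run types live elsewhere.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct Run(pub String);
#[derive(Debug, Default)]
pub struct RunData {/* per-tag time series, reservoir basins, ... */}

/// Shared state between loading threads and serving threads.
pub struct Commit {
    /// The outer lock guards the set of runs; each inner lock guards one
    /// run's data, so a loader updating run A never blocks readers of run B.
    pub runs: RwLock<HashMap<Run, RwLock<RunData>>>,
}

fn read_example(commit: &Commit) {
    let runs = commit.runs.read().unwrap(); // many readers at once
    if let Some(run) = runs.get(&Run("train".to_string())) {
        let _data = run.read().unwrap();
        // ... serve data for this run ...
    }
}
```

In this shape, a loader takes the outer lock for writing only to add or
remove a run; updating an existing run needs only that run's inner
write lock, which is what keeps reader and writer latency low.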

Summary:
This module performs conversions from legacy on-disk data formats. It
will correspond to both the `data_compat` and `dataclass_compat` modules
from Python TensorBoard. For now, we have just one function, which
determines the initial summary metadata of a time series, and we only
implement it for scalars, to keep things simple.

Test Plan:
Unit tests included.

wchargin-branch: rust-data-compat
wchargin-source: ef510b422b00a66a23efc1406947844fa8d1a798
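
A rough, self-contained sketch of the kind of function described above;
the types here stand in for the prost-generated protos, and the exact
signature and logic are assumptions for illustration, not this PR's
code:

```rust
// Stand-in types for the prost-generated protos; illustrative only.
#[derive(Clone, Debug, Default)]
struct SummaryMetadata {
    plugin_name: String,
    data_class: DataClass,
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum DataClass {
    #[default]
    Unknown,
    Scalar,
}

/// The first value observed for a time series (other kinds omitted for now).
enum FirstValue {
    SimpleValue(f32),
}

/// Determine the metadata under which a new time series should be committed,
/// given its first event. Only scalars are handled for now.
fn initial_metadata(explicit: Option<SummaryMetadata>, value: &FirstValue) -> SummaryMetadata {
    let mut md = explicit.unwrap_or_default();
    if md.data_class == DataClass::Unknown {
        match value {
            FirstValue::SimpleValue(_) => {
                md.data_class = DataClass::Scalar;
                if md.plugin_name.is_empty() {
                    md.plugin_name = "scalars".to_string();
                }
            }
        }
    }
    md
}
```
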
Summary:
Our errors now have nice string formatting, easier `From` conversions,
and `std::error::Error` implementations.

Test Plan:
Included some tests for the `Display` implementations. They’re often not
necessary—one benefit of deriving traits is that you can be confident in
the implementation without manually testing it. But sometimes, if the
format string is non-trivial, it can be nice to actually see the full
text written out.

wchargin-branch: rust-use-thiserror
wchargin-source: 0f8009c3b96619f2eb4649353487dc996b45a712
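
A small sketch of the `thiserror` pattern being adopted; the error type
and messages below are illustrative, not this PR's actual errors:

```rust
use thiserror::Error;

/// Illustrative error type for a record-reading module.
#[derive(Debug, Error)]
pub enum ReadError {
    /// Nice string formatting comes from the `#[error(...)]` attribute.
    #[error("bad length checksum: got {got:#x}, want {want:#x}")]
    BadLengthCrc { got: u32, want: u32 },
    /// `#[from]` gives the easier `From` conversion, and the derive
    /// provides the `std::error::Error` implementation.
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
}
```
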
Summary:
We’d defined a `Step` [newtype] in the `reservoir` module, but that type
will be useful more broadly. We thus move it into a new `types` module,
and define `WallTime` and `Tag` siblings, which we’ll need shortly.

[newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html

Test Plan:
Most behavior is just `#[derive]`d; unit tests included for the rest.

wchargin-branch: rust-types-module
wchargin-source: b0f5475b247ef8f9cb8bf2e0bf94d2b648b8009e
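
A newtype in this sense is just a single-field tuple struct; a minimal
sketch of what these types might look like (the derives and fields are
illustrative, not the exact definitions):

```rust
/// Step of an event: an integer index supplied by the writer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Step(pub i64);

/// Wall time of an event, in seconds since the Unix epoch.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct WallTime(pub f64);

/// Tag name of a time series within a run.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Tag(pub String);
```
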
Summary:
This patch refactors `run`’s `StagePayload` type into an `EventValue`
type in `data_compat`. This way, we can define enriching conversions
from `EventValue`s to committed values in a layer that’s not specific to
the run loader.

We also implement `initial_metadata` for the `GraphDef` variant, for
symmetry and to make the abstraction clearer. The run loader still
doesn’t handle graphs.

Test Plan:
Unit tests are expanded for the new code and continue to cover the old.

wchargin-branch: rust-promote-eventvalue
wchargin-source: 96af44556af064cf9ef933154ca3a522a8829798
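
A hypothetical sketch of the shape of such an enum; the variant names
and payloads are assumptions for illustration, not the actual
`data_compat` definitions:

```rust
/// A value parsed from an on-disk event, before it is enriched and committed.
pub enum EventValue {
    /// A graph definition read from an event's `graph_def` field.
    GraphDef(Vec<u8>),
    /// A scalar summary value (the only summary kind handled so far).
    Scalar(f32),
}

impl EventValue {
    /// Both variants can answer "what metadata should this time series start
    /// with?", which is what makes the abstraction symmetric.
    pub fn plugin_name(&self) -> &'static str {
        match self {
            EventValue::GraphDef(_) => "graphs",
            EventValue::Scalar(_) => "scalars",
        }
    }
}
```
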
@wchargin wchargin added the core:rustboard //tensorboard/data/server/... label Nov 19, 2020
@google-cla google-cla bot added the cla: yes label Nov 19, 2020
@wchargin wchargin requested a review from nfelt November 19, 2020 02:04
Base automatically changed from wchargin-rust-promote-eventvalue to master November 19, 2020 23:35

```rust
let mut m: HashMap<Run, i32> = HashMap::new();
m.insert(Run("train".to_string()), 1);
m.insert(Run("test".to_string()), 2);
// We can call `get` given only a `&str`, not an owned `Run`.
```
Contributor

I think you said on the other PR that Borrow is what enables this. Do you mind explaining the intuition I should use for why that's the case?

Contributor Author
@wchargin wchargin Nov 21, 2020

Correct. To start, the docs for `Borrow` have this as an example
case, which I found helpful.

Probably the most foolproof way to see what’s happening is to look at
the type signature for `HashMap<K, V>::get`:

```rust
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where
    K: Borrow<Q>,
    Q: Hash + Eq,
```

So here we’re calling `get` with an `&str`, so `Q = str`, so we need
`impl Borrow<str> for Run` to satisfy the `where` constraint. And
internally, this calls `k.eq(x.0.borrow())` where (presumably) `x.0` is
the map-owned key, so it boils down to checking whether
`"train" == Run("train").borrow()`.

Does this help?
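
To make this concrete, here is a minimal, self-contained version of the
pattern; the `Borrow` impl is written out by hand, and while the names
mirror the snippet above, this is not the exact code from the PR:

```rust
use std::borrow::Borrow;
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq, Hash)]
struct Run(String);

// The key piece: `Run` can be borrowed as a `str`, so `HashMap<Run, V>::get`
// accepts a plain `&str` (i.e., `Q = str`). The derived `Hash`/`Eq` for `Run`
// agree with `str`'s, which `get` relies on.
impl Borrow<str> for Run {
    fn borrow(&self) -> &str {
        &self.0
    }
}

fn main() {
    let mut m: HashMap<Run, i32> = HashMap::new();
    m.insert(Run("train".to_string()), 1);
    m.insert(Run("test".to_string()), 2);
    assert_eq!(m.get("train"), Some(&1)); // no owned `Run` needed
}
```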

Contributor

Ah, thanks! I didn't read far enough down the Borrow page to see the HashMap example but now it's quite clear, my bad.

Contributor Author

No problem! Pointing you to the right docs is a good use of my time. :-)

```rust
/// tensors and blob sequences will come soon.
#[derive(Debug, Default)]
pub struct RunData {
    /// The time of the first event recorded for this run.
```
Contributor

FWIW, I've wanted for a while to opportunistically parse this from the filename so we don't have to open a file for each run in order to serve the initial list of runs, which is especially bad for remote filesystems. WDYT? Wouldn't be implemented here of course, but might affect the semantics we describe for this.

Contributor Author

That’d be fine with me, yep. I think that fits naturally into the logdir
loader structure.

```rust
#[derive(Debug)]
pub struct TimeSeries<V> {
    /// Summary metadata for this time series.
    pub metadata: Box<pb::SummaryMetadata>,
```
Contributor

Just wondering, what's the rationale for this being Boxed? The Book suggests 3 main reasons one might use Box and I'm guessing the reason here is "When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so" but then looking at SummaryMetadata I just see data that's already on the heap (except one i32): https://cs.opensource.google/tensorflow/tensorboard/+/master:tensorboard/data/server/tensorboard.pb.rs;l=294;drc=34fe0bfd9d41c4e0da175df16cf716436a4ae263

Contributor Author

You’re right that it’s not that bad. Here are the ones that I’m
concerned about, with output from `rustc -Zprint-type-sizes`:

Type size information
print-type-size type: `proto::tensorboard::summary::Value`: 536 bytes, alignment: 8 bytes
print-type-size     field `.node_name`: 24 bytes
print-type-size     field `.tag`: 24 bytes
print-type-size     field `.metadata`: 104 bytes
print-type-size     field `.value`: 384 bytes
print-type-size type: `proto::tensorboard::TensorProto`: 376 bytes, alignment: 8 bytes
print-type-size     field `.tensor_shape`: 32 bytes
print-type-size     field `.tensor_content`: 24 bytes
print-type-size     field `.half_val`: 24 bytes
print-type-size     field `.float_val`: 24 bytes
print-type-size     field `.double_val`: 24 bytes
print-type-size     field `.int_val`: 24 bytes
print-type-size     field `.string_val`: 24 bytes
print-type-size     field `.scomplex_val`: 24 bytes
print-type-size     field `.int64_val`: 24 bytes
print-type-size     field `.bool_val`: 24 bytes
print-type-size     field `.dcomplex_val`: 24 bytes
print-type-size     field `.resource_handle_val`: 24 bytes
print-type-size     field `.variant_val`: 24 bytes
print-type-size     field `.uint32_val`: 24 bytes
print-type-size     field `.uint64_val`: 24 bytes
print-type-size     field `.dtype`: 4 bytes
print-type-size     field `.version_number`: 4 bytes
print-type-size type: `proto::tensorboard::SummaryMetadata`: 104 bytes, alignment: 8 bytes
print-type-size     field `.plugin_data`: 48 bytes
print-type-size     field `.display_name`: 24 bytes
print-type-size     field `.summary_description`: 24 bytes
print-type-size     field `.data_class`: 4 bytes
print-type-size     end padding: 4 bytes

So yes, “large amount of data” can definitely go higher than 104 bytes,
but that’s the only reason here. It should also make hash map resizes of
the TagStore a bit faster, though that shouldn’t matter too much.

Boxing all three of these together improved end-to-end performance by
~25%, and I would believe that most of that is due to TensorProto and
Value since those directly affect the reservoir size, but I also don’t
mind doing this. If we want to unbox this, that’s okay with me.
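
(For concreteness, a small sketch of the size effect being described;
`BigMessage` is a stand-in struct, not the prost-generated type:)

```rust
use std::mem::size_of;

// Stand-in for a large generated message (the real `Value` is 536 bytes).
struct BigMessage {
    payload: [u8; 536],
}

struct Unboxed {
    value: BigMessage, // stored inline: the enclosing struct is >= 536 bytes
}

struct Boxed {
    value: Box<BigMessage>, // stored behind a pointer: one word inline
}

fn main() {
    assert!(size_of::<Unboxed>() >= 536);
    assert_eq!(size_of::<Boxed>(), size_of::<usize>());
    // Moving a `Boxed` (e.g., during a hash-map resize or into a reservoir)
    // copies only the pointer, not the 536-byte payload.
    println!("{} vs {}", size_of::<Unboxed>(), size_of::<Boxed>());
}
```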

Contributor

Okay, it was primarily a curiosity question - fine by me to leave it boxed. I didn't realize it'd be 24 bytes per Vec aka per String so it does add up to more than I'd have thought; at least one reason is that in my head I'm still thinking about immutable string types where there isn't an extra 8 bytes for capacity.

I doubt that it makes a big difference in our case, but in general it would seem like some applications with a lot of immutable strings would save space overhead by having an immutable owned string representation with pointer + length only? I guess that must just not come up very often since I don't see much evidence that this exists.

Contributor Author
@wchargin wchargin Nov 21, 2020

Right: you basically have:

  • growable vector: `Vec<u8>`, `String`; 3 words (ptr, len, cap)
  • slice reference: `&[u8]`, `&str`; 2 words (ptr, len)
  • owned slice: `Box<[u8]>`, `Box<str>`; 2 words (ptr, len)

where in all cases the string layout is the same as the vector layout,
just with a “valid UTF-8” invariant.

So if you want “immutable owned string with no capacity penalty”, you
can take a Box<str>. And you can actually mutate it, too; you just
can’t resize it without reallocating.

Some protobuf-like APIs use this form: iirc, capnproto arrays aren’t
resizable after you construct them. But that’s a lot less ergonomic when
it comes to code that constructs the things, so I’m okay with prost
choosing to use Vecs and Strings. The TensorProto case is
definitely unfortunate, though. If protobuf had repeated oneofs, we
wouldn’t have this problem, since an enum of a bunch of Vec<_>s is
still just 32 bytes (24 + 8 for discriminant).

Rust also does have fixed-size arrays, so a Box<[u8; 12]> is even
smaller since the size is known at compile time. I was considering using
that for the TFRecord reader header rather than a Vec<u8>, but it’d be
a bit annoying, so would definitely need to be benchmarked.
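
To put concrete numbers on the three layouts (a small sketch; these
assertions hold on typical 64-bit targets):

```rust
use std::mem::size_of;

fn main() {
    // Growable vector form: pointer + length + capacity (3 words).
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    // Borrowed slice and owned slice forms: pointer + length (2 words).
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());

    // An immutable owned string with no capacity field:
    let s: Box<str> = String::from("train").into_boxed_str();
    assert_eq!(&*s, "train");
}
```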

Contributor Author

And, of course, not all languages with immutable strings fit them into 8
or 16 bytes:

```python
>>> import sys
>>> sys.getsizeof("")
49
```

Contributor

Interesting, I think Box<str> is what I was looking for, and now that you mention it I see there's even https://doc.rust-lang.org/std/string/struct.String.html#method.into_boxed_str.

It hadn't occurred to me since str on its own seems a little exotic, but this explanation helps. What didn't help was the fact that https://doc.rust-lang.org/book/ch04-03-slices.html starts by saying "Another data type that does not have ownership is the slice" when I think what they're really doing is referring to &[_] as a "slice" informally and eliding the distinction between that and [_]. So whether or not a "slice" in the strict meaning of [_] has ownership depends on the type of pointer being used to hold it (namely if it's &, &mut, or Box).

Contributor Author
@wchargin wchargin left a comment

Thanks! I think these are all non-blocking, but happy to change the
Box<T> in a follow-up if you’d like.


@wchargin wchargin merged commit aa14272 into master Nov 21, 2020
@wchargin wchargin deleted the wchargin-rust-commit-type branch November 21, 2020 00:38