rust: define shared Commit type #4351

Merged
wchargin merged 19 commits into master from wchargin-rust-commit-type on Nov 21, 2020

Conversation

wchargin (Contributor)

Summary:
This patch introduces the *commit*, which is the link between loading
threads and serving threads. The commit structure is designed for
concurrency: many threads can read from it at once, and when the commit
is updated, a loader acquires exclusive access only to its own run. This
improves latency for both readers and writers.

The commit itself doesn’t have much logic; it’s mostly just the correct
composition of substructures, most notably the reservoir basins.

See also the RustBoard technical design doc (#4350) for discussion.

Test Plan:
There’s one method on `commit::TimeSeries`; it has tests.

wchargin-branch: rust-commit-type
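
To make the description concrete, here is a minimal sketch of the
two-level locking shape described above; the field names and the use of
`std::sync::RwLock` are assumptions for illustration, not necessarily
the exact definitions in this PR:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Illustrative stand-ins; the real key and per-run types live elsewhere.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct Run(pub String);
#[derive(Debug, Default)]
pub struct RunData {/* per-tag time series, reservoir basins, ... */}

/// Shared state between loading threads and serving threads.
pub struct Commit {
    /// The outer lock guards the set of runs; each inner lock guards one
    /// run's data, so a loader updating run A never blocks readers of run B.
    pub runs: RwLock<HashMap<Run, RwLock<RunData>>>,
}

fn read_example(commit: &Commit) {
    let runs = commit.runs.read().unwrap(); // many readers at once
    if let Some(run) = runs.get(&Run("train".to_string())) {
        let _data = run.read().unwrap();
        // ... serve data for this run ...
    }
}
```

In this shape, a loader takes the outer lock for writing only to add or
remove a run; updating an existing run needs only that run's inner
write lock, which is what keeps reader and writer latency low.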

Summary:
This module performs conversions from legacy on-disk data formats. It
will correspond to both the `data_compat` and `dataclass_compat` modules
from Python TensorBoard. For now, we have just one function, which
determines the initial summary metadata of a time series, and we only
implement it for scalars, to keep things simple.

Test Plan:
Unit tests included.

wchargin-branch: rust-data-compat
wchargin-source: ef510b422b00a66a23efc1406947844fa8d1a798
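
A rough, self-contained sketch of the kind of function described above;
the types here stand in for the prost-generated protos, and the exact
signature and logic are assumptions for illustration, not this PR's
code:

```rust
// Stand-in types for the prost-generated protos; illustrative only.
#[derive(Clone, Debug, Default)]
struct SummaryMetadata {
    plugin_name: String,
    data_class: DataClass,
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum DataClass {
    #[default]
    Unknown,
    Scalar,
}

/// The first value observed for a time series (other kinds omitted for now).
enum FirstValue {
    SimpleValue(f32),
}

/// Determine the metadata under which a new time series should be committed,
/// given its first event. Only scalars are handled for now.
fn initial_metadata(explicit: Option<SummaryMetadata>, value: &FirstValue) -> SummaryMetadata {
    let mut md = explicit.unwrap_or_default();
    if md.data_class == DataClass::Unknown {
        match value {
            FirstValue::SimpleValue(_) => {
                md.data_class = DataClass::Scalar;
                if md.plugin_name.is_empty() {
                    md.plugin_name = "scalars".to_string();
                }
            }
        }
    }
    md
}
```
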
Summary:
Our errors now have nice string formatting, easier `From` conversions,
and `std::error::Error` implementations.

Test Plan:
Included some tests for the `Display` implementations. They’re often not
necessary—one benefit of deriving traits is that you can be confident in
the implementation without manually testing it. But sometimes, if the
format string is non-trivial, it can be nice to actually see the full
text written out.

wchargin-branch: rust-use-thiserror
wchargin-source: 0f8009c3b96619f2eb4649353487dc996b45a712
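
A small sketch of the `thiserror` pattern being adopted; the error type
and messages below are illustrative, not this PR's actual errors:

```rust
use thiserror::Error;

/// Illustrative error type for a record-reading module.
#[derive(Debug, Error)]
pub enum ReadError {
    /// Nice string formatting comes from the `#[error(...)]` attribute.
    #[error("bad length checksum: got {got:#x}, want {want:#x}")]
    BadLengthCrc { got: u32, want: u32 },
    /// `#[from]` gives the easier `From` conversion, and the derive
    /// provides the `std::error::Error` implementation.
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
}
```
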
Summary:
We’d defined a `Step` [newtype] in the `reservoir` module, but that type
will be useful more broadly. We thus move it into a new `types` module,
and define `WallTime` and `Tag` siblings, which we’ll need shortly.

[newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html

Test Plan:
Most behavior is just `#[derive]`d; unit tests included for the rest.

wchargin-branch: rust-types-module
wchargin-source: b0f5475b247ef8f9cb8bf2e0bf94d2b648b8009e
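
A newtype in this sense is just a single-field tuple struct; a minimal
sketch of what these types might look like (the derives and fields are
illustrative, not the exact definitions):

```rust
/// Step of an event: an integer index supplied by the writer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Step(pub i64);

/// Wall time of an event, in seconds since the Unix epoch.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct WallTime(pub f64);

/// Tag name of a time series within a run.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Tag(pub String);
```
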
Summary:
This patch refactors `run`’s `StagePayload` type into an `EventValue`
type in `data_compat`. This way, we can define enriching conversions
from `EventValue`s to committed values in a layer that’s not specific to
the run loader.

We also implement `initial_metadata` for the `GraphDef` variant, for
symmetry and to make the abstraction clearer. The run loader still
doesn’t handle graphs.

Test Plan:
Unit tests are expanded for the new code and continue to cover the old.

wchargin-branch: rust-promote-eventvalue
wchargin-source: 96af44556af064cf9ef933154ca3a522a8829798
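
A hypothetical sketch of the shape of such an enum; the variant names
and payloads are assumptions for illustration, not the actual
`data_compat` definitions:

```rust
/// A value parsed from an on-disk event, before it is enriched and committed.
pub enum EventValue {
    /// A graph definition read from an event's `graph_def` field.
    GraphDef(Vec<u8>),
    /// A scalar summary value (the only summary kind handled so far).
    Scalar(f32),
}

impl EventValue {
    /// Both variants can answer "what metadata should this time series start
    /// with?", which is what makes the abstraction symmetric.
    pub fn plugin_name(&self) -> &'static str {
        match self {
            EventValue::GraphDef(_) => "graphs",
            EventValue::Scalar(_) => "scalars",
        }
    }
}
```
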
@wchargin wchargin added the core:rustboard //tensorboard/data/server/... label Nov 19, 2020
@google-cla google-cla bot added the cla: yes label Nov 19, 2020
@wchargin wchargin requested a review from nfelt November 19, 2020 02:04
Base automatically changed from wchargin-rust-promote-eventvalue to master November 19, 2020 23:35

```rust
let mut m: HashMap<Run, i32> = HashMap::new();
m.insert(Run("train".to_string()), 1);
m.insert(Run("test".to_string()), 2);
// We can call `get` given only a `&str`, not an owned `Run`.
```
Contributor

I think you said on the other PR that Borrow is what enables this. Do you mind explaining the intuition I should use for why that's the case?

Contributor Author
@wchargin wchargin Nov 21, 2020

Correct. To start, the docs for `Borrow` have this as an example
case, which I found helpful.

Probably the most foolproof way to see what’s happening is to look at
the type signature for `HashMap<K, V>::get`:

```rust
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where
    K: Borrow<Q>,
    Q: Hash + Eq,
```

So here we’re calling `get` with an `&str`, so `Q = str`, so we need
`impl Borrow<str> for Run` to satisfy the `where` constraint. And
internally, this calls `k.eq(x.0.borrow())` where (presumably) `x.0` is
the map-owned key, so it boils down to checking whether
`"train" == Run("train").borrow()`.

Does this help?
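
To make this concrete, here is a minimal, self-contained version of the
pattern; the `Borrow` impl is written out by hand, and while the names
mirror the snippet above, this is not the exact code from the PR:

```rust
use std::borrow::Borrow;
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq, Hash)]
struct Run(String);

// The key piece: `Run` can be borrowed as a `str`, so `HashMap<Run, V>::get`
// accepts a plain `&str` (i.e., `Q = str`). The derived `Hash`/`Eq` for `Run`
// agree with `str`'s, which `get` relies on.
impl Borrow<str> for Run {
    fn borrow(&self) -> &str {
        &self.0
    }
}

fn main() {
    let mut m: HashMap<Run, i32> = HashMap::new();
    m.insert(Run("train".to_string()), 1);
    m.insert(Run("test".to_string()), 2);
    assert_eq!(m.get("train"), Some(&1)); // no owned `Run` needed
}
```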

Contributor

Ah, thanks! I didn't read far enough down the Borrow page to see the HashMap example but now it's quite clear, my bad.

Contributor Author

No problem! Pointing you to the right docs is a good use of my time. :-)

```rust
/// tensors and blob sequences will come soon.
#[derive(Debug, Default)]
pub struct RunData {
    /// The time of the first event recorded for this run.
```
Contributor

FWIW, I've wanted for a while to opportunistically parse this from the filename so we don't have to open a file for each run in order to serve the initial list of runs, which is especially bad for remote filesystems. WDYT? Wouldn't be implemented here of course, but might affect the semantics we describe for this.

Contributor Author

That’d be fine with me, yep. I think that fits naturally into the logdir
loader structure.

```rust
#[derive(Debug)]
pub struct TimeSeries<V> {
    /// Summary metadata for this time series.
    pub metadata: Box<pb::SummaryMetadata>,
```
Contributor

Just wondering, what's the rationale for this being Boxed? The Book suggests 3 main reasons one might use Box and I'm guessing the reason here is "When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so" but then looking at SummaryMetadata I just see data that's already on the heap (except one i32): https://cs.opensource.google/tensorflow/tensorboard/+/master:tensorboard/data/server/tensorboard.pb.rs;l=294;drc=34fe0bfd9d41c4e0da175df16cf716436a4ae263

Contributor Author

You’re right that it’s not that bad. Here are the ones that I’m
concerned about, with output from `rustc -Zprint-type-sizes`:

Type size information
print-type-size type: `proto::tensorboard::summary::Value`: 536 bytes, alignment: 8 bytes
print-type-size     field `.node_name`: 24 bytes
print-type-size     field `.tag`: 24 bytes
print-type-size     field `.metadata`: 104 bytes
print-type-size     field `.value`: 384 bytes
print-type-size type: `proto::tensorboard::TensorProto`: 376 bytes, alignment: 8 bytes
print-type-size     field `.tensor_shape`: 32 bytes
print-type-size     field `.tensor_content`: 24 bytes
print-type-size     field `.half_val`: 24 bytes
print-type-size     field `.float_val`: 24 bytes
print-type-size     field `.double_val`: 24 bytes
print-type-size     field `.int_val`: 24 bytes
print-type-size     field `.string_val`: 24 bytes
print-type-size     field `.scomplex_val`: 24 bytes
print-type-size     field `.int64_val`: 24 bytes
print-type-size     field `.bool_val`: 24 bytes
print-type-size     field `.dcomplex_val`: 24 bytes
print-type-size     field `.resource_handle_val`: 24 bytes
print-type-size     field `.variant_val`: 24 bytes
print-type-size     field `.uint32_val`: 24 bytes
print-type-size     field `.uint64_val`: 24 bytes
print-type-size     field `.dtype`: 4 bytes
print-type-size     field `.version_number`: 4 bytes
print-type-size type: `proto::tensorboard::SummaryMetadata`: 104 bytes, alignment: 8 bytes
print-type-size     field `.plugin_data`: 48 bytes
print-type-size     field `.display_name`: 24 bytes
print-type-size     field `.summary_description`: 24 bytes
print-type-size     field `.data_class`: 4 bytes
print-type-size     end padding: 4 bytes

So yes, “large amount of data” can definitely go higher than 104 bytes,
but that’s the only reason here. It should also make hash map resizes of
the TagStore a bit faster, though that shouldn’t matter too much.

Boxing all three of these together improved end-to-end performance by
~25%, and I would believe that most of that is due to TensorProto and
Value since those directly affect the reservoir size, but I also don’t
mind doing this. If we want to unbox this, that’s okay with me.
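
(For concreteness, a small sketch of the size effect being described;
`BigMessage` is a stand-in struct, not the prost-generated type:)

```rust
use std::mem::size_of;

// Stand-in for a large generated message (the real `Value` is 536 bytes).
struct BigMessage {
    payload: [u8; 536],
}

struct Unboxed {
    value: BigMessage, // stored inline: the enclosing struct is >= 536 bytes
}

struct Boxed {
    value: Box<BigMessage>, // stored behind a pointer: one word inline
}

fn main() {
    assert!(size_of::<Unboxed>() >= 536);
    assert_eq!(size_of::<Boxed>(), size_of::<usize>());
    // Moving a `Boxed` (e.g., during a hash-map resize or into a reservoir)
    // copies only the pointer, not the 536-byte payload.
    println!("{} vs {}", size_of::<Unboxed>(), size_of::<Boxed>());
}
```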

Contributor

Okay, it was primarily a curiosity question - fine by me to leave it boxed. I didn't realize it'd be 24 bytes per Vec aka per String so it does add up to more than I'd have thought; at least one reason is that in my head I'm still thinking about immutable string types where there isn't an extra 8 bytes for capacity.

I doubt that it makes a big difference in our case, but in general it would seem like some applications with a lot of immutable strings would save space overhead by having an immutable owned string representation with pointer + length only? I guess that must just not come up very often since I don't see much evidence that this exists.

Contributor Author
@wchargin wchargin Nov 21, 2020

Right: you basically have:

  • growable vector: `Vec<u8>`, `String`; 3 words (ptr, len, cap)
  • slice reference: `&[u8]`, `&str`; 2 words (ptr, len)
  • owned slice: `Box<[u8]>`, `Box<str>`; 2 words (ptr, len)

where in all cases the string layout is the same as the vector layout,
just with a “valid UTF-8” invariant.

So if you want “immutable owned string with no capacity penalty”, you
can take a Box<str>. And you can actually mutate it, too; you just
can’t resize it without reallocating.

Some protobuf-like APIs use this form: iirc, capnproto arrays aren’t
resizable after you construct them. But that’s a lot less ergonomic when
it comes to code that constructs the things, so I’m okay with prost
choosing to use Vecs and Strings. The TensorProto case is
definitely unfortunate, though. If protobuf had repeated oneofs, we
wouldn’t have this problem, since an enum of a bunch of Vec<_>s is
still just 32 bytes (24 + 8 for discriminant).

Rust also does have fixed-size arrays, so a Box<[u8; 12]> is even
smaller since the size is known at compile time. I was considering using
that for the TFRecord reader header rather than a Vec<u8>, but it’d be
a bit annoying, so would definitely need to be benchmarked.
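
To put concrete numbers on the three layouts (a small sketch; these
assertions hold on typical 64-bit targets):

```rust
use std::mem::size_of;

fn main() {
    // Growable vector form: pointer + length + capacity (3 words).
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    // Borrowed slice and owned slice forms: pointer + length (2 words).
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());

    // An immutable owned string with no capacity field:
    let s: Box<str> = String::from("train").into_boxed_str();
    assert_eq!(&*s, "train");
}
```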

Contributor Author

And, of course, not all languages with immutable strings fit them into 8
or 16 bytes:

```python
>>> import sys
>>> sys.getsizeof("")
49
```

Contributor

Interesting, I think Box<str> is what I was looking for, and now that you mention it I see there's even https://doc.rust-lang.org/std/string/struct.String.html#method.into_boxed_str.

It hadn't occurred to me since str on its own seems a little exotic, but this explanation helps. What didn't help was the fact that https://doc.rust-lang.org/book/ch04-03-slices.html starts by saying "Another data type that does not have ownership is the slice" when I think what they're really doing is referring to &[_] as a "slice" informally and eliding the distinction between that and [_]. So whether or not a "slice" in the strict meaning of [_] has ownership depends on the type of pointer being used to hold it (namely if it's &, &mut, or Box).

Contributor Author
@wchargin wchargin left a comment

Thanks! I think these are all non-blocking, but happy to change the
Box<T> in a follow-up if you’d like.


@wchargin wchargin merged commit aa14272 into master Nov 21, 2020
@wchargin wchargin deleted the wchargin-rust-commit-type branch November 21, 2020 00:38