-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize repeated field decoding #911
Conversation
d557d5d
to
b41cc30
Compare
Thanks, this is great. I've also confirmed in some internal benchmarks (not micro benchmarks) that this makes quite a big difference. I will try to review and merge this this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again, this is great.
While reviewing this I noticed that the wins here are not coming from the cached list, but from avoiding the tear-off closure creating when reading repeated fields, for every element.
Indeed, fs._ensureRepeatedField(...)
is just a list access in the common case:
PbList<T> _ensureRepeatedField<T>(BuilderInfo meta, FieldInfo<T> fi) {
assert(!_isReadOnly);
assert(fi.isRepeated);
if (fi.index == null) { // <----- branch not taken in the common case
return _ensureExtensions()._ensureRepeatedField(fi as Extension<T>);
}
final value = _getFieldOrNull(fi); // <----- returns non-null after the first element of a list
if (value != null) return value;
...
}
dynamic _getFieldOrNull(FieldInfo fi) {
if (fi.index != null) return _values[fi.index!]; // <----- common case
return _extensions?._getFieldOrNull(fi as Extension<dynamic>);
}
So it makes sense that it doesn't make too much difference. (actually it makes my benchmark slower when compiled to JS, but that may be noise, see below for numbers)
The wins here are coming from avoiding the closures in e.g.
case PbFieldType._REPEATED_BOOL:
_readPackable(meta, fs, input, wireType, fi, input.readBool);
Here for an unpacked list, we create one tear-off closure (input.readBool
) for every element. That's expensive compared to the cost of reading one element. The version in this PR avoids it when the field is not packed:
case PbFieldType._REPEATED_BOOL:
final list = cachedList ?? fs._ensureRepeatedField(meta, fi);
if (wireType == WIRETYPE_LENGTH_DELIMITED) {
_readPacked(input, () => list.add(input.readBool()));
} else {
list.add(input.readBool()); // <----- no closure allocation when not packed
}
(Allocating a closure when reading packed is fine, as closure allocation is probably a small part of the whole operation, though we could even avoid that if we wanted, similar to enum code in this PR, which doesn't allocate any closures)
We should update the commit message to reflect this. I can do this if you prefer.
Benchmark results
From an internal benchmark.
Master branch
AOT
fromBuffer(RunTime): 26,048 us.
fromBufferUnmodifiableInput(RunTime): 26,596 us.
JS
fromBuffer(RunTime): 214,111 us.
fromBufferUnmodifiableInput(RunTime): 230,777 us.
With cached list
AOT
fromBuffer(RunTime): 24,877 us.
fromBufferUnmodifiableInput(RunTime): 24,864us.
JS
fromBuffer(RunTime): 193,333 us.
fromBufferUnmodifiableInput(RunTime): 192,500 us.
Same patch, but without the cached list
AOT
fromBuffer(RunTime): 25,388 us.
fromBufferUnmodifiableInput(RunTime): 25,073 us.
JS
fromBuffer(RunTime): 171,272 us.
fromBufferUnmodifiableInput(RunTime): 189,000 us.
Notes
Cached list is not that important (actually makes dart2js output slower), the important part is avoiding creating a tear-off closure for every element when reading non-packed repeated fields.
Similar differences can be seen with the benchmark in this PR.
Your findings match my findings. I never intended to claim that the caching has a big impact. My commit message specifically calls out "Boxing overhead", which is my, maybe inadequate lingo, to say I removed a closure. I'm not sure you wanted to say that it's the allocation itself that is solely responsible for the overhead because I'm not sure that's true. I left the cache from my experiments because it seemed fairly simple and not entirely pointless. It's not clear to me how it would slow things down unless this is some memory layout weirdness. We can remove it Lastly, you should always feel free to edit my commit message |
…ing overhead. Also memoize storage lookup from previously deserialized field, though that seems to have only a minor impact since the lookup is fast already.
e6ce1b6
to
68bb70d
Compare
It doesn't improve much in AOT, and slows down JS.
I'm testing this internally in cl/609281182. |
Awesome. Appreciate your changes. Thanks! |
Revisions updated by `dart tools/rev_sdk_deps.dart`. http (https://github.com/dart-lang/http/compare/ce0de37..6e0a46f): 6e0a46f 2024-02-24 Alex Li [cronet_http] 🏗️ Use dart-define to determine dependency (dart-lang/http#1111) d5cab74 2024-02-23 Alex Li 👷 Update configuration related files (dart-lang/http#1091) 6873731 2024-02-22 Brian Quinlan Make it possible for `CronetClient` to own the lifetime of an existing `CronetEngine` (dart-lang/http#1137) protobuf (https://github.com/dart-lang/protobuf/compare/f085bfd..ef0ab7d): ef0ab7d 2024-02-23 Sebastian Optimize repeated field decoding (google/protobuf.dart#911) tools (https://github.com/dart-lang/tools/compare/9f4e6a4..c7fbf26): c7fbf26 2024-02-21 Elias Yishak Fallback to current datetime when failing on parse session json file (dart-lang/tools#235) web (https://github.com/dart-lang/web/compare/975e55c..d96c01d): d96c01d 2024-02-23 Srujan Gaddam Use extension types and generics internally (dart-lang/web#184) yaml_edit (https://github.com/dart-lang/yaml_edit/compare/82ab64d..54884db): 54884db 2024-02-23 Devon Carew update to the latest sdk; update lints (dart-lang/yaml_edit#68) Change-Id: Ib2cb7971df05f4501f81fe5c04b7eaf88d0d93a3 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/354381 Commit-Queue: Devon Carew <[email protected]> Reviewed-by: Konstantin Shcheglov <[email protected]>
Avoid allocating tear-off closures when decoding repeated but non-packed fields.
Also refactors
BuilderInfo._decodeEnum
.Benchmarks
repeated_int64
,repeated_string
, andrepeated_enum
are the new benchmarks added with this PR.fromBuffer
is an internal benchmark that decodes a large buffer.fromBufferUnmodifiableInput
is the same asfromBuffer
, but the inputUint8List
is made unmodifiable.AOT
JS
cl/609281182