Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize repeated field decoding #911

Merged
merged 6 commits into from
Feb 23, 2024
Merged

Conversation

ignatz
Copy link
Contributor

@ignatz ignatz commented Feb 1, 2024

Avoid allocating tear-off closures when decoding repeated but non-packed fields.

Also refactors BuilderInfo._decodeEnum.

Benchmarks

repeated_int64, repeated_string, and repeated_enum are the new benchmarks added with this PR.

fromBuffer is an internal benchmark that decodes a large buffer.

fromBufferUnmodifiableInput is the same as fromBuffer, but the input Uint8List is made unmodifiable.

AOT

Before After Diff
repeated_int64 84,549 us 61,820 us -22,729 us, -25.8%
repeated_string 67,183 us 67,984 us +801 us, +1.19%
repeated_enum 52,335 us 44,772 us -7,563 us, -14.4%
fromBuffer 25,661 us 25,299 us -362 us, -1.4%
fromBufferUnmodifiableInput 26,135 us 25,689 us -446 us, -1.7%

JS

Before After Diff
repeated_int64 112,050 us 109,850 us -2,200 us, -1.9%
repeated_string 123,050 us 126,650 us +3,600 us, +2.9%
repeated_enum 43,920 us 39,400 us -4,520 us, -10.2%
fromBuffer 211,888 us 168,250 us -43,638 us, -20.5%
fromBufferUnmodifiableInput 227,444 us 185,454 us -41,990 us, -18.4%

cl/609281182

@ignatz ignatz force-pushed the faster_repeated_fields branch 2 times, most recently from d557d5d to b41cc30 Compare February 1, 2024 13:49
@osa1 osa1 changed the title ~30 Faster deserialization for packable but unpacked repeated fields by reducing boxing overhead. ~30% Faster deserialization for packable but unpacked repeated fields by reducing boxing overhead. Feb 20, 2024
@osa1
Copy link
Member

osa1 commented Feb 20, 2024

Thanks, this is great. I've also confirmed in some internal benchmarks (not micro benchmarks) that this makes quite a big difference. I will try to review and merge this this week.

Copy link
Member

@osa1 osa1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, this is great.

While reviewing this I noticed that the wins here are not coming from the cached list, but from avoiding the tear-off closure creating when reading repeated fields, for every element.

Indeed, fs._ensureRepeatedField(...) is just a list access in the common case:

  PbList<T> _ensureRepeatedField<T>(BuilderInfo meta, FieldInfo<T> fi) {
    assert(!_isReadOnly);
    assert(fi.isRepeated);
    if (fi.index == null) {                             // <----- branch not taken in the common case
      return _ensureExtensions()._ensureRepeatedField(fi as Extension<T>);
    }
    final value = _getFieldOrNull(fi);                  // <----- returns non-null after the first element of a list
    if (value != null) return value;
    ...
  }

  dynamic _getFieldOrNull(FieldInfo fi) {
    if (fi.index != null) return _values[fi.index!];    // <----- common case
    return _extensions?._getFieldOrNull(fi as Extension<dynamic>);
  }

So it makes sense that it doesn't make too much difference. (actually it makes my benchmark slower when compiled to JS, but that may be noise, see below for numbers)

The wins here are coming from avoiding the closures in e.g.

      case PbFieldType._REPEATED_BOOL:
        _readPackable(meta, fs, input, wireType, fi, input.readBool);

Here for an unpacked list, we create one tear-off closure (input.readBool) for every element. That's expensive compared to the cost of reading one element. The version in this PR avoids it when the field is not packed:

      case PbFieldType._REPEATED_BOOL:
        final list = cachedList ?? fs._ensureRepeatedField(meta, fi);
        if (wireType == WIRETYPE_LENGTH_DELIMITED) {
          _readPacked(input, () => list.add(input.readBool()));
        } else {
          list.add(input.readBool());           // <----- no closure allocation when not packed
        }

(Allocating a closure when reading packed is fine, as closure allocation is probably a small part of the whole operation, though we could even avoid that if we wanted, similar to enum code in this PR, which doesn't allocate any closures)

We should update the commit message to reflect this. I can do this if you prefer.

Benchmark results

From an internal benchmark.

Master branch

AOT

fromBuffer(RunTime): 26,048 us.
fromBufferUnmodifiableInput(RunTime): 26,596 us.

JS

fromBuffer(RunTime): 214,111 us.
fromBufferUnmodifiableInput(RunTime): 230,777 us.

With cached list

AOT

fromBuffer(RunTime): 24,877 us.
fromBufferUnmodifiableInput(RunTime): 24,864us.

JS

fromBuffer(RunTime): 193,333 us.
fromBufferUnmodifiableInput(RunTime): 192,500 us.

Same patch, but without the cached list

AOT

fromBuffer(RunTime): 25,388 us.
fromBufferUnmodifiableInput(RunTime): 25,073 us.

JS

fromBuffer(RunTime): 171,272 us.
fromBufferUnmodifiableInput(RunTime): 189,000 us.

Notes

Cached list is not that important (actually makes dart2js output slower), the important part is avoiding creating a tear-off closure for every element when reading non-packed repeated fields.

Similar differences can be seen with the benchmark in this PR.

benchmarks/bin/repeated_field.dart Outdated Show resolved Hide resolved
protobuf/lib/src/protobuf/coded_buffer.dart Outdated Show resolved Hide resolved
protobuf/lib/src/protobuf/coded_buffer.dart Outdated Show resolved Hide resolved
@ignatz
Copy link
Contributor Author

ignatz commented Feb 21, 2024

We should update the commit message to reflect this. I can do this if you prefer.

Your findings match my findings. I never intended to claim that the caching has a big impact. My commit message specifically calls out "Boxing overhead", which is my, maybe inadequate lingo, to say I removed a closure.

I'm not sure you wanted to say that it's the allocation itself that is solely responsible for the overhead because I'm not sure that's true.

I left the cache from my experiments because it seemed fairly simple and not entirely pointless. It's not clear to me how it would slow things down unless this is some memory layout weirdness. We can remove it

Lastly, you should always feel free to edit my commit message

…ing overhead.

Also memoize storage lookup from previously deserialized field, though that seems to have only a minor impact since the lookup is fast already.
@ignatz ignatz force-pushed the faster_repeated_fields branch from e6ce1b6 to 68bb70d Compare February 21, 2024 13:16
osa1 added 3 commits February 22, 2024 09:50
It doesn't improve much in AOT, and slows down JS.
@osa1
Copy link
Member

osa1 commented Feb 22, 2024

I'm testing this internally in cl/609281182.

@osa1 osa1 changed the title ~30% Faster deserialization for packable but unpacked repeated fields by reducing boxing overhead. Optimize repeated field decoding Feb 23, 2024
@osa1 osa1 merged commit ef0ab7d into google:master Feb 23, 2024
17 of 18 checks passed
@ignatz
Copy link
Contributor Author

ignatz commented Feb 23, 2024

Awesome. Appreciate your changes. Thanks!

copybara-service bot pushed a commit to dart-lang/sdk that referenced this pull request Feb 27, 2024
Revisions updated by `dart tools/rev_sdk_deps.dart`.

http (https://github.com/dart-lang/http/compare/ce0de37..6e0a46f):
  6e0a46f  2024-02-24  Alex Li  [cronet_http] 🏗️ Use dart-define to determine dependency (dart-lang/http#1111)
  d5cab74  2024-02-23  Alex Li  👷 Update configuration related files (dart-lang/http#1091)
  6873731  2024-02-22  Brian Quinlan  Make it possible for `CronetClient` to own the lifetime of an existing `CronetEngine` (dart-lang/http#1137)

protobuf (https://github.com/dart-lang/protobuf/compare/f085bfd..ef0ab7d):
  ef0ab7d  2024-02-23  Sebastian  Optimize repeated field decoding (google/protobuf.dart#911)

tools (https://github.com/dart-lang/tools/compare/9f4e6a4..c7fbf26):
  c7fbf26  2024-02-21  Elias Yishak  Fallback to current datetime when failing on parse session json file (dart-lang/tools#235)

web (https://github.com/dart-lang/web/compare/975e55c..d96c01d):
  d96c01d  2024-02-23  Srujan Gaddam  Use extension types and generics internally (dart-lang/web#184)

yaml_edit (https://github.com/dart-lang/yaml_edit/compare/82ab64d..54884db):
  54884db  2024-02-23  Devon Carew  update to the latest sdk; update lints (dart-lang/yaml_edit#68)

Change-Id: Ib2cb7971df05f4501f81fe5c04b7eaf88d0d93a3
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/354381
Commit-Queue: Devon Carew <[email protected]>
Reviewed-by: Konstantin Shcheglov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants