-
Notifications
You must be signed in to change notification settings - Fork 459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use heap-based storage only when a Protobuf has a singular transitive recursion #900
Conversation
@@ -506,6 +506,23 @@ fileprivate func hasSingleMessageField(descriptor: Descriptor) -> Bool { | |||
return result | |||
} | |||
|
|||
fileprivate func hasRecursiveSingularField(descriptor: Descriptor, visited: [Descriptor] = []) -> Bool { | |||
var visited = visited |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't default visited
, and make it inout
, can you ensure every node is only checked once? i.e. - since it is a copy, the deeper nodes can get visited multiple times because nothing is shared between those sub visits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasvl I guess it could be a Set<Descriptor>
. As it stands, it short-circuits on any recursion, but doesn’t share state between branches. Are you thinking about this as an optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a straightforward way to benchmark protoc
for the test protos and compare against master
? I guess that would answer the question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasvl tried inout visited
and got 19 additional heap-allocated messages:
https://github.com/ydnar/swift-protobuf/commit/11698e3cf60f434646e0e3a808e689fcac26fd8a
Edit: I believe the additional heap allocations are because of non-intersecting paths:
A > B > C
+ A > C
would match as visited, even though there wasn’t recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to digest this one a little more, I didn't expect there to be any change in what it decided should use the heap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasvl thanks for discovering the logic issue. I think you’re right in your example, A
won’t need heap storage, but B
will.
I had originally coded this without a visited
array, just checking if the root message (in this case A
) was referenced by any sub-messages. I’ll try that again.
Unrelated: on my machine (2018 MB Air), using Swift 5.1, the package fails to compile because of a missing default
statement in a generated switch
(generated_swift_names_enum_cases.pb.swift:2605
). I have to manually insert it (and accept the inevitable warnings) in order to get the package to build and tests to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is hard to tell from the diffs, what is left using storage, and does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remaining heap allocations look like they makes sense. I’d love to see it stress-tested by someone hitting performance limits with heap allocs like #733.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solving this optimally is tricky: In @thomasvl's example above, we technically only need storage for one of B
, C
, and D
. You don't need all three.
Some quick digging suggests that the optimal answer requires solving something called the "maximal acyclic subgraph" problem. That sounds like more work than I really want to tackle in this PR.
Re the compile failure in the generated file, can you open a sep issue for that (we already saw it on linux, but it hadn't hit macOS hosted builds yet). We can tweak the generated files to dodge it, but the underlying problem can happen in a real proto file if it has enough enum cases, so a complete solution for that will take more thought/work. |
Thanks. The exact error I’m seeing: swift-protobuf/Tests/SwiftProtobufTests/generated_swift_names_enum_cases.pb.swift:1749:5: error: the compiler is unable to check that this switch is exhaustive in reasonable time Does |
- `make regenerate` to rebuild the protos used by the runtime and plugin - `make test-runtime` to verify that the runtime works correctly with the new changes - `make reference` to update the Reference directory - MANUALLY go through `git diff Reference` to verify that the generated Swift changed in the way you expect - `make clean build test` to do a final check Test results: CONFORMANCE SUITE PASSED: 1940 successes, 0 skipped, 0 expected failures, 0 unexpected failures. CONFORMANCE SUITE PASSED: 63 successes, 0 skipped, 0 expected failures, 0 unexpected failures. Fixes #735 and #733.
…age for additional protos
} | ||
|
||
// Skip other visited fields. | ||
if (visited[1...].contains { $0 === messageType }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, from a performance pov, you don't need this slice. visted[0]
was checked in the conditional above, so just check all of visitited
and avoid making the temp object, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it. I’m used to @golang where slicing is cheap and on the stack (shared storage with original slice).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
visited[1...]
returns an ArraySlice
that shares storage (and indices) with its original array, so it's the same in that regard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was there is a call to make the slice (which may or may not have an alloc to its internal layout). So that extra call up front (with potential allocs) is all to avoid one extra ===
check within the actually contains(where:)
loop, so it didn't seem like trying to skip the first one was worth it compare to the cost of that one extra compare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roger that. Updated in 1dafa90.
.vscode/settings.json
Outdated
{ | ||
"swiftformat.enable": false, | ||
"files.trimTrailingWhitespace": true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gah. Oops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that note, why doesn’t this repo use swiftformat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which swiftformat, the new one in the Apple repo? There isn't yet an general style guide the community has adopted, so we haven't bothered to chase one yet (just like different parts of Swift have different formats at the moment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. We’ve been using the non-Apple swiftformat
, not the new swift-format
. A blessed standard format is one of the best aspects of working with Go.
Any thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into this more deeply this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might well have missed something, but I think this is "correct" in the sense that it will never choose non-heap storage when heap storage is required. (Performance and finding a more optimal set of storage choices can be iterated later.)
My only real concern with this is whether the larger structs we're generating will cause problems for some people. The best way to find out is to merge this and solicit feedback.
@@ -506,6 +506,23 @@ fileprivate func hasSingleMessageField(descriptor: Descriptor) -> Bool { | |||
return result | |||
} | |||
|
|||
fileprivate func hasRecursiveSingularField(descriptor: Descriptor, visited: [Descriptor] = []) -> Bool { | |||
var visited = visited |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solving this optimally is tricky: In @thomasvl's example above, we technically only need storage for one of B
, C
, and D
. You don't need all three.
Some quick digging suggests that the optimal answer requires solving something called the "maximal acyclic subgraph" problem. That sounds like more work than I really want to tackle in this PR.
I'll go head and merge this now so we can try to get the feedback. |
@ydnar thanks again for the work and patch! |
Hi there—long time reader, first-time PR.
I think this addresses the issues raised in #733 and broken out into #735 by analyzing if a proto has a message field with a transitive recursion.
I broke this out into two major commits: one that introduces the change to
protoc-gen-swift
, and a second commit with the updated generated Swift files. There’s a third commit that undoes a change I needed to build + run the tests on my machine using XCode 11 / Swift 5.1.Following the instructions in the Makefile, all tests including conformance tests, passed.
Is this the right idea?