Ensure reproducible output from generate task #119
The output of protoc depends on the order of the source files passed to it. This change makes sure that the input files passed to it are always in a consistent order.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
Hi, I signed the CLA.
CLAs look good, thanks!
@@ -361,7 +361,8 @@ public class GenerateProtoTask extends DefaultTask {
     Preconditions.checkState(state == State.FINALIZED, 'doneConfig() has not been called')
 
     ToolsLocator tools = project.protobuf.tools
-    Set<File> protoFiles = inputs.sourceFiles.files
+    // Sort to make sure files are in a consistent order
+    List<File> protoFiles = inputs.sourceFiles.files.sort()
Still trying to understand what the problem is. This will only affect the order of the proto files passed to the protoc command line. What output is changed by that?
The protoc tool itself generates different output when the files are in a different order. The output is not functionally different, but the bytes differ, which then breaks the up-to-date checks downstream.
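To illustrate why byte-level differences matter for anything that fingerprints its inputs, here is a minimal sketch in which `cat` stands in for an order-sensitive tool such as protoc. The file names and the `combined_sum` helper are hypothetical and only serve the illustration; this is not how Gradle computes its up-to-date checks.

```shell
# Sketch: `cat` stands in for a tool whose output depends on input order.
workdir=$(mktemp -d)
printf 'message A\n' > "$workdir/a.txt"
printf 'message B\n' > "$workdir/b.txt"

# Hypothetical helper: checksum of the combined output for a given input order.
combined_sum() {
  cat "$@" | cksum
}

sum_ab=$(combined_sum "$workdir/a.txt" "$workdir/b.txt")
sum_ba=$(combined_sum "$workdir/b.txt" "$workdir/a.txt")

# The inputs are identical, but a content-based check sees two different outputs.
if [ "$sum_ab" != "$sum_ba" ]; then
  echo "same inputs, different order: checksums differ"
fi
```

Any downstream step that compares content hashes would treat the two results as a change, even though nothing meaningful changed.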
I still don't understand how the order the files are written would impact up-to-date checks. Can you elaborate?
What output changes? If the output does change, then it sounds like the order is significant and the files shouldn't be sorted. If the difference is minor, then maybe we should fix protoc?
It's the descriptor files that can change. In the example I know of, the descriptors were being packaged into a jar which (sometimes) made the jar different.
Here's a self-contained example:
#!/bin/bash
# Shows that protoc's descriptor set output depends on the order of its input files.
A="a.proto"
B="b.proto"
cat << EOF > "$A"
syntax = "proto2";
package example;
message A {
  required string foo = 1;
}
EOF
cat << EOF > "$B"
syntax = "proto2";
package example;
message B {
  required int32 bar = 1;
}
EOF
# The same two files, listed in opposite orders.
FIRST_ORDER=("$A" "$B")
SECOND_ORDER=("$B" "$A")
# Run each ordering twice to rule out per-run differences such as timestamps.
protoc --descriptor_set_out=firstOrder.dsc "${FIRST_ORDER[@]}"
protoc --descriptor_set_out=secondOrder.dsc "${SECOND_ORDER[@]}"
protoc --descriptor_set_out=firstOrderAgain.dsc "${FIRST_ORDER[@]}"
protoc --descriptor_set_out=secondOrderAgain.dsc "${SECOND_ORDER[@]}"
# md5 is the macOS tool; on Linux, md5sum is the usual equivalent.
md5 firstOrder.dsc firstOrderAgain.dsc
md5 secondOrder.dsc secondOrderAgain.dsc
This should produce something like:
MD5 (firstOrder.dsc) = 358eb874091e607d4a9bc2e3d1d40caa
MD5 (firstOrderAgain.dsc) = 358eb874091e607d4a9bc2e3d1d40caa
MD5 (secondOrder.dsc) = 2c3b519a765702851c80ad65b5034381
MD5 (secondOrderAgain.dsc) = 2c3b519a765702851c80ad65b5034381
The two "first" files will be the same and the two "second" files will be the same. I ran it twice to show that it's not something like the timestamp that makes the files different. Looking at the generated descriptors, it seems like the only difference is the order that protoc saw the input files.
That makes sense; it is the generated descriptors with generateDescriptorSet, not the source. The sort seems fine.
@@ -361,7 +361,8 @@ public class GenerateProtoTask extends DefaultTask {
     Preconditions.checkState(state == State.FINALIZED, 'doneConfig() has not been called')
 
     ToolsLocator tools = project.protobuf.tools
-    Set<File> protoFiles = inputs.sourceFiles.files
+    // Sort to make sure files are in a consistent order
This comment says what you are doing and the obvious implication, but not why. Maybe:
Sort to ensure generated descriptors have a canonical representation to avoid triggering unnecessary rebuilds downstream
OK, I've changed it.
The descriptor file generated by protoc depends on the order of the source files passed to it. This change makes sure the input files are always passed in a consistent order, so that the generated descriptors have a canonical representation and do not trigger unnecessary rebuilds downstream.
Expected behavior
GenerateProtoTask should generate the exact same output from the same inputs.
Current behavior
GenerateProtoTask generates slightly different output when the order of the input files is different. This prevents tasks that depend on the output of GenerateProtoTask from being marked as UP-TO-DATE.
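The change itself is made in the plugin's Groovy code, but the same idea can be sketched in shell for anyone invoking protoc directly. The `sorted_protos` helper below is hypothetical (it is not part of the plugin or of protoc), and it assumes file names contain no newlines.

```shell
# Hypothetical helper: print the given file names one per line in a fixed,
# locale-independent order, regardless of the order they were passed in.
sorted_protos() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

# Both calls list the files in the same order.
sorted_protos b.proto a.proto
sorted_protos a.proto b.proto

# Usage sketch in bash, assuming protoc is installed:
#   mapfile -t files < <(sorted_protos b.proto a.proto)
#   protoc --descriptor_set_out=out.dsc "${files[@]}"
```

Sorting with `LC_ALL=C` keeps the order independent of the user's locale, which matters when the goal is byte-identical output across machines.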