Ensure reproducible output from generate task #119

Merged (2 commits) on Mar 22, 2017

@lptr (Contributor) commented Mar 9, 2017

Expected behavior

GenerateProtoTask should generate the exact same output from the same inputs.

Current behavior

GenerateProtoTask generates slightly different output when the order of the input files is different. This prevents tasks that depend on the output of GenerateProtoTask from being marked as UP-TO-DATE.

The output of protoc depends on the order of the source files passed to it. This change makes sure that the input files passed to it are always in a consistent order.
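
As a rough illustration of the idea in the commit message (not the plugin's actual code, which sorts the file set in Groovy), the same approach can be sketched in shell. `sorted_protos` is a hypothetical helper name, and the sketch assumes file paths contain no newlines:

```shell
# Hypothetical helper (not part of the plugin): print the .proto files found
# under a directory, one per line, in a fixed byte-wise order.
sorted_protos() {
    local dir="$1"
    # find returns files in directory order, which is not guaranteed to be stable.
    # LC_ALL=C makes sort compare raw bytes, so the result is locale-independent.
    find "$dir" -type f -name '*.proto' | LC_ALL=C sort
}
```

Assuming the sources live under `src/main/proto` and no path contains whitespace, this could be used as `protoc --proto_path=src/main/proto --descriptor_set_out=out.dsc $(sorted_protos src/main/proto)`, so that protoc always sees the files in the same order.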
@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@lptr
Contributor Author

lptr commented Mar 9, 2017

Hi, I signed the CLA.

@googlebot

CLAs look good, thanks!

@@ -361,7 +361,8 @@ public class GenerateProtoTask extends DefaultTask {
 Preconditions.checkState(state == State.FINALIZED, 'doneConfig() has not been called')
 
 ToolsLocator tools = project.protobuf.tools
-Set<File> protoFiles = inputs.sourceFiles.files
+// Sort to make sure files are in a consistent order
+List<File> protoFiles = inputs.sourceFiles.files.sort()
Collaborator

Still trying to understand what the problem is. This will only affect the order of the proto files passed to the protoc command line. What output is changed by that?

Contributor Author

The protoc tool itself generates different output when the files are in different order. It's not functionally different, but you get different bytes as output, which then destroy the up-to-date checks downstream.
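
To illustrate why byte-level differences matter, here is a minimal sketch. It assumes the downstream up-to-date check fingerprints file contents, which is an assumption about the build tool rather than something shown in this thread. `fingerprint` is a hypothetical helper, and the two sample files merely stand in for protoc output:

```shell
# Hypothetical stand-in for a build tool's content fingerprint:
# the POSIX cksum CRC and byte count of a file.
fingerprint() {
    cksum < "$1"
}

work=$(mktemp -d)
# Two files with the same entries in a different order.
printf 'message A\nmessage B\n' > "$work/ab.out"
printf 'message B\nmessage A\n' > "$work/ba.out"

# The entries are equivalent, but the bytes are not, so the fingerprints differ.
if [ "$(fingerprint "$work/ab.out")" != "$(fingerprint "$work/ba.out")" ]; then
    echo "fingerprints differ"
fi
```

A tool that compares such fingerprints would treat the reordered file as changed and rerun everything that consumes it.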

Collaborator

I still don't understand how the order the files are written would impact up-to-date checks. Can you elaborate?

Collaborator

What output changes? If the output does change, then it sounds like the order is significant and the files shouldn't be sorted. If the change is minor, then maybe we should fix protoc?

It's the descriptor files that can change. In the example I know of, the descriptors were being packaged into a jar which (sometimes) made the jar different.

Here's a self contained example:

#!/bin/bash 

declare A="a.proto"
declare B="b.proto"

cat << EOF > $A
syntax = "proto2";

package example;

message A {
    required string foo = 1;
}
EOF

cat << EOF > $B
syntax = "proto2";

package example;

message B {
    required int32 bar = 1;
}
EOF

declare FIRST_ORDER="$A $B"
declare SECOND_ORDER="$B $A"

protoc --descriptor_set_out=firstOrder.dsc $FIRST_ORDER 
protoc --descriptor_set_out=secondOrder.dsc $SECOND_ORDER
protoc --descriptor_set_out=firstOrderAgain.dsc $FIRST_ORDER
protoc --descriptor_set_out=secondOrderAgain.dsc $SECOND_ORDER

md5 firstOrder.dsc firstOrderAgain.dsc 
md5 secondOrder.dsc secondOrderAgain.dsc 

This should produce something like:
MD5 (firstOrder.dsc) = 358eb874091e607d4a9bc2e3d1d40caa
MD5 (firstOrderAgain.dsc) = 358eb874091e607d4a9bc2e3d1d40caa
MD5 (secondOrder.dsc) = 2c3b519a765702851c80ad65b5034381
MD5 (secondOrderAgain.dsc) = 2c3b519a765702851c80ad65b5034381

The two "first" files will be the same and the two "second" files will be the same. I ran each order twice to show that it's not something like a timestamp that makes the files different. Looking at the generated descriptors, it seems the only difference is the order in which protoc saw the input files.
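
The `md5` command used above is the BSD/macOS tool. As a portable alternative, a byte-for-byte comparison with POSIX `cmp` shows the same thing. This is only a sketch, and `same_bytes` is a hypothetical helper name:

```shell
# Hypothetical helper: report whether two files are byte-for-byte identical.
# cmp -s prints nothing; only its exit status is used.
same_bytes() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "different: $1 $2"
    fi
}
```

After running the script above, `same_bytes firstOrder.dsc firstOrderAgain.dsc` would be expected to report the files as identical, and `same_bytes firstOrder.dsc secondOrder.dsc` as different.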

Collaborator

That makes sense; it is the generated descriptors with generateDescriptorSet, not the source. The sort seems fine.

@@ -361,7 +361,8 @@ public class GenerateProtoTask extends DefaultTask {
 Preconditions.checkState(state == State.FINALIZED, 'doneConfig() has not been called')
 
 ToolsLocator tools = project.protobuf.tools
-Set<File> protoFiles = inputs.sourceFiles.files
+// Sort to make sure files are in a consistent order
Collaborator

This comment says what you are doing and the obvious implication, but not why. Maybe:

Sort to ensure generated descriptors have a canonical representation to avoid triggering unnecessary rebuilds downstream

Contributor Author

OK, I've changed it.

@zhangkun83 zhangkun83 merged commit 8444e98 into google:master Mar 22, 2017
zhangkun83 pushed a commit to zhangkun83/protobuf-gradle-plugin-1 that referenced this pull request Nov 7, 2018
The descriptors file generated by protoc depends on the order of the source files passed to it. This change makes sure that the input files passed to it are always in a consistent order, to ensure generated descriptors have a canonical representation to avoid triggering unnecessary rebuilds downstream.