Ensure reproducible output from generate task #119

Merged 2 commits on Mar 22, 2017
@@ -361,7 +361,8 @@ public class GenerateProtoTask extends DefaultTask {
Preconditions.checkState(state == State.FINALIZED, 'doneConfig() has not been called')

ToolsLocator tools = project.protobuf.tools
- Set<File> protoFiles = inputs.sourceFiles.files
+ // Sort to make sure files are in a consistent order

Collaborator:
This comment says what you are doing and the obvious implication, but not why. Maybe:

Sort to ensure generated descriptors have a canonical representation to avoid triggering unnecessary rebuilds downstream

Contributor Author:
OK, I've changed it.

+ List<File> protoFiles = inputs.sourceFiles.files.sort()
Collaborator:
Still trying to understand what the problem is. This will only affect the order of the proto files passed to the protoc command line. What output is changed by that?

Contributor Author:
The protoc tool itself generates different output when the files are in a different order. The output is not functionally different, but the bytes differ, which then defeats the up-to-date checks downstream.
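
To illustrate why a byte-level difference matters even when the content is functionally the same, here is a minimal sketch. It assumes the downstream up-to-date check compares a checksum of the file bytes; the exact mechanism is not described in this thread. The two strings are hypothetical stand-ins for descriptor sets that hold the same entries in a different order.

```shell
#!/bin/bash
# Hypothetical stand-ins for two descriptor sets with the same entries
# written in a different order (not real protoc output).
first_sum="$(printf 'a.proto\nb.proto\n' | cksum)"
second_sum="$(printf 'b.proto\na.proto\n' | cksum)"

# A content-based check only sees the bytes, so reordering alone
# is enough to make the two checksums disagree.
if [ "$first_sum" = "$second_sum" ]; then
    echo "identical bytes: a checksum-based check would report up to date"
else
    echo "different bytes: a checksum-based check would report a change"
fi
```

Any consumer that fingerprints its inputs this way treats the reordered file as modified, even though nothing meaningful changed.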

Collaborator:
I still don't understand how the order the files are written would impact up-to-date checks. Can you elaborate?

Collaborator:
What output changes? If the output does change, then it sounds like the order is significant and shouldn't be sorted. If it is minor, then maybe we should fix protoc?

It's the descriptor files that can change. In the example I know of, the descriptors were being packaged into a jar which (sometimes) made the jar different.

Here's a self-contained example:

#!/bin/bash 

declare A="a.proto"
declare B="b.proto"

cat << EOF > $A
syntax = "proto2";

package example;

message A {
    required string foo = 1;
}
EOF

cat << EOF > $B
syntax = "proto2";

package example;

message B {
    required int32 bar = 1;
}
EOF

declare FIRST_ORDER="$A $B"
declare SECOND_ORDER="$B $A"

protoc --descriptor_set_out=firstOrder.dsc $FIRST_ORDER 
protoc --descriptor_set_out=secondOrder.dsc $SECOND_ORDER
protoc --descriptor_set_out=firstOrderAgain.dsc $FIRST_ORDER
protoc --descriptor_set_out=secondOrderAgain.dsc $SECOND_ORDER

md5 firstOrder.dsc firstOrderAgain.dsc 
md5 secondOrder.dsc secondOrderAgain.dsc 

This should produce something like:
MD5 (firstOrder.dsc) = 358eb874091e607d4a9bc2e3d1d40caa
MD5 (firstOrderAgain.dsc) = 358eb874091e607d4a9bc2e3d1d40caa
MD5 (secondOrder.dsc) = 2c3b519a765702851c80ad65b5034381
MD5 (secondOrderAgain.dsc) = 2c3b519a765702851c80ad65b5034381

The two "first" files will be the same and the two "second" files will be the same. I ran it twice to show that it's not something like the timestamp that makes the files different. Looking at the generated descriptors, it seems like the only difference is the order that protoc saw the input files.

Collaborator:
That makes sense; it is the generated descriptors with generateDescriptorSet, not the source. The sort seems fine.
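
The same idea can be tried against the shell example earlier in this thread by putting the inputs in a fixed order before they reach protoc. This is a minimal sketch, not the plugin's implementation: `sorted_protos` is a hypothetical helper, and it assumes file names contain no whitespace or newlines.

```shell
#!/bin/bash
# Hypothetical helper: print the given file names one per line,
# in a locale-independent (byte-wise) order.
sorted_protos() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Both argument orders collapse to the same sorted list.
first="$(sorted_protos a.proto b.proto)"
second="$(sorted_protos b.proto a.proto)"
[ "$first" = "$second" ] && echo "same order either way"

# Possible usage (requires protoc; relies on word splitting, hence the
# no-whitespace assumption above):
# protoc --descriptor_set_out=sorted.dsc $(sorted_protos b.proto a.proto)
```

Setting `LC_ALL=C` keeps the ordering independent of the machine's locale, which matters if the goal is identical output across different build hosts.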


[builtins, plugins]*.each { plugin ->
File outputDir = new File(getOutputDir(plugin))