Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SHA256 hash of .whl as info output #5908

Merged
merged 5 commits into from
Jun 26, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions news/5908.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Wheel builds will output the final filename and SHA256 hash of .whl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the imperative tone (see https://pip.pypa.io/en/stable/development/contributing/#news-entries ). Maybe something like this?

Log the final filename and SHA256 hash of a .whl file when done building a wheel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed this with 09e3184

files as an info log entry.
14 changes: 12 additions & 2 deletions src/pip/_internal/wheel.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,15 +68,22 @@ def normpath(src, p):
return os.path.relpath(src, p).replace(os.path.sep, '/')


def rehash(path, blocksize=1 << 20):
# type: (str, int) -> Tuple[str, str]
def hash_file(path, blocksize=1 << 20):
cjerdonek marked this conversation as resolved.
Show resolved Hide resolved
# type: (str, int) -> Tuple[Any, str]
"""Return (hash, length) for path using hashlib.sha256()"""
h = hashlib.sha256()
length = 0
with open(path, 'rb') as f:
for block in read_chunks(f, size=blocksize):
length += len(block)
h.update(block)
return (h, str(length)) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could return the length as int since rehash uses str(length) and _build_one_inside_env ignores it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation. For that matter (but not for this PR), rehash() should probably also be made to return an int! (And the caller can convert to a string as needed.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made hash_file return a int with 53c38d8. I will leave rehash() for now to avoid further confusion :)



def rehash(path, blocksize=1 << 20):
# type: (str, int) -> Tuple[str, str]
"""Return (encoded_digest, length) for path using hashlib.sha256()"""
h, length = hash_file(path, blocksize)
digest = 'sha256=' + urlsafe_b64encode(
h.digest()
).decode('latin1').rstrip('=')
Expand Down Expand Up @@ -885,7 +892,10 @@ def _build_one_inside_env(self, req, output_dir, python_tag=None):
wheel_name = os.path.basename(wheel_path)
dest_path = os.path.join(output_dir, wheel_name)
try:
wheel_hash, _ = hash_file(wheel_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful for readers of the code to give a name to the second variable in the return value. You could add a code comment above the line saying, "We don't need to use the second value." if you would like to communicate that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_length should do, or also output the file size :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have put the length into the output string with a8e382b, so this value is now used

shutil.move(wheel_path, dest_path)
logger.info('Finished: %s sha256=%s',
wheel_name, wheel_hash.hexdigest())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other log messages in your post have the form "Running setup.py bdist_wheel for simple: finished ..." So maybe this could be "Finished wheel for <name>: ..."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this info() message can be merged with the one right after. Maybe the directory path can go on a separate line like it is now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the format and have changed that with a8e382b

I did leave the following info message (Stored in directory:) somewhat on purpose, because I felt like it might be the type of thing that people had greps for in various ways in existing code. I have made this point explicit with a comment in a8e382b, and additionally added a match for it to the functional test.

The line will become quite long with the full path too; however if you feel strongly the backwards compatibility issues aren't a concern I can change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't feel strongly about merging the log messages. However, I'd rather not create any expectation among users (or future maintainers for that matter) that log messages won't or shouldn't change. One reason is that I think we should be open to improving these messages when we can as a matter of UX. Thus, my preference would be for the comment not to be there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line will become quite long with the full path too;

By the way, to clarify, my suggestion was for the second log message to be in the same log message as the first but still be on a separate line (i.e. a newline character would go in the message). Thus, the lengths of lines needn't increase, and in fact the same greps could even work if the text was kept the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus, my preference would be for the comment not to be there.

Removed with 45a6415

By the way, to clarify, my suggestion was for the second log message to be in the same log message as the first but still be on a separate line (i.e. a newline character would go in the message).

Ahh ... yes I wasn't sure about multi-line log strings. This probably comes from some ancient history of mine where I was dealing with syslog() and /dev/msg a lot where multi-line strings are not handled well at all :) Again if you feel strongly happy to convert it to a suggested format

logger.info('Stored in directory: %s', output_dir)
return dest_path
except Exception:
Expand Down
4 changes: 4 additions & 0 deletions tests/functional/test_wheel.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
"""'pip wheel' tests"""
import os
import re
from os.path import exists

import pytest
Expand Down Expand Up @@ -48,6 +49,9 @@ def test_pip_wheel_success(script, data):
)
wheel_file_name = 'simple-3.0-py%s-none-any.whl' % pyversion[0]
wheel_file_path = script.scratch / wheel_file_name
assert re.search(
"Finished: %s sha256=[A-Fa-f0-9]{64}" % re.escape(wheel_file_name),
result.stdout)
assert wheel_file_path in result.files_created, result.stdout
assert "Successfully built simple" in result.stdout, result.stdout

Expand Down