Rudimentary Bazel Download Retry Logic #495

Open
wants to merge 9 commits into
base: master
from

Conversation

celestialorb

I quite often get exceptions when downloading Bazel via Bazelisk, and I was surprised to see no retry logic implemented for the download, so I quickly put together something basic. Hopefully this'll get the ball rolling on a proper solution.

My initial thought was to use something like tenacity to wrap the download function, but I didn't see an easy way of adding a Python dependency to Bazelisk.

There's also the approach of using the native retry handler with urllib, but I'm not sure if that'll work with Python 2.
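
For illustration, a dependency-free retry wrapper of the kind described above might look roughly like this. It is a minimal sketch, not the code in this PR: `retry_call` and its parameters are hypothetical names, and it sticks to the standard library so that no extra Python dependency is needed.

```python
import time


def retry_call(func, attempts=3, wait_seconds=1.0, sleep=time.sleep):
    """Call func() until it returns or the attempts are used up.

    Hypothetical helper for illustration; not part of bazelisk.py.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            # Out of attempts: re-raise the last error unchanged.
            if attempt == attempts:
                raise
            # Otherwise pause briefly before trying again.
            sleep(wait_seconds)
```

A download could then be wrapped as `retry_call(lambda: fetch(url))`, where `fetch` stands in for whatever function performs the request. The `sleep` parameter exists only so the pause can be replaced in tests.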

@google-cla

google-cla bot commented Sep 8, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@celestialorb celestialorb force-pushed the improvement/download/retry branch 3 times, most recently from fabb809 to 7020954 Compare September 8, 2023 22:25
@celestialorb celestialorb force-pushed the improvement/download/retry branch from 51c786d to e24ba47 Compare September 8, 2023 23:13
bazelisk.py Outdated
        pass
    if creds is not None:
        auth = base64.b64encode(("%s:%s" % (creds[0], creds[2])).encode("ascii"))
        request.add_header("Authorization", "Basic %s" % auth.decode("utf-8"))

Can we construct auth and request once and use them across retries?
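
One way to read this suggestion is sketched below: the request and its Authorization header are prepared once, and the same object is handed to every attempt. The helper names `build_request` and `open_with_retries` are hypothetical, the `creds` tuple is assumed to follow the netrc layout (login, account, password) implied by the snippet above, and the pause between attempts is left out for brevity.

```python
import base64

try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2


def build_request(url, creds=None):
    """Build the Request once, adding Basic auth when creds are given.

    Hypothetical helper; creds is assumed to be (login, account, password).
    """
    request = Request(url)
    if creds is not None:
        auth = base64.b64encode(("%s:%s" % (creds[0], creds[2])).encode("ascii"))
        request.add_header("Authorization", "Basic %s" % auth.decode("utf-8"))
    return request


def open_with_retries(request, opener, attempts=3):
    """Pass the same prepared request to opener on every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return opener(request)
        except IOError:
            # URLError derives from IOError/OSError, so network errors land here.
            if attempt == attempts:
                raise
```

In real use `opener` would typically be `urlopen`; it is a parameter here so the loop can be exercised without network access.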

Author

I don't see why not. I've made the changes, but I'm hitting issues testing them locally. I don't think it's related to the changes; it seems it's just unable to download a specific Bazel checksum file, specifically https://storage.googleapis.com/bazel-builds/artifacts/ubuntu1404/f3aa4184c0b53864f597ecfb8969938e9a2a8287/bazel.sha256

I'll go ahead and push my changes and then try reverting them locally to see if the old code in this PR works or if this is indeed an issue accessing that resource.

Author

@celestialorb celestialorb Sep 14, 2023

As a side note I also added configuration of the retry logic via environment variables to this PR. If you'd like me to change those variables or how those settings are sourced just let me know.
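
As a rough sketch of sourcing retry settings from the environment, something like the following could work. The variable names `EXAMPLE_DOWNLOAD_ATTEMPTS` and `EXAMPLE_DOWNLOAD_WAIT_SECONDS` and the defaults are placeholders invented for this example; they are not the names or values used in this PR.

```python
import os


def read_int_setting(environ, name, default, minimum=0):
    """Read an integer setting, falling back to default when unset or invalid."""
    raw = environ.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Ignore malformed values rather than failing the whole download.
        return default
    # Clamp values that are too small to the allowed minimum.
    return max(value, minimum)


def read_retry_settings(environ=os.environ):
    """Return (attempts, wait_seconds) using placeholder variable names."""
    attempts = read_int_setting(environ, "EXAMPLE_DOWNLOAD_ATTEMPTS", 3, minimum=1)
    wait_seconds = read_int_setting(environ, "EXAMPLE_DOWNLOAD_WAIT_SECONDS", 5, minimum=0)
    return attempts, wait_seconds
```

Falling back to the default on a malformed value is one possible design choice; failing loudly would be equally defensible if misconfiguration should be surfaced early.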

Figured it out: I didn't realize that, for the checksum, the outer scope expects a 404 HTTPError to be raised if the checksum doesn't exist for the specific Bazel version.
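
The constraint described here can be sketched as a retry loop that lets a 404 propagate immediately instead of retrying it, so the caller can still treat "checksum not found" as a normal outcome. This is an illustrative sketch under that assumption, not the PR's code; `fetch_with_retries` and its parameters are hypothetical.

```python
import time

try:
    from urllib.error import HTTPError, URLError  # Python 3
except ImportError:
    from urllib2 import HTTPError, URLError  # Python 2


def fetch_with_retries(fetch, attempts=3, wait_seconds=1.0, sleep=time.sleep):
    """Call fetch(), retrying transient failures but never a 404."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except HTTPError as error:
            # A 404 is meaningful to the caller, so surface it right away.
            if error.code == 404 or attempt == attempts:
                raise
        except URLError:
            # Other network errors are retried until attempts run out.
            if attempt == attempts:
                raise
        sleep(wait_seconds)
```

`HTTPError` is a subclass of `URLError`, so the more specific handler has to come first.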

@akshaysngupta

akshaysngupta commented Sep 20, 2023

@ maintainers - can someone take a look at this PR?

@celestialorb celestialorb force-pushed the improvement/download/retry branch 2 times, most recently from 8150c31 to f8a7f11 Compare October 22, 2023 00:52
@celestialorb
Author

celestialorb commented Oct 22, 2023

@akshaysngupta @ maintainers I went ahead and reexamined this one and saw that one of the tests timed out. I pushed an empty commit to retrigger the job and the test successfully passed after that. Should be ready to be merged in if someone could take a look at it.

@celestialorb
Author

Should at least help to mitigate #432.

@celestialorb
Author

How do we get owner attention on this PR?

@alexeagle
Contributor

@meisterT this is a priority for some of my users who are running into CI flakiness due to Bazel download failures. Who can look at this?

@meisterT
Member

Let me figure out who can do that. Note that this seems to touch the Python version, which is lagging behind in features. Out of curiosity: why are you not using the golang version?

@meteorcloudy
Member

Sorry for the delay, but I'm also curious about @meisterT's question: are you sure the Python version is what you use?

@alexeagle
Contributor

If we don't hear back from the original author after two years, I think we would only fix this in the Go implementation. We should probably also have a plan to remove the Python implementation, or at least mark it obsolete.

@celestialorb
Author

celestialorb commented Nov 11, 2023

> Let me figure out who can do that. Note that this seems to touch the Python version, which is lagging behind in features. Out of curiosity: why are you not using the golang version?

@meisterT Honestly, I'm not sure. We were encountering download failures in our CI with Bazelisk, and I probably searched for the given error message, which led me to the Python implementation. What's the difference between the two implementations, and why are there two? Is the Golang implementation newer / preferred? If it helps, I checked the version of bazelisk in our environment and we're on v1.17.0.

Also, from a quick glance at the Golang code it seems like there's already some rudimentary retry logic for that implementation.

If the Golang implementation is preferred I'll need to talk with the team that maintains this at my organization and ask them why we're using the Python implementation.

@meteorcloudy
Member

@celestialorb Yes, the golang version is preferred, and I suspect you are already using it since we don't actually have a release for the Python version. Please check.

@alexeagle I agree there should be only one version. But currently there are at least three?

bazelisk.go
bazelisk.js
bazelisk.py

I have no idea how the js and py versions are used. /cc @fweikert

@fweikert
Member

The unofficial policy has been to accept community contributions to the Python version, but we (Bazel team) release new features for the Go version only. The JS version seems to be stale and should be removed.

There's quite a backlog of PRs in general due to a lack of time. I hope this is something we can work through this month.

6 participants