Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): Handle GCS Response retriable errors #2588

Merged
merged 2 commits into from
Aug 7, 2024

Conversation

EmilLuta
Copy link
Contributor

@EmilLuta EmilLuta commented Aug 4, 2024

Response errors that GCS considers retriable are treated as fatal in our codebase. This causes the code to crash, when it could safely retry and continue it's execution.

This PR treats GCS's response errors that are retriable as retriable. Alongside, is_transient has been renamed to is_retriable to better reflect it's usage.

The change will reduce the amount of job crashes and peak execution times seen across prover subsystems (especially in BWGs).

Response errors that GCS considers retriable are treated as fatal in our
codebase. This causes the code to crash, when it could safely retry and
continue it's execution.

This PR treats GCS's response errors that are retriable as retriable.
Alongside, `is_transient` has been renamed to `is_retriable` to better
reflect it's usage.

The change will reduce the amount of job crashes and peak execution
times seen across prover subsystems (especially in BWGs).
@EmilLuta EmilLuta requested review from popzxc and 0xVolosnikov August 5, 2024 13:11
Copy link
Member

@popzxc popzxc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change LGTM, but I would avoid changing the established terminology.
If we want to go this way, I'd expect it to be done consistently across the codebase (though not sure if it's worth it).

core/lib/object_store/src/gcs.rs Show resolved Hide resolved
Copy link
Member

@popzxc popzxc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After merge please announce in the chat that you changed transient to retriable across the codebase to be consistent with GCP terminology and ask everyone to use the new term from now on.

@EmilLuta EmilLuta added this pull request to the merge queue Aug 7, 2024
Merged via the queue into main with commit 4b74092 Aug 7, 2024
47 checks passed
@EmilLuta EmilLuta deleted the evl-better-gcs-error-handling branch August 7, 2024 08:03
github-merge-queue bot pushed a commit that referenced this pull request Aug 7, 2024
🤖 I have created a release *beep* *boop*
---


##
[24.15.0](core-v24.14.0...core-v24.15.0)
(2024-08-07)


### Features

* optimize LWG and NWG
([#2512](#2512))
([0d00650](0d00650))
* Poll the main node API for attestation status - relaxed (BFT-496)
([#2583](#2583))
([b45aa91](b45aa91))
* **zk_toolbox:** allow to run `zk_inception chain create`
non-interactively
([#2579](#2579))
([555fcf7](555fcf7))


### Bug Fixes

* **core:** Handle GCS Response retriable errors
([#2588](#2588))
([4b74092](4b74092))
* **node:** respect namespaces configuration
([#2578](#2578))
([e2d9060](e2d9060))
* **vm-runner:** Fix data race in storage loader
([1810b78](1810b78))


### Reverts

* "feat: Poll the main node for the next batch to sign (BFT-496)"
([#2574](#2574))
([72d3be8](72d3be8))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <[email protected]>
Co-authored-by: Roman Brodetski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants