-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport of Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true into release/1.4.x #16660
Closed
hc-github-team-nomad-core
wants to merge
671
commits into
release/1.4.x
from
backport/b-gh-11052/actually-enormous-mammoth
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…` to `_` (#15940) Signed-off-by: dttung2905 <[email protected]>
Bumps [github.com/docker/cli](https://github.com/docker/cli) from 20.10.22+incompatible to 20.10.23+incompatible. - [Release notes](https://github.com/docker/cli/releases) - [Commits](docker/cli@v20.10.22...v20.10.23) --- updated-dependencies: - dependency-name: github.com/docker/cli dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…15847) Bumps [github.com/hashicorp/vault/api](https://github.com/hashicorp/vault) from 1.8.2 to 1.8.3. - [Release notes](https://github.com/hashicorp/vault/releases) - [Changelog](https://github.com/hashicorp/vault/blob/main/CHANGELOG.md) - [Commits](hashicorp/vault@v1.8.2...v1.8.3) --- updated-dependencies: - dependency-name: github.com/hashicorp/vault/api dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
….1 (#15846) Bumps [github.com/brianvoe/gofakeit/v6](https://github.com/brianvoe/gofakeit) from 6.19.0 to 6.20.1. - [Release notes](https://github.com/brianvoe/gofakeit/releases) - [Commits](brianvoe/gofakeit@v6.19.0...v6.20.1) --- updated-dependencies: - dependency-name: github.com/brianvoe/gofakeit/v6 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… to 20.10.23+incompatible (#15848) * build(deps): bump github.com/docker/docker Bumps [github.com/docker/docker](https://github.com/docker/docker) from 20.10.21+incompatible to 20.10.23+incompatible. - [Release notes](https://github.com/docker/docker/releases) - [Commits](moby/moby@v20.10.21...v20.10.23) --- updated-dependencies: - dependency-name: github.com/docker/docker dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> * changelog: add entry for docker/docker --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Seth Hoenig <[email protected]>
* Change api Fields for expose and paths * Add changelog entry * changelog: add deprecation notes about connect fields * api: minor style tweaks --------- Co-authored-by: Seth Hoenig <[email protected]>
* Ensure infra_image gets proper label used for reconciliation Currently infra containers are not cleaned up as part of the dangling container cleanup routine. The reason is that Nomad checks if a container is a Nomad owned container by verifying the existence of the: `com.hashicorp.nomad.alloc_id` label. Ensure we set this label on the infra container as well. * fix unit test * changelog: add entry --------- Co-authored-by: Seth Hoenig <[email protected]>
While you can use any string value for a variable Item's key name using characters that are outside of the set [unicode.Letter, unicode.Number,`_`] will require the `index` function for direct access.
… multiple tags (#15962) * docker: set force=true on remove image to handle images referenced by multiple tags This PR changes our call of docker client RemoveImage() to RemoveImageExtended with the Force=true option set. This fixes a bug where an image referenced by more than one tag could never be garbage collected by Nomad. The Force option only applies to stopped containers; it does not affect running workloads. * docker: add note about image_delay and multiple tags
Prior to 2409f72 the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to unreclaimable memory. Prior to this change allocations and evaluations for batch jobs were never garbage collected until the batch job was explicitly stopped. The new `batch_eval_gc_threshold` server configuration controls how often they are collected. The default threshold is `24h`.
…a documentation (#15963)
* refact: add conditional error handling * test: test conditional logic
This pile was deprecated when we starting using HCP Consul for e2e instead of standing up our own cluster and managing Consuls at test runtime.
The ACL token decoding was not correctly handling time duration syntax such as "1h" which forced people to use the nanosecond representation via the HTTP API. The change adds an unmarshal function which allows this syntax to be used, along with other styles correctly.
* consul: reset consul token on job during registration of a reversion * e2e: add test for reverting a job with a consul service * cl: fixup cl entry
Also allows for default value of `datacenters = ["*"]`
* fix: fix broken test * fix: fix broken test for quota status
* Copyable server and client attribute values * Changelog
* Generate files for 1.5.2 release * Prepare for next release * add 1.4.7 and 1.3.12 to the changelog --------- Co-authored-by: hc-github-team-nomad-core <[email protected]>
#16612) This changeset refactors the tests of the draining node watcher so that we don't mock the node watcher's `Remove` and `Update` methods for its own tests. Instead we'll mock the node watcher's dependencies (the job watcher and deadline notifier) and now unit tests can cover the real code. This allows us to remove a bunch of TODOs in `watch_nodes.go` around testing and clarify some important behaviors: * Nodes that are down or disconnected will still be watched until the scheduler decides what to do with their allocations. This will drive the job watcher but not the node watcher, and that lets the node watcher gracefully handle cases where a heartbeat fails but the node heartbeats again before its allocs can be evicted. * Stop watching nodes that have been deleted. The blocking query for nodes set the maximum index to the highest index of a node it found, rather than the index of the nodes table. This misses updates to the index from deleting nodes. This was done as an performance optimization to avoid excessive unblocking, but because the query is over all nodes anyways there's no optimization to be had here. Remove the optimization so we can detect deleted nodes without having to wait for an update to an unrelated node.
Implement the new `nomad job restart` command that allows operators to restart allocations tasks or reschedule then entire allocation. Restarts can be batched to target multiple allocations in parallel. Between each batch the command can stop and hold for a predefined time or until the user confirms that the process should proceed. This implements the "Stateless Restarts" alternative from the original RFC (https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180). The original concept is still worth implementing, as it allows this functionality to be exposed over an API that can be consumed by the Nomad UI and other clients. But the implementation turned out to be more complex than we initially expected so we thought it would be better to release a stateless CLI-based implementation first to gather feedback and validate the restart behaviour. Co-authored-by: Shishir Mahajan <[email protected]>
When a disconnect client reconnects the `allocReconciler` must find the allocations that were created to replace the original disconnected allocations. This process was being done in only a subset of non-terminal untainted allocations, meaning that, if the replacement allocations were not in this state the reconciler didn't stop them, leaving the job in an inconsistent state. This inconsistency is only solved in a future job evaluation, but at that point the allocation is considered reconnected and so the specific reconnection logic was not applied, leading to unexpected outcomes. This commit fixes the problem by running reconnecting allocation reconciliation logic earlier into the process, leaving the rest of the reconciler oblivious of reconnecting allocations. It also uses the full set of allocations to search for replacements, stopping them even if they are not in the `untainted` set. The system `SystemScheduler` is not affected by this bug because disconnected clients don't trigger replacements: every eligible client is already running an allocation.
…hibit_overlap is true Fixes #11052 When restoring periodic dispatcher, all periodic jobs are forced without checking for previous childre.
29f5258
to
f2a900f
Compare
6ddd97a
to
f2a900f
Compare
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport
This PR is auto-generated from #16583 to be assessed for backporting due to the inclusion of the label backport/1.4.x.
WARNING automatic cherry-pick of commits failed. Commits will require human attention.
The below text is copied from the body of the original PR.
This PR addresses the bug reported on #11052
When a leader change happens, the periodic dispatcher on the new leader starts by re running all periodic jobs by force, without checking if there is an instance of the said job already.
A new check is introduced that skips the job if prohibit_overlap is set and there is already a instance running.