Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand locking to entire pipeline #620

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

jameshochadel
Copy link
Contributor

Changes proposed in this pull request:

  • Format pipeline with prettier. Reviewers: Consider reviewing commit-by-commit for a smaller diff.
  • Hold a lock through pipeline execution, until failure or completion

security considerations

None.

@bengerman13
Copy link
Contributor

I'm not sure this does what we want - by my read this change means that:

  • deploy-master-bosh needs to acquire the lock stemcell-lock-pool
  • any job in the pipeline will release that lock if it fails or is aborted
  • deploy-production-bosh will release the lock when it finishes (under any condition)

I think in practice, this just means that once deploy-master-bosh runs, one of the other jobs has to fail, or deploy-production-bosh has to succeed, before deploy-master-bosh can run again.

I think what we want is for only one bosh to deploy at a time - to do that, I think we need to get the lock in each job and release it in each job

@jameshochadel
Copy link
Contributor Author

jameshochadel commented Sep 19, 2024

In practice, doesn't the pipeline ordering make it unlikely that multiple BOSHes are deploying at the same time? From the first linked thread, I thought the problem we were tackling with locks was Concourse & BOSH deploying at the same time — or at least, digging into @cweibel's messages, tooling BOSH and either Concourse deploying at the same time.

(I say "unlikely" — it's certainly not impossible, without locking as you describe. And I'm going off secondhand info so I could be wrong.)

@bengerman13
Copy link
Contributor

bengerman13 commented Sep 19, 2024

In practice, doesn't the pipeline ordering make it unlikely that multiple BOSHes are deploying at the same time?

one would think, but in practice, it seems to happen fairly often that they deploy multiple times back and forth. I think this is because often several triggers will hit around the same time (a new stemcell + a new version of bosh, for example)

Another concern I had in mind in my first comment, but didn't spell out, is that we'll often pause production-bosh for a few different reasons:

  • something seems fishy with a recent release in the lower environments
  • pipeline starts rolling near maintenance person's EOD
  • high-risk day (e.g, we're going to roll out a major release, or we expect to be under extra scrutiny for some reason)

In these cases, I don't think we want to lock everything up until prod bosh rolls or a lower bosh fails. My suggestion above fixes that (but it's not the only fix for it - for example, we could just have an unlock job, or run then immediately abort a job, etc)

I thought the problem we were tackling with locks was Concourse & BOSH deploying at the same time

Concourse and BOSH deploying at the same time is one of the noisier failures, but in a perfect world, I'd like no deployment (e.g., deploy-cf/deploy-cf-development) running when it's parent bosh is deploying (e.g, deploy-bosh/deploy-development-bosh). When this happens, we get noisy-but-harmless deployment failures

@markdboyd
Copy link
Contributor

@bengerman13 @jameshochadel I'd really like to get some form of these changes through to avoid alert fatigue and spurious CI failures

Could we have one lock per BOSH deployment (e.g. tooling, development, staging, prod)? Would that address your concerns @bengerman13 ?

As long as we had a lock for say when staging BOSH is deploying, then I could prevent OpenSearch from trying to deploy at the same time and failing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants