[PROPOSAL] Simplify OpenSearch distribution branching strategy #251

Open
andrross opened this issue Dec 19, 2024 · 17 comments
Labels
discuss Issues calling for discussion Meta Meta issues serve as top level issues that group lower level changes into one bigger effort. v3.0

Comments

@andrross
Member

andrross commented Dec 19, 2024

What/Why

What are you proposing?

We should change the branching strategy used across the OpenSearch Project to track two releases instead of three. We should make this change when we do the next major version release (3.0) of the distribution.

This change applies to all repositories that are part of the OpenSearch distribution (core, dashboards, plugins, etc). Some repositories, such as clients, have independent release cycles and are not impacted by this change.

Background

Currently, we track three releases in parallel:

  • 3.0 (the next major release) on the main branch
  • 2.x (the next minor release) on the 2.x branch
  • 1.3.x (the next maintenance release) on the 1.3 branch

This strategy optimizes for being able to incubate and develop breaking changes for the next major release for an extended period of time, as that code would be on the main development branch. The cost of this strategy is that all non-breaking changes intended for the next minor release require an extra step of backporting to the 2.x branch. The previous 2.5+ years have shown us that we have incubated very few breaking changes on the main branch for an extended period. While we have automation that makes backports to the 2.x branch very low effort for the majority of changes, it is not perfect (e.g. automation hits conflicts requiring manual resolution, flaky tests may fail, and it takes time even when everything works perfectly). I believe the cost of these paper cuts, multiplied by thousands of changes across scores of repos, outweighs the benefits of the current approach. I also have no reason to believe things would be different for the 3.x→4.0 cycle than they were for the 2.x→3.0 cycle.

Proposal

The proposal is to track only two releases in parallel. After the 3.0 release, we will have the following branches:

  • main tracking the 3.x version (the next minor release)
  • 2.<last>.x tracking the next maintenance release of the 2.<last> version

At the time of a minor release, we will cut a release branch (e.g. 3.1) from the main branch. Development for the following minor release will continue on main and the 3.1 version will be built from the 3.1 branch.

At the time of a maintenance release, we will cut a release branch (e.g. 2.19.1) from the 2.19.x branch. The maintenance release will then be built from the 2.19.1 branch.
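
As a minimal sketch of the convention described above (not actual opensearch-build tooling), the mapping from a release version to its branches could look like the following. The function name `plan_release`, the `BranchPlan` type, and the default values (3 as the current major, 2.19 as the last 2.x minor) are illustrative assumptions. Patch releases of the current minor (e.g. 3.1.1) are left out of this sketch.

```python
import re
from typing import NamedTuple


class BranchPlan(NamedTuple):
    source_branch: str   # long-lived branch the release is cut from
    release_branch: str  # branch the release is built from


def plan_release(version: str, current_major: int = 3, last_prev_minor: int = 19) -> BranchPlan:
    """Map a release version to branches under the proposed convention.

    current_major and last_prev_minor are hypothetical defaults (3.x on main,
    2.19 as the final 2.x minor); they are assumptions, not project config.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in match.groups())

    if major == current_major and patch == 0:
        # Minor release: cut e.g. "3.1" from main.
        return BranchPlan("main", f"{major}.{minor}")
    if major == current_major - 1 and minor == last_prev_minor and patch > 0:
        # Maintenance release: cut e.g. "2.19.1" from "2.19.x".
        return BranchPlan(f"{major}.{minor}.x", f"{major}.{minor}.{patch}")
    raise ValueError(f"{version} is not handled by this sketch")
```

For example, `plan_release("3.1.0")` gives `main` as the source branch and `3.1` as the release branch, while `plan_release("2.19.1")` gives `2.19.x` and `2.19.1`.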

What users have asked for this feature?

This is a change to development and will not impact users. Developers frequently complain about friction introduced by backports.

What problems are you trying to solve?

This proposal is an attempt to reduce development friction by optimizing for what we routinely do (develop new features in a backward compatible way targeted toward the next minor release) at the expense of what we do not do (incubate breaking changes for an extended period in a snapshot build of the next major release).

What is the developer experience going to be?

The vast majority of code changes are intended to be released in the next minor version. With this proposal, such changes will require only one PR to commit the change to the main branch. This is in contrast to the procedure today, which requires committing the change to main and triggering a backport to the 2.x branch.

Are there any security considerations?

No.

Are there any breaking changes to the API?

No.

Why should it be built? Any reason not to?

We should do this to improve the developer experience across dozens of repos.

The downside is that we will have no dedicated place to stage future 4.0 changes or routinely build 4.0 snapshot distributions. If someone wants to incubate a breaking change they would need to create a separate branch and regularly rebase it against main. We would have no automated mechanism to build a complete 4.0 distribution containing breaking changes in multiple components.

What will it take to execute?

This is a change in branching convention that will have to be coordinated across all repos that are part of the distribution. Some release tooling in opensearch-build may need to be updated to handle the new convention.

@dblock
Member

dblock commented Dec 19, 2024

This is a great proposal, as the number of backport PRs we did in 2.x is huge.

Will we continue releasing patch releases often such as 3.1.1 after 3.1 before 3.2? Feels like we want that to be reserved to urgent high severity security issues and just do 3.1, 3.2, 3.3, etc. And we won't need a 3.x branch.

@dblock dblock added discuss Issues calling for discussion Meta Meta issues serve as top level issues that group lower level changes into one bigger effort. v3.0 and removed untriaged labels Dec 19, 2024
@andrross
Member Author

andrross commented Dec 19, 2024

Will we continue releasing patch releases often such as 3.1.1 after 3.1 before 3.2? Feels like we want that to be reserved to urgent high severity security issues and just do 3.1, 3.2, 3.3, etc.

Patches of the current minor version (e.g. 3.1.1) only happen for high severity issues. If a 3.1.1 release is necessary we would cherry pick the critical fixes on to the 3.1 branch and then do a patch release from that branch. This proposal does not change that policy.
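
The cherry-pick flow described above could be sketched roughly as follows. This only builds the git command lines and does not run them; the function name `patch_release_commands` and the commit hashes in the usage note are hypothetical, and the real release process may involve additional steps not shown here.

```python
from typing import List


def patch_release_commands(release_branch: str, fix_commits: List[str]) -> List[List[str]]:
    """Build (but do not run) git commands that cherry-pick fixes onto a release branch."""
    if not fix_commits:
        raise ValueError("a patch release needs at least one fix commit")
    commands = [
        ["git", "fetch", "origin"],
        ["git", "checkout", release_branch],
    ]
    # -x appends the original commit hash to each cherry-picked commit message.
    commands += [["git", "cherry-pick", "-x", sha] for sha in fix_commits]
    return commands
```

Calling `patch_release_commands("3.1", ["abc1234", "def5678"])` (placeholder hashes) returns a fetch, a checkout of `3.1`, and one cherry-pick per fix, which a maintainer could review before running.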

And we won't need a 3.x branch.

That's correct. With this proposal we track two releases which means only two active branches: main and 2.19.x. At the time of any release (either regularly scheduled or a patch) we cut a release branch from the appropriate branch, but we don't continue to maintain release branches after the release.

@msfroh

msfroh commented Dec 20, 2024

I have a very minor preference for renaming main to 3.x because it's the branch that we cut 3.x releases from (i.e. 3.1, 3.2, etc.)

At the end of the day, it wouldn't change anything functionally. It would just feel a little closer to "Developing directly on 2.x", which of course is something we decided not to do during the 2.x line (but we're advocating for here).

As I said -- it's a very minor preference. If we decide to keep the name of the branch as main, I still think that avoiding a backport for every single PR (and the main -> 2.x BWC dance) is a great idea.

@peterzhuamazon
Member

I am on the same page with Andrew's proposal.

Overall it should simplify the process not only on the developer side, but also for infra and automation, as we would no longer need to maintain a dedicated next-version build that breaks very often due to instability of the code.

Thanks.

@dblock
Member

dblock commented Dec 20, 2024

I have a very minor preference for renaming main to 3.x because it's the branch that we cut 3.x releases from (i.e. 3.1, 3.2, etc.)

This would be very surprising to anyone coming to the project and would make the main branch a moving target for dozens of repos.

@msfroh

msfroh commented Dec 20, 2024

This would be very surprising to anyone coming to the project and would make the main branch a moving target for dozens of repos.

I don't understand the second clause. I'm suggesting that instead of having a branch named main, it would literally be called 3.x -- nothing moving. Since other repos (mostly?) take a dependency on built artifacts (which are versioned) rather than branches, I don't understand how it would impact them one way or another.

@andrross
Member Author

andrross commented Dec 20, 2024

@msfroh I think @dblock is saying it would be surprising that the default branch would be a moving target, because it would change with every major version. The convention is that the name of the default branch is fixed (main or mainline etc) and always represents development on the latest version. @dblock please correct me if I'm wrong.

@dblock
Member

dblock commented Dec 20, 2024

@andrross thanks, you said exactly what I was thinking

@peternied
Member

@msfroh I think there is value in making it obvious which version number is associated with main, and I think it breaks 'standard' convention to remap HEAD to a versioned branch instead of main, as @andrross points out.

Cheap alternative: what would you think of a small piece of automation that kept a 3.x branch in lockstep with main on every push?


@andrross Thanks for starting this discussion - I'm in full support of creating a 3.0.0 release.

In conjunction with this major release revisiting the release methodology and messaging might be a good exercise, as once upon a time we did -alpha releases. At the very least do a pass over the release process and confirm if we are making any changes. I'd love to see a blog post 😄

@msfroh

msfroh commented Dec 26, 2024

Cheap alternative: what would you think of a small piece of automation that kept a 3.x branch in lockstep with main on every push?

I don't think that would be of much benefit to anyone. If I cared enough, I could create a branch named 3.x in my local repository setup to track the main branch from the project whenever I git pull. (Or I could name my local branch whatever I want, like zorgoth_destroyer_of_worlds.) In practice, I'm happy to go along with the consensus of sticking to the main name.

@dbwiddis
Member

dbwiddis commented Jan 7, 2025

Hi! I'm here to stir the pot!

OK, seriously. I don't want to cause drama, and I'm generally in agreement with this proposal, but the timing of this proposal is really bad. I don't think there's really been enough time to discuss it (it's been open 3 weeks but mostly over the holidays) and we're looking at locking in a 3.0.0 release schedule this week in #252. This is a pretty significant change and a pretty big blast radius of impacted downstream dependencies.

My general thoughts are that we need to maintain the existing status quo, or have much more frequent major version upgrades. And the latter choice has multiple other impacts that I think need a lot more discussion.

@msfroh

msfroh commented Jan 7, 2025

My general thoughts are that we need to maintain the existing status quo, or have much more frequent major version upgrades.

I would argue that the status quo goes hand-in-hand with frequent major version upgrades.

If we're frequently upgrading major versions, then it makes sense to keep a dedicated "next major version" branch and do the backport dance to the "current major version" branch. The existing separation makes it easier to cut the next major version because you're already working towards it.

The proposal here only works because we expect the "next major version" to be so far off that doing the backporting dance for multiple years is not worth the effort to handle a few months of divergence when we decide it's time to plan for the next major version. (Also, there's nothing stopping us from coming back to the current model whenever we see the next major version on the horizon and want to start diverging.)

@dbwiddis
Member

dbwiddis commented Jan 7, 2025

If we're frequently upgrading major versions,

My broader point is that if we change to not having multiple branches (status quo) we automatically buy into the increased frequency. Basically we'll cut a release every N weeks and decide at the time of the release whether it's breaking compatibility or not; if so, bump the major version. That said... I see your point here:

Also, there's nothing stopping us from coming back to the current model whenever we see the next major version on the horizon and want to start diverging.

We can choose to do the 4.x/3.x split at that time; so we could just keep working on main as 3.x without any "backport dance" until such time as we have a change that we don't want to put in the next (minor) release... and decide then.

@andrross
Member Author

andrross commented Jan 7, 2025

My broader point is that if we change to not having multiple branches (status quo) we automatically buy into the increased frequency. Basically we'll cut a release every N weeks and decide at the time of the release whether it's breaking compatibility or not; if so, bump the major version.

@dbwiddis This is not the model we're aiming for, and I agree with @msfroh that the proposed model is aiming for the opposite because we expect not to work in earnest towards the next major version for multiple years. Can you clarify why you expect this change to "automatically buy into the increased frequency"?

My observation of the last 2.5 years is that when we look at introducing a feature, we either find how to do it in a backward compatible way and put it in the next minor release, or we don't do it. Do you have any specific examples where we put a lot of work into a feature that ended up staged on the main branch for the next major version?

@dbwiddis
Member

dbwiddis commented Jan 7, 2025

My observation of the last 2.5 years is that when we look at introducing a feature, we either find how to do it in a backward compatible way and put it in the next minor release, or we don't do it. Do you have any specific examples where we put a lot of work into a feature that ended up staged on the main branch for the next major version?

Not making any judgement on how much work went into these, but see the list of breaking changes at https://github.com/opensearch-project/OpenSearch/blob/main/CHANGELOG-3.0.md

A quick mouse-over of the dates of the linked PRs shows breaking changes every few months.

  • July 2022
  • September 2022
  • October 2022
  • November 2022
  • March 2023
  • July 2023
  • August 2023
  • October 2023
  • January 2024
  • June 2024
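
To put a rough number on "every few months", the gaps between the months listed above can be computed as below. This is only a sketch over the ten months in the list; the day of month is a placeholder and the variable names are illustrative.

```python
from datetime import date

# Months from the list above; the day of month (1) is a placeholder.
breaking_change_months = [
    date(2022, 7, 1), date(2022, 9, 1), date(2022, 10, 1), date(2022, 11, 1),
    date(2023, 3, 1), date(2023, 7, 1), date(2023, 8, 1), date(2023, 10, 1),
    date(2024, 1, 1), date(2024, 6, 1),
]


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from one month to a later one."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# Gap in months between each consecutive pair of entries.
gaps = [
    months_between(a, b)
    for a, b in zip(breaking_change_months, breaking_change_months[1:])
]
mean_gap = sum(gaps) / len(gaps)
print(gaps, round(mean_gap, 1))
```

For the months listed, the gaps range from 1 to 5 months, with a mean of roughly 2.6.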

Plus the change to JDK21 minimum compatibility that I don't see in the changelog. Plus ongoing efforts on various protocol changes.

My expectation is assuming that if we had this policy in 2022, we would have had multiple major version bumps for at least some of those features. We have delayed many years, precisely because we have a main tracking the version with these breaking changes.

Maybe we've slowed down and we're "done" with this higher rate? Will we implement new protocol changes in a backwards compatible way? Will we never actually get rid of the High Level Rest Client?

If we have no branch tracking 4.x, which I understand this proposal is, we have nowhere to put changes like this. We are forced to, as you say "don't do it".

And then if/when we do decide we want to eventually bump, we have no place with these changes already up and running and tested and built against by plugins, and we have to do a lot of work to catch up on all the breaking changes we wanted to make but didn't want to bump for, all at once, everywhere.

@andrross
Member Author

andrross commented Jan 7, 2025

My expectation is assuming that if we had this policy in 2022, we would have had multiple major version bumps for at least some of those features. We have delayed many years, precisely because we have a main tracking the version with these breaking changes.

@dbwiddis I don't think so. None of those changes have delivered value to anyone (and arguably have had the opposite effect because every change is a possibility for backport conflict when implementing a feature that will actually deliver value to users). This proposal would have resulted in not doing those changes until we began preparing for the 3.0 release, which as you rightly say would result in more work at that time, which is the primary downside of this proposal.

Maybe we've slowed down and we're "done" with this higher rate?

I think that list is misleading. I think most changes there are either wrong (they actually were backported), or are very minor changes where the cost of deferring them is small. This is the crux of this proposal though... If my judgement here is wrong then we shouldn't make this change!

Will we implement new protocol changes in a backwards compatible way?

Yes.

Will we never actually get rid of the High Level Rest Client?

Good question! I don't think removal of this is on the 3.0 roadmap.

@dbwiddis
Member

dbwiddis commented Jan 7, 2025

Closing the circle on this:

  • as stated earlier, I'm generally in support of this
  • as acknowledged in response to @msfroh we can always defer the idea of a 4.x branch until we actually need one, and then go back to what we currently have; but save all the backport dancing in the middle.
  • my comment about frequency of bumps was mostly speculation, not opposition, and you've clarified that here

So you can count me as a 👍 on this proposal.

The conversation about deprecation does prompt a small concern with the schedule but I'll go over to #252 to ask that.


6 participants