-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new RFC for monorepo #31
Conversation
Signed-off-by: Dorian Johnson <[email protected]>
Thanks a lot! |
- amundsencommon | ||
- amudnsendatabuilder | ||
|
||
This RFC proposes merging all of these repositories into a single repo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to add amundsengremlin to the list as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention for amundsengremlin
to get merged into one of the proxies, or is the long-term intention to leave it as a separate package? I'm not super familiar with the code split there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a point that could have more attention in this rfc. We currently have two approaches for proxy development:
- 'old' proxies like neo4j, elasticsearch or atlas are kept directly in amundsenmetadata and amundsensearch respectively
- 'new' proxies like rds or gremlin are separate git repositories.
Maybe we should unify the approach ? afaik the idea with having proxies in different repositories made contributing easier - you could have a team that would be maintainers only for proxy repo, not core amundsen repos. @feng-tao would be more knowledgeable here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make more extensive use of codeowners files for each package, so we keep the ownership clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed some language to consider the incubating repos. Open to feedback!
My thought is the separate packages should only be used for incubating, unsupported packages. Once it it officially supported by the project, it should require CR and RFCs from approved maintainers. Otherwise, we will have officially-supported parts of the project that aren't necessarily governed by our normal processes (RFC and CR). If the outside contributor is still making meaningful contributions after the package is landed, we should consider suggesting them as a maintainer (in which case @Golodhros's suggestion of stronger codeowners would make sense -- it might make sense to have that new maintainer start as specialized to the specific area they were previously working on, and if they want to contribute outside of that, we add that on later via codeowner additions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently neptune and RDS backend is introduced as part of RFC process. Though I think it is easier for maintain with different repos. But I wouldn't call them incubating repos. This is same as Atlas which the Lyft engineering team built the whole integration with neo4j while atlas is maintained by @bolkedebruin ING team. Both are officially supported backend in Amundsen for long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense.
Though I think it is easier for maintain with different package
Can you elaborate on what is made easier here? I'd like to capture the benefits of those packages without the overhead of actually having separate repos, if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC here is to propose mono repo with different packages. For RDS, gremlin based, it seemed to be easier to maintain them separately given the people who developed those are not part of the core committer team if that makes sense. It will help to iterate those features faster without asking existing committer to signoff.
But once the repos/features are mature, we should bring them into same repo each of which to be a new package.
|
||
This RFC proposes merging all of these repositories into a single repo. | ||
|
||
There will be no change to the built artifacts. Users who use PyPi packages or Docker images will have no change to their deployments. Users with forks or submodule based customization will have to make minor changes to their CI/CD pipelines to pull from the correct directories, but the required changes will be minor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smooth change management, nice !
However, with current structure, when someone is using the submodule approach can freely choose versions (even refs) for each product separately (I can use frontend in different version than metadata and search). This change somehow limits that capability. It's worth to explicitly mention it in drawbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good point, added
- amundsencommon | ||
- amudnsendatabuilder | ||
|
||
This RFC proposes merging all of these repositories into a single repo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a point that could have more attention in this rfc. We currently have two approaches for proxy development:
- 'old' proxies like neo4j, elasticsearch or atlas are kept directly in amundsenmetadata and amundsensearch respectively
- 'new' proxies like rds or gremlin are separate git repositories.
Maybe we should unify the approach ? afaik the idea with having proxies in different repositories made contributing easier - you could have a team that would be maintainers only for proxy repo, not core amundsen repos. @feng-tao would be more knowledgeable here
rfcs/031-monorepo.md
Outdated
|
||
The submodule architecture creates the following issues and is making it hard for companies to adopt Amundsen, and to customize it accordingly. It also causes substantial developer toil in the form of having to create multiple small PRs to merge one | ||
|
||
1. Dependency Management: Python packages are pretty much the same for each proxy, which makes it hard to sync across all the above repositories. As a result most of the time, each proxy is running its own version of the dependency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another approach (or used alonside changes from this RFC to strengthen the dependency management even more) would be introducing the pip constraints files https://pip.pypa.io/en/stable/user_guide/#constraints-files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are new to me! That's super cool, added.
|
||
and many more... This change will put all the packages in just one place, making practically to keep dependencies similar accross verisons, simply because it will be possible to open a single PR that updates multiple packages. Practically speaking this will improve the status quo slightly. | ||
|
||
1. Development Efforts: Having multiple repositories makes it really hard to implement a feature. Implementation and testing require efforts to synchronize and then code reviews, and finally, all the PRs across multiple repositories need to land in master at a certain time or at the same time. This change results in faster development, one PR to fix a bug or a feature, no dependency hell as we are facing today, hence attract more contributors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a question - do we have any hard data (like results of questionnaires) to back the validity of those issues ? or is it more the result of interacting with users and observation ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't -- we did the call for feedback in December or January from community members but only got a handful of detailed responses. This is from our interactions with many community members (both longer-time installs) as well as people new to the project. Both groups are impacted, but in different ways.
Signed-off-by: Dorian Johnson <[email protected]>
Signed-off-by: Dorian Johnson <[email protected]>
as long as different package / artifacts are maintained within single monorepo, I am +1 for the change. But it would be good for @allisonsuarez , @dkunitsk , @jinhyukchang @youngyjd backend ICs from Lyft to take a look and see anything related to existing integrations. |
* Add new RFC for monorepo Signed-off-by: Dorian Johnson <[email protected]> * monorepo: add constraints files, acknowledge a drawback Signed-off-by: Dorian Johnson <[email protected]> * monorepo: speak to incubating repos Signed-off-by: Dorian Johnson <[email protected]> Signed-off-by: Allison Suarez Miranda <[email protected]>
This is a continuation / down-scoping of #30. From that discussion, it seems that there is clear support for moving to a mono repo. We'd like to figure out the details of that change and get that rolling and defer the other things that were bundled in the prior RFC.