-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upload git info with metadata file (#1) #37802
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes because I'd like your opinion on these suggestions:
- Create a separate step in the publish pipeline which would set the generated fields in
metadata.yaml
- Remove the git info retrieval logic from the metadata service lib and make
airbyte-ci
pass the git info. It will remove the need of mounting a git repo to the metadata service lib container and keep a single source of truth about the current git info (stored in the PipelineContext object).
Also something which bothers me a bit is that the checked in metadata file will be different from the one uploaded to GCS... We could make the publish pipeline commit and push to master... But what if we did not set these field in metadata.yaml
and just generate them at registry generation?
|
||
metadata, error = validate_and_load(metadata_file_path, POST_UPLOAD_VALIDATORS, validator_opts) | ||
metadata_file_path = _apply_modifications_to_metadata_file(metadata_file_path, validator_opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find it more explicit and maintainable to decouple metadata mutation from metadata upload. Wdyt about create a separate step in publish which would perform this apply_modifications_to_metadata_file
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I dont think I like the logic living in the pipeline.
Simply because right now we have metadata logic living already in two places
- Metadata service (validation and upload)
- Orchestrator (Parsing and generation)
Currently they are at least contained in the metadata_service
folder
But if we add the git logic of
for a given metadata file get the author of the last commit that modified
Then we end up with 3 locations to look for logic, 1 of which is now outside the area in the codebase
|
||
commit_sha: Optional[str] = Field( | ||
None, | ||
description="The git commit sha of the last commit that modified this file. DO NOT DEFINE THIS FIELD IN YOUR SPEC. It will be overwritten by the CI.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by SPEC in this context? Do you mean the metadata.yaml
file?
Sounds confusing to me as spec
also relates to the connector spec command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good catch. Mistype Ill change to "metadata file"
@@ -82,6 +82,7 @@ def __init__( | |||
title=title, | |||
context=context, | |||
paths_to_mount=[ | |||
MountPath(GIT_DIRECTORY_ROOT_PATH), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mounting .git
is quite an expensive operation (depending on how many branches the local repo has).
As we have the git info in the context (self.context.git_branch
, self.context.git_revision
) what do you think about passing these to the step and remove the git logic from the metadata service lib?
It will have two advantages:
- remove the requirement for the metadata service lib to be executed on top of a git repo
- keep a single source of truth about the current git info (stored in the context object)
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I summed up my thoughts here:
#37802 (comment)
But I wanted to ask, what is the time penalty for mounting git in CI?
I ran locally with a fairly limited number of branches (since I reset my laptop) and it ran decently quick.
My thoughts were it wouldn't be the worst penalty since its during publish and not test
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time penalty would rather happen locally, on CI we only checkout the current branch + master, which is not too expensive.
My thoughts were it wouldn't be the worst penalty since its during publish and not test
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving as your point about keeping metadata service / orchestrator as self contained as possible sounds reasonable to me.
I'm still concerned about this:
Also something which bothers me a bit is that the checked in metadata file will be different from the one uploaded to GCS... We could make the publish pipeline commit and push to master... But what if we did not set these field in metadata.yaml and just generate them at registry generation?
Can you share your thoughts on it?
@alafanechere Thinking on the "at registry generation" time 🤔 The difficult part is the registry currently has no context as to
If we were to change any of these then we would
So Im not sure this approach would gain us anything. It actually seems like we would have even longer release times then the proposed path of retrieving git commit info during upload. |
7e76b6c
to
219e7b5
Compare
219e7b5
to
5f9a6b9
Compare
What
Adds git commit info to the metadata file during upload.
Spun out of #32715 as a stack