feat(build): remove base-requirements.txt #11238
We want the ingestion-base image to be as slim as possible - that means not installing packages that we aren't sure will be used by images built on top of this one. This should also help slim down the datahub-actions container.
Unfortunately, this adds significant build time whenever metadata-ingestion is changed. Currently this time cost is only paid when the base requirements.txt is updated.
For the slim ingestion image the build runs for 51 minutes, and for the full image it runs for 1 hour and 8 minutes.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
actionlint
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:11:51: Double quote to prevent globbing and word splitting [shellcheck]
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:2:55: Double quote to prevent globbing and word splitting [shellcheck]
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:2:45: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 100 in b3727c0
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:2:45: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 111 in b3727c0
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:118: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 552 in b3727c0
run: echo "tag=${{ needs.setup.outputs.ingestion_base_change == 'true' && needs.setup.outputs.unique_tag || 'head' }}" >> $GITHUB_OUTPUT
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:128: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 594 in b3727c0
run: echo "tag=${{ needs.setup.outputs.ingestion_base_change == 'true' && needs.setup.outputs.unique_slim_tag || 'head-slim' }}" >> $GITHUB_OUTPUT
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:123: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 635 in b3727c0
run: echo "tag=${{ needs.setup.outputs.ingestion_base_change == 'true' && needs.setup.outputs.unique_full_tag || 'head' }}" >> $GITHUB_OUTPUT
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:123: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 690 in b3727c0
run: echo "tag=${{ needs.setup.outputs.ingestion_change == 'true' && needs.setup.outputs.unique_slim_tag || 'head-slim' }}" >> $GITHUB_OUTPUT
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:113: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 776 in b3727c0
run: echo "tag=${{ needs.setup.outputs.ingestion_change == 'true' && needs.setup.outputs.unique_tag || 'head' }}" >> $GITHUB_OUTPUT
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:2:55: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 818 in b3727c0
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:4:63: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 818 in b3727c0
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:6:95: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 818 in b3727c0
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:8:24: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 818 in b3727c0
run: |
[actionlint] reported by reviewdog 🐶
property "short_sha" is not defined in object type {backend_change: string; backend_only: string; branch_name: string; docker-login: string; elasticsearch_setup_change: string; frontend_change: string; frontend_only: string; full_tag: string; ingestion_base_change: string; ingestion_change: string; ingestion_only: string; kafka_setup_change: string; mysql_setup_change: string; postgres_setup_change: string; pr-publish: string; publish: string; python_release_version: string; repository_name: string; slim_tag: string; smoke_test_change: string; tag: string; unique_full_tag: string; unique_slim_tag: string; unique_tag: string} [expression]
datahub/.github/workflows/docker-unified.yml
Line 1065 in b3727c0
message: '{ "command": "git-sync", "args" : {"repoName": "${{ needs.setup.outputs.repository_name }}", "repoOrg": "${{ github.repository_owner }}", "repoBranch": "${{ needs.setup.outputs.branch_name }}", "repoShaShort": "${{ needs.setup.outputs.short_sha }}" }}'
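Most of the SC2086 findings above appear to point at the unquoted `$GITHUB_OUTPUT` redirect target. A minimal sketch of the quoted form follows. The `tag` value and the temp-file stand-in for `GITHUB_OUTPUT` are assumptions made so the snippet runs outside of Actions; they are not taken from the workflow.

```shell
set -eu

# Stand-in for the runner-provided output file so the sketch runs locally.
# The path deliberately contains a space to show why the quotes matter.
GITHUB_OUTPUT="$(mktemp -d)/step output.txt"

# Hypothetical value; in the workflow this comes from a ${{ ... }} expression.
tag="head-slim"

# Flagged form:  echo "tag=${tag}" >> $GITHUB_OUTPUT
# With a path like the one above, bash typically fails on that unquoted
# redirect target instead of writing to the intended file.
# Quoted form, which SC2086 asks for:
echo "tag=${tag}" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

On GitHub-hosted runners the output path normally has no spaces, so this is more about silencing the lint and being robust than about fixing an observed failure.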
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
actionlint
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:4:63: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 818 in febba8b
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:6:95: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 818 in febba8b
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:8:24: Double quote to prevent globbing and word splitting [shellcheck]
datahub/.github/workflows/docker-unified.yml
Line 818 in febba8b
run: |
[actionlint] reported by reviewdog 🐶
property "short_sha" is not defined in object type {backend_change: string; backend_only: string; branch_name: string; docker-login: string; elasticsearch_setup_change: string; frontend_change: string; frontend_only: string; full_tag: string; ingestion_base_change: string; ingestion_change: string; ingestion_only: string; kafka_setup_change: string; mysql_setup_change: string; postgres_setup_change: string; pr-publish: string; publish: string; python_release_version: string; repository_name: string; slim_tag: string; smoke_test_change: string; tag: string; unique_full_tag: string; unique_slim_tag: string; unique_tag: string} [expression]
datahub/.github/workflows/docker-unified.yml
Line 1065 in febba8b
message: '{ "command": "git-sync", "args" : {"repoName": "${{ needs.setup.outputs.repository_name }}", "repoOrg": "${{ github.repository_owner }}", "repoBranch": "${{ needs.setup.outputs.branch_name }}", "repoShaShort": "${{ needs.setup.outputs.short_sha }}" }}'
@@ -68,23 +72,23 @@ jobs:
id: tag
run: |
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2129:style:2:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects [shellcheck]
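The SC2129 suggestion can be sketched as below: group the `echo` commands in braces and redirect once. The key names match outputs listed by actionlint elsewhere in this thread, but the values and the temp-file stand-in for `GITHUB_OUTPUT` are assumptions, not the actual contents of the `tag` step.

```shell
set -eu

# Stand-in for the runner-provided output file so the sketch runs locally.
GITHUB_OUTPUT="$(mktemp)"

# Hypothetical values; the real step computes its tags elsewhere.
tag="head"
slim_tag="head-slim"
full_tag="head-full"

# Instead of three separate `>> "$GITHUB_OUTPUT"` redirects, group the
# commands and redirect the whole group once.
{
  echo "tag=${tag}"
  echo "slim_tag=${slim_tag}"
  echo "full_tag=${full_tag}"
} >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

This is a style-level finding: behavior is the same as with individual redirects, but the file is opened once and the target is written in a single place.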
RUN sed -i.bak "s/__version__ = \"1\!0.0.0.dev0\"/__version__ = \"$(echo $RELEASE_VERSION|sed s/-/+/)\"/" src/datahub/__init__.py && \
    sed -i.bak "s/__version__ = \"1\!0.0.0.dev0\"/__version__ = \"$(echo $RELEASE_VERSION|sed s/-/+/)\"/" airflow-plugin/src/datahub_airflow_plugin/__init__.py && \
    cat src/datahub/__init__.py | grep __version__ && \
    cat airflow-plugin/src/datahub_airflow_plugin/__init__.py | grep __version__

FROM base AS slim-install

RUN uv pip install --no-cache -e ".[base,datahub-rest,datahub-kafka,snowflake,bigquery,redshift,mysql,postgres,hive,clickhouse,glue,dbt,looker,lookml,tableau,powerbi,superset,datahub-business-glossary]"
RUN --mount=type=cache,target=/datahub-ingestion/.cache/uv,uid=1000,gid=1000 \
Curious as to why this one is still /datahub-ingestion while it seems to be changed everywhere else?
Maybe the change was separating the metadata-ingestion source from the user's home directory and the cache is expected in the home directory.
I think the WORKDIR should be restored after install.
Yup that's it
/datahub-ingestion is the user's home directory - there's nothing in here other than dotfiles and the uv cache
/metadata-ingestion has the code from the metadata-ingestion directory
I can make it restore the workdir
[actionlint] reported by reviewdog 🐶
property "short_sha" is not defined in object type {backend_change: string; backend_only: string; branch_name: string; docker-login: string; elasticsearch_setup_change: string; frontend_change: string; frontend_only: string; full_tag: string; ingestion_base_change: string; ingestion_change: string; ingestion_only: string; kafka_setup_change: string; mysql_setup_change: string; postgres_setup_change: string; pr-publish: string; publish: string; python_release_version: string; repository_name: string; slim_tag: string; smoke_test_change: string; tag: string; unique_full_tag: string; unique_slim_tag: string; unique_tag: string} [expression]
datahub/.github/workflows/docker-unified.yml
Line 1071 in 16f834e
message: '{ "command": "git-sync", "args" : {"repoName": "${{ needs.setup.outputs.repository_name }}", "repoOrg": "${{ github.repository_owner }}", "repoBranch": "${{ needs.setup.outputs.branch_name }}", "repoShaShort": "${{ needs.setup.outputs.short_sha }}" }}'
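actionlint flags `short_sha` because the `setup` job does not declare an output with that name. One possible way to address it, not necessarily what this PR does, is to have a `setup` step write a `short_sha` key and then map it under that job's `outputs:`. The sketch below shows only the shell side. The 7-character length, the fallback SHA, and the temp-file stand-in for `GITHUB_OUTPUT` are assumptions.

```shell
set -eu

# Stand-in for the runner-provided output file so the sketch runs locally.
GITHUB_OUTPUT="$(mktemp)"

# GITHUB_SHA is set by the Actions runner; the fallback is a made-up value.
GITHUB_SHA="${GITHUB_SHA:-0123456789abcdef0123456789abcdef01234567}"

# Keep the first 7 characters of the commit SHA (length is an assumption).
short_sha="$(printf '%s' "$GITHUB_SHA" | cut -c1-7)"

echo "short_sha=${short_sha}" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

Writing the key to `$GITHUB_OUTPUT` only makes it a step output. The job would also need an entry in its `outputs:` map before `needs.setup.outputs.short_sha` resolves.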
We want the ingestion-base image to be as slim as possible - that means not installing packages that we aren't sure will be used by images built on top of this one. We had previously needed this to improve build times since pip was slow, but our migration to uv obviates the need. This should also help slim down the datahub-actions container.
This PR also sets up depot.dev. This helps us avoid using qemu for arm emulation, which slows down our builds significantly.