Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tableau): skip ingesting filtered project assets #12591

Conversation

mihai103
Copy link
Contributor

Issue Description:

Problem: The Tableau asset ingestion process is designed to filter and ingest allowed projects first. Subsequent layer-by-layer asset ingestion incorrectly includes published data sources from projects that should be filtered out if these data sources are upstream of embedded data sources in allowed projects. This results in unauthorized assets being ingested without correct 'browsePaths', as their parent projects are not ingested.

Expected: The ingestion should strictly honor project filters and not ingest any upstream assets from disallowed projects. Assets only in allowed projects should have their 'browsePaths' created and ingested.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels Feb 11, 2025
Copy link

codecov bot commented Feb 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines Coverage Δ
...on/src/datahub/ingestion/source/tableau/tableau.py 89.02% <100.00%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ba7767...a2c9852. Read the comment docs.

@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Feb 11, 2025
@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Feb 12, 2025
@sgomezvillamor
Copy link
Contributor

Thanks for raising the issue and the proposal. Overall, it looks good.

As a general comment, wondering if it would be possible to do the filtering upper in the call stack? Without a heavy refactor, of course. Ideally, and not sure if we are very consistent with that, emit methods should be exclusively responsible for generating the aspects and build the MetadataWorkUnit, not doing the filtering; filtering should happens before.

@mihai103 mihai103 closed this Feb 12, 2025
@mihai103 mihai103 reopened this Feb 12, 2025
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Feb 12, 2025
@mihai103
Copy link
Contributor Author

Thanks for raising the issue and the proposal. Overall, it looks good.

As a general comment, wondering if it would be possible to do the filtering upper in the call stack? Without a heavy refactor, of course. Ideally, and not sure if we are very consistent with that, emit methods should be exclusively responsible for generating the aspects and build the MetadataWorkUnit, not doing the filtering; filtering should happens before.

I agree, but due to the way everything works, by traversing everything from downstream to upstream and the fact that even though published datasource is an upstream for embedded datasources, they still share all the upstreams, I can't think of a way to do it without a heavy refactoring.

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Feb 13, 2025
@sgomezvillamor
Copy link
Contributor

.pre-commit-config.yaml
datahub-web-react/src/app/ingest/source/IngestionSourceList.tsx
metadata-service/iceberg-catalog/build.gradle
metadata-service/iceberg-catalog/pyproject.toml
metadata-service/iceberg-catalog/requirements.txt
metadata-service/iceberg-catalog/src/integrationTest/integration_test.py

Those files may have been accidentally pushed. Maybe some issue during the merge?

Once they are removed from the PR, this is ready to be merged.

mihai103 and others added 2 commits February 13, 2025 10:51
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Feb 13, 2025
Copy link
Contributor

@sgomezvillamor sgomezvillamor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks for the contrib!

@datahub-cyborg datahub-cyborg bot added merge-pending-ci A PR that has passed review and should be merged once CI is green. and removed needs-review Label for PRs that need review from a maintainer. labels Feb 13, 2025
@sgomezvillamor sgomezvillamor changed the title Fix/tableau skip ingesting filtered project assets fix(tableau): skip ingesting filtered project assets Feb 14, 2025
@sgomezvillamor sgomezvillamor merged commit dcf1576 into datahub-project:master Feb 14, 2025
188 of 190 checks passed
ttekampe pushed a commit to ttekampe/datahub that referenced this pull request Feb 14, 2025
…#12591)

Co-authored-by: Mihai Ciocirdel <[email protected]>
Co-authored-by: Jay <[email protected]>
Co-authored-by: Aseem Bansal <[email protected]>
Co-authored-by: Sergio Gómez Villamor <[email protected]>
ksrinath pushed a commit to ksrinath/datahub that referenced this pull request Feb 14, 2025
…#12591)

Co-authored-by: Mihai Ciocirdel <[email protected]>
Co-authored-by: Jay <[email protected]>
Co-authored-by: Aseem Bansal <[email protected]>
Co-authored-by: Sergio Gómez Villamor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants