Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingest/tableau): support ingestion of access roles #11157

Conversation

haeniya
Copy link
Contributor

@haeniya haeniya commented Aug 13, 2024

Description

This PR enables the optional ingestion of access roles for the Tableau ingestion. To do this, we fetch the group permissions from the Tableau API, optionally transform it to the users needs and add them as roles to the Tableau asset. Currently, it is only implemented for Workbooks but could be extended to other assets as well.

The configuration in the recipe could look something like this:

source:
  type: tableau
  config:
    # Coordinates
    connect_uri: https://tableau.example.com

    access_role_ingestion:
      enable_workbooks: True
      role_prefix: "R-Tableau-"
      group_substring_start: 29
      role_description: "IAM Role required to access this Tableau asset."
      displayed_capabilities: ["Read", "Write", "Delete"]
      request_url: "https://iam.example.com/accessRequest?role=$ROLE_NAME"
      group_name_pattern:
        allow: [ "^.*_Consumer$" ]

    # Credentials
    username: "${TABLEAU_USER}"
    password: "${TABLEAU_PASSWORD}"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

The access aspect was only available for datasets before. That's why I added it to containers now. The "Access Management" tab should only be visible in the UI if there's an access aspect associated with the container.

Let me know what you think and I will add some tests for this.

Thanks.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Copy link
Contributor

coderabbitai bot commented Aug 13, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata product PR or Issue related to the DataHub UI/UX community-contribution PR or Issue raised by member(s) of DataHub Community labels Aug 13, 2024
@@ -264,8 +273,9 @@ def make_tableau_client(self, site: str) -> Server:
},
)

# From https://stackoverflow.com/a/50159273/5004662.
server._session.trust_env = False
if not self.session_trust_env:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was something we discovered along the way. It seems to disable / bypass proxy settings which is not what we want in our case. That's why we made it configurable. Let me know what you think.

@haeniya
Copy link
Contributor Author

haeniya commented Sep 10, 2024

If anyone has an idea regarding the failing tests, please let me know. They mostly don't seem related to my changes. I see linting issues in unity source and failing tests for other sources. Am I missing something?

Copy link
Contributor

@sid-acryl sid-acryl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update tableau connector documentation also for groups are getting ingested as role in DataHub with example of group name in Tableau and corresponding role in DataHub.

description="Use this property if you only need a substring of the group in Tableau to create the role used in the request_url.",
)

role_description: str = Field(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a map here as there might be multiple roles ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For us it's currently enough to just have one generic description for all the roles. This could be extended to enable mapping between maybe a regex pattern for a role name and a role description in the future. But maybe we wait with this until we see the need for it, WDYT?

@haeniya
Copy link
Contributor Author

haeniya commented Sep 11, 2024

Please update tableau connector documentation also for groups are getting ingested as role in DataHub with example of group name in Tableau and corresponding role in DataHub.

@sid-acryl do you mean adding a new section in https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/tableau/tableau_pre.md that explains the new feature of ingesting access roles with some examples?

@sid-acryl
Copy link
Contributor

Please update tableau connector documentation also for groups are getting ingested as role in DataHub with example of group name in Tableau and corresponding role in DataHub.

@sid-acryl do you mean adding a new section in https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/tableau/tableau_pre.md that explains the new feature of ingesting access roles with some examples?

Yup and please also update the Concept Mapping

@haeniya
Copy link
Contributor Author

haeniya commented Sep 12, 2024

I'm currently fixing the tests and adding a new one - just fyi.

Copy link
Contributor

@sid-acryl sid-acryl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets verify if we need any update in Quickstart guide of tableau (https://github.com/datahub-project/datahub/tree/master/docs/quick-ingestion-guides/tableau)

config:
server: "http://localhost:8080"
```
Assuming you have groups in Tableau with names like `AB_XY00-Tableau-Access_A_123_PROJECT_XY_Consumer` or `AB_XY00-Tableau-Access_A_123_PROJECT_XY_Analyst` and corresponding IAM roles like `AR-Tableau-PROJECT_XY_Consumer` or `AR-Tableau-PROJECT_XY_Analyst`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this assumption to above the recipe and mention how to map this information in recipe to achieve correct transformation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sid-acryl I moved it above the recipe and improved the wording. Let me know what you think.

@haeniya
Copy link
Contributor Author

haeniya commented Sep 16, 2024

Lets verify if we need any update in Quickstart guide of tableau (https://github.com/datahub-project/datahub/tree/master/docs/quick-ingestion-guides/tableau)

I'd say we don't need to update this part.

@sid-acryl
Copy link
Contributor

What is the motivation behind below config properties

  • role_prefix
  • group_substring_start
  • group_substring_end

As I tried to describe it in the docs (https://github.com/datahub-project/datahub/pull/11157/files#diff-9a69800355baee2a7495dab637dd6bb46ba1f3be3a6b73bbc7bc70c22b9aa07fR78-R81) those are for transforming the group names in Tableau to what is used as role name in the request link. In our case we have IAM roles that are related to each Tableau group but the names are not exactly the same. These properties are currently enough for us to transform the group names coming from Tableau to their corresponding IAM roles that we want to use in the request link.

Are you referring to AWS IAM roles when you mention IAM?

@haeniya
Copy link
Contributor Author

haeniya commented Oct 7, 2024

What is the motivation behind below config properties

  • role_prefix
  • group_substring_start
  • group_substring_end

As I tried to describe it in the docs (https://github.com/datahub-project/datahub/pull/11157/files#diff-9a69800355baee2a7495dab637dd6bb46ba1f3be3a6b73bbc7bc70c22b9aa07fR78-R81) those are for transforming the group names in Tableau to what is used as role name in the request link. In our case we have IAM roles that are related to each Tableau group but the names are not exactly the same. These properties are currently enough for us to transform the group names coming from Tableau to their corresponding IAM roles that we want to use in the request link.

Are you referring to AWS IAM roles when you mention IAM?

Could be any Identity & Access Management tool. We're using Sailpoint but it doesn't really matter. It doesn't even need to be an IAM tool. Those fields are just there to transform the Tableau groups to what is used in the request link as $ROLE_NAME. Could be any tool. Maybe we remove IAM in the docs to avoid confusion.

Copy link
Contributor

@sid-acryl sid-acryl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the term IAM.
Is it possible for you to test it with other Identity or Cloud Provider (google, azure) ?

@@ -298,6 +308,56 @@ def make_tableau_client(self, site: str) -> Server:
) from e


class AccessRolesIngestionConfig(ConfigModel):
enable_workbooks: bool = Field(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enable would be sufficient here.

Copy link
Contributor Author

@haeniya haeniya Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind enable_workbooks is that we can add more entities in the future like views or dashboards. We could then just add a new config property enable_dashboards or enable_views. WDYT?


request_url: str = Field(
default="",
description="Request URL that is added to the role for requesting access. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per code it looks like $ROLE_NAME is required to substitute the role name in request_url.

Request URL that is added to access role metadata. The request URL must include $ROLE_NAME, For example 'https://example-host.com/order/$ROLE_NAME'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, $ROLE_NAME shouldn't be required. If it's not present it just won't be replaced. Or where do you see an issue?

@haeniya
Copy link
Contributor Author

haeniya commented Oct 7, 2024

Let's remove the term IAM. Is it possible for you to test it with other Identity or Cloud Provider (google, azure) ?

Changed various parts according to your feedback. Please have a look at the remaining comments. To your question - no, unfortunately, I don't have the option to test with other IAM providers. But as already said, it doesn't matter what IAM tool is used. The way access requests work in Datahub currently is that there is a link in the Access Management tab of an asset that redirects the user some external page. My changes are only related to this link that is created for each Tableau group. In theory, any link can be created if the role name can be derived from the Tableau group.

@hsheth2
Copy link
Collaborator

hsheth2 commented Oct 8, 2024

Hey folks, I should've jumped in a bit earlier on this PR.

I'm good with

  • adding access aspect to containers
  • all the relevant UI changes
  • fetching user/group info for workbooks from the tableau API

The piece I am worried about is this - this convention feels a bit too specific to your setup.

Assuming you have groups in Tableau with names such as `AB_XY00-Tableau-Access_A_123_PROJECT_XY_Consumer` or `AB_XY00-Tableau-Access_A_123_PROJECT_XY_Analyst` and corresponding roles (e.g. in an IAM tool) like `AR-Tableau-PROJECT_XY_Consumer` or `AR-Tableau-PROJECT_XY_Analyst`.
Using the recipe below, would filter the groups that end with "_Consumer" and transform the group names into the corresponding roles.
With `group_start_index` and `group_end_index` you can define a substring of the group name to be used as role name and `role_prefix` can be used to add a prefix to the generated role name.
The role names then act as a substitute of `$ROLE_NAME` in the `request_url` and result in access request URLs like `https://iam.example.com/accessRequest?role=AR-Tableau-PROJECT_XY_Consumer`, for example.

If we can conclusively say that this is a standard convention for linking AWS / other IAM setups with Tableau, and that other DataHub users will also benefit from this specific setup, then I'm good with it. But I haven't really heard of this setup from other folks thus far.

At the same time, I do want to figure out how we can make this work for you. I'd personally be in favor of putting the user/group info in the container's custom properties. That way, you could still build a transformer that can convert custom properties into access aspects.

@haeniya
Copy link
Contributor Author

haeniya commented Oct 14, 2024

Hey folks, I should've jumped in a bit earlier on this PR.

I'm good with

  • adding access aspect to containers
  • all the relevant UI changes
  • fetching user/group info for workbooks from the tableau API

The piece I am worried about is this - this convention feels a bit too specific to your setup.

Assuming you have groups in Tableau with names such as `AB_XY00-Tableau-Access_A_123_PROJECT_XY_Consumer` or `AB_XY00-Tableau-Access_A_123_PROJECT_XY_Analyst` and corresponding roles (e.g. in an IAM tool) like `AR-Tableau-PROJECT_XY_Consumer` or `AR-Tableau-PROJECT_XY_Analyst`.
Using the recipe below, would filter the groups that end with "_Consumer" and transform the group names into the corresponding roles.
With `group_start_index` and `group_end_index` you can define a substring of the group name to be used as role name and `role_prefix` can be used to add a prefix to the generated role name.
The role names then act as a substitute of `$ROLE_NAME` in the `request_url` and result in access request URLs like `https://iam.example.com/accessRequest?role=AR-Tableau-PROJECT_XY_Consumer`, for example.

If we can conclusively say that this is a standard convention for linking AWS / other IAM setups with Tableau, and that other DataHub users will also benefit from this specific setup, then I'm good with it. But I haven't really heard of this setup from other folks thus far.

At the same time, I do want to figure out how we can make this work for you. I'd personally be in favor of putting the user/group info in the container's custom properties. That way, you could still build a transformer that can convert custom properties into access aspects.

Hi @hsheth2, as discussed offline, I refactored the code and removed the access aspect and role creation again. The ingestion now only sets the Tableau groups as custom properties. Access and role creation will be done in a transformer. Please review the changes again and let me know what you think. Thanks.

@hsheth2
Copy link
Collaborator

hsheth2 commented Oct 18, 2024

@haeniya feels like we're pretty close here :)

@hsheth2 hsheth2 added the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Oct 21, 2024
@hsheth2 hsheth2 changed the title feat(tableau-ingestion): Enable optional ingestion of access roles feat(tableau): support ingestion of access roles Oct 21, 2024
@hsheth2
Copy link
Collaborator

hsheth2 commented Oct 23, 2024

@haeniya looks like the tests are failing

FAILED tests/integration/tableau/test_tableau_ingest.py::test_group_ingestion - datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (tableau): 1 validation error for TableauConfig
group_ingestion
  extra fields not permitted (type=value_error.extra)

@hsheth2 hsheth2 changed the title feat(tableau): support ingestion of access roles feat(ingest/tableau): support ingestion of access roles Oct 24, 2024
@hsheth2 hsheth2 merged commit 7c8dba4 into datahub-project:master Oct 24, 2024
78 of 80 checks passed
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green. product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants