-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest): extract powerbi endorsements to tags #6638
feat(ingest): extract powerbi endorsements to tags #6638
Conversation
@@ -16,6 +16,8 @@ source: | |||
client_secret: bar | |||
# Enable / Disable ingestion of user information for dashboards | |||
extract_ownership: true | |||
# Enable / Disable ingestion of endorsements | |||
extract_endorsements_to_tags: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can keep it true by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per John's comment, I'll keep this disabled by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
@@ -633,57 +673,6 @@ def get_pages_by_report( | |||
for raw_instance in response_dict["value"] | |||
] | |||
|
|||
def get_reports( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason to delete this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't called anymore as the reports were parsed from the workspace scan. No other reason, basically the logic moved to the part where the scan result is handled.
The previous implementation was gathering the reports through the REST API, the current the API implementation doesn't include endorsementDetails in the payload, only the workspace scan has that data. That's the main reason why I ended up rewriting the logic.
Highly frustrating to have some data in the workspace scan and some data in the REST API.
@@ -422,6 +443,16 @@ def chart_custom_properties(dashboard: PowerBiAPI.Dashboard) -> dict: | |||
if owner_mcp is not None: | |||
list_of_mcps.append(owner_mcp) | |||
|
|||
if self.__config.extract_endorsements_to_tags and dashboard.tags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we go for the append method? where append the MCP if flag is enabled to avoid checking at multiple places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning something like the tags
in the dataclasses are populated only when extract_endorsements_to_tags==True
?
Then the code would look something like this, right?
if dashboard.tags:
list_of_mcps.append(
self.new_mcp(
Constant.DASHBOARD,
dashboard_urn,
Constant.GLOBAL_TAGS,
self.transform_tags(dashboard.tags)
tags,
)
)
@@ -139,6 +140,10 @@ class PowerBiAPIConfig(EnvBasedSourceConfigBase): | |||
extract_lineage: bool = pydantic.Field( | |||
default=True, description="Whether lineage should be ingested" | |||
) | |||
# Enable/Disable extracting endorsements to tags | |||
extract_endorsements_to_tags: bool = pydantic.Field( | |||
default=True, description="Whether to extract endorsements to tags" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we default this to "False"? Most people may be surprised by this behavior, since this can overwrite existing Tags defined in DataHub
@@ -139,6 +140,10 @@ class PowerBiAPIConfig(EnvBasedSourceConfigBase): | |||
extract_lineage: bool = pydantic.Field( | |||
default=True, description="Whether lineage should be ingested" | |||
) | |||
# Enable/Disable extracting endorsements to tags | |||
extract_endorsements_to_tags: bool = pydantic.Field( | |||
default=True, description="Whether to extract endorsements to tags" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please warn the author that enabling this will potentially overwrite Tags defined inside the DataHub application
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also mention that there's an AddTagTransformer that supports semantic addition of tags (e.g. PATCH instead of REPLACE)
dashboards: List[PowerBiAPI.Dashboard] = [] | ||
dashboard_data = self.get_dashboard_data(workspace) | ||
|
||
for scanned_dashboard in scan_result["dashboards"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract into a separate method. This is getting too long
for scanned_dashboard in scan_result["dashboards"]: | ||
# Iterate through response and create a list of PowerBiAPI.Dashboard | ||
dashboard_details = next( | ||
(x for x in dashboard_data if x["id"] == scanned_dashboard["id"]), None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ID guaranteed to be present? Should would be defensive on this check?
# Iterate through response and create a list of PowerBiAPI.Dashboard | ||
dashboard = PowerBiAPI.Dashboard( | ||
id=scanned_dashboard.get("id"), | ||
isReadOnly=dashboard_details.get("isReadOnly"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isReadOnly guaranteed to be there? Always?
workspace_id=workspace.id, | ||
workspace_name=workspace.name, | ||
tiles=[], | ||
users=[], | ||
tags=self.parse_endorsement( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a safe "get"? Will endorsementDetails always be there?
@@ -877,6 +868,45 @@ def init_dashboard_tiles(workspace: PowerBiAPI.Workspace) -> None: | |||
|
|||
return None | |||
|
|||
def handle_report(report_data: dict) -> Optional[PowerBiAPI.Report]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I'd recommend reducing the surface area of this PR. It changes too many things at once :) Any way to break it into a few smaller chunks?
Cheers
John
I've refactored the implementation a bit and brought back the original implementation to handle fetching of reports. Now the implementation collects endorsements by id in a dict to the With this change, there's no need to separate What do you think? |
|
||
By default, extracting endorsement information to tags is disabled. The feature may be useful if organization uses [endorsements](https://learn.microsoft.com/en-us/power-bi/collaborate-share/service-endorse-content) to identify content quality. | ||
|
||
Please note that the default implementation overwrites tags for the ingested entities, if you need to preserve existing tags, consider using a [transformer](../../../../metadata-ingestion/docs/transformer/dataset_transformer.md#simple-add-dataset-globaltags) with `semantics: PATCH` tags instead of `OVERWRITE`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look great!
Going to give this a big old "LGTM". Thanks for the hard work!
Extract endorsements from workspace scan result to datasets, reports or dashboards.
Checklist