Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingest): handle groupby custom label case #6456

Merged

Conversation

phongvu99
Copy link
Contributor

@phongvu99 phongvu99 commented Nov 16, 2022

Summary

This PR fixed the following exception when ingesting from Superset. The problem only happens if the ingested charts' metadata, the groupby parameter is using a custom label. The ChartAPI in Superset returns the groupby param in the following format, causing the exception.

{
  "label": "my_custom_label",
  "sqlExpression": "SELECT column_one, column_two FROM myDB"
}

What's changed

I've added a small check to ensure the ingestion won't fail. But the original label is lost, and replaced by a combination of the custom label and "_custom_label" appended at the end as a workaround.

A possible enhancement in the future is to modify the code to find the lost value somehow.

The Exception

'[2022-11-16 10:32:47,768] ERROR    {datahub.entrypoints:206} - Command failed: sequence item 0: expected str instance, dict found\n'

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Nov 16, 2022
fix(ingest): handle groupby custom label case
@phongvu99 phongvu99 force-pushed the superset-fix-ingest-groupbys branch from 070f12b to 8de070b Compare November 17, 2022 02:50
@phongvu99 phongvu99 requested a review from jjoyce0510 November 17, 2022 06:53
@github-actions
Copy link

github-actions bot commented Nov 17, 2022

Unit Test Results (metadata ingestion)

       8 files  ±0         8 suites  ±0   56m 21s ⏱️ - 1m 21s
   759 tests ±0     756 ✔️ ±0  3 💤 ±0  0 ±0 
1 520 runs  ±0  1 513 ✔️ ±0  7 💤 ±0  0 ±0 

Results for commit 247288e. ± Comparison against base commit b7c0373.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Nov 17, 2022

Unit Test Results (build & test)

621 tests  ±0   617 ✔️ ±0   15m 33s ⏱️ -10s
157 suites ±0       4 💤 ±0 
157 files   ±0       0 ±0 

Results for commit 247288e. ± Comparison against base commit b7c0373.

♻️ This comment has been updated with latest results.

@@ -308,6 +308,21 @@ def construct_chart_from_chart_data(self, chart_data):
group_bys = params.get("groupby", []) or []
if isinstance(group_bys, str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this also be a dictionary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I've only seen the Superset API returns the groupby as a list. Not sure why there's a check if it's a string in the original code, hence my concern in the previous comment about the fourth case.

# if the item is a custom label
if isinstance(item, dict):
item_value = item.get("label", "")
if item_value != "": temp_group_bys.append(f"{item_value}_custom_label")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need _custom_label here?

Copy link
Contributor Author

@phongvu99 phongvu99 Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _custom_label acts as an identifier since the original label is lost when using a custom label. As mentioned in the description above, a future improvement is to somehow retrieve the lost value, but that's up to the Superset API.

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Overall. Just a couple questions

@jjoyce0510 jjoyce0510 merged commit 0eb54c6 into datahub-project:master Nov 23, 2022
@phongvu99
Copy link
Contributor Author

LGTM Overall. Just a couple questions

I've replied to the comments above. Thank you for accepting the PR!

cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this pull request Feb 1, 2023
fix(ingest): handle groupby custom label case

Co-authored-by: Phong Vu <[email protected]>
Co-authored-by: Harshal Sheth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants