feat(ingestion): PowerBI# Improve PowerBI source ingestion #6549

Merged
77 commits
69f945c
powerbi package
siddiquebagwan-gslab Nov 10, 2022
d68230b
restructure powerbi
siddiquebagwan-gslab Nov 14, 2022
51d6820
lexical rules
siddiquebagwan-gslab Nov 16, 2022
24e0ba9
12 expression test case
siddiquebagwan-gslab Nov 16, 2022
c539b08
12 M query expression parsed
siddiquebagwan-gslab Nov 17, 2022
9651e54
test cases
siddiquebagwan-gslab Nov 17, 2022
6f4d0cc
WIP
siddiquebagwan-gslab Nov 23, 2022
33d9a29
merge conflict
siddiquebagwan-gslab Nov 23, 2022
b4dd785
Current behaviour
siddiquebagwan-gslab Nov 24, 2022
281bc56
new behaviour where data-platform is powerbi
siddiquebagwan-gslab Nov 24, 2022
92272f4
Merge branch 'master' into master+acr-4765-powerbi-table-column
siddiquebagwan-gslab Nov 24, 2022
01e5839
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Nov 24, 2022
3b6a422
debug log
siddiquebagwan-gslab Nov 24, 2022
09268db
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Nov 24, 2022
90f6870
WIP
siddiquebagwan-gslab Dec 1, 2022
43f954a
WIP
siddiquebagwan-gslab Dec 2, 2022
fe7c505
config
siddiquebagwan-gslab Dec 7, 2022
f31c2e4
WIP
siddiquebagwan-gslab Dec 7, 2022
46dcafd
WIP
siddiquebagwan-gslab Dec 7, 2022
c5c5ace
working code for postgres
siddiquebagwan-gslab Dec 9, 2022
c86b23f
WIP
siddiquebagwan-gslab Dec 10, 2022
d7c0464
lint fix
siddiquebagwan-gslab Dec 12, 2022
75d5b6b
PowerBI API
siddiquebagwan-gslab Dec 12, 2022
f680ac7
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 12, 2022
61c1d2d
mssql server support
siddiquebagwan-gslab Dec 12, 2022
a545f30
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 12, 2022
aad6f29
WIP
siddiquebagwan-gslab Dec 14, 2022
33a3150
mssql key
siddiquebagwan-gslab Dec 14, 2022
aecb695
WIP
siddiquebagwan-gslab Dec 15, 2022
776a787
text fixes
siddiquebagwan-gslab Dec 15, 2022
0a4a9b0
WIP
siddiquebagwan-gslab Dec 18, 2022
eb3eda5
native query in MS-SQL
siddiquebagwan-gslab Dec 19, 2022
8c8fff4
Working native and regular cases
siddiquebagwan-gslab Dec 19, 2022
3719107
lint fix
siddiquebagwan-gslab Dec 19, 2022
bb1dea3
flag for switching native query
siddiquebagwan-gslab Dec 19, 2022
788be4e
update test-cases
siddiquebagwan-gslab Dec 20, 2022
b7dc3cb
lineage test
siddiquebagwan-gslab Dec 20, 2022
3656cc0
platform instance
siddiquebagwan-gslab Dec 20, 2022
1433b60
integration test
siddiquebagwan-gslab Dec 20, 2022
979b457
lint fix
siddiquebagwan-gslab Dec 20, 2022
b023525
resovle merge conflict
siddiquebagwan-gslab Dec 20, 2022
a595aa5
resovle merge conflict
siddiquebagwan-gslab Dec 20, 2022
955245c
lint fix
siddiquebagwan-gslab Dec 20, 2022
b53de60
fix golden files
siddiquebagwan-gslab Dec 20, 2022
3ca31a0
fix test
siddiquebagwan-gslab Dec 20, 2022
68363ff
lint fix
siddiquebagwan-gslab Dec 20, 2022
71e25a2
Merge branch 'table-lineage-advance' into master+acr-4765-powerbi-tab…
siddiquebagwan-gslab Dec 20, 2022
9b6a674
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 20, 2022
3abe48f
lint fix
siddiquebagwan-gslab Dec 21, 2022
9b90a9a
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 21, 2022
843cf0d
spell fix
siddiquebagwan-gslab Dec 21, 2022
cb92c9e
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 21, 2022
3951bdd
Merge branch 'master' into master+acr-4765-powerbi-table-column
siddiquebagwan-gslab Dec 23, 2022
dfe51a0
1. Lint fix
siddiquebagwan-gslab Dec 26, 2022
1e2dc90
remove un-wanted code
siddiquebagwan-gslab Dec 26, 2022
329ff8d
Merge branch 'master' into master+acr-4765-powerbi-table-column
siddiquebagwan-gslab Dec 26, 2022
6cb46ca
Add new line
siddiquebagwan-gslab Dec 26, 2022
630259a
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 26, 2022
63b9b07
review comments
siddiquebagwan-gslab Dec 26, 2022
6b7470c
Review comments
siddiquebagwan-gslab Dec 26, 2022
b378151
rename methods
siddiquebagwan-gslab Dec 26, 2022
4cd98d5
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 26, 2022
b3594da
Merge branch 'master' into master+acr-4765-powerbi-table-column
siddiquebagwan-gslab Dec 27, 2022
0bec288
updated doc
siddiquebagwan-gslab Dec 27, 2022
7fbca87
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 27, 2022
7ce75dc
support join in native query
siddiquebagwan-gslab Dec 28, 2022
383697e
integration test fix for native query
siddiquebagwan-gslab Dec 28, 2022
1efcb98
native sql query unit test
siddiquebagwan-gslab Dec 28, 2022
9ca12ae
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 28, 2022
fd91110
review comment
siddiquebagwan-gslab Dec 28, 2022
02c2488
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 28, 2022
6f1134d
mark workspace_id as optional
siddiquebagwan-gslab Dec 28, 2022
97b8b7f
updated config
siddiquebagwan-gslab Dec 28, 2022
8d70aec
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 28, 2022
9f48036
review comments
siddiquebagwan-gslab Dec 29, 2022
5370d23
Merge branch 'master' into master+acr-4765-powerbi-table-column
siddiquebagwan-gslab Dec 29, 2022
51bb936
Merge branch 'master+acr-4765-powerbi-table-column' of github.com:acr…
siddiquebagwan-gslab Dec 29, 2022
81 changes: 79 additions & 2 deletions metadata-ingestion/docs/sources/powerbi/powerbi_pre.md
@@ -7,14 +7,91 @@ See the
- Enhance admin API responses with detailed metadata
## Concept mapping

| Power BI | Datahub |
|-----------------------|---------------------|
| `Dashboard` | `Dashboard` |
| `Dataset, Datasource` | `Dataset` |
| `Dataset's Table` | `Dataset` |
| `Tile` | `Chart` |
| `Report.webUrl` | `Chart.externalUrl` |
| `Workspace` | `N/A` |
| `Report` | `Dashboard` |
| `Page` | `Chart` |

If a Tile is created from a Report, then `Chart.externalUrl` is set to `Report.webUrl`.
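The concept mapping and the `externalUrl` rule can be sketched in Python as below. The `Report` and `Tile` dataclasses and `chart_external_url` are hypothetical stand-ins, not the source's own classes, and returning `None` for a tile that was not created from a report is an assumption the docs do not state.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Power BI concept -> DataHub entity, as listed in the table above.
CONCEPT_MAPPING: Dict[str, Optional[str]] = {
    "Dashboard": "Dashboard",
    "Dataset": "Dataset",
    "Datasource": "Dataset",
    "Dataset's Table": "Dataset",
    "Tile": "Chart",
    "Workspace": None,  # N/A: workspaces are not mapped to an entity
    "Report": "Dashboard",
    "Page": "Chart",
}


@dataclass
class Report:  # hypothetical stand-in for a Power BI report
    id: str
    web_url: str


@dataclass
class Tile:  # hypothetical stand-in for a Power BI tile
    id: str
    report: Optional[Report] = None  # set only when the tile was created from a report


def chart_external_url(tile: Tile) -> Optional[str]:
    """Chart.externalUrl for a tile: the source report's webUrl, if any (assumed None otherwise)."""
    if tile.report is not None:
        return tile.report.web_url
    return None
```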

## Lineage
Collaborator

Add one sentence here to describe what type of lineage this source extracts.

Collaborator

(Introduction sentence)

Contributor Author

Done


This source extracts table lineage for tables present in Power BI Datasets. Let's consider a Power BI Dataset `SALES_REPORT` that has a PostgreSQL database configured as its data source.

If the `SALES_REPORT` dataset has a table `SALES_ANALYSIS` that is backed by `SALES_ANALYSIS_VIEW` in the PostgreSQL database, then `SALES_ANALYSIS_VIEW` will appear as the upstream dataset of the `SALES_ANALYSIS` table.
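A minimal sketch of the resulting upstream edge, expressed as DataHub dataset URNs. The helper `dataset_urn` is hypothetical (DataHub's Python SDK offers a comparable `make_dataset_urn` in `datahub.emitter.mce_builder`), and the URN names, in particular the Power BI table name and the PostgreSQL database and schema `sales_db.public`, are invented for illustration.

```python
from typing import Dict, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # DataHub dataset URN shape: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical names, chosen only to mirror the SALES_REPORT example.
powerbi_table = dataset_urn("powerbi", "SALES_REPORT.SALES_ANALYSIS")
postgres_view = dataset_urn("postgres", "sales_db.public.SALES_ANALYSIS_VIEW")

# Downstream Power BI table -> its upstream datasets.
upstream_lineage: Dict[str, List[str]] = {powerbi_table: [postgres_view]}
```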

You can control table lineage ingestion using the `extract_lineage` configuration parameter; it defaults to `true`.
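As a sketch, a recipe that disables table lineage might look like the Python dict below. Only `extract_lineage` and its `true` default come from this page; the other keys and all values are placeholders to verify against the source's configuration reference, and `lineage_enabled` is a hypothetical helper that just mirrors the documented default.

```python
from typing import Any, Dict

# Recipe sketch: keys other than extract_lineage are assumptions, values are placeholders.
recipe: Dict[str, Any] = {
    "source": {
        "type": "powerbi",
        "config": {
            "tenant_id": "<tenant-id>",
            "client_id": "<client-id>",
            "client_secret": "<client-secret>",
            # Turn table lineage off; omit the key to keep the default (true).
            "extract_lineage": False,
        },
    },
}


def lineage_enabled(source_config: Dict[str, Any]) -> bool:
    # extract_lineage falls back to True when it is not set.
    return bool(source_config.get("extract_lineage", True))
```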

The Power BI source extracts lineage information by parsing Power BI M-Query expressions.

It supports M-Query expressions for the following Power BI data sources:

1. Snowflake
2. Oracle
3. PostgreSQL
4. Microsoft SQL Server
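To illustrate how an expression can be matched to one of these data sources, the sketch below looks for the M data-access function being called. It is a regex simplification, not the source's Lark-based parser, and the function-to-platform table (`Sql.Database` for SQL Server, DataHub platform names such as `postgres` and `mssql`) is an assumption.

```python
import re
from typing import Optional

# Assumed mapping: M data-access function -> DataHub platform name.
CONNECTOR_PLATFORMS = {
    "Snowflake.Databases": "snowflake",
    "Oracle.Database": "oracle",
    "PostgreSQL.Database": "postgres",
    "Sql.Database": "mssql",
}

# Matches any of the function names above followed by an opening parenthesis.
_CONNECTOR_RE = re.compile(
    r"\b(" + "|".join(re.escape(f) for f in CONNECTOR_PLATFORMS) + r")\s*\("
)


def detect_platform(m_query: str) -> Optional[str]:
    """Return the platform of the first supported connector call, or None if there is none."""
    match = _CONNECTOR_RE.search(m_query)
    return CONNECTOR_PLATFORMS[match.group(1)] if match else None
```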

Native SQL query parsing is supported only for the `Snowflake` data source, and only the first table in the `FROM` clause is ingested as an upstream table. Advanced SQL constructs such as JOINs and sub-queries in the `FROM` clause are not supported.

For example, in the native SQL query below, the table `OPERATIONS_ANALYTICS.TRANSFORMED_PROD.V_UNIT_TARGETS` will be ingested as the upstream table.

```shell
let
    Source = Value.NativeQuery(
        Snowflake.Databases(
            "sdfsd788.ws-east-2.fakecomputing.com",
            "operations_analytics_prod",
            [Role = "OPERATIONS_ANALYTICS_MEMBER"]
        ){[Name = "OPERATIONS_ANALYTICS"]}[Data],
        "select #(lf)UPPER(REPLACE(AGENT_NAME,'-','')) AS Agent,#(lf)TIER,#(lf)UPPER(MANAGER),#(lf)TEAM_TYPE,#(lf)DATE_TARGET,#(lf)MONTHID,#(lf)TARGET_TEAM,#(lf)SELLER_EMAIL,#(lf)concat((UPPER(REPLACE(AGENT_NAME,'-',''))), MONTHID) as AGENT_KEY,#(lf)UNIT_TARGET AS SME_Quota,#(lf)AMV_TARGET AS Revenue_Quota,#(lf)SERVICE_QUOTA,#(lf)BL_TARGET,#(lf)SOFTWARE_QUOTA as Software_Quota#(lf)#(lf)from OPERATIONS_ANALYTICS.TRANSFORMED_PROD.V_UNIT_TARGETS#(lf)#(lf)where YEAR_TARGET >= 2020#(lf)and TEAM_TYPE = 'foo'#(lf)and TARGET_TEAM = 'bar'",
        null,
        [EnableFolding = true]
    ),
    #"Added Conditional Column" = Table.AddColumn(
        Source,
        "Has PS Software Quota?",
        each
            if [TIER] = "Expansion (Medium)" then
                "Yes"
            else if [TIER] = "Acquisition" then
                "Yes"
            else
                "No"
    )
in
    #"Added Conditional Column"
```
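The "first table after `FROM`" rule can be approximated as below. This naive regex is a sketch, not the source's implementation, and `first_upstream_table` is a hypothetical name: it decodes M's `#(lf)` line-feed escape, then takes the first dotted identifier after `from`. It ignores comments and string literals and would be misled by constructs such as `EXTRACT(YEAR FROM ...)`.

```python
import re
from typing import Optional

# First (optionally dotted) identifier that follows the keyword FROM, case-insensitive.
_FROM_RE = re.compile(
    r"\bfrom\s+([A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*)*)", re.IGNORECASE
)


def first_upstream_table(native_sql: str) -> Optional[str]:
    """Return the first table named in the FROM clause of a Value.NativeQuery SQL string."""
    # M encodes line feeds inside strings as #(lf); turn them back into newlines.
    sql = native_sql.replace("#(lf)", "\n")
    match = _FROM_RE.search(sql)
    return match.group(1) if match else None
```

Applied to the query above, this returns `OPERATIONS_ANALYTICS.TRANSFORMED_PROD.V_UNIT_TARGETS`; for a query with a JOIN it returns only the first table, which matches the documented limitation.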

## M-Query Patterns Supported for Lineage Extraction
Let's consider an M-Query expression that combines two PostgreSQL tables. Such an expression can be written using either of the following patterns.

**Pattern-1**

```shell
let
Source = PostgreSQL.Database("localhost", "book_store"),
book_date = Source{[Schema="public",Item="book"]}[Data],
issue_history = Source{[Schema="public",Item="issue_history"]}[Data],
combine_result = Table.Combine({book_date, issue_history})
in
combine_result
```

**Pattern-2**

```shell
let
Source = PostgreSQL.Database("localhost", "book_store"),
combine_result = Table.Combine({Source{[Schema="public",Item="book"]}[Data], Source{[Schema="public",Item="issue_history"]}[Data]})
in
combine_result
```

`Pattern-2` is *not* supported for upstream table lineage extraction because it passes nested item selectors, i.e. `{Source{[Schema="public",Item="book"]}[Data], Source{[Schema="public",Item="issue_history"]}[Data]}`, as the argument to the M-Query table function `Table.Combine`.

`Pattern-1` is supported because it first assigns each table from the schema to a variable and then passes those variables to the M-Query table function `Table.Combine`.
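The difference between the two patterns can be illustrated with a simplified resolver. It is a regex sketch, not the source's Lark grammar, and `resolve_combined_tables` is a hypothetical helper: it maps each variable bound to a `Source{[Schema=...,Item=...]}[Data]` selector to `schema.table`, then resolves the plain identifiers passed to `Table.Combine`. Inline selectors, as in Pattern-2, are deliberately left unresolved, mirroring the documented limitation.

```python
import re
from typing import Dict, List

# variable = Source{[Schema="...",Item="..."]}[Data]
_ITEM_SELECTOR_RE = re.compile(
    r'(\w+)\s*=\s*\w+\{\[Schema="([^"]+)",\s*Item="([^"]+)"\]\}\[Data\]'
)
# Table.Combine({a, b}) with no nested braces inside the list.
_COMBINE_RE = re.compile(r"Table\.Combine\(\{([^{}]*)\}\)")


def resolve_combined_tables(m_query: str) -> List[str]:
    """Return schema.table for each variable passed to Table.Combine (Pattern-1 only)."""
    variables: Dict[str, str] = {
        name: f"{schema}.{item}"
        for name, schema, item in _ITEM_SELECTOR_RE.findall(m_query)
    }
    combine = _COMBINE_RE.search(m_query)
    if combine is None:  # e.g. Pattern-2: nested selectors inside the list
        return []
    args = [arg.strip() for arg in combine.group(1).split(",")]
    return [variables[arg] for arg in args if arg in variables]
```

With the Pattern-1 expression this returns `["public.book", "public.issue_history"]`; with Pattern-2 it returns an empty list.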
4 changes: 3 additions & 1 deletion metadata-ingestion/setup.py
@@ -344,7 +344,7 @@ def get_long_description():
"trino": sql_common | trino,
"starburst-trino-usage": sql_common | usage_common | trino,
"nifi": {"requests", "packaging"},
- "powerbi": microsoft_common,
+ "powerbi": microsoft_common | {"lark[regex]==1.1.4"},
"powerbi-report-server": powerbi_report_server,
"vertica": sql_common | {"sqlalchemy-vertica[vertica-python]==0.0.5"},
"unity-catalog": databricks_cli | {"requests"},
@@ -627,6 +627,8 @@ def get_long_description():
"datahub": ["py.typed"],
"datahub.metadata": ["schema.avsc"],
"datahub.metadata.schemas": ["*.avsc"],
"datahub.ingestion.source.feast_image": ["Dockerfile", "requirements.txt"],
"datahub.ingestion.source.powerbi": ["powerbi-lexical-grammar.rule"],
},
entry_points=entry_points,
# Dependencies.