forked from datahub-project/datahub
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'datahub-project:master' into master
Showing 11 changed files with 245 additions and 43 deletions.
@@ -22,7 +22,7 @@ The below table shows transformer which can transform aspects of entity [Dataset
 |-----------------------------|----------|---------|---------------|---------------------------------------------|
 | `tag_pattern` | | str | | Regex matched against each tag. The matched portion is removed, and the rest of the string is treated as the owner ID used to create the owner URN. |
 | `is_user` | | bool | `true` | Whether the owner is a user. If `false`, the owner is treated as a group. |
-| `owner_character_mapping` | | dict[str, str] | | A mapping of extracted owner character to datahub owner character. |
+| `tag_character_mapping` | | dict[str, str] | | A mapping of tag characters to DataHub owner characters. If provided, `tag_pattern` is matched against the tag after the mapping has been applied. |
 | `email_domain` | | str | | If set, this is appended when creating the owner URN. |
 | `extract_owner_type_from_tag_pattern` | | bool | `false` | Whether to extract the owner type from the first group of the provided tag pattern. If `true`, there is no need to provide the `owner_type` and `owner_type_urn` configs. For example, if the tag pattern is `(.*)_owner_email:` and the actual tag is `developer_owner_email`, the extracted owner type is `developer`. |
 | `owner_type` | | str | `TECHNICAL_OWNER` | Ownership type. |
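The options in the table above combine roughly as follows. This is a minimal Python sketch, not DataHub's actual implementation: the helper name `owner_urn_from_tag` is hypothetical, and it assumes that the pattern is matched at the start of the tag name (after the `urn:li:tag:` prefix) and that `email_domain` is joined to the owner ID with `@`. The sample tag and domain are illustrative values only.

```python
import re
from typing import Optional

TAG_URN_PREFIX = "urn:li:tag:"


def owner_urn_from_tag(
    tag_urn: str,
    tag_pattern: str,
    is_user: bool = True,
    email_domain: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: derive an owner URN from a tag URN."""
    # Work on the tag name only, without the URN prefix.
    if tag_urn.startswith(TAG_URN_PREFIX):
        tag = tag_urn[len(TAG_URN_PREFIX):]
    else:
        tag = tag_urn
    # Assumption: the pattern must match at the start of the tag name.
    match = re.match(tag_pattern, tag)
    if match is None:
        return None  # this tag does not describe an owner
    # Everything after the matched pattern is treated as the owner ID.
    owner = tag[match.end():]
    if email_domain:
        # Assumption: the domain is joined with "@".
        owner = f"{owner}@{email_domain}"
    prefix = "urn:li:corpuser:" if is_user else "urn:li:corpGroup:"
    return prefix + owner


print(
    owner_urn_from_tag(
        "urn:li:tag:owner:abc", "owner:", is_user=False, email_domain="example.com"
    )
)
# urn:li:corpGroup:abc@example.com
```

Returning `None` for non-matching tags keeps ordinary tags untouched, so only tags that carry the configured pattern produce owners.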
@@ -40,14 +40,14 @@ transformers:
 ```
 So if we have input dataset tags like
-- `urn:li:tag:dataset_owner_email:[email protected]`
-- `urn:li:tag:dataset_owner_email:[email protected]`
+- `urn:li:tag:owner_email:[email protected]`
+- `urn:li:tag:owner_email:[email protected]`

 The portion of the tag after the matched tag pattern will be converted into an owner. Hence users `[email protected]` and `[email protected]` will be added as owners.

 ### Examples

-- Add owners, however owner should be considered as group and also email domain not provided in tag string. For example: from tag urn `urn:li:tag:dataset_owner:abc` extracted owner urn should be `urn:li:corpGroup:[email protected]` then config would look like this:
+- Add owners where the owner should be treated as a group and the email domain is not provided in the tag string. For example, if the owner URN extracted from tag URN `urn:li:tag:owner:abc` should be `urn:li:corpGroup:[email protected]`, then the config would look like this:
 ```yaml
 transformers:
   - type: "extract_ownership_from_tags"
@@ -56,7 +56,7 @@ The portion of the tag after the matched tag pattern will be converted into an o
       is_user: false
       email_domain: "email.com"
 ```
-- Add owners, however owner type and owner type urn wanted to provide externally. For example: from tag urn `urn:li:tag:dataset_owner_email:[email protected]` owner type should be `CUSTOM` and owner type urn as `"urn:li:ownershipType:data_product"` then config would look like this:
+- Add owners where the owner type and owner type URN are provided externally. For example, if for tag URN `urn:li:tag:owner_email:[email protected]` the owner type should be `CUSTOM` and the owner type URN `"urn:li:ownershipType:data_product"`, then the config would look like this:
 ```yaml
 transformers:
   - type: "extract_ownership_from_tags"
@@ -65,15 +65,17 @@ The portion of the tag after the matched tag pattern will be converted into an o
       owner_type: "CUSTOM"
       owner_type_urn: "urn:li:ownershipType:data_product"
 ```
-- Add owners, however some owner characters needs to replace with some other characters before ingestion. For example: from tag urn `urn:li:tag:dataset_owner_email:abc_xyz-email_com` extracted owner urn should be `urn:li:corpGroup:[email protected]` then config would look like this:
+- Add owners where some tag characters need to be replaced with other characters before the owner is extracted. For example, if the owner URN extracted from tag URN `urn:li:tag:owner__email:abc--xyz-email_com` should be `urn:li:corpGroup:[email protected]`, then the config would look like this:
 ```yaml
 transformers:
   - type: "extract_ownership_from_tags"
     config:
       tag_pattern: "owner_email:"
-      owner_character_mapping:
-        "_": ".",
-        "-": "@",
+      tag_character_mapping:
+        "_": "."
+        "-": "@"
+        "--": "-"
+        "__": "_"
 ```
 - Add owners where the owner type also needs to be extracted from the tag pattern. For example, if for tag URN `urn:li:tag:data_producer_owner_email:[email protected]` the extracted owner type should be `data_producer`, then the config would look like this:
 ```yaml
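The `tag_character_mapping` example in the last hunk relies on overlapping keys (`-` and `--`, `_` and `__`). Below is a minimal Python sketch of one way to resolve such overlaps, assuming the longest matching key wins at each position in a single left-to-right pass. The helper name `apply_tag_character_mapping` and the sample tag are hypothetical, and DataHub's own implementation may differ.

```python
from typing import Dict


def apply_tag_character_mapping(tag: str, mapping: Dict[str, str]) -> str:
    """Hypothetical helper: rewrite a tag using a character mapping.

    Assumption: at each position the longest matching key wins, so "--"
    is handled before "-" and "__" before "_".
    """
    # Try longer keys first so overlapping keys resolve predictably.
    keys = sorted(mapping, key=len, reverse=True)
    out = []
    i = 0
    while i < len(tag):
        for key in keys:
            if key and tag.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No key matches at this position: copy the character unchanged.
            out.append(tag[i])
            i += 1
    return "".join(out)


mapping = {"_": ".", "-": "@", "--": "-", "__": "_"}
converted = apply_tag_character_mapping("owner__email:jane--doe-example_com", mapping)
print(converted)
# owner_email:jane-doe@example.com
```

After conversion the tag starts with `owner_email:`, so a `tag_pattern` of `"owner_email:"` matches the converted tag rather than the raw one, as the `tag_character_mapping` row in the table describes.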