docs(glue): clarify that table regex patterns should be fully-qualified
aditya-radhakrishnan committed Apr 25, 2022
1 parent 28240bb commit dc014a6
Showing 1 changed file with 26 additions and 26 deletions.
52 changes: 26 additions & 26 deletions metadata-ingestion/source_docs/glue.md
@@ -75,33 +75,33 @@ plus `s3:GetObject` for the job script locations.

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
|---------------------------------|----------|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `aws_region` | ✅ | | AWS region code. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `aws_access_key_id` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_secret_access_key` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_session_token` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_role` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_profile` | | | Named AWS profile to use. If not set, the default profile is used. |
| `extract_transforms` | | `True` | Whether to extract Glue transform jobs. |
| `database_pattern.allow` | | | List of regex patterns for databases to include in ingestion. |
| `database_pattern.deny` | | | List of regex patterns for databases to exclude from ingestion. |
| `database_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `table_pattern.allow` | | | List of regex patterns for tables to include in ingestion. |
| `table_pattern.deny` | | | List of regex patterns for tables to exclude from ingestion. |
| `table_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `platform` | | `glue` | Override for platform name. Allowed values - `glue`, `athena` |
| `platform_instance` | | None | The Platform instance to use while constructing URNs. |
| `underlying_platform` | | `glue` | @deprecated(Use `platform`) Override for platform name. Allowed values - `glue`, `athena` |
| `ignore_unsupported_connectors` | | `True` | Whether to ignore unsupported connectors. If disabled, an error will be raised. |
| `emit_s3_lineage` | | `True` | Whether to emit S3-to-Glue lineage. |
| `glue_s3_lineage_direction` | | `upstream` | If `upstream`, S3 is upstream to Glue. If `downstream`, S3 is downstream to Glue. |
| `extract_owners` | | `True` | When enabled, extracts ownership from Glue directly and overwrites existing owners. When disabled, ownership is left empty for datasets. |
| Field | Required | Default | Description |
|---------------------------------|----------|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
| `aws_region` | ✅ | | AWS region code. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `aws_access_key_id` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_secret_access_key` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_session_token` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_role` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_profile` | | | Named AWS profile to use. If not set, the default profile is used. |
| `extract_transforms` | | `True` | Whether to extract Glue transform jobs. |
| `database_pattern.allow` | | | List of regex patterns for databases to include in ingestion. |
| `database_pattern.deny` | | | List of regex patterns for databases to exclude from ingestion. |
| `database_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `table_pattern.allow` | | | List of regex patterns for fully-qualified table names to include in ingestion. |
| `table_pattern.deny` | | | List of regex patterns for fully-qualified table names to exclude from ingestion. |
| `table_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `platform` | | `glue` | Override for platform name. Allowed values - `glue`, `athena` |
| `platform_instance` | | None | The Platform instance to use while constructing URNs. |
| `underlying_platform` | | `glue` | @deprecated(Use `platform`) Override for platform name. Allowed values - `glue`, `athena` |
| `ignore_unsupported_connectors` | | `True` | Whether to ignore unsupported connectors. If disabled, an error will be raised. |
| `emit_s3_lineage` | | `True` | Whether to emit S3-to-Glue lineage. |
| `glue_s3_lineage_direction` | | `upstream` | If `upstream`, S3 is upstream to Glue. If `downstream`, S3 is downstream to Glue. |
| `extract_owners` | | `True` | When enabled, extracts ownership from Glue directly and overwrites existing owners. When disabled, ownership is left empty for datasets. |
| `domain.domain_key.allow` | | | List of regex patterns for tables to assign to the `domain_key` domain (`domain_key` can be any string, such as `sales`). Multiple domain keys can be specified. |
| `domain.domain_key.deny` | | | List of regex patterns for tables to exclude from the `domain_key` domain. Multiple domain keys can be specified. |
| `domain.domain_key.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. Multiple domain keys can be specified. |
| `catalog_id` | | None | The AWS account ID where the target Glue catalog lives. If None, DataHub ingests the Glue catalog in the AWS caller's account. |
| `domain.domain_key.deny` | | | List of regex patterns for tables to exclude from the `domain_key` domain. Multiple domain keys can be specified. |
| `domain.domain_key.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. Multiple domain keys can be specified. |
| `catalog_id` | | None | The AWS account ID where the target Glue catalog lives. If None, DataHub ingests the Glue catalog in the AWS caller's account. |

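As a rough illustration of why `table_pattern` entries need to match fully-qualified names, the sketch below filters a few hypothetical table names. It rests on two assumptions that the table above does not state: that a fully-qualified name has the form `<database>.<table>`, and that each pattern is applied with `re.match`, which anchors at the start of the string. The `allowed` helper and the table names are made up for the example and are not the ingestion source's actual code.

```python
import re
from typing import Iterable


def allowed(
    name: str,
    allow: Iterable[str],
    deny: Iterable[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Hypothetical helper: True if `name` matches an allow pattern and no deny pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


# Hypothetical fully-qualified names, assumed to look like "<database>.<table>".
tables = ["sales_db.orders", "sales_db.orders_archive", "hr_db.orders"]

# An anchored pattern written against the bare table name selects nothing,
# because every candidate string starts with the database name.
unqualified = [t for t in tables if allowed(t, allow=[r"^orders$"])]

# Including the database part (with the dot escaped) selects the intended table.
qualified = [t for t in tables if allowed(t, allow=[r"^sales_db\.orders$"])]

# Deny patterns are matched against the same fully-qualified strings.
kept = [t for t in tables if allowed(t, allow=[r".*"], deny=[r"^sales_db\..*_archive$"])]

print(unqualified)
print(qualified)
print(kept)
```

In a recipe, the dotted field `table_pattern.allow` corresponds to an `allow` list nested under `table_pattern`, so a pattern such as `^sales_db\.orders$` would go in that list.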
### Cross-account ingestion
