diff --git a/metadata-ingestion/source_docs/glue.md b/metadata-ingestion/source_docs/glue.md
index a36129661807a4..2accb2d2691f7e 100644
--- a/metadata-ingestion/source_docs/glue.md
+++ b/metadata-ingestion/source_docs/glue.md
@@ -75,33 +75,33 @@ plus `s3:GetObject` for the job script locations.
 
 Note that a `.` is used to denote nested fields in the YAML recipe.
 
-| Field | Required | Default | Description |
-|---------------------------------|----------|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `aws_region` | ✅ | | AWS region code. |
-| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
-| `aws_access_key_id` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
-| `aws_secret_access_key` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
-| `aws_session_token` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
-| `aws_role` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
-| `aws_profile` | | | Named AWS profile to use, if not set the default will be used |
-| `extract_transforms` | | `True` | Whether to extract Glue transform jobs. |
-| `database_pattern.allow` | | | List of regex patterns for databases to include in ingestion. |
-| `database_pattern.deny` | | | List of regex patterns for databases to exclude from ingestion. |
-| `database_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
-| `table_pattern.allow` | | | List of regex patterns for tables to include in ingestion. |
-| `table_pattern.deny` | | | List of regex patterns for tables to exclude from ingestion. |
-| `table_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
-| `platform` | | `glue` | Override for platform name. Allowed values - `glue`, `athena` |
-| `platform_instance` | | None | The Platform instance to use while constructing URNs. |
-| `underlying_platform` | | `glue` | @deprecated(Use `platform`) Override for platform name. Allowed values - `glue`, `athena` |
-| `ignore_unsupported_connectors` | | `True` | Whether to ignore unsupported connectors. If disabled, an error will be raised. |
-| `emit_s3_lineage` | | `True` | Whether to emit S3-to-Glue lineage. |
-| `glue_s3_lineage_direction` | | `upstream` | If `upstream`, S3 is upstream to Glue. If `downstream` S3 is downstream to Glue. |
-| `extract_owners` | | `True` | When enabled, extracts ownership from Glue directly and overwrites existing owners. When disabled, ownership is left empty for datasets. |
+| Field | Required | Default | Description |
+|---------------------------------|----------|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `aws_region` | ✅ | | AWS region code. |
+| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
+| `aws_access_key_id` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
+| `aws_secret_access_key` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
+| `aws_session_token` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
+| `aws_role` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
+| `aws_profile` | | | Named AWS profile to use, if not set the default will be used |
+| `extract_transforms` | | `True` | Whether to extract Glue transform jobs. |
+| `database_pattern.allow` | | | List of regex patterns for databases to include in ingestion. |
+| `database_pattern.deny` | | | List of regex patterns for databases to exclude from ingestion. |
+| `database_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
+| `table_pattern.allow` | | | List of regex patterns for fully-qualified table names to include in ingestion. |
+| `table_pattern.deny` | | | List of regex patterns for fully-qualified table names to exclude from ingestion. |
+| `table_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
+| `platform` | | `glue` | Override for platform name. Allowed values - `glue`, `athena` |
+| `platform_instance` | | None | The Platform instance to use while constructing URNs. |
+| `underlying_platform` | | `glue` | @deprecated(Use `platform`) Override for platform name. Allowed values - `glue`, `athena` |
+| `ignore_unsupported_connectors` | | `True` | Whether to ignore unsupported connectors. If disabled, an error will be raised. |
+| `emit_s3_lineage` | | `True` | Whether to emit S3-to-Glue lineage. |
+| `glue_s3_lineage_direction` | | `upstream` | If `upstream`, S3 is upstream to Glue. If `downstream` S3 is downstream to Glue. |
+| `extract_owners` | | `True` | When enabled, extracts ownership from Glue directly and overwrites existing owners. When disabled, ownership is left empty for datasets. |
 | `domain.domain_key.allow` | | | List of regex patterns for tables to set domain_key domain key (domain_key can be any string like `sales`. There can be multiple domain key specified. |
-| `domain.domain_key.deny` | | | List of regex patterns for tables to not assign domain_key. There can be multiple domain key specified. |
-| `domain.domain_key.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching.There can be multiple domain key specified. |
-| `catalog_id` | | None | The aws account id where the target glue catalog lives. If None, datahub will ingest glue catalog in aws caller's account. |
+| `domain.domain_key.deny` | | | List of regex patterns for tables to not assign domain_key. There can be multiple domain key specified. |
+| `domain.domain_key.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching.There can be multiple domain key specified. |
+| `catalog_id` | | None | The aws account id where the target glue catalog lives. If None, datahub will ingest glue catalog in aws caller's account. |
 
 ### Cross-account ingestion
 