
Commit f07d2d9
* change read me
ssilb4 committed Dec 2, 2022
1 parent bfcc0f6 commit f07d2d9
Showing 1 changed file with 15 additions and 14 deletions: metadata-integration/java/spark-lineage/README.md
@@ -122,20 +122,21 @@ The Spark agent can be configured using Databricks Cluster [Spark configuration]

## Configuration Options

| Field                                             | Required | Default | Description                                                                                                                                                                                  |
|---------------------------------------------------|----------|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| spark.jars.packages                               | ✅       |         | Set to the latest/required version, e.g. io.acryl:datahub-spark-lineage:0.8.23                                                                                                                 |
| spark.extraListeners                              | ✅       |         | datahub.spark.DatahubSparkListener                                                                                                                                                             |
| spark.datahub.rest.server                         | ✅       |         | DataHub server URL, e.g. <http://localhost:8080>                                                                                                                                               |
| spark.datahub.rest.token                          |          |         | Authentication token.                                                                                                                                                                          |
| spark.datahub.rest.disable_ssl_verification       |          | false   | Disable SSL certificate validation. Caution: only use this if you know what you are doing!                                                                                                     |
| spark.datahub.metadata.pipeline.platformInstance  |          |         | Pipeline-level platform instance                                                                                                                                                               |
| spark.datahub.metadata.dataset.platformInstance   |          |         | Dataset-level platform instance                                                                                                                                                                |
| spark.datahub.metadata.dataset.env                |          | PROD    | [Supported values](https://datahubproject.io/docs/graphql/enums#fabrictype). In all other cases, falls back to PROD.                                                                           |
| spark.datahub.metadata.table.hive_platform_alias  |          | hive    | By default, DataHub assigns Hive-like tables to the Hive platform. If you are using Glue as your Hive metastore, set this config flag to `glue`.                                               |
| spark.datahub.metadata.include_scheme             |          | true    | Include the scheme from the path URI (e.g. hdfs://, s3://) in the dataset URN. We recommend setting this to false; it defaults to true for backwards compatibility with previous versions.     |
| spark.datahub.metadata.remove_partition_pattern   |          |         | Partition pattern to remove from dataset paths (e.g. `/partition=\d+`). Changes `database/table/partition=123` to `database/table`.                                                            |
| spark.datahub.coalesce_jobs                       |          | false   | Only one datajob (task) will be emitted, containing all input and output datasets for the Spark application.                                                                                   |
| spark.datahub.parent.datajob_urn                  |          |         | The specified dataset will be set as an upstream dataset for the datajob created. Effective only when spark.datahub.coalesce_jobs is set to true.                                              |
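For illustration, here is a minimal Scala sketch of wiring these options into a `SparkSession`. The app name, server URL, and `DATAHUB_TOKEN` environment variable are placeholder assumptions, not part of this commit; the listener class and option keys come from the table above.

```scala
// Minimal sketch: attach the DataHub lineage listener via SparkSession config.
// The app name, server URL, and DATAHUB_TOKEN env var are illustrative
// placeholders -- substitute values for your environment.
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("datahub-lineage-example") // hypothetical app name
  .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.23")
  .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
  .config("spark.datahub.rest.server", "http://localhost:8080")
  // Optional: token for a secured DataHub instance (placeholder env var).
  .config("spark.datahub.rest.token", sys.env.getOrElse("DATAHUB_TOKEN", ""))
  // Optional: strip partition suffixes such as /partition=123 from dataset paths.
  .config("spark.datahub.metadata.remove_partition_pattern", "/partition=\\d+")
  .getOrCreate()
```

Note that in many deployments `spark.jars.packages` must be supplied before the JVM starts (e.g. via `spark-submit --packages` or the cluster's Spark configuration, as described above) rather than in the builder.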

## What to Expect: The Metadata Model

