diff --git a/metadata-ingestion/docs/sources/kafka-connect/README.md b/metadata-ingestion/docs/sources/kafka-connect/README.md
new file mode 100644
index 00000000000000..ac3728b6eacba6
--- /dev/null
+++ b/metadata-ingestion/docs/sources/kafka-connect/README.md
@@ -0,0 +1,24 @@
+## Integration Details
+
+This plugin extracts the following:
+
+- Source and Sink Connectors in Kafka Connect as Data Pipelines
+- For Source connectors - Data Jobs to represent lineage from the source dataset to the Kafka topic, one per `{connector_name}:{source_dataset}` combination
+- For Sink connectors - Data Jobs to represent lineage from the Kafka topic to the destination dataset, one per `{connector_name}:{topic}` combination
+
+### Concept Mapping
+
+This ingestion source maps the following Source System Concepts to DataHub Concepts:
+
+| Source Concept                                                                  | DataHub Concept                                                                            | Notes |
+| ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ | ----- |
+| `"kafka-connect"`                                                               | [Data Platform](https://datahubproject.io/docs/generated/metamodel/entities/dataPlatform/) |       |
+| [Connector](https://kafka.apache.org/documentation/#connect_connectorsandtasks) | [DataFlow](https://datahubproject.io/docs/generated/metamodel/entities/dataflow/)          |       |
+| Kafka Topic                                                                     | [Dataset](https://datahubproject.io/docs/generated/metamodel/entities/dataset/)            |       |
+
+## Current limitations
+
+This source currently works only for:
+
+- Source connectors: JDBC, Debezium, Mongo, and generic connectors with a user-defined lineage graph
+- Sink connectors: BigQuery
diff --git a/metadata-ingestion/docs/sources/kafka-connect/kafka-connect.md b/metadata-ingestion/docs/sources/kafka-connect/kafka-connect.md
new file mode 100644
index 00000000000000..9d400460407c8c
--- /dev/null
+++ b/metadata-ingestion/docs/sources/kafka-connect/kafka-connect.md
@@ -0,0 +1,11 @@
+## Advanced Configurations
+
+Kafka Connect supports pluggable configuration providers, which can load configuration data from external sources at runtime. Values resolved this way are not visible to the DataHub ingestion source through the Kafka Connect APIs. If you use such provided configurations to specify connection URLs (database hosts, etc.) in your connector configuration, you will also need to add them to the `provided_configs` section of the recipe for DataHub to generate correct lineage.
+
+```yml
+  # Optional mapping of provider configurations if using
+  provided_configs:
+    - provider: env
+      path_key: MYSQL_CONNECTION_URL
+      value: jdbc:mysql://test_mysql:3306/librarydb
+```
diff --git a/metadata-ingestion/docs/sources/kafka-connect/kafka-connect_recipe.yml b/metadata-ingestion/docs/sources/kafka-connect/kafka-connect_recipe.yml
index 747753c8461f02..f5e33e661622d8 100644
--- a/metadata-ingestion/docs/sources/kafka-connect/kafka-connect_recipe.yml
+++ b/metadata-ingestion/docs/sources/kafka-connect/kafka-connect_recipe.yml
@@ -3,20 +3,14 @@ source:
   config:
     # Coordinates
     connect_uri: "http://localhost:8083"
-    cluster_name: "connect-cluster"
-    provided_configs:
-      - provider: env
-        path_key: MYSQL_CONNECTION_URL
-        value: jdbc:mysql://test_mysql:3306/librarydb
-    # Optional mapping of platform types to instance ids
-    platform_instance_map: # optional
-      mysql: test_mysql # optional
-    connect_to_platform_map: # optional
-      postgres-connector-finance-db: # optional - Connector name
-        postgres: core_finance_instance # optional - Platform to instance map
+    # Credentials
     username: admin
     password: password
+
+    # Optional
+    platform_instance_map:
+      bigquery: bigquery_platform_instance_id
+
 sink:
   # sink configs
\ No newline at end of file
diff --git a/metadata-ingestion/src/datahub/ingestion/source/kafka_connect.py b/metadata-ingestion/src/datahub/ingestion/source/kafka_connect.py
index 3777ad8f772edc..3c6fa9587ed109 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/kafka_connect.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/kafka_connect.py
@@ -898,17 +898,6 @@ def transform_connector_config(
 @support_status(SupportStatus.CERTIFIED)
 @capability(SourceCapability.PLATFORM_INSTANCE, "Enabled by default")
 class KafkaConnectSource(Source):
-    """
-    This plugin extracts the following:
-    - Kafka Connect connector as individual `DataFlowSnapshotClass` entity
-    - Creating individual `DataJobSnapshotClass` entity using `{connector_name}:{source_dataset}` naming
-    - Lineage information between source database to Kafka topic
-    Current limitations:
-    - works only for
-        - JDBC, Debezium, and Mongo source connectors
-        - Generic connectors with user-defined lineage graph
-        - BigQuery sink connector
-    """
-
+
     config: KafkaConnectSourceConfig
     report: KafkaConnectSourceReport
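The sample recipe above points the source at Kafka Connect's REST API via `connect_uri`, `username`, and `password`. One quick way to sanity-check those values before running an ingestion is to call the standard `GET /connectors` endpoint yourself. A minimal sketch follows; the helper name is illustrative and the coordinates are taken from the sample recipe, not from the ingestion source's own code:

```python
import base64


def connector_list_request(connect_uri: str, username: str, password: str):
    """Build the URL and Basic-auth header for Kafka Connect's standard
    GET /connectors endpoint (a JSON array of connector names), which is
    the same REST API the ingestion source reads."""
    url = connect_uri.rstrip("/") + "/connectors"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return url, {"Authorization": f"Basic {token}"}


# Values from the sample recipe above.
url, headers = connector_list_request("http://localhost:8083", "admin", "password")
print(url)  # http://localhost:8083/connectors
```

Fetching `url` with the returned header (e.g. via `urllib.request`) against a running Connect cluster should list the same connectors the ingestion run will discover.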