docs(ingest): update kafka connect doc, simplify starter recipe (#7243)
Co-authored-by: John Joyce <[email protected]>
mayurinehate and jjoyce0510 authored Feb 6, 2023
1 parent 36b6fce commit e8c1412
Showing 4 changed files with 40 additions and 22 deletions.
24 changes: 24 additions & 0 deletions metadata-ingestion/docs/sources/kafka-connect/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
## Integration Details

This plugin extracts the following:

- Source and Sink Connectors in Kafka Connect as Data Pipelines
- For Source connectors - Data Jobs to represent lineage information from the source dataset to the Kafka topic, one per `{connector_name}:{source_dataset}` combination
- For Sink connectors - Data Jobs to represent lineage information from the Kafka topic to the destination dataset, one per `{connector_name}:{topic}` combination

### Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

| Source Concept | DataHub Concept | Notes |
| --------------------------- | ------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `"kafka-connect"` | [Data Platform](https://datahubproject.io/docs/generated/metamodel/entities/dataPlatform/) | |
| [Connector](https://kafka.apache.org/documentation/#connect_connectorsandtasks) | [DataFlow](https://datahubproject.io/docs/generated/metamodel/entities/dataflow/) | |
| Kafka Topic | [Dataset](https://datahubproject.io/docs/generated/metamodel/entities/dataset/) | |

## Current Limitations

Works only for:

- Source connectors: JDBC, Debezium, Mongo, and Generic connectors with a user-defined lineage graph
- Sink connectors: BigQuery
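
For Generic connectors, lineage is not extracted automatically; the user supplies the lineage mapping in the recipe. A minimal sketch, assuming the source's `generic_connectors` option and its field names (`connector_name`, `source_platform`, `source_dataset`); the connector and dataset names below are illustrative:

```yml
source:
  type: kafka-connect
  config:
    connect_uri: "http://localhost:8083"
    # User-defined lineage for a Generic source connector
    # (connector and dataset names are hypothetical examples)
    generic_connectors:
      - connector_name: my_generic_connector
        source_platform: postgres
        source_dataset: librarydb.member
```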
11 changes: 11 additions & 0 deletions metadata-ingestion/docs/sources/kafka-connect/kafka-connect.md
@@ -0,0 +1,11 @@
## Advanced Configurations

Kafka Connect supports pluggable configuration providers, which can load configuration data from external sources at runtime. These externally provided values are not visible to the DataHub ingestion source through the Kafka Connect APIs. If you use such provided configurations to specify connection URLs (database, etc.) in a Kafka Connect connector configuration, you will also need to add them to the `provided_configs` section of the recipe so that DataHub can generate correct lineage.

```yml
# Optional mapping of provider configurations, if config providers are used
provided_configs:
- provider: env
path_key: MYSQL_CONNECTION_URL
value: jdbc:mysql://test_mysql:3306/librarydb
```
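
On the Kafka Connect side, the connector configuration references the provider instead of a literal value. A hedged illustration of a connector creation payload, assuming an `env` config provider is registered on the worker and that `${env:MYSQL_CONNECTION_URL}` is the reference form your provider resolves (the connector name and class below are examples):

```json
{
  "name": "mysql_source_connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "${env:MYSQL_CONNECTION_URL}"
  }
}
```

With this setup, `provided_configs` in the recipe tells DataHub that the reference resolves to `jdbc:mysql://test_mysql:3306/librarydb`, so lineage points at the real database.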
@@ -3,20 +3,14 @@ source:
config:
# Coordinates
connect_uri: "http://localhost:8083"
cluster_name: "connect-cluster"
provided_configs:
- provider: env
path_key: MYSQL_CONNECTION_URL
value: jdbc:mysql://test_mysql:3306/librarydb
# Optional mapping of platform types to instance ids
platform_instance_map: # optional
mysql: test_mysql # optional
connect_to_platform_map: # optional
postgres-connector-finance-db: # optional - Connector name
postgres: core_finance_instance # optional - Platform to instance map

# Credentials
username: admin
password: password

# Optional
platform_instance_map:
bigquery: bigquery_platform_instance_id

sink:
# sink configs
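
The `# sink configs` placeholder in the recipe above is typically filled with a DataHub sink; a minimal sketch using the `datahub-rest` sink (the server URL is an example for a local deployment):

```yml
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```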
11 changes: 0 additions & 11 deletions metadata-ingestion/src/datahub/ingestion/source/kafka_connect.py
@@ -898,17 +898,6 @@ def transform_connector_config(
@support_status(SupportStatus.CERTIFIED)
@capability(SourceCapability.PLATFORM_INSTANCE, "Enabled by default")
class KafkaConnectSource(Source):
"""
This plugin extracts the following:
- Kafka Connect connector as individual `DataFlowSnapshotClass` entity
- Creating individual `DataJobSnapshotClass` entity using `{connector_name}:{source_dataset}` naming
- Lineage information between source database to Kafka topic
Current limitations:
- works only for
- JDBC, Debezium, and Mongo source connectors
- Generic connectors with user-defined lineage graph
- BigQuery sink connector
"""

config: KafkaConnectSourceConfig
report: KafkaConnectSourceReport
