feat(ingest/kafka-connect): support MongoSourceConnector #6416
34c4e9f to c7fcb15
@@ -986,5 +986,19 @@
"registryVersion": null,
"properties": null
}
},
{
"entityType": "dataFlow",
Shouldn't there also be dataJob and dataJobInputOutput aspects to produce lineage?
Yes, they get generated based on the Kafka topics in the connector, and the topics are created when collections are created in the DB. I spent some time trying to configure the connector and DB to produce a collection->topic mapping, but as I'm not an expert on the topic (pun unintended), I wasn't able to. Working with the test setup was also a bit problematic (on an M1 Mac), and I had constant timeout issues when running the tests, which made iteration slow.
I can ask around to see if someone can help troubleshoot the connection issue (might be a timing thing?). But I already tested this with our own Kafka Connect deployment, and it was able to extract the topics/dataJobs as expected.
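To illustrate the collection -> topic relationship described above, here is a minimal sketch. It assumes the MongoDB source connector's default topic naming of `<topic.prefix>.<database>.<collection>`; the function name, the `MongoLineage` type, and the sample values are hypothetical and are not the code in this PR.

```python
from typing import NamedTuple, Optional


class MongoLineage(NamedTuple):
    """Hypothetical container pairing a MongoDB namespace with its Kafka topic."""
    database: str
    collection: str
    topic: str


def parse_mongo_topic(topic: str, topic_prefix: str = "") -> Optional[MongoLineage]:
    """Map a Kafka topic back to its MongoDB namespace, or None if it doesn't match."""
    name = topic
    if topic_prefix:
        prefix = topic_prefix + "."
        if not topic.startswith(prefix):
            return None
        name = topic[len(prefix):]
    # Split on the first dot only: database names cannot contain dots,
    # but collection names can.
    database, sep, collection = name.partition(".")
    if not sep or not database or not collection:
        return None
    return MongoLineage(database, collection, topic)
```

Each topic that parses this way could then be emitted as one dataJob whose input is the collection and whose output is the topic. Connectors that override the topic naming would need different handling.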
I'll give it one more try.
@freeze_time(FROZEN_TIME)
@pytest.mark.integration_batch_1
def test_kafka_connect_mongosourceconnect_ingest(
@shirshanka I separated the mongo source connector test from the original. The test passed locally, but for some reason it fails in CI. It's a bit hard to debug, but either the collection in Mongo does not get created as it should, or there is some race condition that causes the connector to not be ready when the ingestion is run.
What do you think we should do? Skip the entire test? Comment out the aspects related to the topics and only test the existence of the connector?
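One way to rule out the race condition mentioned above would be to poll until the connector reports RUNNING before starting ingestion. This is a minimal sketch, assuming the Kafka Connect REST endpoint `GET /connectors/<name>/status`; the helper names, base URL, and connector name are hypothetical and not part of this PR.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_connector_state(base_url: str, name: str) -> str:
    """Return the connector state string reported by the Kafka Connect REST API."""
    with urllib.request.urlopen(f"{base_url}/connectors/{name}/status", timeout=5) as resp:
        return json.load(resp)["connector"]["state"]


def wait_until_running(
    get_state: Callable[[], str], timeout: float = 60.0, interval: float = 2.0
) -> bool:
    """Poll get_state until it returns "RUNNING" or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if get_state() == "RUNNING":
                return True
        except (OSError, KeyError, ValueError):
            # Connect not reachable yet, or the response is incomplete: retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test could call `wait_until_running(lambda: fetch_connector_state("http://localhost:8083", "mongo_source"))` before running the pipeline (URL and name are placeholders). Note that a RUNNING connector does not by itself guarantee that its tasks are running or that the topic exists yet, so this only narrows down the cause.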
0a1b8a7 to c14bc11
metadata-ingestion/tests/integration/kafka-connect/setup/conf/mongo-init.sh
metadata-ingestion/tests/integration/kafka-connect/test_kafka_connect.py
@mayurinehate Thanks, that seems to have done the trick! Not sure why the startup script didn't work in GHA, while both approaches worked locally.
…ject#6416) Co-authored-by: John Joyce <[email protected]> Co-authored-by: Tamas Nemeth <[email protected]>
Adding support for the MongoDB Source Connector to the kafka-connect source: https://www.mongodb.com/docs/kafka-connector/current/source-connector/
Checklist