feat(ingest/kafka-connect): support MongoSourceConnector #6416

Merged (20 commits) on Dec 5, 2022

Contributor

@frsann frsann commented Nov 11, 2022

Adding support for the MongoDB Source Connector to the kafka-connect source: https://www.mongodb.com/docs/kafka-connector/current/source-connector/

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the "ingestion" label (PR or Issue related to the ingestion of metadata) Nov 11, 2022

github-actions bot commented Nov 11, 2022

Unit Test Results (metadata ingestion)

       8 files         8 suites   1h 6m 27s ⏱️
   766 tests    764 ✔️ 2 💤 0
1 534 runs  1 529 ✔️ 5 💤 0

Results for commit 204710a.

♻️ This comment has been updated with latest results.


github-actions bot commented Nov 11, 2022

Unit Test Results (build & test)

621 tests  ±0   617 ✔️ ±0   16m 6s ⏱️ -16s
157 suites ±0       4 💤 ±0 
157 files   ±0       0 ±0 

Results for commit 204710a. ± Comparison against base commit 4f7b5ac.

♻️ This comment has been updated with latest results.

@frsann frsann force-pushed the mongosourceconnector-support branch from 34c4e9f to c7fcb15 Compare November 12, 2022 17:58
@frsann frsann marked this pull request as ready for review November 12, 2022 17:58
@frsann frsann changed the title from "ingestion(kafka-connect): Support MongoSourceConnector [WIP]" to "feat(ingest): Support MongoSourceConnector in kafka-connect ingestion" Nov 12, 2022
@maggiehays maggiehays added the "community-contribution" label (PR or Issue raised by member(s) of DataHub Community) Nov 14, 2022
@@ -986,5 +986,19 @@
"registryVersion": null,
"properties": null
}
},
{
"entityType": "dataFlow",
Contributor

Shouldn't there also be a dataJob and dataJobInputOutput aspects to produce lineage?

Contributor Author

Yes, they are generated from the Kafka topics in the connector, and the topics are created when collections are created in the DB. I spent some time trying to configure the connector and DB to produce a collection->topic mapping, but as I'm not an expert on the topic (pun unintended), I wasn't able to. Working with the test setup was also a bit problematic (on an M1 Mac): the tests kept timing out, which made iteration slow.

I can ask around to see if someone can help troubleshoot the connection issue (might be a timing thing?). That said, I have already tested this with our own Kafka Connect deployment, and it extracted the topics/dataJobs as expected.
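
For readers following the collection->topic relationship described above, here is a minimal sketch of how a topic name could be mapped back to a MongoDB collection and a DataHub dataset URN. It assumes the connector's default topic naming of `<topic.prefix>.<database>.<collection>` (or `<database>.<collection>` with no prefix). The names `MongoTopic`, `parse_mongo_topic` and `mongo_dataset_urn` are hypothetical and are not taken from this PR's code.

```python
from typing import NamedTuple, Optional


class MongoTopic(NamedTuple):
    """Database and collection recovered from a topic name (hypothetical helper type)."""

    database: str
    collection: str


def parse_mongo_topic(topic: str, topic_prefix: Optional[str] = None) -> Optional[MongoTopic]:
    """Split '<prefix>.<database>.<collection>' into its parts.

    Assumes the connector's default topic naming. Returns None when the
    topic does not have that shape.
    """
    name = topic
    if topic_prefix:
        expected = topic_prefix + "."
        if not topic.startswith(expected):
            return None
        name = topic[len(expected):]
    # MongoDB database names cannot contain dots, but collection names can,
    # so only the first dot separates the two.
    database, sep, collection = name.partition(".")
    if not sep or not database or not collection:
        return None
    return MongoTopic(database, collection)


def mongo_dataset_urn(found: MongoTopic, env: str = "PROD") -> str:
    """Build a DataHub dataset URN string for the source collection."""
    return (
        "urn:li:dataset:(urn:li:dataPlatform:mongodb,"
        f"{found.database}.{found.collection},{env})"
    )
```

With a prefix of `prefix`, the topic `prefix.test_db.purchases` would resolve to database `test_db` and collection `purchases`; a topic that does not start with the configured prefix is skipped rather than guessed at.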

Contributor Author

I'll give it one more try.


@freeze_time(FROZEN_TIME)
@pytest.mark.integration_batch_1
def test_kafka_connect_mongosourceconnect_ingest(
Contributor Author

@shirshanka I separated the mongo source connector test from the original. The test passed locally, but for some reason it fails in CI. It's a bit hard to debug, but either the collection in Mongo does not get created as it should, or there is a race condition that causes the connector not to be ready when the ingestion runs.

What do you think we should do? Skip the entire test? Comment out the aspects related to the topics and only test the existence of the connector?
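
One common way to rule out the readiness race mentioned above is to poll the Kafka Connect REST API (`GET /connectors/<name>/status`) until the connector and all of its tasks report `RUNNING` before starting ingestion. The sketch below is only an illustration of that idea, not the fix that was eventually applied in this PR; the helper names and the default timeout values are assumptions.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(connect_uri: str, connector: str) -> Optional[dict]:
    """Call GET /connectors/<name>/status; return None if it is not reachable yet."""
    url = f"{connect_uri.rstrip('/')}/connectors/{connector}/status"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Worker not up, connector not registered (404), or a non-JSON body.
        return None


def is_running(status: Optional[dict]) -> bool:
    """True only when the connector and at least one task are RUNNING, with no task in another state."""
    if not status:
        return False
    tasks = status.get("tasks") or []
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    return connector_ok and bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)


def wait_until_running(
    get_status: Callable[[], Optional[dict]],
    timeout: float = 120.0,  # assumed default, tune for the CI environment
    interval: float = 2.0,   # assumed default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status until the connector is running or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_running(get_status()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a test fixture this might be called as `wait_until_running(lambda: fetch_status("http://localhost:8083", "mongo_source"))`, where both the URL and the connector name are placeholders. Passing the status getter as a callable keeps the polling loop testable without a live Connect worker.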

@frsann frsann force-pushed the mongosourceconnector-support branch from 0a1b8a7 to c14bc11 Compare November 17, 2022 19:34
Contributor Author

frsann commented Dec 2, 2022

@mayurinehate Thanks, that seems to have done the trick! Not sure why the startup script didn't work in GHA, while both approaches worked locally.

@mayurinehate mayurinehate self-requested a review December 2, 2022 13:33
@hsheth2 hsheth2 changed the title from "feat(ingest): Support MongoSourceConnector in kafka-connect ingestion" to "feat(ingest/kafka-connect): support MongoSourceConnector" Dec 5, 2022
@hsheth2 hsheth2 merged commit 4dd66be into datahub-project:master Dec 5, 2022
cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this pull request Feb 1, 2023
Labels
  • community-contribution: PR or Issue raised by member(s) of DataHub Community
  • ingestion: PR or Issue related to the ingestion of metadata
7 participants