Unable to Create Kafka Source for Ingest #7020

Closed
kaning opened this issue Jan 12, 2023 · 1 comment · Fixed by #7046

kaning commented Jan 12, 2023

Describe the bug
After using the DataHub CLI (Python) to start the Docker quickstart, creating a Kafka ingestion source fails.

To Reproduce
Steps to reproduce the behavior:

  1. Install the datahub CLI in a Python virtual env
  2. Run datahub docker quickstart
  3. After everything starts up, go to Ingestion -> Create new source and choose Kafka
  4. Click Save and Run; the run fails

Expected behavior
A Kafka Data Source should be created

Desktop (please complete the following information):

  • OS: Mac - M1
  • Browser: Chrome
  • Version [e.g. 22]

Additional context
Stack trace

2023-01-12 15:36:50 2023-01-12 15:36:50.446020 [exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8] INFO: Starting execution for task with name=RUN_INGEST
2023-01-12 15:36:50 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] Obtaining venv creation lock...
2023-01-12 15:36:50 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] Acquired venv creation lock
2023-01-12 15:36:50 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] venv setup time = 0
2023-01-12 15:36:50 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] This version of datahub supports report-to functionality
2023-01-12 15:36:50 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] datahub  ingest run -c /tmp/datahub/ingest/2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8/recipe.yml --report-to /tmp/datahub/ingest/2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8/ingestion_report.json
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] [2023-01-12 15:36:52,179] INFO     {datahub.cli.ingest_cli:165} - DataHub CLI version: 0.9.5
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] [2023-01-12 15:36:52,207] INFO     {datahub.ingestion.run.pipeline:179} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] [2023-01-12 15:36:52,631] ERROR    {datahub.entrypoints:213} - Command failed: Failed to find a registered source for type kafka: kafka is disabled due to an error in initialization
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] Traceback (most recent call last):
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 97, in _ensure_not_lazy
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     plugin_class = import_path(path)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 32, in import_path
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     item = importlib.import_module(module_name)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return _bootstrap._gcd_import(name[level:], package, level)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/source/kafka.py", line 11, in <module>
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     from confluent_kafka.admin import (
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] ImportError: cannot import name 'ResourceType' from 'confluent_kafka.admin' (/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/confluent_kafka/admin/__init__.py)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] 
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] The above exception was the direct cause of the following exception:
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] 
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] Traceback (most recent call last):
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 114, in _add_init_error_context
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     yield
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 189, in __init__
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     source_class = source_registry.get(source_type)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 148, in get
2023-01-12 15:36:50 [2023-01-12 15:36:50,433] DEBUG    {acryl.executor.dispatcher.default_dispatcher:57} - Started thread <Thread(Thread-8 (dispatch_async), started 281472938263008)> for 2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8
2023-01-12 15:36:50 [2023-01-12 15:36:50,446] DEBUG    {acryl.executor.execution.default_executor:121} - Task for 2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 created
2023-01-12 15:36:50 [2023-01-12 15:36:50,448] INFO     {acryl.executor.execution.sub_process_ingestion_task:87} - Starting ingestion subprocess for exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 (kafka)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     raise ConfigurationError(
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] datahub.configuration.common.ConfigurationError: kafka is disabled due to an error in initialization
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] 
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] The above exception was the direct cause of the following exception:
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] 
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] Traceback (most recent call last):
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/entrypoints.py", line 171, in main
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     sys.exit(datahub(standalone_mode=False, **kwargs))
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return self.main(*args, **kwargs)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1055, in main
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     rv = self.invoke(ctx)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return _process_result(sub_ctx.command.invoke(sub_ctx))
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return _process_result(sub_ctx.command.invoke(sub_ctx))
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return ctx.invoke(self.callback, **ctx.params)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 760, in invoke
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return __callback(*args, **kwargs)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return f(get_current_context(), *args, **kwargs)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 344, in wrapper
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     raise e
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 296, in wrapper
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     res = func(*args, **kwargs)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return func(ctx, *args, **kwargs)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 179, in run
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     pipeline = Pipeline.create(
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 303, in create
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     return cls(
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 186, in __init__
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     with _add_init_error_context(
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     self.gen.throw(typ, value, traceback)
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]   File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 116, in _add_init_error_context
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs]     raise PipelineInitError(f"Failed to {step}: {e}") from e
2023-01-12 15:36:52 [2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 logs] datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type kafka: kafka is disabled due to an error in initialization
2023-01-12 15:36:54 2023-01-12 15:36:54.486099 [exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8] INFO: Failed to execute 'datahub ingest'
2023-01-12 15:36:54 2023-01-12 15:36:54.486365 [exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8] INFO: Caught exception EXECUTING task_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
2023-01-12 15:36:54   File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task
2023-01-12 15:36:54     task_event_loop.run_until_complete(task_future)
2023-01-12 15:36:54   File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
2023-01-12 15:36:54     return future.result()
2023-01-12 15:36:54   File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute
2023-01-12 15:36:54     raise TaskError("Failed to execute 'datahub ingest'")
2023-01-12 15:36:54 acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
2023-01-12 15:36:54 
2023-01-12 15:36:54 ~~~~ Execution Summary ~~~~
2023-01-12 15:36:54 
2023-01-12 15:36:54 RUN_INGEST - {'errors': [],
2023-01-12 15:36:54  'exec_id': '2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8',
2023-01-12 15:36:54  'infos': ['2023-01-12 15:36:50.446020 [exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8] INFO: Starting execution for task with name=RUN_INGEST',
2023-01-12 15:36:54            '2023-01-12 15:36:54.485766 [exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8] INFO: stdout=Obtaining venv creation lock...\n'
2023-01-12 15:36:54            'Acquired venv creation lock\n'
2023-01-12 15:36:54            'venv setup time = 0\n'
2023-01-12 15:36:54            'This version of datahub supports report-to functionality\n'
2023-01-12 15:36:54            'datahub  ingest run -c /tmp/datahub/ingest/2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8/recipe.yml --report-to '
2023-01-12 15:36:54            '/tmp/datahub/ingest/2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8/ingestion_report.json\n'
2023-01-12 15:36:54            '[2023-01-12 15:36:52,179] INFO     {datahub.cli.ingest_cli:165} - DataHub CLI version: 0.9.5\n'
2023-01-12 15:36:54            '[2023-01-12 15:36:52,207] INFO     {datahub.ingestion.run.pipeline:179} - Sink configured successfully. DataHubRestEmitter: configured '
2023-01-12 15:36:54            'to talk to http://datahub-gms:8080\n'
2023-01-12 15:36:54            '[2023-01-12 15:36:52,631] ERROR    {datahub.entrypoints:213} - Command failed: Failed to find a registered source for type kafka: kafka '
2023-01-12 15:36:52 [2023-01-12 15:36:52,699] INFO     {acryl.executor.execution.sub_process_ingestion_task:120} - Got EOF from subprocess exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 - stopping log monitor
2023-01-12 15:36:52 [2023-01-12 15:36:52,699] INFO     {acryl.executor.execution.sub_process_ingestion_task:180} - Detected subprocess exited exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8
2023-01-12 15:36:54 [2023-01-12 15:36:54,485] INFO     {acryl.executor.execution.sub_process_ingestion_task:154} - Detected subprocess return code exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8 - stopping logs reporting
2023-01-12 15:36:54 [2023-01-12 15:36:54,486] DEBUG    {acryl.executor.execution.default_executor:137} - Cleaned up task for 2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8
2023-01-12 15:36:54            'is disabled due to an error in initialization\n'
2023-01-12 15:36:54            'Traceback (most recent call last):\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 97, in '
2023-01-12 15:36:54            '_ensure_not_lazy\n'
2023-01-12 15:36:54            '    plugin_class = import_path(path)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 32, in import_path\n'
2023-01-12 15:36:54            '    item = importlib.import_module(module_name)\n'
2023-01-12 15:36:54            '  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module\n'
2023-01-12 15:36:54            '    return _bootstrap._gcd_import(name[level:], package, level)\n'
2023-01-12 15:36:54            '  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import\n'
2023-01-12 15:36:54            '  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load\n'
2023-01-12 15:36:54            '  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked\n'
2023-01-12 15:36:54            '  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked\n'
2023-01-12 15:36:54            '  File "<frozen importlib._bootstrap_external>", line 883, in exec_module\n'
2023-01-12 15:36:54            '  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/source/kafka.py", line 11, in <module>\n'
2023-01-12 15:36:54            '    from confluent_kafka.admin import (\n'
2023-01-12 15:36:54            "ImportError: cannot import name 'ResourceType' from 'confluent_kafka.admin' "
2023-01-12 15:36:54            '(/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/confluent_kafka/admin/__init__.py)\n'
2023-01-12 15:36:54            '\n'
2023-01-12 15:36:54            'The above exception was the direct cause of the following exception:\n'
2023-01-12 15:36:54            '\n'
2023-01-12 15:36:54            'Traceback (most recent call last):\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 114, in '
2023-01-12 15:36:54            '_add_init_error_context\n'
2023-01-12 15:36:54            '    yield\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 189, in __init__\n'
2023-01-12 15:36:54            '    source_class = source_registry.get(source_type)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 148, in get\n'
2023-01-12 15:36:54            '    raise ConfigurationError(\n'
2023-01-12 15:36:54            'datahub.configuration.common.ConfigurationError: kafka is disabled due to an error in initialization\n'
2023-01-12 15:36:54            '\n'
2023-01-12 15:36:54            'The above exception was the direct cause of the following exception:\n'
2023-01-12 15:36:54            '\n'
2023-01-12 15:36:54            'Traceback (most recent call last):\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/entrypoints.py", line 171, in main\n'
2023-01-12 15:36:54            '    sys.exit(datahub(standalone_mode=False, **kwargs))\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1130, in __call__\n'
2023-01-12 15:36:54            '    return self.main(*args, **kwargs)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1055, in main\n'
2023-01-12 15:36:54            '    rv = self.invoke(ctx)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1657, in invoke\n'
2023-01-12 15:36:54            '    return _process_result(sub_ctx.command.invoke(sub_ctx))\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1657, in invoke\n'
2023-01-12 15:36:54            '    return _process_result(sub_ctx.command.invoke(sub_ctx))\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 1404, in invoke\n'
2023-01-12 15:36:54            '    return ctx.invoke(self.callback, **ctx.params)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/core.py", line 760, in invoke\n'
2023-01-12 15:36:54            '    return __callback(*args, **kwargs)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func\n'
2023-01-12 15:36:54            '    return f(get_current_context(), *args, **kwargs)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 344, in wrapper\n'
2023-01-12 15:36:54            '    raise e\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 296, in wrapper\n'
2023-01-12 15:36:54            '    res = func(*args, **kwargs)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in '
2023-01-12 15:36:54            'wrapper\n'
2023-01-12 15:36:54            '    return func(ctx, *args, **kwargs)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 179, in run\n'
2023-01-12 15:36:54            '    pipeline = Pipeline.create(\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 303, in create\n'
2023-01-12 15:36:54            '    return cls(\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 186, in __init__\n'
2023-01-12 15:36:54            '    with _add_init_error_context(\n'
2023-01-12 15:36:54            '  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__\n'
2023-01-12 15:36:54            '    self.gen.throw(typ, value, traceback)\n'
2023-01-12 15:36:54            '  File "/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 116, in '
2023-01-12 15:36:54            '_add_init_error_context\n'
2023-01-12 15:36:54            '    raise PipelineInitError(f"Failed to {step}: {e}") from e\n'
2023-01-12 15:36:54            'datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type kafka: kafka is disabled due to an error '
2023-01-12 15:36:54            'in initialization\n',
2023-01-12 15:36:54            "2023-01-12 15:36:54.486099 [exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8] INFO: Failed to execute 'datahub ingest'",
2023-01-12 15:36:54            '2023-01-12 15:36:54.486365 [exec_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8] INFO: Caught exception EXECUTING '
2023-01-12 15:36:54            'task_id=2a86cef2-d4e9-4dba-bc0c-9f6fa6725ba8, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
2023-01-12 15:36:54            '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
2023-01-12 15:36:54            '    task_event_loop.run_until_complete(task_future)\n'
2023-01-12 15:36:54            '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
2023-01-12 15:36:54            '    return future.result()\n'
2023-01-12 15:36:54            '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute\n'
2023-01-12 15:36:54            '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
2023-01-12 15:36:54            "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
2023-01-12 15:36:54 Execution finished with errors.

If you are wondering about the import error

2023-01-12 15:36:54            "ImportError: cannot import name 'ResourceType' from 'confluent_kafka.admin' "
2023-01-12 15:36:54            '(/tmp/datahub/ingest/venv-kafka-0.9.5/lib/python3.10/site-packages/confluent_kafka/admin/__init__.py)\n'

This is the version of confluent_kafka that the ingestion venv tried to install:

2023-01-12 12:42:23 [ded72a25-bf82-4ff1-8e9c-eb3864409141 logs] Collecting confluent-kafka<1.9.0
2023-01-12 12:42:23 [ded72a25-bf82-4ff1-8e9c-eb3864409141 logs]   Using cached confluent_kafka-1.8.2-cp310-cp310-linux_aarch64.whl
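The mismatch between the installed version and the failing import can be checked directly. Below is a minimal diagnostic sketch, assuming it is run with the Python interpreter of the ingestion venv shown in the traceback. The helper names `installed_version` and `has_symbol` are hypothetical and are not part of DataHub or confluent_kafka.

```python
import importlib
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if the module can be imported and exposes the given name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError as well (it is a subclass of ImportError).
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Distribution and module names are taken from the log output above.
    print("confluent-kafka version:", installed_version("confluent-kafka"))
    print("ResourceType available:", has_symbol("confluent_kafka.admin", "ResourceType"))
```

If the second line reports that the name is unavailable while a version is installed, that matches the ImportError in the stack trace.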

kaning added the bug (Bug report) label Jan 12, 2023

chriscollins3456 (Collaborator)

After checking the Slack thread that led to this issue (https://datahubspace.slack.com/archives/CV2KB471C/p1673529921033689), it appears that the real problem is that we don't support confluent_kafka 1.9.0 and higher. There is a hard compatibility break: confluent_kafka >= 1.9.0 requires librdkafka >= 1.9.0, which isn't supported on all machines. A comment explaining this is here: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/setup.py#L64-L93

I'm sure there's something we can do, but it would probably require some work.
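
To make the version constraint concrete, here is a minimal sketch of why the `confluent-kafka<1.9.0` requirement from the install log admits 1.8.2 but excludes 1.9.0 and later. It uses only the standard library and assumes plain dotted numeric versions. The helper names are hypothetical, and real resolvers such as pip follow PEP 440 rules that cover cases this ignores (pre-releases, local versions, and so on).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted numeric version such as '1.8.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def below_upper_bound(version: str, bound: str) -> bool:
    """Return True if version is strictly less than bound (a '<bound' constraint)."""
    # Comparing integer tuples avoids the string-comparison trap
    # where '1.10.0' would sort before '1.9.0'.
    return parse_version(version) < parse_version(bound)


if __name__ == "__main__":
    for candidate in ("1.8.2", "1.9.0", "1.10.0"):
        print(candidate, "satisfies <1.9.0:", below_upper_bound(candidate, "1.9.0"))
```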
