'AvroException' not found in 'avro.schema' module during metadata ingestion #11273

Closed
zeta9044 opened this issue Aug 30, 2024 · 1 comment · Fixed by #11311
Labels
bug Bug report

Comments


zeta9044 commented Aug 30, 2024

Description:
I encountered an error while using the DataHub metadata-ingestion framework (version 0.14.0.2). The pipeline execution fails due to an AttributeError, specifically that the 'avro.schema' module has no attribute 'AvroException'.

Steps to Reproduce:

  1. Set up a metadata ingestion pipeline using the File source.
  2. Attempt to run the pipeline with a JSON file containing metadata.
  3. The pipeline fails with the following error:

Error Message:
PipelineExecutionError: ('Source reported errors', FileSourceReport(...))

Traceback:
Traceback (most recent call last):
  File ".../batch_pipeline_ingest.py", line 108, in <module>
    run_pipeline(config)
  File ".../batch_pipeline_ingest.py", line 103, in run_pipeline
    pipeline.raise_from_status()
  File ".../datahub/ingestion/run/pipeline.py", line 594, in raise_from_status
    raise PipelineExecutionError(
datahub.configuration.common.PipelineExecutionError: ('Source reported errors', FileSourceReport(...))

The FileSourceReport contains multiple entries with the same error:
"module 'avro.schema' has no attribute 'AvroException'"
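For readers trying to reproduce this, the steps above correspond roughly to a script like the one below. This is only a sketch: the original `batch_pipeline_ingest.py` is not shown in the issue, so the `build_recipe` helper, the `datahub-rest` sink, and the server URL are assumptions. The `run_pipeline` name and the `raise_from_status()` call are taken from the traceback.

```python
def build_recipe(metadata_path, server="http://localhost:8080"):
    """Build a recipe dict for the File source.

    The sink type and server URL are assumptions; the issue does not
    say which sink was used.
    """
    return {
        "source": {"type": "file", "config": {"path": metadata_path}},
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_pipeline(config):
    """Run an ingestion pipeline and raise if the source or sink reported errors."""
    # Imported lazily so this module can be loaded without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(config)
    pipeline.run()
    # This is the call that raises PipelineExecutionError in the traceback.
    pipeline.raise_from_status()


if __name__ == "__main__":
    # "metadata.json" is a placeholder file name.
    run_pipeline(build_recipe("metadata.json"))
```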

Environment:

  • Operating System: WSL2(Ubuntu-22.04)
  • Python Version: 3.10
  • DataHub Version: 0.14.0.2
  • Relevant package versions:
    • avro: [version]
    • [any other relevant packages and their versions]
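The package versions above were left as template placeholders. One way to collect them, run inside the same Python environment as the pipeline, might be the following sketch. It uses only the standard library; `installed_version` is a hypothetical helper name, and `acryl-datahub` is the PyPI distribution name for DataHub.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("avro", "acryl-datahub"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```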

Expected Behavior:
The pipeline should successfully process the metadata JSON file without raising an AttributeError related to 'AvroException'.

Actual Behavior:
The pipeline fails with an AttributeError, stating that the 'avro.schema' module has no attribute 'AvroException'.

Additional Context:
This error occurs consistently across multiple runs and affects the processing of various metadata entries in the JSON file.

Possible Related Issues:

  • Is there a version mismatch between the avro library and the version expected by DataHub?
  • Has there been a recent change in the avro library that might have removed or renamed the 'AvroException'?
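Both questions can be checked directly in the affected environment. The sketch below reports which of a list of candidate modules actually expose a given attribute. `modules_defining` is a hypothetical helper, and listing `avro.errors` as a candidate is an assumption about where newer avro releases may keep their exception classes, not something confirmed in this thread.

```python
import importlib


def modules_defining(attr, module_names):
    """Return the names of importable modules in `module_names` that expose `attr`."""
    found = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module (or its parent package) is not installed; skip it.
            continue
        if hasattr(module, attr):
            found.append(name)
    return found


if __name__ == "__main__":
    # An empty list means neither candidate exposes AvroException.
    print(modules_defining("AvroException", ["avro.schema", "avro.errors"]))
```

If `avro.schema` is missing from the result, the installed avro release does not match what this DataHub version expects.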

I would appreciate any insights or suggestions on how to resolve this issue. Let me know if you need any additional information or if there are any specific diagnostic steps I should take.

zeta9044 added the bug label Aug 30, 2024
zeta9044 changed the title from "A short description of the bug" to "'AvroException' not found in 'avro.schema' module during metadata ingestion" Aug 30, 2024
hsheth2 added a commit that referenced this issue Sep 5, 2024
Collaborator

hsheth2 commented Sep 5, 2024

@zeta9044 it looks like we weren't compatible with newer versions of the avro library. I'm fixing that in #11311.

However, the root cause here is that some metadata (e.g. MCE or MCP) failed validation because it had extra/missing fields or incorrect types.
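To illustrate how a validation failure can surface as an `AttributeError`, here is a minimal, self-contained sketch. It is not DataHub's code: `fake_schema` stands in for a module that no longer exposes `AvroException`, and `validate` is a hypothetical function. Python evaluates the expression in an `except` clause only once an exception is being handled, so a stale attribute reference there raises `AttributeError` and pushes the original error into `__context__`.

```python
import types

# Stand-in for a module that no longer exposes `AvroException`.
fake_schema = types.SimpleNamespace()


def validate(record):
    """Hypothetical validator whose except clause names a missing attribute."""
    try:
        # Simulates the real problem: the record fails validation.
        raise ValueError(f"invalid record: {record!r}")
    except fake_schema.AvroException:
        # Never reached: looking up the attribute above raises AttributeError.
        return False


def demo():
    """Return the surfaced error and the original error it masked."""
    try:
        validate({"unexpected_field": 1})
    except AttributeError as err:
        return err, err.__context__


if __name__ == "__main__":
    surfaced, original = demo()
    print("surfaced:", repr(surfaced))
    print("original:", repr(original))
```

In a case like this, fixing the library incompatibility makes the real validation error visible, but the metadata file still has to be corrected.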
