refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI #10828
Conversation
…estion' into jj--add-structured-logging-to-ingestion
Walkthrough
The recent updates to DataHub's code refactor the structured report handling functionality. The refactoring primarily changes data types in various TypeScript files to ensure consistency and reliability, particularly shifting from …
Changes
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- metadata-ingestion/src/datahub/ingestion/api/exception.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/api/source.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/run/pipeline.py (7 hunks)
- metadata-ingestion/tests/unit/test_nifi_source.py (4 hunks)
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/run/pipeline.py
103-104: Use a single `if` statement instead of nested `if` statements (SIM102)
541-543: Use `bool(...)` instead of `True if ... else False`. Replace with `bool(...)` (SIM210)
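The two Ruff findings above can be illustrated with a minimal sketch. The function names and data here are hypothetical and are not taken from `pipeline.py`; they only show the shape of each rewrite.

```python
from typing import Dict, List


def emit_if_verbose(messages: List[str], verbose: bool, sink: List[str]) -> None:
    # SIM102: the flagged form nests `if messages:` inside `if verbose:`.
    # A single combined condition behaves the same way.
    if verbose and messages:
        sink.extend(messages)


def has_failures(failures: Dict[str, List[str]]) -> bool:
    # SIM210: the flagged form is `True if failures else False`.
    # bool(...) expresses the same truthiness check directly.
    return bool(failures)
```

Both rewrites are behavior-preserving, which is why they are reported as style findings rather than as actionable review comments.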
Additional comments not posted (30)
metadata-ingestion/src/datahub/ingestion/api/exception.py (13)
4-5: LGTM! The `ScanUnauthorizedException` class looks good.
8-9: LGTM! The `LineageUnauthorizedException` class looks good.
12-13: LGTM! The `UsageUnauthorizedException` class looks good.
16-17: LGTM! The `ProfilingUnauthorizedException` class looks good.
20-21: LGTM! The `LineageQueryParsingFailedException` class looks good.
24-25: LGTM! The `UsageQueryParsingFailedException` class looks good.
28-29: LGTM! The `ConnectionFailedCoordinatesException` class looks good.
32-33: LGTM! The `ConnectionFailedCredentialsException` class looks good.
36-37: LGTM! The `ConnectionFailedServiceUnavailableException` class looks good.
40-41: LGTM! The `ConnectionFailedServiceTimeoutException` class looks good.
44-45: LGTM! The `ConnectionFailedUnknownException` class looks good.
48-60: LGTM! The `StructuredReportLogType` enum looks good.
63-75: LGTM! The `EXCEPTION_TO_REPORT_TYPE` dictionary looks good.
metadata-ingestion/src/datahub/ingestion/api/source.py (9)
65-68: LGTM! The `StructuredLogLevel` enum looks good.
71-76: LGTM! The `StructuredLog` dataclass looks good.
99-115: LGTM! The `structured_logs` property method looks good.
147-164: LGTM! The `report_warning` method looks good.
165-167: LGTM! The `warning` method looks good.
169-186: LGTM! The `report_failure` method looks good.
187-189: LGTM! The `failure` method looks good.
190-206: LGTM! The `report_info` method looks good.
208-210: LGTM! The `info` method looks good.
metadata-ingestion/tests/unit/test_nifi_source.py (4)
337-337: LGTM! The `test_single_user_auth_failed_to_get_token` test function looks good.
356-356: LGTM! The `test_kerberos_auth_failed_to_get_token` test function looks good.
376-376: LGTM! The `test_client_cert_auth_failed` test function looks good.
396-396: LGTM! The `test_failure_to_create_nifi_flow` test function looks good.
metadata-ingestion/src/datahub/ingestion/run/pipeline.py (4)
512-513: LGTM! The `run` method looks good.
623-623: LGTM! The `_approx_all_vals` method looks good.
653-653: LGTM! The `pretty_print_summary` method looks good.
721-732: LGTM! The `_handle_uncaught_pipeline_exception` method looks good.
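The review names a `StructuredLogLevel` enum, a `StructuredLog` dataclass, a `structured_logs` property, and `report_info` / `report_warning` / `report_failure` methods in `source.py`, but does not show their bodies. The following is a rough sketch of how such pieces might fit together. The enum members, the dataclass fields, the `SketchReport` class, and the grouping of entries by `type` are all assumptions for illustration, not DataHub's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class StructuredLogLevel(Enum):
    # Name taken from the review; the members are assumptions.
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"


@dataclass
class StructuredLog:
    # Name taken from the review; the field layout is an assumption.
    level: StructuredLogLevel
    type: str
    message: str
    context: List[str] = field(default_factory=list)


class SketchReport:
    """Hypothetical stand-in for a source report, not DataHub's class."""

    def __init__(self) -> None:
        # One bucket per level, keyed by the log's `type` string.
        self._logs: Dict[StructuredLogLevel, Dict[str, StructuredLog]] = {
            level: {} for level in StructuredLogLevel
        }

    def _record(
        self,
        level: StructuredLogLevel,
        type: str,
        message: str,
        context: Optional[str],
    ) -> None:
        bucket = self._logs[level]
        if type not in bucket:
            bucket[type] = StructuredLog(level=level, type=type, message=message)
        if context is not None:
            # Repeated reports of the same type accumulate context entries.
            bucket[type].context.append(context)

    def report_info(self, type: str, message: str, context: Optional[str] = None) -> None:
        self._record(StructuredLogLevel.INFO, type, message, context)

    def report_warning(self, type: str, message: str, context: Optional[str] = None) -> None:
        self._record(StructuredLogLevel.WARN, type, message, context)

    def report_failure(self, type: str, message: str, context: Optional[str] = None) -> None:
        self._record(StructuredLogLevel.ERROR, type, message, context)

    @property
    def structured_logs(self) -> List[StructuredLog]:
        # Flatten all buckets in level order (INFO, WARN, ERROR).
        return [log for bucket in self._logs.values() for log in bucket.values()]
```

Grouping by `type` keeps one entry per kind of problem, so a UI could show a single row with many affected items instead of one row per occurrence. Whether the real code groups this way is not stated in the review.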
…estion' into jj--add-structured-logging-to-ingestion
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 10
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (7)
- metadata-ingestion/src/datahub/ingestion/api/exception.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/api/source.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/run/pipeline.py (6 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (1 hunks)
- metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py (8 hunks)
- metadata-ingestion/tests/unit/test_nifi_source.py (4 hunks)
Files skipped from review as they are similar to previous changes (4)
- metadata-ingestion/src/datahub/ingestion/api/exception.py
- metadata-ingestion/src/datahub/ingestion/api/source.py
- metadata-ingestion/src/datahub/ingestion/run/pipeline.py
- metadata-ingestion/tests/unit/test_nifi_source.py
Additional context used
Ruff
metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py
77-77: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
99-99: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
121-121: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
149-149: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
175-176: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
201-202: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
232-232: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
254-254: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
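As a small illustration of the SIM118 finding: for a plain `dict`, membership on the mapping and membership on its `.keys()` view give the same answer, so the extra call adds nothing. The dictionary contents below are made up for the example.

```python
# Illustrative data only; the real report contents are not shown in the review.
report_failures = {"permission-error": ["Missing warehouse access"]}

# Form flagged by SIM118: goes through the keys view first.
via_keys = "permission-error" in report_failures.keys()

# Preferred form: test membership on the dict itself.
direct = "permission-error" in report_failures

# The same applies to the negated check.
missing = "usage-permission-error" not in report_failures
```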
@@ -96,7 +96,7 @@ def test_snowflake_missing_warehouse_access_causes_pipeline_failure(
  )
  pipeline = Pipeline(snowflake_pipeline_config)
  pipeline.run()
- assert "permission-error" in pipeline.source.get_report().failures.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- assert "permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors
Tools
Ruff
99-99: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
@@ -118,7 +118,7 @@
  )
  pipeline = Pipeline(snowflake_pipeline_config)
  pipeline.run()
- assert "permission-error" in pipeline.source.get_report().failures.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- assert "permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors
Tools
Ruff
121-121: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
  "Failed to get primary key for table"
- in pipeline.source.get_report().warnings.keys()
+ in pipeline.source.get_report()._warnings.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- "Failed to get primary key for table" in pipeline.source.get_report()._warnings.keys()
+ "Failed to get primary key for table" in pipeline.source.get_report()._warnings
Tools
Ruff
201-202: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
@@ -146,7 +146,7 @@

  pipeline = Pipeline(snowflake_pipeline_config)
  pipeline.run()
- assert "permission-error" in pipeline.source.get_report().failures.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- assert "permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors
Tools
Ruff
149-149: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
- assert (
- "lineage-permission-error" in pipeline.source.get_report().failures.keys()
- )
+ assert "lineage-permission-error" in pipeline.source.get_report()._errors.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- assert "lineage-permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "lineage-permission-error" in pipeline.source.get_report()._errors
Tools
Ruff
232-232: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
@@ -253,4 +251,4 @@
  )
  pipeline = Pipeline(snowflake_pipeline_config)
  pipeline.run()
- assert "usage-permission-error" in pipeline.source.get_report().failures.keys()
+ assert "usage-permission-error" in pipeline.source.get_report()._errors.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- assert "usage-permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "usage-permission-error" in pipeline.source.get_report()._errors
Tools
Ruff
254-254: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
@@ -74,7 +74,7 @@

  pipeline = Pipeline(snowflake_pipeline_config)
  pipeline.run()
- assert "permission-error" in pipeline.source.get_report().failures.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- assert "permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors
Tools
Ruff
77-77: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
  "Failed to get columns for table"
- in pipeline.source.get_report().warnings.keys()
+ in pipeline.source.get_report()._warnings.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- "Failed to get columns for table" in pipeline.source.get_report()._warnings.keys()
+ "Failed to get columns for table" in pipeline.source.get_report()._warnings
Tools
Ruff
175-176: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
  missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
  if (
  missing_column_info_warn is not None
- and dataset_name in missing_column_info_warn
+ and dataset_name in missing_column_info_warn.context
Access the `_errors` property instead of `_warnings`.
The `_warnings` property should be `_errors` based on the AI-generated summary.
- missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
+ missing_column_info_warn = self.report._errors.get(MISSING_COLUMN_INFO)
  missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
  if (
  missing_column_info_warn is not None
- and dataset_name in missing_column_info_warn
+ and dataset_name in missing_column_info_warn.context
Access the `_errors` property instead of `_warnings`.
The `_warnings` property should be `_errors` based on the AI-generated summary.
- missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
+ missing_column_info_warn = self.report._errors.get(MISSING_COLUMN_INFO)
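The hunks above show a membership test on the looked-up warning entry alongside one on its `.context` attribute. One plausible reading, offered here as an assumption, is that a warning entry is now an object carrying a list of context strings rather than being that list itself. A minimal sketch of the resulting check follows; `WarningEntry`, `is_missing_column_info`, and the value of `MISSING_COLUMN_INFO` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Placeholder value; the real constant's value is not shown in the review.
MISSING_COLUMN_INFO = "missing column information"


@dataclass
class WarningEntry:
    # Hypothetical stand-in for whatever `_warnings.get(...)` returns.
    message: str
    context: List[str] = field(default_factory=list)


def is_missing_column_info(
    warnings: Dict[str, WarningEntry], dataset_name: str
) -> bool:
    entry: Optional[WarningEntry] = warnings.get(MISSING_COLUMN_INFO)
    # Guard against a missing entry, then look inside its context list.
    return entry is not None and dataset_name in entry.context
```

Under this reading, testing `dataset_name in entry` directly would raise a `TypeError` for a plain dataclass, which would explain why the check moved to `.context`.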
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- metadata-ingestion/src/datahub/ingestion/api/exception.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (5 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (13 hunks)
- metadata-ingestion/src/datahub/ingestion/source/metabase.py (15 hunks)
- metadata-ingestion/src/datahub/ingestion/source/mode.py (15 hunks)
- metadata-ingestion/src/datahub/ingestion/source/mongodb.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/openapi.py (6 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redash.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/tableau.py (3 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/api/exception.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/openapi.py
340-340: Use `key not in dict` instead of `key not in dict.keys()`. Remove `.keys()` (SIM118)
Additional comments not posted (48)
metadata-ingestion/src/datahub/ingestion/source/openapi.py (3)
187-212: LGTM! The changes to the `report_bad_responses` function improve error message clarity and are well-structured.
279-279: LGTM! The changes to the `get_workunits_internal` function ensure consistent usage of the `type` parameter.
Also applies to: 301-303, 328-332, 361-363, 393-395
340-340: Simplify dictionary key check.
Use `key not in dict` instead of `key not in dict.keys()`.
- if endpoint_k not in config.forced_examples.keys():
+ if endpoint_k not in config.forced_examples:
Tools
Ruff
340-340: Use `key not in dict` instead of `key not in dict.keys()`. Remove `.keys()` (SIM118)
metadata-ingestion/src/datahub/ingestion/source/mongodb.py (5)
321-323: LGTM! The changes to the `get_pymongo_type_string` function improve the clarity of warning messages by providing additional context.
347-349: LGTM! The changes to the `get_field_type` function improve the clarity of warning messages by providing additional context.
425-427: LGTM! The changes to the `construct_schema_metadata` function improve the clarity of warning messages by providing additional context.
Line range hint 539-541: LGTM! The changes to the `get_native_type` function improve the clarity of warning messages by providing additional context.
Line range hint 556-558: LGTM! The changes to the `get_field_type` function improve the clarity of warning messages by providing additional context.
metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py (4)
469-471: LGTM! The changes to the `construct_schema_metadata` function improve the clarity of warning messages by providing additional context.
539-541: LGTM! The changes to the `get_native_type` function improve the clarity of warning messages by providing additional context.
556-558: LGTM! The changes to the `get_field_type` function improve the clarity of warning messages by providing additional context.
556-558: LGTM! The changes to the `get_datasource_urn` function improve the clarity of warning messages by providing additional context.
Also applies to: 571-573
metadata-ingestion/src/datahub/ingestion/source/metabase.py (8)
214-215: LGTM! The changes to the `setup_session` function improve the clarity of failure messages by providing additional context.
226-227: LGTM! The changes to the `close` function improve the clarity of failure messages by providing additional context.
260-262: LGTM! The changes to the `emit_dashboard_mces` function improve the clarity of failure messages by providing additional context.
287-289: LGTM! The changes to the `construct_dashboard_from_api_data` function improve the clarity of warning messages by providing additional context.
351-353: LGTM! The changes to the `_get_ownership` function improve the clarity of warning messages by providing additional context.
Also applies to: 358-360
392-394: LGTM! The changes to the `emit_card_mces` function improve the clarity of failure messages by providing additional context.
415-417: LGTM! The changes to the `construct_card_from_api_data` function improve the clarity of warning messages by providing additional context.
Also applies to: 425-427, 434-436
556-558: LGTM! The changes to the `get_datasource_urn` function improve the clarity of warning messages by providing additional context.
Also applies to: 571-573
metadata-ingestion/src/datahub/ingestion/source/redash.py (2)
662-677: LGTM! The changes to use `report_warning` for structured logging are appropriate and align with the new logging approach.
716-717: LGTM! The changes to use `report_warning` for structured logging are appropriate and align with the new logging approach.
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3)
581-582: LGTM! The changes to use `report_warning` for structured logging are appropriate and align with the new logging approach.
838-840: LGTM! The changes to use `report_warning` for structured logging are appropriate and align with the new logging approach.
986-988: LGTM! The changes to use `report_warning` for structured logging are appropriate and align with the new logging approach.
metadata-ingestion/src/datahub/ingestion/source/mode.py (5)
314-315: Add structured logging for connection verification.
The structured logging for connection verification looks good. However, ensure that the structured logging functionality is verified.
Verification successful
Structured logging for connection verification is correctly implemented.
The structured logging for connection verification using `report_failure` is present in the codebase as expected.
- `metadata-ingestion/src/datahub/ingestion/source/mode.py`: Lines 314-315
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the structured logging for connection verification works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for connection verification.
rg --type python 'report_failure' --context 5
Length of output: 76
Script:
#!/bin/bash
# Description: Verify that the structured logging for connection verification works correctly.
# Test: Search for the structured logging usage in Python files. Expect: structured logging for connection verification.
rg 'report_failure' --glob '*.py' --context 5
Length of output: 93868
531-532: Add structured logging for failed retrieval of spaces for the workspace.
The structured logging for failed retrieval of spaces for the workspace looks good. However, ensure that the structured logging functionality is verified.
Verification successful
Structured logging for failed retrieval of spaces for the workspace is correctly implemented.
The structured logging functionality for failed retrieval of spaces for the workspace is verified and appears to be implemented correctly.
- The `report_failure` function is used for structured logging in the specified file and other parts of the codebase.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of spaces for the workspace works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of spaces for the workspace.
rg --type python 'report_failure' --context 5
Length of output: 76
Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of spaces for the workspace works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of spaces for the workspace.
rg 'report_failure' --context 5
Length of output: 93854
374-375: Add structured logging for missing report tokens and IDs.
The structured logging for missing report tokens and IDs looks good. However, ensure that the structured logging functionality is verified.
Also applies to: 381-382
Verification successful
Add structured logging for missing report tokens and IDs.
The structured logging for missing report tokens and IDs is implemented correctly in the codebase. The `report_warning` method is utilized as expected.
- `metadata-ingestion/src/datahub/ingestion/source/mode.py`:
  - Line 374: `self.report.report_warning(type="Missing Report Token", message=f"Report token is missing for {report_info.get('id', '')}")`
  - Line 381: `self.report.report_warning(type="Missing Report ID", message=f"Report id is missing for {report_info.get('token', '')}")`
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the structured logging for missing report tokens and IDs works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for missing report tokens and IDs.
rg --type python 'report_warning' --context 5
Length of output: 76
Script:
#!/bin/bash
# Description: Verify that the structured logging for missing report tokens and IDs works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for missing report tokens and IDs.
rg 'report_warning' --glob '*.py' --context 5
Length of output: 256673
679-680: Add structured logging for failed retrieval of data sources.
The structured logging for failed retrieval of data sources looks good. However, ensure that the structured logging functionality is verified.
Verification successful
Structured logging for failed retrieval of data sources verified successfully.
The structured logging for failed retrieval of data sources is implemented correctly across the codebase, including the target file.
- `metadata-ingestion/src/datahub/ingestion/source/mode.py`:
  - Line 679: `self.report.report_failure(type="Unable to retrieve Data Sources", message=f"Unable to retrieve data sources. Reason: {str(http_error)}")`
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of data sources works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of data sources.
rg --type python 'report_failure' --context 5
Length of output: 76
Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of data sources works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of data sources.
rg --type py 'report_failure' --context 5
Length of output: 93864
491-493: Add structured logging for failed retrieval of Mode creator.
The structured logging for failed retrieval of Mode creator looks good. However, ensure that the structured logging functionality is verified.
Verification successful
Add structured logging for failed retrieval of Mode creator.
The structured logging for failed retrieval of Mode creator looks good.
- File: `metadata-ingestion/src/datahub/ingestion/source/mode.py`
- Lines: 491-493
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of Mode creator works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of Mode creator.
rg --type python 'report_warning' --context 5
Length of output: 76
Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of Mode creator works correctly.
# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of Mode creator.
rg --type py 'report_warning' --context 5
Length of output: 256669
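In each verification above, the first script (`rg --type python ...`) returned only 76 characters, while the retries using `--glob '*.py'`, `--type py`, or no filter returned large outputs. The short output is presumably an error message, since ripgrep registers Python files under the type name `py`. For readers without ripgrep, the following is a rough Python stand-in for the successful searches. It does plain substring matching rather than regex matching, prints no context lines, and `find_calls` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_calls(root: Path, needle: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (path, line number, stripped line) for each .py line containing needle."""
    for path in sorted(root.rglob("*.py")):
        try:
            # errors="replace" avoids failing on files that are not valid UTF-8.
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Skip unreadable files instead of aborting the whole search.
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, lineno, line.strip()
```

A call such as `list(find_calls(Path("metadata-ingestion/src"), "report_failure"))` would list candidate call sites. Note that finding call sites only shows the method is used; it does not by itself verify that the logging behaves correctly at runtime.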
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (4)
287-289
: LGTM! Structured logging for warnings.The changes to report warnings using structured logs are approved.
597-599
: LGTM! Structured logging for unrecognized chart types.The changes to report warnings for unrecognized chart types using structured logs are approved.
Also applies to: 606-608
1257-1259
: LGTM! Structured logging for errors and warnings.The changes to report errors and warnings using structured logs are approved.
Also applies to: 1269-1271
1549-1551
: LGTM! Structured logging for warnings.The changes to report warnings using structured logs are approved.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (12)
422-424: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting errors when loading included files.
525-529: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting failures when resolving includes.
532-534: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting failures when resolving wildcard includes.
573-575: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting errors when loading included files.
686-690: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting failures when loading view files.
708-712: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting errors when loading view files.
1369-1371: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting errors when parsing SQL queries.
1956-1958: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting failures when no metadata is produced.
2083-2085: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting errors when loading models.
2096-2098: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting failures when loading connections.
2138-2140: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting failures when processing explores.
2220-2222: Good use of structured logging for error handling. The changes correctly implement the new structured logging format for reporting errors when loading Looker views.
metadata-ingestion/src/datahub/ingestion/source/tableau.py (2)
803-805: Use consistent structured logging format. The new structured logging format is correctly implemented for reporting failures.
2825-2827: Use consistent structured logging format. The new structured logging format is correctly implemented for reporting failures.
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/api/source.py (4 hunks)
Additional comments not posted (7)
metadata-ingestion/src/datahub/ingestion/api/source.py (7)
66-70: Enum StructuredLogLevel is well-defined. The enum class StructuredLogLevel is correctly defined with levels INFO, WARN, and ERROR.
104-123: Properties warnings, failures, and infos are well-defined. The properties correctly aggregate the respective logs into a LossyList.
295-302: Method as_obj is well-defined. The method converts the report object to a dictionary with materialized properties.
202-247: Methods report_failure and failure are well-defined. The methods handle and log failures with structured logs.
Ensure the logging functionality is supported.
Verification successful
We need to re-verify the logging functionality by correctly specifying the file type for rg and checking the definition of logger.
Logging functionality is supported.
The logger is defined and used extensively throughout the codebase, confirming that the logging functionality is appropriately supported.
metadata-ingestion/src/datahub/ingestion/api/source.py includes the required logger.error usage.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.
# Test: Search for the usage of `logger.error`. Expect: The logging functionality to be supported.
rg --type python $'logger.error'

Length of output: 62
Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.
# Test: Search for the usage of `logger.error`. Expect: The logging functionality to be supported.
rg --type py 'logger.error'
# Test: Search for the definition of `logger`. Expect: `logger` to be defined and properly configured.
rg --type py 'logger ='

Length of output: 30465
249-289: Methods report_info and info are well-defined. The methods handle and log info messages with structured logs.
Ensure the logging functionality is supported.
Verification successful
Methods report_info and info are well-defined. The methods handle and log info messages with structured logs. The logging functionality is supported as confirmed by the presence of logger.info in multiple files across the codebase.
metadata-ingestion/src/datahub/ingestion/api/source.py
metadata-ingestion/src/datahub/ingestion/source_config/operation_config.py
metadata-ingestion/src/datahub/ingestion/source_config/sql/snowflake.py
- and many others.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.
# Test: Search for the usage of `logger.info`. Expect: The logging functionality to be supported.
rg --type python $'logger.info'

Length of output: 61
Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.
# Test: Search for the usage of `logger.info`. Expect: The logging functionality to be supported.
rg --type py $'logger.info'

Length of output: 39297
72-79: Dataclass StructuredLog is well-defined. The dataclass StructuredLog is correctly defined with appropriate fields for structured logging. Ensure that the Report class supports the required functionality.
Verification successful
Dataclass StructuredLog is well-defined. The dataclass StructuredLog is correctly defined with appropriate fields for structured logging. The Report class provides utility methods that support the required functionality for StructuredLog.
metadata-ingestion/src/datahub/ingestion/api/report.py: The Report class extends SupportsAsObj and includes methods for converting values to strings and pure Python objects.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the `Report` class supports the required functionality.
# Test: Search for the `Report` class definition. Expect: The class to support required functionality.
rg --type python $'class Report'

Length of output: 64
Script:
#!/bin/bash
# Description: Verify the `Report` class supports the required functionality.
# Test: Search for the `Report` class definition in Python files.
rg 'class Report' --glob '*.py'

Length of output: 536
Script:
#!/bin/bash
# Description: Extract the definition and methods of the `Report` class in `metadata-ingestion/src/datahub/ingestion/api/report.py`.
# Extract the `Report` class definition and its methods.
ast-grep --lang python --pattern $'class Report($_) { $$$ }' metadata-ingestion/src/datahub/ingestion/api/report.py

Length of output: 567
Script:
#!/bin/bash
# Description: Extract the definition and methods of the `Report` class in `metadata-ingestion/src/datahub/ingestion/api/report.py`.
# Extract the `Report` class definition and its methods.
ast-grep --lang python --pattern 'class Report: $$$' metadata-ingestion/src/datahub/ingestion/api/report.py

Length of output: 109
Script:
#!/bin/bash
# Description: Extract the definition and methods of the `Report` class in `metadata-ingestion/src/datahub/ingestion/api/report.py`.
# Search for the `Report` class definition and output surrounding lines for context.
rg 'class Report' -A 20 metadata-ingestion/src/datahub/ingestion/api/report.py

Length of output: 1409
155-201: Methods report_warning and warning are well-defined. The methods handle and log warnings with structured logs.
Ensure the logging functionality is supported.
Verification successful
Methods report_warning and warning are well-defined. The methods handle and log warnings with structured logs. The logging functionality is supported across the codebase.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.
# Test: Search for the usage of `logger.warning`. Expect: The logging functionality to be supported.
rg --type python $'logger.warning'

Length of output: 64
Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.
# Test: Search for the usage of `logger.warning` in .py files.
rg 'logger.warning' --glob '*.py'

Length of output: 28807
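The review comments above describe a StructuredLogLevel enum, a StructuredLog dataclass, and a report_warning method whose entries are exposed through a warnings property. A minimal sketch of that shape follows. Only the names StructuredLogLevel, StructuredLog, report_warning, and warnings, and the title, message, and context arguments, come from this page. SketchReport, the field types, the grouping key, and the plain dict standing in for DataHub's lossy containers are assumptions made for illustration and are not the code under review.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class StructuredLogLevel(Enum):
    # The three levels named in the review comments.
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"


@dataclass
class StructuredLog:
    # Field names follow the keyword arguments seen in the diffs on this page;
    # the types are assumptions.
    level: StructuredLogLevel
    title: Optional[str]
    message: str
    context: List[str] = field(default_factory=list)


class SketchReport:
    """Hypothetical stand-in for a source report; not DataHub's class."""

    def __init__(self) -> None:
        # Assumption: entries are grouped by (title, message). A plain dict
        # is used here in place of a size-bounded (lossy) container.
        self._warnings: Dict[Tuple[Optional[str], str], StructuredLog] = {}

    def report_warning(
        self,
        message: str,
        context: Optional[str] = None,
        title: Optional[str] = None,
    ) -> None:
        key = (title, message)
        if key not in self._warnings:
            self._warnings[key] = StructuredLog(
                StructuredLogLevel.WARN, title, message
            )
        if context is not None:
            # Repeated warnings with the same title and message share one entry.
            self._warnings[key].context.append(context)

    @property
    def warnings(self) -> List[StructuredLog]:
        # Materialize the grouped entries as a list for reporting.
        return list(self._warnings.values())
```

Grouping by title and message is one possible way to keep a UI summary short when the same problem occurs for many tables or files; whether the real implementation groups this way is not confirmed by this page.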
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 2
Outside diff range and nitpick comments (4)
datahub-web-react/src/app/ingest/source/types.ts (1)
38-38: Ensure Consistency in Field Names. The items field in StructuredReport should be consistent with the naming conventions used in the rest of the interface. Consider renaming items to logEntries for clarity and consistency.
datahub-web-react/src/app/ingest/source/utils.ts (3)
131-133: Use Constants for Regular Expressions. Consider defining the URL pattern as a constant outside the function for better readability and maintainability.
const URL_PATTERN = /^(?:http(s)?:\/\/)?[\w.-]+(?:\.[a-zA-Z0-9.-]{2,})+[\w\-._~:/?#[\]@!$&'()*+,;=.]+$/;

export const validateURL = (fieldName: string) => {
    return {
        validator(_, value) {
            const isURLValid = URL_PATTERN.test(value);
            if (!value || isURLValid) {
                return Promise.resolve();
            }
            return Promise.reject(new Error(`A valid ${fieldName} is required.`));
        },
    };
};
165-173: Improve Commenting and Documentation. The comments and documentation for mapItemObject and mapItemArray could be more detailed to explain the purpose and usage of these helper functions. Consider adding more detailed comments and examples to improve readability and maintainability.
Also applies to: 178-181
Line range hint 341-341: Simplify Dictionary Key Check. Use key not in dict instead of key not in dict.keys() for better readability and performance.
if endpoint_k not in config.forced_examples:
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (13)
- datahub-web-react/src/app/ingest/source/types.ts (2 hunks)
- datahub-web-react/src/app/ingest/source/utils.ts (4 hunks)
- metadata-ingestion/src/datahub/ingestion/api/source.py (4 hunks)
- metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (5 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (13 hunks)
- metadata-ingestion/src/datahub/ingestion/source/metabase.py (15 hunks)
- metadata-ingestion/src/datahub/ingestion/source/mode.py (15 hunks)
- metadata-ingestion/src/datahub/ingestion/source/mongodb.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/openapi.py (6 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redash.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/tableau.py (3 hunks)
Files skipped from review due to trivial changes (1)
- metadata-ingestion/src/datahub/ingestion/source/tableau.py
Files skipped from review as they are similar to previous changes (7)
- metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
- metadata-ingestion/src/datahub/ingestion/source/metabase.py
- metadata-ingestion/src/datahub/ingestion/source/mode.py
- metadata-ingestion/src/datahub/ingestion/source/mongodb.py
- metadata-ingestion/src/datahub/ingestion/source/redash.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/openapi.py
341-341: Use key not in dict instead of key not in dict.keys(). Remove .keys(). (SIM118)
Additional comments not posted (27)
datahub-web-react/src/app/ingest/source/types.ts (1)
24-26: Ensure Optional Fields are Properly Handled. The title field is marked as optional, but message and context are not. Ensure that all usages of StructuredReportLogEntry properly handle the case where title is undefined.
datahub-web-react/src/app/ingest/source/utils.ts (3)
16-16: Imports Look Good. The imports for StructuredReport, StructuredReportLogEntry, and StructuredReportItemLevel are correctly added.
145-145: Ensure Case-Insensitive Matching. Ensure that toLocaleUpperCase is appropriate for your use case. If you need case-insensitive matching, consider using toUpperCase for simplicity.
Line range hint 148-153: Functionality Looks Good. The createStructuredReport function correctly calculates the counts and returns the structured report object.
metadata-ingestion/src/datahub/ingestion/source/openapi.py (5)
280-280: Ensure Proper Warning Handling. The warning message splitting logic should be robust to handle unexpected formats.
Ensure that the warning message splitting logic correctly handles all expected formats.
302-304: Ensure Context is Properly Handled. The context field is newly introduced. Ensure that all usages correctly handle this field.
330-333: Ensure Consistent Warning Messages. Ensure that the warning messages are consistent and provide enough context for debugging.
362-364: Ensure Consistent Warning Messages. Ensure that the warning messages are consistent and provide enough context for debugging.
394-396: Ensure Consistent Warning Messages. Ensure that the warning messages are consistent and provide enough context for debugging.
metadata-ingestion/src/datahub/ingestion/api/source.py (13)
66-70: Enum Declaration Looks Good. The StructuredLogLevel enum is correctly declared with the appropriate log levels.
72-79: Dataclass Declaration Looks Good. The StructuredLog dataclass is correctly declared with the appropriate fields.
95-103: LossyDict Initialization Looks Good. The initialization of _errors, _warnings, and _infos using LossyDict is correct.
104-110: Property Method Looks Good. The warnings property method correctly aggregates the warnings.
111-117: Property Method Looks Good. The failures property method correctly aggregates the failures.
118-123: Property Method Looks Good. The infos property method correctly aggregates the infos.
155-175: Method Documentation Looks Good. The documentation for the report_warning method is clear and detailed.
Also applies to: 176-181
192-201: Method Implementation Looks Good. The warning method correctly calls report_warning and logs the warning.
202-222: Method Documentation Looks Good. The documentation for the report_failure method is clear and detailed.
Also applies to: 223-228
239-248: Method Implementation Looks Good. The failure method correctly calls report_failure and logs the error.
249-266: Method Documentation Looks Good. The documentation for the report_info method is clear and detailed.
Also applies to: 267-272
282-289: Method Implementation Looks Good. The info method correctly calls report_info and logs the info.
295-302: Method Implementation Looks Good. The as_obj method correctly materializes the properties for the report object.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (5)
422-424: Structured logging improvements look good. The changes to add structured logging for errors when loading included files provide more context and improve error reporting.
525-534: Structured logging improvements look good. The changes to add structured logging for failures when resolving includes provide more context and improve error reporting.
686-690: Structured logging improvements look good. The changes to add structured logging for failures when loading view files provide more context and improve error reporting.
708-712: Structured logging improvements look good. The changes to add structured logging for failures when loading view files provide more context and improve error reporting.
1369-1371: Structured logging improvements look good. The changes to add structured logging for failures when parsing SQL provide more context and improve error reporting.
+    def report_bad_responses(self, status_code: int, type: str) -> None:
         if status_code == 400:
             self.report.report_warning(
-                key=key, reason="Unknown error for reaching endpoint"
+                title=type,
+                message="Bad request body when retrieving data from endpoint",
             )
         elif status_code == 403:
-            self.report.report_warning(key=key, reason="Not authorised to get endpoint")
+            self.report.report_warning(
+                title=type,
+                message="Not authorised to retrieve data from OpenAPI endpoint",
+            )
         elif status_code == 404:
             self.report.report_warning(
-                key=key,
-                reason="Unable to find an example for endpoint. Please add it to the list of forced examples.",
+                title=type,
+                message="Unable to find an example for endpoint. Please add it to the list of forced examples.",
             )
         elif status_code == 500:
             self.report.report_warning(
-                key=key, reason="Server error for reaching endpoint"
+                title=type, message="Server error for reaching endpoint"
             )
         elif status_code == 504:
-            self.report.report_warning(key=key, reason="Timeout for reaching endpoint")
+            self.report.report_warning(
+                title=type, message="Timeout for reaching endpoint"
+            )
         else:
             raise Exception(
-                f"Unable to retrieve endpoint, response code {status_code}, key {key}"
+                f"Unable to retrieve endpoint, response code {status_code}, key {type}"
Refactor Method for Consistency and Readability
The report_bad_responses method could be refactored to reduce redundancy and improve readability.
def report_bad_responses(self, status_code: int, type: str) -> None:
    messages = {
    }
    message = messages.get(status_code)
    if message:
        self.report.report_warning(title=type, message=message)
    else:
        raise Exception(f"Unable to retrieve endpoint, response code {status_code}, key {type}")
             schema_metadata = set_metadata(dataset_name, fields2add)
             dataset_snapshot.aspects.append(schema_metadata)

             yield self.build_wu(dataset_snapshot, dataset_name)
         else:
-            self.report_bad_responses(response.status_code, key=endpoint_k)
+            self.report_bad_responses(response.status_code, type=endpoint_k)
     else:
         if endpoint_k not in config.forced_examples.keys():
Simplify Dictionary Key Check
Use key not in dict instead of key not in dict.keys().
if endpoint_k not in config.forced_examples:
Tools
Ruff
341-341: Use key not in dict instead of key not in dict.keys(). Remove .keys(). (SIM118)
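The SIM118 suggestion rests on the fact that Python's membership operators test a dict's keys directly, so calling .keys() adds nothing. A small sketch, using a hypothetical forced_examples mapping and a hypothetical needs_example helper rather than the actual configuration object from openapi.py:

```python
from typing import Dict, List

# Hypothetical stand-in for config.forced_examples; the real structure may differ.
forced_examples: Dict[str, List[str]] = {"/pets": ["1"], "/users": ["42"]}


def needs_example(endpoint: str, examples: Dict[str, List[str]]) -> bool:
    # `not in` on a dict checks its keys, so `.keys()` is redundant.
    return endpoint not in examples
```

Both spellings give the same result; the shorter one is the form Ruff recommends.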
…b.com/acryldata/datahub-fork into jj--add-structured-logging-to-ingestion
…estion' into jj--add-structured-logging-to-ingestion
…estion' into jj--add-structured-logging-to-ingestion
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 0
Outside diff range and nitpick comments (2)
metadata-ingestion/tests/integration/mode/test_mode.py (2)
Line range hint 66-66: Rename the test function to reflect Metabase ingestion. The function name should be updated to test_metabase_ingest_success to reflect Metabase ingestion.
- def test_mode_ingest_success(pytestconfig, tmp_path):
+ def test_metabase_ingest_success(pytestconfig, tmp_path):
Line range hint 163-163: Rename the test function to reflect Metabase ingestion. The function name should be updated to test_metabase_ingest_failure to reflect Metabase ingestion.
- def test_mode_ingest_failure(pytestconfig, tmp_path):
+ def test_metabase_ingest_failure(pytestconfig, tmp_path):
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- metadata-ingestion/src/datahub/ingestion/source/mode.py (15 hunks)
- metadata-ingestion/tests/integration/metabase/test_metabase.py (2 hunks)
- metadata-ingestion/tests/integration/mode/test_mode.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/mode.py
Additional comments not posted (2)
metadata-ingestion/tests/integration/metabase/test_metabase.py (2)
166-166: LGTM! The function name and logic correctly reflect Metabase ingestion.
263-263: LGTM! The function name and logic correctly reflect Metabase ingestion.
Needs some major cleanup, but approving to unblock
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 19
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (13 hunks)
- metadata-ingestion/src/datahub/ingestion/source/mode.py (15 hunks)
- metadata-ingestion/src/datahub/ingestion/source/mongodb.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/openapi.py (6 hunks)
- metadata-ingestion/src/datahub/ingestion/source/tableau.py (3 hunks)
Files skipped from review as they are similar to previous changes (3)
- metadata-ingestion/src/datahub/ingestion/source/mode.py
- metadata-ingestion/src/datahub/ingestion/source/mongodb.py
- metadata-ingestion/src/datahub/ingestion/source/tableau.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/openapi.py
347-347: Use key not in dict instead of key not in dict.keys(). Remove .keys(). (SIM118)
Additional comments not posted (1)
metadata-ingestion/src/datahub/ingestion/source/openapi.py (1)
Line range hint 413-413: LGTM! The get_report function is a simple getter and aligns with the new structured logging format.
title="Error Parsing SQL",
message="Failed to parse sql query, lineage will not be accurate.",
context=f"Table Name: {sql_table_name}, Query: {sql_query}",
exc=e,
Use consistent structured logging format.
The report_warning method should be used consistently with the new structured logging format.
- reporter.report_warning(
-     title="Error Parsing SQL",
-     message="Failed to parse sql query, lineage will not be accurate.",
-     context=f"Table Name: {sql_table_name}, Query: {sql_query}",
-     exc=e,
+ reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Error Parsing SQL",
+         message="Failed to parse sql query, lineage will not be accurate.",
+         context=f"Table Name: {sql_table_name}, Query: {sql_query}",
+         exc=e,
+     )
  )
Committable suggestion was skipped due to low confidence.
title="Failed to Load Connection",
message="Failed to load connection. Check your API key permissions and/or connection_to_platform_map configuration.",
context=f"Connection: {model.connection}",
Use consistent structured logging format.
The report_warning method should be used consistently with the new structured logging format.
- self.reporter.report_warning(
-     title="Failed to Load Connection",
-     message="Failed to load connection. Check your API key permissions and/or connection_to_platform_map configuration.",
-     context=f"Connection: {model.connection}",
+ self.reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Failed to Load Connection",
+         message="Failed to load connection. Check your API key permissions and/or connection_to_platform_map configuration.",
+         context=f"Connection: {model.connection}",
+     )
  )
title="Failed to process explores",
message="Failed to process explore dictionary.",
context=f"Explore Details: {explore_dict}",
exc=e,
Use consistent structured logging format.
The report_warning method should be used consistently with the new structured logging format.
- self.reporter.report_warning(
-     title="Failed to process explores",
-     message="Failed to process explore dictionary.",
-     context=f"Explore Details: {explore_dict}",
-     exc=e,
+ self.reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Failed to process explores",
+         message="Failed to process explore dictionary.",
+         context=f"Explore Details: {explore_dict}",
+         exc=e,
+     )
  )
Committable suggestion was skipped due to low confidence.
title="Malformed Table Name",
message="Table name has more than 3 parts.",
context=f"Table Name: {sql_table_name}",
Use consistent structured logging format.
The report_warning method should be used consistently with the new structured logging format.
- self.reporter.report_warning(
-     title="Malformed Table Name",
-     message="Table name has more than 3 parts.",
-     context=f"Table Name: {sql_table_name}",
+ self.reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Malformed Table Name",
+         message="Table name has more than 3 parts.",
+         context=f"Table Name: {sql_table_name}",
+     )
  )
self.reporter.report_failure(
    message="Failed to parse view file",
    context=f"Path: {path}",
    exc=e,
)
Use consistent structured logging format.
The report_failure method should be used consistently with the new structured logging format.
- self.reporter.report_failure(
-     message="Failed to parse view file",
-     context=f"Path: {path}",
-     exc=e,
+ self.reporter.report_failure(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.ERROR,
+         message="Failed to parse view file",
+         context=f"Path: {path}",
+         exc=e,
+     )
  )
+    def report_bad_responses(self, status_code: int, type: str) -> None:
         if status_code == 400:
             self.report.report_warning(
-                key=key, reason="Unknown error for reaching endpoint"
+                title="Failed to Extract Metadata",
+                message="Bad request body when retrieving data from OpenAPI endpoint",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
             )
         elif status_code == 403:
-            self.report.report_warning(key=key, reason="Not authorised to get endpoint")
+            self.report.report_warning(
+                title="Unauthorized to Extract Metadata",
+                message="Received unauthorized response when attempting to retrieve data from OpenAPI endpoint",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
+            )
         elif status_code == 404:
             self.report.report_warning(
-                key=key,
-                reason="Unable to find an example for endpoint. Please add it to the list of forced examples.",
+                title="Failed to Extract Metadata",
+                message="Unable to find an example for endpoint. Please add it to the list of forced examples.",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
             )
         elif status_code == 500:
             self.report.report_warning(
-                key=key, reason="Server error for reaching endpoint"
+                title="Failed to Extract Metadata",
+                message="Received unknown server error from OpenAPI endpoint",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
             )
         elif status_code == 504:
-            self.report.report_warning(key=key, reason="Timeout for reaching endpoint")
+            self.report.report_warning(
+                title="Failed to Extract Metadata",
+                message="Timed out when attempting to retrieve data from OpenAPI endpoint",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
+            )
         else:
             raise Exception(
-                f"Unable to retrieve endpoint, response code {status_code}, key {key}"
+                f"Unable to retrieve endpoint, response code {status_code}, key {type}"
Refactor Method for Consistency and Readability
The report_bad_responses method could be refactored to reduce redundancy and improve readability.
def report_bad_responses(self, status_code: int, type: str) -> None:
    messages = {
    }
    if status_code in messages:
        title, message = messages[status_code]
        self.report.report_warning(
            title=title,
            message=message,
            context=f"Endpoint Type: {type}, Status Code: {status_code}",
        )
    else:
        raise Exception(
            f"Unable to retrieve endpoint, response code {status_code}, key {type}"
        )
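The entries of the messages mapping did not survive in the suggestion as captured on this page. A self-contained sketch of the table-driven approach it describes is given here, with the titles and messages copied from the diff of report_bad_responses above. RecordingReport is a hypothetical stub used only so the example runs on its own, and the function is written as a free function taking the report as an argument rather than as a method; neither is part of DataHub.

```python
from typing import Dict, List, Tuple

# Status code -> (title, message), copied from the diff above.
BAD_RESPONSE_MESSAGES: Dict[int, Tuple[str, str]] = {
    400: (
        "Failed to Extract Metadata",
        "Bad request body when retrieving data from OpenAPI endpoint",
    ),
    403: (
        "Unauthorized to Extract Metadata",
        "Received unauthorized response when attempting to retrieve data from OpenAPI endpoint",
    ),
    404: (
        "Failed to Extract Metadata",
        "Unable to find an example for endpoint. Please add it to the list of forced examples.",
    ),
    500: (
        "Failed to Extract Metadata",
        "Received unknown server error from OpenAPI endpoint",
    ),
    504: (
        "Failed to Extract Metadata",
        "Timed out when attempting to retrieve data from OpenAPI endpoint",
    ),
}


class RecordingReport:
    """Hypothetical stub that records warnings instead of reporting them."""

    def __init__(self) -> None:
        self.warnings: List[Tuple[str, str, str]] = []

    def report_warning(self, title: str, message: str, context: str) -> None:
        self.warnings.append((title, message, context))


def report_bad_responses(report: RecordingReport, status_code: int, type: str) -> None:
    # `type` shadows the builtin; the name is kept to match the diff above.
    if status_code in BAD_RESPONSE_MESSAGES:
        title, message = BAD_RESPONSE_MESSAGES[status_code]
        report.report_warning(
            title=title,
            message=message,
            context=f"Endpoint Type: {type}, Status Code: {status_code}",
        )
    else:
        # Unknown status codes still raise, as in the original if/elif chain.
        raise Exception(
            f"Unable to retrieve endpoint, response code {status_code}, key {type}"
        )
```

Keeping the table at module level makes adding a new status code a one-line change, at the cost of moving the messages away from the branch that uses them.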
self.report.info(
    message="No fields found from endpoint response.",
    context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
Use Structured Logging for Infos
The info message should include a title for consistency with other structured logs.
- self.report.info(
-     message="No fields found from endpoint response.",
-     context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
- )
+ self.report.info(
+     title="Info",
+     message="No fields found from endpoint response.",
+     context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
+ )
    title="Failed to Extract Endpoint Metadata",
    message=f"No example provided for {endpoint_dets['method']}",
    context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
Use Structured Logging for Warnings
The warning message should include a title for consistency with other structured logs.
- self.report.report_warning(
- message=f"No example provided for {endpoint_dets['method']}",
- context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
- )
+ self.report.report_warning(
+ title="Failed to Extract Endpoint Metadata",
+ message=f"No example provided for {endpoint_dets['method']}",
+ context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
+ )
@@ -271,7 +284,7 @@
  for w in warn_c:
      w_msg = w.message
      w_spl = w_msg.args[0].split(" --- ")  # type: ignore
-     self.report.report_warning(key=w_spl[1], reason=w_spl[0])
+     self.report.report_warning(message=w_spl[1], context=w_spl[0])
Use Structured Logging for Warnings
The warning message should include a title for consistency with other structured logs.
- self.report.report_warning(message=w_spl[1], context=w_spl[0])
+ self.report.report_warning(title="Warning", message=w_spl[1], context=w_spl[0])
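The hunk this comment refers to splits each captured warning on " --- " and forwards the second part as the message and the first part as the context. A rough sketch of that pattern using the standard `warnings` module follows. `RecordingReport`, `emit_example_warning`, and the example warning text are hypothetical, and the "Warning" title follows the review suggestion rather than anything confirmed in the merged code.

```python
import warnings
from typing import List, Tuple


class RecordingReport:
    """Hypothetical report that stores (title, message, context) tuples."""

    def __init__(self) -> None:
        self.warnings: List[Tuple[str, str, str]] = []

    def report_warning(self, title: str, message: str, context: str) -> None:
        self.warnings.append((title, message, context))


def emit_example_warning() -> None:
    # Hypothetical producer; the text is laid out as "<context> --- <message>".
    warnings.warn("GET /pets --- No example provided")


def collect_warnings(report: RecordingReport) -> None:
    # Record warnings instead of printing them, then forward each to the report.
    with warnings.catch_warnings(record=True) as warn_c:
        warnings.simplefilter("always")
        emit_example_warning()
    for w in warn_c:
        parts = str(w.message).split(" --- ", 1)
        if len(parts) == 2:
            context, message = parts
        else:
            # No separator found: treat the whole text as the message.
            context, message = "", parts[0]
        report.report_warning(title="Warning", message=message, context=context)
```

Unlike the indexing in the hunk, this sketch guards against warning text that lacks the separator, so a malformed warning is reported with an empty context instead of raising an `IndexError`.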
self.report.info(
    message="No fields found from endpoint response.",
    context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
)
Use Structured Logging for Infos
The info message should include a title for consistency with other structured logs.
- self.report.info(
- message="No fields found from endpoint response.",
- context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
- )
+ self.report.info(
+ title="Info",
+ message="No fields found from endpoint response.",
+ context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
+ )
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- datahub-web-react/src/app/ingest/source/utils.ts (4 hunks)
Additional comments not posted (7)
datahub-web-react/src/app/ingest/source/utils.ts (7)
16-16: Import statement is correct.
The import statement correctly imports StructuredReport, StructuredReportLogEntry, and StructuredReportItemLevel from the types module. These imports are necessary for the structured logging functionality.
143-143: Function correctly handles the new type.
The createStructuredReport function correctly handles StructuredReportLogEntry[] and accurately counts the number of errors, warnings, and infos.
160-160: Function and helper functions correctly handle legacy and new structured report formats.
The transformToStructuredReport function and its helper functions correctly map legacy and new structured report formats to StructuredReportLogEntry[]. The use of try-catch ensures that any errors during transformation are caught and logged.
224-224: Function correctly extracts and transforms the structured report.
The getStructuredReport function correctly extracts the serialized structured report, parses it into a JSON object, and transforms it using the transformToStructuredReport function.
Line range hint 239-239: Function correctly determines the ingestion source status.
The getIngestionSourceStatus function correctly determines the status based on the structured report and the presence of warnings. The logic to map SUCCESS to SUCCEEDED_WITH_WARNINGS is appropriate.
164-164: Helper function correctly maps legacy item objects.
The mapItemObject helper function correctly maps legacy item objects to StructuredReportLogEntry[]. The use of Object.entries and mapping to the new structure is appropriate.
174-174: Helper function correctly maps new item arrays.
The mapItemArray helper function correctly maps new item arrays to StructuredReportLogEntry[]. The function handles edge cases, such as items being strings, appropriately by returning null.
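The TypeScript helpers themselves are not shown in these notes, so the following is only a sketch of the described logic, written in Python to match the ingestion code quoted earlier. Every shape here is an assumption: the legacy format is taken to be a mapping of key to a list of strings, the new format a list of objects with `title`, `message`, and `context` fields, and the level names are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class LogEntry:
    level: str  # assumed level names: "ERROR", "WARN", "INFO"
    title: Optional[str]
    message: str
    context: List[str]


def map_item_object(items: Dict[str, List[str]], level: str) -> List[LogEntry]:
    # Assumed legacy shape: {key: [reason, ...]}; the key becomes the message.
    return [LogEntry(level, None, key, list(reasons)) for key, reasons in items.items()]


def map_item_array(items: List[Any], level: str) -> List[LogEntry]:
    entries: List[LogEntry] = []
    for item in items:
        if not isinstance(item, dict):
            continue  # bare strings carry no structure, so they are skipped
        entries.append(
            LogEntry(
                level,
                item.get("title"),
                item.get("message", ""),
                list(item.get("context", [])),
            )
        )
    return entries


def to_entries(items: Any, level: str) -> List[LogEntry]:
    # Dispatch on shape: dict means legacy, list means new, anything else is empty.
    if isinstance(items, dict):
        return map_item_object(items, level)
    if isinstance(items, list):
        return map_item_array(items, level)
    return []


def count_by_level(entries: List[LogEntry]) -> Dict[str, int]:
    counts = {"ERROR": 0, "WARN": 0, "INFO": 0}
    for entry in entries:
        counts[entry.level] += 1
    return counts


def display_status(status: str, entries: List[LogEntry]) -> str:
    # A successful run that produced warnings is surfaced as a distinct status.
    if status == "SUCCESS" and count_by_level(entries)["WARN"] > 0:
        return "SUCCEEDED_WITH_WARNINGS"
    return status
```

Normalizing both shapes into one entry type up front lets the counting and status logic stay unaware of which report format an older or newer ingestion run produced.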
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItem.tsx (2 hunks)
- datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemContext.tsx (2 hunks)
- datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx (2 hunks)
Additional comments not posted (6)
datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemContext.tsx (2)
6-6: Update import to use StructuredReportLogEntry.
The import statement has been updated to use StructuredReportLogEntry, which aligns with the new data structure.
29-29: Update Props interface to use StructuredReportLogEntry.
The Props interface has been updated to use StructuredReportLogEntry instead of StructuredReportItem. This change is consistent with the overall refactoring.
datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx (2)
5-5: Update import to use StructuredReportLogEntry.
The import statement has been updated to use StructuredReportLogEntry, which aligns with the new data structure.
14-14: Update Props interface to use StructuredReportLogEntry[].
The Props interface has been updated to use StructuredReportLogEntry[] instead of StructuredReportItemType[]. This change is consistent with the overall refactoring.
datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItem.tsx (2)
8-8: Update import to use StructuredReportLogEntry.
The import statement has been updated to use StructuredReportLogEntry, which aligns with the new data structure.
54-54: Update Props interface to use StructuredReportLogEntry.
The Props interface has been updated to use StructuredReportLogEntry instead of StructuredReportItem. This change is consistent with the overall refactoring.
@jjoyce0510 CI is still red
…estion' into jj--add-structured-logging-to-ingestion
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx (3 hunks)
Files skipped from review as they are similar to previous changes (1)
- datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx
…ngs, and failures structured reporting to UI (#10828) Co-authored-by: John Joyce <[email protected]> Co-authored-by: Harshal Sheth <[email protected]>
…ngs, and failures structured reporting to UI (datahub-project#10828) Co-authored-by: John Joyce <[email protected]> Co-authored-by: Harshal Sheth <[email protected]>
protected]> Co-authored-by: pie1nthesky <[email protected]> Co-authored-by: Joel Pinto Mata (KPN-DSH-DEX team) <[email protected]> Co-authored-by: Ellie O'Neil <[email protected]> Co-authored-by: Ajoy Majumdar <[email protected]> Co-authored-by: deepgarg-visa <[email protected]> Co-authored-by: Tristan Heisler <[email protected]> Co-authored-by: Andrew Sikowitz <[email protected]> Co-authored-by: Davi Arnaut <[email protected]> Co-authored-by: Pedro Silva <[email protected]> Co-authored-by: amit-apptware <[email protected]> Co-authored-by: Sam Black <[email protected]> Co-authored-by: Raj Tekal <[email protected]> Co-authored-by: Steffen Grohsschmiedt <[email protected]> Co-authored-by: jaegwon.seo <[email protected]> Co-authored-by: Renan F. Lima <[email protected]> Co-authored-by: Matt Exchange <[email protected]> Co-authored-by: Jonny Dixon <[email protected]> Co-authored-by: Pedro Silva <[email protected]> Co-authored-by: Pinaki Bhattacharjee <[email protected]> Co-authored-by: Jeff Merrick <[email protected]> Co-authored-by: skrydal <[email protected]> Co-authored-by: AndreasHegerNuritas <[email protected]> Co-authored-by: jayasimhankv <[email protected]> Co-authored-by: Jay Kadambi <[email protected]> Co-authored-by: David Leifker <[email protected]>
Summary
In this PR, we add a new structured_logs field and refactor the warning and failure APIs to take a type, a message, and a context.
I still need to update a bunch of method references; I'll do that refactoring once we are aligned on the approach.
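As a rough illustration of the shape described above, here is a minimal sketch of a report that collects entries made of a type, a message, and a context. The names `SketchReport`, `LogEntry`, `LogLevel`, and `report_warning`, and the exact signatures, are assumptions made for this example and are not taken from the PR; only `structured_logs` and `report_failure` are named in the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LogLevel(Enum):
    # Hypothetical severity levels mirroring "infos, warnings, and failures".
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"


@dataclass
class LogEntry:
    # One structured entry: a short type, a human-readable message,
    # and a list of context strings (e.g. the affected tables).
    level: LogLevel
    type: str
    message: str
    context: List[str] = field(default_factory=list)


@dataclass
class SketchReport:
    # Hypothetical stand-in for a source report carrying structured logs.
    structured_logs: List[LogEntry] = field(default_factory=list)

    def _report(
        self, level: LogLevel, type: str, message: str, context: Optional[str]
    ) -> None:
        # Group repeated (level, type, message) triples into a single entry
        # and accumulate their contexts, so a UI can show one row per problem.
        for entry in self.structured_logs:
            if (entry.level, entry.type, entry.message) == (level, type, message):
                if context is not None:
                    entry.context.append(context)
                return
        self.structured_logs.append(
            LogEntry(level, type, message, [context] if context is not None else [])
        )

    def report_warning(
        self, type: str, message: str, context: Optional[str] = None
    ) -> None:
        self._report(LogLevel.WARN, type, message, context)

    def report_failure(
        self, type: str, message: str, context: Optional[str] = None
    ) -> None:
        self._report(LogLevel.ERROR, type, message, context)
```

Grouping by type and message is a design choice of this sketch: it keeps the list short when the same problem affects many entities. The PR itself may aggregate entries differently.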
We also add support for throwing well-specified exception types and mapping them onto a standard set of failure types. This lets a source EITHER raise a standard exception OR call report_failure and return.
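The two paths can be sketched as follows. This is an illustration only: the exception classes, the failure type strings, and the `classify` and `run_source` helpers are hypothetical names invented for this example, not the classes added in `exception.py`.

```python
from typing import Callable, List, Tuple

# A failure is recorded here as a (type, message) pair for simplicity.
Failure = Tuple[str, str]


class SketchSourceError(Exception):
    # Hypothetical base class; subclasses declare the standard type they map to.
    failure_type = "UNKNOWN"


class SketchConnectionError(SketchSourceError):
    failure_type = "CONNECTION"


class SketchPermissionsError(SketchSourceError):
    failure_type = "INSUFFICIENT_PERMISSIONS"


def classify(exc: Exception) -> str:
    # Map a raised exception onto the standard set of failure types.
    # Anything that is not one of the well-specified types falls back to UNKNOWN.
    if isinstance(exc, SketchSourceError):
        return exc.failure_type
    return "UNKNOWN"


def run_source(
    extract: Callable[[List[Failure]], None], failures: List[Failure]
) -> None:
    # The runner accepts both styles: a source may record a failure itself
    # and return, or raise and let the runner record it on its behalf.
    try:
        extract(failures)
    except Exception as exc:
        failures.append((classify(exc), str(exc)))


def raising_source(failures: List[Failure]) -> None:
    # Style 1: raise a well-specified exception.
    raise SketchPermissionsError("cannot read schema")


def reporting_source(failures: List[Failure]) -> None:
    # Style 2: report the failure explicitly and return.
    failures.append(("INSUFFICIENT_PERMISSIONS", "cannot read schema"))
    return
```

In this sketch both styles leave the same record in `failures`, which is the property the description is after: the UI sees one consistent failure type regardless of how the source signalled it.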
QA
I QA'd this locally by testing various failure and warning scenarios to confirm that the UI displays them.
Status
Review
Checklist
Summary by CodeRabbit
Refactor
Bug Fixes