-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest): add --log-file
option and show CLI logs in UI report
#7118
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
@@ -122,6 +122,8 @@ def get_long_description(): | |||
"greenlet", | |||
} | |||
|
|||
sqllineage_lib = "sqllineage==1.3.6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice refactor!
@@ -393,5 +393,5 @@ def rollback( | |||
writer.writerow([row.get("urn")]) | |||
|
|||
except IOError as e: | |||
print(e) | |||
logger.exception(f"Unable to save rollback failure report: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
@@ -262,7 +262,7 @@ def dataplatform2instance_func( | |||
delete_cli._delete_one_urn(src_entity_urn, soft=not hard, run_id=run_id) | |||
migration_report.on_entity_migrated(src_entity_urn, "status") # type: ignore | |||
|
|||
print(f"{migration_report}") | |||
click.echo(f"{migration_report}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omg thank you
execution_result_aspect = ExecutionRequestResultClass( | ||
status=status, | ||
startTimeMs=self.start_time_ms, | ||
durationMs=self.get_cur_time_in_ms() - self.start_time_ms, | ||
report=json.dumps(report, indent=2), | ||
report=summary, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! So in this case, we are now sending the structured report to the correct field. From here, we can continue to add to that report and get it into our result aspect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Sets ourselves up for much richer reports)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup exactly - this way the structured report shows up in the correct field for both managed ingestion and cli ingestion
note that the structured report field is basically a string right now - we may need to rethink that in the future
return record.levelno >= logging.INFO | ||
else: | ||
if self.debug: | ||
return record.levelno >= logging.WARNING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice. good idea to filter out non-datahub noise
structured_report_str = json.dumps(report, indent=2) | ||
summary = f"~~~~ Ingestion Report ~~~~\n{structured_report_str}\n\n" | ||
summary += "~~~~ Ingestion Logs ~~~~\n" | ||
summary += get_log_buffer().format_lines() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah so here we leverage the in memory buffer. nice
# Configure the loggers. | ||
root_logger = logging.getLogger() | ||
_remove_all_handlers(root_logger) | ||
root_logger.setLevel(logging.INFO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this "normalization" step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default is warning globally, and can also be messed with by other libraries
# or otherwise mess with the logging configuration. | ||
lib_logger = logging.getLogger(lib) | ||
_remove_all_handlers(lib_logger) | ||
lib_logger.setLevel(logging.DEBUG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this step?
from typing import Any, TextIO | ||
|
||
|
||
class TeeIO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fancy!
file_handler: logging.Handler | ||
if log_file: | ||
file = stack.enter_context(open(log_file, "w")) | ||
tee = TeeIO(sys.stdout, file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious - why is this Tee and context piece even required? I'd imagine we should be able to create a global handler that directs to a file without requiring this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're redirecting stdout here (e.g. click.echo and print statements)
we want those messages to show up both in the terminal and in the log_file, which is why we need the tee
logger.warning("This is a warning message") | ||
logger.error("this is an error with no stack trace") | ||
try: | ||
1 / 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like this
r"""\ | ||
this is a print statement | ||
this is a click.echo statement | ||
[.+] DEBUG .datahub.my_cli_module:\d+. - Example debug line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! This is pretty cool
[.+] INFO .datahub.my_cli_module:\d+. - This is an info message | ||
[.+] WARNING .datahub.my_cli_module:\d+. - This is a warning message | ||
[.+] ERROR .datahub.my_cli_module:\d+. - this is an error with no stack trace | ||
[.+] ERROR .datahub.my_cli_module:\d+. - failed to divide by zero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does everything start with [.+]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to handle the timestamp part of the log
# The in-memory log buffer should contain the log messages. | ||
# The first two lines are stdout, so we skip them. | ||
expected_log_output = "\n".join(result.output.splitlines()[2:]) | ||
assert get_log_buffer().format_lines() == expected_log_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to cleanup the file here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope its a pytest tmp_file so it gets cleaned up automatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…atahub-project#7118) Co-authored-by: Tamas Nemeth <[email protected]>
…atahub-project#7118) Co-authored-by: Tamas Nemeth <[email protected]>
--log-file
attribute that writes debug logs and stdout (e.g. the final ingestion report) to a file. This can even be used to dump debug logs to a file while printing info logs to the terminalChecklist