Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stringify relation in logging #371

Closed
wants to merge 1 commit into from
Closed

Conversation

jtcohen6
Copy link
Contributor

resolves databricks/dbt-databricks#115 (?)

Not totally sure what causes this one to arise, but I bet it's some weirdness around how the arguments are being passed through into AdapterLogger. (Throwback to dbt-labs/dbt-core#4305.)

A simple resolution is to explicitly stringify the relation, or template out an f-string, rather than passing the full Relation object into *args.

ipdb> logger.debug("Getting table schema for relation {}", relation)
*** TypeError: Object of type DatabricksRelation is not JSON serializable
ipdb> logger.debug("Getting table schema for relation {}", str(relation))
{"code": "E001", "data": {"args": ["dbt_jcohen.a_model"], "base_msg": "Getting table schema for relation {}", "name": "Spark"}, "invocation_id": "c6d345cc-dcbb-4790-8b2d-1b0e4b44e9ec", "level": "debug", "log_version": 2, "msg": "Spark adapter: Getting table schema for relation dbt_jcohen.a_model", "pid": 17840, "thread_name": "ThreadPoolExecutor-2_0", "ts": "2022-06-17T19:37:08.275536Z", "type": "log_line"}
ipdb> logger.debug(f"Getting table schema for relation {relation}")
{"code": "E001", "data": {"args": [], "base_msg": "Getting table schema for relation dbt_jcohen.a_model", "name": "Spark"}, "invocation_id": "c6d345cc-dcbb-4790-8b2d-1b0e4b44e9ec", "level": "debug", "log_version": 2, "msg": "Spark adapter: Getting table schema for relation dbt_jcohen.a_model", "pid": 17840, "thread_name": "ThreadPoolExecutor-2_0", "ts": "2022-06-17T19:28:38.944487Z", "type": "log_line"}

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-spark next" section.

@cla-bot cla-bot bot added the cla:yes label Jun 17, 2022
@jtcohen6 jtcohen6 requested a review from nathaniel-may June 17, 2022 19:37
@nathaniel-may
Copy link
Contributor

This works because you're explicitly getting the value's string representation when you are making the log call. The reason the Python standard library doesn't do this, and instead uses the style of the original code is so that that string function is only called for the additional arguments if the log line is actually logged. For example, if you have a ton of debug lines with big values but you're only logging info level messages, those potentially expensive string conversions never actually happen.

The place this should be handled is here in core which was the original intent. If this is not working we should investigate modifying this line in core so that we can avoid calling string functions unnecessarily both to comply with a standard Python logging interface in adapters, and to avoid potential performance issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot run dbt docs generate with JSON logs
2 participants