feat(cli): add support for sampled reporting to keep logs manageable #5800

Merged
7 commits merged on Sep 1, 2022

shirshanka
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@shirshanka
Contributor Author

Example of report output after this change:

[2022-08-31 18:16:59,700] INFO     {datahub.cli.ingest_cli:147} - Finished metadata ingestion

Cli report:
{'cli_version': 'unavailable (installed in develop mode)',
 'cli_entry_location': '/Users/shirshankadas/workspace/sd-datahub-fork/metadata-ingestion/src/datahub/__init__.py',
 'py_version': '3.9.9 (main, Feb  5 2022, 20:34:50) \n[Clang 13.0.0 (clang-1300.0.29.3)]',
 'py_exec_path': '/Users/shirshankadas/workspace/sd-datahub-fork/metadata-ingestion/venv/bin/python',
 'os_details': 'macOS-12.0-arm64-arm-64bit'}
Source (looker) report:
{'events_produced': '71',
 'events_produced_per_sec': '3',
 'event_ids': "['looker-urn:li:chart:(looker,dashboard_elements.18555)', 'looker-urn:li:chart:(looker,dashboard_elements.18559)', "
              "'looker-urn:li:dashboard:(looker,dashboards.1710)', 'tag-urn:li:tag:Temporal', "
              "'looker-dashboardUsageStatistics-urn:li:dashboard:(looker,dashboards.1710)-1661731200000', "
              "'looker-dashboardUsageStatistics-urn:li:dashboard:(looker,dashboards.1710)-1661385600000', "
              "'looker-dashboardUsageStatistics-urn:li:dashboard:(looker,dashboards.1710)-1660694400000', "
              "'looker-dashboardUsageStatistics-urn:li:dashboard:(looker,dashboards.1710)-1660608000000', "
              "'looker-inputFields-urn:li:chart:(looker,dashboard_elements.18556)', "
              "'looker-inputFields-urn:li:chart:(looker,dashboard_elements.18559)']... sampled of 71 total elements",
 'warnings': '0 total entries. {}',
 'failures': '0 total entries. {}',
 'dashboards_scanned': '6397',
 'looks_scanned': '17',
 'filtered_dashboards': "['786', '1405', '1557', '1713', '3484', '4261', '4272', '5107', '5819', '8208']... sampled of 6395 total elements",
 'filtered_looks': '[]',
 'dashboards_scanned_for_usage': '1',
 'charts_scanned_for_usage': '0',
 'charts_with_activity': 'LossySet()',
 'dashboards_with_activity': "LossySet({'1710'})",
 'query_latency': {'Dashboard:entity_query': '2.75', 'Dashboard:user_query': '6.51'},
 'stage_latency': [{'name': 'list_dashboards', 'latency_seconds': '5.6'},
                   {'name': 'dashboard_chart_metadata', 'latency_seconds': '1.35'},
                   {'name': 'explore_metadata', 'latency_seconds': '4.71'},
                   {'name': 'usage_extraction', 'latency_seconds': '9.96'},
                   {'name': 'field_metadata', 'latency_seconds': '0.04'}],
 'start_time': '2022-08-31 18:16:36.445383',
 'running_time_in_seconds': '23'}
Sink (datahub-rest) report:
{'total_records_written': '52',
 'records_written_per_second': '1',
 'warnings': "[{'warning': 'Unable to emit metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', "
             "'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect "
             'inputFields for entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18542)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18543)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18544)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18545)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18549)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18551)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18553)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18556)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18558)'}}, {'warning': 'Unable to emit "
             "metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "
             "'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect inputFields for "
             'entity chart\\n\\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\\n\\tat '
             "com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: Unknown aspect "
             "inputFields for entity chart', 'status': 500, 'id': 'urn:li:chart:(looker,dashboard_elements.18559)'}}]... sampled of 18 total "
             'elements',
 'failures': "[{'error': 'Unable to emit metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', "
             "'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: com.linkedin.metadata.entity.ValidationException: "
             'Failed to validate record with class com.linkedin.entity.Entity: ERROR :: '
             '/value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/4/com.linkedin.schema.SchemaMetadata/fields/0/label :: unrecognized '
             'field found but not allowed\\nERROR :: '
             '/value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/4/com.linkedin.schema.SchemaMetadata/fields/1/label :: unrecognized '
             'field found but not allowed\\nERROR :: '
             '/value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/4/com.linkedin.schema.SchemaMetadata/fields/2/label :: unrecognized '
             "field found but not allowed', 'message': 'com.linkedin.metadata.entity.ValidationException: Failed to validate record with class "
             "com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/4/com.linkedin', 'status': 422}}]",
 'start_time': '2022-08-31 18:16:33.661327',
 'current_time': '2022-08-31 18:16:59.701432',
 'total_duration_in_seconds': '26.04',
 'gms_version': 'v0.8.43',
 'pending_requests': '0'}

 Pipeline finished with at least 1 failures ; produced 71 events in 23 seconds.

@shirshanka
Contributor Author

I would rather not paste the report output before this change, as it wouldn't be fair to GitHub's storage servers :)

@github-actions

github-actions bot commented Sep 1, 2022

Unit Test Results (metadata ingestion)

    8 files  ±0      8 suites ±0   57m 27s ⏱️ +1m 17s
  668 tests  +7    636 ✔️ -22   3 💤 ±0   29 ❌ +29
1 336 runs  +14  1 272 ✔️ -44   6 💤 ±0   58 ❌ +58

For more details on these failures, see this check.

Results for commit a95fa7f. ± Comparison against base commit af1fc8d.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Sep 1, 2022

Unit Test Results (build & test)

517 tests  ±0   517 ✔️ ±0   9m 9s ⏱️ -39s
121 suites ±0     0 💤 ±0
121 files  ±0     0 ❌ ±0

Results for commit a95fa7f. ± Comparison against base commit af1fc8d.

♻️ This comment has been updated with latest results.

Collaborator

@hsheth2 hsheth2 left a comment

Not a huge fan of serializing the list into a string and embedding that in the structured report. Is there any way we can make that a bit nicer to read?

metadata-ingestion/src/datahub/cli/docker.py
metadata-ingestion/src/datahub/ingestion/api/report.py (outdated)
return

return super().append((self.total_elements, __object)) # type: ignore
finally:
Collaborator

Why do we need the try..finally?

Contributor Author

It makes it easier to ensure that the counter gets updated before we return from different places in the code.
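
For readers following the thread, the pattern can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the class name `SampledList`, the parameter `max_elements`, and the reservoir-sampling replacement policy are assumptions made for the example. The point is only that `append` has several exit paths, and the `finally` clause bumps the counter on every one of them.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


class SampledList(List[T]):
    """Hypothetical list that retains at most `max_elements` items."""

    def __init__(self, max_elements: int = 10) -> None:
        super().__init__()
        self.max_elements = max_elements
        self.total_elements = 0  # counts every append, kept or dropped

    def append(self, item: T) -> None:
        try:
            if len(self) < self.max_elements:
                # Still room: keep the item as-is.
                super().append(item)
                return
            # Reservoir step (assumed policy): the new item survives with
            # probability max_elements / (total_elements + 1).
            slot = random.randint(0, self.total_elements)
            if slot >= self.max_elements:
                return  # dropped, but still counted below
            self[slot] = item
        finally:
            # Runs on every return path above.
            self.total_elements += 1
```

Without the `finally`, each early `return` would need its own `self.total_elements += 1`, and a later edit that adds another return path could silently skip the increment.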

metadata-ingestion/src/datahub/ingestion/api/report.py (outdated)
metadata-ingestion/src/datahub/ingestion/api/report.py (outdated)
@shirshanka
Contributor Author

The only way to preserve the nice formatting of the list is to add an additional element to it, something like:
[1, 2, 3, 4, "... and 10 more items"]

But that would violate typing, and it would be very tricky to make sure that nobody trips up on it.
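
A rough sketch of the trade-off, assuming a hypothetical `SummarizedList` (not the class in report.py): the elements stay homogeneously typed, and the summary is attached only when the list is rendered, which produces the "... sampled of N total elements" suffix visible in the report output above.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


class SummarizedList(List[T]):
    """Hypothetical typed list whose repr notes how many items were seen."""

    def __init__(self, items: Iterable[T], total_elements: int) -> None:
        super().__init__(items)
        self.total_elements = total_elements

    def __repr__(self) -> str:
        base = super().__repr__()
        if self.total_elements > len(self):
            # The summary lives in the rendering, not in the data.
            return f"{base}... sampled of {self.total_elements} total elements"
        return base


kept: SummarizedList[int] = SummarizedList([1, 2, 3, 4], total_elements=14)
print(repr(kept))
# Every element is still an int, so code that iterates over the list is safe.
print(sum(kept))
```

Appending the string `"... and 10 more items"` instead would turn this into a list of `int` or `str`, and a call such as `sum(kept)` would fail at runtime.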

@shirshanka shirshanka merged commit 9afda47 into datahub-project:master Sep 1, 2022