Cloudwatch dashboard for indexer logs #86

helen-m-lin · 2024-09-04T17:29:47Z

Is your feature request related to a problem? Please describe.
The indexer flags various types of warnings and errors based on the state of the data assets in S3 and docdb. For example, data assets are skipped (not pushed to docdb) if the S3 prefix is invalid, if the location or name in the metadata.nd.json file is invalid, etc. These logs are currently queried manually in Cloudwatch. It would be nice to have a better way to surface these errors.

cc: @dyf

Describe the solution you'd like

A saved query in Cloudwatch (within an appropriately named folder) to parse useful info about warnings and errors.
A saved dashboard in Cloudwatch to surface the query results as a table. Optionally add a chart for count of each log type, or other visualization.
Optional: refactor warning and error messages to include an error type/code based on the message content.
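
As a rough sketch of the optional refactor, the indexer could prefix each warning or error with a short code so that a query can group on the code instead of matching full message text. Everything here is hypothetical: the `IndexerErrorType` enum, the code strings, and the `format_with_code` helper are illustrative names, not part of the current indexer, and the sample prefix and bucket are made up.

```python
import logging
from enum import Enum


class IndexerErrorType(Enum):
    """Hypothetical error codes, one per category the indexer already reports."""

    LOCATION_NAME = "LOCATION_NAME_MISMATCH"
    PREFIX = "INVALID_PREFIX"
    CORRUPT = "CORRUPT_RECORD"


def format_with_code(error_type: IndexerErrorType, message: str) -> str:
    """Prepend the bracketed error code to a log message."""
    return f"[{error_type.value}] {message}"


# Example usage with made-up values for the prefix and bucket.
logging.warning(
    format_with_code(
        IndexerErrorType.PREFIX,
        "Prefix example/prefix not valid in bucket example-bucket! Skipping.",
    )
)
```

With a code in a fixed position, the saved query would only need to extract one field rather than maintain a separate pattern per message.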

Describe alternatives you've considered
Leaving it as is and having maintainers manually query logs.

Additional context
The query below parses the log severity and error type from the log message.

fields @timestamp, @message, @logStream, @log
| parse @message "*:root:*" as severity, short_message
| filter ispresent(severity) and severity not in ["INFO", "DEBUG"]
| parse short_message "Location field * or name field * does not match actual location of record *!" as json_location, json_name, actual_location
| parse short_message "Prefix * not valid in bucket *! Skipping." as invalid_prefix, actual_bucket
| parse short_message "Error processing *: WriteError(\"Name is not valid for storage, full error: {'index': 0, 'code': 163, 'errmsg': 'Name is not valid for storage'}\")" as corrupt_location
| fields if(ispresent(json_location), "Location/name", if(ispresent(invalid_prefix), "Prefix", if(ispresent(corrupt_location), "Corrupt", "Other"))) as errorType
| display @timestamp, severity, errorType, @message, @logStream
| sort errorType, @timestamp asc
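
For checking the message patterns outside of Cloudwatch, the classification in the query above can be approximated in Python. This is only a sketch: it assumes log lines have the `SEVERITY:root:message` shape that the query parses, the `classify` function is a hypothetical helper, and the "Corrupt" pattern is loosened to match on the `Name is not valid for storage` text rather than the full error string.

```python
import re

# Patterns taken from the query; each "*" wildcard becomes a regex group.
LOCATION_NAME = re.compile(
    r"Location field (.*) or name field (.*) does not match actual location of record (.*)!"
)
PREFIX = re.compile(r"Prefix (.*) not valid in bucket (.*)! Skipping\.")
CORRUPT = re.compile(
    r"Error processing (.*): WriteError\(.*Name is not valid for storage"
)


def classify(message: str):
    """Return (severity, error_type) for a log line, or None if it is filtered out."""
    severity, sep, short_message = message.partition(":root:")
    # Mirror the query's filter: drop unparsed lines and INFO/DEBUG entries.
    if not sep or severity in ("INFO", "DEBUG"):
        return None
    # Same precedence as the nested if() in the query.
    if LOCATION_NAME.search(short_message):
        return severity, "Location/name"
    if PREFIX.search(short_message):
        return severity, "Prefix"
    if CORRUPT.search(short_message):
        return severity, "Corrupt"
    return severity, "Other"
```

Running a few known log lines through `classify` is a quick way to confirm a pattern still matches after a log message is reworded.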

Cloudwatch can also detect patterns in logs automatically. This eliminates the need for complex queries, but the detected patterns cannot be added to a dashboard.


helen-m-lin · 2024-09-10T18:22:04Z

Also may be useful to add:

number of external_links added
time taken for indexer to run each job
total count of updates
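
One possible way to capture these, sketched under assumptions: the indexer could emit a single summary line per job containing the three values, which a query could then pick up. The `summarize_job` function, the field names, and the `example_job` name are all hypothetical, and the counts passed in below are placeholders rather than real results.

```python
import json
import logging
import time


def summarize_job(job_name, started, external_links_added, total_updates, now=None):
    """Build a one-line JSON summary for a finished indexer job.

    `started` and `now` are time.monotonic() readings; `now` defaults to the
    current reading and is only a parameter so the function is easy to test.
    """
    finished = time.monotonic() if now is None else now
    return json.dumps(
        {
            "job": job_name,
            "duration_seconds": round(finished - started, 3),
            "external_links_added": external_links_added,
            "total_updates": total_updates,
        }
    )


started = time.monotonic()
# ... the indexer job would run here ...
logging.info(
    summarize_job("example_job", started, external_links_added=0, total_updates=0)
)
```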

helen-m-lin · 2024-10-15T00:42:28Z

Marking as blocked since we may be switching to another dashboarding service.

helen-m-lin · 2025-01-08T01:19:23Z

Reach out to SIPE

dyf assigned helen-m-lin Sep 9, 2024
