Cloudwatch dashboard for indexer logs #86

Open
3 tasks
helen-m-lin opened this issue Sep 4, 2024 · 3 comments

@helen-m-lin
Collaborator

Is your feature request related to a problem? Please describe.
The indexer flags various types of warnings and errors based on the state of the data assets in S3 and docdb. For example, data assets are skipped (not pushed to docdb) if the S3 prefix is invalid, if the location or name in the metadata.nd.json file is invalid, etc. These logs are currently queried manually in Cloudwatch. It would be nice to have a better way to surface these errors.

cc: @dyf

Describe the solution you'd like

  • A saved query in Cloudwatch (within an appropriately named folder) to parse useful info about warnings and errors.
  • A saved dashboard in Cloudwatch to surface the query results as a table. Optionally add a chart for count of each log type, or other visualization.
  • Optional: refactor warnings and error messages to include error type/code based on the msgs
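For the optional refactor in the last bullet, here is a minimal sketch of how a code could be attached to each message on the indexer side. The `IndexerLogCode` enum, its values, and the bracketed prefix format are all hypothetical and not taken from the indexer code; only the prefix warning text is copied from the existing logs.

```python
import logging
from enum import Enum


class IndexerLogCode(Enum):
    """Hypothetical error types; names and values are illustrative only."""

    INVALID_PREFIX = "E001"
    LOCATION_NAME_MISMATCH = "E002"
    CORRUPT_RECORD = "E003"


def format_coded_message(code: IndexerLogCode, message: str) -> str:
    """Prefix a log message with a bracketed code that a query can parse."""
    return f"[{code.value}:{code.name}] {message}"


def warn_invalid_prefix(prefix: str, bucket: str) -> str:
    """Log the existing prefix warning with a code attached and return the text."""
    text = format_coded_message(
        IndexerLogCode.INVALID_PREFIX,
        f"Prefix {prefix} not valid in bucket {bucket}! Skipping.",
    )
    logging.warning(text)
    return text
```

With a fixed prefix like this, the saved query could probably extract the error type with a single parse step instead of matching each message text separately.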

Describe alternatives you've considered
Leaving it as is and having maintainers manually query logs.

Additional context
The query below parses the log severity and error type from the log message.

fields @timestamp, @message, @logStream, @log
| parse @message "*:root:*" as severity, short_message
| filter ispresent(severity) and severity not in ["INFO", "DEBUG"]
| parse short_message "Location field * or name field * does not match actual location of record *!" as json_location, json_name, actual_location
| parse short_message "Prefix * not valid in bucket *! Skipping." as invalid_prefix, actual_bucket
| parse short_message "Error processing *: WriteError(\"Name is not valid for storage, full error: {'index': 0, 'code': 163, 'errmsg': 'Name is not valid for storage'}\")" as corrupt_location
| fields if(ispresent(json_location), "Location/name", if(ispresent(invalid_prefix), "Prefix", if(ispresent(corrupt_location), "Corrupt", "Other"))) as errorType
| display @timestamp, severity, errorType, @message, @logStream
| sort errorType, @timestamp asc

image
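To surface the query above as a dashboard table, a rough sketch of building the dashboard body in Python follows. The function name, widget size, log group, region, and dashboard name are placeholders, not values from this project. The body uses the CloudWatch dashboard JSON layout for a `log` widget with `view` set to `table`.

```python
import json


def build_log_table_dashboard(query: str, log_group: str, region: str, title: str) -> str:
    """Return a dashboard body (JSON string) with one Logs Insights table widget."""
    widget = {
        "type": "log",
        "x": 0,
        "y": 0,
        "width": 24,
        "height": 8,
        "properties": {
            # Log widgets expect the log group as a SOURCE clause ahead of the query.
            "query": f"SOURCE '{log_group}' | {query}",
            "region": region,
            "title": title,
            "view": "table",
        },
    }
    return json.dumps({"widgets": [widget]})


# Publishing the body needs AWS credentials, so it is left commented out here:
# import boto3
# body = build_log_table_dashboard(QUERY, "/placeholder/log-group", "us-west-2", "Indexer warnings and errors")
# boto3.client("cloudwatch").put_dashboard(DashboardName="indexer-logs", DashboardBody=body)
```

A second widget with `view` set to `bar` and a `stats count(*) by errorType` query could cover the optional count chart.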

Cloudwatch can also detect patterns in logs automatically. This eliminates the need for complex queries, but the detected patterns cannot be added to a dashboard.
image

@helen-m-lin
Collaborator Author

Also may be useful to add:

  • number of external_links added
  • time taken for indexer to run each job
  • total count of updates
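One way these numbers could reach the logs is a single summary line per job, sketched below. The `JOB_SUMMARY` marker, the field names, and the `job_summary` helper are assumptions for illustration and do not exist in the indexer. Because the line is logged at INFO, the query in the issue description (which filters out INFO) would not pick it up; a separate query would be needed.

```python
import json
import logging
import time
from contextlib import contextmanager


def build_summary(job_name: str, duration_seconds: float, counts: dict) -> str:
    """Render one parseable summary line for a finished job."""
    payload = {"job": job_name, "duration_seconds": round(duration_seconds, 3), **counts}
    return "JOB_SUMMARY " + json.dumps(payload, sort_keys=True)


@contextmanager
def job_summary(job_name: str):
    """Time a job and log its counters when the block exits, even on error."""
    counts = {"external_links_added": 0, "updates": 0}
    start = time.perf_counter()
    try:
        yield counts
    finally:
        logging.info(build_summary(job_name, time.perf_counter() - start, counts))


# Usage: increment the counters inside the block as the job does its work.
# with job_summary("example-job") as counts:
#     counts["updates"] += 1
```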

@helen-m-lin
Collaborator Author

Marking as blocked since we may be switching to another dashboarding service.

@helen-m-lin
Collaborator Author

Reach out to SIPE
