
indexer does not check is_dict_corrupt for existing metadata.nd.json before writing to docdb #75

Closed
helen-m-lin opened this issue Jul 17, 2024 · 2 comments · Fixed by #106


helen-m-lin commented Jul 17, 2024

Describe the bug
Some uploads from Jul 16 were not able to be written to docdb. The indexer is erroring out after WriteError in _process_prefix() and _process_codeocean_record(). It was found that the code does not first check is_dict_corrupt for existing metadata.nd.json. Additionally, the current is_dict_corrupt does not check the fieldnames in nested lists.

To Reproduce

  1. View the indexer logs for Jul 17
  2. Observe write errors when indexing aind-private-data and the codeocean bucket:
     [ERROR] WriteError: Name is not valid for storage, full error: {'index': 0, 'code': 163, 'errmsg': 'Name is not valid for storage'}

Expected behavior

  • Existing metadata.nd.json files from S3/Code Ocean should first be checked for corruption using `aind_data_access_api.utils.is_dict_corrupt`.
  • If corrupt, skip the write to DocDB and log a warning/error that includes the S3 location of the invalid file.
  • `is_dict_corrupt` should check nested lists recursively.
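A recursive check along these lines could cover the nested-list case. This is only a sketch, not the library's actual implementation: it assumes the corruption being detected is DocumentDB-invalid field names (keys containing `.` or `$`), which is consistent with the `code: 163` "Name is not valid for storage" error above, but the real validation rules in `aind_data_access_api.utils` may differ.

```python
def is_dict_corrupt(data) -> bool:
    """Return True if any field name in a nested structure would be
    rejected by DocumentDB (assumed rule: keys containing '.' or '$').

    Recurses into nested dicts AND lists, so documents like
    {"subject": [{"genotype$": "..."}]} are caught.
    """
    if isinstance(data, dict):
        for key, value in data.items():
            # Non-string keys or keys with reserved characters are invalid.
            if not isinstance(key, str) or "." in key or "$" in key:
                return True
            if is_dict_corrupt(value):
                return True
    elif isinstance(data, list):
        # The fix requested here: check every element of nested lists.
        return any(is_dict_corrupt(item) for item in data)
    return False
```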

Additional context
The errors are currently causing the job to crash completely. A hotfix will be implemented to add error handling for processing each record.

@mekhlakapoor (Contributor) commented

@helen-m-lin did that linked PR fix this issue? If so we can go ahead and close this out

@helen-m-lin (Collaborator, Author) commented

@mekhlakapoor no, that was just a hotfix adding error handling so the indexer job doesn't crash completely. We still need this bug ticket to resolve the underlying issue.
