Describe the bug
Some uploads from Jul 16 could not be written to docdb. The indexer errors out with a WriteError in _process_prefix() and _process_codeocean_record(). The code does not first run is_dict_corrupt on an existing metadata.nd.json before writing it. Additionally, the current is_dict_corrupt does not check the field names in nested lists.
To Reproduce
View logs for the indexer for Jul 17
Observe write errors when indexing aind-private-data and the codeocean bucket
[ERROR] WriteError: Name is not valid for storage, full error: {'index': 0, 'code': 163, 'errmsg': 'Name is not valid for storage'}
Expected behavior
Existing metadata.nd.json from S3/Code Ocean should first be checked for corruption using aind_data_access_api.utils.is_dict_corrupt.
If corrupt, skip the upload to docdb and log a warning/error that includes the s3 location of the invalid file.
is_dict_corrupt should check nested lists recursively.
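A minimal sketch of what a recursive check could look like. This is not the actual `aind_data_access_api.utils.is_dict_corrupt` implementation; the function name `has_invalid_field_names` is hypothetical, and it assumes that "corrupt" means a field name containing `$` or `.`, which is the kind of name that produces the "Name is not valid for storage" error above.

```python
from typing import Any


def has_invalid_field_names(contents: Any) -> bool:
    """Return True if any dict key, at any depth, contains '$' or '.'.

    Hypothetical helper: descends into nested dicts AND nested lists,
    so a bad key inside a list of dicts is also detected.
    """
    if isinstance(contents, dict):
        for key, value in contents.items():
            # Assumption: keys come from JSON, so they should be strings.
            if not isinstance(key, str) or "$" in key or "." in key:
                return True
            if has_invalid_field_names(value):
                return True
        return False
    if isinstance(contents, list):
        return any(has_invalid_field_names(item) for item in contents)
    # Scalars (str, int, None, ...) carry no field names.
    return False
```

Only keys are inspected, so string values that contain `.` or `$` (file names, for example) do not mark a record as corrupt.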
Additional context
The errors are currently causing the job to crash completely. A hotfix will be implemented to add error handling for processing each record.
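The per-record handling could be sketched roughly as follows. This is an illustration only, not the indexer's code: `index_records`, `is_corrupt`, and `write_record` are hypothetical names, and the records are assumed to arrive as `(s3_location, contents)` pairs.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List, Tuple


def index_records(
    records: Iterable[Tuple[str, Dict[str, Any]]],
    is_corrupt: Callable[[Dict[str, Any]], bool],
    write_record: Callable[[Dict[str, Any]], None],
) -> List[str]:
    """Write each record, skipping bad ones instead of crashing the job.

    Returns the s3 locations of records that were skipped or failed.
    """
    skipped: List[str] = []
    for location, contents in records:
        # Check for corruption before attempting the write.
        if is_corrupt(contents):
            logging.warning("Skipping corrupt metadata at %s", location)
            skipped.append(location)
            continue
        try:
            write_record(contents)
        except Exception:
            # One failed write should not stop the remaining records.
            logging.exception("Failed to write record from %s", location)
            skipped.append(location)
    return skipped
```

Returning the skipped locations keeps the failures visible to the caller, so the job can report them at the end of the run.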
@mekhlakapoor no, it was just a hotfix to add error handling so the indexer job doesn't crash completely. We still need this bug ticket to resolve the actual issue.