Guard against null url in data_object ingest #1123

Closed
wants to merge 3 commits into from

Conversation

pkalita-lbl
Collaborator

While running the post-release ingest on prod after yesterday's 2024.1 release, the ingest initially got hung up on one data_object record where the url field was explicitly set to null (rather than simply being unset, which is typical). It's unclear how that record got into the production MongoDB. Presumably it was inserted directly instead of going through the runtime API, since the schema should reject such an object. Nonetheless, it seems reasonably defensive to guard against this case.
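
To illustrate the distinction described above, here is a minimal sketch of how an unset url and an explicit null url can look identical to code that uses `dict.get`, and how a guard might treat both the same way. The function names `extract_url` and `has_explicit_null_url` are hypothetical and are not taken from this PR's diff; records are assumed to be plain dictionaries as returned by a MongoDB driver.

```python
from typing import Any, Mapping, Optional


def extract_url(data_object: Mapping[str, Any]) -> Optional[str]:
    """Return the record's url, or None if it is unset OR explicitly null.

    Hypothetical helper: dict.get returns None in both cases, so any code
    that then calls string methods on the result must check for None first.
    """
    url = data_object.get("url")
    if url is None:
        return None
    return str(url)


def has_explicit_null_url(data_object: Mapping[str, Any]) -> bool:
    """True only when the 'url' key is present and its value is null."""
    return "url" in data_object and data_object["url"] is None


# Illustrative records (made-up values, not real production data).
unset_record = {"id": "example:1"}
null_record = {"id": "example:2", "url": None}
normal_record = {"id": "example:3", "url": "https://example.org/file.txt"}
```

With these records, `extract_url` returns `None` for both `unset_record` and `null_record`, while `has_explicit_null_url` is true only for `null_record`.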

@pkalita-lbl
Collaborator Author

The black formatting issues will be resolved once #1124 is merged into main and the changes are merged back into this branch.

@naglepuff
Collaborator

Is it ok that ingest fails in this case? If there's an invalid data object in mongo, we'd want to know. Do we have any other methods in place to make sure all of the mongo data is good?

@pkalita-lbl
Collaborator Author

It's a fair question, and I agree that the ingest code shouldn't have to be so defensive that it essentially re-implements schema validation. I started asking around on Slack about where the record might have come from and how it got into MongoDB without being caught by the validation check that the runtime API runs on incoming data. Haven't heard back yet, but we can hold off on this PR until we know more.

@pkalita-lbl
Collaborator Author

After discussing this in the release management squad meeting, I think the feeling is that we'd rather find out about these data issues when ingest fails and fix them, rather than going out of our way in ingest code to handle non-schema-compliant data.
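
As a rough sketch of the fail-and-fix approach described above, a check along these lines could make the failure name the offending records instead of silently skipping them. Everything here is an assumption for illustration: `InvalidDataObjectError`, `find_null_url_ids`, and `check_data_objects` are hypothetical names, and records are assumed to be dictionaries with an `id` field.

```python
from typing import Any, Iterable, List, Mapping


class InvalidDataObjectError(ValueError):
    """Hypothetical error raised when records do not comply with the schema."""


def find_null_url_ids(records: Iterable[Mapping[str, Any]]) -> List[str]:
    """Collect ids of records whose 'url' key is present but null."""
    return [
        str(record.get("id", "<no id>"))
        for record in records
        if "url" in record and record["url"] is None
    ]


def check_data_objects(records: Iterable[Mapping[str, Any]]) -> None:
    """Fail loudly, listing every offending record so it can be fixed at the source."""
    bad_ids = find_null_url_ids(records)
    if bad_ids:
        raise InvalidDataObjectError(
            "data_object records with null url: " + ", ".join(bad_ids)
        )


# Illustrative records (made-up values, not real production data).
sample_records = [
    {"id": "example:1"},
    {"id": "example:2", "url": None},
    {"id": "example:3", "url": "https://example.org/file.txt"},
]
```

Calling `check_data_objects(sample_records)` raises `InvalidDataObjectError` and names `example:2`, which points whoever is running ingest at the record to repair in MongoDB.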

@pkalita-lbl pkalita-lbl deleted the handle-null-url branch October 8, 2024 21:11
3 participants