Improved browser error logging in crawl_history #473
Merged
This contains the changes in #468, will rebase once that merges.
This PR includes several improvements to error logging:
- `bool_success` has been removed from the `crawl_history` table and replaced by `command_status`, which is a string: `"ok"`, `"error"`, `"neterror"`, or `"timeout"`. `about:neterror` WebDriverExceptions are special cased, and use `"neterror"`.
- Errors are recorded in the `crawl_history` table in the columns `error` and `traceback`. The error message for `about:neterror` exceptions is parsed out of the `about:` url, giving the exact error -- e.g., `dnsError`.
- `about:neterror` exceptions now log at the `logging.INFO` level, so they should no longer be included in Sentry, which cuts down on noise.

A sample from a very small, 10 site crawl is available here: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/154881/command/154898.
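For reviewers who want a concrete picture of the status mapping described above, here is a minimal sketch. It is not the code in this PR. The names `parse_neterror` and `classify_failure` are hypothetical, plain Python exceptions stand in for selenium's `WebDriverException`, the built-in `TimeoutError` stands in for however the crawler detects a timeout, and the sketch assumes the exception message embeds an `about:neterror?e=<code>&...` url with the error code in the `e` parameter.

```python
import logging
import re
import traceback
from urllib.parse import parse_qs

logger = logging.getLogger("crawl_sketch")

# Assumption: the message contains a url like about:neterror?e=dnsNotFound&u=...
NETERROR_RE = re.compile(r"about:neterror\?(\S+)")


def parse_neterror(message):
    """Return the error code from an about:neterror url, or None if absent."""
    match = NETERROR_RE.search(message)
    if match is None:
        return None
    codes = parse_qs(match.group(1)).get("e")
    return codes[0] if codes else None


def classify_failure(exc):
    """Map an exception to (command_status, error, traceback) column values."""
    tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    message = str(exc)
    if isinstance(exc, TimeoutError):  # stand-in for the crawler's timeout
        return "timeout", message, tb
    code = parse_neterror(message)
    if code is not None:
        # INFO level keeps expected network failures out of error reporting.
        logger.info("neterror while loading page: %s", code)
        return "neterror", code, tb
    logger.error("command failed: %s", message)
    return "error", message, tb
```

A successful command would store `"ok"` in `command_status` and leave `error` and `traceback` empty; the three return values here would fill the `command_status`, `error`, and `traceback` columns for a failed one.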