Improved browser error logging in crawl_history #473

Merged Aug 27, 2019 (2 commits)

@englehardt (Collaborator) commented Aug 22, 2019

This contains the changes in #468; I will rebase once that merges.

This PR includes several improvements to error logging:

  1. bool_success has been removed from the crawl_history table and replaced by command_status, which is a string.
    1. Successful commands use "ok"
    2. Exceptions in the browser manager process use "error"
    3. about:neterror WebDriverExceptions are special cased, and use "neterror"
    4. Timeouts use "timeout"
  2. Error messages and full, serialized tracebacks are included in the crawl_history table in the columns error and traceback. The error message for about:neterror exceptions parses out the exact error from the about: URL -- e.g., dnsError.
  3. about:neterror exceptions now log at the logging.INFO level, so they should no longer be reported to Sentry, which cuts down on noise.
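
To make the status mapping above concrete, here is a minimal sketch of how an exception might be turned into the three new columns. It is not the code in this PR: `parse_neterror`, `classify`, and `run_and_record` are hypothetical names, the built-in `TimeoutError` stands in for whatever signals a command timeout, and the message format (`about:neterror?e=<code>&u=<url>`) is an assumption about what the WebDriverException text contains. Whether timeouts also populate error and traceback is likewise assumed.

```python
import re
import traceback
from urllib.parse import parse_qs

# Assumption: the exception message embeds a URL of the form
# about:neterror?e=<code>&u=<failed url>
NETERROR_RE = re.compile(r"about:neterror\?(\S+)")


def parse_neterror(message):
    """Return the error code from an about:neterror URL, or None."""
    match = NETERROR_RE.search(message)
    if match is None:
        return None
    codes = parse_qs(match.group(1)).get("e")
    return codes[0] if codes else None


def classify(exc):
    """Map an exception to command_status, error, and traceback values."""
    # Plain-text traceback; the real serialization format may differ.
    tb = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    if isinstance(exc, TimeoutError):  # stand-in for a command timeout
        return {"command_status": "timeout", "error": str(exc), "traceback": tb}
    neterror = parse_neterror(str(exc))
    if neterror is not None:
        return {"command_status": "neterror", "error": neterror, "traceback": tb}
    return {"command_status": "error", "error": str(exc), "traceback": tb}


def run_and_record(command):
    """Run a zero-argument callable and return its crawl_history fields."""
    try:
        command()
    except Exception as exc:
        return classify(exc)
    return {"command_status": "ok", "error": None, "traceback": None}
```

Checking for the timeout and about:neterror cases before falling back to "error" keeps the generic status as the catch-all for anything unrecognized.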

A sample from a very small, 10-site crawl is available here: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/154881/command/154898.

@englehardt force-pushed the browser_error_logging branch from 980de3e to a075c9c on August 26, 2019 20:47
@englehardt (Collaborator, Author) commented

Now rebased on #468 and ready for review. For an example of what this kind of logging looks like, see: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/165717/command/166596.
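
For readers without access to the notebook, a rough sketch of the kind of summary the new column allows, using an in-memory SQLite table. Only the crawl_history table name and the command_status, error, and traceback columns come from this PR; the `command` column and every row below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down schema: the real crawl_history table has more columns.
conn.execute(
    "CREATE TABLE crawl_history ("
    " command TEXT, command_status TEXT, error TEXT, traceback TEXT)"
)

# Made-up rows, one per command that was run.
rows = [
    ("GET", "ok", None, None),
    ("GET", "neterror", "dnsNotFound", "Traceback ..."),
    ("GET", "timeout", None, None),
    ("GET", "ok", None, None),
]
conn.executemany("INSERT INTO crawl_history VALUES (?, ?, ?, ?)", rows)

# Count commands per status; this is the breakdown bool_success could not give.
status_counts = dict(
    conn.execute(
        "SELECT command_status, COUNT(*) FROM crawl_history"
        " GROUP BY command_status"
    ).fetchall()
)
```

With a string status, failures split into network errors, timeouts, and other exceptions in a single GROUP BY, where the old boolean could only separate success from failure.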

@englehardt changed the title from "[WIP] Browser error logging" to "Improved browser error logging in crawl_history" on Aug 26, 2019
@motin (Contributor) left a comment

Love it :)

@motin merged commit b46b0a4 into master on Aug 27, 2019
@englehardt deleted the browser_error_logging branch on August 27, 2019 22:05