(URGENT) Guard against scrape failure + other flavors #167

Open
ronaldtse opened this issue May 29, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@ronaldtse
Contributor

As reported in #166, all titles in the relaton-data-iso dataset are now missing.

According to @andrew2net, this started happening on May 24, and we did not find out until today (May 29).

The page template has been changed.

The scraper must fail when data is missing, and it must not commit anything to our dataset when the data is clearly broken or the template has changed.
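
To make the requirement concrete, here is a minimal sketch of a pre-commit guard. It is written in Python purely for illustration and is not the relaton code; `Record`, `validate_batch`, `ScrapeValidationError`, and the `max_missing_ratio` threshold are all hypothetical names and assumptions. The idea is to validate the whole scraped batch first and raise before anything is written to the dataset.

```python
from dataclasses import dataclass


class ScrapeValidationError(Exception):
    """Raised when scraped data looks broken and must not be committed."""


@dataclass
class Record:
    # Hypothetical minimal shape of one scraped document.
    doc_id: str
    title: str


def validate_batch(records, max_missing_ratio=0.0):
    """Return the records unchanged, or raise if the batch looks broken.

    A batch is treated as broken when it is empty or when the share of
    records without a title exceeds max_missing_ratio (an assumed threshold).
    """
    if not records:
        raise ScrapeValidationError("no records scraped")
    missing = [r.doc_id for r in records if not r.title.strip()]
    ratio = len(missing) / len(records)
    if ratio > max_missing_ratio:
        raise ScrapeValidationError(
            f"{len(missing)} of {len(records)} records have no title, "
            f"e.g. {missing[:3]}"
        )
    return records
```

A caller would run `validate_batch(...)` before the commit step and let the exception abort the job, so a template change yields a failed run rather than a dataset with empty titles.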

@ronaldtse ronaldtse added the bug Something isn't working label May 29, 2024
@andrew2net
Contributor

@ronaldtse with relaton-logger we can implement additional log channels. So I'm going to design a log channel that creates an issue in the relaton-data-iso repo if any error happens while the documents are being fetched. The issue will contain a message listing all unique errors. All subscribers will receive a notification for the new issue. Is that OK?
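
The proposed channel could look roughly like the sketch below. It uses Python's standard `logging` module for illustration only and does not reflect the relaton-logger API; `IssueReportHandler` and the `create_issue` callable are hypothetical, and the call that actually opens a GitHub issue is left to the caller. The handler collects unique error messages during a fetch run and emits a single report at the end.

```python
import logging


class IssueReportHandler(logging.Handler):
    """Collects unique error messages and reports them as one issue."""

    def __init__(self, create_issue):
        # create_issue(title, body) is a hypothetical callable supplied by
        # the caller, e.g. a thin wrapper around an issue-tracker client.
        super().__init__(level=logging.ERROR)
        self._create_issue = create_issue
        self._errors = []

    def emit(self, record):
        message = record.getMessage()
        # Keep first-seen order and drop repeats of the same message.
        if message not in self._errors:
            self._errors.append(message)

    def report(self):
        """Create one issue listing all unique errors; return True if sent."""
        if not self._errors:
            return False
        body = "\n".join(f"- {m}" for m in self._errors)
        self._create_issue("Errors while fetching documents", body)
        self._errors = []
        return True
```

Calling `report()` once after the fetch finishes keeps the notification volume at one issue per run, regardless of how many documents hit the same error.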

andrew2net added a commit to relaton/relaton-logger that referenced this issue Aug 4, 2024
andrew2net added a commit to relaton/relaton-data-iso that referenced this issue Aug 16, 2024
@andrew2net andrew2net reopened this Aug 16, 2024
@ronaldtse
Contributor Author

> @ronaldtse with relaton-logger we can implement additional log channels. So I'm going to design a log channel that creates an issue in the relaton-data-iso repo if any error happens while the documents are being fetched. The issue will contain a message listing all unique errors. All subscribers will receive a notification for the new issue. Is that OK?

Of course, that's a good idea. Thanks!
