Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid creating a new empty talk page when removing a template from a non-existent page #5518

Conversation

gabina
Copy link
Member

@gabina gabina commented Oct 22, 2023

What this PR does

In some cases, such as an assignment being added and then removed again before the edit job is run, the dashboard may create a page that doesn't exist already, but make it empty. Ideally, this should not happen.

This PR starts handing the specific situation of trying to remove a template from a page that doesn't exist in WikiAssignmentOutput, and add a unit spec for the new build_talk_page_update behavior.

The idea is to make build_talk_page_update return nil when:

  1. the talk page doesn't exist already (get_page_content returns empty string) and,
  2. the update is to remove the assignments (@assignments is an empty collection)

This way update_assignments_for_article won't make the post_whole_page call, avoiding the creation of the empty page.

Prior to this PR, build_talk_page_update returned an empty string. This shall have caused update_assignments_for_article to call post_whole_page with empty content, resulting in the creation of an empty page.

Closes #2412

Screenshots

Open questions and concerns

The mentioned issue says that "edits should never create an empty page". This PR only takes into account one specific case of edits (trying to remove templates when the talk page doesn't exist). It might make sense to keep the issue open, as there could be other instances of edits that result in creating empty pages, which have not been addressed in this PR.

@gabina
Copy link
Member Author

gabina commented Oct 22, 2023

Some concerns:

The issue mentions that edits shouldn't create empty pages and, in such cases, "it should just leave the page as a redlink". I don't fully understand what the "redlink" refers to, so the PR may be missing something.

I didn't set my dev environment to make live edits on Wikipedia for this PR (as I considered it a bit risky), so it could be valuable to add an end-to-end test for this. However, I would need guidance for doing that.

@ragesoss
Copy link
Member

@gabina a "redlink" just means a page that does not exist on Wikipedia. If you link to a page title that does not actually exist, instead of the normal blue color, the link is rendered in red on Wikipedia to indicate that the page has not been created yet.

# article page actually exists, and is not a disambiguation page.
article_content = WikiApi.new(@wiki).get_page_content(@title)
return nil if article_content.blank?
return nil if includes_disambiguation_template?(article_content)

# Empty initial page content implies non-existent page.
talk_page_does_not_exist = initial_page_content.empty?
prevent_removal_of_assignments_from_non_existent_talk_pages(talk_page_does_not_exist) { return }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it will work, but it it's a little confusing to me. It would be clearer, in my opinion, to follow the structure of the other guard statements that are already present: return nil if [some condition].

I'm guessing that you're trying to work around complexity warnings from Rubocop here; if so, I suggest extracting lines 44-46 into a helper method; all that code is basically answering the same question: is it a real, existing article? I'd suggest something like return nil unless is_normal_article?

For the new check, I think a comment along with something like return nil if @assignment.empty? && initial_page_content.empty? will be easy enough to understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly, I was trying to deal with Cyclomatic Complexity while trying to modify the least amount of existing code. I like your suggestions, addressed changes on 89a3501

# Normal article means the article page actually exists, and is not a disambiguation page.
def normal_article?
article_content = WikiApi.new(@wiki).get_page_content(@title)
article_content.present? && !includes_disambiguation_template?(article_content)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to use the negation of or operator, but I feel like and operator is usually easier to understand:
!(article_content.blank? || includes_disambiguation_template?(article_content))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your implementation looks good to me.

@gabina
Copy link
Member Author

gabina commented Oct 23, 2023

@gabina a "redlink" just means a page that does not exist on Wikipedia. If you link to a page title that does not actually exist, instead of the normal blue color, the link is rendered in red on Wikipedia to indicate that the page has not been created yet.

Gotcha, thank you. I understand that I don't need to do any extra check then.

@gabina
Copy link
Member Author

gabina commented Oct 23, 2023

CI failed due to what I believe to be a flaky test

Finished in 33 minutes 37 seconds (files took 13.59 seconds to load)
1827 examples, 1 failure, 17 pending

Failed examples:

rspec ./spec/lib/importers/revision_score_importer_spec.rb:205 # RevisionScoreImporter.update_revision_scores_for_all_wikis imports data and calcuates an article completeness score for available wikis

@gabina gabina requested a review from ragesoss October 23, 2023 21:35
@ragesoss ragesoss merged commit c54e121 into WikiEducationFoundation:master Oct 23, 2023
1 check failed
@gabina gabina deleted the avoid-creating-new-empty-talk-pages-when-removing-a-template-from-a-non-existent-page branch November 3, 2023 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Edits should never create an empty page
2 participants