Implement GHA workflow that scans the website file tree for broken links #157

Merged
merged 62 commits into from
Feb 5, 2025

Conversation

kaijli
Contributor

@kaijli kaijli commented Jan 17, 2025

On this branch, @kaijli and @eecavanna added a link checker workflow to the repository's GitHub Actions. Whenever the workflow that assembles the website runs, it also invokes this new link checker workflow.

The new link checker workflow does three things. First, it finds all hyperlinks in the HTML files in the website file tree. Next, it visits each linked URL and checks whether the hyperlink is broken. Finally, it writes a report of the broken links to the GitHub Actions output.
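As an illustration only, the three steps above could be sketched in Python as follows. This is not the checker the workflow actually uses; the names `LinkExtractor`, `find_links`, `find_broken_links`, and `format_report` are hypothetical, and the reachability check is passed in as a function so the sketch itself makes no network requests.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(root):
    """Step 1: map each HTML file under root to the hyperlinks it contains."""
    root = Path(root)
    links_by_file = {}
    for html_file in sorted(root.rglob("*.html")):
        parser = LinkExtractor()
        parser.feed(html_file.read_text(encoding="utf-8"))
        parser.close()
        links_by_file[html_file.relative_to(root).as_posix()] = parser.links
    return links_by_file


def find_broken_links(root, is_reachable):
    """Step 2: keep only the (file, url) pairs whose URL fails the check.

    is_reachable is a caller-supplied function (an assumption of this
    sketch); a real checker would issue an HTTP request here.
    """
    return [
        (file_name, url)
        for file_name, links in find_links(root).items()
        for url in links
        if not is_reachable(url)
    ]


def format_report(broken):
    """Step 3: render the broken links as plain text for the job output."""
    if not broken:
        return "No broken links found."
    lines = [f"{len(broken)} broken link(s) found:"]
    lines += [f"- {file_name}: {url}" for file_name, url in broken]
    return "\n".join(lines)
```

Injecting `is_reachable` keeps the file-scanning logic testable without network access; a real run would substitute a function that requests each URL and inspects the response status.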

image

Also, if the workflow happens to be processing a commit on the main branch and it detects any broken links, it creates a GitHub Issue (like this one) listing the broken links.
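The condition described above (open an Issue only for commits on the main branch, and only when broken links were found) can be sketched as a small Python function. This is a hedged illustration, not the workflow's actual implementation, which is not shown in this conversation. The function name `build_issue`, the Issue title, and the checklist body format are assumptions; `refs/heads/main` is the value GitHub Actions places in the `GITHUB_REF` environment variable for a push to `main`.

```python
def build_issue(git_ref, broken_links):
    """Return (title, body) for a new Issue, or None if none should be opened.

    git_ref is expected to look like the GITHUB_REF value in GitHub Actions,
    e.g. "refs/heads/main". The title and body formats are illustrative only.
    """
    # Only commits on the main branch may produce an Issue.
    if git_ref != "refs/heads/main":
        return None
    # No broken links means there is nothing to report.
    if not broken_links:
        return None
    title = f"Broken links detected ({len(broken_links)})"
    body = "\n".join(f"- [ ] {url}" for url in broken_links)
    return title, body
```

In a workflow step, the first argument would typically come from `os.environ["GITHUB_REF"]`; on pull request runs that value has the form `refs/pull/<number>/merge`, so the function returns `None` there.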

Note: This PR was initially going to introduce both a link checker and a spell checker. The scope has since changed to include only the link checker. The original PR description is below, and the commit history on this branch includes some content that I want to salvage and put on a different branch, in pursuit of getting a spell checker working.


PR for spell check and link check GitHub Actions.

The link checker used performs what I believe is recursive checking through the compiled documents (since Eric mentions in this issue that lychee is not recursive). I found this specific checker by looking through the link checkers listed in this table generated by lychee.

The spell checker used is based on my list in the original issue. I don't remember the reasons for the order of the list, but if this one does not serve us well down the line, there are other options to look into.

Both actions are listed in one yml to be grabbed by the deployment action after the build step. I'm not sure if the move is to keep it as is, or move the build step to its own file to be called by the deploy action. I think it would be good for this check action to be run with every PR and / or deployment because it doesn't hurt.

@eecavanna
Collaborator

Another option we have is to remove the spell checking from this PR and merge in the link checking, and then introduce spell checking as a follow-on task (some day).

@eecavanna
Collaborator

The spell check is happening and its results are being formatted and posted in the "Summary" section of the run details page (it's below the link check report on that same page). Here's a screenshot showing the top part of the spell check report:

image

Source

Outstanding issues:

  • It is only scanning 20 files
  • It is flagging HTML tags as though they are misspelled English words (e.g. DOCTYPE)
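One possible way to address the second issue is to extract only the visible text from each HTML file before handing it to the spell checker. The sketch below uses Python's standard `html.parser` module; the names `TextExtractor` and `visible_text` are hypothetical, and this is not necessarily how the spell checker on this branch will be fixed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring markup and script/style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Declarations such as <!DOCTYPE html> are routed to handle_decl,
        # not here, so they never reach the spell checker.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the human-readable text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The string returned by `visible_text` contains no tag names or attribute values, so a word-level spell checker run on it would not see tokens such as `DOCTYPE`.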

@eecavanna
Collaborator

eecavanna commented Jan 30, 2025

Also, I don't want the link checker (or spell checker) to create Issues unless it is being run on the main branch. This is a reminder for myself (and a note to anyone else that may work on this PR before I do again).

Edit: Done!

image

@eecavanna
Collaborator

Hi @kaijli,

I'd be OK merging the link checking portion of this in already. I'm not comfortable with the spell checking portion and would like to continue refining that (mainly, to get it to scan all the files I'm expecting and to not spell check HTML markup as though it is English text).

Are you OK with me removing the spell checking stuff from this branch (I'd put it on a new, separate branch)?

@kaijli
Contributor Author

kaijli commented Feb 4, 2025

> The more I plan my next steps on this branch, the more I want to separate the build step and deploy steps into separate workflows. I'll do that on main now and then merge main into this branch.

This was something I wanted to suggest but only after I had everything figured out, so, glad we came to the same conclusion!

> I'd be OK merging the link checking portion of this in already. I'm not comfortable with the spell checking portion and would like to continue refining that (mainly, to get it to scan all the files I'm expecting and to not spell check HTML markup as though it is English text).
>
> Are you OK with me removing the spell checking stuff from this branch (I'd put it on a new, separate branch)?

Sounds good to me. I haven't been looking at this much because I didn't want to mess with something and get the hairs crossed, so let me know if I can be of any help!

@eecavanna
Collaborator

Thanks, @kaijli! I'll remove the spell checking stuff from this branch this afternoon (after 2pm PT) and merge the remainder of this branch in.

@eecavanna eecavanna marked this pull request as ready for review February 5, 2025 20:58
@eecavanna eecavanna changed the title Configure GHA to scan for spelling errors and broken links Configure GHA to scan for broken links Feb 5, 2025
@eecavanna eecavanna changed the title Configure GHA to scan for broken links Implement GHA workflow that scans the website file tree for broken links Feb 5, 2025
@eecavanna eecavanna self-requested a review February 5, 2025 20:59
@eecavanna eecavanna merged commit 3722a20 into main Feb 5, 2025
5 checks passed
@eecavanna
Collaborator

I created this new branch (and draft PR) containing the spellchecker-related code: #187

@eecavanna eecavanna deleted the 61-add-link-checker branch February 5, 2025 21:51
Development

Successfully merging this pull request may close these issues.

  • Add spell checker
  • Add Link Checker
2 participants