-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to check links #173
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This looks great 🙌
We'll also want this hooked up to package.json
as check:links
, and to add it to GitHub Actions so we error on invalid links. Include instructions in the README too.
@@ -42,5 +42,8 @@ | |||
"unified": "^10.0.0", | |||
"unist-util-visit": "^4.0.0", | |||
"zx": "^7.2.3" | |||
}, | |||
"dependencies": { | |||
"markdown-link-extractor": "^3.1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better as a dev dependency.
scripts/commands/checkLinks.ts
Outdated
// that they have been altered from the originals. | ||
|
||
import { globby } from "globby"; | ||
const { existsSync, readFileSync } = require('fs'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better is to use the async versions:
documentation/scripts/lib/fs.ts
Line 13 in a1a72fe
import { stat } from 'fs/promises'; |
Async is nice for two reasons:
- Makes more explicit what is doing a side-effecty and often slower thing like reading the file system
- Allows us to run multiple futures at once using
Promise.all()
. It will allow us to check every link in concurrency, which is going to be a lot faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows us to run multiple futures at once using Promise.all(). It will allow us to check every link in concurrency, which is going to be a lot faster.
That's not going to be relevant if you take my below recommendation to pre-read all the valid files, since they will be synchronous operations. That's fine though - it will be really fast.
It's still good to use async here for consistent style.
console.log(`❌ ${this.origin}: Could not find link '${this.value}'`) | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of logging directly in this function, it might be a cleaner user interface to collect all the failing links in the main
function or checkLinksInFile
, and then bundle them together as a single console error, where you list all the failures.
This is a lot of repeated warnings right now, which can be noisy.
See an example:
documentation/scripts/commands/checkMetadata.ts
Lines 55 to 65 in a1a72fe
const notebookErrors = []; | |
const notebookFiles = await globby("docs/**/*.ipynb"); | |
for (const file of notebookFiles) { | |
if (IGNORED_FILES.has(file)) { | |
continue; | |
} | |
const metadata = await readMetadata(file); | |
if (!isValidMetadata(metadata)) { | |
notebookErrors.push(file); | |
} | |
} |
You'd have this function return something like structured data or a string representing what the failure is. Return undefined
if it's okay. Return type is string?
.
scripts/commands/checkLinks.ts
Outdated
let allGood = true | ||
for (let link of links) { | ||
allGood = link.check() && allGood | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return links.all(link => link.check())
If you take the above suggestion to have check()
return the error or undefineed
, rather than directly logging it, then this would be return links.map(link => link.check()).filter(Boolean)
, where .filter(Boolean)
gets rid of the undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With links.every
, the script will exit on the first broken link it finds, whereas the current loop prints all broken links. Is exiting early ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay, never mind. We definitely do want to collect every single link. The .map().filter()
suggestion would do that properly though, if you end up returning the error rather than logging directly.
Co-authored-by: Eric Arellano <[email protected]>
* Reduces use of `existSync` by ~98%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how this is progressing! The Link
class is quite nice.
} | ||
|
||
function checkLinksInFile(filePath: string, filePaths: string[]): boolean { | ||
const source = readFileSync(filePath, {encoding: 'utf8'}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll still want this to be an async read, which will require marking async function checkLinksInFile
. This line will become const source = await readFile(...)
Then, lines 130-133 will become:
const results = await Promise.all(filePaths.map(fp => checkLinksInFile(fp, filePaths))
if (results.some(x => !x)) {
}
That will parallelize things.
const DOCS_ROOT = "./docs" | ||
const CONTENT_FILE_EXTENSIONS = [".md", ".mdx", ".ipynb"] | ||
|
||
class Link { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be super valuable to have some unit tests for Link.resolve()
and Link.check()
. They don't have any side effects anymore and they're now pure functions, so it should be easy to test.
To do so, you'd add a file scripts/commands/linkChecker.ts
or something like that. Copy this code over. Then import it from your commands file. And add a test file similar to dedupeIds.test.ts
.
npm test
to ensure the test works.
return true | ||
} | ||
} | ||
// Check disk for files not in cache (images etc.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was thinking the images should be in the cache too. The only files that we expect you to use locally are images from public/images
and other docs/
files. Otherwise it's an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to merge this now to help out the content team. Frank will make a follow up PR to make some of the suggested refactors
This PR adds a link-checking script to verify internal links work. It should be easy to extend this to check external links, and anchors within internal links. All comments welcome. ### Details Uses `markdown-link-extractor`, which is used by `markdown-link-check` (one of the proposed tools). *** First part of #4
This PR adds a link-checking script to verify internal links work. It should be easy to extend this to check external links, and anchors within internal links. All comments welcome.
Details
Uses
markdown-link-extractor
, which is used bymarkdown-link-check
(one of the proposed tools).First part of #4