linkcheck builder: optionally allow HTTP 401 status code hyperlinks to be reported broken #11684

jayaddison · 2023-09-14T14:22:27Z

Feature or Bugfix

Bugfix

Purpose

For legacy reasons, HTTP 401 statuses had been considered acceptable as success during linkchecking in the past; this pull request allows deployments to change that behaviour. This is supported by the fact that Sphinx can now provide authentication credentials to servers during linkchecking.

Detail

The HTTP 401 status code indicates that access to a web resource requires authentication, and that the client has not provided that authentication.
Webservers often configure authentication on a per-directory or per-path basis, so client receipt of an authentication-required status code does not necessarily confirm that a URL is valid.
🗒️ With this changeset, the ability to treat HTTP 401 responses from hyperlinks as broken is opt-in - deployments must set the linkcheck_allow_unauthorized config option to False to consider unauthorized links as unavailable during linkchecking. The default reporting status for unauthorized links will change to broken as part of the major-version upgrade to Sphinx 8.0 in future.

Relates

Resolves linkchecker: update handling of HTTP 401 unauthorized responses #11433.

…r it a broken-link condition

picnixz · 2023-09-15T00:43:27Z

sphinx/builders/linkcheck.py

                if status_code == 401:
-                    return 'working', 'unauthorized', 0
+                    return 'broken', 'unauthorized', 0


I think that the 401 still makes sense if people do not want to expose credentials in their configuration and only assume that something exists (and could return a 401).

As such, I think we could keep the 'working' but change a bit the way that linkcheck_auth is handled:

If we specify "fake" credentials by None, we keep 'working' and do not emit warnings.

If no fake credential for the corresponding URL is specified, we keep the 'working', but we emit a warning saying that the content couldn't be accessed and that users should specify credentials (fake or not).

That way, this also ensures that credentials are not exposed unintentionally and that the contents with 401 errors are detected correctly.

It's more of a follow-up, but what about using an enumeration (or at least mark the possible string statuses we returm using Literal) for the status instead of using 'working', 'broken', etc?

In process_result, we use if-case but using match instead + enumerations may be more elegant and clearer (also we wouldn't have an arbitrary status code being emitted).

I think that the 401 still makes sense if people do not want to expose credentials in their configuration and only assume that something exists (and could return a 401).

Hmm.. point understood, although I think that even when an URL has no linkcheck credentials configured (the "no fake credential" case), an HTTP 401 response should be considered a broken link, because it means that the link is unchecked. This is probably the core of the disruptive/controversial angle I have on this.

If for some reason a user is unable to provide credentials for some hyperlinks included in a documentation set, but wants a success report from linkchecking despite that, I'd argue they should use ~~linkcheck_anchors_ignore~~ linkcheck_ignore to skip the relevant websites.

I'm still thinking about the first point -- using None as a special marker for intentionally-empty credentials. It seems a bit fragile and too-clever; I like simple and hard-to-misunderstand things, because I misunderstand a lot.

It's more of a follow-up, but what about using an enumeration (or at least mark the possible string statuses we returm using Literal) for the status instead of using 'working', 'broken', etc?

👍

In process_result, we use if-case but using match instead + enumerations may be more elegant and clearer (also we wouldn't have an arbitrary status code being emitted).

Mostly agree, although we still support Py3.9 at the moment - so this could be a future enhancement?

Fine by me using linkcheck_anchors_ignore although it's a bit counterintuitive in this case (like I don't really want to ignore).

using None as a special marker for intentionally-empty credentials

It was an example. We could have another configuration value like linkcheck_auth_bypass and put what links we expect to have 401.

Concerning the match, I forgot it was introduced in 3.10.

Actually, when writing my comment, I thought about a much more elegant solution, namely, you could specify the HTTP response code to expect for specific links and treat them as "working".

Fine by me using linkcheck_anchors_ignore although it's a bit counterintuitive in this case (like I don't really want to ignore).

Let me try to make a convicing argument that it makes sense :) (I could be wrong! in which case that'll help too)

The HTTP 401 response is zero-information in terms of hyperlink validity. It does confirm that the client should use auth if it wants to gain more-than-zero information, but that's the only feedback it provides -- and we hide that from users at the moment.

So we're making network requests that fail, and we're not informing the user about it. I think we should start by informing the user -- and then they either choose to find auth to gain greater-than-zero info, or they ignore the links and reduce the outbound traffic (and linkchecking time).

I think we should start by informing the user

Agreed. Actually, by counterintuitive, I meant that using the linkcheck_anchors_ignore was weird to me since I would have used it to suppress errors related to anchors rather than to HTTP response codes.

So, we could keep what you suggested.

Agh, sorry. I should have written linkcheck_ignore instead. I completely failed to notice/parse the word anchor in there.

Now it's ok for me !

:) thank you!

It's more of a follow-up, but what about using an enumeration (or at least mark the possible string statuses we returm using Literal) for the status instead of using 'working', 'broken', etc?
In process_result, we use if-case but using match instead + enumerations may be more elegant and clearer (also we wouldn't have an arbitrary status code being emitted).

I'm working on some refactoring for this at the moment; the CheckResult class relies upon being JSON-serializable at the moment, and I don't feel like introducing custom serialization code, so I've opted to use the StrEnum type -- Enum with string values isn't serializable by default.

Taking that approach adds a dependency on Py3.11, so I'm experimenting with adding match statements at the same time.

I did read up a little bit about using Literal - it looks like it's mostly a type-checking aid? I think I'd prefer the enum route, even if it may take a while before it could land in the codebase. Maybe I'll change my mind as I learn more, though..

AA-Turner · 2023-09-21T02:31:33Z

For this to go into 7.3, it should be opt-in; we can flip to opt-out in 8.0.

A

jayaddison · 2023-09-21T11:03:51Z

Ok, thanks @AA-Turner.

I could wait until 8.0 to merge this, if that's simpler -- or I'll add a config option called linkcheck_permit_unauthorized_responses, default True (existing behaviour).

More concise and/or better config option name ideas welcome.. I'd like to communicate the fact that it's server-communicated auth status that is the condition, and for the way those cases are handled to be fairly clear from the option name.

picnixz · 2023-09-21T11:08:51Z

An alternative: linkcheck_ignore_error_codes containing 401 by default.

Or an alternative name: linkcheck_ignore_unauthorized set to true.

Or: linkcheck_strict_authorized set to false.

jayaddison · 2023-09-21T12:50:36Z

Hmm.. the error-code list approach could be a neat solution, but gives me a sense of too much configurability. Roughly speaking, I think HTTP status codes are well-enough defined to determine success and failure in the case of linkchecking.

The ignore keyword would be nice because it'd visually align with other configuration options and re-uses existing terminology.. but the use case is different here, I think. We do check these unauthorized URLs, so they're not ignored.

strict could also be OK; my concern there is that it infers that there could be a category of checks -- but since this is intended to be a temporary flag until 8.0, I'd prefer to pick something that can't accidentally become repurposed for other things prior to removal.

To shoot down one of my own other alternative ideas as well: linkcheck_allow_unauthorized - concise, re-uses the word allow -- but could be confusing because there is client auth, and it's perfectly fine for the linkchecker to make unauthorized requests in most cases. What we care about is whether the server is saying the request was not authorized.

So.. I suppose linkcheck_allow_unauthorized_responses could be OK. linkcheck_tolerate_unauthorized_responses?

picnixz · 2023-09-21T13:01:35Z

I'm not fond of long configuration value names. And I would prefer using standard terminology (i.e., not "tolerate"). I think we should pick something from: allow, ignore, skip or strict.

We could also have something like linkcheck_ignore_errors = ['unauthorized'] (meaning, you do check but ignore specific errors, and not the link itself). Since we also have a bunch of other errors that turn into warnings, we could also suppress them like that.

jayaddison · 2023-09-21T13:44:59Z

I'm not fond of long configuration value names. And I would prefer using standard terminology (i.e., not "tolerate"). I think we should pick something from: allow, ignore, skip or strict.

Ok, agreed on both of those.

We could also have something like linkcheck_ignore_errors = ['unauthorized'] (meaning, you do check but ignore specific errors, and not the link itself). Since we also have a bunch of other errors that turn into warnings, we could also suppress them like that.

I'd like to keep the scope of this change (and the config option) small, and the management of log-level for some of those errors/warnings seems a bit unrelated.

However: it does make me wonder whether it'd be OK to add a warning message even when HTTP 401 is treated as success?

Regarding the option name: probably enough bikeshedding, so linkcheck_allow_unauthorized is it, I think. I resolved my own concern about the ambiguity with client auth because I suppose technically that refers to authentication and not authorization.

picnixz · 2023-09-21T13:53:43Z

whether it'd be OK to add a warning message even when HTTP 401 is treated as success?

Not good if people are turning warnings into errors.

…setting so that users can opt-in to use the stricter treatment of HTTP 401 unauthorized responses

…false to opt-in to treating HTTP 401 hyperlinks as broken

CHANGES.rst

…ion (pending 7.3 dev changes)

… option

sphinx/builders/linkcheck.py

…broken'

…kcheck_allow_unauthorized' option will be removed in Sphinx 8.0

…ling

jayaddison · 2023-09-23T13:33:35Z

I don't have any further changes/commits planned here at the moment and would be glad for more review when possible.

doc/usage/configuration.rst

sphinx/builders/linkcheck.py

tests/test_build_linkcheck.py

sphinx/builders/linkcheck.py

tests/test_build_linkcheck.py

AA-Turner · 2023-09-26T09:10:53Z

I could wait until 8.0 to merge this

For X.0 releases, I prefer to have a 'clean' release with only removals/changes to defaults -- a new feature I think should be merged in a point release, with an opt-in.

…-specific Sphinx 8.0 deprecation subclass Co-authored-by: picnixz <[email protected]>

…riable private, to prevent other code relying on it, since it is intended for removal soon Co-authored-by: picnixz <[email protected]>

… earlier in the builder lifecycle

…fixture is created Follows-on-from commit 2b90b76.

Follows-on-from commit d9144cc.

jayaddison · 2023-09-26T16:02:38Z

Ok, I've reached another review-pause now (no further commits planned).

The main question I have open / to resolve is how and when we should emit deprecation warnings -- if at all -- and whether my assumptions in this subthread are correct.

…ed' to True instead of None Co-authored-by: picnixz <[email protected]>

jayaddison · 2023-09-27T10:00:55Z

Ok, reading back through the commentary, there's one detail I missed:

For this to go into 7.3, it should be opt-in; we can flip to opt-out in 8.0.

So we should switch the default, not remove the configuration value entirely. I'll update the deprecation notice about this correspondingly -- and as I understand from the other comments, deprecation warnings don't become errors, so that should be innocuous, and we can remove the warning from 8.0 onwards.

… the default value for it will change; update the deprecation message and config documentation accordingly

sphinx/builders/linkcheck.py

…d in-the-wild and legacy behaviour is enabled, emit a notice to inform the deployment about the upcoming behaviour-change

sphinx/builders/linkcheck.py

…ld not be logged as part of the warning message; partly for reasons of log-message deduplication, and partly because the information is already available elsewhere in the linkchecker output

CHANGES.rst

sphinx/builders/linkcheck.py

Co-authored-by: Bénédikt Tran <[email protected]>

jayaddison · 2023-10-03T19:02:02Z

I think this is ready - no further commits/changes planned.

…ling

jayaddison · 2023-10-18T20:26:12Z

@AA-Turner @picnixz any further feedback for these changes? I don't know if it'll have much effect in practice, but if the number of people who have broken hyperlinks and don't realize it yet is greater than zero, I'd like to help with that.

picnixz · 2023-10-21T14:57:49Z

Personally I don't have anything else in mind. However note that I willl be extremely busy for the next months as I am leaning towards the end of my thesis.

…ling

sphinx/builders/linkcheck.py

jayaddison · 2024-01-09T12:47:58Z

Thank you @AA-Turner! (and again @picnixz for your review of this)

I'll offer the GitHub issue reference suggestion in a small follow-up PR.

linkcheck builder: update handling of HTTP 401 status code to conside…

6aee5b8

…r it a broken-link condition

picnixz reviewed Sep 15, 2023

View reviewed changes

jayaddison added 3 commits September 21, 2023 15:02

linkcheck builder: introduce a 'linkcheck_allow_unauthorized' config …

7f68206

…setting so that users can opt-in to use the stricter treatment of HTTP 401 unauthorized responses

Update CHANGES.rst

2d4f4bc

Clarify phrasing in CHANGES.rst: the setting has to be configured to …

a958e86

…false to opt-in to treating HTTP 401 hyperlinks as broken

jayaddison commented Sep 21, 2023

View reviewed changes

CHANGES.rst Outdated Show resolved Hide resolved

jayaddison added 3 commits September 21, 2023 15:12

CHANGES.rst: fixup: relocate the changelog entry to the correct locat…

712f060

…ion (pending 7.3 dev changes)

docs: add documentation for the 'linkcheck_allow_unauthorized' config…

ee7348f

… option

CHANGES.rst: nitpick: undo accidental empty-line removal

c19f7c3

jayaddison commented Sep 21, 2023

View reviewed changes

sphinx/builders/linkcheck.py Outdated Show resolved Hide resolved

jayaddison added 3 commits September 22, 2023 15:30

CHANGES.rst: nitpick: phrasing: 'handle...as broken' -> 'report...as …

a290e3c

…broken'

CHANGES.rst: nitpick: brevity

46d8206

linkcheck builder: add a deprecation warning indicating that the 'lin…

c895ca2

…kcheck_allow_unauthorized' option will be removed in Sphinx 8.0

jayaddison changed the title ~~linkcheck builder: consider HTTP 401 status code to indicate failure~~ linkcheck builder: allow HTTP 401 status code hyperlinks to be reported as broken Sep 22, 2023

jayaddison changed the title ~~linkcheck builder: allow HTTP 401 status code hyperlinks to be reported as broken~~ linkcheck builder: optionally allow HTTP 401 status code hyperlinks to be reported broken Sep 22, 2023

jayaddison and others added 3 commits September 22, 2023 17:18

CHANGES.rst: fixup: add self-attribution

37b50ae

Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand…

bc3c390

…ling

ruff: linting fixups

37634b1

picnixz reviewed Sep 26, 2023

View reviewed changes

jayaddison and others added 2 commits September 26, 2023 16:09

Apply code review suggestion: filter warnings in test case using more…

a81f283

…-specific Sphinx 8.0 deprecation subclass Co-authored-by: picnixz <[email protected]>

Apply code review suggestion: make the 'allow_unauthorized' worker va…

de6ac23

…riable private, to prevent other code relying on it, since it is intended for removal soon Co-authored-by: picnixz <[email protected]>

jayaddison added 3 commits September 26, 2023 16:46

linkcheck builder: relocate 'linkcheck_allow_unauthorized' warning to…

2b90b76

… earlier in the builder lifecycle

fixup: use pytest.mark.filterwarnings to filter warning _before_ app …

d9144cc

…fixture is created Follows-on-from commit 2b90b76.

cleanup: remove unused imports

fa9be8a

Follows-on-from commit d9144cc.

Apply code review suggestion: set default value for 'allow_unauthoriz…

4750345

…ed' to True instead of None Co-authored-by: picnixz <[email protected]>

Updated plan: instead of removing the setting entirely in Sphinx 8.0,…

6d450c2

… the default value for it will change; update the deprecation message and config documentation accordingly

jayaddison commented Sep 27, 2023

View reviewed changes

sphinx/builders/linkcheck.py Outdated Show resolved Hide resolved

Apply code review suggestion: when an HTTP 401 response is encountere…

cdfa342

…d in-the-wild and legacy behaviour is enabled, emit a notice to inform the deployment about the upcoming behaviour-change

jayaddison commented Sep 27, 2023

View reviewed changes

sphinx/builders/linkcheck.py Outdated Show resolved Hide resolved

Code behaviour consensus: the URI of unauthorized HTTP responses shou…

daf4efb

…ld not be logged as part of the warning message; partly for reasons of log-message deduplication, and partly because the information is already available elsewhere in the linkchecker output

picnixz reviewed Oct 1, 2023

View reviewed changes

CHANGES.rst Outdated Show resolved Hide resolved

sphinx/builders/linkcheck.py Show resolved Hide resolved

CHANGES.rst: prefer Pythonic representation of false value

e17581a

Co-authored-by: Bénédikt Tran <[email protected]>

jayaddison added 2 commits October 3, 2023 23:04

Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand…

2070c88

…ling

Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand…

bc97be3

…ling

jayaddison added 4 commits December 11, 2023 17:26

Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand…

a75abcf

…ling

Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand…

5d16bb3

…ling

Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand…

000af8b

…ling

Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand…

bfaf179

…ling

AA-Turner reviewed Jan 9, 2024

View reviewed changes

sphinx/builders/linkcheck.py Outdated Show resolved Hide resolved

Update sphinx/builders/linkcheck.py

16a3695

AA-Turner merged commit 5211c01 into sphinx-doc:master Jan 9, 2024
21 of 22 checks passed

jayaddison deleted the issue-11433/adjust-linkcheck-http-401-handling branch January 9, 2024 12:48

jayaddison mentioned this pull request Jan 9, 2024

linkcheck builder: fixup / review item: expand another GitHub shorthand issue ref #11864

Merged

github-actions bot locked as resolved and limited conversation to collaborators Feb 9, 2024

AA-Turner added this to the 7.3.0 milestone Jul 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

linkcheck builder: optionally allow HTTP 401 status code hyperlinks to be reported broken #11684

linkcheck builder: optionally allow HTTP 401 status code hyperlinks to be reported broken #11684

jayaddison commented Sep 14, 2023 •

edited

Loading

picnixz Sep 15, 2023

jayaddison Sep 15, 2023 •

edited

Loading

picnixz Sep 15, 2023

jayaddison Sep 15, 2023

picnixz Sep 15, 2023

jayaddison Sep 15, 2023

picnixz Sep 15, 2023

jayaddison Sep 15, 2023

jayaddison Sep 15, 2023 •

edited

Loading

AA-Turner commented Sep 21, 2023

jayaddison commented Sep 21, 2023

picnixz commented Sep 21, 2023 •

edited

Loading

jayaddison commented Sep 21, 2023

picnixz commented Sep 21, 2023

jayaddison commented Sep 21, 2023

picnixz commented Sep 21, 2023

jayaddison commented Sep 23, 2023

AA-Turner commented Sep 26, 2023

jayaddison commented Sep 26, 2023

jayaddison commented Sep 27, 2023

jayaddison commented Oct 3, 2023

jayaddison commented Oct 18, 2023

picnixz commented Oct 21, 2023

jayaddison commented Jan 9, 2024

linkcheck builder: optionally allow HTTP 401 status code hyperlinks to be reported broken #11684

linkcheck builder: optionally allow HTTP 401 status code hyperlinks to be reported broken #11684

Conversation

jayaddison commented Sep 14, 2023 • edited Loading

Feature or Bugfix

Purpose

Detail

Relates

picnixz Sep 15, 2023

Choose a reason for hiding this comment

jayaddison Sep 15, 2023 • edited Loading

Choose a reason for hiding this comment

picnixz Sep 15, 2023

Choose a reason for hiding this comment

jayaddison Sep 15, 2023

Choose a reason for hiding this comment

picnixz Sep 15, 2023

Choose a reason for hiding this comment

jayaddison Sep 15, 2023

Choose a reason for hiding this comment

picnixz Sep 15, 2023

Choose a reason for hiding this comment

jayaddison Sep 15, 2023

Choose a reason for hiding this comment

jayaddison Sep 15, 2023 • edited Loading

Choose a reason for hiding this comment

AA-Turner commented Sep 21, 2023

jayaddison commented Sep 21, 2023

picnixz commented Sep 21, 2023 • edited Loading

jayaddison commented Sep 21, 2023

picnixz commented Sep 21, 2023

jayaddison commented Sep 21, 2023

picnixz commented Sep 21, 2023

jayaddison commented Sep 23, 2023

AA-Turner commented Sep 26, 2023

jayaddison commented Sep 26, 2023

jayaddison commented Sep 27, 2023

jayaddison commented Oct 3, 2023

jayaddison commented Oct 18, 2023

picnixz commented Oct 21, 2023

jayaddison commented Jan 9, 2024

jayaddison commented Sep 14, 2023 •

edited

Loading

jayaddison Sep 15, 2023 •

edited

Loading

jayaddison Sep 15, 2023 •

edited

Loading

picnixz commented Sep 21, 2023 •

edited

Loading