Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 1 & 12: Have Post-History link posts to preserve Discussions-To history #2358

Merged
merged 4 commits into from
Mar 9, 2022

Conversation

CAM-Gerlach
Copy link
Member

@CAM-Gerlach CAM-Gerlach commented Feb 23, 2022

As requested on #2335 by multiple people, discussed and documented in #2266 and as a followup to PR #2346 , where it was also independently requested by another reviewer.

The Discussions-To link, while helpful in providing the latest/current thread (which is hopefully cross-linked to any previous threads) is currently transient, and does not preserve the history of previous discussion threads. This is useful to retain for most of the same reasons linking the current thread is, to document and allow readers to explore and learn from the discussions that led to a PEP, enable the Steering Council, PEP-Delegate and other reviewers to examine the full discussion history, and track such threads for authors and other interested parties.

Therefore, the obvious place to preserve this information is in the Post-History header, and the most straightforward way to do so, with no changes to any existing code, just guidance, is via regular reST inline links on the dates. Inline-linking the existing dates retains the current rendered format, while offering readers direct access to the actual mentioned posts/threads with just a click. This doesn't affect any current linting or processing, since the Post-History field does not have a programmatically-enforced structure (as a number of PEPs didn't strictly follow the format anyway).

As such, this PR makes the relatively modest updates to PEP 1, PEP 12 and the template to formally describe this revised usage. It also incorporates a few straightforward fixes to the content added/modified in #2266 . As the changes are purely technical in nature, this PR should be able to be merged once at least a few PEP editors approve.

@brettcannon
Copy link
Member

I assume you tested this and everything parses and renders fine if you use this?

This also brings up an interesting question of the usefulness of Discussions-To if the post history naturally links to the place where discussions are occurring.

Lastly, and I don't think this matters, but this does make it so other tools can't parse Post-History for date information (at least easily). People will simply need to apply a regex or something over the values to find the dates within the reST markup.

@AA-Turner
Copy link
Member

I've mixed feelings about this, as on the one hand I see the usefulness, but on the other I feel we loose something by not having structured data.

The following proposal is untested and has not checked the RFC 2822 specification:

What if we allow multiple Discussions-To (or Post-History) headers, formatted as dd-mmm-yyyy; <link>. We could then parse this and e.g. display the latest link as discussions-to, and have a formatted post-history with everything, but still allow reasonably easy parsing.

A

@CAM-Gerlach
Copy link
Member Author

I assume you tested this and everything parses and renders fine if you use this?

Yup, I did, and it builds and renders fine with both the legacy and the PEP 676 systems, as well as our linters. Additionally. it is actually implemented on PEP 12 in this PR.

This also brings up an interesting question of the usefulness of Discussions-To if the post history naturally links to the place where discussions are occurring.

Yeah, I'd been thinking about the redundancy there too. However, I concluded that it made sense to keep Discussions-To how it is proposed and implemented as of PR #2346 (particularly with @AA-Turner 's enhancements to the same in the PEP 676 system), because it:

  • Is likely much more discoverable and intuitive to an average reader just interested in where to go to comment on or check the status of a PEP than having to know to dig through a Post-History field, find the latest date and click it.
  • Preserves consistency with previous PEP's use and display of these fields
  • Makes "upgrading" seamless to the reader and straightforward for the author
  • Handles potential cases where the current canonical discussion thread may not be the latest posted
  • Stores the name and/or email of the discussion venue

Lastly, and I don't think this matters, but this does make it so other tools can't parse Post-History for date information (at least easily). People will simply need to apply a regex or something over the values to find the dates within the reST markup.

I'd definitely be more concerned with this if the field's use strictly followed the specification, unfortunately, (as I mention above), many of them don't. Back when I was implementing #1890 to add regex checks to lint the PEP headers, so many PEPs failed even loose validation due to not using the right date format, having extraneous free-text content intermixed with the dates, or not listing dates at all, that we decided it wasn't worthwhile to conform and validate it at all for now. So unfortunately, those tools already have to do that, if they are to work reliably on many (and still not all) PEPs.

However, if we (e.g. @AA-Turner ) come up with a viable better solution (as @AA-Turner did for the Discussions-To field, which is what we implemented), I'd certainly be supportive of it.

What if we allow multiple Discussions-To (or Post-History) headers, formatted as dd-mmm-yyyy; . We could then parse this and e.g. display the latest link as discussions-to, and have a formatted post-history with everything, but still allow reasonably easy parsing.

I like the conceptual aspects that it avoids duplication of the current Discusses-To link and is a more structured format. However, from a "practicality beats purity" perspective, I'm not sure its worth it due to the relatively small real-world benefit versus the added implementation and user complexity. I don't think we should make hypothetical programmatic access easier (and I'm not sure it really does that) at the expense of intuitive usability by PEP authors and editors, or at undue cost of compatibility and consistency with previous PEPs and updating existing drafts to the new format.

It hard-breaks backward compatibility for any existing tools that are reading this data from PEPs, and is arguably harder for other tools to process (negating the very simplicity benefits you highlighted in successfully advocating keeping just the link in Discussions-To), since in order to just get the link of the latest thread, they have to parse the full RFC (2)822 format, and then our custom format inside of that, and then parsing the dates, and then extract the link associated with the most recent one, versus just reading the simple value of the Discussions-To field, or even simpler, just using the regex Discussions-To: (.+). Even to get the post history, it still requires the first two steps, versus just using a regex on the single Post-History field contents (e.g. `([^<]+) <` for the dates, and <([^<]+])>`_ for the links. The only real practical benefit would be authors/editors only have to paste the date once rather than twice.

On the other hand, it:

  • Requires teaching PEP authors a new, bespoke and incompatible format, rather than just normal reST links that they're already using in the PEP bodies,
  • Is a much greater break syntactically (both for humans and machines) from existing practice, being more work and churn to use it in existing draft PEPs
  • Also introduces a new aspect of RFC 2822 semantics (multi-use fields) not previously used on PEP headers, and calls into question why we don't do the same for, e.g., the Author field
  • Is time and effort to discuss, implement and review that we could be spending elsewhere

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of making it easier to find the discussion threads for a PEP, and this seems like a good way to do it. I added a wording suggestion.

(Aside, am I the only one who was confused when I first saw the "Post-History" field? I thought it was the opposite of "pre-history".)

pep-0012.rst Outdated Show resolved Hide resolved
@AA-Turner
Copy link
Member

I like the conceptual aspects that it avoids duplication of the current Discusses-To link and is a more structured format. However, [two paragraphs of criticism elided]

You make some good points. Perhaps I have a tendancy to over-generalise, too. Adding reST syntax to PEP headers would be technically fine, and reasonably easy to teach. What niggles is that the header format was the only thing to not change from plaintext to reST, and if we allow other structured text formats in the future, the headers will likely be the only constant.

I think it is fairly unlikely we will allow said alternate formats, so perhaps the challenge is moot, but I do think there is a frontier being crossed in using reST syntax in headers.

As I said above, I'm supportive of the general idea, but -0 on the specific implementation. (Your criticism of my quickly jotted down idea is well founded, perhaps I shall try and think of a more workable solution!)

A

@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Feb 26, 2022

That's a good point; the problem is we're going to have to invent some non-trivial format, standard or custom, to represent this information, so its convenient to simply use reST rather than making up our own bespoke one and writing our own parser for it. If we do add another PEP format, we'd at most have to write a parser for whatever format anyway, and the reST format I suggested is structured enough that it should at least be no more difficult that your proposal to parse.

In a broader sense, I don't disagree that we might think up a better way to do this in the future, and there isn't a huge rush to merge this immediately if you want a couple days to think about it and try to come up with one. However, I also don't think we should let the great be the enemy of the good and block an existing, already-working way of recording previous discussion threads, which a number of PEP contributors have asked for, on the possibility that we might come up, agree on and implement a better way to do it in the future.

If and when we do, we can always convert existing headers in this proposed format to a new one, as you've already done with the Discusses-To, Resolution and others, and in this case it should be quite doable programmatically with a few regexes, as I outline above, as it contains all the same information in a tightly structured format.

@CAM-Gerlach
Copy link
Member Author

Hey @AA-Turner , its been a few days now—have you come up with any other proposals? If not, I think we should just go ahead and merge this for now, especially considering the SC's approval in python/steering-council#113 , and if we do come up with and implement a better idea at some point in the future, it should be straightforward to convert this (as we have for other fields).

@AA-Turner
Copy link
Member

AA-Turner commented Mar 1, 2022

The other idea I had was a slight abuse of the RFC-2822 'name-addr' (name <email@domain>) syntax, as below:

Post-History: dd-mmm-yyyy <https://mail.python.org/spam>, ...

I have a sketch of parsing this as a proof of concept.

A

@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Mar 1, 2022

I like that lot better than your previous suggestion, since its much simpler, less of a change from the status quo, and has a more familiar syntax, which addresses a lot of my concerns with the previous. Relative to the approach I initially proposed here, it has a few advantages:

  • More visually consistent with other headers
  • Fewer special characters to remember and type
  • Can be more easily processed and enforced by our and others' tooling (without relying on docutils/Sphinx)

On the other hand, it does have a few downsides, none of which are blockers but worth considering:

  • Still bespoke-ish syntax that authors/editors need to learn/remember, rather than "regular" reST links
  • Doesn't actually follow RFC 822/2822 conventions for addresses despite appearing to (AFAIK)
  • We need to develop, test, review and maintain the processing code for it (what's the ETA on that?)

I was initially -0 on it relative to the current one here, mostly due to users (and us) having to learn and remember a new (ish) bespoke format rather than relying on their existing knowledge of reST confusion, but on the other hand basic reST links themselves are hard to remember and get right, and the relative simplicity of the approach, and its consistency with the other headers, is quite attractive. So I'm a cautious +0.5, I think, assuming you're able to implement this easily, and can update this PR accordingly if yours is reviewed and merged, though I'd want to hear what @hugovk and @JelleZijlstra think.

EDIT: I'm back to +/-0 due to not having thought of the fact that any solution you'd propose would presumably work under the new PEP 676 build system, so we'd either be blocked adopting it until if and when PEP 676 is adopted and fully implemented, or make do with rather ugly, space-inefficient text links until then. Whereas reST links work across any reST rendering system, without the custom extensions that are tied to Sphinx anyway.

@AA-Turner
Copy link
Member

what's the ETA on that?

PR at #2375

A

@AA-Turner
Copy link
Member

We would also need to update linting rules for Post-History, whatever route we go down.

A

@CAM-Gerlach
Copy link
Member Author

One big issue I didn't think about before with any of your proposals was if we use it instead of the initial proposed reST approach, , the processing isn't actually applied until if and when PEP 676 is accepted and fully implemented. Until then, if we do format date-links as suggested here, they'll show up in a rather space-consuming and ugly style, rather than as inline links, which is a bit of a regression at least in aesthetics and space efficiency, if not functionality. By contrast, reST links work identically across either (and any other reST based system, with which the processing is currently tied to anyway).

We could use reST links for now, and then use a regex to convert them if and when PEP 676 goes live (since these links are just reST links without the backticks and trailing underscore, otherwise the format is identical). But that does require more churn, for not entirely clear benefit.

We would also need to update linting rules for Post-History, whatever route we go down.

Right now there are none, since as previously discussed in response to your earlier reply, the field does not actually have that strict a format in practice, especially on older PEPs.

@hugovk
Copy link
Member

hugovk commented Mar 2, 2022

Post-History: dd-mmm-yyyy <https://mail.python.org/spam>, ...

This looks okay, people are usually going to copy and paste the template/another PEP to get started, and then copy and paste a date-link for new discussions.

But I would lean towards normal RST links.

The same copy/paste reasoning applies, plus it's the usual way, plus the custom way needs custom linting which likely means more regex ("now you have two problems").

@hugovk
Copy link
Member

hugovk commented Mar 2, 2022

And in general (as it's come up in other modernisation work): where things depend on whether PEP 676 is accepted, it may be better to just wait until that decision is made rather than having to weigh up and think out two parallel solutions.

@CAM-Gerlach
Copy link
Member Author

Where things depend on whether PEP 676 is accepted, it may be better to just wait until that decision is made rather than having to weigh up and think out two parallel solutions.

Yeah, I definitely wouldn't advocate the approach of starting with one and switching to the other, though to note, this aspect was mostly resolved by @AA-Turner just implementing equivalent functionality in the old system (albeit which, at least per my testing, doesn't appear to actually be working as expected yet).

the custom way needs custom linting which likely means more regex ("now you have two problems").

Technically speaking, either way syntax validity is checked by the parser (reST or custom), while the field matching the expected structure is checked by a regex (which is a delta of a few extra constant characters in the regex, offset by reST having the advantage of ` separating the string from the pre-string whitepsace). But by the same token, either way it is linting that validate the semantics of the field and provide a regex template for other tools to use, which nullifies the advantage of a bespoke approach in those aspects.

But I would lean towards normal RST links.

This is where I'm leaning again now myself after some more thought. It adds further complexity to our custom implementation we have to maintain, and another bespoke format PEP authors need to learn and remember (since they have to know reST links anyway for the prose, for better or worse), without much real corresponding benefit. At the end of the day, while the current format may happen to leverage existing reST processing, there isn't a real downside to that in terms of format independence since our custom parser would have to be rewritten for another rendering engine anyway, and if we're independently linting that the field conforms to a specific format, and considering tools reading them, it doesn't make much difference changing the format by a few constant characters other than existing tools that read reST will understand the former automatically.

Therefore, to move forward, I propose we:

  • Merge this PR specifying a simplified, structured subset of the reST syntax for links
  • Drop the bespoke processing part of PR Several PEPs: Normalise Post-History #2375
  • Retain the normalizations in that PR, converting the few that were linkified to the reST-based format, and merge that
  • I follow up with a PR (already prototyped) adding automated linting checks for it, like for the other headers

@CAM-Gerlach
Copy link
Member Author

BTW, whichever format we choose here we should also use for the Resolution header, such that the displayed inline linked text is the date of resolution (acceptance/rejection/withdrawal) rather than the venue on which the resolution occurs. The former is far more useful and meaningful than the latter (I frequently want to reference the former, and have to resort to clicking through to the list or trawling the commit history, while the latter is essentially an implementation detail), This also allows us to list the date PEPs are withdrawn, or accepted/rejected directly, without a linked announcement (like some earlier PEPs in the BDFL era).

Also, I've finished developing and testing the cheap pygrep checker/linter (like we have for the others) for the Post-History field (both reST or faux-RFC 2822-like), which rigorously validates the field's contents to make external tool parsing straightforward, regardless of which format we choose). In addition, I've added and tested linters for the remaining un-validated headers and improved the existing ones, which only requires minor changes in a handful of PEPs, such that all headers are now fully validated and can be parsed with either relatively straightforward regexes or a simple script, independent of output format. I'll open a PR for that once this PR and the normalizations in #2375 are merged.

@AA-Turner
Copy link
Member

I continue to be -1 on reST syntax in the headers, I'll try to outline my reasoning:

specifying a simplified, structured subset of the reST syntax for links

If an argument is that reST syntax is familiar, not allowing the full gamut of link syntax seems to render that moot immediatley. Why can't I do

Post-History: 01-Jan-2000__, 01-Feb-2000__

___: https://example.org/jan
___: https://example.org/feb

or

Post-History: `01-Jan-2000 <python-3000_>`_, `01-Feb-2000 <python-dev_>`_

_python-3000: https://example.org/jan
_python-dev: https://example.org/feb

-- whilst slightly contrived, these are valid reST syntax and arbitrarily banning them means that we are imposing our own rules. As such, if we're specifying our own rules, why choose something that is content-type dependent?

If there is strong consensus in favour of the proposal as it stands now I will withdraw my objections, but I hope you can appreciate my position.

A

P.S.:

whichever format we choose here we should also use for the Resolution header

This is a different change & should be discussed as such, but yes the outcome of this will likely influence that.

@CAM-Gerlach
Copy link
Member Author

-- whilst slightly contrived, these are valid reST syntax and arbitrarily banning them means that we are imposing our own rules. As such, if we're specifying our own rules, why choose something that is content-type dependent?

Yes, we would be specifying a stricter subset of reST syntax, albeit the subset that users are likely to be most familiar with and likely to use anyway, and works with any reST parser including our existing one. The restrictions aren't arbitrary; they ensure the field is self-contained and easily consumable by tooling, same as we'd do with your custom syntax. This is not really the same as making up our own brand new bespoke format that's somewhat (perhaps confusingly) similar to but incompatible with RFC 2822 email addresses, developing and maintaining our own parser for it (which is itself coupled to the Sphinx internals), and expecting users to pick up and remember it.

Furthermore, taking a step back, I also think that all of us have much more important and valuable things to spend our time on than bikeshedding over a few characters of the syntax of a field in the PEP header so that they conform to a particular personal notion of aesthetic/conceptual purity (not to mention developing, testing, reviewing and maintaining that code), and instead just go with the existing status quo option that already works just fine with no clear practical downside. If the current working syntax does end up posing some actual real-world problem someday in the future, we can always change it then, when we actually have a clear idea of the problem and what will actually solve it.

I feel "practicality over purity" and "the great is the enemy of the good" sometimes gets lost in these discussions over trivialities—and I say that as someone guilty of that myself many times.

Copy link
Member

@AA-Turner AA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an improvement over the status quo, and although I still have reservations we can change this in the future should need be.

Do we define "inline-linked" to be the specific form anywhere?

A

pep-0012.rst Outdated Show resolved Hide resolved
@CAM-Gerlach CAM-Gerlach force-pushed the pep-1-12-post-history-upgrade branch from f95e99b to ceec060 Compare March 8, 2022 03:16
@CAM-Gerlach
Copy link
Member Author

Do we define "inline-linked" to be the specific form anywhere?

Nope, good point. I've gone ahead and done so in PEP 12, and linked to the Hyperlink section with more details.

@CAM-Gerlach CAM-Gerlach requested a review from AA-Turner March 8, 2022 03:17
@CAM-Gerlach CAM-Gerlach requested a review from hugovk March 8, 2022 05:35
@CAM-Gerlach
Copy link
Member Author

Looks like this has been open for two weeks and has picked up two approvals, one generally approving comment and comment review that appears to have been addressed. Anyone have further feedback before we merge this and move on with the PRs pending this one's resolution?

Copy link
Member

@hugovk hugovk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrading my generally approving comment into an approval :)

@CAM-Gerlach
Copy link
Member Author

FYI, this is blocking #2375 , which is blocking my already implemented and tested improvements to the checks to validate the format of this header and the Discussions-To, among others, so they can be reliably parsed programmatically. This would potentially have avoided incidents like #2393 , which introduced a new Discussions-To header with an invalid format, which is a regression on #2361 .

@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Mar 9, 2022

Once PEP 676 is Final (and these related current and pending PRs are merged), to avoid the now-somewhat redundant nature of manually filling in the Discussions-To field after this PR, we can just parse the Post-History field and extract the last link (which should be pretty easy due to its strict validated format) and display it (with the existing processing) in the rendered Discussions-To field. This avoids users having to fill in that field themselves, in a different format as in the Post-History (and Resolution, if we change that to match this).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants