Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-slug HTML attachments if they are unpublished #5942

Merged
merged 4 commits into from
Jan 12, 2021

Conversation

brucebolt
Copy link
Member

@brucebolt brucebolt commented Jan 11, 2021

It appears we have long standing logic (added in c45922d) that will do the following with HTML attachments:

  • if a document has never been published before, the URL slug will be updated automatically each time the attachment title is changed
  • if a document has previously been published, the URL slug for new (or modified existing) attachments will never be updated after the title has been entered for the first time

This behaviour was added to ensure links to previously published HTML attachments continue functioning even if the attachment's title is changed in a subsequent edition.

However, this causes confusion if a publisher adds a new attachment in a later edition and finds the slug does not get updated (e.g. from a placeholder). Therefore we need to handle both scenarios correctly.

This adds an additional scenario, so placeholder titles can be used in subsequent editions:

  • if a document has never been published before, the URL slug for attachments will be updated automatically each time the attachment title is changed
  • if a document has previously been published, the URL slug for existing attachments will never be updated after the title has been entered for the first time (to ensure users who have bookmarked the attachment can still access it)
  • if a document has previously been published, the URL slug for new attachments will be updated automatically each time the attachment title is changed until the attachment has been published

Zendesk ticket: https://govuk.zendesk.com/agent/tickets/4420527

It appears we have long standing logic (added in
c45922d) that will do the following:
- if a document has never been published before, the URL slug will be
  updated automatically each time the attachment title is changed
- if a document has previously been published, the URL slug for new (or
  modified existing) attachments will never be updated after the title
  has been entered for the first time

This behaviour was added to ensure links to previously published HTML
attachments continue functioning even if the attachment's title is changed
in a subsequent edition.

However, this causes confusion if a publisher adds a new attachment in a
later edition and finds the slug does not get updated (e.g. from a
placeholder). Therefore we need to handle both scenarios correctly.

Also updates descriptions of other related tests to ensure they reflect
what they are actually testing (the reslugging is determined on whether
the document has been published, not the edition).
@brucebolt brucebolt force-pushed the update-new-attachment-slug branch from 0449d78 to a94abc1 Compare January 11, 2021 17:29
@brucebolt brucebolt marked this pull request as ready for review January 12, 2021 09:47
This will be set to true only for HTML attachments that have never
previously been published. It will flag whether we can safely reslug a
HTML attachment when the title changes prior to publication.

Once an attachment has been published, it is no longer safely
resluggable otherwise links to that attachment will break.

We will set the value for all existing attachments to false, to ensure
the existing behaviour is maintained for attachments that have been
created.

The reason we need this column is that when a document has a new edition
created, the attachments get duplicated (with a new created_at time,
etc) and there is no link back to the previous version of that
attachment so there is no other way of telling whether an attachment has
previously been published or not.
Previously we would only reslug an attachment on a never published
document. However, we should be restricting this to only never published
attachments. The status of the document is irrelevant, as the link to
the attachment will never have been used until the point it is itself
published.
When a new edition is created, we duplicate all the attachment
attributes. The copied attachments are no longer safely reslugged as the
URLs have previously been published.
@brucebolt brucebolt force-pushed the update-new-attachment-slug branch from a94abc1 to 8cb4da0 Compare January 12, 2021 10:44
Copy link
Contributor

@AlanGabbianelli AlanGabbianelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks for clarifying in the commit message why we need this extra column. ⭐

@brucebolt brucebolt merged commit 70454ec into master Jan 12, 2021
@brucebolt brucebolt deleted the update-new-attachment-slug branch January 12, 2021 11:06
ChrisBAshton added a commit that referenced this pull request Feb 5, 2025
This PR:

- No longer sets `slug` to `nil` when the locale is non-English.
  Every locale is considered "sluggable".
- This also means we no longer need the `sluggable_locale?`,
  `slug_eligible?` methods and ASCII checks. (The latter has
  caused a number of Zendesk tickets, whereby even English HTML
  Attachments have been given a content-ID based URL, because
  they contain a non-ASCII character like a fancy single quote).
- `sluggable_string` doesn't appear to have been in use, so it
  was deleted. (`sluggable_string` is used in the Document class:
  `friendly_id :sluggable_string`. But the Attachment class uses
  `friendly_id :title`, so overriding the `sluggable_string`
  method had no effect).
- Adds a test for friendly_id's native "uniqueness" feature
  (which we've opted into via `use: :scoped, scope: :attachable`
  on the Attachment class), so in theory we should be protected
  from clashes in the attachment slug namespace. Therefore the
  issue alluded to in #5321 (which reverts the last time this
  was attempted) should not happen again.
- Adds a basic safeguard that the slug-generation results in a
  slug that contains at least 4 a-z or A-Z characters. This is
  to protect against generating a slug like `-`, which could
  happen from sluggifying a title like `{chinese}-{chinese}`.
- That said, if we somehow DO get into a state whereby an
  attachment has what we'd call an invalid slug (<4 azAZ chars),
  we don't want to fix it if the attachment is already live.
  So we check for `safely_resluggable?`, introduced in #5942.

Result:

- Most HTML attachments should now have human-readable slugs.
  Only languages that are non-ASCII and cannot be converted to
  ASCII, such as Chinese, will continue to fall back to the
  content-ID based URLs.

Trello: https://trello.com/c/YGsEG9YV/3418-improve-html-attachment-slug-creation
ChrisBAshton added a commit that referenced this pull request Feb 5, 2025
This PR:

- No longer sets `slug` to `nil` when the locale is non-English.
  Every locale is considered "sluggable".
- This also means we no longer need the `sluggable_locale?`,
  `slug_eligible?` methods and ASCII checks. (The latter has
  caused a number of Zendesk tickets, whereby even English HTML
  Attachments have been given a content-ID based URL, because
  they contain a non-ASCII character like a fancy single quote).
- `sluggable_string` doesn't appear to have been in use, so it
  was deleted. (`sluggable_string` is used in the Document class:
  `friendly_id :sluggable_string`. But the Attachment class uses
  `friendly_id :title`, so overriding the `sluggable_string`
  method had no effect).
- Adds a test for friendly_id's native "uniqueness" feature
  (which we've opted into via `use: :scoped, scope: :attachable`
  on the Attachment class), so in theory we should be protected
  from clashes in the attachment slug namespace. Therefore the
  issue alluded to in #5321 (which reverts the last time this
  was attempted) should not happen again.
- Adds a basic safeguard that the slug-generation results in a
  slug that contains at least 4 a-z or A-Z characters. This is
  to protect against generating a slug like `-`, which could
  happen from sluggifying a title like `{chinese}-{chinese}`.
- That said, if we somehow DO get into a state whereby an
  attachment has what we'd call an invalid slug (<4 azAZ chars),
  we don't want to fix it if the attachment is already live.
  So we check for `safely_resluggable?`, introduced in #5942.

Result:

- Most HTML attachments should now have human-readable slugs.
  Only languages that are non-ASCII and cannot be converted to
  ASCII, such as Chinese, will continue to fall back to the
  content-ID based URLs.

Next steps:

- Now that all Whitehall tables use the UTF8MB4 charset (#9767),
  we could potentially use CJK character sets natively in
  slugs - but it may not always show nicely in the browser.
  Compare `https://example.com/你好` with
  `https://example.com/%E4%BD%A0%E5%A5%BD`.
- The option we'd likely want to go with is conversion to their
  Pinyin representations, e.g. `https://example.com/ni-hao`,
  probably using the `pinyin` or `ruby-pinyin` gem.

Trello: https://trello.com/c/YGsEG9YV/3418-improve-html-attachment-slug-creation
ChrisBAshton added a commit that referenced this pull request Feb 5, 2025
This PR:

- No longer sets `slug` to `nil` when the locale is non-English.
  Every locale is considered "sluggable".
- This also means we no longer need the `sluggable_locale?`,
  `slug_eligible?` methods and ASCII checks. (The latter has
  caused a number of Zendesk tickets, whereby even English HTML
  Attachments have been given a content-ID based URL, because
  they contain a non-ASCII character like a fancy single quote).
- `sluggable_string` doesn't appear to have been in use, so it
  was deleted. (`sluggable_string` is used in the Document class:
  `friendly_id :sluggable_string`. But the Attachment class uses
  `friendly_id :title`, so overriding the `sluggable_string`
  method had no effect).
- Adds a test for friendly_id's native "uniqueness" feature
  (which we've opted into via `use: :scoped, scope: :attachable`
  on the Attachment class), so in theory we should be protected
  from clashes in the attachment slug namespace. Therefore the
  issue alluded to in #5321 (which reverts the last time this
  was attempted) should not happen again.
- Adds a basic safeguard that the slug-generation results in a
  slug that contains at least 4 a-z or A-Z characters. This is
  to protect against generating a slug like `-`, which could
  happen from sluggifying a title like `{chinese}-{chinese}`.
- That said, if we somehow DO get into a state whereby an
  attachment has what we'd call an invalid slug (<4 azAZ chars),
  we don't want to fix it if the attachment is already live.
  So we check for `safely_resluggable?`, introduced in #5942.

Result:

- Most HTML attachments should now have human-readable slugs.
  Only languages that are non-ASCII and cannot be converted to
  ASCII, such as Chinese, will continue to fall back to the
  content-ID based URLs.

Next steps:

- Now that all Whitehall tables use the UTF8MB4 charset (#9767),
  we could potentially use CJK character sets natively in
  slugs - but it may not always show nicely in the browser.
  Compare `https://example.com/你好` with
  `https://example.com/%E4%BD%A0%E5%A5%BD`.
- The option we'd likely want to go with is conversion to their
  Pinyin representations, e.g. `https://example.com/ni-hao`,
  probably using the `pinyin` or `ruby-pinyin` gem.

Trello: https://trello.com/c/YGsEG9YV/3418-improve-html-attachment-slug-creation
ChrisBAshton added a commit that referenced this pull request Feb 5, 2025
This PR:

- No longer sets `slug` to `nil` when the locale is non-English.
  Every locale is considered "sluggable".
- This also means we no longer need the `sluggable_locale?`,
  `slug_eligible?` methods and ASCII checks. (The latter has
  caused a number of Zendesk tickets, whereby even English HTML
  Attachments have been given a content-ID based URL, because
  they contain a non-ASCII character like a fancy single quote).
- `sluggable_string` doesn't appear to have been in use, so it
  was deleted. (`sluggable_string` is used in the Document class:
  `friendly_id :sluggable_string`. But the Attachment class uses
  `friendly_id :title`, so overriding the `sluggable_string`
  method had no effect).
- Adds a test for friendly_id's native "uniqueness" feature
  (which we've opted into via `use: :scoped, scope: :attachable`
  on the Attachment class), so in theory we should be protected
  from clashes in the attachment slug namespace. Therefore the
  issue alluded to in #5321 (which reverts the last time this
  was attempted) should not happen again.
- Adds a basic safeguard that the slug-generation results in a
  slug that contains at least 4 a-z or A-Z characters. This is
  to protect against generating a slug like `-`, which could
  happen from sluggifying a title like `{chinese}-{chinese}`.
- That said, if we somehow DO get into a state whereby an
  attachment has what we'd call an invalid slug (<4 azAZ chars),
  we don't want to fix it if the attachment is already live.
  So we check for `safely_resluggable?`, introduced in #5942.

Result:

- Most HTML attachments should now have human-readable slugs.
  Only languages that are non-ASCII and cannot be converted to
  ASCII, such as Chinese, will continue to fall back to the
  content-ID based URLs.

Next steps:

- Now that all Whitehall tables use the UTF8MB4 charset (#9767),
  we could potentially use CJK character sets natively in
  slugs - but it may not always show nicely in the browser.
  Compare `https://example.com/你好` with
  `https://example.com/%E4%BD%A0%E5%A5%BD`.
- The option we'd likely want to go with is conversion to their
  Pinyin representations, e.g. `https://example.com/ni-hao`,
  probably using the `pinyin` or `ruby-pinyin` gem.

Trello: https://trello.com/c/YGsEG9YV/3418-improve-html-attachment-slug-creation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants