-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a validation bug #237
Fix a validation bug #237
Conversation
The HTML Validator was sanitizing Markdown which had been converted to HTML and had post-processors applied. This meant that any unsafe HTML added by a post-processor was causing the source Markdown to be considered invalid. Logically, it seems more sensible for HTML to be validated after conversion from Markdown, but **before** the post-processors have altered the HTML. This way, we only validate user input rather than validating the output of our own post-processors, which we consider to be trusted.
This commit fixes an inconsistency in how the `allowed_image_hosts` option was used when sanitizing generated HTML. The option was used by `Govspeak::HtmlValidator` class. However, that option had no effect when sanitization was performed by `Govspeak::Document` when initialised with the option `sanitize: true`. In practice, this means that generated HTML markup could still contain disallowed image hosts if the source Markdown wasn't validated with `HtmlValidator` first.
Additional thoughts about this codeThe interfaces of some of these classes – particularly the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great investigation work @ollietreend! Code changes all look good to me.
There appears to be a lack of testing around the post processing. I wonder if increasing the test coverage there would have caught this problem and blocked the PR that bumped us to the newer version of govuk_publishing_components
?
9e8e2ff
to
79459bc
Compare
The SafeHtmlValidator checks values for unsafe HTML. For HTML attachments, this validator has been running both on the Govspeak input as well as the generated HTML output. However, it's possible (and perfectly valid) for generated HTML output to contain HTML that would be considered unsafe if entered by a user. When Govspeak is converted to HTML, it's possible for govuk_publishing_components to be included – for example, when adding a [button] to a document – or indeed any other HTML that might be added in post-processing. This commit changes the validation rule so it only applies to Govspeak input provided by the user. Govspeak HTML output is no longer validated. This is safe to do because Govspeak post-processing is trusted code. For more information, see alphagov/govspeak#237 which established this as safe behaviour. [button]: https://github.com/alphagov/govspeak#button
What
This PR fixes a couple of bugs in the way generated HTML markup was being sanitised and validated.
1. Validation before post-processing
Validation is now performed before post-processors have acted upon the generated HTML. This means that disallowed HTML tags and attributes can now be inserted into a document by post-processors – since they are trusted code – but still cannot be inserted into the source Markdown by users.
In other words, the order of processing is now:
This seems to be the most logical way for validation to be performed, although it wasn't actually being done that way in practice.
Previously, post-processing would happen before the validation step:
Resulting in changes in post-processed elements – such as button elements – to be run through the validator.
2. Apply
allowed_image_hosts
consistentlyThe first change resulted in tests covering the
allowed_image_hosts
sanitiser option to fail.It turns out the
allowed_image_hosts
wasn't being carried through to theGovspeak::Document
object which performed the final HTML conversion & sanitisation, meaning this rule wasn't being applied consistently between the validation and final markup produced.Why is this needed?
This was brought to our attention by users of Publishing apps who have recently become unable to save documents containing Start Button elements.
Whenever a document contained the following snippet of Govspeak-flavoured Markdown:
a validation error would be displayed, and the user's changes wouldn't save.
Therefore, users have been unable to create/update any pages that happen to contain a start button.
Why did this happen?
This bug only appeared recently, and it wasn't caused by a change in this gem.
This gem is dependent on the govuk_publishing_components gem to render button elements. A recent change to that gem, which tweaked the HTML markup used for buttons, had the side effect of exposing a flaw in the way validation was being performed in this gem.
The new HTML markup for buttons was triggering a validation error due to it using disallowed HTML attributes – even though this wasn't HTML entered by the user, but was instead generated by the govuk_publishing_components gem.
Visual Changes
None.
Trello: https://trello.com/c/HQ85E1ux/1315-publishers-unable-to-save-drafts-when-there-is-a-start-button