Support Whitehall blockquote changes #81

kevindew · 2016-08-05T11:24:32Z

This PR introduces two Whitehall features related to blockquotes. Both are somewhat interesting and potentially contentious.

1. Removing quotes from blockquotes

These are removed because quotes are added automatically as part of styling on the frontend. For example, the markdown `> "test"` is changed to `> test`.

Because this function is applied globally and has no regexp to match, I placed it outside the extension system. This doesn't feel like a very nice way to handle it, so I welcome suggestions on how to improve it.

This also may not work particularly well. I've used the same code as Whitehall so it has full compatibility, but it does look like there are some dodgy ways it can run.
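
To make the intended transformation concrete, here is a minimal sketch of this kind of quote stripping. It is not the code imported from Whitehall: the module name `BlockquoteQuoteStripper` and the regexp are assumptions for illustration, and it only handles a quotation wrapped in a pair of quotes on a single blockquote line.

```ruby
# Hypothetical sketch, not the Whitehall implementation.
module BlockquoteQuoteStripper
  # A blockquote line whose whole content is wrapped in straight or
  # curly double quotes, e.g. `> "test"`.
  QUOTED_LINE = /^>[ \t]*["\u201C](.+?)["\u201D][ \t]*$/

  # Returns the source with wrapping quotes removed from blockquote lines.
  # Lines without a wrapping pair of quotes are left untouched.
  def self.remove(source)
    return if source.nil?

    source.gsub(QUOTED_LINE, '> \1')
  end
end
```

With this sketch, `BlockquoteQuoteStripper.remove('> "test"')` returns `> test`, and a line such as `> test` passes through unchanged.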

2. Adding a class to the last `<p>` element within a blockquote

For example:

<blockquote>
  <p>First Paragraph</p>
  <p>Second Paragraph</p>
</blockquote>

becomes:

<blockquote>
  <p>First Paragraph</p>
  <p class="last-child">Second Paragraph</p>
</blockquote>

To achieve this we've used nokogiri as per the implementation on Whitehall. This has had a number of side effects:

- Formatting of the HTML returned is different (notably, block-level elements are on new lines).
- HTML-encoded UTF-8 characters that nokogiri considers safe are output as UTF-8 rather than HTML encoded. For example: `&yen;` is returned as `¥`.
- HTML encoding done numerically is converted to symbolic form. For example: `&#62;` is returned as `&gt;`.

For this feature I've added a PostProcessor class with an extension system similar to the one in the main govspeak class. The key difference is that post-processor extensions run on the entire body rather than on a matched section. This could later be changed so that each extension matches a CSS selector, scoping the nokogiri elements passed in.
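
The shape of that extension system can be sketched as below. This is an illustration under assumptions rather than the code in this PR: the class name `PostProcessorSketch` and its methods are hypothetical, and to keep the example dependency-free it parses with REXML from Ruby's standard distribution instead of nokogiri, which only works here because the sample HTML is well-formed XML.

```ruby
require "rexml/document"

# Hypothetical sketch of a post-processor with a small extension registry.
# Every extension receives the whole parsed document, not a matched section.
class PostProcessorSketch
  @extensions = []

  class << self
    attr_reader :extensions

    # Register a named block to run against the parsed document.
    def extension(title, &block)
      @extensions << [title, block]
    end

    # Parse the HTML, run every registered extension, and serialise again.
    def process(html)
      # Wrap in a root element so several top-level nodes still parse as XML.
      document = REXML::Document.new("<body>#{html}</body>")
      extensions.each { |_title, block| block.call(document) }
      document.root.children.map(&:to_s).join
    end
  end

  extension("add class to last p of blockquote") do |document|
    REXML::XPath.each(document, "//blockquote") do |blockquote|
      # Only direct <p> children are considered; an existing class
      # attribute on the paragraph would be overwritten.
      last_paragraph = blockquote.get_elements("p").last
      last_paragraph.add_attribute("class", "last-child") if last_paragraph
    end
  end
end
```

Calling `PostProcessorSketch.process` on the blockquote example above adds the `last-child` class to the second paragraph only. The serialised formatting and attribute quoting may differ from what nokogiri produces.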

dougdroper · 2016-08-05T12:03:21Z

lib/govspeak/blockquote_extra_quote_remover.rb

+    # will be formatted to:
+    # > test
+    def self.remove(source)
+      return nil if source.nil?


nil is redundant here

This introduces a PostProcessor class which can perform the post-processing tasks. The first of these tasks sets the last <p> within a blockquote to have a class of 'last-child'.

Notably nokogiri converts HTML entities into UTF-8 characters where appropriate and changes numeric entities into symbolic ones. This has left us with a broken test.

Importing the blockquote extra quote removal class from Whitehall. The tests have been maintained as per those in Whitehall.

It does look like there are likely a number of problems within this function and its tests:

- It likely catches items on a single line which would not be rendered in a blockquote if there is not a line break before them.
- It seems to handle quotes within the blockquote that fall outside of the condition here poorly. In a number of cases it looks like tests were planned for these to work properly but were never implemented.

As this is intended for consistency with Whitehall I'm reluctant to change behaviour here in case that causes backwards compatibility issues.

This isn't particularly nice as unlike the other extensions this is applied to the whole body rather than a matched section. Thus this is applied outside of the extension system.

Prior to post-processing being introduced to govspeak we had the ability to specify the entity_output of the HTML. By using nokogiri to format the HTML we lose the ability to do this, as nokogiri automatically performs its own entity conversions.

Where nokogiri encounters an encoded HTML entity that it can safely output as a UTF-8 character, it replaces the entity with the character. For example: `&yen;` is converted to `¥`.

Where nokogiri encounters a numeric HTML entity, it converts it into a symbolic one. For example: `&#62;` is converted to `&gt;`.

I haven't found any instances where our applications are using the `:numeric` option and believe it was only introduced as an option as a side effect of a different change, rather than being something that was required: 74ea8c7

dougdroper reviewed Aug 5, 2016

kevindew added 5 commits August 5, 2016 14:59

Blockquote post-processing

4f8e229

Update tests for Nokogiri HTML output

a227e2c

Apply quote remover to document body

7a597b7

kevindew force-pushed the blockquotes branch from 59fc869 to 7f5b905 on August 5, 2016 14:05

dougdroper merged commit 5a86381 into master Aug 5, 2016

dougdroper deleted the blockquotes branch August 5, 2016 14:22
