Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Whitehall blockquote changes #81

Merged
merged 5 commits into from
Aug 5, 2016
Merged

Support Whitehall blockquote changes #81

merged 5 commits into from
Aug 5, 2016

Conversation

kevindew
Copy link
Member

@kevindew kevindew commented Aug 5, 2016

This PR introduces two features of whitehall related to blockquotes, both are somewhat interesting and potentially contentious.

1. Removing quotes from blockquotes

These are removed as quotes are added automatically as part of styling on the frontend. It will change markdown of > "test" to > test.

As this was a global applying of a function I placed it outside the extension system as it didn't have a regexp to match - this doesn't feel a very nice way to handle it though, so I welcome suggestions on how to improve this.

This also may not work particularly well. I've used the same code as Whitehall so it has full compatibility, but it does look like there are some dodgy ways it can run.

2. adding a class to the last <p> element within a blockquote.

For example:

<blockquote>
  <p>First Paragraph</p>
  <p>Second Paragraph</p>
</blockquote>

becomes:

<blockquote>
  <p>First Paragraph</p>
  <p class="last-child">Second Paragraph</p>
</blockquote>

To achieve this we've used nokogiri as per the implementation on Whitehall. This has had a number of side effects:

  1. Formatting of the HTML returned is different (notably block level elements are on new lines)
  2. HTML encoded UTF-8 characters that nokogiri considers safe are output as UTF-8 rather than HTML encoded. For example: &yen; is returned as ¥
  3. HTML encoding done numerically is converted to symbolically. For example &#62; is returned as &gt;

For this feature I've added a PostProcessor class which has a similar extension system to that of the extensions within the main govspeak class, a key difference however being that these are performed on the entire body rather than a matched section. It has the potential though that this could be changed to match a css selector to scope the nokogiri elements coming in.

# will be formatted to:
# > test
def self.remove(source)
return nil if source.nil?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil is redundant here

This introduces a PostProcessor class which can perform the post
processing tasks.

The first task in this is one to add set the last <p> within a
blockquote to have a class of 'last-child'
Notably nokogiri converts HTML entities into UTF-8 characters where
appropriate and changes numeric entities into symbolic ones. This has
left us with a broken test.
Importing the blockquote extra removal class from Whitehall. The tests
have been maintained as per those in Whitehall.

It does look like there are likely a number of problems within this
function and tests:

- It likely catches items on a single line which would not be rendered
  in a blockquote if there is not a line break before them
- It seems to handle quotes within the blockquote outside of the
  condition here poorly. In a number of cases it looks like tests have
  been planned for them to work properly but wasn't implemented.

Due to this being planned for consistency with Whitehall I'm reluctant
to change behaviour here in case that causes backwards compatibility
issues.
This isn't particularly nice as unlike the other extensions this is
applied to the whole body rather than a matched section. Thus this is
applied outside of the extension system.
Prior to post-processing being introduced to govspeak we had the ability
to specify the entity_output of the HTML. By using nokogiri to format
to format the HTML we lose the ability to do this as nokogiri will
automatically perform conversions to the entities.

In cases where nokogiri encounters a encoded HTML entity that it can
safely output as a utf-8 character it will automatically replace the
entity with the character. For example: `&yen;` is converted to `¥`

In cases where nokogiri encounters a numeric HTML entity it will convert
that into a symbolic one. For example: `&#62;` is converted to `&gt;`

I haven't found any instances where our applications are using the
`:numeric` option and believe it was only introduced as an option as a
side effect of a different change, rather than being something that was
required:
74ea8c7
@dougdroper dougdroper merged commit 5a86381 into master Aug 5, 2016
@dougdroper dougdroper deleted the blockquotes branch August 5, 2016 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants