Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Try: Use JSON to encode block attribute values #1088

Closed
wants to merge 1 commit into from

Conversation

westonruter
Copy link
Member

Depends on #1087, among many other things, including a PHP parser generator for the PEG.js grammar to be utilized in the do_blocks PHP function.

Fixes #1086.

@nb
Copy link
Member

nb commented Jun 12, 2017

❤️ the idea, I feel this will make things a lot simpler on so many levels:

  • for just a couple of attributes it’s almost as pretty;
  • allows more sophisticated attribute structures;
  • has some type support as it differentiates between numbers/strings/booleans/arrays/key-value pairs;
  • makes parsing trivial (and maybe faster);
  • makes serialization simpler and more accessible for other tools, plenty of JSON encoders there;
  • solves some escaping/quoting problems we already have.

@dmsnell
Copy link
Member

dmsnell commented Jun 12, 2017

thanks @nb for the comments.

for just a couple of attributes it’s almost as pretty;

I'm not sure I agree on this

name="Alice" age=25 vs. {"name":"Alice","age":25}

allows more sophisticated attribute structures;

for sure. of course, simply storing JSON in the existing attributes lets us do this already

has some type support as it differentiates between numbers/strings/booleans/arrays/key-value pairs;
makes serialization simpler and more accessible for other tools, plenty of JSON encoders there;

yes. 👍
(although still easily possible by just storing JSON in an existing named attribute)

makes parsing trivial (and maybe faster);

I doubt this would speed up the parse because we'd be moving from native JSON parsing in the browser (really really fast) to parsing in JavaScript.

solves some escaping/quoting problems we already have.

yeah I think so, through I don't see escaping as a big deal. there should be almost no exceptions on escaping that I can think of (single and double quote characters) so I'm not sure this is a big advantage here. more so it brings some new challenges because while JSON and POJOs (plain-old JavaScript objects) are similar, JSON has more constraints and we'd need to make sure we carefully consider which specifications we want to implement.

thanks!

@nb
Copy link
Member

nb commented Jun 12, 2017

I'm not sure I agree on this

Almost :)

for sure. of course, simply storing JSON in the existing attributes lets us do this already

Adds another branch – you know the long-term effect on logic branching, especially at such a higher level.

types & serialization
yes. 👍
(although still easily possible by just storing JSON in an existing named attribute)

This is one of the strongest points for me – parsing consistency (in baba="1" is this a true or a 1 or the string one). Also, if I want to insert a block via the REST API from a totally different language, I don’t have a consistent and familiar way to do it. Maybe something in the vein of htmlspecialchars in PHP (or HTML escaping in any other language) will do it, but my developer confidence wouldn’t be very high. It’s so much easier to communicate to everybody – “JSON!” and it’s all 100% clear.

@dmsnell
Copy link
Member

dmsnell commented Jun 15, 2017

Proposed by @nb maybe online somewhere or maybe in person…

<!-- wp:core/text {
  description: "content is a valid JSON object or POJO"
  terms: {
    POJO: "Plain-old-javascript object. Like JSON but with fewer double-quotes"
  }
} -->
Just some content
<!-- /wp:core/text -->

It would be nice to know what kind of "gotcha"s we might encounter doing unserialized JSON/JS in the HTML comment. Further, should we include @westonruter's import of the JSON grammar? (at first I thought probably not, but now that we're discussing actual unescaped JSON I could easily see it being valuable)

  • What happens if the JSON parse fails but the block otherwise would be good?
  • What do we need to do to prevent breaking out of the HTML comment? (e.g. { text: 'End a comment with "-->"' } )

@sirreal
Copy link
Member

sirreal commented Jun 15, 2017

Spec

The HTML spec on comments is pretty quick to read:

Comments must start with the four character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS ().

I'll attempt to TL;DR:

  • start with <!--
  • start sequence isn't immediately followed by > or ->, ie <!--> and <!---> are not valid start sequences.
  • not contain --
  • not end with -, meaning ---> is an invalid closing sequence.
  • end with -->

Here's an example HTML to highlight problematic comments which can be pasted into a validator:

<!doctype html>
<html>
<head><title>Title</title></head>
<body>
<!-- VALID -->
<!--
  <Everything/> else is -\- `{ fair: "game" }`.
  Starting and ending with fixed character such as " " is a good practice.
  The remaining problem is two sequential "-" characters.
-->

<!--> INVALID -->
<!---> INVALID -->
<!-- INVALID -- -->
<!-- INVALID --->
</body>
</html>

Suggestions

The peculiarities of opening and closing sequences are easily mitigated by opening comments with something static, this appears to be either or / currently, so there shouldn't be any problems.

That means the only invalid sequence that concerns us is --. I'm not sure what the best way of escaping this is.

A few proposals to consider are:

  • Considering objects and arrays, I don't believe - can appear anywhere outside of a string. That would mean that -- could be escaped as -\-, identical in Javascript strings and legal comment text content
  • The unicode sequence is another alternative \u002D.
"--" === "-\-"          // true
"--" === "\u002D\u002D" // true

@dmsnell
Copy link
Member

dmsnell commented Jun 15, 2017

@sirreal thanks so much for finding and summarizing the spec. looks to me like replacing the hyphens with the Unicode literal would be best. we could run into other troubles with strings of dashes, like -------

though I would propose we use \u{2d} instead of \u002d even though they have the same number of characters. personal style is all.

@nb
Copy link
Member

nb commented Jun 15, 2017

@dmsnell, @sirreal are there any notes from the work done today in person?

@dmsnell
Copy link
Member

dmsnell commented Jun 16, 2017

@nb after talking with you and fighting your very good challenges, we arrived on this transform…

const serializeAttrs = attrs => JSON.stringify( attrs )
	.replace( /--/g, '\\u002d\\u002d' ) // don't break HTML comments
	.replace( /</g, '\\u003c' ) // don't break standard-non-compliant tools
	.replace( />/g, '\\u003e' ) // ibid
	.replace( /&/g, '\\u0026' ) // ibid

@nb
Copy link
Member

nb commented Jun 16, 2017

❤️

And unserializeAttr is still just JSON.parse?

@dmsnell
Copy link
Member

dmsnell commented Jun 16, 2017

@nb check out #1213 where I have implemented this idea

And unserializeAttr is still just JSON.parse?

yep, though it happens in the parser and the app need not change it. this was a choice I made because I didn't want the parser to produce an output which needed further processes sing to be a basic reasonable output

@westonruter
Copy link
Member Author

Closing in favor of #1213. Love the outcome here guys. 👍

@westonruter westonruter deleted the try/json-attribute-value-encoding branch June 16, 2017 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants