Inconsistent whitespace treatment #35
Comments
@cdsmith Any thoughts on this? I went back a ways, to commit 7779316, to see whether this behavior might have been related to issue snapframework/heist#8. But we already had the current behavior back then, so at the moment I can't think of any reason not to fix this.
Yep, I have thoughts. We have always dropped text at the beginning of a document. Code for the XML case is at https://github.com/snapframework/xmlhtml/blob/master/src/Text/XmlHtml/XML/Parse.hs#L336, where

The original decision was caught up in the complex mess of relationships between several goals:

It turns out it's hard to satisfy all of these constraints at once. Because the xml decl and doctype aren't in the node tree, a naive implementation would run them together at the beginning of the document like

To avoid this, we choose to always write a newline after the xml decl and DOCTYPE lines. But if that were parsed as text in the document, then

I'm not sure this is the right balance. And in fact, we don't even preserve all those nice properties, because it's possible to manually construct a

Maybe there's a better way to reconcile all these nice goals with more special-case rules. Maybe since one of them has been broken for a year and a half in some corner cases anyway, it's time to just give up on the

By the way, changing this is not a backward-compatible change. It's among the more benign sorts of non-backward-compatible changes, in that code that breaks was probably a little broken already; but still, it's possible someone is doing something like just grabbing the first node of a parsed
Leading whitespace treatment in `xmlhtml` is inconsistent.

Consider the following behavior of `parseXML` (the same happens with `parseHTML`), where leading whitespace is always dropped and trailing whitespace is always kept:

See what happens, however, when the “leading whitespace” comes after some element:

These two examples behave differently, and I think the correct behavior is the one from the latter example, since `xmlhtml` should not be discarding the contents of a text node.

So, my proposal is:
Keep the behavior of leading whitespace after an element as it is today.
Keep the behavior of trailing whitespace everywhere as it is today.
Fix top-level text node parsing so that it doesn't discard leading whitespace.
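
For anyone who wants to check the behavior on their installed version, here is a minimal sketch that prints the top-level nodes `parseXML` returns for two inputs: one with whitespace before the first element, and one with whitespace after an element. It assumes the `xmlhtml` and `bytestring` packages are available. The input strings and the `topLevelNodes` helper are illustrative choices made here, not the examples from the original report, and the printed result may differ between `xmlhtml` versions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Text.XmlHtml (Node (..), docContent, parseXML)

-- | Hypothetical helper: parse a fragment and return its top-level nodes.
-- The first argument to 'parseXML' is only a source name used in error messages.
topLevelNodes :: ByteString -> Either String [Node]
topLevelNodes = fmap docContent . parseXML "example"

main :: IO ()
main = mapM_ report inputs
  where
    -- Illustrative inputs (assumptions, not taken from the issue):
    -- whitespace before the first element, and whitespace after an element.
    inputs :: [ByteString]
    inputs = ["  <a/>  ", "<a/>  <b/>  "]

    report :: ByteString -> IO ()
    report src = do
      putStrLn ("input:  " ++ show src)
      putStrLn ("result: " ++ show (topLevelNodes src))
```

If the proposal is adopted, the first input would be expected to yield a text node before the `a` element, in the same way the second input yields one between `a` and `b`.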