-
-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lambdasoup eats the doctype #32
Comments
It is |
Does the attached commit, which is now in |
Yes, current master fixes the issue, thanks! It seems to have two minor problems though:
None of the two affect me, but could cause some surprise for other users |
This is now in opam in
Do you mean the casing? Or the presence/absence of the doctype?
Lambda Soup would need to distinguish explicitly between full documents and fragments to handle that intelligently. Also, Of course, a user can manipulate the tree with the intention of having a complete document, and not add/maintain an In all cases, there is the escape hatch of using We can solve any further issues when they come up. |
Also non-html5 doctypes, although those are probably even less common than missing html tags.
Agreed. Cheers for the quick fix. |
Guys, I have no choice but to go passive aggressive now. You could consider that after this change, their code will produce nonsensical pages with a duplicate doctype. Hell, you could at least mention the maintainers of those projects in the issue and ask their opinions. Instead you chose to silently break compatibility and leave them without even an option to disable this behaviour or specify their own doctype. @aantron I'm very grateful to you for creating and maintaining lambdasoup. Soupault would never be possible without your work. But for goodness sake, why couldn't this be an optional argument at least? |
@dmbaturin, @copy, what about an approach where Lambda Soup saves the doctype, if present, in the top-level soup node, and emits it on serialization? This makes at least some kind of sense to me, as the soup node represents the whole document. @dmbaturin, would you still want to suppress it? |
@aantron That sounds good to me. |
@aantron There may be valid reasons to supress the original doctype and supply your own. For example, if you are adding HTML5 elements to user-supplied pages, it makes sense to force the doctype to HTML5 because user's original doctype could be XHTML 1.0 for example, and we just purposely broke XHTML compatibility. Something like a |
@dmbaturin, I have two other suggestions:
I'm trying to decide which approach is the least "magical" and least awkward. Ideally, Lambda Soup's behavior would remain simple, and the APIs would also remain simple for people that don't want to bother thinking about the doctype at all. |
The commit linked above stores the doctype in the soup node, if the doctype was present. If the doctype was present and one would like to forcibly drop it, it is possible to do so in this version by selecting the top-level elements from the document and serializing those instead of the document, for example with ("doctype" >:: fun _ ->
assert_equal
("<html></html>" |> parse |> to_string)
"<html><head></head><body></body></html>";
assert_equal
("<!DOCTYPE html><html></html>" |> parse |> to_string)
"<!DOCTYPE html><html><head></head><body></body></html>";
assert_equal
("<!DOCTYPE html><html></html>" |> parse $ "html" |> to_string)
"<html><head></head><body></body></html>"); This isn't obvious, but I decided to defer documenting it until someone asks about it. I also decided to defer adding manipulators for the doctype until they are needed by someone. I believe the above three cases cover all your needs, @copy and @dmbaturin, as I understood them, and are fairly intuitive. Please let me know if that is not the case. |
@aantron I think it's a sensible approach, thanks! When do you plan to make an opam release? |
I'll release this "very soon" (today or tomorrow). In fact, your feedback was the last thing remaining to get before doing it :) |
@aantron The new version will be 0.8.0? |
Probably 0.7.2. Sorry about the delay, I had to switch computers and haven't had time to set up. Planning for next week. |
0.7.2 is now available in opam. |
I'm doing some rewriting similar to postprocess.ml and found that either
Soup.parse
orSoup.to_string
remove the doctype.The online docs created by postprocess.ml don't have a doctype either.
The text was updated successfully, but these errors were encountered: