How can I use this library to convert a tag-balanced HTML fragment into a node list idiomatically, reliably and 1:1? #53

Open
rulatir opened this issue Mar 20, 2024 · 0 comments

rulatir commented Mar 20, 2024

What is the idiomatic way to use this library to convert a tag-balanced HTML fragment in a string into a node list, in a reliable 1:1 manner that doesn't require checking for multiple corner cases?

$nodeList = what_goes_here("Some text <span>a tag</span> some more text");

// $nodeList should now contain the exact structure [ TEXT, <span> [ TEXT ] </span>, TEXT ]
// as starkly opposed to [ <p> [ TEXT, <span> [ TEXT ] </span>, TEXT ] </p> ]
// which is what I obtain from ->create("Some text <span>a tag</span> some more text")

EDIT: The root problem seems to be that there is no way to set LIBXML_HTML_NOIMPLIED as a global policy. Even if you set the option after creating the document and before loading the contents, various manipulation functions create other document objects internally for processing and do not propagate LIBXML_HTML_NOIMPLIED to them. It looks like they could not do so even in principle, because there is no Document::getLibxmlOptions().
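
For comparison, here is a minimal sketch of the behaviour I am after, written against PHP's bundled DOM extension rather than this library. The helper name `fragment_to_nodes` is hypothetical. It assumes the fragment is tag-balanced and valid inside a `<div>` (a bare `<tr>`, for example, would not survive the HTML parser). It avoids LIBXML_HTML_NOIMPLIED entirely by parsing the fragment inside a known wrapper element and returning that wrapper's children. Whether the resulting nodes can be handed back to this library's own classes is something I have not verified.

```php
<?php
declare(strict_types=1);

/**
 * Parse a tag-balanced HTML fragment into its top-level DOM nodes.
 * Hypothetical helper, not part of any library; uses only ext-dom.
 *
 * @return DOMNode[]
 */
function fragment_to_nodes(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);

    // Supply the full document skeleton ourselves so the parser has nothing
    // to imply, and declare UTF-8 so non-ASCII text is decoded correctly.
    $doc->loadHTML(
        '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body><div>' . $html . '</div></body></html>'
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper <div> is the first child of <body>; its children are the fragment.
    $wrapper = $doc->getElementsByTagName('body')->item(0)->firstChild;

    return iterator_to_array($wrapper->childNodes, false);
}

$nodeList = fragment_to_nodes('Some text <span>a tag</span> some more text');

foreach ($nodeList as $node) {
    // Prints the node name and its text content for each top-level node.
    echo $node->nodeName, ': ', $node->textContent, PHP_EOL;
}
```

The returned nodes still belong to the temporary document, so moving them into another `DOMDocument` would need `importNode()` on the target document first.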
