Selector.root is not an instance of lxml.html.HtmlElement even if parser is html #40
Comments
@kmike, there was some discussion in the past about moving to …
More unscientific benchmark results (Python 3.5.1, OS X, latest lxml): https://gist.github.com/kmike/af647777cef39c3d01071905d176c006. Using lxml.html.HTMLParser seems to carry a 1-5% penalty, based on ~3700 random pages. My vote is to make HTMLParser the default.
Another way to look at it: the additional time required for using HTMLParser is about 0.0001s per web page on average (3700 pages; total parsing time increased by 0.4s).
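The per-page figure follows directly from the two numbers quoted in the comment above. A minimal check of that arithmetic, using only the reported page count and total slowdown (the variable names are illustrative, not taken from the benchmark gist):

```python
# Figures quoted in the thread: ~3700 pages, 0.4 s extra total parsing time.
pages = 3700
extra_total_s = 0.4

# Average extra cost per page, in seconds and in milliseconds.
extra_per_page_s = extra_total_s / pages
extra_per_page_ms = extra_per_page_s * 1000

print(f"{extra_per_page_s:.6f} s per page ({extra_per_page_ms:.3f} ms)")
```

That is roughly a tenth of a millisecond per page, which is consistent with the 0.0001s figure given.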
yeah, let's just do it! 👍
I'm trying to use lxml.html.clean.Cleaner on Selector.root, without parsing the response multiple times.
This doesn't work because Cleaner needs an lxml.html.HtmlElement instance, while Selector.root is always an lxml.etree._Element, so it doesn't have the required .rewrite_links method.
Why is lxml.etree.HTMLParser used for html and not lxml.html.HTMLParser?
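The type difference described here can be reproduced with lxml alone, without parsel. The snippet below is a minimal sketch, not the code from the original report: the sample markup, the example.com prefix and the variable names are made up for illustration. It parses the same string with both parsers and checks which root has .rewrite_links.

```python
from lxml import etree, html

# Hypothetical sample page, used only for this illustration.
PAGE = "<html><body><a href='/about'>About</a></body></html>"

# lxml.etree.HTMLParser builds plain lxml.etree._Element nodes.
plain_root = etree.fromstring(PAGE, parser=etree.HTMLParser())

# lxml.html.HTMLParser builds lxml.html.HtmlElement nodes instead.
html_root = etree.fromstring(PAGE, parser=html.HTMLParser())

plain_is_html = isinstance(plain_root, html.HtmlElement)
rich_is_html = isinstance(html_root, html.HtmlElement)
plain_has_rewrite = hasattr(plain_root, "rewrite_links")
rich_has_rewrite = hasattr(html_root, "rewrite_links")

print(type(plain_root).__name__, plain_is_html, plain_has_rewrite)
print(type(html_root).__name__, rich_is_html, rich_has_rewrite)

# Only the HtmlElement tree supports the HTML-specific helpers.
html_root.rewrite_links(lambda href: "https://example.com" + href)
rewritten = html_root.xpath("//a/@href")
print(rewritten)
```

Both roots still support XPath, since HtmlElement is a subclass of the plain element type, so switching parsers should not affect ordinary selection.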