-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let extended translators override what XPathExpr class is used #22
Conversation
GenericTranslator offers an excelent way to support custom selectors trough method hooks and allowing to return a *new* XPathExpr from this hooks. The main problem is that returning extended `XPathExpr` instances fail for combiners because `XPathExpr.join()` assume a fixed XPathExpr instance attributes (element, path and condition) to copy from `other` to `self` `XPathExpr.join()` can be extended in subclass but needs that `left` xpath instance to be of the extended class too, and right now we can only control `right` xpath type. The problem can be mitigated by recasting all xpath returned from `GenericTranslator.xpath_element()` that only works because it is the only hook that cast `XPathExpr` instances. The proposed change allow projects extending GenericTranslator to also safely extend `XPathExpr` to correctly support combiners in extended features.
To give some real context to this pull request you must know I am working on adding CSS selectors to Scrapy project based on cssselect right now CSS can't select xpath text nodes or attributes and we want to keep this functionality within the selector query I found GenericTranslator can be extended trough hook (great!) but found some quirks trying to implement it, and example translations at scrapy/tests/test_selector_csselect.py thanks! |
This change looks easy enough, but I’m reluctant to encourage users to hook into the translators and even more so However I am not confident at all in I might still accept this change, but use this kind of hook at your own risks. In particular you might want to have |
I just found pyquery uses cssselect, it extends Do you think the change is good for pyquery too @gawel? |
Well, all of the warnings above also apply to pyquery. They should at least consider requiring Overall I’m not sure how to manage this whole situation. |
hi @SimonSapin, I am more worried about I took a look to |
No problem. I'll just have to check if the class attribute exist and use the evil monkey patch iif it's needed |
@gawel: the change is pretty simple and much better than monkeypatching, Simon is OK to merge it if we take the risk of changes to cssselect breaking our projects code. This is something we already do by extending undocumented/unstable GenericTranslator and XPathExpr classes. @SimonSapin: sorry for insisting, my only concern on using cssselect is if you plan to completely remove its extensibility in future release, just breaking current apis doesn't matter to me. thanks! |
@dangra I’m not planning on removing the extensibility, just keeping it undocumented and reserve the right to break it at every minor version until I’m confident that the API is good enough to be frozen. (By beaking I mean changing it in backward-incompatible ways, not removing features or extensibility.) As to lselect, it’s a two-weekends hack that is still highly experimental and probably broken. I’m not sure yet if it’ll be extensible or not, but in any case it’s completely separated from cssselect. cssselect won’t go away. |
cool 👍 |
Let extended translators override what XPathExpr class is used
See eac05a4 |
Version 0.9.1 ------------- Released on 2013-10-17. * **Backward incompatible change from 0.9**: :meth:`~GenericTranslator.selector_to_xpath` defaults to ignoring pseudo-elements, as it did in 0.8 and previous versions. (:meth:`~GenericTranslator.css_to_xpath` doesn’t change.) * Drop official support for Python 2.4 and 3.1, as testing was becoming difficult. Nothing will break overnight, but future releases may on may not work on these versions. Older releases will remain available on PyPI. Version 0.9 ----------- Released on 2013-10-11. Add parser support for :attr:`functional pseudo-elements <Selector.pseudo_element>`. *Update:* This version accidentally introduced a **backward incompatible** change: :meth:`~GenericTranslator.selector_to_xpath` defaults to rejecting pseudo-elements instead of ignoring them. Version 0.8 ----------- Released on 2013-03-15. Improvements: * `#22 <https://github.com/SimonSapin/cssselect/issues/22>`_ Let extended translators override what XPathExpr class is used * `#19 <https://github.com/SimonSapin/cssselect/issues/19>`_ Use the built-in ``lang()`` XPath function for implementing the ``:lang()`` pseudo-class with XML documents. This is probably faster than ``ancestor-or-self::``. Bug fixes: * `#14 <https://github.com/SimonSapin/cssselect/issues/14>`_ Fix non-ASCII pseudo-classes. (Invalid selector instead of crash.) * `#20 <https://github.com/SimonSapin/cssselect/issues/20>`_ As per the spec, elements containing only whitespace are not considered empty for the ``:empty`` pseudo-class. Version 0.7.1 ------------- Released on 2012-06-14. Code name *remember-to-test-with-tox*. 0.7 broke the parser in Python 2.4 and 2.5; the tests in 2.x. Now all is well again. Also, pseudo-elements are now correctly made lower-case. (They are supposed to be case-insensitive.)
Version 0.9.1 ------------- Released on 2013-10-17. * **Backward incompatible change from 0.9**: :meth:`~GenericTranslator.selector_to_xpath` defaults to ignoring pseudo-elements, as it did in 0.8 and previous versions. (:meth:`~GenericTranslator.css_to_xpath` doesn’t change.) * Drop official support for Python 2.4 and 3.1, as testing was becoming difficult. Nothing will break overnight, but future releases may on may not work on these versions. Older releases will remain available on PyPI. Version 0.9 ----------- Released on 2013-10-11. Add parser support for :attr:`functional pseudo-elements <Selector.pseudo_element>`. *Update:* This version accidentally introduced a **backward incompatible** change: :meth:`~GenericTranslator.selector_to_xpath` defaults to rejecting pseudo-elements instead of ignoring them. Version 0.8 ----------- Released on 2013-03-15. Improvements: * `#22 <https://github.com/SimonSapin/cssselect/issues/22>`_ Let extended translators override what XPathExpr class is used * `#19 <https://github.com/SimonSapin/cssselect/issues/19>`_ Use the built-in ``lang()`` XPath function for implementing the ``:lang()`` pseudo-class with XML documents. This is probably faster than ``ancestor-or-self::``. Bug fixes: * `#14 <https://github.com/SimonSapin/cssselect/issues/14>`_ Fix non-ASCII pseudo-classes. (Invalid selector instead of crash.) * `#20 <https://github.com/SimonSapin/cssselect/issues/20>`_ As per the spec, elements containing only whitespace are not considered empty for the ``:empty`` pseudo-class. Version 0.7.1 ------------- Released on 2012-06-14. Code name *remember-to-test-with-tox*. 0.7 broke the parser in Python 2.4 and 2.5; the tests in 2.x. Now all is well again. Also, pseudo-elements are now correctly made lower-case. (They are supposed to be case-insensitive.)
Version 0.9.1 ------------- Released on 2013-10-17. * **Backward incompatible change from 0.9**: :meth:`~GenericTranslator.selector_to_xpath` defaults to ignoring pseudo-elements, as it did in 0.8 and previous versions. (:meth:`~GenericTranslator.css_to_xpath` doesn’t change.) * Drop official support for Python 2.4 and 3.1, as testing was becoming difficult. Nothing will break overnight, but future releases may on may not work on these versions. Older releases will remain available on PyPI. Version 0.9 ----------- Released on 2013-10-11. Add parser support for :attr:`functional pseudo-elements <Selector.pseudo_element>`. *Update:* This version accidentally introduced a **backward incompatible** change: :meth:`~GenericTranslator.selector_to_xpath` defaults to rejecting pseudo-elements instead of ignoring them. Version 0.8 ----------- Released on 2013-03-15. Improvements: * `#22 <https://github.com/SimonSapin/cssselect/issues/22>`_ Let extended translators override what XPathExpr class is used * `#19 <https://github.com/SimonSapin/cssselect/issues/19>`_ Use the built-in ``lang()`` XPath function for implementing the ``:lang()`` pseudo-class with XML documents. This is probably faster than ``ancestor-or-self::``. Bug fixes: * `#14 <https://github.com/SimonSapin/cssselect/issues/14>`_ Fix non-ASCII pseudo-classes. (Invalid selector instead of crash.) * `#20 <https://github.com/SimonSapin/cssselect/issues/20>`_ As per the spec, elements containing only whitespace are not considered empty for the ``:empty`` pseudo-class. Version 0.7.1 ------------- Released on 2012-06-14. Code name *remember-to-test-with-tox*. 0.7 broke the parser in Python 2.4 and 2.5; the tests in 2.x. Now all is well again. Also, pseudo-elements are now correctly made lower-case. (They are supposed to be case-insensitive.)
Version 0.9.1 ------------- Released on 2013-10-17. * **Backward incompatible change from 0.9**: :meth:`~GenericTranslator.selector_to_xpath` defaults to ignoring pseudo-elements, as it did in 0.8 and previous versions. (:meth:`~GenericTranslator.css_to_xpath` doesn’t change.) * Drop official support for Python 2.4 and 3.1, as testing was becoming difficult. Nothing will break overnight, but future releases may on may not work on these versions. Older releases will remain available on PyPI. Version 0.9 ----------- Released on 2013-10-11. Add parser support for :attr:`functional pseudo-elements <Selector.pseudo_element>`. *Update:* This version accidentally introduced a **backward incompatible** change: :meth:`~GenericTranslator.selector_to_xpath` defaults to rejecting pseudo-elements instead of ignoring them. Version 0.8 ----------- Released on 2013-03-15. Improvements: * `#22 <https://github.com/SimonSapin/cssselect/issues/22>`_ Let extended translators override what XPathExpr class is used * `#19 <https://github.com/SimonSapin/cssselect/issues/19>`_ Use the built-in ``lang()`` XPath function for implementing the ``:lang()`` pseudo-class with XML documents. This is probably faster than ``ancestor-or-self::``. Bug fixes: * `#14 <https://github.com/SimonSapin/cssselect/issues/14>`_ Fix non-ASCII pseudo-classes. (Invalid selector instead of crash.) * `#20 <https://github.com/SimonSapin/cssselect/issues/20>`_ As per the spec, elements containing only whitespace are not considered empty for the ``:empty`` pseudo-class. Version 0.7.1 ------------- Released on 2012-06-14. Code name *remember-to-test-with-tox*. 0.7 broke the parser in Python 2.4 and 2.5; the tests in 2.x. Now all is well again. Also, pseudo-elements are now correctly made lower-case. (They are supposed to be case-insensitive.)
GenericTranslator offers an excelent way to support custom selectors
trough method hooks and allowing to return a new XPathExpr from this
hooks.
The main problem is that returning extended
XPathExpr
instances failfor combiners because
XPathExpr.join()
assume a fixed XPathExprinstance attributes (element, path and condition) to copy from
other
toself
XPathExpr.join()
can be extended in subclass but needs thatleft
xpath instance to be of the extended class too, and right now we can
only control
right
xpath type.The problem can be mitigated by recasting all xpath returned from
GenericTranslator.xpath_element()
that only works because it is theonly hook that cast
XPathExpr
instances.The proposed change allow projects extending GenericTranslator to also
safely extend
XPathExpr
to correctly support combiners in extendedfeatures.