I believe that when no nodes in the document match the XPath specified in pageContentFilterXPath, the document should not be processed. To implement this, I would suggest inserting:
if (pageContentFilterNodeList.size() == 0) {
    return false;
}
after the line:
List pageContentFilterNodeList = xPathPageContentFilter.selectNodes(cleanedXmlHtml);
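
For illustration, here is a minimal, self-contained sketch of the same guard. It is not the plugin's actual code: the class name `PageContentFilterSketch` and the method `shouldProcess` are hypothetical, and it uses the JDK's built-in `javax.xml.xpath` API in place of whatever XPath library sits behind `xPathPageContentFilter.selectNodes`, so the empty check is `getLength() == 0` on a `NodeList` rather than `size() == 0` on a `List`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the plugin's filter step; names are assumptions.
class PageContentFilterSketch {

    // Returns false when the XPath matches no nodes, meaning the document
    // should be skipped; returns true when at least one node matches.
    static boolean shouldProcess(String cleanedXmlHtml, String pageContentFilterXPath) {
        try {
            // Parse the already-cleaned (well-formed) markup into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(cleanedXmlHtml)));

            // Evaluate the filter expression and collect the matching nodes.
            NodeList pageContentFilterNodeList = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(pageContentFilterXPath, doc, XPathConstants.NODESET);

            // The proposed guard: no matching nodes, so do not process the page.
            if (pageContentFilterNodeList.getLength() == 0) {
                return false;
            }
            return true;
        } catch (Exception e) {
            // Malformed input or an invalid expression is reported to the caller.
            throw new IllegalArgumentException("Could not evaluate page content filter", e);
        }
    }
}
```

With this sketch, a page containing `<div id="content">` would pass a filter of `//div[@id='content']`, while a filter that matches nothing would cause the page to be skipped.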
ZakarFin pushed a commit to ZakarFin/nutch-plugins that referenced this issue on Jul 4, 2014