Handling of pageContentFilterXPath #3

ilopata1 · 2013-05-02T21:21:28Z

I believe that when no nodes in the document match the XPath specified in pageContentFilterXPath, the document should not be processed. To implement this, I would suggest inserting:

if (pageContentFilterNodeList.size() == 0) {
    return false;
}

after the line:

List pageContentFilterNodeList = xPathPageContentFilter.selectNodes(cleanedXmlHtml);
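
To illustrate the suggested guard in isolation, here is a minimal, self-contained sketch. It is not the plugin's actual code: it assumes the JDK's built-in javax.xml.xpath API in place of whatever XPath library the plugin uses, and the class name PageContentFilterSketch and the helper shouldProcess are hypothetical names introduced only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PageContentFilterSketch {

    // Hypothetical helper (not part of the plugin): returns false when no node
    // matches the filter XPath, meaning the document should not be processed.
    public static boolean shouldProcess(String xml, String pageContentFilterXPath)
            throws Exception {
        // Parse the (already cleaned) markup into a DOM document.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Evaluate the filter expression and collect the matching nodes.
        NodeList pageContentFilterNodeList = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(pageContentFilterXPath, doc, XPathConstants.NODESET);

        // The guard proposed in this issue: no matches, so skip the document.
        if (pageContentFilterNodeList.getLength() == 0) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><div id=\"content\">text</div></body></html>";
        System.out.println(shouldProcess(xml, "//div[@id='content']")); // prints "true"
        System.out.println(shouldProcess(xml, "//div[@id='missing']")); // prints "false"
    }
}
```

In the plugin itself the check would sit directly after the selectNodes call quoted above, so the early return happens before any further processing of the page.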

ZakarFin pushed a commit to ZakarFin/nutch-plugins that referenced this issue Jul 4, 2014

Merge pull request ATLANTBH#3 from Drewch/add-custom-value

3b76dfb

Indexing a custom value
