This repository has been archived by the owner on Apr 11, 2024. It is now read-only.

TIKA-2910 -- remove hard coded HTMLParser handling of xml files in tika-server
tballison committed Aug 16, 2019
1 parent 68c5323 commit a0e87fb
Showing 2 changed files with 7 additions and 2 deletions.
6 changes: 6 additions & 0 deletions CHANGES.txt
@@ -1,3 +1,9 @@
+Release 1.23 - ??/??/???
+
+* NOTE: tika-server no longer hard-codes the HtmlParser to handle
+XML files (TIKA-2910). Users must now configure that behavior
+via a tika-config.xml file.
+
 Release 1.22 - 07/29/2019
 
 * NOTE: Known regression: PDFBOX-4587 -- PDF passwords with codepoints
@@ -37,7 +37,6 @@
 import org.apache.tika.parser.ParserDecorator;
 import org.apache.tika.parser.PasswordProvider;
 import org.apache.tika.parser.html.BoilerpipeContentHandler;
-import org.apache.tika.parser.html.HtmlParser;
 import org.apache.tika.parser.ocr.TesseractOCRConfig;
 import org.apache.tika.parser.pdf.PDFParserConfig;
 import org.apache.tika.sax.BodyContentHandler;
@@ -119,7 +118,7 @@ public static Parser createParser() {
 final Parser parser = new AutoDetectParser(tikaConfig);
 
 Map<MediaType, Parser> parsers = ((AutoDetectParser)parser).getParsers();
-parsers.put(MediaType.APPLICATION_XML, new HtmlParser());
+
 ((AutoDetectParser)parser).setParsers(parsers);
 
 ((AutoDetectParser)parser).setFallback(new Parser() {
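
The CHANGES.txt note above says the old behavior must now be configured through a tika-config.xml file, but the commit does not include one. The fragment below is a minimal sketch of what such a file might look like, not the configuration the project ships. It assumes the standard Tika config layout (`<properties>`/`<parsers>` with `<mime>` and `<mime-exclude>` entries) and that routing `application/xml` to `HtmlParser` is the only override wanted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical tika-config.xml: restores the pre-TIKA-2910 tika-server
     behavior of sending application/xml to the HtmlParser. -->
<properties>
  <parsers>
    <!-- Keep the default parsers for everything except application/xml. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>application/xml</mime-exclude>
    </parser>
    <!-- Route application/xml to the HtmlParser, as the removed
         parsers.put(MediaType.APPLICATION_XML, new HtmlParser()) line did. -->
    <parser class="org.apache.tika.parser.html.HtmlParser">
      <mime>application/xml</mime>
    </parser>
  </parsers>
</properties>
```

A file like this would be passed to tika-server at startup, most likely via its config option (`-c` / `--config`); check the usage output of the release in use before relying on the flag name.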