Cannot read XML message in UTF-16 encoding #112

Open
leonardehrenfried opened this issue Apr 26, 2023 · 3 comments

@leonardehrenfried
Contributor

When an XML message in UTF-16 arrives from a subscription, anshar throws the following error:

 08:52:44.041 [WARN] (SiriXmlValidator.java:229) Caught exception when parsing
 com.ctc.wstx.exc.WstxParsingException: Declared encoding 'UTF-16' uses 2 bytes per character; but physical encoding appeared to use 1; cannot decode
  at [row,col,system-id]: [1,39,"N/A"]
         at com.ctc.wstx.io.InputBootstrapper.reportXmlProblem(InputBootstrapper.java:489)
         at com.ctc.wstx.io.StreamBootstrapper.verifyEncoding(StreamBootstrapper.java:991)
         at com.ctc.wstx.io.StreamBootstrapper.verifyXmlEncoding(StreamBootstrapper.java:459)
         at com.ctc.wstx.io.StreamBootstrapper.bootstrapInput(StreamBootstrapper.java:175)
         at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:577)
         at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:637)
         at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:651)
         at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:338)
         at no.rutebanken.anshar.routes.validation.SiriXmlValidator.parseXml(SiriXmlValidator.java:193)
         at no.rutebanken.anshar.routes.validation.SiriXmlValidator.parseXml(SiriXmlValidator.java:183)
         at no.rutebanken.anshar.routes.messaging.MessagingRoute.lambda$configure$3(MessagingRoute.java:188)
         at org.apache.camel.support.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:65)
         at org.apache.camel.processor.errorhandler.RedeliveryErrorHandler$SimpleTask.run(RedeliveryErrorHandler.java:477)
         at org.apache.camel.impl.engine.DefaultReactiveExecutor$Worker.schedule(DefaultReactiveExecutor.java:181)
         at org.apache.camel.impl.engine.DefaultReactiveExecutor.scheduleMain(DefaultReactiveExecutor.java:59)
         at org.apache.camel.processor.Pipeline.process(Pipeline.java:165)
         at org.apache.camel.impl.engine.CamelInternalProcessor.process(CamelInternalProcessor.java:392)
         at org.apache.camel.component.jetty.CamelContinuationServlet.doService(CamelContinuationServlet.java:245)
         at org.apache.camel.http.common.CamelServlet.service(CamelServlet.java:130)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:596)
         at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
         at org.eclipse.jetty.server.Server.handle(Server.java:516)
         at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
         at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
         at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
         at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
         at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
         at java.base/java.lang.Thread.run(Unknown Source)
 08:52:44.041 [ERROR] (CamelLogger.java:205) Failed delivery for (MessageId: 5F2578A098A0CEA-0000000000122B53 on ExchangeId: 5F2578A098A0CEA-0000000000122B53). Exhausted after delivery attempt: 1 caught: co>
  at [row,col,system-id]: [1,39,"N/A"]

This happens because the String from the HTTP body is converted to a BufferedInputStream without checking which encoding the XML declaration claims.
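
As a rough illustration of what "checking the encoding" could mean when the body is already a decoded String, the sketch below reads the encoding named in the XML declaration and uses it to produce the bytes, so that the declared and physical encodings agree. `DeclaredEncoding` and its methods are hypothetical names made up for this example; this is not Anshar's code, and the regular expression is a simplification rather than a full XML declaration parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Anshar.
final class DeclaredEncoding {

    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the text.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the charset named in the declaration, or UTF-8 if none is declared or supported.
    static Charset declaredCharset(String xml) {
        Matcher m = ENCODING.matcher(xml);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return StandardCharsets.UTF_8;
    }

    // Encodes the text with the charset its own declaration names.
    static byte[] encode(String xml) {
        return xml.getBytes(declaredCharset(xml));
    }

    // Same as encode(), wrapped as a stream for a byte-oriented parser.
    static InputStream toStream(String xml) {
        return new ByteArrayInputStream(encode(xml));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><Siri/>";
        System.out.println(declaredCharset(xml) + ": " + encode(xml).length + " bytes");
    }
}
```

Handing the parser a `java.io.StringReader` instead of a byte stream would sidestep the byte-level check altogether; which of the two is preferable depends on how the body reaches the validator.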

@leonardehrenfried
Contributor Author

leonardehrenfried commented Apr 26, 2023

I think this would genuinely not work with real UTF-16 input. However, my input data only claims to be UTF-16; I think in reality it's UTF-8.

@lassetyr
Contributor

We have also had several cases with SIRI producers that send XML that claims the data is UTF-16, when it actually is UTF-8.
We have then reported this to the producers, and they have fixed the data/encoding.

In our cases, we have received XML starting with, for example:
<?xml version="1.0" encoding="utf-16"?>

...but which is actually encoded in UTF-8.
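
A mismatch like this can usually be spotted from the first few bytes of the payload, along the lines of the encoding autodetection described in Appendix F of the XML 1.0 specification. The following is a minimal sketch under that assumption; `XmlEncodingSniffer` and `Family` are hypothetical names, and the check deliberately ignores UTF-32 and other rarer cases.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; simplified (UTF-32 and EBCDIC are not handled).
final class XmlEncodingSniffer {

    enum Family { UTF_16_BE, UTF_16_LE, UTF_8_COMPATIBLE, UNKNOWN }

    // Guesses the physical encoding family from a byte order mark or from how "<?xm" is laid out.
    static Family sniff(byte[] b) {
        if (b.length >= 2) {
            if (at(b, 0) == 0xFE && at(b, 1) == 0xFF) return Family.UTF_16_BE; // UTF-16 BOM, big-endian
            if (at(b, 0) == 0xFF && at(b, 1) == 0xFE) return Family.UTF_16_LE; // UTF-16 BOM, little-endian
        }
        if (b.length >= 3 && at(b, 0) == 0xEF && at(b, 1) == 0xBB && at(b, 2) == 0xBF) {
            return Family.UTF_8_COMPATIBLE; // UTF-8 BOM
        }
        if (b.length >= 4) {
            if (at(b, 0) == 0x00 && at(b, 1) == 0x3C && at(b, 2) == 0x00 && at(b, 3) == 0x3F) {
                return Family.UTF_16_BE; // "<?" with two bytes per character, high byte first
            }
            if (at(b, 0) == 0x3C && at(b, 1) == 0x00 && at(b, 2) == 0x3F && at(b, 3) == 0x00) {
                return Family.UTF_16_LE; // "<?" with two bytes per character, low byte first
            }
            if (at(b, 0) == 0x3C && at(b, 1) == 0x3F && at(b, 2) == 0x78 && at(b, 3) == 0x6D) {
                return Family.UTF_8_COMPATIBLE; // "<?xm" with one byte per character
            }
        }
        return Family.UNKNOWN;
    }

    // Reads a byte as an unsigned value.
    private static int at(byte[] b, int i) {
        return b[i] & 0xFF;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><Siri/>";
        // The declaration says UTF-16, but these bytes are single-byte per character.
        System.out.println(sniff(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the sniffed family disagrees with the declared encoding, the message could be logged and reported to the producer, which matches how these cases have been handled so far.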

We have not made an effort to handle these types of encoding issues, but have reported them to the producers, who have then fixed them on their side.

@leonardehrenfried
Contributor Author

I managed to force the XML parser to use UTF-8 when parsing the stream.

I'm unsure whether that's a good idea, but if you want a patch, I have one at hand and can send a PR.
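
The patch itself is not shown in this thread. For readers wondering what forcing UTF-8 can look like with the standard StAX API, here is one possible sketch, not necessarily the approach taken in the patch: the bytes are decoded as UTF-8 up front and the parser receives a `Reader`, so it no longer compares the declared encoding against the raw bytes. `ForcedUtf8Parser` and `rootElementName` are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not the patch mentioned above.
final class ForcedUtf8Parser {

    // Decodes the body as UTF-8 regardless of its XML declaration and returns the root element name.
    static String rootElementName(byte[] body) throws XMLStreamException {
        // Decoding happens here, before the parser sees the data.
        Reader chars = new InputStreamReader(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        XMLStreamReader xml = XMLInputFactory.newFactory().createXMLStreamReader(chars);
        try {
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                    return xml.getLocalName();
                }
            }
            return null; // no element found
        } finally {
            xml.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Mislabelled payload: declares UTF-16 but is physically UTF-8.
        byte[] body = "<?xml version=\"1.0\" encoding=\"utf-16\"?><Siri/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(rootElementName(body));
    }
}
```

The trade-off is the one raised above: a producer that really does send UTF-16 would be decoded incorrectly, so an override like this is probably safer behind a per-subscription setting or a byte-level check than as a global default.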
