XML Extraction failing with SaxParseException #745

Open
rgalv opened this issue May 3, 2022 · 3 comments
Labels: bug (A product defect that needs fixing); P1 (High priority issues to be scheduled in the upcoming release)

Comments


rgalv commented May 3, 2022

Hi,

I have an XML file that is failing with the error below.

Error/s returned during metadata extraction (SaxParseException: java.lang.ClassCastException: class sun.net.www.protocol.file.FileURLConnection cannot be cast to class java.net.HttpURLConnection (sun.net.www.protocol.file.FileURLConnection and java.net.HttpURLConnection are in module java.base of loader 'bootstrap'),Failed to retrieve extractor properties) Agent: JHOVE 1.24.2, XML-hul 1.5.1 , Plugin Version 6.0

Can you please advise on the cause of this error and how we can fix it?
Thanks.
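
For context, a `ClassCastException` like the one quoted above is what Java raises when the result of `URL.openConnection()` is cast to `HttpURLConnection` for a `file:` URL. The sketch below only illustrates that general behaviour; it is not JHOVE's code, and the class name, method name, and example URLs are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical illustration, not taken from JHOVE.
final class ConnectionKindSketch {

    // Returns true only when the connection object is an HTTP(S) connection.
    // Checking with instanceof avoids the ClassCastException that a blind
    // (HttpURLConnection) cast throws for file: URLs.
    static boolean isHttpConnection(String uri) {
        try {
            // openConnection() creates the connection object without
            // contacting the network or reading the file.
            URLConnection conn = URI.create(uri).toURL().openConnection();
            return conn instanceof HttpURLConnection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed example locations; neither resource needs to exist.
        System.out.println(isHttpConnection("file:///tmp/example.xsd"));       // false
        System.out.println(isHttpConnection("http://example.org/schema.xsd")); // true
    }
}
```

Code that needs HTTP-specific features such as redirect handling can branch on this check and treat local files as plain `URLConnection` objects.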

@david-russo
Member

I believe this may be fixed in the next release of JHOVE (v1.26), of which there is currently a release candidate available if you'd like to test and see before the final release.

Member

carlwilson commented Jun 15, 2022

We believe that this is fixed in the recent v1.26 release. @rgalv, would it be possible to test this and let us know whether it's fixed, please? Even better, would it be possible to post the test file here so that we can test it and add it to our regression test suite?

@carlwilson carlwilson added the bug A product defect that needs fixing label Jun 15, 2022
@carlwilson carlwilson self-assigned this Jun 15, 2022
@carlwilson carlwilson added the P1 High priority issues to be scheduled in the upcoming release label Jun 15, 2022
@carlwilson carlwilson added this to the JHOVE 1.28 milestone Oct 18, 2022

leninoc commented Feb 23, 2023

Hi, we are seeing strange behaviour that might be related to this. We have an ALTO XML file that links to a local XSD file. When I run standalone JHOVE 1.24 (XML-hul 1.5.1), I get XML-HUL-3 SaxException, cause: java.lang.ClassCastException. When I run the same file in JHOVE 1.26 (XML-hul 1.5.2), I get an XML-HUL-1 SAXParseException error, plus 67748 (!!!) InfoMessages with this SubMessage:
SubMessage: schema_reference.4: Failed to read schema document '//docstorage2/impdata1/docWORKS_KBNL/schema/alto-1-2.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not xsd:schema.

The SubMessage is correct; it is fine that standalone JHOVE cannot read the local XSD. In Rosetta we have a mechanism that makes this possible for JHOVE plugins (via jhove.conf etc.).

A conclusion of sorts: it looks like the original #745 issue was fixed, but JHOVE 1.26 now complains about a different thing, with a huge number of repeated InfoMessages. This volume of InfoMessages uses a lot of resources in our system.
In Rosetta it seems that the JHOVE 1.26 plugin cannot read the local XSD, unlike the previous JHOVE 1.24 or 1.17 based plugins.
I just thought I would share this; the XML file in question is attached as a zip: 0003.zip
Best,
Jan
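
One way a calling application could keep tens of thousands of identical parser messages from consuming resources is to count them instead of storing each one. The sketch below shows that idea with a standard SAX `ErrorHandler`; it is an assumption for illustration, not how JHOVE or Rosetta handles messages, and the class name `DedupingErrorHandler` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical illustration, not taken from JHOVE or Rosetta.
final class DedupingErrorHandler implements ErrorHandler {

    // Message text mapped to the number of times it was reported,
    // kept in first-seen order.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    private void record(SAXParseException e) {
        counts.merge(String.valueOf(e.getMessage()), 1, Integer::sum);
    }

    @Override
    public void warning(SAXParseException e) {
        record(e);
    }

    @Override
    public void error(SAXParseException e) {
        record(e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        record(e);
        throw e; // fatal errors still stop the parse
    }

    Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        DedupingErrorHandler handler = new DedupingErrorHandler();
        // Simulate the same schema warning being reported many times.
        for (int i = 0; i < 1000; i++) {
            handler.warning(new SAXParseException(
                    "schema_reference.4: Failed to read schema document", null));
        }
        // One entry with a count, rather than 1000 stored messages.
        handler.counts().forEach((msg, n) -> System.out.println(n + " x " + msg));
    }
}
```

A handler like this would be registered with `XMLReader.setErrorHandler` or `DocumentBuilder.setErrorHandler` before parsing.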

@carlwilson carlwilson modified the milestones: JHOVE 1.28, OPF Hackathon 2023 Tasks Jun 21, 2023
@carlwilson carlwilson removed this from the OPF Hackathon 2023 Tasks milestone Mar 6, 2024