Incomplete XML serialization #238

Closed
ansFourtyTwo opened this issue Aug 28, 2020 · 4 comments
Labels
bug Something isn't working

Comments

@ansFourtyTwo

Hi @tefra ,

I ran into another issue when serializing the data model back to XML again. It might be related to former issue #213.

Once again my data models are generated using the command below:

xsdata https://www.omg.org/spec/ReqIF/20110401/reqif.xsd --package reqif.models --ns-struct

I am still using the example file attached below:
example.reqif.txt

To parse the file and serialize it back to XML I use the following code:

from pathlib import Path

from xsdata.formats.dataclass.parsers.xml import XmlParser
from xsdata.formats.dataclass.serializers.xml import XmlSerializer
from models.reqif.omg_org_spec_req_if_20110401_reqif import ReqIf

# Parse the example file into the generated ReqIf data model
parser = XmlParser()
reqif = parser.from_path(Path('example.reqif.txt'), ReqIf)

# Serialize the model back to XML, reusing the namespaces seen while parsing
serializer = XmlSerializer(pretty_print=True, encoding='ascii')

with open('example.reqif.txt.out', 'w') as f:
    f.write(serializer.render(reqif, parser.namespaces))

I mostly use your tool to parse and write ReqIF files, where it is all about Specification Objects and their attributes (the values of the attributes, to be more precise). Some of them contain XHTML for formatted text. For the example file above I found that at least one attribute is not serialized correctly and most probably not parsed correctly either.

To understand the problem, have a look at the following XML element in the example:

<SPEC-OBJECT IDENTIFIER="_47051ba0-c21f-4218-b933-d1dd23787ce7" LAST-CHANGE="2020-07-07T15:13:24.960890">
          <VALUES>
             [...]
            <ATTRIBUTE-VALUE-XHTML>
               <!-- This is the part of concern -->
              <THE-VALUE>
                <xhtml:p>Ego-Fahrzeug:  <xhtml:br/>
Das eigene Fahrzeug auf das sich die Funktionsberschreibung bezieht  </xhtml:p>
              </THE-VALUE>
              [...]
            </ATTRIBUTE-VALUE-XHTML>
            [...]
          </VALUES>
</SPEC-OBJECT>

When serialized back, the corresponding content looks like this:

[...]
              <THE-VALUE>
                <xhtml:p xml:space="preserve">Ego-Fahrzeug:<xhtml:br xml:space="preserve"/></xhtml:p>
              </THE-VALUE>
[...]

As you can see, the serialized file does not contain the second part of the text within the <xhtml:p> tags: the part "Das eigene Fahrzeug auf das sich die Funktionsberschreibung bezieht " after the first <xhtml:br/> is missing.

@tefra
Owner

tefra commented Aug 28, 2020

Thanks for reporting, @ansFourtyTwo. Yes, it's the same topic again, mixed content: this time the parser favors the dedicated b attribute and skips the handling of its tail content.
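
For readers unfamiliar with the term "tail content", here is a minimal sketch using only Python's standard `xml.etree.ElementTree`, not xsdata itself. The snippet is a shortened, hypothetical version of the element from the report, and `describe_mixed_content` is an illustrative helper name. It shows that the text following `<xhtml:br/>` is stored as the `tail` of the `br` element rather than as text of the enclosing `p`, which is the piece of data that went missing.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened, hypothetical version of the element from the report.
SNIPPET = (
    f'<THE-VALUE xmlns:xhtml="{XHTML_NS}">'
    "<xhtml:p>Ego-Fahrzeug:  <xhtml:br/>\n"
    "Das eigene Fahrzeug</xhtml:p>"
    "</THE-VALUE>"
)


def describe_mixed_content(xml_text):
    """Return (p.text, br.tail) for the first xhtml:p in the snippet."""
    root = ET.fromstring(xml_text)
    p = root.find(f"{{{XHTML_NS}}}p")
    br = p.find(f"{{{XHTML_NS}}}br")
    # p.text is the text before the first child element;
    # br.tail is the text between </br> and the next tag.
    return p.text, br.tail


text, tail = describe_mixed_content(SNIPPET)
print(repr(text))
print(repr(tail))
```

A parser that maps `br` to a dedicated field but ignores its `tail` would keep the first string and drop the second, which matches the output shown in the report.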

@tefra tefra added the bug Something isn't working label Aug 28, 2020
tefra added a commit that referenced this issue Aug 28, 2020
Notes:
That's my worst case solution... I want to resolve this in
the xml nodes but that seems tricky
@tefra tefra closed this as completed in 623ee6d Aug 28, 2020
@tefra
Owner

tefra commented Aug 28, 2020

Hey @ansFourtyTwo

The fix is on master and it will be included in the next release 👍

Here is the updated sample rendered.
tefra/xsdata-samples@e975e21

I am relying heavily on the w3c-test-suite for evaluation, and out of the 25k tests none were able to reproduce your case, thanks again! There are a couple more things I still want to address regarding mixed content, such as whitespace preservation, but your feedback has helped to improve wildcard/mixed content support a lot!

@ansFourtyTwo
Author

Hi @tefra ,

your solution works as expected now. Thanks for the fix. Regarding whitespace preservation, I've checked some of the content and indeed it seems that parsing and serializing content back and forth strips out some of the whitespace.
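
One way to tell a purely cosmetic whitespace difference from actually lost content is sketched below. It uses only the standard library, and `text_content` and `compare_text` are hypothetical helper names, not part of xsdata. The idea is to compare the concatenated text of the original and the round-tripped XML, first exactly and then with all whitespace removed.

```python
import xml.etree.ElementTree as ET


def text_content(xml_text):
    """Concatenate all text and tail strings below the root element."""
    return "".join(ET.fromstring(xml_text).itertext())


def compare_text(original, round_tripped):
    """Classify the difference between two XML documents' text content."""
    a, b = text_content(original), text_content(round_tripped)
    if a == b:
        return "identical"
    # Remove every whitespace character before comparing again.
    if "".join(a.split()) == "".join(b.split()):
        return "whitespace-only"
    return "content-differs"


original = "<p>Ego-Fahrzeug:  <br/>\nDas eigene Fahrzeug  </p>"
stripped = "<p>Ego-Fahrzeug:<br/>Das eigene Fahrzeug</p>"
lost_tail = "<p>Ego-Fahrzeug:<br/></p>"

print(compare_text(original, original))
print(compare_text(original, stripped))
print(compare_text(original, lost_tail))
```

This check ignores element structure and attributes on purpose, so it only answers whether text was dropped or merely reflowed.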

Currently, this is a cosmetic defect for me, but in the end I will rely on proper formatting. Any idea when you will tackle this issue?

Best of luck

@tefra
Owner

tefra commented Aug 31, 2020

I've opened #243, it sounds easy...
