Incomplete XML serialization #238

ansFourtyTwo · 2020-08-28T08:36:38Z

I ran into another issue when serializing the data model back to XML. It might be related to the earlier issue #213.

Once again, my data models are generated with the command below:

xsdata https://www.omg.org/spec/ReqIF/20110401/reqif.xsd --package reqif.models --ns-struct

I am still using the example file attached below:
example.reqif.txt

To parse the file and serialize it back to XML, I use the following code:

from pathlib import Path

from xsdata.formats.dataclass.parsers.xml import XmlParser
from xsdata.formats.dataclass.serializers.xml import XmlSerializer
from models.reqif.omg_org_spec_req_if_20110401_reqif import ReqIf

parser = XmlParser()
reqif = parser.from_path(Path('example.reqif.txt'), ReqIf)

serializer = XmlSerializer(pretty_print=True, encoding='ascii')

with open('example.reqif.txt.out', 'w') as f:
    f.write(serializer.render(reqif, parser.namespaces))

I mostly use your tool to parse and write ReqIF files, which are all about Specification Objects and their attributes (more precisely, the values of those attributes). Some of these values contain XHTML for formatted text. In the example file above, I found that at least one attribute is not serialized correctly, and most probably not parsed correctly either.
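One way to find such attributes across a whole file, instead of eyeballing the output, is to compare the text content of every THE-VALUE element before and after the round trip. Below is a minimal sketch using only Python's standard xml.etree.ElementTree (not xsdata); the helper names value_texts and changed_values are hypothetical, and whitespace is normalized on purpose so that pretty-printing differences are not reported as losses.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

XmlInput = Union[str, bytes]


def _local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, e.g. '{ns}THE-VALUE' -> 'THE-VALUE'."""
    return tag.rsplit("}", 1)[-1]


def value_texts(xml: XmlInput, tag: str = "THE-VALUE") -> List[str]:
    """Whitespace-normalized text of every <tag> element, in document order."""
    root = ET.fromstring(xml)
    return [
        # itertext() walks text and tails of all descendants, so text that
        # follows a child such as <xhtml:br/> is included.
        " ".join("".join(elem.itertext()).split())
        for elem in root.iter()
        if _local_name(elem.tag) == tag
    ]


def changed_values(
    before: XmlInput, after: XmlInput, tag: str = "THE-VALUE"
) -> List[Tuple[int, str, str]]:
    """Return (index, original text, round-tripped text) for every mismatch."""
    old, new = value_texts(before, tag), value_texts(after, tag)
    if len(old) != len(new):
        raise ValueError(f"{len(old)} <{tag}> elements before, {len(new)} after")
    return [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]
```

Applied to the files from the report, changed_values(Path('example.reqif.txt').read_bytes(), Path('example.reqif.txt.out').read_bytes()) would be expected to list the affected values; passing bytes lets the parser honor each file's own encoding declaration.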

To understand the problem, have a look at the following XML element in the example:

<SPEC-OBJECT IDENTIFIER="_47051ba0-c21f-4218-b933-d1dd23787ce7" LAST-CHANGE="2020-07-07T15:13:24.960890">
  <VALUES>
    [...]
    <ATTRIBUTE-VALUE-XHTML>
      <!-- This is the part of concern -->
      <THE-VALUE>
        <xhtml:p>Ego-Fahrzeug:  <xhtml:br/>
Das eigene Fahrzeug auf das sich die Funktionsberschreibung bezieht  </xhtml:p>
      </THE-VALUE>
      [...]
    </ATTRIBUTE-VALUE-XHTML>
    [...]
  </VALUES>
</SPEC-OBJECT>

When serialized back, the resulting content looks like this:

[...]
              <THE-VALUE>
                <xhtml:p xml:space="preserve">Ego-Fahrzeug:<xhtml:br xml:space="preserve"/></xhtml:p>
              </THE-VALUE>
[...]

As you can see, the serialized file is missing the second part of the text inside the <xhtml:p> element, i.e. the part "Das eigene Fahrzeug auf das sich die Funktionsberschreibung bezieht " that follows the <xhtml:br/>.
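Where the missing sentence lives becomes clearer with the text/tail model of Python's standard xml.etree.ElementTree (used here only for illustration; xsdata has its own parser): the text before <xhtml:br/> belongs to <xhtml:p> itself, while the sentence after it is stored as the tail of <xhtml:br/>. A minimal sketch, assuming a trimmed copy of the snippet above with an added xmlns:xhtml declaration so that it parses on its own; text_and_tail and render_dropping_tails are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Trimmed copy of the <THE-VALUE> content quoted above. The xmlns:xhtml
# declaration is an assumption added so the fragment parses standalone.
SNIPPET = (
    f'<THE-VALUE xmlns:xhtml="{XHTML_NS}">'
    "<xhtml:p>Ego-Fahrzeug:  <xhtml:br/>\n"
    "Das eigene Fahrzeug auf das sich die Funktionsberschreibung bezieht  </xhtml:p>"
    "</THE-VALUE>"
)


def text_and_tail(snippet: str) -> Tuple[str, str]:
    """Return (<p>'s own text before <br/>, the tail text following <br/>)."""
    root = ET.fromstring(snippet)
    p = root.find(f"{{{XHTML_NS}}}p")
    br = p.find(f"{{{XHTML_NS}}}br")
    return p.text or "", br.tail or ""


def render_dropping_tails(snippet: str) -> str:
    """Re-serialize <p> but discard every child's tail, mimicking the report."""
    root = ET.fromstring(snippet)
    p = root.find(f"{{{XHTML_NS}}}p")
    for child in p:
        child.tail = None  # the sentence after <br/> disappears here
    return ET.tostring(p, encoding="unicode")


if __name__ == "__main__":
    head, tail = text_and_tail(SNIPPET)
    print(repr(head))  # 'Ego-Fahrzeug:  '
    print(repr(tail))
    print(render_dropping_tails(SNIPPET))
```

Discarding the tails yields output without the sentence, matching the loss reported above; it does not reproduce the removed trailing spaces, which is a separate whitespace question.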


tefra · 2020-08-28T09:32:31Z

Thanks for reporting, @ansFourtyTwo. Yeah, it's the same topic, mixed content: this time the parser favors the dedicated b attribute and skips handling its tail content.

Notes: That's my worst-case solution... I want to resolve this in the XML nodes, but that seems tricky.
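The general idea of not splitting mixed content across dedicated fields can be sketched outside xsdata: keep text runs and child elements in one ordered list, so that a child's tail always has a place to live. This is a generic illustration with the standard library's ElementTree, not xsdata's implementation; to_content and from_content are hypothetical names, and from_content rebuilds only the tag, not the parent's attributes.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List, Union

# One item of mixed content: either a text run or a child element.
Node = Union[str, ET.Element]


def to_content(elem: ET.Element) -> List[Node]:
    """Flatten an element's mixed content into a single ordered list.

    Leading text, each child and each child's tail are kept in document
    order, so text that follows a child such as <br/> cannot be skipped.
    """
    content: List[Node] = []
    if elem.text:
        content.append(elem.text)
    for child in elem:
        clone = copy.deepcopy(child)
        clone.tail = None  # the tail is stored as its own list item instead
        content.append(clone)
        if child.tail:
            content.append(child.tail)
    return content


def from_content(tag: str, content: List[Node]) -> ET.Element:
    """Rebuild an element from the ordered list produced by to_content()."""
    elem = ET.Element(tag)
    last = None  # most recently appended child, if any
    for node in content:
        if isinstance(node, str):
            if last is None:
                elem.text = (elem.text or "") + node  # text before any child
            else:
                last.tail = (last.tail or "") + node  # text after a child
        else:
            last = copy.deepcopy(node)
            elem.append(last)
    return elem


if __name__ == "__main__":
    p = ET.fromstring("<p>Ego-Fahrzeug:  <br/>\nDas eigene Fahrzeug  </p>")
    content = to_content(p)
    print(content)  # text run, <br> element, tail text
    rebuilt = from_content(p.tag, content)
    same = ET.tostring(rebuilt, encoding="unicode") == ET.tostring(p, encoding="unicode")
    print(same)  # True
```

The trade-off is that children are no longer reachable through dedicated, typed fields; in exchange, document order, and with it every tail, survives the round trip.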

tefra · 2020-08-28T23:00:56Z

Hey @ansFourtyTwo

The fix is on master and it will be included in the next release 👍

Here is the updated rendered sample:
tefra/xsdata-samples@e975e21

I rely heavily on the W3C test suite for evaluation, and none of its 25k tests reproduced your case, so thanks again! There are a couple more things I still want to address regarding mixed content, such as whitespace preservation, but your feedback has already helped improve wildcard/mixed content support a lot!
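On whitespace preservation: the serialized output above marks <xhtml:p> and <xhtml:br/> with xml:space="preserve". Per XML 1.0 (section 2.10), that attribute applies to the element carrying it and to all of its content unless a descendant overrides it; in ElementTree terms, a child's tail is part of the parent's content. A minimal sketch of a whitespace-collapsing pass that respects this scoping, written with ElementTree; collapse_whitespace is a hypothetical helper, not part of xsdata.

```python
import re
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:space attribute (the xml prefix is predefined).
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def collapse_whitespace(elem: ET.Element, inherited: str = "default") -> None:
    """Collapse whitespace runs to one space, in place, honoring xml:space.

    xml:space applies to the element carrying it and to everything in its
    content, unless a descendant overrides it with its own xml:space.
    """
    mode = elem.get(XML_SPACE, inherited)
    if mode != "preserve" and elem.text:
        elem.text = re.sub(r"\s+", " ", elem.text)
    for child in elem:
        collapse_whitespace(child, mode)
        # A child's tail is part of *this* element's content, so it follows
        # this element's mode rather than the child's.
        if mode != "preserve" and child.tail:
            child.tail = re.sub(r"\s+", " ", child.tail)
```

Under this rule, a serializer that writes xml:space="preserve" is telling downstream tools to leave exactly the whitespace inside that element alone, so the attribute is only useful if the whitespace was kept intact during parsing in the first place.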

ansFourtyTwo · 2020-08-31T11:04:51Z

Hi @tefra,

Your solution now works as expected. Thanks for the fix! Regarding whitespace preservation, I've checked some of the content, and it does seem that parsing and serializing content back and forth strips out some of the whitespace.

For now this is a cosmetic defect for me, but in the end I will rely on proper formatting. Any idea when you will tackle this issue?

Best of luck

tefra · 2020-08-31T13:44:51Z

I've opened #243; it sounds easy...

tefra added the bug label ("Something isn't working") Aug 28, 2020

tefra added a commit that referenced this issue Aug 28, 2020

Fix #238 Remove all other attrs when mix content attr is present

17ae9bc


tefra added a commit that referenced this issue Aug 28, 2020

Fix #238 Switch to Wildcard XmlNode when mixed content var is available

3863eee

tefra added a commit that referenced this issue Aug 28, 2020

Fix #238 XmlSerializer: render mixed content with non generics.

3235c72

tefra added a commit that referenced this issue Aug 28, 2020

Fix #238 XmlSerializer: render mixed content with non generics.

0cc7ae5

tefra closed this as completed in 623ee6d Aug 28, 2020
