Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - Add support for MathML 3.0 #10

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

WIP - Add support for MathML 3.0 #10

wants to merge 4 commits into from

Conversation

aiwenar
Copy link

@aiwenar aiwenar commented Mar 14, 2018

While MathML 3.0 is backwards compatible with MathML 2.0 (modulo deprecations), it's actually impossible to switch CNXML 0.7 to MathML 3.0 without breaking existing content, because the RelaxNG schema used for MathML 2.0 is not correct. It defines each (presentation) element as

<element name="pname">
    <ref name="attlist-pelement"/>
    <ref name="PresExpression"/>
</element>

<define name="PresExpression">
    <zeroOrMore>
        <choice>
            <ref name="Presentation"/>
            <ref name="ContInPres"/>
        </choice>
    </zeroOrMore>
</define>

while MathML specification (both version 2.0 and 3.0) specifies exact number of children allowed per element. <msub> for example is defined to have exactly two children: <msub> base subscript </msub>. This means that CNXML 0.7 allows arbitrary invalid MathML. In fact, even the test case valid.cnxml includes invalid MathML:

<m:msub>
    <m:mi>v</m:mi>
    <m:mn>0</m:mn>
    <m:mn>2</m:mn>
</m:msub>

and

<m:msub>
    <m:mn>0</m:mn>
</m:msub>

are both present in that file.

The official RelaxNG schema for MathML 3.0 (the one included in PR) on the other hand is correct and rejects those fragments. This means that just upgrading MathML version will probably break many existing modules.

To circumvent that I created “CNXML 0.8”, which differs from 0.7 just by switching to MathML 3.0. Verification has been changed to use cnxml/schema/rng/any, which then dispatches to either cnxml/schema/rng/0.7 or cnxml/schema/rng/0.8, depending on the value of cnxml-version.

The additional -i switch to jing is there to disable checking for ID uniqueness. Unfortunately MathML RelaxNG includes wildcard selector, which jing rejects as invalid RelaxNG when used without -i. You can read more about this issue in this blog post by James Clark.

@aiwenar aiwenar requested a review from reedstrm March 14, 2018 12:48
@codecov
Copy link

codecov bot commented Mar 14, 2018

Codecov Report

Merging #10 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master    #10   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           5      5           
  Lines          74     74           
=====================================
  Hits           74     74
Impacted Files Coverage Δ
cnxml/jing.py 100% <100%> (ø) ⬆️
cnxml/validation.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 151f3d7...0f2f2a3. Read the comment docs.

Copy link
Contributor

@mmulich mmulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good work. And nice pull request description.

The use of any is a good call. Thanks making this changeset backwards compatible.

(I'll abstain from reviewing the RNG since I'm not deeply familiar with MathML)

@reedstrm
Copy link
Contributor

@aiwenar Any possibility of making this work with ID-uniqueness by "tweaking" the mathml RNG? We deploy our own wrapping of the schema, in any case.

@aiwenar
Copy link
Author

aiwenar commented Apr 19, 2018

I don't know, I'm not actually very fluent in RNG. If we can't then we could always try following Clark's advice and validating IDs independently with Schematron or something other, something like this perhaps: https://github.com/Connexions/cnxml/blob/schematron-ids/cnxml/xml/cnxml/schema/schematron/0.8/cnxml.sch

@reedstrm
Copy link
Contributor

reedstrm commented Jun 5, 2018

This needs reworking to use two schemas, like I did for legacy - this allows the unique-id-test to be left in the validation. The problem results from the schemas including some magic to allow arbitrary child nodes at one point in the schema (inder the tag) and this collides with the all-in-one schema.

@edwoodward edwoodward changed the title Add support for MathML 3.0 WIP - Add support for MathML 3.0 Oct 15, 2018
@openstaxalina openstaxalina added this to the CNX Backlog milestone Oct 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants