Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate XML with DTD/XML Schema by using xml-model #688

Merged
merged 1 commit into from
May 15, 2020

Conversation

angelozerr
Copy link
Contributor

Validate XML with DTD/XML Schema by using xml-model

See #633

Signed-off-by: azerr [email protected]

@angelozerr angelozerr self-assigned this May 13, 2020
@angelozerr angelozerr added this to the 0.12.0 milestone May 13, 2020
@angelozerr angelozerr force-pushed the xml-model branch 13 times, most recently from 775ab48 to 8171763 Compare May 14, 2020 15:34
@angelozerr angelozerr marked this pull request as ready for review May 14, 2020 15:34
@angelozerr
Copy link
Contributor Author

angelozerr commented May 14, 2020

This PR supports validation with DTD, XML Schema (with or without namespace) with xml-model processing instruction. You cand find 3 XML files in org.eclipse.lemminx/src/test/resources/xml-model folder with those 3 cases.

For instance, create the XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<?xml-model href="http://www.docbook.org/xml/5.0/xsd/docbook.xsd"?>
<book xmlns="http://docbook.org/ns/docbook"> 
	<title />
  	<BAD_ELEMENT />
</book> 

which means that XML file must be validated with XSD http://www.docbook.org/xml/5.0/xsd/docbook.xsd and we must define xmlns="http://docbook.org/ns/docbook" because the XSD defines a targetNamespace="http://docbook.org/ns/docbook" (it's my supposition because specification is not clear about namespace management).

And you should see (after the XML Schema is downloaded in the cache folder):

image

This PR doesn't manage validator for xml-model (ex : href with bad XSD uri, an xml-model which doesn't define a href attribute etc) I will create an issue for that.

@xorye @fbricon @BalduinLandolt please give me feedback. Thanks!

@BalduinLandolt
Copy link
Contributor

BalduinLandolt commented May 14, 2020

I only looked over it briefly, so a real review of the changes is still necvessary... but: Good job!

I'm guessing it will need a unit test? And some more documentation?
But apart from that it pretty much seems to do what it should.

Concerning the namespace, I still think it works the way it is supposed to. Have you tried if you get an error if you change the namespace for an element, that it otherwise legal? It should then produce an error, ideally saying that the elenemt "bad_namespace:good_element" is not legal.
And related to that: so far, the error message doesn't specify namespaces. It will be very confusing if that stays that way. Imagine you have "bad:element" where "good:element" is expected. The message "invalid element: bad:element, expected: good:element" is a lot more helpful than "invalid element: element, expected: element".

Next thing: I'd also test, if an XML with multiple works correctly.
I can generally think of three scenarios where people have multiple schema:

  • different kinds of schema (xsd and dtd; rng and xsd; etc.)
  • a generic schema for interoperability (like tei_all.xsd) and a more strict, project specific schema.
  • different namespaces (?)

And finally: https://www.w3.org/TR/xml-model/ suggests a series of other pseudo-attributes (of which I have seen "type" a lot, all the others not so much...), would there be any point in including them in some way?
If not, that's fine by me. I'm just wondering.
Like, for example, if "title" is specified, it might be useful to make error messages more informative, right?

@angelozerr
Copy link
Contributor Author

I only looked over it briefly, so a real review of the changes is still necvessary... but: Good job!

Thanks :) But please play with this PR to check it works like you expect.

I'm guessing it will need a unit test?

JUnit tests exists with the 3 usecases. See https://github.com/eclipse/lemminx/pull/688/files#diff-8ffaa3160f7438abd9b1067ca631c6df

And some more documentation?

I comment my code. Which documentation do you want?

Concerning the namespace, I still think it works the way it is supposed to. Have you tried if you get an error if you change the namespace for an element, that it otherwise legal?

I created 2 issues for that to improve range of error + code action (quick fix) , see:

If you could be interested to implement one of this issue, please tell me and I will explain more how to do that by giving you some directives.

Next thing: I'd also test, if an XML with multiple works correctly.
I can generally think of three scenarios where people have multiple schema:

different kinds of schema (xsd and dtd; rng and xsd; etc.)
a generic schema for interoperability (like tei_all.xsd) and a more strict, project specific schema.
different namespaces (?)
And finally: https://www.w3.org/TR/xml-model/ suggests a series of other pseudo-attributes (of which I have seen "type" a lot, all the others not so much...), would there be any point in including them in some way?
If not, that's fine by me. I'm just wondering.
Like, for example, if "title" is specified, it might be useful to make error messages more informative, right?

Please create issues for that.

@BalduinLandolt
Copy link
Contributor

I only looked over it briefly, so a real review of the changes is still necvessary... but: Good job!

Thanks :) But please play with this PR to check it works like you expect.

As soon as I have time!
(I'm 10 days before my MA exams, so I really don' have much free time ATM...)

I'm guessing it will need a unit test?

JUnit tests exists with the 3 usecases. See https://github.com/eclipse/lemminx/pull/688/files#diff-8ffaa3160f7438abd9b1067ca631c6df

And some more documentation?

I comment my code. Which documentation do you want?

Woops... I guess I looked over it all troo briefly. Sorry! In that case, even better. :)

Concerning the namespace, I still think it works the way it is supposed to. Have you tried if you get an error if you change the namespace for an element, that it otherwise legal?

I created 2 issues for that to improve range of error + code action (quick fix) , see:

* #703

* #704

If you could be interested to implement one of this issue, please tell me and I will explain more how to do that by giving you some directives.

Again: as soon as I have more time... :)

Next thing: I'd also test, if an XML with multiple works correctly.
I can generally think of three scenarios where people have multiple schema:

different kinds of schema (xsd and dtd; rng and xsd; etc.)
a generic schema for interoperability (like tei_all.xsd) and a more strict, project specific schema.
different namespaces (?)
And finally: https://www.w3.org/TR/xml-model/ suggests a series of other pseudo-attributes (of which I have seen "type" a lot, all the others not so much...), would there be any point in including them in some way?
If not, that's fine by me. I'm just wondering.
Like, for example, if "title" is specified, it might be useful to make error messages more informative, right?

Please create issues for that.

And again: as soon as I have time. ^^
(that is, of course, if no one is faster.)

@xorye
Copy link

xorye commented May 14, 2020

Might not be an issue with this PR specifically, but any idea why ttp://www.springframework.org/schema/beans"] is in the list of expected elements?

image

@angelozerr
Copy link
Contributor Author

Might not be an issue with this PR specifically, but any idea why ttp://www.springframework.org/schema/beans"] is in the list of expected elements?

Indeed it's an another issue, please create an issue for that. As Xerces messages was not easy to read, @NikolasKomonen reformat the original message to provide a better understand of the message. I think there is a bug in LSPMessageFormatter, please create an issur for that. Thanks!

@angelozerr
Copy link
Contributor Author

I'm 10 days before my MA exams, so I really don' have much free time ATM...

Good luck!

Again: as soon as I have more time... :)

I understand totally, but I would like to avoid working on the same feature than you. For instance even if your PR #685 could be good, I cannot use it, because there are some many changed to do (it was difficult to help you before trying t implement validation with xerces).

We would love to have other contributors, that's why it's very important to keep your motivation. It seems that you are interested to work on xml-model topic, that's why I think we should discuss together what issues you want to do. I could help you if you need because when you don't know the code base it's difficult to contribute, but I think you should start with little contribution. Please create (little) issues or please tell me which issues you want to be assigned.

Please note, if you wish to contribute you need to have an Eclipse account (to sign ECA) and sign your commit.

@BalduinLandolt
Copy link
Contributor

I'm 10 days before my MA exams, so I really don' have much free time ATM...

Good luck!

thanks! :)

I understand totally, but I would like to avoid working on the same feature than you. For instance even if your PR #685 could be good, I cannot use it, because there are some many changed to do (it was difficult to help you before trying t implement validation with xerces).

We would love to have other contributors, that's why it's very important to keep your motivation. It seems that you are interested to work on xml-model topic, that's why I think we should discuss together what issues you want to do. I could help you if you need because when you don't know the code base it's difficult to contribute, but I think you should start with little contribution. Please create (little) issues or please tell me which issues you want to be assigned.

Please note, if you wish to contribute you need to have an Eclipse account (to sign ECA) and sign your commit.

I understand completly and I'm glad my work (or for now: enthusiasm) is appreciated.
I'll definitely let you know before I pick up working on it again.
In fact, if that's ok with you, I then might even contact you via e-mail. I think there are some really basic things that might improve my productivity considerably, but don't need to clutter a PR or issue they don't really have anything to do with.

@angelozerr
Copy link
Contributor Author

In fact, if that's ok with you, I then might even contact you via e-mail. I think there are some really basic things that might improve my productivity considerably, but don't need to clutter a PR or issue they don't really have anything to do with.

I fear that I will not have time to explain related topic like using git, commit, etc. I can help you to give you some information with code base (ex : which utilities class you should use, etc), but for other thing (ex : explain how to sign commit, etc), I fear I will have no time to do that.

@BalduinLandolt
Copy link
Contributor

In fact, if that's ok with you, I then might even contact you via e-mail. I think there are some really basic things that might improve my productivity considerably, but don't need to clutter a PR or issue they don't really have anything to do with.

I fear that I will not have time to explain related topic like using git, commit, etc. I can help you to give you some information with code base (ex : which utilities class you should use, etc), but for other thing (ex : explain how to sign commit, etc), I fear I will have no time to do that.

I wasn't thinking that basic. :)
But I see your point. Let's keep it in the PRs/issues for now, then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants