Change all URIs to IRIs or URLs, depending on context. Resolves #369 #466

jakebeal · 2021-09-17T12:12:20Z

Change all URIs to IRIs or URLs, depending on context. Resolves #369
Also adds mapping information for SBOL2/SBOL3 regarding namespace, identity, and version

Per note on #369, I believe this is a non-SEP change.


jakebeal · 2021-09-17T12:14:17Z

@udp: I'd especially like you to look at the section on mapping identifiers and versions between SBOL2 and SBOL3, since I think you might want to adjust sbolgraph based on these recommendations.

cjmyers · 2021-09-17T15:37:33Z

I'm not comfortable with this change without more discussion. I understand the issue from the tracker. However, I think the only part of the URI where this makes a big difference is the displayId, since that is the part people mainly see, and we have long restricted displayIds to alphanumeric and underscore characters. Introducing additional characters to support other languages' alphabets has a high potential to break software: code in many places checks that displayIds are restricted to English alphanumerics, so some software will declare such files invalid SBOL and refuse to process them. I suggest we delay this change until testing can be done.

jakebeal · 2021-09-17T15:54:58Z

Is this something that's a libSBOLj restriction?

The SBOL2 document doesn't actually specify English anywhere as a restriction on alphanumeric. A displayId is a string, and the referenced string type includes Unicode. The linked definition of anyURI also actually allows the full range of IRIs, despite being called "anyURI":

anyURI represents an Internationalized Resource Identifier Reference (IRI)

As a consequence, pySBOL has long supported any unicode character that tests as true for being alphanumeric, since that's what the specification already required.
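The IRI/URI distinction discussed here can be made concrete: RFC 3987 maps an IRI to a URI by percent-encoding the UTF-8 bytes of its non-ASCII characters. The following is a minimal Python sketch under that assumption. The `iri_to_uri` helper and the `https://example.org/` namespace are hypothetical, and the sketch skips the punycode conversion of non-ASCII host names that a full mapping would require.

```python
from urllib.parse import quote

# RFC 3986 reserved characters are left as-is, and "%" is kept so that
# existing percent-escapes are not double-encoded. quote() never escapes
# ASCII letters, digits, or "_.-~".
URI_SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII characters of an IRI as UTF-8 (hypothetical helper).

    Simplification: non-ASCII host names are not converted to punycode.
    """
    return quote(iri, safe=URI_SAFE, encoding="utf-8")


print(iri_to_uri("https://example.org/Größe"))  # https://example.org/Gr%C3%B6%C3%9Fe
```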

cjmyers · 2021-09-17T15:59:31Z

The issue is that I don't know whether it is. I've not tested this. If you have test files that we can use to verify that software will not break, then we can validate that there are no issues. But before testing, I'm not comfortable with this change.

Try uploading an SBOL2 file with international characters in its displayIds to SBH and see what happens. Try opening it with SBOLCanvas too. I'm really unsure whether there will be problems, but I would prefer not to make the change until we are sure there will be none.

jakebeal · 2021-09-17T18:40:25Z

I just tested with SBOL Canvas and SynBioHub. Both of them reject the characters as invalid.

Why is this a problem, though? Both of them are SBOL2, and the draft says that you SHOULD escape these characters when converting from SBOL3 to SBOL2.

cjmyers · 2021-09-17T19:01:12Z

The library uses simple RegEx to check validity. The RegEx does not include non-English characters, so they are rejected.
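The difference between an ASCII-only check and a Unicode-aware one can be sketched with Python's `re` module. The ASCII pattern below is a hypothetical stand-in for the kind of check described here, not the actual libSBOLj expression. In Python 3, `\w` on `str` patterns matches every character for which `str.isalnum()` is true plus `_`, so `[^\W\d]` means "a word character that is not a decimal digit".

```python
import re

# Hypothetical ASCII-only check: letter or underscore first, then only
# A-Z, a-z, 0-9, or underscore.
ASCII_DISPLAY_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Unicode-aware check: \w matches any str.isalnum() character or "_";
# [^\W\d] excludes decimal digits from the first position.
UNICODE_DISPLAY_ID = re.compile(r"[^\W\d]\w*")


def matches(pattern: re.Pattern, display_id: str) -> bool:
    # fullmatch() anchors both ends, avoiding the trailing-newline quirk of "$".
    return pattern.fullmatch(display_id) is not None


for candidate in ["promoter_1", "Größe", "1promoter"]:
    print(candidate, matches(ASCII_DISPLAY_ID, candidate), matches(UNICODE_DISPLAY_ID, candidate))
# promoter_1 True True
# Größe False True
# 1promoter False False
```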

Given your experiment and the fact that many tools in the wild use libSBOLj, I think we should hold this change for now. It would break tools: even if we update libSBOLj, we cannot guarantee developers will update their tools to the new version immediately.

By the way, there are ways to convert special characters into the English alphabet (at least according to my German student). We could consider converting them in an SBOL3-to-SBOL2 converter, assuming SBOL3 libraries are ALL okay with this. Have you tested Goksel's libSBOLj3?
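One possible SBOL3-to-SBOL2 conversion for displayIds, sketched in Python. The German replacements, the NFKD step that strips accents, and the fallback of mapping any remaining character to `_` are all assumptions for illustration, not a scheme defined by either specification; `to_ascii_display_id` is a hypothetical name.

```python
import re
import unicodedata

# Hypothetical German-style transliterations; NFKD alone would map "ä" to "a"
# and would leave "ß" untouched.
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def to_ascii_display_id(display_id: str) -> str:
    """Sketch of converting an SBOL3 displayId to an ASCII-only SBOL2 one."""
    # Compose first so that "a" + U+0308 is treated like "ä".
    text = unicodedata.normalize("NFC", display_id)
    text = "".join(GERMAN.get(ch, ch) for ch in text)
    # Decompose remaining accented letters (e.g. "é" -> "e" + U+0301) and drop the marks.
    text = "".join(ch for ch in unicodedata.normalize("NFKD", text) if not unicodedata.combining(ch))
    # Any character still outside [A-Za-z0-9_] becomes an underscore.
    text = re.sub(r"[^A-Za-z0-9_]", "_", text)
    # The result must not be empty or begin with a digit.
    if not text or text[0].isdigit():
        text = "_" + text
    return text


print(to_ascii_display_id("Größe"))  # Groesse
print(to_ascii_display_id("café"))   # cafe
```

The mapping is lossy (for example, `promoter_α` and `promoter_β` both become `promoter__`), so a real converter would also need to detect and resolve collisions.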

jakebeal · 2021-09-17T19:37:32Z

I think that we're in agreement that SBOL2 doesn't in practice support IRIs, and that conversion from unicode to ASCII would typically be necessary for SBOL3->SBOL2. That's not a problem.

For SBOL3, I expect that @udp's library supports IRIs, since he requested the change. I don't know about @goksel, so have added him as a reviewer.

tcmitchell · 2021-09-21T15:33:49Z

This is a change to SBOL3, not SBOL2. I don't think we should worry too much about how SBOL2 tools (SynBioHub, libSBOLj) handle SBOL2 displayIds. Since SBOL3 is based on tooling that has a broader definition of alphanumeric than "English alphanumeric" (or ASCII), I think SBOL3 should embrace a wider variety of characters. I had interpreted the specification more broadly when I read it.

As far as implementation, pySBOL3 uses Python's isalnum, which relies on isalpha, which uses this definition of alphabetic characters:

Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different from the “Alphabetic” property defined in the Unicode Standard.

I think the spec should be updated with definitions for "alphanumeric" and "underscore" and "digit". Something along the lines of the above definition so that there is less room for interpretation by individual tools and developers.
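Definitions of the kind suggested here could be written directly in terms of Unicode general categories. This Python sketch assumes "alphabetic" means categories Lu, Ll, Lt, Lm, and Lo (the set Python's `str.isalpha()` uses), "digit" means category Nd, and "underscore" means only U+005F; it also treats an empty string as invalid. The `is_valid_display_id` name is hypothetical.

```python
import unicodedata

# Assumed definitions (hypothetical, for illustration):
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}  # same set str.isalpha() uses
DIGIT_CATEGORY = "Nd"                               # Unicode decimal digits
UNDERSCORE = "\u005f"                               # ASCII LOW LINE only


def is_letter(ch: str) -> bool:
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_digit(ch: str) -> bool:
    return unicodedata.category(ch) == DIGIT_CATEGORY


def is_valid_display_id(display_id: str) -> bool:
    """Letters, digits, and underscores only; must not begin with a digit."""
    if not display_id or is_digit(display_id[0]):
        return False
    return all(is_letter(ch) or is_digit(ch) or ch == UNDERSCORE for ch in display_id)


for candidate in ["Größe", "promoter_1", "1promoter", "x²", "a\uff3fb"]:
    print(repr(candidate), is_valid_display_id(candidate))
```

Note that `str.isalnum()` alone is broader than this: `"x²".isalnum()` is `True` because SUPERSCRIPT TWO (category No) counts as numeric, which is one reason explicit definitions would reduce room for interpretation.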

tcmitchell · 2021-09-21T15:39:28Z

identified.tex

@@ -16,18 +16,18 @@ \subsection{Identified}

 \subparagraph{The \sbolheading{displayId} property}
 \label{sec:displayId}
-The \sbol{displayId} property is an OPTIONAL identifier with a data type of \sbol{String}. This property is intended to be an intermediate between a URI and the \sbol{name} property that is machine-readable, but more human-readable than the full URI of an object.
+The \sbol{displayId} property is an OPTIONAL identifier with a data type of \sbol{String}. This property is intended to be an intermediate between a IRI and the \sbol{name} property that is machine-readable, but more human-readable than the full IRI of an object.

 If the \sbol{displayId} property is used, then its \sbol{String} value MUST be composed of only alphanumeric or underscore characters and MUST NOT begin with a digit.


Can definitions be added for "alphanumeric", "underscore", and "digit"?

For example, Python uses the following definition of "alphabetic" (a component of "alphanumeric"):

Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different from the “Alphabetic” property defined in the Unicode Standard.

For "underscore" I think we probably mean Unicode U+005F. Wikipedia's definition of underscore lists three possibilities though.

That's correct, the one that's equivalent to ASCII underscore 0x5F

cjmyers · 2021-09-21T16:14:39Z

I would like to discuss this at our next SBOL3 meeting. You are correct that this is an SBOL3 change, but it is potentially going to affect SBOL2 tools as well. For example, it is possible to upload SBOL3 to SynBioHub now, but this would break if you used non-English alphabets in displayIds. Also, we need to ensure that conversion tools are capable of changing non-English characters to English characters when converting from SBOL3 to SBOL2. I would like to propose that this is a 3.1.0 change, so we can have some time to work out these issues, and avoid delaying the release of 3.0.1 as we work them out.

jakebeal · 2021-09-21T16:51:17Z

I'm fine with pushing this to 3.1 as long as you're OK that pySBOL3 allows the more liberal definition.

cjmyers · 2021-09-21T18:55:06Z

If pySBOL3 can create content with non-English alphabets, then there will be issues with these files. Is there an urgent need to support this now?

tcmitchell · 2021-09-21T19:12:55Z

No, we do not have an urgent need for you to support this now. It will be good to clarify the spec as a first step. Users of SynBioHub will have to be careful to limit themselves to ASCII characters.

cjmyers · 2021-09-21T19:19:09Z

Actually, my question is do you have an urgent need to have pySBOL3 support this now?

tcmitchell · 2021-09-21T19:30:38Z

pySBOL3 has supported Unicode alphanumeric displayIds since at least August 2020. We're not making a change to support Unicode displayIds, which is probably why I interpreted your question differently. We have supported this for over a year at least, probably longer.

cjmyers · 2021-09-21T23:11:39Z

I see. Ok, well, hopefully we can get some solution to this soon then. Not sure if many people are using this feature as of yet. Have you seen it being used?

jakebeal · 2021-09-21T23:36:16Z

Not blatantly, but with Excel-to-SBOL it may be getting used already without being obvious.


Change all URIs to IRIs or URLs, depending on context. Resolves #369

be90c67

Also adds mapping information for SBOL2/SBOL3 regarding namespace, identity, and version

jakebeal requested review from tcmitchell, cjmyers and jamesamcl September 17, 2021 12:12

jakebeal requested a review from goksel September 17, 2021 19:36

tcmitchell reviewed Sep 21, 2021


jakebeal added 2 commits August 30, 2022 10:17

Merge branch 'master' into issue-369-IRIs

3b581fe

# Conflicts: # apdx-validation.tex # feature.tex # location.tex # uml/feature.pdf # umlet_source/feature.uxf # vocabulary.tex

merge changes in feature UML

cd94a1e

jakebeal mentioned this pull request Aug 30, 2022

URIs -> IRIs #369

Closed

cjmyers approved these changes Oct 19, 2022


Merge branch 'master' into issue-369-IRIs

02fc665

LukasBuecherl merged commit 8e762c3 into master Nov 9, 2022

LukasBuecherl deleted the issue-369-IRIs branch November 9, 2022 16:17
