rd2dot Escape HTML in node label and URI text #1209

Merged

blake-regalia

Fixes #1208

Proposed Changes

  • escapes HTML in the node label
  • escapes HTML in the URI text
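As a rough illustration of what escaping the node label means here, the sketch below uses the standard library's `html.escape`. The helper name `label_cell` is hypothetical and the markup only imitates the generated table cell; it is not taken from the actual change.

```python
import html


def label_cell(label):
    """Render a label cell, escaping &, <, > and quotes in the text."""
    # html.escape turns '&' into '&amp;', '<' into '&lt;' and so on,
    # so the text cannot be mistaken for markup by the label parser.
    return "<td colspan='2' bgcolor='grey'><B>{}</B></td>".format(html.escape(label))


print(label_cell("This & That"))
# <td colspan='2' bgcolor='grey'><B>This &amp; That</B></td>
```

The same call would apply to the URI text shown inside the `<font>` element.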

@coveralls

Coverage Status

Coverage decreased (-0.06%) to 75.421% when pulling 1ee96eb on blake-regalia:fix/rdf2dot-ampersand-escape into c5ff127 on RDFLib:master.

@coveralls

coveralls commented Dec 4, 2020

Coverage Status

Coverage decreased (-0.1%) to 75.353% when pulling 1ee96eb on blake-regalia:fix/rdf2dot-ampersand-escape into c5ff127 on RDFLib:master.

@nicholascar
Member

Open Close Travis

@nicholascar nicholascar left a comment

Looks good to me. The Travis test failure is due only to connection issues in Travis, not anything from the code changes so I approve. Awaiting second review.

@nicholascar nicholascar merged commit b557dca into RDFLib:master Dec 27, 2020
@white-gecko
Member

Actually, I have now tried to reproduce your example from #1208 (or rather my version of it, #1208 (comment)) and I get:

Error: not well-formed (invalid token) in line 1 
... <B>This & That ...
in label of node node0

the result is:

digraph { 
 node [ fontname="DejaVu Sans" ] ; 
	node0 -> node1 [ color=BLACK, label=< <font point-size='10' color='#336633'>ns1:predicate</font> > ] ;
# eg://org/?foo=bar&baz=bux node0
node0 [ shape=none, color=black label=< <table color='#666666' cellborder='0' cellspacing='0' border='1'><tr><td colspan='2' bgcolor='grey'><B>This & That</B></td></tr><tr><td href='eg://org/?foo=bar&baz=bux' bgcolor='#eeeeee' colspan='2'><font point-size='10' color='#6666ff'>eg://org/?foo=bar&baz=bux</font></td></tr><tr><td align='left'>ns2:title</td><td align='left'>&quot;This &amp; That&quot;</td></tr></table> > ] 
# eg://org/object node1
node1 [ shape=none, color=black label=< <table color='#666666' cellborder='0' cellspacing='0' border='1'><tr><td colspan='2' bgcolor='grey'><B>object</B></td></tr><tr><td href='eg://org/object' bgcolor='#eeeeee' colspan='2'><font point-size='10' color='#6666ff'>eg://org/object</font></td></tr></table> > ] 
}

so there is still an unescaped &

@white-gecko
Member

Maybe it would be good to have the example as a test.
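One possible shape for such a test, sketched without requiring rdflib or Graphviz: scan the generated DOT text for ampersands that do not start a character reference. The helper `bare_ampersand_lines` and the sample strings are hypothetical; in a real test the text would come from the `rdf2dot` output and the assertion would be that no lines are reported.

```python
import re

# An ampersand that does not begin a named or numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def bare_ampersand_lines(dot_text):
    """Return (line_number, line) pairs for non-comment lines with a bare '&'."""
    hits = []
    for number, line in enumerate(dot_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # lines starting with '#' are discarded by DOT, so skip them
        if BARE_AMP.search(line):
            hits.append((number, line))
    return hits


sample = "\n".join([
    "# eg://org/?foo=bar&baz=bux node0",
    "node0 [ label=< <B>This & That</B> > ]",
    "node1 [ label=< <B>This &amp; That</B> > ]",
])
print([number for number, _ in bare_ampersand_lines(sample)])
# [2]
```

This only catches unescaped ampersands; it does not confirm that a DOT renderer accepts the result.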

@white-gecko
Member

I think I didn't actually test the fixed version. Now I did, but got the following error and result:

Error: not well-formed (invalid token) in line 1 
... <tr><td href='eg://org/?foo=bar&baz=bux' bgcolor='#eeeeee' colspan='2'> ...
in label of node node0
digraph { 
 node [ fontname="DejaVu Sans" ] ; 
	node0 -> node1 [ color=BLACK, label=< <font point-size='10' color='#336633'>ns1:predicate</font> > ] ;
# eg://org/?foo=bar&baz=bux node0
node0 [ shape=none, color=black label=< <table color='#666666' cellborder='0' cellspacing='0' border='1'><tr><td colspan='2' bgcolor='grey'><B>This &amp; That</B></td></tr><tr><td href='eg://org/?foo=bar&baz=bux' bgcolor='#eeeeee' colspan='2'><font point-size='10' color='#6666ff'>eg://org/?foo=bar&amp;baz=bux</font></td></tr><tr><td align='left'>ns2:title</td><td align='left'>&quot;This &amp; That&quot;</td></tr></table> > ] 
# eg://org/object node1
node1 [ shape=none, color=black label=< <table color='#666666' cellborder='0' cellspacing='0' border='1'><tr><td colspan='2' bgcolor='grey'><B>object</B></td></tr><tr><td href='eg://org/object' bgcolor='#eeeeee' colspan='2'><font point-size='10' color='#6666ff'>eg://org/object</font></td></tr></table> > ] 
}

@white-gecko
Member

white-gecko commented Dec 28, 2020

It is not very clear to me how an & should actually be encoded for dot within an href.

@blake-regalia and @nicholascar what was the output of your tests?

@blake-regalia
Author

blake-regalia commented Dec 29, 2020

Hmm, yes it seems that the href attribute value should also be escaped:

As HTML strings are processed like HTML input, any use of the ", &, <, and > characters in literal text or in attribute values need to be replaced by the corresponding escape sequence. For example, if you want to use & in an href value, this should be represented as &amp;.

https://graphviz.org/doc/info/shapes.html#html

But it seems that no matter what I do, the DOT parsers cannot tolerate an ampersand in the href value, whether or not it is written as the HTML escape sequence &amp;. This could be a bug in those DOT parser implementations... My searches have not turned up anything relating to this exact problem.

For reference, here is the complete self-contained minimum example snippet:

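import io

import rdflib
from rdflib import URIRef, Literal
from rdflib.namespace import DC
from rdflib.tools.rdf2dot import rdf2dot

# Subject URI with an ampersand in its query string
subj = URIRef('eg://org/?foo=bar&baz=bux')
pred = URIRef('eg://org/predicate')
obj = URIRef('eg://org/object')

g = rdflib.Graph()
g.add((subj, pred, obj))
# Literal label that also contains an ampersand
g.add((subj, DC.title, Literal('This & That')))

# Serialize the graph to DOT and print it
out = io.StringIO()
rdf2dot(g, out)
print(out.getvalue())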
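To make the documentation's rule about attribute values concrete, here is a small sketch that escapes the href value as well as the visible text and checks the fragment for XML well-formedness with Python's expat-backed `xml.etree.ElementTree`. The helpers `td_with_href` and `is_well_formed` are hypothetical names, and the assumption that this mirrors Graphviz's own label parsing is mine; the "not well-formed (invalid token)" wording in the errors above is the kind of message expat produces.

```python
import html
import xml.etree.ElementTree as ET


def td_with_href(uri):
    """Build a <td> whose href value and visible text are both escaped."""
    # quote=True also escapes " and ', which matters inside attribute values.
    attr = html.escape(uri, quote=True)
    text = html.escape(uri, quote=False)
    return "<td href='{}'>{}</td>".format(attr, text)


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


uri = "eg://org/?foo=bar&baz=bux"
raw = "<td href='{0}'>{0}</td>".format(uri)

print(is_well_formed(raw))                # False
print(is_well_formed(td_with_href(uri)))  # True
```

This only shows that the escaped form is well-formed markup. Whether a particular DOT renderer then accepts the href is a separate question, and it is the one left open in this thread.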

@white-gecko white-gecko removed their request for review January 2, 2021 13:12
@white-gecko white-gecko added this to the rdflib 6.0.0 milestone Mar 22, 2021
Successfully merging this pull request may close these issues.

rdf2dot ampersand escape