Escape spaces in URIs to make them parsable #170

RST-J · 2014-10-29T22:48:40Z

Fixes #100
normalized_uri passes the given data to URI for parsing which is not happy with spaces that are not encoded.

iainbeeston · 2014-10-30T06:45:15Z

Can I suggest a different fix? I had a similar problem in #144. I think this is a more general problem than spaces and could also happen for non-latin characters. Could we (instead) in normalize url do "data = URI.parse(URI.escape(data))"? (See #114 to see that in action)

Also, if we escape spaces (or non latin characters) in normalize_uri why do we have to do the same again in URI::File? (I might be wrong, but I would have thought that if we have a file URI it would have come through normalize_uri

RST-J · 2014-10-30T08:04:04Z

I first used URI.escape but that lead to problems with looking up cached schemas as "http://example.com#" isn't literally the same as "http://example.com%23".
I thought, escaped hashtags would be sent to the server and interpreted as being part of the URL but this is probably wrong (can anyone clarify?). If so, I'll go through the code and make sure that the cache will use escaped keys so that we can use URI.escape.

As of now, in URI::File there is not another escaping of spaces but the opposite because File does not consider URI escape sequences and says that directory "json%20schema" does not exist.

iainbeeston · 2014-10-30T08:50:37Z

In that case maybe URI::File should use URI.unescape.

URI encoding of the fragment symbol is a problem. There's a long discussion of it here:

http://stackoverflow.com/questions/2824126/whats-the-difference-between-uri-escape-and-cgi-escape

The most sensible suggestion that I've seen would be to use URI.escape but pass two arguments: the url AND a regexp for unsafe characters. By default this regexp marks "#" as unsafe and encodes it (which is correct except when it designates a fragment). I would try using that regexp but excluding the "#" character (ie not including it as unsafe, as is the default).=

iainbeeston · 2014-10-30T08:51:19Z

And whatever we do here should also be done for the uri property!

RST-J · 2014-10-30T10:14:15Z

Ok, then this unfortunately is in deed a more complex issue - thanks Murphy :).
I'll look into this then.

@iainbeeston You mean really only the uri property or also anything that could contain a string which will be used as an URI (but this would be covered by the normalization then)? Just want to make sure not to oversee something.

iainbeeston · 2014-10-30T10:40:29Z

I've been reading more into this and it seems like a pain to do this correctly! Take a look at this, for example:

https://www.ruby-forum.com/topic/207489

The uri property needs to parse uris and so does normalize_uri, and I think that they should do it in the same way (hopefully the right way), so that it works for file uris and non-latin uris.

I'm starting to reconsider - perhaps addressable would be a good solution, to not only this (probably more thoroughly than we ever could), but also to replace the file uri class in json-schema. However, it'd be a shame to add an extra dependency. What do you think? Also, @hoxworth what do you think to the idea of adding addressable?

pd · 2014-10-30T12:18:39Z

@iainbeeston lol I ran into the same deprecation yesterday when I was checking to see if json-schema could run without any warnings. If you wanna see noisy tests, try RUBYOPT='-w -W2' rake =)

So it seems we need to stop using URI.escape and URI.unescape entirely.

iainbeeston · 2014-10-30T12:22:32Z

Hey that's an interesting point, maybe we should display warnings on travis, or something

pd · 2014-10-30T12:36:12Z

My vote's for addressable; any implementation we end up with here would inevitably be full of bugs. Zero dependencies is nice, but it's not like we're adding in ActiveSupport and EventMachine or something.

RST-J · 2014-11-01T00:56:19Z

Ok, seems like this whole URI thing - apart from #100 - is a bigger issue than expected, although it seems not to actually hit active users. But I'd like to straighten possible corner cases right away.
To have an overview about all the things that are afftected here, please add comments about what we have to deal with - I'll then aim to cover as much of the URI quirks as possible.

iainbeeston · 2014-11-01T08:47:53Z

Sorry, I'm not sure I understand completely - you mean list all of the things that don't work with the current implementation?

One example is the original bug in this pull request - we need to be able to make sure that our URIs are properly escaped (and that seems like a problem almost as bad as validating email - #110), another is getting URIs to work consistently across platforms (eg. #109).

As @pd said, it seems like a larger problem than we can (or should) handle in this gem itself, and I suspect there could be many bugs (due to corner cases in uri parsing/handling/loading) that we will find over time. My assumption is that Addressable will "magic away" these problems for us, because that's what it does.

Escape spaces in URIs to make them parsable

cb9d860

RST-J closed this Nov 2, 2014

RST-J mentioned this pull request Nov 2, 2014

Use Addressable for handling URIs? #173

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Escape spaces in URIs to make them parsable #170

Escape spaces in URIs to make them parsable #170

RST-J commented Oct 29, 2014

iainbeeston commented Oct 30, 2014

RST-J commented Oct 30, 2014

iainbeeston commented Oct 30, 2014

iainbeeston commented Oct 30, 2014

RST-J commented Oct 30, 2014

iainbeeston commented Oct 30, 2014

pd commented Oct 30, 2014

iainbeeston commented Oct 30, 2014

pd commented Oct 30, 2014

RST-J commented Nov 1, 2014

iainbeeston commented Nov 1, 2014

Escape spaces in URIs to make them parsable #170

Escape spaces in URIs to make them parsable #170

Conversation

RST-J commented Oct 29, 2014

iainbeeston commented Oct 30, 2014

RST-J commented Oct 30, 2014

iainbeeston commented Oct 30, 2014

iainbeeston commented Oct 30, 2014

RST-J commented Oct 30, 2014

iainbeeston commented Oct 30, 2014

pd commented Oct 30, 2014

iainbeeston commented Oct 30, 2014

pd commented Oct 30, 2014

RST-J commented Nov 1, 2014

iainbeeston commented Nov 1, 2014