-
-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escape spaces in URIs to make them parsable #170
Conversation
Can I suggest a different fix? I had a similar problem in #144. I think this is a more general problem than spaces and could also happen for non-latin characters. Could we (instead) in normalize url do "data = URI.parse(URI.escape(data))"? (See #114 to see that in action) Also, if we escape spaces (or non latin characters) in normalize_uri why do we have to do the same again in URI::File? (I might be wrong, but I would have thought that if we have a file URI it would have come through normalize_uri |
I first used URI.escape but that lead to problems with looking up cached schemas as "http://example.com#" isn't literally the same as "http://example.com%23". As of now, in URI::File there is not another escaping of spaces but the opposite because File does not consider URI escape sequences and says that directory "json%20schema" does not exist. |
In that case maybe URI::File should use URI.unescape. URI encoding of the fragment symbol is a problem. There's a long discussion of it here: http://stackoverflow.com/questions/2824126/whats-the-difference-between-uri-escape-and-cgi-escape The most sensible suggestion that I've seen would be to use URI.escape but pass two arguments: the url AND a regexp for unsafe characters. By default this regexp marks "#" as unsafe and encodes it (which is correct except when it designates a fragment). I would try using that regexp but excluding the "#" character (ie not including it as unsafe, as is the default).= |
And whatever we do here should also be done for the uri property! |
Ok, then this unfortunately is in deed a more complex issue - thanks Murphy :). @iainbeeston You mean really only the uri property or also anything that could contain a string which will be used as an URI (but this would be covered by the normalization then)? Just want to make sure not to oversee something. |
I've been reading more into this and it seems like a pain to do this correctly! Take a look at this, for example: https://www.ruby-forum.com/topic/207489 The uri property needs to parse uris and so does normalize_uri, and I think that they should do it in the same way (hopefully the right way), so that it works for file uris and non-latin uris. I'm starting to reconsider - perhaps addressable would be a good solution, to not only this (probably more thoroughly than we ever could), but also to replace the file uri class in json-schema. However, it'd be a shame to add an extra dependency. What do you think? Also, @hoxworth what do you think to the idea of adding addressable? |
@iainbeeston lol I ran into the same deprecation yesterday when I was checking to see if json-schema could run without any warnings. If you wanna see noisy tests, try So it seems we need to stop using |
Hey that's an interesting point, maybe we should display warnings on travis, or something |
My vote's for addressable; any implementation we end up with here would inevitably be full of bugs. Zero dependencies is nice, but it's not like we're adding in ActiveSupport and EventMachine or something. |
Ok, seems like this whole URI thing - apart from #100 - is a bigger issue than expected, although it seems not to actually hit active users. But I'd like to straighten possible corner cases right away. |
Sorry, I'm not sure I understand completely - you mean list all of the things that don't work with the current implementation? One example is the original bug in this pull request - we need to be able to make sure that our URIs are properly escaped (and that seems like a problem almost as bad as validating email - #110), another is getting URIs to work consistently across platforms (eg. #109). As @pd said, it seems like a larger problem than we can (or should) handle in this gem itself, and I suspect there could be many bugs (due to corner cases in uri parsing/handling/loading) that we will find over time. My assumption is that Addressable will "magic away" these problems for us, because that's what it does. |
Fixes #100
normalized_uri passes the given data to URI for parsing which is not happy with spaces that are not encoded.