Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape spaces in URIs to make them parsable #170

Closed
wants to merge 1 commit into from

Conversation

RST-J
Copy link
Contributor

@RST-J RST-J commented Oct 29, 2014

Fixes #100
normalized_uri passes the given data to URI for parsing which is not happy with spaces that are not encoded.

@iainbeeston
Copy link
Contributor

Can I suggest a different fix? I had a similar problem in #144. I think this is a more general problem than spaces and could also happen for non-latin characters. Could we (instead) in normalize url do "data = URI.parse(URI.escape(data))"? (See #114 to see that in action)

Also, if we escape spaces (or non latin characters) in normalize_uri why do we have to do the same again in URI::File? (I might be wrong, but I would have thought that if we have a file URI it would have come through normalize_uri

@RST-J
Copy link
Contributor Author

RST-J commented Oct 30, 2014

I first used URI.escape but that lead to problems with looking up cached schemas as "http://example.com#" isn't literally the same as "http://example.com%23".
I thought, escaped hashtags would be sent to the server and interpreted as being part of the URL but this is probably wrong (can anyone clarify?). If so, I'll go through the code and make sure that the cache will use escaped keys so that we can use URI.escape.

As of now, in URI::File there is not another escaping of spaces but the opposite because File does not consider URI escape sequences and says that directory "json%20schema" does not exist.

@iainbeeston
Copy link
Contributor

In that case maybe URI::File should use URI.unescape.

URI encoding of the fragment symbol is a problem. There's a long discussion of it here:

http://stackoverflow.com/questions/2824126/whats-the-difference-between-uri-escape-and-cgi-escape

The most sensible suggestion that I've seen would be to use URI.escape but pass two arguments: the url AND a regexp for unsafe characters. By default this regexp marks "#" as unsafe and encodes it (which is correct except when it designates a fragment). I would try using that regexp but excluding the "#" character (ie not including it as unsafe, as is the default).=

@iainbeeston
Copy link
Contributor

And whatever we do here should also be done for the uri property!

@RST-J
Copy link
Contributor Author

RST-J commented Oct 30, 2014

Ok, then this unfortunately is in deed a more complex issue - thanks Murphy :).
I'll look into this then.

@iainbeeston You mean really only the uri property or also anything that could contain a string which will be used as an URI (but this would be covered by the normalization then)? Just want to make sure not to oversee something.

@iainbeeston
Copy link
Contributor

I've been reading more into this and it seems like a pain to do this correctly! Take a look at this, for example:

https://www.ruby-forum.com/topic/207489

The uri property needs to parse uris and so does normalize_uri, and I think that they should do it in the same way (hopefully the right way), so that it works for file uris and non-latin uris.

I'm starting to reconsider - perhaps addressable would be a good solution, to not only this (probably more thoroughly than we ever could), but also to replace the file uri class in json-schema. However, it'd be a shame to add an extra dependency. What do you think? Also, @hoxworth what do you think to the idea of adding addressable?

@pd
Copy link
Contributor

pd commented Oct 30, 2014

@iainbeeston lol I ran into the same deprecation yesterday when I was checking to see if json-schema could run without any warnings. If you wanna see noisy tests, try RUBYOPT='-w -W2' rake =)

So it seems we need to stop using URI.escape and URI.unescape entirely.

@iainbeeston
Copy link
Contributor

Hey that's an interesting point, maybe we should display warnings on travis, or something

@pd
Copy link
Contributor

pd commented Oct 30, 2014

My vote's for addressable; any implementation we end up with here would inevitably be full of bugs. Zero dependencies is nice, but it's not like we're adding in ActiveSupport and EventMachine or something.

@RST-J
Copy link
Contributor Author

RST-J commented Nov 1, 2014

Ok, seems like this whole URI thing - apart from #100 - is a bigger issue than expected, although it seems not to actually hit active users. But I'd like to straighten possible corner cases right away.
To have an overview about all the things that are afftected here, please add comments about what we have to deal with - I'll then aim to cover as much of the URI quirks as possible.

@iainbeeston
Copy link
Contributor

Sorry, I'm not sure I understand completely - you mean list all of the things that don't work with the current implementation?

One example is the original bug in this pull request - we need to be able to make sure that our URIs are properly escaped (and that seems like a problem almost as bad as validating email - #110), another is getting URIs to work consistently across platforms (eg. #109).

As @pd said, it seems like a larger problem than we can (or should) handle in this gem itself, and I suspect there could be many bugs (due to corner cases in uri parsing/handling/loading) that we will find over time. My assumption is that Addressable will "magic away" these problems for us, because that's what it does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When extending a schema in folder names with spaces
3 participants