-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Path encoding issue after upgrade from 0.9.6 to 1.0.1 #1561
Comments
@aaronjwhiteside closing as won't fix. we just use apache behind the scenes. you are welcome to provide a PR to fix it |
@ptrthomas I'm ok with taking a stab at fixing this. But I just want to make sure I'm fixing the right thing.. From the README.md
I take this to mean that if I use the Comma delimited style that every "part" of that path should be escaped. For example: Given path 'abc', '/', '123|', '&', '?', '=' Should come out like this: But Given path 'abc//123|&?=' Should be left alone? I have a feeling that isn't right. Because
is part of the sentence that talks about comma delimited values, it can be taken to imply that URL-encoding only happens when using comma delimited values. And not when using a single value. Looking at the code I don't see anything that treats comma separated paths specially, vs non comma separated, because everything seems to end up as a List. I even see code that assumes comma separated parts can have trailing slashes that are left in place.. and not urlencoded? If my interpretation of the README.md is correct, then my approach would be:
But there are difficulties when dealing with trailing slashes in this case, questions like: was the slash in the part intended to be url encoded or to cap off the path? Maybe slashes are the one thing that are excluded from url encoding? I suspect this is probably "correct" in one sense, but may break for a few people.. Another approach might be to maintain a list of characters that the apache http client doesn't think are valid in the URIs it accepts and just url encode them. I think that's probably less "correct" but much less likely to break people's existing usage of the |
@aaronjwhiteside all this is fine. in 1.X we decided to simplify what "http client" implementations need to handle and all the processing that the http-request-builder does ends up as a now in this object, the path-chunks and even query-string-params have already been boiled into a single string |
@aaronjwhiteside one more thing, I'm open to adding keywords like |
@ptrthomas So I think I have a grasp on it now. The problem is that here we're expecting Apache Http Client to know how to encode a full url we pass to it. But it expects the URL to already be encoded, and performs no translation/encoding on it. They provide a utility to help users in this regard: This makes sense if you think about the following scenarios:
Is the second The same issue exists for other examples like this:
Is the middle The Apache Http Client cannot know our intent here and so expects it to already be encoded correctly. Now on to a possible solution:
I've also got a solution to the trailing Given path 'hello', 'world', '' Should produce So question for you, it looks like the https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/http/HttpRequest.java and https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/http/HttpRequestBuilder.java are intended to be abstractions over any implementation of the How do you feel about using https://javadoc.io/doc/org.apache.httpcomponents/httpclient/latest/org/apache/http/client/utils/URIBuilder.html inside Karate's If not, and we modify the Personally I'm in favour of just putting the logic into the But we could of course recreate Apache's Let me know.. |
reopening. thanks for the research, that helps. agree that I don't mind using URIBuilder inside the http-request-builder, we decided to hard-depend on apache anyway in version 1.0 - I would consider using netty utilities btw, netty is IMO the lib we will always depend on - and in the future apache may need to be swapped out e.g. for http2/3 and async, but not a big deal IIRC you can perform the url-encoding + path-building using the java URL class |
I think if users append the forward slash to the first path segment, we continue to strip it off as is done right now, just to keep that part backwards compatible. The trailing slash is actually an artifact of how the current code is written and is the recommended way when using Apache's
Ok, even better by me, I can use that there and keep
I just checked this and URI and URL only provide ways to pass the the full path as a single argument. https://docs.oracle.com/javase/7/docs/api/java/net/URI.html
Agree, I'll take a look and see what netty provide in this regard. |
@aaronjwhiteside all good, so a PR with some extra tests should be fine. I just worry about teams I've seen doing things like |
@aaronjwhiteside thinking more about my example |
HttpRequestBuilder.java - replaced armeria QueryParamsBuilder with apache http URIBuilder. - minor generics cleanup - introduced support for fragments. - URIBuilder now handles path segment encoding, query parameters and any fragment. - care was taken to ensure that path's prefixed with / are accepted, but a warning is printed letting people know they should remove the prefixing / encoding.feature - added more demo/test cases for path encoding HttpUtils.java - minor generics cleanup - final is redundant on static methods karate-demo/pom.xml - scope = import only works in the dependencyManagement section, this fixes the maven warning. Request.java - minor generics cleanup - use computeIfAbsent() HttpClientFactory.java - public static final are redundant on interface fields - also use method reference
HttpRequestBuilder.java - replaced armeria QueryParamsBuilder with apache http URIBuilder. - minor generics cleanup - introduced support for fragments. - URIBuilder now handles path segment encoding, query parameters and any fragment. - care was taken to ensure that path's prefixed with / are accepted, but a warning is printed letting people know they should remove the prefixing / encoding.feature - added more demo/test cases for path encoding HttpUtils.java - minor generics cleanup - final is redundant on static methods karate-demo/pom.xml - scope = import only works in the dependencyManagement section, this fixes the maven warning. Request.java - minor generics cleanup - use computeIfAbsent() HttpClientFactory.java - public static final are redundant on interface fields - also use method reference
hi @ptrthomas I've opened a draft PR, #1565 Still working through some issues: karate-demo passes, but karate-mock-servlet fails
|
@aaronjwhiteside yes, that was in response to a very old issue raised - and since I've felt that we may be doing "too much" take a look at the server-side unpacking of the request which uses armeria / netty and you can see what I mean, and I did that in a hurry: so we don't need to treat that test-case as the "truth" - but do what makes sense to us |
@ptrthomas OK, I'll take a look tomorrow, getting late here :) |
HttpRequestBuilder.java - replaced armeria QueryParamsBuilder with apache http `URIBuilder`. - minor generics cleanup - `URIBuilder` now handles path segment encoding and query parameters. - we now check if any paths are prefixed with `/` and issue a warning log, that this is probably a mistake encoding.feature - added more demo/test cases for path encoding HttpUtils.java - minor generics cleanup - final is redundant on static methods karate-demo/pom.xml - scope = import only works in the dependencyManagement section, this fixes the maven warning. Request.java - minor generics cleanup - use computeIfAbsent() HttpClientFactory.java - public static final are redundant on interface fields - also use method reference KarateMockHandlerTest.java KarateHttpMockHandlerTest.java - removed all `/` path prefixes README.md - updated with details on how to correctly use path - changes any erroneous examples of path usage
So the issue turns out to be that http servlet engines are required to decode the url before passing it to servlets.. A quick hack of, in It's not trivial to decode a url, some things need to be decoded but essentially ignored by the servlet engine, the case of The naive approach of using So for the moment I'll keep digging, otherwise I've updated the PR with the changes discussed in the PR comments. public void setUrl(String url) {
// http servlet engines are required to decode the URL before passing to any servlets
// this solution isn't ideal, as URLDecoder really deals with x-www-form-urlencoded decoding,
// but it's good enough for our mock http servlet.
try {
urlAndPath = URLDecoder.decode(url, StandardCharsets.UTF_8.name());
} catch (UnsupportedEncodingException e) {
throw new RuntimeException(e);
}
StringUtils.Pair pair = HttpUtils.parseUriIntoUrlBaseAndPath(url);
urlBase = pair.left;
QueryStringDecoder qsd = new QueryStringDecoder(pair.right);
setPath(qsd.path());
setParams(qsd.parameters());
} |
@aaronjwhiteside thanks, the changes look ok and thanks for the readme edits. I'll wait for the ci to be green then cc @joelpramos for a second opinion, especially about the proposal to break users who have mixed forward-slashes into |
Just pushed a new commit. Long story short tracked down the issue to Spring, it was decoding the path info using Latin-1 instead of UTF-8. Two changes,
Some references: |
@aaronjwhiteside great, thanks for the research. if possible - can you suggest a way teams can scan their existing feature files to see if any usage of |
Something like this should work: .* path.*('\/.*)+ |
I have mixed feelings about this change - primarily as I find comma separated paths counter intuitive as an HTTP path can definitely have forward slashes (regardless of interpretation for RESTful this is observed with SOAP and just in general across the internet). My feeling is that the impacted users of this change will vastly exceed the number of users with needs for special encoding. Would we be able to apply this change only when there is a List of paths (provided by comma separated list or when the keyword shows multiple times) and documenting that if you have special needs for encoding you should use that alternative? |
@aaronjwhiteside I think the other steps are fine, this is the only one that just takes an array. the rest are key-values (param/s, header/s etc which already support lists as values) I'm in 2 minds. I propose that users like @gerben86 who have anyway done non-trivial JS have to manually fix tests anyway, might as well use |
This is the way I lean too, better to keep moving forward without workarounds/hacks.. |
@aaronjwhiteside thanks. I've already gotten a handful of comments from users who are not very comfortable with this change. but as of now, I'm clear that we should do this. Karate is an API testing framework and the rules of HTTP come first. also for the record, I always intended for path to work this way - without users having to "worry" about slashes. it devolved a bit, even in some of my own examples. so IMO this has to be done right going forward
|
Recommend looking at another option: We leave path as it is, but if you want to escape a literal character, put a backslash ('\') in front of it. Path would take ALL strings passed in (one or more through comma-separated list of strings or lists-of-strings) and do the following:
This allows the user to pass in any number of strings as a list, and should include lists of lists as you mentioned earlier. It allows us to escape specials we want to be used literally. It doesn't break most existing tests. I can think of only some weird, odd-ball situations that someone might do, but as @aaronjwhiteside said there are so many ways people use and abuse URIs/URLs/servelets/etc. that we can't account for them all, but I think this method will be the most flexible for the most people. This would also require users to go in and escape all path characters in their tests that are to be literal, that could be an issue for existing tests as well, but should be a one-time change, although that can be a lot of work. |
I agree with @krstcs as the approach will NOT break existing consumers especially after teams have already spent 1-2 sprints on 1.0 migration of their projects. |
I'm also thinking, we might want to do the same thing with url for consistency? The backend mechanic would be the same that way. url would just have the task of setting the url itself while path would just add paths to the set url. But I don't know if that makes sense? Maybe everything after the first single-forward-slash gets the same treatment? Edit: i.e.: <literal>protocol://host:port/</literal><encode>the rest</encode>. |
I think this will work, but I prefer the Anyway, I understand that we need to update something, but I'll wait for the final future-proof solution. |
yes, the whole point of the RC is to get this feedback and I'm thankful for the conversation here. @aaronjwhiteside I like the proposal to keep path as-is and use an escape character let me know if you are ok or I can do this change over the weekend |
@krstcs sub-lists if I understand you correctly, are not actually supported, and are probably better suited to another issue? I think they could be an interesting feature proposition on their own.
@ptrthomas Consider the use case where I have some dynamic data, say I have called an api and got a response and I'm using some value from the response to make another api call.. Background:
* def otherFeature = call read('.....') { xxx: yyy }
* def someId = otherFeature.response.id
Scenario:
Given url baseUrl
And path 'service', 'entities', someId
When method GET Now consider that If Since the escape character Given path '/hello/world/this_path_contains_\/_a_slash' vs Given path 'hello', 'world', 'this_path_contains_/_a_slash'
# or
Given raw path '/hello/world/this_path_contains_%2F_a_slash' And at least with the But maybe the escape character has a use within the In summary: |
@aaronjwhiteside I hear all your arguments - but FWIW you were the first one ever to find issues with path encoding - so there is an aspect of how relevant this change is to the larger community maybe we should just revert to 0.9.6 behavior and think about it a little more. I do think the edge case where users need to re-use a path segment from a previous response will always be a full URL - going by REST / HATEOAS concepts. |
@ptrthomas this is perfectly ok by me, I think that's all I wanted in the beginning :) I think the
This is more or less our entire use case for Karate.. We have a platform made up of over 60+ microservices, testing these usually entails making an api call to a top level microservice, gathering the id(s) from the response and querying downstream microservices to ensure state was mutated correctly and or entities were created as expected. Our APIs are RESTful, but do not follow HATEOAS concepts.. which wouldn't help with cross microservice interaction anyway. |
@aaronjwhiteside I'm still confused with your understanding of
|
@ptrthomas At the time I was implementing the
It's expected to be unescaped because it parses the path and splits it by the path separator into segments, stored internally, and when building the final url it does encoding of a subset of what it considers special characters, for example This obviously wasn't my intention, I wanted it to just accept whatever I passed in as the raw path and not touch it in any way. But if you just wanted to keep things simple and get back to how
I still think it should encode
I think I get your comment now about inverse, my take on prefixing something with So using an escape prefix to force something to be encoded, aka treated differently, vs treated as verbatim, doesn't make sense to me, as it goes against prior experience with escaping. And I suspect it would with others too? But maybe this can just be a quirk of Karate, because sometimes things are just not that simple. shrug Having said all that, I'm willing to accept I think internally we can split the path into segments, so that
This still feels ambiguous to me:
Maybe
Agree! On a technical note, to really keep this path untouched, we might have to drop using
Agree!
So long as it correctly encodes for URLs. Remembering that the JDK's URLEncoder doesn't quite do this, as it's intended for url encoded form bodies, which have slightly different rules to actual URLs. |
OK, I think I see the issue. Forward slash would be backwards from every other special character in my use-case because it would NOT be encoded normally, but would only be encoded if it was escaped, but then forward-slash IS a special case in URI encoding anyway so it makes sense (to me???) that it is treated differently. Every other special character would be encoded UNLESS it was escaped, forward-slash would be encoded if it IS escaped. Escaping should make the character be handled the reverse of how it's normally handled. Now, again, I understand the issue that we have here, because one character IS being treated differently than all the rest. But isn't that always the case in URIs?? I still don't see the need for pathEncoded though. I think it's more fluff that isn't necessary if we treat ONLY forward-slash as a special case for escaping. The other option would be to encoded EVERY special character (so != [a-zA-Z0-9]) that isn't escaped, which would force us to then change all forward-slashes in all tests to be escaped, that doesn't seem to be what we want does it? |
Hey there, IMHO, there need to be a way to encode URLs, but it should NOT be the default.
looks way more readable to me than :
Plus, the former is copy-pastable from a browser/curl command, which is quite handy to write a test easily. So I would either :
These are just my two cents, but I'd be happy to discuss if people disagree with this. |
bring back 0.9.6 behavior forward-slashes are supported not sure about trailing slashes but not a big deal imo added karate.urlEncode() and karate.urlDecode() helpers no need for raw path or equivalent
to everyone subscribed here, I've made changes to bring back the old behavior, there will be no breaking change. expect a 1.1.0.RC3 shortly - but it would be great if anyone can test in the meantime following the developer guide as of now, you can get the http client to encode a forward-slash by doing this: * path '/hello\\\\/world' which is kinda hilarious but I don't care anymore lol. haven't dug into it but my guess is most servers will process the EDIT: no more
|
In case if it helps anyone in the light of the changes above for the case with the json payload in a request body (was struggling trying to get it working with 1.1.0.RC3 until found this thread): before (1.1.0.RC2 and earlier)
after (1.1.0.RC3)
Couldn't find less ugly (and requiring minimal refactoring) workaround. |
@hlemmur can you explain that more. if your server doesn't work unless the URL ends with a |
@ptrthomas no, it's not a negative case. Checked the source code of the requested service, the request handler is defined with a trailing
Trailing Though I think my case is a special case, so I don't mind to use To me the more consistent behaviour would be to consider adding the trailing |
@hlemmur just made the change so at least the so yes, in your case I think the escaping is fine. and a rare case, so I don't plan to make any changes to the docs (PR-s are welcome as always) |
an update for all watching this thread, the next version will support preserving a trailing slash even when you use |
From reading the documentation it seems that paths should be url encoded by Karate.
This example works in 0.9.6 but fails after upgrading to 1.0.1.
with the following error:
Looking at the spec, https://tools.ietf.org/html/rfc3986, the character
|
does not appear to be special?Changing the scope variable to be
'root%7C'
seems to fix the problem and karate is happy, but|
is not a special character for URLs, and this works fine in 0.9.6..log output from 0.9.6
I can provide a full minimal project if requested, but I feel this should be enough to reproduce, let me know and I'll attach it to the ticket.
The text was updated successfully, but these errors were encountered: