-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up various use of escape/unescape functions for URL generation #6334
Conversation
Currently branch names with a '+' fail in certain situations because QueryUnescape replaces the + character with a blank space. Using PathUnescape should be better since it is defined as: // PathUnescape is identical to QueryUnescape except that it does not // unescape '+' to ' ' (space). Fixes go-gitea#6333
This is only a partial fix. You need to go through the rest of the code that generates these paths and use something similar to Ideally we should move the generation of compare paths into some kind of helper function so that if we change the compare url again we can just change it in one place and one place only (or very easy find where it's being used.) Perhaps use the |
Codecov Report
@@ Coverage Diff @@
## master #6334 +/- ##
=========================================
Coverage ? 38.88%
=========================================
Files ? 364
Lines ? 51215
Branches ? 0
=========================================
Hits ? 19916
Misses ? 28432
Partials ? 2867
Continue to review full report at Codecov.
|
Ah I see yes I see what you mean, I can fix that. I do feel a bit weary to use this: gitea/modules/templates/helper.go Lines 115 to 117 in e09fe48
To replace existing code (and choosing it vs PathEscape). Are those really the only 4 things that need to be escaped? Or to put it another way, is there a known reason to want to write an alternative Escape function? Does PathEscape do things that are unwanted in particular situations? I see EscapePound is only used in templates now where as theres a mix of QueryEscape and PathEscape in the code. |
So |
Ah I see yea that is no good. What do you think of a function that would just call PathEscape on everything but a /
So just eliminating the one thing we don't like about PathEscape rather than trying to recreate all the things we would want. |
An alternative would be to run PathEscape and then replace all The main question is which is more understandable. |
I like the split method because it feels more intentional that we are really only interested in operating on the segments of a path rather than the path string as a whole (and it feels weird to replace something and then try to replace it back). a PathSegmentEscape if you will :) I think either method would just require a short comment like |
Anyway, I would be delighted if we could migrate to using the correct functions in the correct places. I.e. QueryEscape in queries, PathEscape in a path segment, GiteaPathEscape (or maybe PathSegmentsEscape?) in things that represent a multiple path segments. I too am deeply suspicious about EscapePound. |
Where would be the preferred place to define a new somewhat general use function like this? |
This function simply runs PathEscape on each segment of a path without touching the forward slash itself. We want to use this instead of PathEscape/QueryEscape in most cases because a forward slash is a valid name for a branch etc... and we don't want that escaped in a URL. Putting this in new file url.go and also moving a couple similar functions into that file as well.
Replace various uses of EscapePath/EscapeQuery with new EscapePathSegments. Also remove uncessary uses of various escape/unescape functions when the text had already been escaped or was not escaped.
OK, made all of the discussed changes and have edited the description of this PR per the requested scope. |
models/repo.go
Outdated
@@ -838,7 +837,7 @@ type CloneLink struct { | |||
|
|||
// ComposeHTTPSCloneURL returns HTTPS clone URL based on given owner and repository name. | |||
func ComposeHTTPSCloneURL(owner, repo string) string { | |||
return fmt.Sprintf("%s%s/%s.git", setting.AppURL, url.QueryEscape(owner), url.QueryEscape(repo)) | |||
return fmt.Sprintf("%s%s/%s.git", setting.AppURL, util.PathEscapeSegments(owner), util.PathEscapeSegments(repo)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should be url.PathEscape
- although we don't allow /
in names at present we might in future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
modules/context/context.go
Outdated
@@ -210,7 +211,7 @@ func Contexter() macaron.Handler { | |||
if err == nil && len(repo.DefaultBranch) > 0 { | |||
branchName = repo.DefaultBranch | |||
} | |||
prefix := setting.AppURL + path.Join(url.QueryEscape(ownerName), url.QueryEscape(repoName), "src", "branch", branchName) | |||
prefix := setting.AppURL + path.Join(util.PathEscapeSegments(ownerName), util.PathEscapeSegments(repoName), "src", "branch", branchName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ownerName
and repoName
should be url.PathEscape
and branchName
should be escaped with util.PathEscapeSegments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
modules/context/repo.go
Outdated
@@ -172,7 +172,7 @@ func RetrieveBaseRepo(ctx *Context, repo *models.Repository) { | |||
|
|||
// ComposeGoGetImport returns go-get-import meta content. | |||
func ComposeGoGetImport(owner, repo string) string { | |||
return path.Join(setting.Domain, setting.AppSubURL, url.QueryEscape(owner), url.QueryEscape(repo)) | |||
return path.Join(setting.Domain, setting.AppSubURL, util.PathEscapeSegments(owner), util.PathEscapeSegments(repo)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url.PathEscape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
modules/private/internal.go
Outdated
@@ -77,7 +77,7 @@ func CheckUnitUser(userID, repoID int64, isAdmin bool, unitType models.UnitType) | |||
|
|||
// GetRepositoryByOwnerAndName returns the repository by given ownername and reponame. | |||
func GetRepositoryByOwnerAndName(ownerName, repoName string) (*models.Repository, error) { | |||
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/repo/%s/%s", url.PathEscape(ownerName), url.PathEscape(repoName)) | |||
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/repo/%s/%s", util.PathEscapeSegments(ownerName), util.PathEscapeSegments(repoName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url.PathEscape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
modules/private/repository.go
Outdated
@@ -44,7 +44,7 @@ func GetRepository(repoID int64) (*models.Repository, bool, error) { | |||
|
|||
// ActivePullRequest returns an active pull request if it exists | |||
func ActivePullRequest(baseRepoID int64, headRepoID int64, baseBranch, headBranch string) (*models.PullRequest, error) { | |||
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/active-pull-request?baseRepoID=%d&headRepoID=%d&baseBranch=%s&headBranch=%s", baseRepoID, headRepoID, url.QueryEscape(baseBranch), url.QueryEscape(headBranch)) | |||
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/active-pull-request?baseRepoID=%d&headRepoID=%d&baseBranch=%s&headBranch=%s", baseRepoID, headRepoID, util.PathEscapeSegments(baseBranch), util.PathEscapeSegments(headBranch)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These ones are actually in a query so url.QueryEscape
is the correct thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
routers/private/repository.go
Outdated
@@ -56,15 +56,15 @@ func GetRepository(ctx *macaron.Context) { | |||
func GetActivePullRequest(ctx *macaron.Context) { | |||
baseRepoID := ctx.QueryInt64("baseRepoID") | |||
headRepoID := ctx.QueryInt64("headRepoID") | |||
baseBranch, err := url.QueryUnescape(ctx.QueryTrim("baseBranch")) | |||
baseBranch, err := url.PathUnescape(ctx.QueryTrim("baseBranch")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx.QueryTrim
will already unescape its contents so you don't actually need to unescape these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed that call. Included a check if baseBranch is empty to preserve the error handling at this point.
routers/private/repository.go
Outdated
if err != nil { | ||
ctx.JSON(http.StatusInternalServerError, map[string]interface{}{ | ||
"err": err.Error(), | ||
}) | ||
return | ||
} | ||
|
||
headBranch, err := url.QueryUnescape(ctx.QueryTrim("headBranch")) | ||
headBranch, err := url.PathUnescape(ctx.QueryTrim("headBranch")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx.QueryTrim
will already unescape its contents so you don't actually need to unescape these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be sensible to change models/repo.go
Repository.ComposeCompareUrl
to use the new path escape segments too.
) | ||
|
||
// GetProtectedBranchBy get protected branch information | ||
func GetProtectedBranchBy(repoID int64, branchName string) (*models.ProtectedBranch, error) { | ||
// Ask for running deliver hook and test pull request tasks. | ||
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/branch/%d/%s", repoID, url.PathEscape(branchName)) | ||
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/branch/%d/%s", repoID, util.PathEscapeSegments(branchName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just check that this doesn't break. I put the PathEscape in recently and it worked for branches with subpaths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this still works the same and as a bonus makes the logging a tiny bit more clear (though both should work fine here). With just PathEscape, you end up with a log like:
GetProtectedBranchBy: http://localhost:3000/api/internal/branch/3/refs%2Ftags%2Fdebian%2F1%251.6.0-4
And with PathEscapeSegments you end up with:
GetProtectedBranchBy: http://localhost:3000/api/internal/branch/3/refs/tags/debian/1%251.6.0-3
Since we tolerate the forward slashes in a URL anyway the second choice seems preferable for clarity and consistency with how we escape the branchName in other places now.
// Use of this source code is governed by a MIT-style | ||
// license that can be found in the LICENSE file. | ||
|
||
package util |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A package named util
is not a good idea maybe we could change this and begin from this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
util does feel right for something like this but I assume the advice is to not re-use common stdlib names in general? What namespace would be preferred here?
I'm not very good at naming things, perhaps just a generic 'gitea' for these utility type functions that are local to this project
gitea.PathEscapeSegments(myString)
Not very clever but at least obvious to where it comes from. Lemme know what makes sense and I can update this PR.
If the name needs more thought/feedback, I think it would be better to add this PR 'as is' and create a second PR for changing the name of the currently used modules/util since this will fix a few existing bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lunny I think it's reasonable to continue using the util module here. We should refactor the module later as a separate pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could create a new package named module/url
for all url related functions. But It's OK for another PR to do that.
Edit:
This is a pull request to clean up various uses of escape/unescape functions such as PathEscape/QueryEscape (and their Unescape partners).
Gitea currently fails on most 'pathological' cases as shown here:
#6225 (comment)
This fixes those by introducing a new function PathEscapeSegments that runs PathEscape on each segment of a path while leaving the forward slash in place. In theory should fix any 404/500 errors that are a result of bad escaping (as in examples above).
This touches a lot but should change very little since it is ultimately using the same PathEscape function underneath.
Fixes #6333