Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up various use of escape/unescape functions for URL generation #6334

Merged
merged 13 commits into from
Mar 18, 2019

Conversation

mrsdizzie
Copy link
Member

@mrsdizzie mrsdizzie commented Mar 14, 2019

Edit:
This is a pull request to clean up various uses of escape/unescape functions such as PathEscape/QueryEscape (and their Unescape partners).

Gitea currently fails on most 'pathological' cases as shown here:
#6225 (comment)

This fixes those by introducing a new function PathEscapeSegments that runs PathEscape on each segment of a path while leaving the forward slash in place. In theory should fix any 404/500 errors that are a result of bad escaping (as in examples above).

This touches a lot but should change very little since it is ultimately using the same PathEscape function underneath.

Fixes #6333

Currently branch names with a '+' fail in certain situations because
QueryUnescape replaces the + character with a blank space.

Using PathUnescape should be better since it is defined as:

// PathUnescape is identical to QueryUnescape except that it does not
// unescape '+' to ' ' (space).

Fixes go-gitea#6333
@zeripath
Copy link
Contributor

This is only a partial fix.

You need to go through the rest of the code that generates these paths and use something similar to url.PathEscape - basically one that escapes everything url.PathEscape does except for / which can remain the same (as we can tolerate this). I think templates.EscapePound is intended for this but you should check it actually does everything it should - I worry that there may be missing bits that could bite us later.

Ideally we should move the generation of compare paths into some kind of helper function so that if we change the compare url again we can just change it in one place and one place only (or very easy find where it's being used.) Perhaps use the Repository.ComposeCompareURL function.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 14, 2019
@codecov-io
Copy link

codecov-io commented Mar 14, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@c151682). Click here to learn what that means.
The diff coverage is 56%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #6334   +/-   ##
=========================================
  Coverage          ?   38.88%           
=========================================
  Files             ?      364           
  Lines             ?    51215           
  Branches          ?        0           
=========================================
  Hits              ?    19916           
  Misses            ?    28432           
  Partials          ?     2867
Impacted Files Coverage Δ
modules/util/util.go 65.21% <ø> (ø)
modules/context/repo.go 58.4% <0%> (ø)
routers/home.go 45.45% <0%> (ø)
routers/private/repository.go 25.86% <0%> (ø)
modules/context/context.go 50% <0%> (ø)
models/repo.go 47.44% <100%> (ø)
routers/repo/pull.go 34.24% <100%> (ø)
modules/context/auth.go 26.08% <33.33%> (ø)
modules/util/url.go 69.44% <69.44%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c151682...120f7f7. Read the comment docs.

@mrsdizzie
Copy link
Member Author

mrsdizzie commented Mar 14, 2019

Ah I see yes I see what you mean, I can fix that.

I do feel a bit weary to use this:

"EscapePound": func(str string) string {
return strings.NewReplacer("%", "%25", "#", "%23", " ", "%20", "?", "%3F").Replace(str)
},

To replace existing code (and choosing it vs PathEscape). Are those really the only 4 things that need to be escaped? Or to put it another way, is there a known reason to want to write an alternative Escape function? Does PathEscape do things that are unwanted in particular situations?

I see EscapePound is only used in templates now where as theres a mix of QueryEscape and PathEscape in the code.

@zeripath
Copy link
Contributor

So PathEscape will escape / which is not wanted in branch names.

@mrsdizzie
Copy link
Member Author

Ah I see yea that is no good. What do you think of a function that would just call PathEscape on everything but a /

func GiteaEscapePath(path string) string {

	slice := strings.Split(path, "/")
	for index := range slice {
		slice[index] = url.PathEscape(slice[index])
	}
	escapedPath := strings.Join(slice, "/")
	return escapedPath
}

Input: /this/is/my "stra"nge#/pa?h/ 
Output: /this/is/my%20%22stra%22nge%23/pa%3Fh/
Input: release/v1.7 
Output: release/v1.7

So just eliminating the one thing we don't like about PathEscape rather than trying to recreate all the things we would want.

@zeripath
Copy link
Contributor

zeripath commented Mar 15, 2019

An alternative would be to run PathEscape and then replace all %2F with /.

The main question is which is more understandable.

@mrsdizzie
Copy link
Member Author

I like the split method because it feels more intentional that we are really only interested in operating on the segments of a path rather than the path string as a whole (and it feels weird to replace something and then try to replace it back). a PathSegmentEscape if you will :)

I think either method would just require a short comment like // we don't want escaped forward slashes and that would probably be 'clear enough'.

@zeripath
Copy link
Contributor

Anyway, I would be delighted if we could migrate to using the correct functions in the correct places. I.e. QueryEscape in queries, PathEscape in a path segment, GiteaPathEscape (or maybe PathSegmentsEscape?) in things that represent a multiple path segments.

I too am deeply suspicious about EscapePound.

@lafriks lafriks added the type/enhancement An improvement of existing functionality label Mar 15, 2019
@lafriks lafriks added this to the 1.9.0 milestone Mar 15, 2019
@mrsdizzie
Copy link
Member Author

Where would be the preferred place to define a new somewhat general use function like this?

This function simply runs PathEscape on each segment of a path without
touching the forward slash itself. We want to use this instead of
PathEscape/QueryEscape in most cases because a forward slash is a valid name for a
branch etc... and we don't want that escaped in a URL.

Putting this in new file url.go and also moving a couple similar
functions into that file as well.
Replace various uses of EscapePath/EscapeQuery with new
EscapePathSegments. Also remove uncessary uses of various
escape/unescape functions when the text had already been escaped or was
not escaped.
@mrsdizzie mrsdizzie changed the title Use PathUnescape instead of QueryUnescape when working with branch names Clean up various use of escape/unescape functions for URL generation Mar 15, 2019
@mrsdizzie
Copy link
Member Author

OK, made all of the discussed changes and have edited the description of this PR per the requested scope.

models/repo.go Outdated
@@ -838,7 +837,7 @@ type CloneLink struct {

// ComposeHTTPSCloneURL returns HTTPS clone URL based on given owner and repository name.
func ComposeHTTPSCloneURL(owner, repo string) string {
return fmt.Sprintf("%s%s/%s.git", setting.AppURL, url.QueryEscape(owner), url.QueryEscape(repo))
return fmt.Sprintf("%s%s/%s.git", setting.AppURL, util.PathEscapeSegments(owner), util.PathEscapeSegments(repo))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be url.PathEscape - although we don't allow / in names at present we might in future

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -210,7 +211,7 @@ func Contexter() macaron.Handler {
if err == nil && len(repo.DefaultBranch) > 0 {
branchName = repo.DefaultBranch
}
prefix := setting.AppURL + path.Join(url.QueryEscape(ownerName), url.QueryEscape(repoName), "src", "branch", branchName)
prefix := setting.AppURL + path.Join(util.PathEscapeSegments(ownerName), util.PathEscapeSegments(repoName), "src", "branch", branchName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ownerName and repoName should be url.PathEscape and branchName should be escaped with util.PathEscapeSegments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -172,7 +172,7 @@ func RetrieveBaseRepo(ctx *Context, repo *models.Repository) {

// ComposeGoGetImport returns go-get-import meta content.
func ComposeGoGetImport(owner, repo string) string {
return path.Join(setting.Domain, setting.AppSubURL, url.QueryEscape(owner), url.QueryEscape(repo))
return path.Join(setting.Domain, setting.AppSubURL, util.PathEscapeSegments(owner), util.PathEscapeSegments(repo))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url.PathEscape

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -77,7 +77,7 @@ func CheckUnitUser(userID, repoID int64, isAdmin bool, unitType models.UnitType)

// GetRepositoryByOwnerAndName returns the repository by given ownername and reponame.
func GetRepositoryByOwnerAndName(ownerName, repoName string) (*models.Repository, error) {
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/repo/%s/%s", url.PathEscape(ownerName), url.PathEscape(repoName))
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/repo/%s/%s", util.PathEscapeSegments(ownerName), util.PathEscapeSegments(repoName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url.PathEscape

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -44,7 +44,7 @@ func GetRepository(repoID int64) (*models.Repository, bool, error) {

// ActivePullRequest returns an active pull request if it exists
func ActivePullRequest(baseRepoID int64, headRepoID int64, baseBranch, headBranch string) (*models.PullRequest, error) {
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/active-pull-request?baseRepoID=%d&headRepoID=%d&baseBranch=%s&headBranch=%s", baseRepoID, headRepoID, url.QueryEscape(baseBranch), url.QueryEscape(headBranch))
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/active-pull-request?baseRepoID=%d&headRepoID=%d&baseBranch=%s&headBranch=%s", baseRepoID, headRepoID, util.PathEscapeSegments(baseBranch), util.PathEscapeSegments(headBranch))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ones are actually in a query so url.QueryEscape is the correct thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -56,15 +56,15 @@ func GetRepository(ctx *macaron.Context) {
func GetActivePullRequest(ctx *macaron.Context) {
baseRepoID := ctx.QueryInt64("baseRepoID")
headRepoID := ctx.QueryInt64("headRepoID")
baseBranch, err := url.QueryUnescape(ctx.QueryTrim("baseBranch"))
baseBranch, err := url.PathUnescape(ctx.QueryTrim("baseBranch"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx.QueryTrim will already unescape its contents so you don't actually need to unescape these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed that call. Included a check if baseBranch is empty to preserve the error handling at this point.

if err != nil {
ctx.JSON(http.StatusInternalServerError, map[string]interface{}{
"err": err.Error(),
})
return
}

headBranch, err := url.QueryUnescape(ctx.QueryTrim("headBranch"))
headBranch, err := url.PathUnescape(ctx.QueryTrim("headBranch"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx.QueryTrim will already unescape its contents so you don't actually need to unescape these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor

@zeripath zeripath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be sensible to change models/repo.go Repository.ComposeCompareUrl to use the new path escape segments too.

)

// GetProtectedBranchBy get protected branch information
func GetProtectedBranchBy(repoID int64, branchName string) (*models.ProtectedBranch, error) {
// Ask for running deliver hook and test pull request tasks.
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/branch/%d/%s", repoID, url.PathEscape(branchName))
reqURL := setting.LocalURL + fmt.Sprintf("api/internal/branch/%d/%s", repoID, util.PathEscapeSegments(branchName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just check that this doesn't break. I put the PathEscape in recently and it worked for branches with subpaths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this still works the same and as a bonus makes the logging a tiny bit more clear (though both should work fine here). With just PathEscape, you end up with a log like:

GetProtectedBranchBy: http://localhost:3000/api/internal/branch/3/refs%2Ftags%2Fdebian%2F1%251.6.0-4

And with PathEscapeSegments you end up with:

GetProtectedBranchBy: http://localhost:3000/api/internal/branch/3/refs/tags/debian/1%251.6.0-3

Since we tolerate the forward slashes in a URL anyway the second choice seems preferable for clarity and consistency with how we escape the branchName in other places now.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Mar 16, 2019
@zeripath zeripath removed this from the 1.9.0 milestone Mar 17, 2019
@zeripath zeripath added this to the 1.8.0 milestone Mar 17, 2019
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.

package util
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A package named util is not a good idea maybe we could change this and begin from this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

util does feel right for something like this but I assume the advice is to not re-use common stdlib names in general? What namespace would be preferred here?

I'm not very good at naming things, perhaps just a generic 'gitea' for these utility type functions that are local to this project

gitea.PathEscapeSegments(myString)

Not very clever but at least obvious to where it comes from. Lemme know what makes sense and I can update this PR.

If the name needs more thought/feedback, I think it would be better to add this PR 'as is' and create a second PR for changing the name of the currently used modules/util since this will fix a few existing bugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lunny I think it's reasonable to continue using the util module here. We should refactor the module later as a separate pr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could create a new package named module/url for all url related functions. But It's OK for another PR to do that.

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Mar 18, 2019
@techknowlogick techknowlogick merged commit ca46385 into go-gitea:master Mar 18, 2019
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/bug type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

404 On Pull Request when branch has plus character
7 participants