Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance during repository uploadFile. #14668

Closed
4 tasks
miou-gh opened this issue Feb 13, 2021 · 28 comments
Closed
4 tasks

Slow performance during repository uploadFile. #14668

miou-gh opened this issue Feb 13, 2021 · 28 comments
Labels
performance/speed performance issues with slow downs

Comments

@miou-gh
Copy link
Contributor

miou-gh commented Feb 13, 2021

  • Gitea version (or commit ref): 1.13.2
  • Database (use [x]):
    • [x ] PostgreSQL
    • MySQL
    • MSSQL
    • [ x] SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
  • Log gist:

Description

When uploading a file into an existing repository through the API, it can take several seconds to complete, even when on localhost and the file is under a megabyte in size.

...

Screenshots

powershell_w64kmMrp8Z

(pprof) list repo.CreateFile
Total: 15.03s
ROUTINE ======================== code.gitea.io/gitea/routers/api/v1/repo.CreateFile in /home/allie/Gitea/gitea/routers/api/v1/repo/file.go
         0      1.64s (flat, cum) 10.91% of Total
         .          .    266:
         .          .    267:	if opts.Message == "" {
         .          .    268:		opts.Message = ctx.Tr("repo.editor.add", opts.TreePath)
         .          .    269:	}
         .          .    270:
         .      1.51s    271:	if fileResponse, err := createOrUpdateFile(ctx, opts); err != nil {
         .          .    272:		handleCreateOrUpdateFileError(ctx, err)
         .          .    273:	} else {
         .      130ms    274:		ctx.JSON(http.StatusCreated, fileResponse)
         .          .    275:	}
         .          .    276:}
         .          .    277:
         .          .    278:// UpdateFile handles API call for updating a file
         .          .    279:func UpdateFile(ctx *context.APIContext) {
(pprof) list rep.createOrUpdateFile
Total: 15.03s
(pprof) list repofiles.CreateOrUpdateRepoFile
Total: 15.03s
ROUTINE ======================== code.gitea.io/gitea/modules/repofiles.CreateOrUpdateRepoFile in /home/allie/Gitea/gitea/modules/repofiles/update.go
      10ms      1.51s (flat, cum) 10.05% of Total
         .          .    131:	if opts.NewBranch == "" {
         .          .    132:		opts.NewBranch = opts.OldBranch
         .          .    133:	}
         .          .    134:
         .          .    135:	// oldBranch must exist for this operation
         .       70ms    136:	if _, err := repo_module.GetBranch(repo, opts.OldBranch); err != nil {
         .          .    137:		return nil, err
         .          .    138:	}
         .          .    139:
         .          .    140:	// A NewBranch can be specified for the file to be created/updated in a new branch.
         .          .    141:	// Check to make sure the branch does not already exist, otherwise we can't proceed.
         .          .    142:	// If we aren't branching to a new branch, make sure user can commit to the given branch
         .          .    143:	if opts.NewBranch != opts.OldBranch {
         .          .    144:		existingBranch, err := repo_module.GetBranch(repo, opts.NewBranch)
         .          .    145:		if existingBranch != nil {
         .          .    146:			return nil, models.ErrBranchAlreadyExists{
         .          .    147:				BranchName: opts.NewBranch,
         .          .    148:			}
         .          .    149:		}
         .          .    150:		if err != nil && !git.IsErrBranchNotExist(err) {
         .          .    151:			return nil, err
         .          .    152:		}
         .          .    153:	} else {
         .       10ms    154:		protectedBranch, err := repo.GetBranchProtection(opts.OldBranch)
         .          .    155:		if err != nil {
         .          .    156:			return nil, err
         .          .    157:		}
         .          .    158:		if protectedBranch != nil {
         .          .    159:			if !protectedBranch.CanUserPush(doer.ID) {
         .          .    160:				return nil, models.ErrUserCannotCommit{
         .          .    161:					UserName: doer.LowerName,
         .          .    162:				}
         .          .    163:			}
         .          .    164:			if protectedBranch.RequireSignedCommits {
         .          .    165:				_, _, _, err := repo.SignCRUDAction(doer, repo.RepoPath(), opts.OldBranch)
         .          .    166:				if err != nil {
         .          .    167:					if !models.IsErrWontSign(err) {
         .          .    168:						return nil, err
         .          .    169:					}
         .          .    170:					return nil, models.ErrUserCannotCommit{
         .          .    171:						UserName: doer.LowerName,
         .          .    172:					}
         .          .    173:				}
         .          .    174:			}
         .          .    175:			patterns := protectedBranch.GetProtectedFilePatterns()
         .          .    176:			for _, pat := range patterns {
         .          .    177:				if pat.Match(strings.ToLower(opts.TreePath)) {
         .          .    178:					return nil, models.ErrFilePathProtected{
         .          .    179:						Path: opts.TreePath,
         .          .    180:					}
         .          .    181:				}
         .          .    182:			}
         .          .    183:		}
         .          .    184:	}
         .          .    185:
         .          .    186:	// If FromTreePath is not set, set it to the opts.TreePath
         .          .    187:	if opts.TreePath != "" && opts.FromTreePath == "" {
         .          .    188:		opts.FromTreePath = opts.TreePath
         .          .    189:	}
         .          .    190:
         .          .    191:	// Check that the path given in opts.treePath is valid (not a git path)
         .          .    192:	treePath := CleanUploadFileName(opts.TreePath)
         .          .    193:	if treePath == "" {
         .          .    194:		return nil, models.ErrFilenameInvalid{
         .          .    195:			Path: opts.TreePath,
         .          .    196:		}
         .          .    197:	}
         .          .    198:	// If there is a fromTreePath (we are copying it), also clean it up
         .          .    199:	fromTreePath := CleanUploadFileName(opts.FromTreePath)
         .          .    200:	if fromTreePath == "" && opts.FromTreePath != "" {
         .          .    201:		return nil, models.ErrFilenameInvalid{
         .          .    202:			Path: opts.FromTreePath,
         .          .    203:		}
         .          .    204:	}
         .          .    205:
         .          .    206:	message := strings.TrimSpace(opts.Message)
         .          .    207:
         .          .    208:	author, committer := GetAuthorAndCommitterUsers(opts.Author, opts.Committer, doer)
         .          .    209:
         .       10ms    210:	t, err := NewTemporaryUploadRepository(repo)
         .          .    211:	if err != nil {
         .          .    212:		log.Error("%v", err)
         .          .    213:	}
         .          .    214:	defer t.Close()
         .       50ms    215:	if err := t.Clone(opts.OldBranch); err != nil {
         .          .    216:		return nil, err
         .          .    217:	}
         .       30ms    218:	if err := t.SetDefaultIndex(); err != nil {
         .          .    219:		return nil, err
         .          .    220:	}
         .          .    221:
         .          .    222:	// Get the commit of the original branch
      10ms       70ms    223:	commit, err := t.GetBranchCommit(opts.OldBranch)
         .          .    224:	if err != nil {
         .          .    225:		return nil, err // Couldn't get a commit for the branch
         .          .    226:	}
         .          .    227:
         .          .    228:	// Assigned LastCommitID in opts if it hasn't been set
         .          .    229:	if opts.LastCommitID == "" {
         .          .    230:		opts.LastCommitID = commit.ID.String()
         .          .    231:	} else {
         .          .    232:		lastCommitID, err := t.gitRepo.ConvertToSHA1(opts.LastCommitID)
         .          .    233:		if err != nil {
         .          .    234:			return nil, fmt.Errorf("DeleteRepoFile: Invalid last commit ID: %v", err)
         .          .    235:		}
         .          .    236:		opts.LastCommitID = lastCommitID.String()
         .          .    237:
         .          .    238:	}
         .          .    239:
         .          .    240:	encoding := "UTF-8"
         .          .    241:	bom := false
         .          .    242:	executable := false
         .          .    243:
         .          .    244:	if !opts.IsNewFile {
         .          .    245:		fromEntry, err := commit.GetTreeEntryByPath(fromTreePath)
         .          .    246:		if err != nil {
         .          .    247:			return nil, err
         .          .    248:		}
         .          .    249:		if opts.SHA != "" {
         .          .    250:			// If a SHA was given and the SHA given doesn't match the SHA of the fromTreePath, throw error
         .          .    251:			if opts.SHA != fromEntry.ID.String() {
         .          .    252:				return nil, models.ErrSHADoesNotMatch{
         .          .    253:					Path:       treePath,
         .          .    254:					GivenSHA:   opts.SHA,
         .          .    255:					CurrentSHA: fromEntry.ID.String(),
         .          .    256:				}
         .          .    257:			}
         .          .    258:		} else if opts.LastCommitID != "" {
         .          .    259:			// If a lastCommitID was given and it doesn't match the commitID of the head of the branch throw
         .          .    260:			// an error, but only if we aren't creating a new branch.
         .          .    261:			if commit.ID.String() != opts.LastCommitID && opts.OldBranch == opts.NewBranch {
         .          .    262:				if changed, err := commit.FileChangedSinceCommit(treePath, opts.LastCommitID); err != nil {
         .          .    263:					return nil, err
         .          .    264:				} else if changed {
         .          .    265:					return nil, models.ErrCommitIDDoesNotMatch{
         .          .    266:						GivenCommitID:   opts.LastCommitID,
         .          .    267:						CurrentCommitID: opts.LastCommitID,
         .          .    268:					}
         .          .    269:				}
         .          .    270:				// The file wasn't modified, so we are good to delete it
         .          .    271:			}
         .          .    272:		} else {
         .          .    273:			// When updating a file, a lastCommitID or SHA needs to be given to make sure other commits
         .          .    274:			// haven't been made. We throw an error if one wasn't provided.
         .          .    275:			return nil, models.ErrSHAOrCommitIDNotProvided{}
         .          .    276:		}
         .          .    277:		encoding, bom = detectEncodingAndBOM(fromEntry, repo)
         .          .    278:		executable = fromEntry.IsExecutable()
         .          .    279:	}
         .          .    280:
         .          .    281:	// For the path where this file will be created/updated, we need to make
         .          .    282:	// sure no parts of the path are existing files or links except for the last
         .          .    283:	// item in the path which is the file name, and that shouldn't exist IF it is
         .          .    284:	// a new file OR is being moved to a new path.
         .          .    285:	treePathParts := strings.Split(treePath, "/")
         .          .    286:	subTreePath := ""
         .          .    287:	for index, part := range treePathParts {
         .          .    288:		subTreePath = path.Join(subTreePath, part)
         .       90ms    289:		entry, err := commit.GetTreeEntryByPath(subTreePath)
         .          .    290:		if err != nil {
         .          .    291:			if git.IsErrNotExist(err) {
         .          .    292:				// Means there is no item with that name, so we're good
         .          .    293:				break
         .          .    294:			}
         .          .    295:			return nil, err
         .          .    296:		}
         .          .    297:		if index < len(treePathParts)-1 {
         .          .    298:			if !entry.IsDir() {
         .          .    299:				return nil, models.ErrFilePathInvalid{
         .          .    300:					Message: fmt.Sprintf("a file exists where you’re trying to create a subdirectory [path: %s]", subTreePath),
         .          .    301:					Path:    subTreePath,
         .          .    302:					Name:    part,
         .          .    303:					Type:    git.EntryModeBlob,
         .          .    304:				}
         .          .    305:			}
         .          .    306:		} else if entry.IsLink() {
         .          .    307:			return nil, models.ErrFilePathInvalid{
         .          .    308:				Message: fmt.Sprintf("a symbolic link exists where you’re trying to create a subdirectory [path: %s]", subTreePath),
         .          .    309:				Path:    subTreePath,
         .          .    310:				Name:    part,
         .          .    311:				Type:    git.EntryModeSymlink,
         .          .    312:			}
         .          .    313:		} else if entry.IsDir() {
         .          .    314:			return nil, models.ErrFilePathInvalid{
         .          .    315:				Message: fmt.Sprintf("a directory exists where you’re trying to create a file [path: %s]", subTreePath),
         .          .    316:				Path:    subTreePath,
         .          .    317:				Name:    part,
         .          .    318:				Type:    git.EntryModeTree,
         .          .    319:			}
         .          .    320:		} else if fromTreePath != treePath || opts.IsNewFile {
         .          .    321:			// The entry shouldn't exist if we are creating new file or moving to a new path
         .          .    322:			return nil, models.ErrRepoFileAlreadyExists{
         .          .    323:				Path: treePath,
         .          .    324:			}
         .          .    325:		}
         .          .    326:
         .          .    327:	}
         .          .    328:
         .          .    329:	// Get the two paths (might be the same if not moving) from the index if they exist
         .       30ms    330:	filesInIndex, err := t.LsFiles(opts.TreePath, opts.FromTreePath)
         .          .    331:	if err != nil {
         .          .    332:		return nil, fmt.Errorf("UpdateRepoFile: %v", err)
         .          .    333:	}
         .          .    334:	// If is a new file (not updating) then the given path shouldn't exist
         .          .    335:	if opts.IsNewFile {
         .          .    336:		for _, file := range filesInIndex {
         .          .    337:			if file == opts.TreePath {
         .          .    338:				return nil, models.ErrRepoFileAlreadyExists{
         .          .    339:					Path: opts.TreePath,
         .          .    340:				}
         .          .    341:			}
         .          .    342:		}
         .          .    343:	}
         .          .    344:
         .          .    345:	// Remove the old path from the tree
         .          .    346:	if fromTreePath != treePath && len(filesInIndex) > 0 {
         .          .    347:		for _, file := range filesInIndex {
         .          .    348:			if file == fromTreePath {
         .          .    349:				if err := t.RemoveFilesFromIndex(opts.FromTreePath); err != nil {
         .          .    350:					return nil, err
         .          .    351:				}
         .          .    352:			}
         .          .    353:		}
         .          .    354:	}
         .          .    355:
         .          .    356:	content := opts.Content
         .          .    357:	if bom {
         .          .    358:		content = string(charset.UTF8BOM) + content
         .          .    359:	}
         .          .    360:	if encoding != "UTF-8" {
         .          .    361:		charsetEncoding, _ := stdcharset.Lookup(encoding)
         .          .    362:		if charsetEncoding != nil {
         .          .    363:			result, _, err := transform.String(charsetEncoding.NewEncoder(), content)
         .          .    364:			if err != nil {
         .          .    365:				// Look if we can't encode back in to the original we should just stick with utf-8
         .          .    366:				log.Error("Error re-encoding %s (%s) as %s - will stay as UTF-8: %v", opts.TreePath, opts.FromTreePath, encoding, err)
         .          .    367:				result = content
         .          .    368:			}
         .          .    369:			content = result
         .          .    370:		} else {
         .          .    371:			log.Error("Unknown encoding: %s", encoding)
         .          .    372:		}
         .          .    373:	}
         .          .    374:	// Reset the opts.Content to our adjusted content to ensure that LFS gets the correct content
         .          .    375:	opts.Content = content
         .          .    376:	var lfsMetaObject *models.LFSMetaObject
         .          .    377:
         .          .    378:	if setting.LFS.StartServer {
         .          .    379:		// Check there is no way this can return multiple infos
         .       30ms    380:		filename2attribute2info, err := t.CheckAttribute("filter", treePath)
         .          .    381:		if err != nil {
         .          .    382:			return nil, err
         .          .    383:		}
         .          .    384:
         .          .    385:		if filename2attribute2info[treePath] != nil && filename2attribute2info[treePath]["filter"] == "lfs" {
         .          .    386:			// OK so we are supposed to LFS this data!
         .          .    387:			oid, err := models.GenerateLFSOid(strings.NewReader(opts.Content))
         .          .    388:			if err != nil {
         .          .    389:				return nil, err
         .          .    390:			}
         .          .    391:			lfsMetaObject = &models.LFSMetaObject{Oid: oid, Size: int64(len(opts.Content)), RepositoryID: repo.ID}
         .          .    392:			content = lfsMetaObject.Pointer()
         .          .    393:		}
         .          .    394:	}
         .          .    395:	// Add the object to the database
         .       80ms    396:	objectHash, err := t.HashObject(strings.NewReader(content))
         .          .    397:	if err != nil {
         .          .    398:		return nil, err
         .          .    399:	}
         .          .    400:
         .          .    401:	// Add the object to the index
         .          .    402:	if executable {
         .          .    403:		if err := t.AddObjectToIndex("100755", objectHash, treePath); err != nil {
         .          .    404:			return nil, err
         .          .    405:		}
         .          .    406:	} else {
         .       90ms    407:		if err := t.AddObjectToIndex("100644", objectHash, treePath); err != nil {
         .          .    408:			return nil, err
         .          .    409:		}
         .          .    410:	}
         .          .    411:
         .          .    412:	// Now write the tree
         .       60ms    413:	treeHash, err := t.WriteTree()
         .          .    414:	if err != nil {
         .          .    415:		return nil, err
         .          .    416:	}
         .          .    417:
         .          .    418:	// Now commit the tree
         .          .    419:	var commitHash string
         .          .    420:	if opts.Dates != nil {
         .      170ms    421:		commitHash, err = t.CommitTreeWithDate(author, committer, treeHash, message, opts.Signoff, opts.Dates.Author, opts.Dates.Committer)
         .          .    422:	} else {
         .          .    423:		commitHash, err = t.CommitTree(author, committer, treeHash, message, opts.Signoff)
         .          .    424:	}
         .          .    425:	if err != nil {
         .          .    426:		return nil, err
         .          .    427:	}
         .          .    428:
         .          .    429:	if lfsMetaObject != nil {
         .          .    430:		// We have an LFS object - create it
         .          .    431:		lfsMetaObject, err = models.NewLFSMetaObject(lfsMetaObject)
         .          .    432:		if err != nil {
         .          .    433:			return nil, err
         .          .    434:		}
         .          .    435:		contentStore := &lfs.ContentStore{ObjectStorage: storage.LFS}
         .          .    436:		exist, err := contentStore.Exists(lfsMetaObject)
         .          .    437:		if err != nil {
         .          .    438:			return nil, err
         .          .    439:		}
         .          .    440:		if !exist {
         .          .    441:			if err := contentStore.Put(lfsMetaObject, strings.NewReader(opts.Content)); err != nil {
         .          .    442:				if _, err2 := repo.RemoveLFSMetaObjectByOid(lfsMetaObject.Oid); err2 != nil {
         .          .    443:					return nil, fmt.Errorf("Error whilst removing failed inserted LFS object %s: %v (Prev Error: %v)", lfsMetaObject.Oid, err2, err)
         .          .    444:				}
         .          .    445:				return nil, err
         .          .    446:			}
         .          .    447:		}
         .          .    448:	}
         .          .    449:
         .          .    450:	// Then push this tree to NewBranch
         .       70ms    451:	if err := t.Push(doer, commitHash, opts.NewBranch); err != nil {
         .          .    452:		log.Error("%T %v", err, err)
         .          .    453:		return nil, err
         .          .    454:	}
         .          .    455:
         .          .    456:	commit, err = t.GetCommit(commitHash)
         .          .    457:	if err != nil {
         .          .    458:		return nil, err
         .          .    459:	}
         .          .    460:
         .      480ms    461:	file, err := GetFileResponseFromCommit(repo, commit, opts.NewBranch, treePath)
         .          .    462:	if err != nil {
         .          .    463:		return nil, err
         .          .    464:	}
         .      170ms    465:	return file, nil
         .          .    466:}
(pprof) list GetFileResponseFromCommit       
Total: 15.03s
ROUTINE ======================== code.gitea.io/gitea/modules/repofiles.GetFileResponseFromCommit in /home/allie/Gitea/gitea/modules/repofiles/file.go
         0      480ms (flat, cum)  3.19% of Total
         .          .     15:	api "code.gitea.io/gitea/modules/structs"
         .          .     16:)
         .          .     17:
         .          .     18:// GetFileResponseFromCommit Constructs a FileResponse from a Commit object
         .          .     19:func GetFileResponseFromCommit(repo *models.Repository, commit *git.Commit, branch, treeName string) (*api.FileResponse, error) {
         .      410ms     20:	fileContents, _ := GetContents(repo, treeName, branch, false) // ok if fails, then will be nil
         .       20ms     21:	fileCommitResponse, _ := GetFileCommitResponse(repo, commit)  // ok if fails, then will be nil
         .       50ms     22:	verification := GetPayloadCommitVerification(commit)
         .          .     23:	fileResponse := &api.FileResponse{
         .          .     24:		Content:      fileContents,
         .          .     25:		Commit:       fileCommitResponse,
         .          .     26:		Verification: verification,
         .          .     27:	}
(pprof) list GetContents              
Total: 15.03s
ROUTINE ======================== code.gitea.io/gitea/modules/repofiles.GetContents in /home/allie/Gitea/gitea/modules/repofiles/content.go
         0      410ms (flat, cum)  2.73% of Total
         .          .    119:		return nil, err
         .          .    120:	}
         .          .    121:	defer gitRepo.Close()
         .          .    122:
         .          .    123:	// Get the commit object for the ref
         .       50ms    124:	commit, err := gitRepo.GetCommit(ref)
         .          .    125:	if err != nil {
         .          .    126:		return nil, err
         .          .    127:	}
         .          .    128:	commitID := commit.ID.String()
         .          .    129:	if len(ref) >= 4 && strings.HasPrefix(commitID, ref) {
         .          .    130:		ref = commit.ID.String()
         .          .    131:	}
         .          .    132:
         .       50ms    133:	entry, err := commit.GetTreeEntryByPath(treePath)
         .          .    134:	if err != nil {
         .          .    135:		return nil, err
         .          .    136:	}
         .          .    137:
         .      110ms    138:	refType := gitRepo.GetRefType(ref)
         .          .    139:	if refType == "invalid" {
         .          .    140:		return nil, fmt.Errorf("no commit found for the ref [ref: %s]", ref)
         .          .    141:	}
         .          .    142:
         .       10ms    143:	selfURL, err := url.Parse(fmt.Sprintf("%s/contents/%s?ref=%s", repo.APIURL(), treePath, origRef))
         .          .    144:	if err != nil {
         .          .    145:		return nil, err
         .          .    146:	}
         .          .    147:	selfURLString := selfURL.String()
         .          .    148:
         .          .    149:	// All content types have these fields in populated
         .          .    150:	contentsResponse := &api.ContentsResponse{
         .          .    151:		Name: entry.Name(),
         .          .    152:		Path: treePath,
         .       10ms    153:		SHA:  entry.ID.String(),
         .       90ms    154:		Size: entry.Size(),
         .          .    155:		URL:  &selfURLString,
         .          .    156:		Links: &api.FileLinksResponse{
         .          .    157:			Self: &selfURLString,
         .          .    158:		},
         .          .    159:	}
         .          .    160:
         .          .    161:	// Now populate the rest of the ContentsResponse based on entry type
         .          .    162:	if entry.IsRegular() || entry.IsExecutable() {
         .          .    163:		contentsResponse.Type = string(ContentTypeRegular)
         .       90ms    164:		if blobResponse, err := GetBlobBySHA(repo, entry.ID.String()); err != nil {
         .          .    165:			return nil, err
         .          .    166:		} else if !forList {
         .          .    167:			// We don't show the content if we are getting a list of FileContentResponses
         .          .    168:			contentsResponse.Encoding = &blobResponse.Encoding
         .          .    169:			contentsResponse.Content = &blobResponse.Content
@zeripath
Copy link
Contributor

Thank you for the listing pprof results.

Hmm. No clear smoking gun in that dump - it's all just a little a slow. Which I guess is related to you running on Synology. (You are running on a Synology right?)

One surprisingly slow thing was:

 refType := gitRepo.GetRefType(ref)

At 110ms but that doesn't really make much sense to me and it's not like it's a clear thing to improve.

It would be good to catch one of the really slow uploads because then we'd really be able to compare.

Now, some things:

  • I think it would be good to try 1.14 here - we've switched back to pure git for a lot of these commands which may well be quicker. Certainly the GetRefType should be a lot quicker.
  • you're doing a lot of API uploads - I think you need to explain what tool you are using to do this and why you aren't using git itself (mostly so I can be clear on this myself - there are several inefficiencies in this code so if it's genuinely a hot path I will spend more effort to make it better.)
  • It'd be good to get an idea of the size of these uploads to see if there was a cut off for each of these things.
  • It'd also be helpful to know what your filesystem back end is - are you using NFS, samba or the like - because if so that's likely to be your issue and you'll need to compare with a local disk.
  • It would also be helpful to know what version of git you're running. Git 2.29 is substantially faster than 1.7.2 in almost all tasks.

@lunny lunny added the performance/speed performance issues with slow downs label Feb 13, 2021
@miou-gh
Copy link
Contributor Author

miou-gh commented Feb 16, 2021

I have created a small application to benchmark, and attached it.
GiteaAddFilePOC.zip

Which I guess is related to you running on Synology. (You are running on a Synology right?)

I am not. I'm running on local filesystem.

  • I think it would be good to try 1.14 here - we've switched back to pure git for a lot of these commands which may well be quicker. Certainly the GetRefType should be a lot quicker.

I am trying 1.14 and there are still severe performance issues present.

  • you're doing a lot of API uploads - I think you need to explain what tool you are using to do this and why you aren't using git itself (mostly so I can be clear on this myself - there are several inefficiencies in this code so if it's genuinely a hot path I will spend more effort to make it better.)

I am automating the process of creating users, repositories and files in those repositories. I'm using the automatically generated SDK from https://github.com/belidzs/Gitea which appears to call the methods defined in Swagger, and nothing more really.

  • It'd be good to get an idea of the size of these uploads to see if there was a cut off for each of these things.

It appears as though there isn't any cut-off point.

Created user: Lucio31
Created repository for Lucio31: qvctrjhouf
Created file avhiilqfw.dat (813 KB) for repository qvctrjhouf which is owned by Lucio31 | took 779 ms.
Created file xotpcmuq.dat (885 KB) for repository qvctrjhouf which is owned by Lucio31 | took 769 ms.
Created file fczjp.dat (804 KB) for repository qvctrjhouf which is owned by Lucio31 | took 785 ms.
Created file jzqvvprvsz.dat (461 KB) for repository qvctrjhouf which is owned by Lucio31 | took 760 ms.
Created repository for Lucio31: ncufkvmjlp
Created file nietdh.dat (84 KB) for repository ncufkvmjlp which is owned by Lucio31 | took 597 ms.
Created file hlldvsin.dat (591 KB) for repository ncufkvmjlp which is owned by Lucio31 | took 769 ms.
Created file nxdxuuwp.dat (379 KB) for repository ncufkvmjlp which is owned by Lucio31 | took 703 ms.
Created repository for Lucio31: eqxofbvaap
Created file tggmnw.dat (562 KB) for repository eqxofbvaap which is owned by Lucio31 | took 753 ms.
Created file cqfjegq.dat (827 KB) for repository eqxofbvaap which is owned by Lucio31 | took 788 ms.
Created repository for Lucio31: xyzrajonzu
Created file ixtqaqc.dat (592 KB) for repository xyzrajonzu which is owned by Lucio31 | took 733 ms.
Created file gcwzclx.dat (833 KB) for repository xyzrajonzu which is owned by Lucio31 | took 782 ms.
Created file wsjhbrb.dat (402 KB) for repository xyzrajonzu which is owned by Lucio31 | took 712 ms.
Created file pcmyf.dat (75 KB) for repository xyzrajonzu which is owned by Lucio31 | took 733 ms.
Created file cxzwnr.dat (554 KB) for repository xyzrajonzu which is owned by Lucio31 | took 764 ms.
Created file pteuwupc.dat (662 KB) for repository xyzrajonzu which is owned by Lucio31 | took 804 ms.
Created user: Xavier_Effertz77
Created repository for Xavier_Effertz77: hqxtfswfdx
Created file eukuziip.dat (42 KB) for repository hqxtfswfdx which is owned by Xavier_Effertz77 | took 660 ms.
Created file mlkuocy.dat (343 KB) for repository hqxtfswfdx which is owned by Xavier_Effertz77 | took 685 ms.
Created file ifmooe.dat (465 KB) for repository hqxtfswfdx which is owned by Xavier_Effertz77 | took 717 ms.
Created file cjdrzjljz.dat (556 KB) for repository hqxtfswfdx which is owned by Xavier_Effertz77 | took 749 ms.
Created file myuixwelnf.dat (490 KB) for repository hqxtfswfdx which is owned by Xavier_Effertz77 | took 746 ms.
Created repository for Xavier_Effertz77: swjacxknhu
Created file jebvghs.dat (57 KB) for repository swjacxknhu which is owned by Xavier_Effertz77 | took 662 ms.
Created file knkuz.dat (909 KB) for repository swjacxknhu which is owned by Xavier_Effertz77 | took 794 ms.
Created file wjicqb.dat (770 KB) for repository swjacxknhu which is owned by Xavier_Effertz77 | took 891 ms.
Created file orekrvdvbo.dat (370 KB) for repository swjacxknhu which is owned by Xavier_Effertz77 | took 710 ms.
Created repository for Xavier_Effertz77: oxpydpdqba
Created file lpcjargg.dat (154 KB) for repository oxpydpdqba which is owned by Xavier_Effertz77 | took 686 ms.
  • It'd also be helpful to know what your filesystem back end is - are you using NFS, samba or the like - because if so that's likely to be your issue and you'll need to compare with a local disk.

I am using a SSD in perfect healthy condition.

  • It would also be helpful to know what version of git you're running. Git 2.29 is substantially faster than 1.7.2 in almost all tasks.

I'm using Git version 2.30.0.

@miou-gh
Copy link
Contributor Author

miou-gh commented Feb 16, 2021

pprof003.zip

@lunny
Copy link
Member

lunny commented Feb 16, 2021

@atillabyte Could you try to use other database i.e. mysql/postgres rather than sqlite3?

@zeripath
Copy link
Contributor

That data suggests that master is at least twice as fast and up to 10 times faster - possibly even more if you account for GC time. (You previously had logs with 9s delays.)

Whilst I appreciate you may want this to be running at sub 100ms speeds - I'm not certain we will ever get to that point. Some acknowledgement of the fact that it is actually at least twice as fast would be appreciated.

Now... looking at the pprof SVG, we have 2.11s of work running in the http pathways. About half of that time is spent in json.Decode. So that would imply that there would be some benefit to switching from the default encoding/json library - but the place that is calling the json.Decode here isn't shown in that graph.

One thing that could be more useful going forward is to change the values of edgefraction nodefraction and nodecount. These values determine how much is shown in a graph and could be hiding salient features. You can change them on the command line:

go tool pprof ... -edgefraction 0 -nodefraction 0 -nodecount 100000

but you can change them in the REPL that go tool pprof gives too. Setting these too extreme like this may make the graph more difficult to understand but you can change these values to decrease the noise as necessary.

I also note that SVG is clearly not helping you to post the images in here - it may be better to use png instead of svg.

It may be easier to meet on discord to help work through things a bit quicker.

@kyland-holmes
Copy link

Is there any plan to improve the perf of file creation / upload to a repo? I've been monitoring this bug for quite some time. We consistently see it take 1.9-2.5 seconds when committing trivial files to a new repo through the gitea API.

@zeripath
Copy link
Contributor

@KyKoPho have you checked on main?

@zeripath
Copy link
Contributor

(I've spent most of this cycle doing performance fixes elsewhere - so I apologise that this hasn't had been looked at.)

@kyland-holmes
Copy link

Hi @zeripath , thanks for the fast reply. I've tried the latest release, if that's what you mean. I could also clone and build from there.

@kyland-holmes
Copy link

1.14.3 to be specific.

@zeripath
Copy link
Contributor

I mean main. 1.15-dev.

@kyland-holmes
Copy link

We are using the public docker containers. Is gitea/gitea:latest built off of main? The only dev tags I see are back around 1.7.0, etc

@kyland-holmes
Copy link

I just tried gitea/gitea:latest and did see about an ~8% improvement. I was unable to find the gitea/gitea:1.15-dev image anywhere, but I'm guessing the latest image will have some of that baked in already.

@zeripath
Copy link
Contributor

Hi @KyKoPho sorry I missed this...

Would it be possible to get some update pprofs or good test example for this? It would be good to know where to target improvements.

My suspicion is that the interpretation of the json payload is still going to be the largest delay - but changes targeting that would be involved so it would be good to prove this first.

@kyland-holmes
Copy link

@zeripath Sure! Let me work on that. Never done that before, so may take some time.

@kyland-holmes
Copy link

@zeripath Here's the output of pprof raw sort cum > pprof.txt. Please let me know if this is useful or not. I'm assuming the 0s is a rounding / sampling artifact.
pprof.txt

@kyland-holmes
Copy link

kyland-holmes commented Jul 24, 2021

Hopefully this helps some. This is the result of go tool pprof http://localhost:6060/debug/pprof/heap immediately after creating a new repo and pushing a new file.

pprof001

and also pprof top

(pprof) top
Showing nodes accounting for 49.08MB, 72.80% of 67.41MB total
Showing top 10 nodes out of 119
      flat  flat%   sum%        cum   cum%
    9.28MB 13.76% 13.76%    16.78MB 24.89%  gopkg.in/ini%2ev1.(*Section).NewKey
    7.50MB 11.13% 24.89%     7.50MB 11.13%  gopkg.in/ini%2ev1.newKey (inline)
    6.25MB  9.27% 34.16%     6.25MB  9.27%  code.gitea.io/gitea/modules/public.glob..func1
    5.44MB  8.07% 42.23%     5.44MB  8.07%  github.com/go-enry/go-enry/v2/data.init
       5MB  7.42% 49.65%        5MB  7.42%  text/template/parse.(*Tree).newText
       4MB  5.94% 55.59%        4MB  5.94%  github.com/syndtr/goleveldb/leveldb/memdb.New
       3MB  4.45% 60.04%        3MB  4.45%  text/template/parse.(*Tree).newPipeline
       3MB  4.45% 64.49%        3MB  4.45%  text/template/parse.(*Tree).newCommand
    2.94MB  4.36% 68.85%     2.94MB  4.36%  io.ReadAll
    2.67MB  3.96% 72.80%     2.67MB  3.96%  github.com/denisenkom/go-mssqldb/internal/cp.init

@zeripath
Copy link
Contributor

That's memory ... We were looking at time

@kyland-holmes
Copy link

@zeripath here is top 10 from pprof during the same call / workflow. Note this is all work coming in via the API, not the web interface for gitea:

(pprof) top
Showing nodes accounting for 690ms, 57.02% of 1210ms total
Showing top 10 nodes out of 348
      flat  flat%   sum%        cum   cum%
     120ms  9.92%  9.92%      120ms  9.92%  crypto/sha256.sha256block
     120ms  9.92% 19.83%      120ms  9.92%  encoding/binary.bigEndian.PutUint32 (inline)
     100ms  8.26% 28.10%      310ms 25.62%  crypto/sha256.(*digest).Write
      90ms  7.44% 35.54%       90ms  7.44%  runtime.memmove
      60ms  4.96% 40.50%      440ms 36.36%  crypto/sha256.(*digest).checkSum
      50ms  4.13% 44.63%      760ms 62.81%  golang.org/x/crypto/pbkdf2.Key
      40ms  3.31% 47.93%      160ms 13.22%  crypto/sha256.block
      40ms  3.31% 51.24%       40ms  3.31%  runtime.futex
      40ms  3.31% 54.55%       50ms  4.13%  syscall.Syscall
      30ms  2.48% 57.02%      630ms 52.07%  crypto/hmac.(*hmac).Sum

@kyland-holmes
Copy link

Looking at the graph, it actually looks like most of the time is being spent decrypting the user password from disk with the pbkdf2 algo. (maybe?)
pprof003

@zeripath
Copy link
Contributor

Not surprised that pbkdf2 is a cause of slow down here - this is the cost of not having session support. If that really is the only slowdown left we've won though.

BTW I forgot to ask which version are you running. It's really important to state the version - plus its really only worth doing this on 1.15-RC2 or main (not 1.14.x).

@lunny
Copy link
Member

lunny commented Jul 24, 2021

Have you tried to use username and token with the API?

@kyland-holmes
Copy link

kyland-holmes commented Jul 24, 2021

@lunny Once I saw this was in the hash algo, I tried that last night. No dice, I think whether you use a token or basic auth, the same algo still gets used. I'm looking into setting up a reverse proxy for auth as we internally have a solution for that.

@zeripath I tried this on 1.14.2, 1.14.3, 1.14.5 and 1.15.0+dev-543-g1b29747f0. They all seem to have the same perf in this regard. How can I get 1.15-RC2? Are any of the tagged images in docker hub built from that? Disregard, just found it on docker hub.

@zeripath
Copy link
Contributor

Have you tried to use username and token with the API?

So unfortunately the tokens would still necessitate a pbkdf2 calculation as currently coded.

We'd need to cache successful tokens in an LRU cache, have sessions, or move to a JWT token approach. (JWT tokens is on my list of things to do.)

@lunny
Copy link
Member

lunny commented Jul 25, 2021

We can change the login check sequence on API. Firstly index the access_token table by token if token length is matched. If there is not exist then check password.

@zeripath
Copy link
Contributor

zeripath commented Jul 25, 2021

We can change the login check sequence on API. Firstly index the access_token table by token if token length is matched. If there is not exist then check password.

We already do that - #16164 (& #16171), #6724.


There are several inefficiencies with the current approach:

gitea/models/token.go

Lines 69 to 82 in 6a33b29

var tokens []AccessToken
lastEight := token[len(token)-8:]
err := x.Table(&AccessToken{}).Where("token_last_eight = ?", lastEight).Find(&tokens)
if err != nil {
return nil, err
} else if len(tokens) == 0 {
return nil, ErrAccessTokenNotExist{token}
}
for _, t := range tokens {
tempHash := hashToken(token, t.TokenSalt)
if subtle.ConstantTimeCompare([]byte(t.TokenHash), []byte(tempHash)) == 1 {
return &t, nil
}
}

Take a look at the loop starting line 77. The underlying assumption here is that the number of tokens returned is small - preferably one - however on a large install there may be many such tokens returned always in the same order.

Every time such a "pathological" token is used it will be hashed against every single token returned. This alone could be serious slowdown and in fact prior to #16164 this would take effect on passwords >8 characters if the last 8 characters were hex. #16164 means that this can only happen on passwords which are 40 characters long and all hex. (Hmm... thinking on I'm not sure how much this would happen)


This pathological case could be avoided if the user provides the username by limiting the tokens to those that match the username - I'll pop up a PR for this - but I'm not sure how often this would happen and paradoxically it may slow things down. However, that still leaves us with a pbkdf2 hash for each and every API call.


In the case of avoiding the pbkdf2 hash for each and every API call. The options are:

  • Cache the successful tokens - yes that means that they're kept in memory but ... they're not really passwords.
    • We could encrypt them with the SECRET_KEY but that might actually be slower.
    • We can't really cache a hash because we'd have to recalculate the hash which is just as slow.
  • Abandon tokens and move to OAuth2/JWT tokens - these are simply signed tokens which should be quicker to evaluate - and their validation can be cached with alacrity.
  • Provide API Sessions -> likely mediated through OAuth2/JWTs anyway

(As an aside, this issue is also why we need to convince administrators to ban BASIC authentication using passwords because these suffer the same problems.)


In terms of how to proceed:

  1. Consider ameliorating the pathological case as described above
  2. I would argue that we should offer to cache the successful token with the understanding that this is a potential security issue but that a balanced approach between performance requires it.
  3. Cache the signature validation of JWT tokens
  4. Once that is in we should look at providing a Session API which will return a shortlived JWT token - to allow deprecation of general token/basic password auth.

EDIT: Thinking on, I think that the pathological case described above is should be somewhat uncommon.

zeripath added a commit to zeripath/gitea that referenced this issue Jul 25, 2021
One of the issues holding back performance of the API is the problem of hashing.
Whilst banning BASIC authentication with passwords will help, the API Token scheme
still requires a PBKDF2 hash - which means that heavy API use (using Tokens) can
still cause enormous numbers of hash computations.

A slight solution to this whilst we consider moving to using JWT based tokens and/or
a session orientated solution is to simply cache the successful tokens. This has some
security issues but this should be balanced by the security issues of load from
hashing.

Related go-gitea#14668

Signed-off-by: Andrew Thornton <[email protected]>
@zeripath
Copy link
Contributor

@KyKoPho I suspect #16547 would significantly improve your experience and would be easily backportable 1.14 and 1.15.

@kyland-holmes
Copy link

@zeripath thanks so much for the heads up.

techknowlogick pushed a commit that referenced this issue Aug 17, 2021
One of the issues holding back performance of the API is the problem of hashing.
Whilst banning BASIC authentication with passwords will help, the API Token scheme
still requires a PBKDF2 hash - which means that heavy API use (using Tokens) can
still cause enormous numbers of hash computations.

A slight solution to this whilst we consider moving to using JWT based tokens and/or
a session orientated solution is to simply cache the successful tokens. This has some
security issues but this should be balanced by the security issues of load from
hashing.

Related #14668

Signed-off-by: Andrew Thornton <[email protected]>

Co-authored-by: Lunny Xiao <[email protected]>
@lunny lunny closed this as completed Feb 13, 2022
@go-gitea go-gitea locked and limited conversation to collaborators Apr 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance/speed performance issues with slow downs
Projects
None yet
Development

No branches or pull requests

4 participants