Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Diff: Ignore vendor/generated files on initial load #13936

Closed
wants to merge 3 commits into from

Conversation

mrsdizzie
Copy link
Member

(Hopefully) start of larger diff refactor to be done in smaller sections

Two main user features here are ignoring vendor/generated files and being able to selectively load the diff contents of any ignored file. This doesn't yet fix the general performance issues of diff, but it greatly speeds up and improves the performance of diffs with a lot of these types of files.

Fixes #4111

As an example:

https://gitea.com/gitea/tea/pulls/27

That pull has minimal changes to tea but adds a vendored library that makes the diff un-renderable on gitea.com

Testing locally, here are the differences:

Before this PR:

Screen Shot 2020-12-03 at 6 38 19 PM

After:

Screen Shot 2020-12-03 at 6 39 33 PM

Along with ignoring them is the ability to render them selectively:

diff

Diff Refactoring

The ultimate goal is to be able to load diffs in chunks as a user scrolls (like here on github). That means you can't overwhelm a server by requesting a particular very large commit/diff over and over as we currently load the entire thing into memory.

Right now we run git diff on the entire commit 2 times for each view:

One with --shortstat to get the total line changes
Once to generate a single patch file for the entire commit.

The 2nd one is much more expensive obviously.

Since we already run git diff --shortstat to get the total line number changes, we can also get the filenames, individual line changes, type of change, etc...This is pretty much free on the git side since it already has to internally process the entire diff in order to provide the --shortstat output. This lets us know all of the information we need about a particular file without needing to read the patch file line by line. In this PR/example, we have the same information on line changes/type/etc.. for vendored files even though we don't generate a patch for them.

In the future this can replace a lot of the more fragile checking inside of the parsePatch function (we can even know before hand if a particular file has too many diff lines rather than reading line by line until we get to one too many).

Also this changes how we generate patches by passing a specific list of files to git. Right now it is just all of them minus vendored files, but in the future it will be sets of file names for loading diff in batches.

The final goal should be to store the output of GetDiffAllStats in a cache so it only needs to be run once per commit (while in cache anyway) and then to use that information to generate diffs 'progressively'.

Didn't want this PR to be too large for review so just trying to set up the ground work for that here with minimal refactoring. More to come hopefully : )

WIP to solicit feed back and examples of what breaks

(Hopefully) start of larger diff refactor to be done in smaller sections
@mrsdizzie
Copy link
Member Author

@zeripath I tested every commit in every branch at https://try.gitea.io/arandomer/pathological which was very helpful in making weird cases (newline in file name!) work properly.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Dec 11, 2020
@mrsdizzie mrsdizzie added this to the 1.14.0 milestone Dec 11, 2020
@mrsdizzie mrsdizzie added type/refactoring Existing code has been cleaned up. There should be no new functionality. performance/memory Performance issues affecting memory use performance/speed performance issues with slow downs pr/wip This PR is not ready for review labels Dec 11, 2020
Files []*DiffFile
}

// DiffFile represents stats of a file modified between two commits (based on gitdiff.DiffFile)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would sort of like to move DiffFile type out of service/gitdiff and to here because I can't use it here due to circular dependency. I think ultimately GetDiffAllStats could return Diff Type with all we need

Should be for a separate refactor after that doesn't also change features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem is it depends on models.Comment.

@codecov-io
Copy link

codecov-io commented Dec 11, 2020

Codecov Report

Merging #13936 (fef00c5) into master (3a21f8a) will increase coverage by 0.00%.
The diff coverage is 50.44%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #13936    +/-   ##
========================================
  Coverage   42.35%   42.36%            
========================================
  Files         726      726            
  Lines       77839    78024   +185     
========================================
+ Hits        32970    33053    +83     
- Misses      39455    39549    +94     
- Partials     5414     5422     +8     
Impacted Files Coverage Δ
routers/repo/commit.go 28.73% <0.00%> (-0.66%) ⬇️
routers/repo/compare.go 37.25% <4.16%> (-1.97%) ⬇️
modules/git/repo_compare.go 59.90% <54.70%> (-9.47%) ⬇️
services/gitdiff/gitdiff.go 69.25% <64.06%> (+0.20%) ⬆️
modules/analyze/code_langauge.go 100.00% <100.00%> (ø)
routers/repo/pull.go 31.86% <100.00%> (+0.07%) ⬆️
routers/routes/macaron.go 93.17% <100.00%> (+0.02%) ⬆️
modules/git/repo_base_nogogit.go 63.63% <0.00%> (-9.10%) ⬇️
modules/indexer/stats/db.go 48.00% <0.00%> (-8.00%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c236fe2...fef00c5. Read the comment docs.

@silverwind
Copy link
Member

Maybe we can add some static placeholder SVG background texture on the unloaded diffs like GitHub has, but not Fomantic's 😉.

@mrsdizzie
Copy link
Member Author

@silverwind sure if you have any recommendations on where to get one or one you'd like to use I would be happy to add it. I don't like how the fomantic ones look either

@silverwind

This comment has been minimized.

@silverwind

This comment has been minimized.

@silverwind
Copy link
Member

silverwind commented Dec 11, 2020

@silverwind sure if you have any recommendations on where to get one or one you'd like to use I would be happy to add it. I don't like how the fomantic ones look either

Can't find anything suitable but I guess we can just make our own. Will try that later.

@lunny lunny modified the milestones: 1.14.0, 1.15.0 Jan 27, 2021
@MarkusAmshove
Copy link
Contributor

The detection if something has to be ignored seems to come completely from go-enry.
Could you consider adding customizing paths that will be ignored if you have the time to implement it?

I have a repository that I can't do a single review on in Gitea, because every commit changes around ~500 XML-Files (files for testdatabases) which I'd like to just ignore in reviews.

Maybe some kind of wildcard-field on the repository settings level, like it's done for secrets on branch protection :-)
Bildschirmfoto von 2021-04-23 11-57-49

@silverwind
Copy link
Member

@M4RKUS-11111 the preferred way would be via .gitignore entries but we don't support this yet.

@MarkusAmshove
Copy link
Contributor

MarkusAmshove commented Apr 23, 2021

@M4RKUS-11111 the preferred way would be via .gitignore entries but we don't support this yet.

Do you mean .gitattributes? That would be a good idea.
.gitignore wouldn't work because I want the files in the repository, it just doesn't make sense to review them and the amount of files makes Gitea load no diffs at all :-) If I set the file limit higher I have to scroll through all those XML files I don't care about, so I'm faster to review locally and write a summary in Gitea.

I came here through #4111 because it talks about ignoring folders, which would already solve my problem :)

@silverwind
Copy link
Member

Oops, yes, meant .gitattributes of course.

@techknowlogick techknowlogick modified the milestones: 1.15.0, 1.16.0 Jun 15, 2021
@bilderbuchi
Copy link

It seems to me that the ignore/vendoring pattern and some parts of this have been implemented in #16773. I'm not sure about the performance-related aspects -- the PR page linked in the OP loads for me, but it still seems slow (loads the diffs in the background?), and the folding/unfolding (> button) does not seem to work at all.

@mrsdizzie
Copy link
Member Author

Closing as this never got any traction and looks like diff code has changed since then.

@mrsdizzie mrsdizzie closed this Oct 29, 2021
@go-gitea go-gitea locked and limited conversation to collaborators Apr 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. performance/memory Performance issues affecting memory use performance/speed performance issues with slow downs pr/wip This PR is not ready for review type/refactoring Existing code has been cleaned up. There should be no new functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to exclude directories in diff (pull request)
8 participants