-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Diff: Ignore vendor/generated files on initial load #13936
Conversation
(Hopefully) start of larger diff refactor to be done in smaller sections
@zeripath I tested every commit in every branch at https://try.gitea.io/arandomer/pathological which was very helpful in making weird cases (newline in file name!) work properly. |
Files []*DiffFile | ||
} | ||
|
||
// DiffFile represents stats of a file modified between two commits (based on gitdiff.DiffFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would sort of like to move DiffFile type out of service/gitdiff and to here because I can't use it here due to circular dependency. I think ultimately GetDiffAllStats could return Diff Type with all we need
Should be for a separate refactor after that doesn't also change features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem is it depends on models.Comment
.
Codecov Report
@@ Coverage Diff @@
## master #13936 +/- ##
========================================
Coverage 42.35% 42.36%
========================================
Files 726 726
Lines 77839 78024 +185
========================================
+ Hits 32970 33053 +83
- Misses 39455 39549 +94
- Partials 5414 5422 +8
Continue to review full report at Codecov.
|
Maybe we can add some static placeholder SVG background texture on the unloaded diffs like GitHub has, but not Fomantic's 😉. |
@silverwind sure if you have any recommendations on where to get one or one you'd like to use I would be happy to add it. I don't like how the fomantic ones look either |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Can't find anything suitable but I guess we can just make our own. Will try that later. |
@M4RKUS-11111 the preferred way would be via |
Do you mean I came here through #4111 because it talks about ignoring folders, which would already solve my problem :) |
Oops, yes, meant |
It seems to me that the ignore/vendoring pattern and some parts of this have been implemented in #16773. I'm not sure about the performance-related aspects -- the PR page linked in the OP loads for me, but it still seems slow (loads the diffs in the background?), and the folding/unfolding ( |
Closing as this never got any traction and looks like diff code has changed since then. |
(Hopefully) start of larger diff refactor to be done in smaller sections
Two main user features here are ignoring vendor/generated files and being able to selectively load the diff contents of any ignored file. This doesn't yet fix the general performance issues of diff, but it greatly speeds up and improves the performance of diffs with a lot of these types of files.
Fixes #4111
As an example:
https://gitea.com/gitea/tea/pulls/27
That pull has minimal changes to tea but adds a vendored library that makes the diff un-renderable on gitea.com
Testing locally, here are the differences:
Before this PR:
After:
Along with ignoring them is the ability to render them selectively:
Diff Refactoring
The ultimate goal is to be able to load diffs in chunks as a user scrolls (like here on github). That means you can't overwhelm a server by requesting a particular very large commit/diff over and over as we currently load the entire thing into memory.
Right now we run git diff on the entire commit 2 times for each view:
One with --shortstat to get the total line changes
Once to generate a single patch file for the entire commit.
The 2nd one is much more expensive obviously.
Since we already run git diff --shortstat to get the total line number changes, we can also get the filenames, individual line changes, type of change, etc...This is pretty much free on the git side since it already has to internally process the entire diff in order to provide the --shortstat output. This lets us know all of the information we need about a particular file without needing to read the patch file line by line. In this PR/example, we have the same information on line changes/type/etc.. for vendored files even though we don't generate a patch for them.
In the future this can replace a lot of the more fragile checking inside of the parsePatch function (we can even know before hand if a particular file has too many diff lines rather than reading line by line until we get to one too many).
Also this changes how we generate patches by passing a specific list of files to git. Right now it is just all of them minus vendored files, but in the future it will be sets of file names for loading diff in batches.
The final goal should be to store the output of GetDiffAllStats in a cache so it only needs to be run once per commit (while in cache anyway) and then to use that information to generate diffs 'progressively'.
Didn't want this PR to be too large for review so just trying to set up the ground work for that here with minimal refactoring. More to come hopefully : )
WIP to solicit feed back and examples of what breaks