partial parse of git log #3166

Closed
totembe opened this issue Jun 12, 2020 · 6 comments

totembe commented Jun 12, 2020

Below you can see the commits on a specific file in my git log output. After indexing, I only see history up to February 22nd. The March 14th and February 23rd commits are omitted.

This project has commits beyond March 14th, such as August 17th, on other files, and those histories are shown for those files.

Also, when I run git diff for that file between these commits, there are changes to the file. OpenGrok does not show the latest version of the source either.

I was using opengrok-1.1-rc16. Because of this issue I tried opengrok-1.3.16 and rebuilt the whole index from scratch. The problem still persists.

Output of 'git log --abbrev-commit --abbrev=8 --name-only --pretty=fuller --date=iso8601-strict -- somepath/actualfile.java':

commit 1
Merge 2 3
Author: second@somewhere
AuthorDate: 2017-03-14T17:01:26+03:00
Commit: second@somewhere
CommitDate: 2017-03-14T17:01:26+03:00

  Merge remote-tracking branch 'origin/br-11' into beta

  Conflicts:
    somepath/file1.java

commit 4
Merge 5 6
Author: first@somewhere
AuthorDate: 2017-03-23T08:28:06+02:00
Commit: first@somewhere
CommitDate: 2017-02-23T08:28:06+02:00

  Merge branch 'CR-33' of
  http://git/project.git
  into CR-33

  Conflicts:
    somepath/actualfile.java
    somepath/file2.java

commit 5
Author: first@somewhere
AuthorDate: 2017-02-22T12:30:56+02:00
Commit: first@somewhere
CommitDate: 2017-02-23T08:27:00+02:00

  some explanation 1

somepath/actualfile.java

commit 6
Author: first@somewhere
AuthorDate: 2017-02-22T12:30:56+02:00
Commit: first@somewhere
CommitDate: 2017-02-22T12:30:56+02:00

  some explanation 2

somepath/actualfile.java

idodeclare commented Jun 12, 2020

GitRepository looks for associated files in the first column of output after each commit message. (Your capture above was not showing leading whitespace, but in git output notice the commit message is indented with whitespace.)

With the log command that GitRepository uses, merge commits can omit the file list. I'm not actually sure if a file list is always omitted for merges or if there could be certain changes done in a merge that would be shown. E.g. if you ran git merge --no-commit and made additional changes, I'm guessing those would show — but never tested.
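
To illustrate the parsing rule described above, here is a minimal Python sketch. It is not OpenGrok's actual Java code: `parse_log`, the `HEADER` pattern, and the abbreviated sample are hypothetical. It treats any unindented, non-header line after a commit message as a file path, so a merge commit printed without a file list ends up with no files and would be missing from that file's history.

```python
import re

# Header lines printed by `git log --pretty=fuller`. Any other unindented,
# non-blank line is assumed to be a file path from `--name-only`.
HEADER = re.compile(r"^(Merge|Author|AuthorDate|Commit|CommitDate)(:| )")


def parse_log(text):
    """Return a list of (commit_id, [files]) pairs in log order."""
    entries = []
    for line in text.splitlines():
        if line.startswith("commit "):
            # Take the first token after "commit" as the revision id.
            entries.append((line.split()[1], []))
        elif not entries or not line.strip():
            continue  # blank line, or noise before the first commit
        elif line[0].isspace() or HEADER.match(line):
            continue  # indented commit message or metadata header
        else:
            entries[-1][1].append(line)  # first-column text: a file path
    return entries


SAMPLE = """\
commit 4
Merge 5 6
Author: first@somewhere

  Merge branch 'CR-33' of

  Conflicts:
    somepath/actualfile.java

commit 5
Author: first@somewhere

  some explanation 1

somepath/actualfile.java
"""

for commit_id, files in parse_log(SAMPLE):
    print(commit_id, files)
```

In this sample the indented path under "Conflicts:" belongs to the commit message, so merge commit 4 is reported with an empty file list while commit 5 keeps its file.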

I have noticed too what you're seeing, and I tinkered around with modifying GitRepository to run git log adding the switch below. I had wanted to evaluate the effects when run against test repositories but didn't get the time.

       -m
           This flag makes the merge commits show the full diff like regular commits; for each merge parent, a
           separate log entry and diff is generated. An exception is that only diff against the first parent is shown
           when --first-parent option is given; in that case, the output represents the changes the merge brought into
           the then-current branch.


totembe commented Jun 12, 2020

I have edited my original post to show indentation. Sorry for my mistake.

For merges, indentation is as you described.

For this particular merged file, I ran git log with the -m option.
The somepath/actualfile.java line was added for merge commits too, but the output format changes a little:

commit 1 (from 2)
Merge 2 3
Author: second@somewhere
AuthorDate: 2017-03-14T17:01:26+03:00
Commit: second@somewhere
CommitDate: 2017-03-14T17:01:26+03:00

  Merge remote-tracking branch 'origin/br-11' into beta

  Conflicts:
    somepath/file1.java

somepath/actualfile.java

commit 1 (from 3)
Merge 2 3
Author: second@somewhere
AuthorDate: 2017-03-14T17:01:26+03:00
Commit: second@somewhere
CommitDate: 2017-03-14T17:01:26+03:00

  Merge remote-tracking branch 'origin/br-11' into beta

  Conflicts:
    somepath/file1.java

somepath/actualfile.java

commit 4 (from 5)
Merge 5 6
Author: first@somewhere
AuthorDate: 2017-03-23T08:28:06+02:00
Commit: first@somewhere
CommitDate: 2017-02-23T08:28:06+02:00

  Merge branch 'CR-33' of
  http://git/project.git
  into CR-33

  Conflicts:
    somepath/actualfile.java
    somepath/file2.java

somepath/actualfile.java

commit 4 (from 6)
Merge 5 6
Author: first@somewhere
AuthorDate: 2017-03-23T08:28:06+02:00
Commit: first@somewhere
CommitDate: 2017-02-23T08:28:06+02:00

  Merge branch 'CR-33' of
  http://git/project.git
  into CR-33

  Conflicts:
    somepath/actualfile.java
    somepath/file2.java

somepath/actualfile.java

commit 5
Author: first@somewhere
AuthorDate: 2017-02-22T12:30:56+02:00
Commit: first@somewhere
CommitDate: 2017-02-23T08:27:00+02:00

  some explanation 1

somepath/actualfile.java

commit 6
Author: first@somewhere
AuthorDate: 2017-02-22T12:30:56+02:00
Commit: first@somewhere
CommitDate: 2017-02-22T12:30:56+02:00

  some explanation 2

somepath/actualfile.java
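
With `-m`, the `commit` line of each merge gains a `(from <parent>)` suffix, as in the output above, so a parser has to accept both header shapes. A minimal Python sketch follows; the `COMMIT_LINE` pattern and the `parse_header` helper are illustrative assumptions rather than OpenGrok code, and the sketch does not handle other suffixes such as ref decorations.

```python
import re

# "commit <rev>", optionally followed by " (from <parent>)", which appears
# on merge commits when git log is run with -m.
COMMIT_LINE = re.compile(r"^commit (?P<rev>\S+)(?: \(from (?P<parent>\S+)\))?$")


def parse_header(line):
    """Return (rev, parent) for a commit header line, or None otherwise."""
    m = COMMIT_LINE.match(line)
    if m is None:
        return None
    return m.group("rev"), m.group("parent")


print(parse_header("commit 1 (from 2)"))
print(parse_header("commit 5"))
print(parse_header("Commit: first@somewhere"))
```

For an ordinary commit the parent is `None`, and metadata lines such as `Commit:` are rejected because the match is case-sensitive.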


idodeclare commented Jun 12, 2020

Maybe adding --first-parent too would make sense.

idodeclare commented

I refreshed my memory, and I had tested much more than I remembered.

I originally noticed the problem of missing history on the FreeBSD repository. After investigation I learned in that repository there were instances of so-called "octopus merges" with more than two heads.

git log -m ... solved the problem of missing history. I did not use --first-parent because it would have left out some of the needed changes from the octopus branches. I did condense the separately reported head-merges though so that they didn't show up as separate, pointless entries in OpenGrok history.
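
One way to picture the condensing step described above, as a rough Python sketch and not the code from the branch: when `git log -m` reports a merge once per parent, the per-parent entries can be folded into one entry per revision whose file list is the union of the per-parent lists. The `condense` function, the `(rev, parent, files)` tuple shape, and the sample data are all hypothetical.

```python
def condense(entries):
    """Fold per-parent merge entries into one entry per revision.

    `entries` is an iterable of (rev, parent, files) tuples in log order;
    `parent` is None for ordinary commits. Returns a list of (rev, files)
    with each revision listed once and its files de-duplicated in
    first-seen order.
    """
    merged = {}  # dicts preserve insertion order, so log order is kept
    for rev, _parent, files in entries:
        seen = merged.setdefault(rev, [])
        for path in files:
            if path not in seen:
                seen.append(path)
    return list(merged.items())


# Made-up entries shaped like the -m output: merges 1 and 4 appear twice.
entries = [
    ("1", "2", ["somepath/actualfile.java"]),
    ("1", "3", ["somepath/actualfile.java"]),
    ("4", "5", ["somepath/actualfile.java"]),
    ("4", "6", ["somepath/actualfile.java", "somepath/file2.java"]),
    ("5", None, ["somepath/actualfile.java"]),
]
print(condense(entries))
```

Keying on the revision alone means an octopus merge with any number of parents still collapses to a single history entry, without dropping files that only some parents report.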

My branch containing that work was mainly an attempt to massively speed up history, since OpenGrok takes f-o-r-e-v-e-r to process FreeBSD history: OpenGrok's data model copies commit contents (dates + commit message) to every affected file separately, and FreeBSD commits and merges very often touch thousands of files. The branch is still a work in progress and has been idle for a year, but maybe I can extract the octopus handling from it, which would solve your issue too.

idodeclare added a commit to idodeclare/OpenGrok that referenced this issue Jun 14, 2020
Also:
- Add FileHistoryCacheOctopusTest showing dupes
  for merges (before revising for this patch)
- Update parsing of Git revision with labels
- Fix oracle#3166 "partial parse of git log"

vladak commented Jun 15, 2020

The original problem with merge commits in Git is tracked by #1167.

tulinkry commented

Ah yes, I remember I was looking into this already!
