Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

octopus-merge (part 5: tree-merge-ORT three-way) #1618

Merged
merged 23 commits into from
Nov 2, 2024
Merged

octopus-merge (part 5: tree-merge-ORT three-way) #1618

merged 23 commits into from
Nov 2, 2024

Conversation

Byron
Copy link
Member

@Byron Byron commented Oct 9, 2024

Three-way merging of trees.

Follow-up of #1612.

Tasks

  • baseline tests for tree-merging
  • successful first tree-merge without conflicts
  • make all tests pass (catch-all task that will be refined on the fly)
    • fully parse all information provided by Git in baseline
    • update the conflict-lookup table to reflect changes to the tree as well - avoid double conflicts or missed conflicts if something clashes with a newly added rename, for instance.
    • conflict-tree by index to allow getting 'next' of ours for lookahead
    • tree::Editor::get() to find unique path names
    • multi-tree traversal (without wildcard support!) - for now, let's just do it 'the simple' way and perform two diffs
    • See also, all the merge options
  • gracefully handle submodules (by not attempting resolution)
  • assure content-merges only happen across the same type/mode (and what happens with symlinks)
  • assure we handle mode-merges similarly
  • support for merge attribute
  • there is tests for merging merge-bases in diff3-conflict-markers
    • diff3-conflict-markers.sh - be sure to capture the 'empty tree' label , but also other special cases
    • binary merges are pre-empted as using 'ours' for the merge somehow…
  • have motivating tests to put possibly_rewritten_location() into all the right places - not feasible for now, let's get this merged and tested in the real world
  • treat symlink merges like content merges, and respect the text-merge resolution setting
    • when resolving merge-bases, symlink conflicts are resolved with the ancestor.
    • otherwise there is a choice which symlink edit to chose, similar to binary resolution
  • Allow to turn off renaming to use the version from the merge-base (and only that)
  • replace any kind of assertion with conflicts so it can't crash, and won't screw up, AKA fail gracefully
  • test copy versions of rewrites - it's not in merge-ORT so let's skip that
  • see if remove_existing_leaf() calls are necessary - they are needed
  • fix reversed tests for blobs
  • fuzzing for blob merges
  • remove TODOs - everything not specifically handled is a conflict
  • Repository::merge_trees
  • A trivial gix merge tree implementation, based on commits. Maybe create something to easily merge multiple commits while at it (in gix).
  • speed up writes

Next PR / Outscoped

  • Repository::merge_commits()
  • A trivial gix merge commit
  • A nicely wrapped gix API - for now all plumbing options and return values are exposed.
  • Submodule merges are also possible! Maybe outscope it though! libgit2 also doesn't try it.
  • textconv with context, see this gist for details.
    • There seem to be different 'tiers' of tools, some don't get GIT_DIR set, others do.
    • It also seems that diff-programs get too much context right now, but that depends on how much is passed to them by the caller as gix-command::Context.
  • How to model virtual-merge-bases? Can be none or many, user should have control over how this is done.
  • Actual tree-based merging
  • Make blob-merge based conflict handling possible in the tree merge from gix at least. - not needed for now
  • consider finding a way to more clearly say what happened, maybe with some additional explanation so it's more of a drill-down way of working, instead of a super-enum with a lot of very similar cases.

Research

  • optimization idea for when there are numbers: a 'brute force' implementation that uses threads would benefit from the

Everything is about MergeORT.

ability to re-use object caches of a repo that has seen the base-tree already, but overall, who knows*

  • merge-options passed with -X ours for instance don't affect tree-related auto-resolutions, just the ones related to content. This could be implemented when there is demand though.
  • it uses an empty tree if there is no merge-base - we must allow the same.
  • it allows for multiple merge-bases, creating a virtual one by merging all merge-bases together using the same algorithm, recursively.
  • merges can have conflicts without a individual files being involved, for instance when directory renames clash
  • Note that merge-ORT cannot properly handle renames within renamed directories, ending up with the source of the subdir-rename still present.
❯ git ls-tree -r $(git merge-tree main feat)
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/2
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/7
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/subdir/6
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec/subdir-renamed/6
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sequencer
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    gix/5
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    h
  • Must make sure that possible types of conflicts are properly communicated, to not degenerate information
  • It puts conflict-markers in the blobs of the result tree, with annotations to provide additional context
  • Need resolution configuration, see git2::MergeOptions.
  • data stored by path, and is interned in the map to allow pointer-based comparisons
    • merge-info with everything one needs to know, also related to renames
    • or conflict information
    • it uses a memory-pool/arena to get memory for many paths all at once (and also release it like that)
  • paths start out as conflicted, and then can later be changed to non-conflicting if a content-based merged succeeded.
    • If it remains conflicts, the meta-data is used to produce an 'as merged as possible' version with conflict markers that can be checked out to the working tree.
  • hunks can partially overlap, but can also be resolved line-by line to some extend.

@Byron Byron force-pushed the merge branch 3 times, most recently from 45b3557 to 64e0f78 Compare October 12, 2024 08:31
@Byron Byron mentioned this pull request Oct 12, 2024
1 task
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Oct 13, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Per actions/runner-images#10636, the
`ubuntu-latest` GitHub Actions hosted runner is being migrated from
`ubuntu-22.04` to `ubuntu-24.04`. The `ci.yml` workflow, including
the full `test` job, use `ubuntu-latest`, and recently has switched
over to using Ubuntu 24.04 runner.

It makes sense to use this newer version, but that runner
apparently does not preinstall the headers required to build with
`-llzma`. That causes `just ci-test` to fail at
`cargo nextest run -p gix-testtools --features xz` with
`/usr/bin/ld: cannot find -llzma: No such file or directory`.

This installs the `liblzma-dev` package that provides the required
headers, so we can use Ubuntu 24.04 LTS while continuing to test
the `xz` feature of `gix-testtools`.

The failure this addresses was first observed in GitoxideLabs#1618 at 64e0f78:
https://github.com/GitoxideLabs/gitoxide/actions/runs/11304216949/job/31442302018?pr=1618

But it is not caused by any of the changes there, and it now occurs
whenever the `test` job is run, including if it is re-run at the
tip of the main branch (or any other branch), such as in:
https://github.com/EliahKagan/gitoxide/actions/runs/11317510844/job/31471040361

In both cases, expanding Set up job > Operating system shows
Ubuntu 24.04.1 LTS.

(The failure this addresses is also not related to GitoxideLabs#1622, which
documents an unrelated failure that is possible to observe locally
and that, if no change is made related to it, can be expected to
occur on CI in the `test` starting sometime soon, but that is not
yet seen on CI.)
@EliahKagan
Copy link
Member

EliahKagan commented Oct 13, 2024

As detailed in #1623, which provides a fix, the failure observed here in the CI test job is actually not due to any of the changes in this PR, and also occurs if CI is re-run on the tip of main. It is instead due to the upgraded runner image not having the headers needed for building with -llzma, which is needed for the xz feature of gix-testtools. This is also entirely unrelated to #1622, which does it not yet affect CI.

Merging #1623 and then rebasing this onto main should fix the test failure here. The other failure here is in the lint job and unrelated.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Oct 14, 2024
Per actions/runner-images#10636, the
`ubuntu-latest` GitHub Actions hosted runner label is being
migrated from aliasing `ubuntu-22.04` to aliasing `ubuntu-24.04`.
Our `ci.yml` workflow, which includes the full `test` job, uses
`ubuntu-latest`, and recently has switched automatically to using
an Ubuntu 24.04 runner.

It makes sense to use this newer version, but that runner
apparently does not preinstall the headers required to build with
`-llzma`. That causes `just ci-test` to fail at
`cargo nextest run -p gix-testtools --features xz` with
`/usr/bin/ld: cannot find -llzma: No such file or directory`.

This commit installs the `liblzma-dev` package that provides the
required headers, so we can use Ubuntu 24.04 LTS while continuing
to test the `xz` feature of `gix-testtools`.

The failure this addresses was first observed in GitoxideLabs#1618 at 64e0f78:
https://github.com/GitoxideLabs/gitoxide/actions/runs/11304216949/job/31442302018?pr=1618

But it is not caused by any of the changes there, and it now occurs
whenever the `test` job is run, including if it is re-run at the
tip of the main branch (or any other branch), such as in:
https://github.com/EliahKagan/gitoxide/actions/runs/11317510844/job/31471040361

In both cases, expanding *Set up job* > *Operating system* shows
Ubuntu 24.04.1 LTS.

(The failure this addresses is also not related to GitoxideLabs#1622, which
documents an unrelated failure that is possible to observe locally
and that, if no change is made related to it, can be expected to
occur on CI in the `test` starting sometime soon, but that is not
yet seen on CI.)
@Byron Byron force-pushed the merge branch 16 times, most recently from 1f75b49 to 207c35a Compare October 20, 2024 18:45
@Byron Byron force-pushed the merge branch 6 times, most recently from b1dae2b to 0fce40d Compare October 22, 2024 13:51
Byron added 2 commits October 31, 2024 20:41
…merge results.

This works by either selecting a possibly unchanged and not even loaded resource,
instead of always loading it to provide a buffer, in case the user doesn't
actually want a buffer.

Note that this also alters `buffer_by_pick()` to enforce handling of the 'buffer-too-large'
option.
Byron added 6 commits November 1, 2024 11:31
…ration

This also fixes an issue with blob merge computations.

It's breaking because the marker-size was reduced to `u8`.
…tial name.

For instance, previously, a ref named `A` could not be found even though `refs/heads/A` existed.
There it's far more useful and plumbing crates are enabled to write
objects without pulling in `gix-odb` as dependency.
@Byron Byron force-pushed the merge branch 5 times, most recently from 2125886 to df7a50f Compare November 2, 2024 09:06
Byron added 6 commits November 2, 2024 14:14
…RT` as far as tests go.

Note that this judgement of quality is based on a limited amount of partially complex
test, but it's likely that in practice there will be deviations of sorts.

Also, given the complexity of the implementation it is definitely under-tested,
but with that it's mostly en par with Git, unfortunatly.

On the bright side, some of the tests are very taxing and I'd hope this
means something for real-world quality.
…writing.

That way it becomes usable when merging trees, which benefits from automatic
checking of hashes before writing loose objects.
@Byron Byron marked this pull request as ready for review November 2, 2024 13:36
@Byron Byron enabled auto-merge November 2, 2024 13:36
@Byron Byron merged commit 3fb989b into main Nov 2, 2024
16 checks passed
@Byron Byron deleted the merge branch November 2, 2024 13:54
@Byron Byron mentioned this pull request Nov 2, 2024
7 tasks
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Nov 3, 2024
Since GitoxideLabs#1618, which introduced `gix-merge::merge tree::run_baseline`
and the functionality it tests, that test has failed on Windows
when run with `GIX_TEST_IGNORE_ARCHIVES=1`. This happens because,
while `chmod +x` commands run and exit indicating success in MSYS
environments such as Git Bash, they do not actually make a change.

Multiple cases (all sub-cases of the same faililing test) would
fail this way. But the test fails fast, so only one is reported.
The first to fail this way is `same-rename-different-mode`, which
shows:

    --- STDERR:              gix-merge::merge tree::run_baseline ---
    failed to extract 'tests\fixtures\generated-archives\tree-baseline.tar': Ignoring archive at 'tests\fixtures\generated-archives\tree-baseline.tar' as GIX_TEST_IGNORE_ARCHIVES is set.
    thread 'tree::run_baseline' panicked at gix-merge\tests\merge\tree\mod.rs:83:21:
    assertion `left != right` failed: same-rename-different-mode-A-B-reversed: Git caught up - adjust expectation Git works for the A/B case, but for B/A it forgets to set the executable bit
      left: Sha1(d75339daa64f3a1d98947623eee7bbb218e6c87d)
     right: Sha1(d75339daa64f3a1d98947623eee7bbb218e6c87d)

This commit fixes that by using `git update-index --chmod=+x` to
stage the intended effect of `chmod +x`. The `chmod +x` commands
are still run, to avoid non-intended and potentially bug-prone or
confusing differences between the metadata on disk and what is
staged and commited, on systems where Unix-style permissions are
supported on disk and `chmod +x` modifies them.

Although only one of the two approaches is needed on any particular
system in these tests, deciding based on the operating system is
more complicated. It would also be misleading, because the effect
of the two kinds of commands is not, in *general*, the same: one
operates on files in the working tree, and the other operates on
the index. Therefore, both are used, for simplicity and clarity.

For readability, adjacent `chmod +x` commands are also combined
into one. This way, each place that has both `chmod +x` logic and
`git update-index --chmod=+x` logic can have a single command for
each, that clearly relate to each other.

As noted above, the change in this commit is not sufficient to fix
that failing test, because a subsequent assertion still fails. This
will be detailed and fixed in a subsequent commit. (This commit
also does not yet update generated archives.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants