Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit Rugged Control Repo refspec #1412

Merged
merged 1 commit into from
Dec 4, 2024

Conversation

seanmil
Copy link
Contributor

@seanmil seanmil commented Dec 2, 2024

Some Git platforms (e.g. GitLab) store additional references beyond the ones typically found in refs/heads for various tracking/historical reference purposes. For Control Repos which have been around a long time these additional refs can number in the tens of thousands. Refs that numerous seem to cause issues with some versions of Rugged being unable to properly clone and reference the Control Repo.

This patch limits Control Repo reference fetching to refs/heads, which is all R10K should need anyways.

See https://docs.gitlab.com/ee/development/gitaly.html#gitlab-specific-references for details on some of these additional references.

@seanmil seanmil requested review from a team as code owners December 2, 2024 18:42
Some Git platforms (e.g. GitLab) store additional references
beyond the ones typically found in refs/heads for various
tracking/historical reference purposes. For Control Repos
which have been around a long time these additional refs
can number in the tens of thousands. Refs that numerous seem
to cause issues with some versions of Rugged being unable
to properly clone and reference the Control Repo.

This patch limits Control Repo reference fetching to
refs/heads, which is all R10K should need anyways.

See https://docs.gitlab.com/ee/development/gitaly.html#gitlab-specific-references
for details on some of these additional references.
@justinstoller
Copy link
Member

Thanks for the PR, Sean. I'll try to figure this out this week. I'm surprised that the :prune => true isn't cleaning up the old refs. I also think we'd need refs/tags/* as well. But I'll try to verify those questions.

@seanmil
Copy link
Contributor Author

seanmil commented Dec 3, 2024

Thanks for the PR, Sean. I'll try to figure this out this week. I'm surprised that the :prune => true isn't cleaning up the old refs. I also think we'd need refs/tags/* as well. But I'll try to verify those questions.

Hi Justin, thanks for the quick update.

The :prune => true wouldn't clean up all of these references because a number of them are permanent references GitLab uses to point to key points in time in the repository such as commits used to show MR state, commit refs for pipeline jobs, the commit at which a (GitLab) "project environment" was deployed, etc.. Those just accumulate as the project has more and more history associated with it, so they aren't eligible for pruning. You can remove them from GitLab (as I did to work around my critical issue at the time) but it isn't a normal procedure and as I recall requires admin-level console access to GitLab.

As far as the refspec, adding refs/tags/* may be a good idea regardless, but I'm not sure I've heard of any Control Repo workflows that deploy from tags instead of branch names. That particular bit of code is, I believe, only used to clone/fetch the bare repository containing the control repo itself. The clone/fetch for any of the repositories managed by the control repo is in the lib/r10k/git/working_repository.rb file instead. It has been too long since I did this work to remember with clarity, but I believe I based the refspec based on what I saw when running the git --mirror command in lib/r10k/git/shellgit/bare_repository.rb. Making the refspec configurable would be okay too, but that seems like a lot of additional work for questionable value.

Hope that helps, thanks for taking a look at it!

@justinstoller
Copy link
Member

Thanks for clarifying the gitlab usage!

The repository in the codedir (both the environment and any git modules within it) are ThinRepositorys which inherit from WorkingRepositorys. The BareRepository is used by the git cache. eg, every git module of "puppetlabs-apache" across all your environments are thin repos that point to a cached bare apache repository in your cachedir.

A brief repro with shellgit shows this, though it also seems to be using refs/heads, so maybe my concern is unnecessary. I'll keep looking:

apache on  HEAD (c4310ce) [✘] Ruby v3.2.0 
 ➜  cat .git/config 
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
        url = https://github.com/puppetlabs/puppetlabs-apache
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
        remote = origin
        merge = refs/heads/main
[remote "cache"]
        url = /home/justin/.r10k/git/https---jackfan.us.kg-puppetlabs-puppetlabs-apache
        fetch = +refs/heads/*:refs/remotes/cache/*

apache on  HEAD (c4310ce) [✘] Ruby v3.2.0 
 ➜  cat /home/justin/.r10k/git/https---jackfan.us.kg-puppetlabs-puppetlabs-apache/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = true
[remote "origin"]
        url = https://github.com/puppetlabs/puppetlabs-apache
        fetch = +refs/*:refs/*
        mirror = true

@justinstoller
Copy link
Member

From what I can find in the docs the default behavior of git is to pull down any tags that are reachable from a head that is being fetched. So pulling down all of refs/heads/* should involve pulling down all of the reachable tags as well.

I tested this by installing a dev build of PE, updating pe-r10k as done in this PR and pulling down a control-repo that used git modules with tags. I then, in a module that the control repo was pulling down, created a non-default branch with a new commit and tag on it, pushed that to Github, and updated my control repo to only reference that tag. It was able to deploy just fine.

@justinstoller justinstoller merged commit fe2b37f into puppetlabs:main Dec 4, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants