-
-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue cloning large repositories #301
Comments
Updated issue - fixed formatting, and corrected description, this does not get better after multiple fetches. I have at least one repo that GitPython simply cannot fetch. |
Write debug files for stderr output as well as FETCH_INFO into the git repository, e.g. into `.git/` Help with #301
The issue described looks very much like #232, which was fixed back in the days. Maybe, the fix didn't work after all. What happens is that GitPython receives output from stderr, line by line, and from the FETCH_INFO file git leaves on disk. The information of both data sources is merged to obtain all the branch information we seek to provide. If these data sources do not match up for some reason, the assertion will fail. Assuming you are already using the latest version on pypi or GitHub, we could try to allow me to reproduce the issue. To do that, you can run your code with GitPython on this branch which leaves debug files in the Thank you. |
Write debug files for stderr output as well as FETCH_INFO into the git repository, e.g. into `.git/` Help with #301
By the way: a quick workaround should be to leave GitPython out of the equation and just run |
@Byron thanks so much for the quick reply on this. This is happening with 1.0.1 from pypi (I wish I could say I was running one of the old RCs and the upgrade fixed it, but no such luck). So unfortunately this is a private, internal repo, and given some of the commit messages that I've seen, I don't feel comfortable providing... anything... from this repo externally. However, I've created a test repository at https://github.com/jantman/gitpython_issue_301 with 5,000 randomly-generated commits in it, each in its own branch. Very interestingly, I've been able to reproduce the issue, but only when using SSH as a transport; when using HTTPS, it works fine. Code to reproduce:
The failure happens during the fetch. I've added you as a collaborator on the repo so you can test if you want. |
You can watch the development stream on youtube.
|
@Byron That was actually really interesting to watch... it's the first time I've seen a livestream of development work, let alone the thought process of working through an issue I opened. |
Thank you ! And wait until you have seen a good one of that kind ! PS: I have the issue in my inbox and will get to it eventually. A few things piled up. |
This allows us to use the main thread to parse stderr to get progress, and resolve assertion failures hopefully once and for all. Relates to #301
Alright, a fix was pushed, and I could verify that the provided means of reproduction do not fail anymore. Please let me know if it works for you as well. |
You can watch the development stream on youtube.
|
Wow! I watched the second recording (the first one seems to have been deleted?)... quite a tricky issue. Thanks sooo much for giving this so much attention! Fix is confirmed for me, using both the test repo I created on GitHub (which I've now deleted, since you created https://github.com/gitpython-developers/gitpython_issue_301 ) and using the original internal repositories that I found the problem with. Also, I don't have any numbers to back this up, but subjectively, it seems to run quite a bit faster. I've used GitPython quite a bit (though usually only for one-off scripts, nothing that really runs on a regular basis) and REALLY appreciate the work you're doing getting the project caught up. If there's anything I can do to help, I've got a lot on my plate, but feel free to ask. |
Indeed, video 14 seems to be broken - youtube was unable to convert the video file, maybe the upload was corrupted somehow. In a way it's good, as it shows me trying to figure out what's so special about this particular way of fetching branches, after all it worked fine in my test which spawned a process printing the branch information to stdout and stderr. The speedup might be due to the improved line parsing, which is now handled by C instead of pure python. The fix was to only open stderr of the git fetch process, which now allows to just call readlines() on it. Thanks for your help in writing excellent issues, the time you spent to help reproducing them is much appreciated as well ! |
This does not seem to be fixed in HEAD. I ran into this issue with GitPython 1.0.1, and while troubleshooting I installed the package directly from the HEAD of this repo to see if that would fix the problem. The output of the error changes in my case, but the error itself does not – in 1.0.1 I would get the contents of each list in the error, whereas now I only get The repo I'm acting on is private, so some details are necessarily excluded. In summary: when running In the detailed output that was produced by 1.0.1, what I see are lists of branches. The formatting is noticeably different between the lists on either side of the assertion. One side looks like this:
Whereas the other side looks like this:
In the example from which I drew this output, the latter side was the smaller one. Also, it's worth noting that the branch names in both lists are sorted alphabetically, and that the latter list is noticeably truncated at a seemingly arbitrary point. Could this be an API pagination issue, or something similar? Let me know if there is anything I can do to help troubleshoot further. Thanks! |
@ianwestcott Thanks for chiming in on this one, and for offering your help to do additional troubleshooting. What's happening is that git will provide real-time information about the fetched heads on In the past, this was problematic as getting all lines from Would that mean that |
Hi @Byron, Ah, I suppose that info would be helpful, yes. I'm using Python 2.7.10 (installed via Homebrew) in OS X El Capitan (10.11). Let me know how I can be of further assistance in troubleshooting this. I can also open a new issue if you'd prefer not to clutter this one further. Thanks! |
@ianwestcott A new issue would be great ! My primary goal would be to reproduce the issue you are seeing with the latest HEAD. That might be involve some work on your part I suppose, but without a way to reproduce the issue, there can be no fix. |
To follow up on this: My issue persisted for a long time, and I never got to the bottom of it. I learned that the problem is specific to OS X El Capitan (10.11), and that it seems to be related to how # this will output a complete list of refs
git fetch --verbose 2>&1
# this will mysteriously output a truncated list of refs
git fetch --verbose 2>&1 | cat I could reproduce the issue regardless of the version of git installed in El Cap (I tested the Mac system version, homebrew's version, and the latest release compiled from source). Anyway, the good news is that while I never figured out the root cause of the truncated output, I am pleased to report that it looks like GitPython no longer throws an AssertionError as a result of it, probably due to the fix in #442. I have not had a problem since upgrading to GitPython 2.0.6. Which is a huge relief, because I was stumped at how to further troubleshoot the issue with git. :) |
@ianwestcott Thanks for the write-up ! It's incredible to see such behaviour can actually exist, and even does so with the plain command-line program. I myself didn't experience such behaviour yet, and am puzzled as what's causing is, or how to fix it. |
GitPython seems to break on large repositories. I've hacked a bit of additional output into the assertion at https://github.com/gitpython-developers/GitPython/blob/master/git/remote.py#L576 :
When I run
git.Repo.clone_from()
or.fetch()
, it produces:... and then an amazingly long line of output.
This appears to happen with very large repositories; I honestly can't tell if the issue here is commit messages or the repository size - the one I used for testing has (yeah..... :( ) 4,789 branches.
The text was updated successfully, but these errors were encountered: