Occasional issues with file already closed errors #77
This looks like a race condition during the extraction of multiple Forge archives. Can you provide the Puppetfile with only its Forge modules, or at least say how many Forge modules the Puppetfile is using? |
I'm actually not using any Forge modules; all modules are hosted in a self-hosted GitHub Enterprise instance. Another data point that might be useful: the git URLs are all HTTPS. |
Having the same problems. We're also using Git only, with some modules fetched over HTTPS and some over SSH/git. I've added some really simple debug code:
mbaur@mbaur-g10k:~/g10k-source$ git diff
diff --git a/forge.go b/forge.go
index a9eba6a..b6cb9d6 100644
--- a/forge.go
+++ b/forge.go
@@ -303,7 +303,7 @@ func unTar(r io.Reader, targetBaseDir string) {
if err == io.EOF {
break
}
- Fatalf(funcName + "(): error while tar reader.Next() for io.Reader " + err.Error())
+ Fatalf(funcName + "(): error while tar reader.Next() for io.Reader " + err.Error() + targetBaseDir)
}
// get the individual filename and extract to the current directory
The targetBaseDir path shown when the error occurred is the path to a git repository, so there is definitely something wrong here. If I can help to debug this further, please let me know. |
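For context, here is a minimal, hedged sketch of a tar extraction loop of the kind the diff above instruments; the function name and error wrapping are illustrative assumptions, not g10k's actual code.

```go
// Illustrative sketch of a tar extraction loop similar to unTar() in forge.go;
// names, error handling and permissions here are assumptions for clarity.
package extract

import (
	"archive/tar"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

func unTarSketch(r io.Reader, targetBaseDir string) error {
	tarReader := tar.NewReader(r)
	for {
		header, err := tarReader.Next()
		if err == io.EOF {
			break // no more entries in the archive
		}
		if err != nil {
			// Including targetBaseDir (as in the debug diff above) shows which
			// directory was being extracted when Next() failed.
			return fmt.Errorf("unTar(): error while tar reader.Next() for %s: %w", targetBaseDir, err)
		}
		target := filepath.Join(targetBaseDir, header.Name)
		switch header.Typeflag {
		case tar.TypeDir:
			if err := os.MkdirAll(target, 0755); err != nil {
				return err
			}
		case tar.TypeReg:
			out, err := os.Create(target)
			if err != nil {
				return err
			}
			if _, err := io.Copy(out, tarReader); err != nil {
				out.Close()
				return err
			}
			out.Close()
		}
	}
	return nil
}
```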
@baurmatt I also added the |
@xorpaul Thanks, I'll try to capture a failed run later today. |
@andrewfraley Could you reproduce the issue in your setup? Some information about our setup:
Still trying to get the debug log through our security team. |
I was finally able to reliably reproduce this. The problem seems to be some binary files (please don't ask......) which we have in our repository.
I've uploaded all data into https://github.com/baurmatt/g10k-77-control and https://github.com/baurmatt/g10k-77-other. Hope this helps to finally find the issue. |
I'm having trouble reproducing the error anywhere other than our production Puppet servers, but I know I've seen it happen when I was testing in a Vagrant VM. I'll keep trying. I don't think we have any repos with binary data (at least nothing larger than a few KB). |
Ok, I'm able to reproduce the error, but it never happens on the same module, so I think this is some sort of race condition based on the performance of the underlying disks and the remote git server. To reliably make it happen I have to remove the cache directory and the target modules directory, so it's a full sync from scratch. I can only make it happen on one of my real servers, which has a much faster connection to the git server but much slower disks, versus my workstation inside a Vagrant instance (the workstation has a slow connection to the git server but fast SSD storage; the server has a fast connection to the git server but pretty slow disks).

2017/11/02 17:55:39 DEBUG executeCommand(): Executing git --git-dir /tmp/g10k-test/modules/https-__redacted.github.enterprise.server_redacted_git_repo.git rev-parse --verify 'v1.1.0'

I should add that the total disk space used by all modules in all environments is 650MB, so we are talking about writing a huge number of small files in a very short amount of time. Hope this helps. |
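As a rough illustration of what that debug line corresponds to, a hedged sketch of shelling out to git from Go and capturing its output; the helper name and error format are hypothetical, not g10k's actual executeCommand().

```go
// Hypothetical sketch of running a git command against a bare mirror in the
// local cache, roughly what a helper like executeCommand() does.
package gitutil

import (
	"fmt"
	"os/exec"
)

// runGit runs e.g.:
//   git --git-dir /tmp/g10k/modules/<repo>.git rev-parse --verify 'v1.1.0'
func runGit(gitDir string, args ...string) (string, error) {
	fullArgs := append([]string{"--git-dir", gitDir}, args...)
	out, err := exec.Command("git", fullArgs...).CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("git %v failed: %v: %s", fullArgs, err, out)
	}
	return string(out), nil
}
```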
Alright, I have managed to reliably reproduce the error in a VirtualBox VM by restricting the disk bandwidth. Steps to limit bandwidth on a VirtualBox disk:
You can change the limit while the machine is running with:
For me the errors start at a 50M limit. More info here: https://www.virtualbox.org/manual/ch05.html#storage-bandwidth-limit |
Interesting. Are you running g10k inside the restricted VM? Like #76 (comment), you can also try limiting the number of parallel checkouts and pulls with the -maxworker parameter. |
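For reference, limiting parallel work in Go is usually done with a fixed pool of worker goroutines reading from a job channel; this is a hedged sketch of that pattern as an assumption about how a -maxworker-style limit works, not g10k's actual implementation.

```go
// Sketch of a fixed-size worker pool, the common Go pattern behind a
// "-maxworker"-style limit; repo names and the worker body are placeholders.
package main

import (
	"fmt"
	"sync"
)

func main() {
	repos := []string{"repo-a", "repo-b", "repo-c", "repo-d"} // hypothetical module repos
	maxWorker := 2                                            // analogous to -maxworker

	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < maxWorker; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for repo := range jobs {
				// In g10k this would be the git clone/fetch/checkout of a module.
				fmt.Printf("worker %d syncing %s\n", id, repo)
			}
		}(i)
	}

	for _, r := range repos {
		jobs <- r
	}
	close(jobs)
	wg.Wait()
}
```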
We're running g10k in a Virtuozzo container with Ubuntu 16.04 on an HP server with SAS HDDs. I just tried lowering -maxworker to 1, but it didn't help at all :/
The repository checkout seems to be fine:
|
Yes, g10k is running inside the restricted VM. Is -maxworker supposed to restrict the number of concurrent git commands? Even with -maxworker 1 I still see hundreds of active git processes while g10k is running.

I think I've solved it, though: I'm actually bumping up against the nproc limit. If I restrict the available disk bandwidth down to 5MB/s, I start getting g10k crashes with errors about resources not being available, so half the time I get a g10k crash with a big Go dump and the other half I get the unTar error. I think it's just random whether it's git or g10k itself that can't create a new thread. After increasing the nproc limit for the user to 9000 the errors stopped happening, even with the disk restricted to 5MB/s.

On CentOS, for a user "puppet", add to /etc/security/limits.d/90-nproc.conf: |
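If it helps to verify those limits for the user that runs g10k, here is a small hedged Go snippet (Linux only, using golang.org/x/sys/unix) that prints the current nproc and nofile limits; it is a diagnostic sketch, not part of g10k.

```go
// Print the soft/hard limits discussed above. RLIMIT_NPROC caps processes and
// threads per user, RLIMIT_NOFILE caps open file descriptors; many concurrent
// git processes and extractions can exhaust either one.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	var nproc, nofile unix.Rlimit
	if err := unix.Getrlimit(unix.RLIMIT_NPROC, &nproc); err == nil {
		fmt.Printf("nproc:  soft=%d hard=%d\n", nproc.Cur, nproc.Max)
	}
	if err := unix.Getrlimit(unix.RLIMIT_NOFILE, &nofile); err == nil {
		fmt.Printf("nofile: soft=%d hard=%d\n", nofile.Cur, nofile.Max)
	}
}
```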
@baurmatt Are you using the latest g10k release? Which git version are you using? Ubuntu 16.04 should be using a somewhat recent version of git, though.

I can't reproduce the issue with your test control repo, but I can only test it on my workstation with plenty of RAM, a fast CPU and an SSD. If you could switch all your Puppetfile GitHub modules to use the

The problem is that with that test setup you're triggering about 70

And as @andrewfraley already found out: the

The way

I'll add a section to the readme that suggests increasing the default limits (nofile, nproc) before running g10k with a large Puppet environment. Would a parameter help that limits the local extracting processes, like |
A parameter to limit extracting processes would definitely be helpful; I think it would also help solve another issue I have with g10k pegging the CPUs while running. I don't mind if g10k takes 30s to finish instead of 3s, as long as it finishes within 60s. Thanks! |
I'm using g10k 0.3.14 and git 2.7.4. As you suggested, I just switched the module URLs to HTTPS, but for me the error still exists. I've also migrated my Virtuozzo container to a hardware server with SSD storage, and the error occurs there as well. |
Have you tried increasing the security limits nproc and nofile for your g10k user? |
If I understand it correctly, my limits are already really high:
|
I've found the issue...
Upgrading my container from 2 GB to 4 GB solved the problem and g10k is running fine. So I guess the |
@baurmatt 😏
I'm currently working on that |
Try out the new v0.4 release: https://github.com/xorpaul/g10k/releases/tag/v0.4. You can limit the number of Goroutines with |
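For readers curious how such a limit typically works, the usual Go idiom for capping concurrent goroutines is a buffered channel used as a semaphore; this is a hedged sketch of that pattern under assumed names, not g10k's actual v0.4 code.

```go
// Sketch of capping concurrent extraction goroutines with a buffered channel
// acting as a semaphore; the archive names and limit are placeholders.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	archives := []string{"mod1.tar.gz", "mod2.tar.gz", "mod3.tar.gz", "mod4.tar.gz"}
	maxExtract := 2 // analogous to a "limit extraction goroutines" setting

	sem := make(chan struct{}, maxExtract)
	var wg sync.WaitGroup

	for _, a := range archives {
		wg.Add(1)
		go func(archive string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while maxExtract are running
			defer func() { <-sem }() // release the slot
			fmt.Println("extracting", archive)
			time.Sleep(100 * time.Millisecond) // stand-in for the real untar work
		}(a)
	}
	wg.Wait()
}
```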
Works great, thanks! :) As I can't reproduce the error any more, I think this issue can be closed. |
Glad to hear! 😏 |
@xorpaul working great for me as well. Dramatically reduced CPU load and no more errors. Using |
I sometimes see this issue:
unTar(): error while tar reader.Next() for io.Reader read |0: file already closed
My script that runs g10k automatically clears the cache when the last run exited with an error, so the next run is usually fine. I only see this happen after changing the Puppetfile.
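A hedged sketch of that kind of wrapper in Go; the cache path, config path and retry behavior are placeholder assumptions, not the reporter's actual script.

```go
// Sketch of a wrapper that clears the g10k cache and retries once after a
// failed run; the paths and the -config argument are assumed placeholders.
package main

import (
	"log"
	"os"
	"os/exec"
)

func runG10k() error {
	cmd := exec.Command("g10k", "-config", "/etc/g10k/g10k.yaml")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	if err := runG10k(); err != nil {
		log.Printf("g10k failed (%v), clearing cache and retrying", err)
		if err := os.RemoveAll("/tmp/g10k"); err != nil { // assumed cache directory
			log.Fatal(err)
		}
		if err := runG10k(); err != nil {
			log.Fatal(err)
		}
	}
}
```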