-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docker graceful exits when image pull fails #382 #450
Conversation
Codecov Report
@@ Coverage Diff @@
## master #450 +/- ##
==========================================
- Coverage 35.82% 32.32% -3.50%
==========================================
Files 47 50 +3
Lines 2892 3276 +384
==========================================
+ Hits 1036 1059 +23
- Misses 1763 2123 +360
- Partials 93 94 +1
Continue to review full report at Codecov.
|
cmd/tink-worker/internal/registry.go
Outdated
if _, err := io.Copy(os.Stdout, out); err != nil { | ||
return err | ||
fd := json.NewDecoder(out) | ||
var status *ImagePullStatus | ||
for { | ||
if err := fd.Decode(&status); err != nil { | ||
if err == io.EOF { | ||
break | ||
} | ||
} | ||
if status.Error != "" { | ||
return errors.Wrap(errors.New(status.Error), "DOCKER PULL") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the behavior and now hides the pulling logs, which I guess might be helpful while pulling large image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that would be only possible if we print the status
.
But the only logs which we get earlier where the stream logs, IMHO they might not be really that useful as we are catching the error. Also, later at some point if someone wishes to catch some details from the stream like image details
etc and modify something, now that is possible with the struct.
If we wish to log in, we can place a logger and that can be taken as a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use TeeReader to both print the messages and decode them to retain the old behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so TeeReader will return a Reader object and data will be pushed to stdout when we read it. Which will fail but go won't catch the error. I tested it, while downloading the image it failed in between, but didn't return the error. i used ReadAll to read the reader object as per the example in the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can defer io.Copy on tee reader to keep the output, then read and decode it as done in this PR. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think defer will make a difference. Both of them eventually will call the read
. Secondly, reader object can not be read twice.
Even if we mae it work, IMHO that would be an overhead for the image pull
method.
Do you have any specific use case where logs are required ?
I think, one of the below three an be an approach:
- keep
io.Copy
and document the error. - keep the code change as in PR. For logs, we can add a logger from a separate issue.
- File a issue in repo of docker. Actually, you will find the issue on docker repo with the same context, and suggestions are there are to increase the host size. I didn't got lucky to find some other suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any specific use case where logs are required ?
It's the progress indicator, it's always nice to see in logs that something is happening. But if maintainers agrees this is not very important, I'm fine with it.
Secondly, reader object can not be read twice.
This is why you could use TeeReader, to duplicate the stream.
I also found moby/moby#36253 now, it seems ImagePull
has even more bugs :|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also found moby/moby#36253 now, it seems ImagePull has even more bugs :|
Woh! this is surprising, I didn't notice this.
This is why you could use TeeReader, to duplicate the stream.
I tried to replicate this scenario, as in below code snippet, but I didn't observed that object can be streamed/read again. Can you share me an example on what you are suggesting,
if err != nil {
fmt.Println(reader)
return
}
r := io.TeeReader(reader, os.Stdout)
// streaming the object
_, err= io.ReadAll(r)
if err != nil {
return err
}
// re streaming the object.
fd := json.NewDecoder(r)
type Status struct {
Status string `json:"status"`
Error string `json:"error"`
Progress string `json:"progress"`
ProgressDetail struct {
Current int `json:"current"`
Total int `json:"total"`
} `json:"progressDetail"`
}
var status *Status
for {
if err := fd.Decode(&status); err != nil {
if err == io.EOF {
break
}
fmt.Println("caught ", err, status)
}
if status.Error != "" {
fmt.Println("Failed ", status.Error)
}
fmt.Printf("Status: %+v\n", status)
}
fmt.Println("done")
if the error is caught correctly, done shouldn't be printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about something like:
package main
import (
"bytes"
"encoding/json"
"fmt"
"io"
"io/ioutil"
"os"
"strings"
)
// ImagePullStatus is the status of the downloaded Image chunk
type ImagePullStatus struct {
Status string `json:"status"`
Error string `json:"error"`
Progress string `json:"progress"`
ProgressDetail struct {
Current int `json:"current"`
Total int `json:"total"`
} `json:"progressDetail"`
}
func main() {
r := ioutil.NopCloser(strings.NewReader("hello world")) // r type is io.ReadCloser
// Ensure ReadCloser will be closed and error will be handled.
//
// Error handling is not critical for execution, so only inform user
// if error occurs, but proceed with execution.
defer func() {
fmt.Println("Closing reader")
if err := r.Close(); err != nil {
fmt.Printf("Closing reader: %v\n", err)
}
}()
// Make a copy of logs produced by reader into buffer, so it can be copied into stdout.
var bufferRead bytes.Buffer
reader := io.TeeReader(r, &bufferRead)
// Stream copied logs.
//
// Error handling is not critical for execution, so only inform user
// if error occurs, but proceed with execution.
go func() {
fmt.Println("Copying logs to stdout")
if _, err := io.Copy(&bufferRead, os.Stdout); err != nil {
fmt.Printf("Writing logs to stdout: %v\n", err)
}
}()
fd := json.NewDecoder(reader)
var status *ImagePullStatus
fmt.Println("Entering decode loop")
for {
if err := fd.Decode(&status); err != nil {
if err == io.EOF {
break
}
fmt.Println("Decoding error occurred: %v", err)
continue
}
if status.Error != "" {
fmt.Printf("Error occurred: %v", status.Error)
return
}
}
}
This loops infinitely, it shouldn't, but I hope you get the idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! A couple of things! @invidian thank you for catching this. @Cbkhare you should limit the scope of a PR to a minimum. Changing the logging behavior is probably out of scope.
A few days ago something funny happened. Another member of the community asked for the opposite of what @invidian wants. Let's discuss "image pull logging" there because as I said, this change is out of scope for this PR.
@@ -64,8 +74,17 @@ func (r *RegistryConnDetails) pullImage(ctx context.Context, cli *client.Client, | |||
return errors.Wrap(err, "DOCKER PULL") | |||
} | |||
defer out.Close() | |||
if _, err := io.Copy(os.Stdout, out); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think I don't understand. so cli.ImagePull
does not return error when you run out of disk space? This seems like a bug to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is a bug. I think it is probably with docker.
As of now, this change gives us leverage to bypass the error gracefully. But eventually, a proper fix on the go-client or docker would be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, i would have personally preferred the io.Copy
catching that error.
On our last syncup call me and @thebsdbox discussed this in details, so we thought to have a workaround ready at least for now.
I also, think an alternative way could be to document this and ask the user to increase the disk space when this error is encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been reported upstream? I'm also the user of cli.ImagePull
, so I'm curious under what conditions it may fail, but not return error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think probably the issue is with io.Copy
writing stream to stdout instead of cli.ImagePull
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io.Copy
happens after cli.ImagePull
error checking. If Image pulling fail, I'd expect no logs will be printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image is pulled in streams and read by io.Copy
in chunks. Once one of the chunk is read EOF is generated. When read of second chunk fails, the interrupt is either not generated or go compiler doesn't receives it, so in a way for go error never occurred and that is the reason why go is not sending the error back. I have pasted the logs in the issue #382 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what are you referring to as "go compiler", but I don't think the compiler is involved in here. Maybe you meant Go runtime?
So also looking at moby/moby#36253, it seems cli.ImagePull
does not really return error when pulling image fails, but streams the possible errors via the reader. This is hell-confusing API 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created moby/moby#42126, maybe they improve it somehow...
cmd/tink-worker/internal/registry.go
Outdated
if err := fd.Decode(&status); err != nil { | ||
if err == io.EOF { | ||
break | ||
} | ||
} | ||
if status.Error != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If error occurs but it's not EOF, deferencing status
for Error
field will panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I have added a return statement if Decode
fails.
} | ||
if status.Error != "" { | ||
return errors.Wrap(errors.New(status.Error), "DOCKER PULL") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed on Slack, if we add fmt.Printf("%+v\n", status)
below this line, we retain logging functionality and everything should work nicely then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not relevant anymore. See #451.
Ok! As we discussed here #451 we can remove the progress bar because it is not really a good way when to communicate what a program is doing in this context. It is good for humans, but not ideal in this scenario. So, even if it was not part of this work let's do it here in a separate commit @Cbkhare Thanks a lot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think commits can be squashed, but it looks good to me 👍
Yes, right. I waiting for conclusion and then i can squash. I usually prefer do it at last, One squash for all the commits. 😂 |
5cd3825
to
ca4f8a9
Compare
return err | ||
fd := json.NewDecoder(out) | ||
var status *ImagePullStatus | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could add some units tests for this. They shouldn't be too hard to write and I think it would be better to have some internal tests, rather than no tests.
ca4f8a9
to
b8fa81c
Compare
032d586
to
1387b5a
Compare
Will a docker pull error cause tink-worker to exit? |
yes, currently if there is a docker pull error then tink-worker exits |
We should not do this, this will be a hindrance to EM adoption and we'd have to patch that out. tink-worker should always be up to hopefully try and recover. |
This change does not alter the tink-worker existing functionality. This just does a gracefully exit and returns a proper error, instead of bailing out with a random error. IMHO, If we wish to remove the exit of tink-worker when docker failure occurs, then we can create a separate issue and fix it.. |
Yep that sounds good, I was likely to do the same :D |
cmd/tink-worker/internal/registry.go
Outdated
@@ -46,8 +56,12 @@ func (r *RegistryConnDetails) NewClient() (*client.Client, error) { | |||
return c, nil | |||
} | |||
|
|||
type ImagePuller interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this interface need to be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, this can be updated to un-exportable. Done
if test.err != nil { | ||
assert.Error(t, err) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think test should also do some assertions when error is not expected, but it was returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, for success case, only possible scenario is that err would be nil. Which we are covering implicitly in the assertion.
s = "{\"error\": \"\"}" | ||
err = errors.New("Tested, failure of the image pull") | ||
} else if str == "test/fail_partial" { | ||
s = "{\"status\": \"hello\",\"error\":\"\"}{\"status\":\"world\",\"error\":\"Tested, failure of No space left on device\"}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have this data defined directly in the test case, to keep things simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I thought it could be better, but then I have made changes as per your suggestion.
if str == "test/success" { | ||
s = "{\"status\": \"hello\",\"error\":\"\"}{\"status\":\"world\",\"error\":\"\"}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should only be a test case when decoding fails, as IMO it's important part of the spec, to decide on what to do if Docker throws some unexpected message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
third case does that. In that scenario, while reading the stream docker will bail out with No space left on device
. you can notice, third case inside Imagepull method returns the err as nil
stringReader := strings.NewReader(s) | ||
stringReadCloser := ioutil.NopCloser(stringReader) | ||
return stringReadCloser, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to also test case where ImagePull
returns error directly, not via decoder. This mock method could return an error when given string is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second case does that Tested, failure of the image pull
service := "github.com/tinkerbell/tink" | ||
logger, err := log.Init(service) | ||
if err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/golang/go/wiki/CodeReviewComments#dont-panic
I'd use t.Fatalf
instead and pass *testing.T
as a parameter to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
added check for Decode statement added unit tests added unit tests added unit tests added unit tests
0a3bcfb
to
3d5f797
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM. We could test one more scenario:
diff --git cmd/tink-worker/internal/registry.go cmd/tink-worker/internal/registry.go
index 97bc13a..2f177ba 100644
--- cmd/tink-worker/internal/registry.go
+++ cmd/tink-worker/internal/registry.go
@@ -85,7 +85,7 @@ func (r *RegistryConnDetails) pullImage(ctx context.Context, cli imagePuller, im
if err == io.EOF {
break
}
- return errors.Wrap(err, "DOCKER PULL")
+ //return errors.Wrap(err, "DOCKER PULL")
}
if status.Error != "" {
return errors.Wrap(errors.New(status.Error), "DOCKER PULL")
Right now with this patch, tests are still passing.
yes, that return of error is specific to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thanks @Cbkhare
@Cbkhare Could you update the branch, please? |
Description
Docker image pull will do a graceful exit when an error is encountered.
Why is this needed
io.Copy
, an error is generated if the image pull fails, primarily when space is not left on the device.Fixes: #
io.Copy
How Has This Been Tested?
hello-world
image and failure with a very large size dummy image of around 1 GB.Please note:
This change, additionally, alters the details of logs collection when an image is pulled. This is to avoid unnecessary details in the logs. Same has been concluded to remove via #451