
docker graceful exits when image pull fails #382 #450

Merged
merged 3 commits into from
Mar 30, 2021

Conversation

Cbkhare
Contributor

@Cbkhare Cbkhare commented Mar 8, 2021

Description

Docker image pull will do a graceful exit when an error is encountered.

Why is this needed

Fixes: #

  • Removed the io.Copy of the pull output to stdout.
  • Added decoding of the image pull status and error checking via a struct.

How Has This Been Tested?

  • Tested on a Vagrant setup.
  • Tested with the template below:
version: "0.1"
name: helloworkflow
global_timeout: 600
tasks:
  - name: "hello world 1"
    worker: "{{.device_1}}"
    actions:
      - name: "hello_world_1"
        image: hello-world
        timeout: 60
      - name: "hello_world_2"
        image: test
        timeout: 200

  • Test logs cover both positive and negative scenarios.
  • Success with the hello-world image and failure with a very large dummy image of around 1 GB.
{"level":"info","ts":1615207865.9741204,"caller":"cmd/root.go:132","msg":"no config file found","service":"github.com/tinkerbell/tink"}
{"level":"info","ts":1615207865.9748633,"caller":"cmd/root.go:53","msg":"starting","service":"github.com/tinkerbell/tink","version":"d0780ce"}
{"level":"info","ts":1615207865.992383,"caller":"internal/worker.go:270","msg":"starting with action","service":"github.com/tinkerbell/tink","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","actionName":"hello_world_1","taskName":"hello world 1"}
{"level":"info","ts":1615207865.9968224,"caller":"internal/worker.go:293","msg":"sent action status","service":"github.com/tinkerbell/tink","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","actionName":"hello_world_1","taskName":"hello world 1","duration":"0"}
{"level":"info","ts":1615207866.195519,"caller":"internal/action.go:54","msg":"creating container","service":"github.com/tinkerbell/tink","command":[]}
{"level":"info","ts":1615207866.2004075,"caller":"internal/worker.go:108","msg":"container created","service":"github.com/tinkerbell/tink","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","actionName":"hello_world_1","actionImage":"hello-world","containerID":"42482e6c52c6d13a44b8fc2456979409c9a4051ddf78b45c72ca7d3c9802daa7","command":[]}

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

{"level":"info","ts":1615207866.688619,"caller":"internal/worker.go:143","msg":"container removed","service":"github.com/tinkerbell/tink","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","actionName":"hello_world_1","actionImage":"hello-world","status":"STATE_SUCCESS"}
{"level":"info","ts":1615207866.6886694,"caller":"internal/worker.go:186","msg":"action container exited","service":"github.com/tinkerbell/tink","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","actionName":"hello_world_1","actionImage":"hello-world","status":"STATE_SUCCESS"}
{"level":"info","ts":1615207866.688689,"caller":"internal/action.go:115","msg":"removing container","service":"github.com/tinkerbell/tink","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","actionName":"hello_world_1","actionImage":"hello-world","containerID":"42482e6c52c6d13a44b8fc2456979409c9a4051ddf78b45c72ca7d3c9802daa7"}
{"level":"info","ts":1615207866.6984775,"caller":"internal/worker.go:334","msg":"sent action status","service":"github.com/tinkerbell/tink","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","actionName":"hello_world_1","taskName":"hello world 1"}
{"level":"info","ts":1615207866.712187,"caller":"internal/worker.go:293","msg":"sent action status","service":"github.com/tinkerbell/tink","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","actionName":"hello_world_2","taskName":"hello world 1","duration":"0"}
{"level":"error","ts":1615207891.0433881,"caller":"internal/worker.go:319","msg":"DOCKER PULL: DOCKER PULL: failed to register layer: Error processing tar file(exit status 1): write /root/test123/570bbc7011b5f6155f4631fc33753f02f7e8288bd8f2fac20d766e437e8f1bbe/layer.tar: no space left on device","service":"github.com/tinkerbell/tink","workerID":"0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94","workflowID":"b19960b8-800c-11eb-ba2c-0242ac120005","actionName":"hello_world_2","taskName":"hello world 1","error":"DOCKER PULL: DOCKER PULL: failed to register layer: Error processing tar file(exit status 1): write /root/test123/570bbc7011b5f6155f4631fc33753f02f7e8288bd8f2fac20d766e437e8f1bbe/layer.tar: no space left on device","errorVerbose":"failed to register layer: Error processing tar file(exit status 1): write /root/test123/570bbc7011b5f6155f4631fc33753f02f7e8288bd8f2fac20d766e437e8f1bbe/layer.tar: no space left on device\ngithub.com/tinkerbell/tink/cmd/tink-worker/internal.(*RegistryConnDetails).pullImage\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/internal/registry.go:86\ngithub.com/tinkerbell/tink/cmd/tink-worker/internal.(*Worker).execute\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/internal/worker.go:101\ngithub.com/tinkerbell/tink/cmd/tink-worker/internal.(*Worker).ProcessWorkflowActions\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/internal/worker.go:301\ngithub.com/tinkerbell/tink/cmd/tink-worker/cmd.NewRootCommand.func2\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/cmd/root.go:74\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email protected]/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email protected]/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email 
protected]/command.go:887\nmain.main\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/main.go:29\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\nDOCKER PULL\ngithub.com/tinkerbell/tink/cmd/tink-worker/internal.(*RegistryConnDetails).pullImage\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/internal/registry.go:86\ngithub.com/tinkerbell/tink/cmd/tink-worker/internal.(*Worker).execute\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/internal/worker.go:101\ngithub.com/tinkerbell/tink/cmd/tink-worker/internal.(*Worker).ProcessWorkflowActions\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/internal/worker.go:301\ngithub.com/tinkerbell/tink/cmd/tink-worker/cmd.NewRootCommand.func2\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/cmd/root.go:74\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email protected]/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email protected]/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email protected]/command.go:887\nmain.main\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/main.go:29\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\nDOCKER 
PULL\ngithub.com/tinkerbell/tink/cmd/tink-worker/internal.(*Worker).execute\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/internal/worker.go:102\ngithub.com/tinkerbell/tink/cmd/tink-worker/internal.(*Worker).ProcessWorkflowActions\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/internal/worker.go:301\ngithub.com/tinkerbell/tink/cmd/tink-worker/cmd.NewRootCommand.func2\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/cmd/root.go:74\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email protected]/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email protected]/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/Chitrabasu/Downloads/work/gocode/pkg/mod/github.com/spf13/[email protected]/command.go:887\nmain.main\n\t/home/Chitrabasu/Downloads/work/repo/tink/cmd/tink-worker/main.go:29\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373"}
Error: worker Finished with error: DOCKER PULL: DOCKER PULL: failed to register layer: Error processing tar file(exit status 1): write /root/test123/570bbc7011b5f6155f4631fc33753f02f7e8288bd8f2fac20d766e437e8f1bbe/layer.tar: no space left on device
Usage:
  tink-worker [flags]

Flags:
      --capture-action-logs        Capture action container output as part of worker logs (default true)
  -r, --docker-registry string     Sets the Docker registry (DOCKER_REGISTRY)
  -h, --help                       help for tink-worker
  -i, --id string                  Sets the worker id (ID)
      --max-file-size int          Maximum file size in bytes (MAX_FILE_SIZE) (default 10485760)
      --max-retry int              Maximum number of retries to attempt (MAX_RETRY) (default 3)
  -p, --registry-password string   Sets the registry-password (REGISTRY_PASSWORD)
  -u, --registry-username string   Sets the registry username (REGISTRY_USERNAME)
      --retry-interval duration    Retry interval in seconds (RETRY_INTERVAL) (default 3ns)
      --timeout duration           Max duration to wait for worker to complete (TIMEOUT) (default 1h0m0s)
  -v, --version                    version for tink-worker

Please note:
This change additionally alters how logs are collected when an image is pulled, in order to avoid unnecessary detail in the logs. The decision to remove this output was reached in #451.

@Cbkhare Cbkhare requested review from invidian and thebsdbox March 8, 2021 13:05
@Cbkhare Cbkhare self-assigned this Mar 8, 2021
@Cbkhare Cbkhare added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 8, 2021
@codecov

codecov bot commented Mar 8, 2021

Codecov Report

Merging #450 (f4fdfa2) into master (a49e7e6) will decrease coverage by 3.49%.
The diff coverage is 87.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #450      +/-   ##
==========================================
- Coverage   35.82%   32.32%   -3.50%     
==========================================
  Files          47       50       +3     
  Lines        2892     3276     +384     
==========================================
+ Hits         1036     1059      +23     
- Misses       1763     2123     +360     
- Partials       93       94       +1     
Impacted Files Coverage Δ
cmd/tink-worker/internal/registry.go 71.87% <87.50%> (ø)
cmd/tink-worker/internal/action.go 0.00% <0.00%> (ø)
cmd/tink-worker/internal/worker.go 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a49e7e6...f4fdfa2. Read the comment docs.

Comment on lines 67 to 92
if _, err := io.Copy(os.Stdout, out); err != nil {
return err
fd := json.NewDecoder(out)
var status *ImagePullStatus
for {
if err := fd.Decode(&status); err != nil {
if err == io.EOF {
break
}
}
if status.Error != "" {
return errors.Wrap(errors.New(status.Error), "DOCKER PULL")
}
Contributor

This changes the behavior and now hides the pulling logs, which I guess might be helpful while pulling large images.

Contributor Author

@Cbkhare Cbkhare Mar 8, 2021

Yes, that would only be possible if we print the status.
But the only logs we got earlier were the stream logs; IMHO they might not really be that useful, as we are catching the error. Also, if at some later point someone wishes to extract details from the stream (like image details) and modify something, that is now possible with the struct.

If we wish to log it, we can add a logger, and that can be taken up as a separate issue.

Contributor

Maybe we can use TeeReader to both print the messages and decode them to retain the old behavior?

Contributor Author

@Cbkhare Cbkhare Mar 8, 2021

So TeeReader will return a Reader object, and data will be pushed to stdout as we read it. That read will fail, but Go won't catch the error. I tested it: while downloading the image it failed partway through, but didn't return the error. I used ReadAll to read the reader object, as per the example in the documentation.

Contributor

I think we can defer io.Copy on tee reader to keep the output, then read and decode it as done in this PR. What do you think?

Contributor Author

I don't think defer will make a difference; both of them will eventually call the read. Secondly, a reader object cannot be read twice.
Even if we make it work, IMHO that would be an overhead for the image pull method.
Do you have any specific use case where logs are required?

I think one of the three approaches below is possible:

  1. Keep io.Copy and document the error.
  2. Keep the code change as in this PR. For logs, we can add a logger via a separate issue.
  3. File an issue in the Docker repo. Actually, you will find an issue on the Docker repo with the same context, and the suggestions there are to increase the host size. I didn't get lucky finding other suggestions.

Contributor

Do you have any specific use case where logs are required ?

It's the progress indicator; it's always nice to see in the logs that something is happening. But if the maintainers agree this is not very important, I'm fine with it.

Secondly, reader object can not be read twice.

This is why you could use TeeReader, to duplicate the stream.

I also found moby/moby#36253 now, it seems ImagePull has even more bugs :|

Contributor Author

I also found moby/moby#36253 now, it seems ImagePull has even more bugs :|

Whoa! This is surprising; I didn't notice this.

This is why you could use TeeReader, to duplicate the stream.

I tried to replicate this scenario, as in the code snippet below, but I didn't observe that the object can be streamed/read again. Can you share an example of what you are suggesting?

	if err != nil {
		fmt.Println(reader)
		return
	}
	r := io.TeeReader(reader, os.Stdout)

	// streaming the object
	_, err = io.ReadAll(r)
	if err != nil {
		return err
	}

	// re-streaming the object
	fd := json.NewDecoder(r)

	type Status struct {
		Status         string `json:"status"`
		Error          string `json:"error"`
		Progress       string `json:"progress"`
		ProgressDetail struct {
			Current int `json:"current"`
			Total   int `json:"total"`
		} `json:"progressDetail"`
	}
	var status *Status

	for {
		if err := fd.Decode(&status); err != nil {
			if err == io.EOF {
				break
			}
			fmt.Println("caught ", err, status)
		}
		if status.Error != "" {
			fmt.Println("Failed ", status.Error)
		}
		fmt.Printf("Status: %+v\n", status)
	}
	fmt.Println("done")

If the error is caught correctly, "done" shouldn't be printed.

Contributor

I was thinking about something like:

package main

import (
        "bytes"
        "encoding/json"
        "fmt"
        "io"
        "io/ioutil"
        "os"
        "strings"
)

// ImagePullStatus is the status of the downloaded Image chunk
type ImagePullStatus struct {
        Status         string `json:"status"`
        Error          string `json:"error"`
        Progress       string `json:"progress"`
        ProgressDetail struct {
                Current int `json:"current"`
                Total   int `json:"total"`
        } `json:"progressDetail"`
}

func main() {
        r := ioutil.NopCloser(strings.NewReader("hello world")) // r type is io.ReadCloser

        // Ensure ReadCloser will be closed and error will be handled.
        //
        // Error handling is not critical for execution, so only inform user
        // if error occurs, but proceed with execution.
        defer func() {
                fmt.Println("Closing reader")
                if err := r.Close(); err != nil {
                        fmt.Printf("Closing reader: %v\n", err)
                }
        }()

        // Make a copy of logs produced by reader into buffer, so it can be copied into stdout.
        var bufferRead bytes.Buffer

        reader := io.TeeReader(r, &bufferRead)

        // Stream copied logs.
        //
        // Error handling is not critical for execution, so only inform user
        // if error occurs, but proceed with execution.
        go func() {
                fmt.Println("Copying logs to stdout")

                if _, err := io.Copy(os.Stdout, &bufferRead); err != nil {
                        fmt.Printf("Writing logs to stdout: %v\n", err)
                }
        }()

        fd := json.NewDecoder(reader)

        var status *ImagePullStatus

        fmt.Println("Entering decode loop")
        for {
                if err := fd.Decode(&status); err != nil {
                        if err == io.EOF {
                                break
                        }

                        fmt.Printf("Decoding error occurred: %v\n", err)
                        continue
                }

                if status.Error != "" {
                        fmt.Printf("Error occurred: %v", status.Error)
                        return
                }
        }
}

This loops infinitely, it shouldn't, but I hope you get the idea.

Contributor

Hey! A couple of things! @invidian, thank you for catching this. @Cbkhare, you should limit the scope of a PR to a minimum. Changing the logging behavior is probably out of scope.

A few days ago something funny happened: another member of the community asked for the opposite of what @invidian wants. Let's discuss "image pull logging" there because, as I said, this change is out of scope for this PR.

#451

@@ -64,8 +74,17 @@ func (r *RegistryConnDetails) pullImage(ctx context.Context, cli *client.Client,
return errors.Wrap(err, "DOCKER PULL")
}
defer out.Close()
if _, err := io.Copy(os.Stdout, out); err != nil {
Contributor

Hm, I think I don't understand. So cli.ImagePull does not return an error when you run out of disk space? This seems like a bug to me.

Contributor Author

Yes, it is a bug; I think it is probably in docker.

As of now, this change gives us leverage to handle the error gracefully. But eventually, a proper fix in the go-client or docker would be required.

Contributor Author

IMO, I would personally have preferred io.Copy catching that error.

On our last sync-up call, @thebsdbox and I discussed this in detail, so we thought to have a workaround ready, at least for now.

I also think an alternative way could be to document this and ask the user to increase the disk space when this error is encountered.

Contributor

Has this been reported upstream? I'm also a user of cli.ImagePull, so I'm curious under what conditions it may fail but not return an error.

Contributor Author

I think the issue is probably with io.Copy writing the stream to stdout, rather than with cli.ImagePull.

Contributor

io.Copy happens after the cli.ImagePull error check. If image pulling fails, I'd expect no logs to be printed.

Contributor Author

The image is pulled as a stream and read by io.Copy in chunks. Once one of the chunks is read, EOF is generated. When the read of the second chunk fails, the interrupt is either not generated or the go compiler doesn't receive it, so in a way, for Go the error never occurred, and that is why Go is not sending the error back. I have pasted the logs in issue #382.

Contributor

I'm not sure what you are referring to as "go compiler", but I don't think the compiler is involved here. Maybe you meant the Go runtime?

Also, looking at moby/moby#36253, it seems cli.ImagePull does not really return an error when pulling an image fails, but instead streams the possible errors via the reader. This is a hell-confusing API 😄

Contributor

I created moby/moby#42126, maybe they improve it somehow...

@Cbkhare Cbkhare requested a review from invidian March 8, 2021 14:50
Comment on lines 80 to 90
if err := fd.Decode(&status); err != nil {
if err == io.EOF {
break
}
}
if status.Error != "" {
Contributor

If an error occurs but it's not EOF, dereferencing status for the Error field will panic.

Contributor Author

Okay, I have added a return statement for the case where Decode fails.

}
if status.Error != "" {
return errors.Wrap(errors.New(status.Error), "DOCKER PULL")
}
Contributor

As we discussed on Slack, if we add fmt.Printf("%+v\n", status) below this line, we retain the logging functionality, and everything should work nicely then.

Contributor

Not relevant anymore. See #451.

@gianarb
Contributor

gianarb commented Mar 10, 2021

Ok! As we discussed in #451, we can remove the progress bar because it is not really a good way to communicate what a program is doing in this context. It is good for humans, but not ideal in this scenario.

So, even though it was not part of this work, let's do it here in a separate commit, @Cbkhare.

Thanks a lot

@Cbkhare
Contributor Author

Cbkhare commented Mar 10, 2021

@gianarb A separate commit won't be required; the added change itself covers that. We can just edit the details in the PR, add the context of #451, and reference all the details.

@invidian, would you also like to add any other input?

invidian
invidian previously approved these changes Mar 10, 2021
Contributor

@invidian invidian left a comment

I think commits can be squashed, but it looks good to me 👍

@Cbkhare
Contributor Author

Cbkhare commented Mar 10, 2021

I think commits can be squashed, but it looks good to me 👍

Yes, right. I'm waiting for a conclusion, and then I can squash.

I usually prefer to do it last: one squash for all the commits. 😂

return err
fd := json.NewDecoder(out)
var status *ImagePullStatus
for {
Contributor

I think we could add some unit tests for this. They shouldn't be too hard to write, and I think it would be better to have some internal tests rather than no tests.

@gianarb gianarb added the needs-tests Needs supporting unit tests label Mar 11, 2021
@Cbkhare Cbkhare force-pushed the graceful_docker_exit_#382 branch from ca4f8a9 to b8fa81c Compare March 17, 2021 17:11
@Cbkhare Cbkhare marked this pull request as draft March 17, 2021 17:12
@Cbkhare Cbkhare force-pushed the graceful_docker_exit_#382 branch from 032d586 to 1387b5a Compare March 17, 2021 19:11
@Cbkhare Cbkhare removed the draft-pr label Mar 17, 2021
@Cbkhare Cbkhare marked this pull request as ready for review March 17, 2021 19:13
@mmlb
Contributor

mmlb commented Mar 17, 2021

Will a docker pull error cause tink-worker to exit?

@Cbkhare
Contributor Author

Cbkhare commented Mar 17, 2021

Will a docker pull error cause tink-worker to exit?

yes, currently if there is a docker pull error then tink-worker exits

@mmlb
Contributor

mmlb commented Mar 17, 2021

Will a docker pull error cause tink-worker to exit?

yes, currently if there is a docker pull error then tink-worker exits

We should not do this, this will be a hindrance to EM adoption and we'd have to patch that out. tink-worker should always be up to hopefully try and recover.

@Cbkhare
Contributor Author

Cbkhare commented Mar 17, 2021

Will a docker pull error cause tink-worker to exit?

yes, currently if there is a docker pull error then tink-worker exits

We should not do this, this will be a hindrance to EM adoption and we'd have to patch that out. tink-worker should always be up to hopefully try and recover.

This change does not alter tink-worker's existing functionality. It just performs a graceful exit and returns a proper error, instead of bailing out with a random error.

IMHO, if we wish to remove the exit of tink-worker when a docker failure occurs, then we can create a separate issue and fix it.

@mmlb
Contributor

mmlb commented Mar 17, 2021

Will a docker pull error cause tink-worker to exit?

yes, currently if there is a docker pull error then tink-worker exits

We should not do this, this will be a hindrance to EM adoption and we'd have to patch that out. tink-worker should always be up to hopefully try and recover.

This change does not alter the tink-worker existing functionality. This just does a gracefully exit and returns a proper error, instead of bailing out with a random error.

IMHO, If we wish to remove the exit of tink-worker when docker failure occurs, then we can create a separate issue and fix it..

Yep, that sounds good. I was likely going to do the same :D

@mmlb
Contributor

mmlb commented Mar 17, 2021

#463

@Cbkhare Cbkhare requested a review from invidian March 18, 2021 06:13
@@ -46,8 +56,12 @@ func (r *RegistryConnDetails) NewClient() (*client.Client, error) {
return c, nil
}

type ImagePuller interface {
Contributor

Does this interface need to be exported?

Contributor Author

@Cbkhare Cbkhare Mar 18, 2021

I think this can be made unexported. Done.

Comment on lines 66 to 73
if test.err != nil {
assert.Error(t, err)
return
}
Contributor

I think the test should also make some assertions for the case when an error is not expected but was returned.

Contributor Author

I think for the success case, the only possible scenario is that err would be nil, which we are covering implicitly in the assertion.

s = "{\"error\": \"\"}"
err = errors.New("Tested, failure of the image pull")
} else if str == "test/fail_partial" {
s = "{\"status\": \"hello\",\"error\":\"\"}{\"status\":\"world\",\"error\":\"Tested, failure of No space left on device\"}"
Contributor

It would be nice to have this data defined directly in the test case, to keep things simple.

Contributor Author

Initially I thought it could be better, but then I made the changes as per your suggestion.

Comment on lines 30 to 31
if str == "test/success" {
s = "{\"status\": \"hello\",\"error\":\"\"}{\"status\":\"world\",\"error\":\"\"}"
Contributor

There should also be a test case where decoding fails, as IMO it's an important part of the spec to decide what to do if Docker throws some unexpected message.

Contributor Author

@Cbkhare Cbkhare Mar 18, 2021

The third case does that. In that scenario, while reading the stream, docker will bail out with "No space left on device". You can notice that in the third case the ImagePull mock returns err as nil.

Comment on lines 38 to 40
stringReader := strings.NewReader(s)
stringReadCloser := ioutil.NopCloser(stringReader)
return stringReadCloser, err
Contributor

It would also be nice to test the case where ImagePull returns an error directly, not via the decoder. This mock method could return an error when the given string is empty.

Contributor Author

@Cbkhare Cbkhare Mar 18, 2021

The second case does that: Tested, failure of the image pull.

service := "github.com/tinkerbell/tink"
logger, err := log.Init(service)
if err != nil {
panic(err)
Contributor

https://github.com/golang/go/wiki/CodeReviewComments#dont-panic

I'd use t.Fatalf instead and pass *testing.T as a parameter to this function.
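Applied to the helper above, the suggestion might look like this. A hedged sketch: initLogger stands in for log.Init (whose real return type is a logger, not a string), and setupLogger is a name invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// initLogger stands in for log.Init in this sketch; the assumed contract is
// "given a service name, return a logger handle or an error".
func initLogger(service string) (string, error) {
	if service == "" {
		return "", errors.New("empty service name")
	}
	return "logger(" + service + ")", nil
}

// setupLogger follows the review suggestion: it receives *testing.T and
// calls t.Fatalf on failure instead of panicking.
func setupLogger(t *testing.T) string {
	t.Helper()
	logger, err := initLogger("github.com/tinkerbell/tink")
	if err != nil {
		t.Fatalf("initializing logger: %v", err)
	}
	return logger
}

func main() {
	// outside a test we can exercise the stand-in directly
	logger, err := initLogger("github.com/tinkerbell/tink")
	fmt.Println(logger, err)
}
```

t.Fatalf marks only the current test as failed and stops it, whereas panic would abort the whole test binary; t.Helper() keeps failure line numbers pointing at the caller.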

Contributor Author

done

@Cbkhare Cbkhare requested a review from invidian March 18, 2021 14:26
Cbkhare and others added 2 commits March 18, 2021 19:58
added check for Decode statement

added unit tests

@Cbkhare Cbkhare force-pushed the graceful_docker_exit_#382 branch from 0a3bcfb to 3d5f797 Compare March 18, 2021 14:28
Contributor

@invidian invidian left a comment


Mostly LGTM. We could test one more scenario:

diff --git cmd/tink-worker/internal/registry.go cmd/tink-worker/internal/registry.go
index 97bc13a..2f177ba 100644
--- cmd/tink-worker/internal/registry.go
+++ cmd/tink-worker/internal/registry.go
@@ -85,7 +85,7 @@ func (r *RegistryConnDetails) pullImage(ctx context.Context, cli imagePuller, im
                        if err == io.EOF {
                                break
                        }
-                       return errors.Wrap(err, "DOCKER PULL")
+                       //return errors.Wrap(err, "DOCKER PULL")
                }
                if status.Error != "" {
                        return errors.Wrap(errors.New(status.Error), "DOCKER PULL")

Right now with this patch, tests are still passing.

@Cbkhare
Contributor Author

Cbkhare commented Mar 21, 2021

Mostly LGTM. We could test one more scenario:

Right now with this patch, tests are still passing.

Yes, that error return is specific to Decode. Decode failure is not added in the test cases, and I think it is not required.

@Cbkhare Cbkhare requested a review from invidian March 21, 2021 14:45
@gianarb gianarb requested review from gianarb and removed request for gianarb March 23, 2021 12:19
@gianarb gianarb removed the needs-tests Needs supporting unit tests label Mar 26, 2021
@gianarb gianarb added the ready-to-merge Signal to Mergify to merge the PR. label Mar 26, 2021
Contributor

@thebsdbox thebsdbox left a comment


Excellent, thanks @Cbkhare

@gauravgahlot
Contributor

@Cbkhare Could you update the branch, please?

Labels
kind/feature Categorizes issue or PR as related to a new feature. ready-to-merge Signal to Mergify to merge the PR.