Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always snapshot files in COPY and RUN commands #289

Merged
merged 4 commits into from
Aug 25, 2018
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 23 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ We do **not** recommend running the kaniko executor binary in another image, as
- [Security](#security)
- [Comparison with Other Tools](#comparison-with-other-tools)
- [Community](#community)
- [Limitations](#limitations)

_If you are interested in contributing to kaniko, see [DEVELOPMENT.md](DEVELOPMENT.md) and [CONTRIBUTING.md](CONTRIBUTING.md)._

Expand Down Expand Up @@ -256,7 +257,8 @@ To configure credentials, you will need to do the following:
#### --snapshotMode

You can set the `--snapshotMode=<full (default), time>` flag to set how kaniko will snapshot the filesystem.
If `--snapshotMode=time` is set, only file mtime will be considered when snapshotting.
If `--snapshotMode=time` is set, only file mtime will be considered when snapshotting (see
[limitations related to mtime](#mtime-and-snapshotting)).

#### --build-arg

Expand Down Expand Up @@ -356,3 +358,23 @@ provides.
[kaniko-users](https://groups.google.com/forum/#!forum/kaniko-users) Google group

To Contribute to kaniko, see [DEVELOPMENT.md](DEVELOPMENT.md) and [CONTRIBUTING.md](CONTRIBUTING.md).

## Limitations

### mtime and snapshotting

When taking a snapshot, kaniko's hashing algorithms include (or in the case of
[`--snapshotMode=time`](#--snapshotmode), only use) a file's
[`mtime`](https://en.wikipedia.org/wiki/Inode#POSIX_inode_description) to determine
if the file has changed. Unfortunately there is a delay between when changes to a
file are made and when the `mtime` is updated. This means:

* With the time-only snapshot mode (`--snapshotMode=time`), kaniko may miss changes
introduced by `RUN` commands entirely.
* With the default snapshot mode (`--snapshotMode=full`), whether or not kaniko will
add a layer in the case where a `RUN` command modifies a file **but the contents do
not** change is theoretically non-deterministic. This _does not affect the contents_
which will still be correct, but it does affect the number of layers.

_Note that these issues are currently theoretical only. If you see this issue occur, please
[open an issue](https://github.com/GoogleContainerTools/kaniko/issues)._
49 changes: 49 additions & 0 deletions integration/dockerfiles/Dockerfile_test_copy_same_file_many_times
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
FROM alpine@sha256:5ce5f501c457015c4b91f91a15ac69157d9b06f1a75cf9107bf2b62e0843983a
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SOLID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X'D

7 changes: 3 additions & 4 deletions integration/integration_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -135,11 +135,10 @@ func TestMain(m *testing.M) {
defer DeleteFromBucket(fileInBucket)

fmt.Println("Building kaniko image")
buildKaniko := exec.Command("docker", "build", "-t", ExecutorImage, "-f", "../deploy/Dockerfile", "..")
err = buildKaniko.Run()
cmd := exec.Command("docker", "build", "-t", ExecutorImage, "-f", "../deploy/Dockerfile", "..")
_, err = RunCommandWithoutTest(cmd)
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think go convention is to write error messages like this as

if _, err := RunCommandWithoutTest(cmd); err != nil {

}

fmt.Print(err)
fmt.Print("Building kaniko failed.")
fmt.Printf("Building kaniko failed: %s", err)
os.Exit(1)
}

Expand Down
14 changes: 8 additions & 6 deletions pkg/commands/volume.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ limitations under the License.
package commands

import (
"fmt"
"os"
"strings"

Expand Down Expand Up @@ -53,13 +54,14 @@ func (v *VolumeCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile.
return err
}

logrus.Infof("Creating directory %s", volume)
if err := os.MkdirAll(volume, 0755); err != nil {
return err
// Only create and snapshot the dir if it didn't exist already
if _, err := os.Stat(volume); os.IsNotExist(err) {
logrus.Infof("Creating directory %s", volume)
v.snapshotFiles = []string{volume}
if err := os.MkdirAll(volume, 0755); err != nil {
return fmt.Errorf("Could not create directory for volume %s: %s", volume, err)
}
}

//Check if directory already exists?
v.snapshotFiles = append(v.snapshotFiles, volume)
}
config.Volumes = existingVolumes

Expand Down
10 changes: 8 additions & 2 deletions pkg/commands/workdir.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,14 @@ func (w *WorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile
config.WorkingDir = filepath.Join(config.WorkingDir, resolvedWorkingDir)
}
logrus.Infof("Changed working directory to %s", config.WorkingDir)
w.snapshotFiles = []string{config.WorkingDir}
return os.MkdirAll(config.WorkingDir, 0755)

// Only create and snapshot the dir if it didn't exist already
if _, err := os.Stat(config.WorkingDir); os.IsNotExist(err) {
logrus.Infof("Creating directory %s", config.WorkingDir)
w.snapshotFiles = []string{config.WorkingDir}
return os.MkdirAll(config.WorkingDir, 0755)
}
return nil
}

// FilesToSnapshot returns the workingdir, which should have been created if it didn't already exist
Expand Down
40 changes: 29 additions & 11 deletions pkg/executor/build.go
Original file line number Diff line number Diff line change
Expand Up @@ -87,23 +87,41 @@ func DoBuild(opts *options.KanikoOptions) (v1.Image, error) {
if err := dockerCommand.ExecuteCommand(&imageConfig.Config, buildArgs); err != nil {
return nil, err
}
// Don't snapshot if it's not the final stage and not the final command
// Also don't snapshot if it's the final stage, not the final command, and single snapshot is set
if (!finalStage && !finalCmd) || (finalStage && !finalCmd && opts.SingleSnapshot) {
continue
}
// Now, we get the files to snapshot from this command and take the snapshot
snapshotFiles := dockerCommand.FilesToSnapshot()
if finalCmd {
snapshotFiles = nil
var contents []byte

// If this is an intermediate stage, we only snapshot for the last command and we
// want to snapshot the entire filesystem since we aren't tracking what was changed
// by previous commands.
if !finalStage {
if finalCmd {
contents, err = snapshotter.TakeSnapshotFS()
}
} else {
// If we are in single snapshot mode, we only take a snapshot once, after all
// commands have completed.
if opts.SingleSnapshot {
if finalCmd {
contents, err = snapshotter.TakeSnapshotFS()
}
} else {
// Otherwise, in the final stage we take a snapshot at each command. If we know
// the files that were changed, we'll snapshot those explicitly, otherwise we'll
// check if anything in the filesystem changed.
if snapshotFiles != nil {
contents, err = snapshotter.TakeSnapshot(snapshotFiles)
} else {
contents, err = snapshotter.TakeSnapshotFS()
}
}
}
contents, err := snapshotter.TakeSnapshot(snapshotFiles)
if err != nil {
return nil, err
return nil, fmt.Errorf("Error taking snapshot of files for command %s: %s", dockerCommand, err)
}

util.MoveVolumeWhitelistToWhitelist()
if contents == nil {
logrus.Info("No files were changed, appending empty layer to config.")
logrus.Info("No files were changed, appending empty layer to config. No layer added to image.")
continue
}
// Append the layer to the image
Expand Down
15 changes: 15 additions & 0 deletions pkg/snapshot/layered_map.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ limitations under the License.
package snapshot

import (
"fmt"
"path/filepath"
"strings"
)
Expand Down Expand Up @@ -82,6 +83,20 @@ func (l *LayeredMap) MaybeAddWhiteout(s string) (bool, error) {
return true, nil
}

// Add will add the specified file s to the layered map.
func (l *LayeredMap) Add(s string) error {
newV, err := l.hasher(s)
if err != nil {
return fmt.Errorf("Error creating hash for %s: %s", s, err)
}
l.layers[len(l.layers)-1][s] = newV
return nil
}

// MaybeAdd will add the specified file s to the layered map if
// the layered map's hashing function determines it has changed. If
// it has not changed, it will not be added. Returns true if the file
// was added.
func (l *LayeredMap) MaybeAdd(s string) (bool, error) {
oldV, ok := l.Get(s)
newV, err := l.hasher(s)
Expand Down
63 changes: 50 additions & 13 deletions pkg/snapshot/snapshot.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,12 @@ package snapshot
import (
"archive/tar"
"bytes"
"fmt"
"io"
"io/ioutil"
"os"
"path/filepath"
"syscall"

"github.com/GoogleContainerTools/kaniko/pkg/constants"
"github.com/GoogleContainerTools/kaniko/pkg/util"
Expand All @@ -49,17 +51,28 @@ func (s *Snapshotter) Init() error {
return nil
}

// TakeSnapshot takes a snapshot of the filesystem, avoiding directories in the whitelist, and creates
// TakeSnapshot takes a snapshot of the specified files, avoiding directories in the whitelist, and creates
// a tarball of the changed files. Return contents of the tarball, and whether or not any files were changed
func (s *Snapshotter) TakeSnapshot(files []string) ([]byte, error) {
buf := bytes.NewBuffer([]byte{})
var filesAdded bool
var err error
if files == nil {
filesAdded, err = s.snapShotFS(buf)
} else {
filesAdded, err = s.snapshotFiles(buf, files)
filesAdded, err := s.snapshotFiles(buf, files)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think you need to declare var filesAdded bool on line 56 anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whops thanks!

if err != nil {
return nil, err
}
contents := buf.Bytes()
if !filesAdded {
return nil, nil
}
return contents, err
}

// TakeSnapshotFS takes a snapshot of the filesystem, avoiding directories in the whitelist, and creates
// a tarball of the changed files. Return contents of the tarball, and whether or not any files were changed
func (s *Snapshotter) TakeSnapshotFS() ([]byte, error) {
buf := bytes.NewBuffer([]byte{})
var filesAdded bool
filesAdded, err := s.snapShotFS(buf)
if err != nil {
return nil, err
}
Expand All @@ -70,8 +83,8 @@ func (s *Snapshotter) TakeSnapshot(files []string) ([]byte, error) {
return contents, err
}

// snapshotFiles takes a snapshot of specific files
// Used for ADD/COPY commands, when we know which files have changed
// snapshotFiles creates a snapshot (tar) and adds the specified files.
// It will not add files which are whitelisted.
func (s *Snapshotter) snapshotFiles(f io.Writer, files []string) (bool, error) {
s.hardlinks = map[uint64]string{}
s.l.Snapshot()
Expand All @@ -81,16 +94,19 @@ func (s *Snapshotter) snapshotFiles(f io.Writer, files []string) (bool, error) {
}
logrus.Infof("Taking snapshot of files %v...", files)
snapshottedFiles := make(map[string]bool)
n := len(files)
for _, file := range files {
parentDirs := util.ParentDirectories(file)
// If we do end up snapshotting the dirs, we need to snapshot them before
// we snapshot their contents
files = append(parentDirs, files...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyawadhwa can you confirm if the comment i added is correct? I was trying to understand why we were actually appending to parentDirs instead of to files

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I don't think the order matters! I'm pretty sure I did that bc when I was writing the function and debugging it the logging was easier to read as [/tmp, /tmp/foo] instead of [/tmp/foo, /tmp] (but I guess I removed the logging and kept the files the way they were) 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah kk! if we're going to have 2 separate lists it wont matter too much anyway

}
filesAdded := false
w := tar.NewWriter(f)
defer w.Close()

// Now create the tar.
for _, file := range files {
for i, file := range files {
file = filepath.Clean(file)
if val, ok := snapshottedFiles[file]; ok && val {
continue
Expand All @@ -108,12 +124,24 @@ func (s *Snapshotter) snapshotFiles(f io.Writer, files []string) (bool, error) {
if err != nil {
return false, err
}
// Only add to the tar if we add it to the layeredmap.
addFile, err := s.l.MaybeAdd(file)

var fileAdded bool
lastParentFileIndex := len(files) - n
isParentDir := i < lastParentFileIndex
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of the grossest things ive written in a long time (knock on wood) @priyawadhwa so let me know if it's not clear and/or you'd just like it to be nicer

i think the next step to making this better would be to break some of this logic out into a separate function or two

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could have two separate lists, files and parentDirs, and then run Add on the files and MaybeAdd on the parentDirs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk let's do it!


// If this is parent dir of the file we're snapshotting, only snapshot it
// if it changed
if isParentDir {
fileAdded, err = s.l.MaybeAdd(file)
} else {
// If this is one of the files we are snapshotting, definitely snapshot it
err = s.l.Add(file)
fileAdded = true
}
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same error formatting as I commented above

return false, err
return false, fmt.Errorf("Unable to add file %s to layered map: %s", file, err)
}
if addFile {
if fileAdded {
filesAdded = true
if err := util.AddToTar(file, info, s.hardlinks, w); err != nil {
return false, err
Expand All @@ -132,8 +160,17 @@ func isBuildFile(file string) bool {
return false
}

// shapShotFS creates a snapshot (tar) of all files in the system which are not
// whitelisted and which have changed.
func (s *Snapshotter) snapShotFS(f io.Writer) (bool, error) {
logrus.Info("Taking snapshot of full filesystem...")

// Some of the operations that follow (e.g. hashing) depend on the file system being synced,
// for example the hashing function that determines if files are equal uses the mtime of the files,
// which can lag if sync is not called. Unfortunately there can still be lag if too much data needs
// to be flushed or the disk does its own caching/buffering.
syscall.Sync()

s.hardlinks = map[uint64]string{}
s.l.Snapshot()
existingPaths := s.l.GetFlattenedPathsForWhiteOut()
Expand Down
Loading