Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --ignore-ctime flag to backup and document change detection #2823

Merged
merged 1 commit into from
Jan 28, 2021

Conversation

greatroar
Copy link
Contributor

What is the purpose of this change? What does it change?

Since restic 0.9.6, restic backup has required both the ctime and mtime on a file to match with a previously backed-up version for the file to skipped on Unix. Before 0.9.6, if the mtime matched but the ctime did not, restic backup would not rehash the file.

This PR introduces a --trust-mtime flag to restore the 0.9.5 behavior. Perhaps more importantly, it documents the rules used to determine whether a file can be skipped, since the description in the manual was incomplete and out of date.

The --ignore-inode flag already turned off the ctime check, but also removed a different check; its description in the manual was also in the wrong place.

Changes in metadata are still recorded, regardless of command line options. (This seems to have confused people in previous discussions, so it's worth pointing out.)

Was the change discussed in an issue or in the forum before?

Closes #2558. I took the flag name from Git, because the suggested --ignore-ctime suggests that the ctime is ignored also for metadata changes.

Closes #2503.

Closes #2495. There, @fd0 suggested --use-mtime, but that's confusing because the mtime is always used. @rawtaz suggested --change-detect=mtime, but it's unclear what the other behaviors should be called (ctime+mtime? ctime+mtime+inode?).

Probably fixes #2819.

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@greatroar greatroar force-pushed the trust-mtime branch 2 times, most recently from 97d53b4 to 1504356 Compare July 8, 2020 12:25
Copy link
Contributor

@aawsome aawsome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only few suggestions due to my personal preferences. May as well be ignored 😄
I especially like updated documentation that clarifies things to users!

internal/archiver/archiver.go Outdated Show resolved Hide resolved
internal/archiver/archiver.go Outdated Show resolved Hide resolved
@greatroar greatroar force-pushed the trust-mtime branch 2 times, most recently from 06d0f1f to ccb4278 Compare October 15, 2020 09:13
doc/040_backup.rst Outdated Show resolved Hide resolved
@smlx
Copy link
Contributor

smlx commented Oct 16, 2020

In which situation would --trust-mtime be useful instead of --ignore-inode?

changelog/unreleased/pull-2823 Outdated Show resolved Hide resolved
changelog/unreleased/pull-2823 Outdated Show resolved Hide resolved
internal/archiver/archiver.go Show resolved Hide resolved
internal/archiver/archiver.go Outdated Show resolved Hide resolved
internal/archiver/archiver.go Outdated Show resolved Hide resolved
internal/archiver/archiver_test.go Outdated Show resolved Hide resolved
internal/archiver/archiver_test.go Outdated Show resolved Hide resolved
doc/040_backup.rst Outdated Show resolved Hide resolved
doc/040_backup.rst Outdated Show resolved Hide resolved
doc/040_backup.rst Outdated Show resolved Hide resolved
@MichaelEischer
Copy link
Member

@smlx It should be useful if someone e.g. runs a recursive chmod over lots of files. While --ignore-inode would also work, that also ignores inode changes which could cause a few changes to be missed. Some programs could (?) save data to a temporary file and then replace the original with it while keeping the modification timestamp (maybe some usage of tar?).

Another important point here is that the change detection should be as strict as possible. When a filesystem has proper inodes then it's a good idea to use that information source to detect file changes.

@greatroar
Copy link
Contributor Author

I think I've addressed all review comments now, and I've rebased on current master.

@MichaelEischer
Copy link
Member

@greatroar Thanks for the fixes so far, from what I see there are still three unaddressed comments in my last review which are hidden by default by Github.

@greatroar
Copy link
Contributor Author

Sorry, I'd missed them indeed. Addressed now.

@rawtaz
Copy link
Contributor

rawtaz commented Oct 25, 2020

Regarding the naming, would you agree with the following:

  • --ignore-ctime may result in people thinking that ctime changes are not recorded in the metadata.
  • --trust-mtime may result in people thinking that mtime is the only thing looked at to determine whether a file should be skipped or not (which isn't the case, because if mtime is unchanged but e.g. file size has changed, restic will consider the file changed).

I'm thinking that the latter misconception is worse than the former, and also messier to explain/clarify in description and docs. The former is such a small misconception that it can be clarified in the description of the option alone. Hence I'm thinking --ignore-ctime might be a better name after all (assuming a description that mentions it's just ignored for scanning). Thoughts?

@greatroar
Copy link
Contributor Author

I remember now why the ctime test was skipped on Windows previously: there is no ctime on Windows, so nothing is being tested. I disabled it again, because a skipped test is more informative than one that succeeds without testing anything.

@greatroar
Copy link
Contributor Author

@rawtaz Worse, trust-mtime might be taken to imply that the inode is not checked, while it exists explicitly to check the inode but not the ctime. Maybe ignore-ctime is the right choice after all. It's just that it looks very much like ignore-atime, but does something completely different...

@rawtaz
Copy link
Contributor

rawtaz commented Oct 25, 2020

Maybe ignore-ctime is the right choice after all. It's just that it looks very much like ignore-atime, but does something completely different.

Did you mean --with-atime? There's no --ignore-atime.

In relation to --ignore-inode, it too is only conveying a part of the truth, because even if you specify this option, which would suggest inode's aren't handled at all, restic still updates/records the new the inode number for a file if it changes.

So we're already at a point where the option name doesn't explain the entire picture, and I think that with this in mind and the small discrepancies that --ignore-inode (it's just when scanning for file changes that we ignore inode changes) and --ignore-ctime (it's just when scanning that we ignore ctime changes) has in them (which can be fixed in their description), it might be better to use --ignore-ctime.

Can you verify that if ctime changed but mtime did not, and one used e.g. --ignore-ctime, then restic considers the file unchanged (i.e. restic does what the option name suggests)?

@greatroar
Copy link
Contributor Author

Yes, I meant with-atime. Brain fart, never mind :)

Can you verify that if ctime changed but mtime did not, and one used e.g. --ignore-ctime, then restic considers the file unchanged (i.e. restic does what the option name suggests)?

$ touch -m -d '2020-10-16 15:51:38 +0200' ../../docker/Dockerfile # set mtime

$ stat ../../docker/Dockerfile
  File: ../../docker/Dockerfile
  Size: 142       	Blocks: 24         IO Block: 4096   regular file
Device: 37h/55d	Inode: 24384881    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/      me)   Gid: ( 1000/      me)
Access: 2020-10-25 20:50:04.169688526 +0100
Modify: 2020-10-16 15:51:38.000000000 +0200
Change: 2020-10-25 20:51:10.485829582 +0100
 Birth: -

$ ./restic backup ../../docker
repository cb9cd2c1 opened successfully, password is correct
created new cache in /home/me/.cache/restic

Files:           3 new,     0 changed,     0 unmodified
Dirs:            1 new,     0 changed,     0 unmodified
Added to the repo: 2.302 KiB

processed 3 files, 910 B in 0:00
snapshot 06e70058 saved

$ chmod 664 ../../docker/Dockerfile # change ctime

$ touch -m -d '2020-10-16 15:51:38 +0200' ../../docker/Dockerfile # reset mtime

$ stat ../../docker/Dockerfile
  File: ../../docker/Dockerfile
  Size: 142       	Blocks: 24         IO Block: 4096   regular file
Device: 37h/55d	Inode: 24384881    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/      me)   Gid: ( 1000/      me)
Access: 2020-10-25 20:51:41.461902319 +0100
Modify: 2020-10-16 15:51:38.000000000 +0200
Change: 2020-10-25 20:51:55.709937134 +0100
 Birth: -

$ ./restic backup --trust-mtime ../../docker
repository cb9cd2c1 opened successfully, password is correct

Files:           0 new,     0 changed,     3 unmodified
Dirs:            0 new,     1 changed,     0 unmodified
Added to the repo: 1.413 KiB

processed 3 files, 910 B in 0:00
snapshot 5ed2f334 saved

@rawtaz
Copy link
Contributor

rawtaz commented Oct 25, 2020

@greatroar Thanks a lot for verifying and in particular for showing how you did it so people (including me, I never used touch for that before) can learn!

So, the --ignore-ctime looking like --with-atime concern you had I guess is not a concern considering how differently those two options would be named. Also they do pretty different things.

Since what --ignore-ctime would actually do in practice is pretty much what it sounds like, I'm voting for using that as the option name. If we clarify in the description (and docs, where relevant) that it's just for the scanning, it should be fine I think. At least no worse than any misconception --ignore-inode already has in it, but more or less the same. What isn't clear as day is that in practice it means "out of mtime and ctime, fall back to whatever mtime is saying". But that's practically the same thing as "ignore ctime" :)

@greatroar greatroar changed the title Add --trust-mtime flag to backup and document change detection Add --ignore-ctime flag to backup and document change detection Oct 28, 2020
@greatroar
Copy link
Contributor Author

Changed to --ignore-ctime and added some more discussion of device numbers inspired by #3041.

@rawtaz
Copy link
Contributor

rawtaz commented Oct 28, 2020

I'd like to flip the changelog and documentation so that instead of talking about what restic does to determine if things didn't change, it talks about what it does to determine if things did change, because that is what the average user thinks of. They think in terms of restic scanning things to determine what changed and not. It'd be easier for a person reading the text to understand it.

@greatroar
Copy link
Contributor Author

@rawtaz Please provide some example text. My proposed text starts off by saying everything gets scanned, then lists the conditions where scanning can be skipped. I don't see how to do it the other way around without getting very verbose.

@MichaelEischer
Copy link
Member

@rawtaz Did you have something like

If either ctime or mtime differs from the previously backed up version, then it will be backed up again.

in mind for the changelog?

And for the documentation e.g. the following?

On **Unix** (including Linux and Mac), given that a file at a given location
also exists in a previous backup, if any of the following file metadata
attributes differs then the file contents are presumed to have changed:

@fd0
Copy link
Member

fd0 commented Dec 18, 2020

Thanks a lot for all your work on this! For the record, in the forum we've had a case of restic rescanning all data apparently because adding a hard link to a file changes the ctime: https://forum.restic.net/t/randomly-needs-to-rescan-all-data/3366/34

@rawtaz rawtaz merged commit de7e3a0 into restic:master Jan 28, 2021
@rawtaz
Copy link
Contributor

rawtaz commented Jan 28, 2021

Thank you so much greatroar, and sincere apologies for the delay over cosmetics (although it did feel a bit important).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants