add stat-based include/exclude #4102

Open
ThomasWaldmann opened this issue Oct 7, 2018 · 15 comments
Labels
cmd: create enhancement filesystem patterns pattern matching, include, exclude, ...

Comments

@ThomasWaldmann
Member

there are tickets about doing file size based exclusion: #902, jborg/attic#330

file size is a stat result attribute, so this is a special case of a stat-based rule.

also in stat result:

  • timestamps: atime, ctime, mtime
  • type and mode
  • uid / gid

so we could add a mechanism to define inclusion / exclusion rules not only based on the file's path/name (as we already have), but also based on comparing stat attributes to given values.
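
A minimal sketch of what such a stat-based check could look like, in Python since that is what borg is written in. The function name `stat_matches` and its parameters are made up for illustration and are not part of borg:

```python
import os


def stat_matches(path, max_size=None, uid=None):
    """Return True if the file at `path` satisfies every given stat constraint.

    Hypothetical helper: `max_size` is in bytes, `uid` is a numeric user id.
    A constraint left as None is not checked.
    """
    # Do not follow symlinks, so we look at the link itself (like lstat).
    st = os.stat(path, follow_symlinks=False)
    if max_size is not None and st.st_size > max_size:
        return False
    if uid is not None and st.st_uid != uid:
        return False
    return True
```

The same stat result also carries the timestamps and the mode, so further constraints could be added in the same way.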

@ThomasWaldmann
Member Author

If somebody is finding this searching for a solution/workaround for borg 1.0/1.1:

  • you can exclude "known-big" files by a name-based pattern, like *.iso (or their directory, like .../Virtualbox VMs/*).
  • you can use find unix tool to create a list for --exclude-from borg option

Temporarily excluding big files is especially useful for the initial backup(s), which might take a while.
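
For the second workaround, the list can also be generated without find. This is only a sketch: the helper names and the list file name are assumptions, and since borg interprets the lines of an exclude file as patterns, paths containing glob characters may need extra care:

```python
import os
import stat


def find_big_files(root, min_size):
    """Yield paths of regular files under `root` that are at least `min_size` bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable, skip it
            if stat.S_ISREG(st.st_mode) and st.st_size >= min_size:
                yield path


def write_exclude_list(root, min_size, list_path):
    """Write one path per line, suitable as input for --exclude-from."""
    with open(list_path, "w") as f:
        for path in sorted(find_big_files(root, min_size)):
            f.write(path + "\n")
```

The resulting file would then be passed to borg create via --exclude-from.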

@ThomasWaldmann
Member Author

Note: the first implementation could just limit the scope to size-based include/exclude (but when writing the code, do it in a way that e.g. timestamp-based can be easily done also).

@n-st
Contributor

n-st commented Oct 7, 2018

> you can use find unix tool to create a list for --exclude-from borg option

Beware of race conditions, though (i.e. large files appearing after you've generated the list).

@russelldavis
Contributor

I have a proposal for how this could be implemented. Rather than a global CLI flag like exclude-by-size, it could be added as a special borg-patterns prefix that is applied at the individual pattern level. This would make it quite flexible -- you could apply the rule to certain files/directories only, use it with include patterns, etc.

There's already logic in place to handle prefixes (for R and P) so adding another one should be simple and backwards compatible. I propose calling it F for "filter". The prefix would be followed by a filter-type specifier, any arguments needed for the filter, and finally the pattern to apply the filter to.

So, to exclude files over 100M from Downloads folders, you would write:

F size > 100M -/Users/*/Downloads

To exclude files over 1G everywhere, you could add this to the command line:

borg ... --pattern='F size > 1G -**'

For other stat filters, just replace size with mtime, mode, etc.
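
To make the proposed syntax concrete, here is a rough sketch of how such a line could be parsed. None of this exists in borg: the `F` prefix is only the proposal above, the names `FilterRule` and `parse_filter_rule` are invented, and treating M and G as decimal units is an assumption:

```python
import re
from typing import NamedTuple, Union

# Assumption: decimal units (1M = 10**6 bytes); the proposal does not say.
UNITS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

FILTER_RE = re.compile(
    r"^F\s+(?P<attr>\w+)\s*(?P<op>[<>]=?|==|!=)\s*(?P<value>\S+)"
    r"\s+(?P<action>[+\-!])\s*(?P<pattern>.+)$"
)


class FilterRule(NamedTuple):
    attr: str                # e.g. "size", "mtime", "user"
    op: str                  # one of <, <=, >, >=, ==, !=
    value: Union[int, str]   # bytes for "size", raw text otherwise
    action: str              # "+" include, "-" exclude, "!" exclude without recursing
    pattern: str             # the path pattern the filter applies to


def parse_size(text):
    """Convert a string like '100M' into a number of bytes."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", text)
    if m is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def parse_filter_rule(line):
    """Parse one hypothetical 'F <attr> <op> <value> <action><pattern>' line."""
    m = FILTER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a filter rule: {line!r}")
    value = m.group("value")
    if m.group("attr") == "size":
        value = parse_size(value)
    return FilterRule(m.group("attr"), m.group("op"), value,
                      m.group("action"), m.group("pattern"))
```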

I have more thoughts, including how to combine multiple filters together, but wanted to put this out there first. What do you think of the proposal? (If it's well received I may take a stab at implementing it.)

@ThomasWaldmann
Member Author

In the end, guess this will need boolean expressions.

Operators: "and", "or" and "not".

And the terms in these expressions would be stuff like:

  • size < 100M
  • mtime < 1d
  • user == joe

See man find about what people want to potentially find (not sure all make sense for backups) and how find does it.
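
A sketch of how such terms and operators could compose, using plain functions over a stat result. All names here are invented for illustration. "mtime < 1d" is read as "modified less than one day ago", and the user term compares numeric uids rather than names to keep the example portable:

```python
import os
import time
from typing import Callable

Predicate = Callable[[os.stat_result], bool]


def size_lt(limit: int) -> Predicate:
    """Term: file size is below `limit` bytes."""
    return lambda st: st.st_size < limit


def mtime_within(seconds: float) -> Predicate:
    """Term: file was modified less than `seconds` ago."""
    return lambda st: (time.time() - st.st_mtime) < seconds


def uid_is(uid: int) -> Predicate:
    """Term: file is owned by the numeric user id `uid`."""
    return lambda st: st.st_uid == uid


def all_of(*preds: Predicate) -> Predicate:
    """Operator 'and': every term must hold."""
    return lambda st: all(p(st) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Operator 'or': at least one term must hold."""
    return lambda st: any(p(st) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Operator 'not'."""
    return lambda st: not pred(st)
```

With this, "size < 100M and mtime < 1d" would be `all_of(size_lt(100 * 10**6), mtime_within(86400))`, applied to the result of `os.lstat(path)`.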

As a borg backup archive is usually expected to be a full archive containing all the files in the input data set, guess the first step is to look at what makes sense.

One obvious thing is being in a hurry and wanting to make a quick first backup, ignoring huge files (like having important little documents and less important *.iso).

Other use cases?

@russelldavis
Contributor

I think for an initial version of this, we could keep it really simple and not worry about boolean operators. I imagine use cases for them would be relatively rare. By the nature of how patterns combine, "or" can already be achieved by just writing two separate rules.

And multiple negated include rules (+) can be used to achieve a rough version of "and". For example, to exclude user == joe && size > 1M, you could write:

F user != joe + **
F size <= 1M + **
- **

(It's not quite the same as an actual "and" operator when other rules are involved, since borg stops processing rules once a single match is made, but it's probably Good Enough for now.)
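
The first-match behaviour can be illustrated with a small model of the three rules. This is a toy, not borg's matcher: `FileInfo`, `RULES` and `is_included` are invented names, the `**` pattern is modelled as "matches everything", 1M is taken as 10**6 bytes, and unmatched files are assumed to be included:

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    user: str
    size: int  # bytes


# Each rule is (filter predicate, action); the path pattern "**" matches all
# paths, so it is left out of this model.
RULES = [
    (lambda f: f.user != "joe", "+"),      # F user != joe + **
    (lambda f: f.size <= 1_000_000, "+"),  # F size <= 1M + **
    (lambda f: True, "-"),                 # - **
]


def is_included(f, rules=RULES):
    """Apply rules in order; the first rule whose filter matches decides."""
    for predicate, action in rules:
        if predicate(f):
            return action == "+"
    return True  # assumption: files matching no rule are included
```

Only a file that is owned by joe and larger than 1M falls through both include rules and reaches the final exclude.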

@russelldavis
Contributor

> Other use cases?

The main ones that come to mind are:

  • Excluding really large files in general. Protect against accidentally adding a multi-GB VM image, for example, when you know the files you actually care about backing up will be much smaller.
  • My downloads directory tends to accumulate a few random things that would be nice to back up, but I want to exclude large files.

@russelldavis
Contributor

russelldavis commented Jun 26, 2019

Another use case that occurred to me: filtering output from borg list. You may want to check a particular archive (or iterate over all archives) and find files matching certain criteria. Examples:

  • Looking for files modified on a particular day
  • Searching all past archives for files over a certain size, to see what's taking up space in the repo

Although I guess this use case can already be accomplished by using borg mount and find.
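
Another route that avoids mounting might be to post-process the output of borg list --json-lines. The sketch below only handles the filtering. It assumes each line is a JSON object with "path", "size" and "mtime" keys and that "mtime" is an ISO-formatted string; check that against the actual output of your borg version:

```python
import json


def filter_items(json_lines, min_size=0, mtime_prefix=None):
    """Yield archive items (dicts) that match the given criteria.

    `json_lines` is any iterable of strings, one JSON object per line.
    `mtime_prefix` such as "2019-06-26" selects items modified on that day,
    assuming ISO-formatted timestamps.
    """
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        if item.get("size", 0) < min_size:
            continue
        if mtime_prefix is not None and not str(item.get("mtime", "")).startswith(mtime_prefix):
            continue
        yield item
```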

@ThomasWaldmann
Member Author

Yeah. Also this is a bit different to implement (one has to look at archived metadata vs. at stat() metadata from fs).

@ThomasWaldmann
Member Author

it is now (master branch, later borg 1.2) possible to feed find output (paths) into borg instead of using borg's builtin recursion.

so you can do all matching/selecting that is possible via find.

@ThomasWaldmann ThomasWaldmann removed this from the hydrogen milestone Jan 3, 2021
@setaur

setaur commented Jan 15, 2022

> it is now (master branch, later borg 1.2) possible to feed find output (paths) into borg instead of using borg's builtin recursion.
> so you can do all matching/selecting that is possible via find.

Could you please give more details on this, or a link to this feature? I searched the changelog for the keyword "find" without success.

@russelldavis
Contributor

He's referring to the unix find command.

@setaur

setaur commented Jan 15, 2022

> He's referring to the unix find command.

Of course, I know that. But how can I use it to filter the files and directories to back up? The only solution I can think of is to write the output of my find command to a file, prefix each line with a pattern selector (see borg help patterns), preferably pf:, and load that file as a pattern file using --patterns-from or --exclude-from.
Will there be a more elegant solution?

@ThomasWaldmann
Member Author

borg create --paths-from-stdin
or
borg create --paths-from-command
See there: https://borgbackup.readthedocs.io/en/1.2.0b3/usage/create.html
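
If find is not flexible enough, the path list can come from any program. Below is a sketch of a selector that keeps files modified within the last few days and writes them one per line, which is the shape --paths-from-stdin reads by default. The function names are invented, and the mtime criterion is just an example:

```python
import os
import stat
import sys
import time


def recently_modified(root, days):
    """Yield regular files under `root` modified within the last `days` days."""
    cutoff = time.time() - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable, skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime >= cutoff:
                yield path


def emit(paths, out=None, delimiter="\n"):
    """Write each path followed by `delimiter` to `out` (stdout by default)."""
    out = sys.stdout if out is None else out
    for path in paths:
        out.write(path + delimiter)
```

The output of a small script calling `emit(recently_modified(...))` would be piped into borg create --paths-from-stdin. Note that the paths are emitted exactly as selected, so the selector has to list every file it wants in the archive.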

@ThomasWaldmann
Member Author

related: #4972

@ThomasWaldmann ThomasWaldmann added filesystem patterns pattern matching, include, exclude, ... labels Oct 4, 2024

4 participants