Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backuptar: Fix sparse file handling #221

Merged
merged 1 commit into from
Oct 4, 2021
Merged

Conversation

kevpar
Copy link
Member

@kevpar kevpar commented Sep 23, 2021

A recent OS change altered how sparse files are represented in backup
streams. This caused backuptar to no longer work with certain files. The
specific behavior that changed is as follows:

  • Empty sparse files (size = 0), previously did not have any data or
    sparse block streams in the backup stream. Now, they will have a
    data stream with size = 0, and no sparse block streams.
  • Sparse files with a single allocated range (e.g. a normal file that
    has the sparse attribute set) previously would not show as sparse in
    the backup stream. Now, they will show as sparse.

The old backuptar behavior assumed that if the sparse flag was set on
the data stream, then there would always be a set of sparse blocks
following. These changes break this assumption, and so require special
handling.

It is unsupported to have a data stream, marked sparse, that contains
file content AND a series of sparse block streams following. As far as
I can tell this is not a valid case for backup streams.

This change also cleans up some code and error messages, and expands on
the test coverage for backuptar.

For more information on backup stream format see: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-bkup/f67950c8-d583-469a-83dd-c4ff4cedf533

Signed-off-by: Kevin Parsons [email protected]

@kevpar kevpar marked this pull request as ready for review September 23, 2021 18:12
@kevpar kevpar requested a review from a team as a code owner September 23, 2021 18:12
A recent OS change altered how sparse files are represented in backup
streams. This caused backuptar to no longer work with certain files. The
specific behavior that changed is as follows:
- Empty sparse files (size = 0), previously did not have any data or
  sparse block streams in the backup stream. Now, they will have a
  data stream with size = 0, and no sparse block streams.
- Sparse files with a single allocated range (e.g. a normal file that
  has the sparse attribute set) previously would not show as sparse in
  the backup stream. Now, they will show as sparse.

The old backuptar behavior assumed that if the sparse flag was set on
the data stream, then there would always be a set of sparse blocks
following. These changes break this assumption, and so require special
handling.

It is unsupported to have a data stream, marked sparse, that contains
file content AND a series of sparse block streams following. As far as
I can tell this is not a valid case for backup streams.

This change also cleans up some code and error messages, and expands on
the test coverage for backuptar.

For more information on backup stream format see: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-bkup/f67950c8-d583-469a-83dd-c4ff4cedf533

Signed-off-by: Kevin Parsons <[email protected]>
Copy link
Collaborator

@katiewasnothere katiewasnothere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to this code and the backup protocol, but from my understanding, this LGTM

Copy link
Contributor

@msscotb msscotb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

if dataHdr != nil {
// A data stream was found. Copy the data.
if (dataHdr.Attributes & winio.StreamSparseAttributes) == 0 {
// We assume that we will either have a data stream size > 0 XOR have sparse block streams.
if dataHdr.Size > 0 || (dataHdr.Attributes&winio.StreamSparseAttributes) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check spacing with &

Copy link
Member Author

@kevpar kevpar Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gofmt enforces this style. I don't understand the exact rule it's using to format, but if I take away the dataHdr.Size > 0 || then it puts a space in around the &, otherwise not.

@kevpar
Copy link
Member Author

kevpar commented Oct 4, 2021

I'm new to this code and the backup protocol, but from my understanding, this LGTM

@katiewasnothere is there something that should be expanded upon that would be helpful here? I had never touched this code either, and had to learn what it was doing. I tried to clarify the reasoning for the new behavior in my comments.

@katiewasnothere
Copy link
Collaborator

@katiewasnothere is there something that should be expanded upon that would be helpful here? I had never touched this code either, and had to learn what it was doing. I tried to clarify the reasoning for the new behavior in my comments.

No, it made sense! I wrote that more to say that while it looks good to me, since I'm not super familiar with the code, I could potentially be missing small details/issues.

@kevpar kevpar merged commit 7ec9238 into microsoft:master Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants