Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base64 encode the file names in the 'file' physical backend #2203

Merged
merged 7 commits into from
Jan 19, 2017

Conversation

vishalnayak
Copy link
Contributor

@vishalnayak vishalnayak commented Dec 22, 2016

Fixes https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/vault-tool/D2GStAl1f1k/AoURH_uUEAAJ in a generic manner.

This will deprecate #2198.

This PR will base64 URL encodes the file name part for all the paths stored at the file backend and decodes when appropriate with backwards compatibility. This will ensure that file names are compatible with Windows platform.

Note: However, if Windows disallowed characters are present as part of the basePath, then it will not work. If we choose to encode the entire path, the directory hierarchy will flatten, which may not be desired.

Tested on Windows manually to verify the reported bug in PKI.

@vishalnayak
Copy link
Contributor Author

I think in case of Put, we should upgrade the old entries to new one with its file name encoded. This is to ensure that updates to existing storage entries works properly. I'm on it now.

@vishalnayak vishalnayak changed the title [WIP] Base64 encode the file names in the 'file' physical backend Base64 encode the file names in the 'file' physical backend Jan 4, 2017
@vishalnayak vishalnayak requested a review from jefferai January 4, 2017 04:22
@vishalnayak vishalnayak self-assigned this Jan 4, 2017
@vishalnayak vishalnayak added this to the 0.6.5 milestone Jan 4, 2017
@vishalnayak
Copy link
Contributor Author

I see a race-y situation in Put function, cycles after old entry deletion and before the new entry creation. But since the critical sections will touch different storage entries, I am not sure if we need to handle all tasks under a single lock acquisition.


// For backwards compatibility, try to delete the file without base64 URL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think probably this should actually be the other way around; we should default to doing the faster thing for new installations and fall back if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done here and for other places as well.

basePath, fileName := b.path(k)
fullPath := filepath.Join(basePath, "_"+fileName)

// If non-encoded file name is a valid storage entry, read it out.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment (and in the other functions)

// Otherwise, encode the file name and look for the storage entry.
f, err := os.Open(fullPath)
if err != nil && os.IsNotExist(err) {
fullPath = filepath.Join(basePath, "_"+base64.URLEncoding.EncodeToString([]byte(fileName)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than do the munging with base64-encoding and _ everywhere, I'd make the path function return three values instead of two: the two it originally returned, and a third value with _ plus the base64 path. This removes a lot of boilerplate elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Done.

// New storage entries will have their file names base64 URL encoded. If a
// file with a non-encoded file name exists, it indicates that this is an
// update operation. To avoid duplication of storage entries, delete the
// old entry first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the old entry shouldn't be deleted first; if there is an error writing the new entry then you've just had data loss.

If we prefer the base64'd version, having an old version lying around isn't really a problem unless you delete it and the old version isn't deleted.

My suggestion is that this deletion should actually be in a deferred function that is invoked iff the stat against a previous non-base64'd value is successful. This would also have to take place with the lock, so it would need to be after the lock acquisition (and also, defers are LIFO so the defer statement itself must be after the deferred unlock).

In Delete, you could help avoid any issues by also stating the non-base64 version, and deleting whichever version(s) are found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for most of it. I am not sure about the advantages of stating both versions and deleting it. What could go wrong if we try to delete the encoded one and if that fails, try deleting the old one? Is it that we'll not be able to detect the failure of deletion properly?

@vishalnayak vishalnayak merged commit a9d6ac5 into master Jan 19, 2017
@vishalnayak vishalnayak deleted the file-backend-base64 branch January 19, 2017 15:11
@jefferai jefferai restored the file-backend-base64 branch January 25, 2017 17:27
@jefferai
Copy link
Member

I restored the branch so it can be used as the base for a new set of changes as we needed to roll this back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants