Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace multiple contiguous forward-slash characters with a single one #1396

Closed
wants to merge 2 commits into from

Conversation

staticfloat
Copy link
Member

So that naive functions such as split_path don't end up with empty substrings in their pathnames

I'm not sure which other functions this should be a part of, this was to address #1387

…e, so that naive functions such as split_path don't end up with empty substrings in their pathnames
@johnmyleswhite
Copy link
Member

Looks good to me.

@nolta
Copy link
Member

nolta commented Oct 16, 2012

Even after applying this patch,

julia> split_path("a///c")
4-element String Array:
 "a"
 "" 
 "" 
 "c"

so it doesn't seem to fix the real problem. Anyway, paths w/ multiple slashes are valid.

@staticfloat
Copy link
Member Author

Right, this was only to deal with file_path...... I suppose we should just have a generic clean_filepath function that all filepath-dealing functions should call out to to sanitize their filepaths first? How does that sound?

@staticfloat
Copy link
Member Author

Actually, on second thought, we could just use real_path for this very purpose. This dereferences symlinks, culls out /../ terms, etc..... Not sure how efficient it is, but I'll look into places that it makes sense to wrap things in it for sanity reasons

@staticfloat
Copy link
Member Author

Except..... real_path doesn't work for non-existent files.

@staticfloat
Copy link
Member Author

Found abs_path, which does a better job than a quick regex, and integrated it into split_path and fullfile as well. Didn't feel comfortable putting it in any other functions, but do any of you see places this kind of change would be beneficial?

@timholy
Copy link
Member

timholy commented Oct 16, 2012

Another way to go would be to create a print_joined_nodup function (based on print_joined in string.jl) that examines whether the previous string already has the delimiter as its last character. That wouldn't handle arbitrarily long sequences of '/', but would solve the problem that spawned this initiative.

@timholy
Copy link
Member

timholy commented Oct 16, 2012

I should have added: the advantage of this route is that rather than building something, and then checking it for problems and recreating if necessary, you just get it right the first time.

@staticfloat
Copy link
Member Author

I think abs_path is a better solution in this case. If I were designing this from scratch, I would define a special FilePath type whose constructor basically wraps string with abs_path or some similar function guarding the gate to the outside world. Relative paths should be disambiguated to full paths as soon as possible, and keeping things like "/../" or "//" inside paths serves no purpose and will cause naive functions to stumble.

@JeffBezanson
Copy link
Member

split can also be told not to include empty tokens.
On Oct 16, 2012 3:16 PM, "Elliot Saba" [email protected] wrote:

I think abs_path is a better solution in this case. If I were designing
this from scratch, I would define a special FilePath type whose constructor
basically wraps string with abs_path or some similar function guarding
the gate to the outside world. Relative paths should be disambiguated to
full paths as soon as possible, and keeping things like "/../" or "//"
inside paths serves no purpose and will cause naive functions to stumble.


Reply to this email directly or view it on GitHubhttps://github.com//pull/1396#issuecomment-9509127.

@StefanKarpinski
Copy link
Member

You have to be careful with that and leading slashes though.

@staticfloat
Copy link
Member Author

@timholy: We can't rely on it being right the first time, because users can
enter their own strings at any time, and double forward slashes, /../
portions, etc are all valid paths, but will cause algorithms to freak out.
If we want to go the "sanitize at construction and KNOW that the path is
clean thereafter" route, I suggest creating a FilePath type whose
constructor takes in a string and cleans it up. Any file_ function would
then take a FilePath and know that it is a clean path.

If we want to take in arbitrary strings, though, we need to always treat
them suspiciously.
On Oct 16, 2012 5:40 PM, "Stefan Karpinski" [email protected]
wrote:

You have to be careful with that and leading slashes though.


Reply to this email directly or view it on GitHubhttps://github.com//pull/1396#issuecomment-9512456.

@timholy
Copy link
Member

timholy commented Oct 17, 2012

@staticfloat , good points, carry on...

@staticfloat
Copy link
Member Author

@StefanKarpinski, @JeffBezanson: What do you think about creating a FilePath type as described in my comment two places up and using that in pretty much everything in file.jl?

@JeffBezanson
Copy link
Member

I don't like it that much, since it strikes me as basically replacing filefunc("path") with filefunc(FilePath("path")), which could equivalently be filefunc(path_normalize("path")). (Of course the extra calls would generally be inside definitions rather than written by the end user).
I suspect programmers will usually want to work with paths as strings, and so will always be passing strings in anyway.

@staticfloat
Copy link
Member Author

I agree with your points, but I see a few pros that may still outweigh the cons:

  • If dealing with FilePath objects, the path normalization happens automagically, as we would have a convert rule defined for going from String -> FilePath
  • The path normalization would only have to happen once per path. Clearly if the user uses a bunch of strings on their end, we'll have to clean up the string every time they pass it in, but functions that deal with FilePath can pass those FilePath objects off to other functions without having to clean them up. It's essentially a "Clean" marker
  • I'm not sure if this is viable, but might it be possible to subclass String and thus be able to return a FilePath to the user, but have the user still use it like a String? (This would make the process all but transparent to the user, and make our "is this string clean?" check simply a typeof check)

@pao
Copy link
Member

pao commented Oct 18, 2012

The "safety" provided by a new type is still a consenting-adults thing until #13. (Thus adding this to the list of things to maybe look at again when it's resolved.)

@staticfloat
Copy link
Member Author

@pao: Are you referring to the lack of an immutable keyword? I suppose people could try to change the FilePath, but any modifications they make would have to go through the FilePath methods, so I don't see why this would be a problem safety-wise. For instance, you could imagine the following:

julia>  FilePath a = "/tmp/../tmp"
"/tmp/"

julia>  a = a * "mydir/" * "myfile"

"/tmp/mydir/myfile"

Where every operation performed inside FilePath is passed through path_normalize or whatever, so that functions that deal with it don't have to do so. I'm not sure how immutability would help this.

@JeffBezanson
Copy link
Member

This also might be a premature optimization. It may not matter to repeatedly normalize paths.

@staticfloat
Copy link
Member Author

Fair enough. Let's just roll with what we have, then, and leave this for a rainy day when issues like this actually start to pop up.

@pao
Copy link
Member

pao commented Oct 19, 2012

My point was that you could directly access the field after the path was constructed,which subverts the work done by the constructor. Doesn't matter in light of current decisions, though.

@nolta
Copy link
Member

nolta commented Oct 19, 2012

The split_path bug was fixed in 5b5824d, so i'm going to close this.

@nolta nolta closed this Oct 19, 2012
@staticfloat staticfloat deleted the filepathfix branch October 18, 2016 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants