[WIP] Using types and not strings to represent Paths #26
It has a slightly different emphasis, but I'm surprised not to see a reference to FileIO.jl here. I imagine your |
I was actually thinking of FileIO.jl when I wrote this; I haven't yet written anything about it into the Julep. To get information out of a file (or file-like object), you need two things
I can see how this could fit into FileIO.jl, but it would be a widening of scope from FileIO.jl's current mission: I see these as two separate but supporting tasks. |
I definitely see the case for adding path-like objects, but I'm not as certain about deprecating |
@mbauman maybe I am wrong about the need to deprecate Maybe my instincts are wrong because I am not used to having multiple dispatch. WRT encouraging the use of
|
The place where this would be a most valuable addition at this time is working towards a "virtual filesystem" abstraction which could then be used for code loading on remote nodes. |
Apparently Racket does this and people have told me that it's a big win. Maybe check out what they do? |
I'm very positive about this julep. @mbauman - can you describe in more detail the practical reasons for keeping As a point of comparison, I've always found the roughly equivalent thing in C++ (
To expand on point 1, deprecating On the subject of literals, a thorny question: Should |
I think Rust does this as well. They have something like a |
It's a large deprecation, That's not to say deprecating |
Yes, I'm not so convinced about having |
I see I was unclear here.

```julia
function foo_process(content::AbstractString)
    ...
end
# can coexist with:
foo_process(io::IO) = foo_process(readall(io))
foo_process(filename::AbstractPath) = open(foo_process, filename) # using open(::Function, ::AbstractPath)
```

But to accomplish this I suggest that I feel like adding the deprecation to |
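For readers on current Julia, here is a minimal runnable sketch of the same three-method pattern. It rests on assumptions: `AbstractPath` and `LocalPath` are hypothetical types defined here (they are not in Base), `count_words` stands in for `foo_process`, and `read(io, String)` replaces `readall`, which has since been removed from Base.

```julia
# Hypothetical minimal path types for illustration; not part of Base.
abstract type AbstractPath end

struct LocalPath <: AbstractPath
    str::String
end

# Assumption: opening a LocalPath simply delegates to the string-based open.
Base.open(f::Function, p::LocalPath, args...) = open(f, p.str, args...)

# One generic function; the argument type says whether we hold
# the content itself, a stream, or a location.
count_words(content::AbstractString) = length(split(content))
count_words(io::IO) = count_words(read(io, String))
count_words(p::AbstractPath) = open(count_words, p)

# Usage: write a scratch file, then call through the path method.
path = tempname()
write(path, "three short words")
println(count_words(LocalPath(path)))
rm(path)
```

Because the path is a distinct type, `count_words("notes.txt")` counts the words in the literal string rather than silently opening a file, which is the ambiguity the comment is pointing at.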
Sounds perfectly sensible. What do you think about the return type of What about absolute paths, and how does this play nicely across systems where users will want to use the native path format in the REPL? Some systems adopt a platform-independent standard for writing path literals (eg, cmake, I think). The only libraries I recall parse strings as native paths, which is operating system dependent. Side note - this stuff should work really nicely as a hint for tab completion. |
I think we need to identify what the concrete advantages of path types are and then determine what we need to do to get those advantages. I'm not convinced that disallowing strings as path arguments is necessary to get the advantages. But then again, we're still a bit hazy on what the advantages are precisely, so clarifying that needs to be the next step. |
It would be useful to link to what other languages with similar ideas have done (and, ideally, the reasons why they made those choices). For example, the rust RFC is here and points out many interesting issues (e.g. the possibility of unpaired UTF-16 surrogates in Windows paths). |
I've been using FilePaths.jl for a while now and here are some notes from my experience. Advantages:
Disadvantages: The main issue I ran into was that interop with other packages can be a bit annoying. I was always writing Overall, I think having a minimal file system path type hierarchy in base with appropriate string conversions would be a good step forward. |
I'm broadly in favour, but would prefer a non-single letter macro (perhaps |
Hmmm, I was mostly wanting to mimic |
So far the only advantage cited here is that "being able to dispatch on a path type is really nice". The fact that |
FWIW, my view was just that we have I can't think of any "real" advantages apart from having a type that is distinct from a general string for representing filesystem paths feels a bit more ergonomic and has helped me avoid a few bugs... but that also summarizes why I'm not writing my code in C :) |
Yes, it's capturing the semantic that paths are a "different kind of thing" which makes this interesting. Being able to use dispatch effectively is the most obvious sign that this might be worthwhile. Here are some minor advantages related to literals:
But these advantages are a bit of a sideshow, I think. Perhaps some concrete use cases might be helpful. Here's a contribution from me (apologies that it's not fully concrete, it reflects work I've done, but more in C++ than julia). Say I want to write code which passes around either S3 URLs or file paths pointing to some point cloud data. I don't want to |
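One possible reading of the S3-versus-local use case, sketched under assumptions: `LocalPath`, `S3Path`, `parse_location`, and `describe` are all hypothetical names invented for this example, and no real S3 client is involved. The idea is to decide what kind of location a string denotes once, at the boundary, and let dispatch handle the rest.

```julia
# Hypothetical location types for illustration; not from Base or any package.
abstract type AbstractPath end

struct LocalPath <: AbstractPath
    str::String
end

struct S3Path <: AbstractPath
    bucket::String
    key::String
end

# Convert a user-supplied string into a typed location exactly once.
function parse_location(s::AbstractString)
    m = match(r"^s3://([^/]+)/(.*)$", s)
    m === nothing && return LocalPath(String(s))
    return S3Path(String(m.captures[1]), String(m.captures[2]))
end

# Downstream code dispatches on the type instead of re-inspecting strings.
describe(p::LocalPath) = "local file $(p.str)"
describe(p::S3Path) = "object $(p.key) in bucket $(p.bucket)"

println(describe(parse_location("s3://my-bucket/clouds/a.las")))
println(describe(parse_location("data/a.las")))
```

A real loader would replace `describe` with methods that actually fetch the data; the point here is only that callers never need an `if startswith(s, "s3://")` branch after the initial parse.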
Other arguments:
could be defined, and then have recursive include calls work by overloading |
The distinction between Other advantages that I might hope for with path types:
|
Could this make |
I don't think that's necessarily connected to using types. You can do that already:

```julia
using Glob
readdir(glob"somedir/*.jpg", expanduser("~") #= aka homedir() =#)
```
Good point. Though manually having to type [edit: TBH I've tried using the literal "~/blah" out of habit before, been unsurprised that it doesn't work, and looked no further. Path literals potentially give us the opportunity to make this "just work" for users.] |
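As a rough illustration of how a path literal could make `"~/blah"` "just work", the sketch below defines a string macro that applies Base's `expanduser` when the literal is evaluated. The `LocalPath` type and the `p"..."` macro are assumptions made up for this example; they are not an existing Base API.

```julia
# Hypothetical wrapper type for illustration; not part of Base.
struct LocalPath
    str::String
end

# Hypothetical p"..." literal: the macro receives the literal's text and
# emits a call that expands a leading tilde at run time.
macro p_str(s)
    :(LocalPath(expanduser($s)))
end

home_relative = p"~/blah"
plain = p"data/x.csv"
println(home_relative.str)
println(plain.str)
```

Expanding at run time rather than at macro-expansion time means the result reflects the home directory of whoever runs the code, not of whoever precompiled it.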
Check the issue linked above. It makes a similar proposal. Adding to what was said, beyond path literals, I propose having Additionally, pathlib has some extra functionality, such as iterating over the path |
FilePathsBase.jl already provides that functionality, though I don’t think the optional division operator overloading would make it into base. |
@StefanKarpinski mentioned |
I agree, which is why I used it in FilePathsBase. I think the issue is just that we don't want to have an operator like Discussion about different operators here rofinn/FilePathsBase.jl#2 |
It's just logical: |
What we are actually doing when we write tl;dr; |
@rofinn I don't agree about First, yes, it is a Unix convention, but nowadays people more often than not use Unix-like systems on their personal computers (macOS; even under Windows you can use a Bash shell, or even Ubuntu as "software") or when working remotely (computational servers, cloud computing, Docker, etc.). URLs also use this convention, so everyone seems to be familiar with it. Second, currently So |
Again, that's largely why I opted to use |
Being fastidious about not punning on operators is a pretty core Julian principle. It's fine if people do it in their own code, but mixing up "divide" and "concatenate this path with this other path" in one generic function is not really cool. |
Some people do, but the choice is made now (there is even a FAQ about it), so simply using it for paths would be somewhat consistent, as pretty much all of the arguments apply in a similar way to strings. |
The problem with reusing p"foo" / "bar" / "baz" * ".txt" == p"foo/bar/baz.txt" |
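To make the expression in the comment above concrete, here is a small sketch in which `/` appends a segment and `*` appends text to the last segment. `SimplePath` and the `p"..."` macro are hypothetical and written only for this illustration; the sketch demonstrates how the mixed expression evaluates, not whether overloading these operators is a good idea.

```julia
# Hypothetical segment-based path type for illustration; not part of Base.
struct SimplePath
    segments::Vector{String}
end

# Hypothetical p"..." literal that splits on forward slashes.
macro p_str(s)
    :(SimplePath(String.(split($s, '/'))))
end

Base.:(==)(a::SimplePath, b::SimplePath) = a.segments == b.segments

# `/` adds a new segment.
Base.:/(p::SimplePath, s::AbstractString) = SimplePath([p.segments; String(s)])

# `*` concatenates onto the final segment, mirroring string concatenation.
Base.:*(p::SimplePath, s::AbstractString) =
    SimplePath([p.segments[1:end-1]; p.segments[end] * s])

# `/` and `*` share precedence and associate left, so this parses as
# ((p"foo" / "bar") / "baz") * ".txt".
result = p"foo" / "bar" / "baz" * ".txt"
println(result == p"foo/bar/baz.txt")
```

The two operators only compose cleanly here because they happen to share precedence and left associativity; swapping in operators with different precedence would change how the expression groups.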
Another possible approach is to allow interpolation into path strings with |
That seems interesting and I would need to think about it more. But one key lack is that it can't be passed as an input to a higher order function and it can't be broadcast. I often broadcast |
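The limitation mentioned above can be seen with plain strings and Base's `joinpath`: because joining is an ordinary function, it can be broadcast or handed to a higher-order function, which interpolation syntax cannot. The directory and file names below are placeholders chosen for the example.

```julia
files = ["a.txt", "b.txt", "c.txt"]

# Broadcast: the directory string is treated as a scalar.
full = joinpath.("data", files)

# Higher-order use: fold a list of segments into one path.
nested = foldl(joinpath, ["data", "raw", "2020"])

println(full)
println(nested)
```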
😬 |
Not if you disallow p"foo" * p"bar" * p"baz" * ".txt" == p"foo/bar/baz.txt" Personally, I am OK with |
I think the most reasonable path (😬) forward would be:
Path literals can also have features like making it easier to write Windows paths when you have to. |
Hasn't Windows accepted |
I thought there were situations where you need to use |
UNC drives on Windows require |
The path literal approach is very flexible: if a path literal starts with a valid UNC drive sequence, then it can allow single backslashes in the rest. Another reason we may want to allow |
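Part of what makes the literal approach flexible is that Julia string macros receive the literal's text without ordinary escape processing, so single backslashes survive. The sketch below shows this with a hypothetical `wp"..."` macro and `LiteralPath` type, both invented for this example; it accepts either separator and does not attempt UNC handling.

```julia
# Hypothetical segment-based path type for illustration; not part of Base.
struct LiteralPath
    segments::Vector{String}
end

# Hypothetical wp"..." literal. The macro sees the raw text, so `\U` or `\m`
# are not treated as escapes the way they would be in an ordinary string.
macro wp_str(s)
    :(LiteralPath(String.(split($s, ['\\', '/']))))
end

backslashed = wp"C:\Users\me"
forward = wp"C:/Users/me"
println(backslashed.segments)
println(forward.segments == backslashed.segments)
```

One caveat that still applies to any string macro: a backslash immediately before a double quote is treated as an escape for that quote, so a literal ending in a backslash needs care.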
I've been working with abstracting data location recently (see DataSets.jl) and I've noticed anew that there's a really big difference in the genericity of relative vs absolute path types.
However I'd observe that portable code likely gets the path root from somewhere programmatically and rarely needs absolute paths. From this point of view, a relative path literal would be fine, especially if it could incorporate a few things like tilde expansion. Alas, doing away with absolute path literals is not going to satisfy anyone who wants to write a quick script unless we've got a compelling replacement. For system dependent stuff, perhaps we could have

What can you do with an abstract absolute path?

In generic code which takes
But other than that, I don't think it's clear what you can do! There are some other contenders for generic verbs, but they have their problems
If you think about |
I think the suggested path forward here is https://github.com/rofinn/FilePaths.jl. We aren't going to make breaking changes to the file system functions in Base to stop using strings, and I think it does make the most sense for the primary API to be strings, but with the option for the user to layer a more advanced type on top (particularly for more complex cases such as non-local resources). But this could be a Discourse post or discussion here, if we want to continue with the julep proposal written here. |
I started writing this over 4 months ago, but life got in the way.
It's now in a state to take feedback from others.
It could really do with it, I'm sure.
The way paths are handled as strings hasn't changed much since it was written,
mostly in JuliaLang/julia@6f9fb22
in January 2013.
Proposal Abstract:
Add an AbstractPath type and deprecate open(::AbstractString) in favour of open(::AbstractPath).
AbstractPaths allow code to be written without caring where or how the data is stored.
Using types for paths allows us to enforce some validity and consistency rules.
This also allows multiple dispatch to differentiate between a path to a file and that file's contents as a string.
TODO
running todo list of things to add/change in the julep before it can stop being WIP
include
.