Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Path type #5550

Closed
straight-shoota opened this issue Jan 6, 2018 · 23 comments
Closed

Add Path type #5550

straight-shoota opened this issue Jan 6, 2018 · 23 comments

Comments

@straight-shoota
Copy link
Member

It has already been proposed in other discussions to add a Path type which essentially holds a filesystem path and turns current static methods of File into an object oriented interface.

The benefits of this approach are:

  • type safe encapsulation
  • one dedicated place for path operations
  • clear separation: File and Dir handle file system access, Path performs only syntactic operations.

Prominent implementations of this type are Java's java.nio.file.Path (API docs, tutorial) and Python's pathlib (API docs).

Pathlib actually comes in two flavours: pure paths provide only syntactical operations while conrete paths also do IO operations. IMHO it makes more sense to keep that separated like the Java approach.

Paths are implemented differently depending on operating system. Namely, we need to respect POSIX paths (like /foo/bar/baz) as well as windows paths (like C:\foo\bar\baz). The major difference are the primary directory separator (/ vs. \) and designators for absolute paths ( /foo vs C:\foo).

Pythons pathlib offers two classes for this, PurePosixPath and PureWindowsPath, Java's Path by default use the variant supported by the default file system but you can use different providers (Path is just an interface).

I don't think this needs to be implemented in separate types but it would certainly be nice to not be limited to the system's native path model. This could be achieved by an ivar flag. But it's debateable if this is worth it, the most common use case by far will be working with native paths.

The following methods would migrate from File to Path.

File.basename(path)         => Path#basename
File.basename(path, suffix) => Path#basename(suffix)
File.dirname(path)          => Path#dirname
File.expand_path(path, dir) => Path#epxand_path(dir)
File.extname(filename)      => Path#extname
File.join(*paths)           => Path#resolve(*paths), Path#<<(path)

Proposed additionals instance methods for Path:

#<=>(Path | String) : Int
#== (String) : Bool
#[](Int) : Path # access path member by index
#[](Int, Int) : Path # access path members by index
#[](Range) : Path # access path members by index
#absolute? : Bool
#ends_with?(Path | String) : Bool
#normalize : Path # remove `./` and `foo/..` segments
#parent : Path?
#root : Path?
#relative_to(Path | String) : Path
#resolve_sibling(Path | String) : Path
#starts_with(Path | String) : Bool
#to_s : String
#to_uri : URI

The type could probably also include Enumerable(Path).

All methods currently accepting a path as String would now expect Path | String argument.

@asterite
Copy link
Member

asterite commented Jan 7, 2018

So instead of:

File.basename(path)

I'll have to write:

Path.new(path).basename

and so on for every method?

What about File.open(...), do we need to change it to also accept a Path? And possibly every method that accepts a String would need to also take into consideration accepting a Path?

I don't oppose having a Path (or Pathname) module to have class methods, to do Path.basename(...) instead of File.basename, but this proposal looks like we'll going to have multiple ways of doing the same thing.

@asterite
Copy link
Member

asterite commented Jan 7, 2018

It would also be nice to know why the people that want Path want it: what code they have now and how it will become better with a Path type.

@asterite
Copy link
Member

asterite commented Jan 7, 2018

As another note: ruby has a Pathname class. I never used it, and never seen it in code I read.

@asterite
Copy link
Member

asterite commented Jan 7, 2018

More questions: what about Dir.glob? Would it return Path instead of String? Will I need to invoke to_s on each of its entries? How will a Path be represented, is it just a String wrapper or an array of pieces (so less performant)?

@j8r
Copy link
Contributor

j8r commented Jan 7, 2018

I like the idea of a Path module (not a class) for String manipulations related to pathes. The File class will only be related to operations requiring to read/write to the filesystem. I agree with @asterite , I see more disadvantages than advantages to create a Path type, this appears to become more cumbersome if we want to manipulate them.

@bew
Copy link
Contributor

bew commented Jan 7, 2018

I would go for class methods for all path-related methods that are currently on File, but having a Path object might be handy for those who want to check specific segments of the path, or do some other segment manipulation.
In my view Path as an object would be rarely used, and File.open(...) should stay with String only, the user must convert its Path object with Path#to_s if he used one.

@RX14
Copy link
Contributor

RX14 commented Jan 7, 2018

If we look at the average piece of path-handling code it rarely just does one operation in isolation so looking at File.basename(str) vs Path.new(str).basename is I think misleading.

For example, this code

All those File.exists? calls will become absolute_filename.exists?. Those make_relative_unless_absolute calls are actually fairly common, there would definitely be a mathod on path for those, for example just path.to_absolute (which is much shorter).

I'd go much further and remove most File class methods and nuke Dir entirely. Place it all on Path. We don't have to think about that in this proposal, but at least I'd like File's class methods to simply be aliases for Path.new(string).foo in any future PR. (path can be a struct because it's immutable, so little if any performance penalty)

@straight-shoota
Copy link
Member Author

@asterite

How will a Path be represented, is it just a String wrapper or an array of pieces (so less performant)?

Most certainly it should just be a wrapper around String (plus an ivar for path type: posix/windows).

What about File.open(...), do we need to change it to also accept a Path? And possibly every method that accepts a String would need to also take into consideration accepting a Path?

All methods receiving a path like File.open should accept both Path and String.
If we have Path objects, all methods receiving a path should accept an instance of Path, not just String (@bew).

this proposal looks like we'll going to have multiple ways of doing the same thing.

In the long term there should only be one way of doing path manipulation, that is using a Path object.

It would already be an improvement to move all path-related class methods from File to Path (as a module) but I think it would be benefitial to have specific type representing a path. This makes it actually easier to perform path operations. Instead of File.basename(path) you'll just use path.basename - because path should already be a Path.
As far as I can tell, it is not very common to perform ordinary string operations on paths so the added benefit has zero to little downside from an API perspective.

@asterite
Copy link
Member

asterite commented Jan 7, 2018

I guess the only way to know is by actually implementing something like this and using it across the std and the compiler (probably even more also testing this on a few shards, maybe prax is a good one). I just worry about duplicating logic having to handle both String and Path across the std and Possible shards, plus having to wrap single calls like File.basenane into Path.new(...).basename

@j8r
Copy link
Contributor

j8r commented Jan 7, 2018

If we go for a Path class, we can remove basename and dirname:

Dir.current         #=> "/my/current/dir"  (Path)
Dir.current[0..-2]  #=> "/my/current" (String) == dirname
Dir.current[-1]     #=> "dir" (String) == basename

@RX14
Copy link
Contributor

RX14 commented Jan 7, 2018

@asterite it's simple: we just make sure to call to_s then restrict on String | Path. String#to_s and Path#to_s will both return the string representations of paths so everything that takes a string currently will need just a minor change.

Plus, we can always go the whole way and just flat-out remove File.basename in any Path PR if you're worried...

@RX14
Copy link
Contributor

RX14 commented Jan 14, 2018

Most of the path-handling methods on File need a rewrite for windows, so this is a good time to implement Path. An initial PR to add Path with support for windows and unix-style paths, initially without any support for performing filesystem operations, would be great.

@straight-shoota
Copy link
Member Author

I'm happy to start working on that.

@sudo-nice
Copy link

To make Path consistent, it seems, almost every File.* and Dir.* method should take arguments of type Path and return values of type Path. Maybe, it should be strictly Path and not String anymore, because it would be a simple rule: All file/path operations happen with Path, not String.

This approach will clearly separate two intentions:

  1. We want to make file/path operations, so we deal with Path.
  2. We want to store a Path somewhere, so we explicitly convert it with .to_s.

What do you think?

@sudo-nice
Copy link

Here is a real life code snippet, to illustrate the previous post. There are 7 forced .to_s, because File.* methods take String.

struct OVPN
  getter path, ip, country_code, country_name

  def initialize(@path : Path)
    @ip = path.to_s[/(?:\d+\.){3}\d+/]
    @country_code = (Geo.country_code_of(ip) || "UU")
    @country_name = (Geo.country_name_of(ip) || "UNKNOWN")
  end

  def delete
    File.delete @path.to_s
  end

  def move(to pool : Pool)
    new_path = pool.path.join @path.basename
    File.rename @path.to_s, new_path.to_s
  end

  def exchange_with(ovpn : OVPN)
    return if @path.basename == ovpn.path.basename
    File.rename @path.to_s, ovpn.path.parent.join(@path.basename).to_s
    File.rename ovpn.path.to_s, @path.parent.join(ovpn.path.basename).to_s
  end
end

@asterite
Copy link
Member

asterite commented Dec 7, 2018

@sudo-nice There's no need for to_s, File methods already accept Path or String in this PR. But what you said is different: you want only Path to be accepted as an argument. That will force simpler use cases to do Path.new(path), and that's inconvenient.

@RX14
Copy link
Contributor

RX14 commented Dec 7, 2018

Personally i'd like to replace some of the class methods on File and Dir with ones on Path, (path.delete instead of File.delete(path)). But as a whole, methods which accept Path as arguments should accept Path | String.

@sudo-nice
Copy link

@RX14 yes, it would be great to have something like these:

path.exists? - for more control we can always do File|Dir.exists?(path)
path.rename(new_path) - using FileUtils under the hood
path.symlink(new_path)
path.touch
path.children - returns Array(Path)
path.each_child - iterate over each child as a Path
path.cd
path.glob - returns Array(Path)
path.mkdir
path.rmdir
path.write(content)

@j8r
Copy link
Contributor

j8r commented Apr 20, 2019

Path is now present in 0.28.0.

@straight-shoota
Copy link
Member Author

Yes, but it's not yet very well integrated into the stdlib. I think this issue should stay open for the time being.

@j8r
Copy link
Contributor

j8r commented May 16, 2019

Also, what's the reasons why there some don't want FileSystem related method in Path, like in Rust?

@straight-shoota
Copy link
Member Author

Path provides an implementation of a very general path concept. It can be used for paths on the local file system, but also for every other application, such as remote or virtual file systems. That's already the reason why Path comes with different flavours, because you might want to use it with a different than the native path type.

This is obviously no a strict reason to keep Path entirely separated from the local file system. But IMO that makes sense. There is already some integration, however, for example the default arguments to Path#expand assume the local file system.

@jhass
Copy link
Member

jhass commented Apr 12, 2020

@straight-shoota

Yes, but it's not yet very well integrated into the stdlib. I think this issue should stay open for the time being.

I feel like this issue got pretty stale now. Maybe it would be better to open separate issues with concrete proposals on how to integrate it better?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants