RFC: a couple of file api tweaks #1407

Merged
merged 2 commits into from
Oct 19, 2012

@nolta
Member

nolta commented Oct 19, 2012

  • Change fullfile to match matlab:

    julia> fullfile("a","b","c")
    "a/bc"
    
    >> fullfile('a','b','c')
    
    ans =
    
    a/b/c

    In other words, define const fullfile = file_path.

  • Rename real_path -> realpath, as it's a call to the POSIX realpath function, and that's what python, ruby, and perl call it.
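For comparison, the join behaviour this change aims for can be sketched with Python's `posixpath` module (chosen over `os.path` so the separator is `/` on every platform). This is only an analogy in another language, not the Julia implementation, and `joined` is an illustrative name.

```python
import posixpath

# Each component is joined with a single "/" separator,
# matching the MATLAB output quoted above.
joined = posixpath.join("a", "b", "c")
print(joined)  # a/b/c
```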

@staticfloat
Member

+1 on this, was thinking about suggesting the same for realpath.

JeffBezanson added a commit that referenced this pull request Oct 19, 2012
RFC: a couple of file api tweaks
@JeffBezanson JeffBezanson merged commit 383e601 into master Oct 19, 2012
@timholy
Member

timholy commented Oct 19, 2012

Good. I think Matlab's fullfile also does the assembly in a way that addresses #1396, so that issue is relevant, too.

@StefanKarpinski
Member

We still need to go through and bikeshed the names of all the things in base/file.jl. Given how much everyone loves bikeshedding (myself certainly not excluded), I'm a little shocked this hasn't happened yet...

@johnmyleswhite
Member

We also need to clean up the code itself. I've got new code for tempfile() and tempdir() that I will submit as a pull request tomorrow morning.

@nolta
Member Author

nolta commented Oct 19, 2012

I've started putting together an api comparison table. Help and corrections are of course appreciated.

@StefanKarpinski
Member

Ah, yes. This is an excellent way to go about picking names. I can fill in the Ruby names later.

@johnmyleswhite
Member

Filled in the R names I knew offhand. As you can see, the names for the functions I added long ago are all derived from R, which is the major source of quirkiness in the names.

@StefanKarpinski
Member

I filled in the Ruby names for things. It would be handy to have the R and Matlab names completely filled in as well, but this is an area where we should probably follow Python, Ruby and Perl more closely than R or Matlab. Also, I really don't think that downloading data from a URL belongs in this list.

@staticfloat
Member

I filled in the MATLAB names for things as best I could, using this for inspiration.

There's an awful lot of "N/A" entries, so if I've missed something feel free to replace it with what should be there. My guess is that MATLAB doesn't have things like readlink, etc. because you can so easily just do !readlink?

@StefanKarpinski
Member

The downside of leaning so heavily on the file system is that it isn't portable at all. My general inclination is to mimic a nice reasonable UNIX system (the way that Ruby's FileUtils class does) and then implement that behavior portably.

@johnmyleswhite
Member

I agree that the API should essentially replicate UNIX commands and system calls.

@johnmyleswhite
Member

Bump. We need to have a collective debate about the file API again.

@StefanKarpinski
Member

Right. I added a "Julia (proposal)" column which has what I think we should rename things to. I mostly stuck to whatever the UNIX shell command for doing things is called. The things that need a little debate are:

  • splitpath: is there some way we can combine dirname_basename, split_extension, split_path and fileparts into a single thing? Do we really need four functions for parsing file names?
  • Should we rename cwd to pwd? It doesn't actually print the working directory, but I feel like the name just kind of works better. One thing I've considered is dir() but I'm guessing no one will like that.
  • abspath vs. realpath: what's the difference here? I wonder if we can orthogonalize some of this better.

@johnmyleswhite
Member

Having only splitpath works for me. But clearly people have wanted to have the directory name and file name be more easily accessed than:

parts = split_path(path)
dirname = file_path(parts[1:(end - 1)])
filename = parts[end]

So I do think we should come up with names for those two concepts. I would also use a function like file_extension if it existed and did what split_extension presumably does.

Riffing off splitpath, I would find joinpath clearer than path and less likely to interfere with useful local variable names. If the main function is called splitpath, then I think joinpath is the obvious complement.
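As a point of comparison, here is a minimal sketch of how Python's `posixpath` module exposes the three concepts discussed above (directory name, file name, extension) and the split/join pairing. The example path is made up, and none of this is the Julia API; it only illustrates the shape such functions could take.

```python
import posixpath

path = "/home/user/data/results.tar.gz"  # hypothetical example path

# Directory name and file name come back together from one call.
dirname, filename = posixpath.split(path)

# splitext separates only the last extension from the rest of the name.
stem, extension = posixpath.splitext(filename)

# join is the complement of split: the two pieces reassemble the input.
rejoined = posixpath.join(dirname, filename)

print(dirname, filename, stem, extension)
```

Here `split` and `join` are complements in the same way splitpath and joinpath would be.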

I would prefer pwd over cwd. I would find dir very confusing since dir in R does what readdir does now in Julia.

I would prefer isabspath over isabs, which sounds related to abs rather than abspath.

I'm not sure why we need anything other than abspath. What does realpath do again?

Why have both glob and pathexpand? I would keep only glob.

What is the status of cd, mkdir and rmdir? They all seem pretty important to me, although I suppose I'm not opposed to having rm do rmdir's work.

Maybe this is a bad habit, but I basically never want mktemp or mktempdir, I just want tempname or tempdir. If only name is kept, we should have an option that only returns a path name as a string.

@JeffBezanson
Member

I like all these name suggestions.
Only thing is the functionality of mkstemp and mkdtemp is preferred since they avoid races.
realpath follows symlinks, and the directories involved must exist. Doesn't seem too useful to me, since you usually don't care about symlinks, and you might not care whether the path exists.
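The race-avoidance point about mkstemp and mkdtemp can be illustrated with Python's `tempfile` module, which wraps the same idea: the name is chosen and the file or directory is created in a single step, so there is no window in which another process could claim the name first. This is a sketch for comparison only, not a proposal for the Julia implementation.

```python
import os
import tempfile

# mkstemp picks a unique name AND creates/opens the file in one step.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w") as handle:
        handle.write("scratch data\n")
    with open(path) as handle:
        content = handle.read()
finally:
    os.remove(path)

# mkdtemp does the same for directories.
tmpdir = tempfile.mkdtemp()
created = os.path.isdir(tmpdir)
os.rmdir(tmpdir)
```

A function that only returns a name, by contrast, leaves the caller to create the file later, which is where the race comes from.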

@StefanKarpinski
Member

My basic attitude is that we all know how UNIX commands work and we should make Julia's file manipulation commands be portable versions of those. A good first cut is just calling the corresponding UNIX commands, but we should switch to our own implementations eventually to avoid the overhead of spawning child processes and so that our behavior doesn't depend on the possibly idiosyncratic behavior of the system UNIX commands. As a corollary, I think that rm and rmdir should probably do what they do in UNIX: rm removes files, and rmdir will only remove a directory. Maybe this isn't the best possible behavior, but it's one we're all familiar with.
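The rm/rmdir split described here can be sketched with Python's `os.remove` and `os.rmdir`, which follow the same UNIX convention. One detail worth stating: POSIX rmdir only removes empty directories. The helper `fails` and the file name are illustrative, not part of any proposed API.

```python
import os
import tempfile

base = tempfile.mkdtemp()
note = os.path.join(base, "note.txt")
with open(note, "w") as handle:
    handle.write("x")

def fails(operation, target):
    """Return True if the operation raises OSError for this target."""
    try:
        operation(target)
    except OSError:
        return True
    return False

# The file-removal call refuses a directory ...
remove_on_dir_failed = fails(os.remove, base)
# ... and the directory-removal call refuses a non-empty directory.
rmdir_on_full_dir_failed = fails(os.rmdir, base)

# Removing the file first makes the directory empty, so rmdir succeeds.
os.remove(note)
os.rmdir(base)
```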

@binarybana
Contributor

+1 to @johnmyleswhite's suggestions:

  • path -> joinpath
  • pwd
  • isabs -> isabspath
  • Roll expandpath tilde expansion into abspath and have glob do everything else

@StefanKarpinski
Member

I've already done cwd=>pwd and isrooted=>isabspath here. I'm still sorting out the joinpath/splitpath pair. I think it's important to allow granular expansion of paths in various ways, which is why it makes sense to separate out abspath for turning relative paths into absolute ones, userexpand for tilde expansion, and glob for file globbing. You don't want to make those inseparable from each other. We could have expandpath which does all of them, however. Alternatively, expandpath(path, abs::Bool=true, user::Bool=true, glob::Bool=true) would be a nice interface once we have keywords. But you'd still want to implement that in terms of functions that do each thing individually.
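The layering described above, with separate functions for each kind of expansion and an optional combined wrapper driven by boolean keywords, might look roughly like this in Python. The wrapper name and keyword names are taken from the comment; returning a list and applying the steps in this order are assumptions made for the sketch.

```python
import glob as globmod
import os

def expandpath(path, abs=True, user=True, glob=True):
    """Hypothetical combined helper built from three independent steps."""
    if user:
        path = os.path.expanduser(path)      # tilde expansion
    if abs:
        path = os.path.abspath(path)         # relative -> absolute
    if glob:
        return sorted(globmod.glob(path))    # wildcard matches, may be empty
    return [path]

# Each step stays usable on its own, e.g. absolute path without globbing:
print(expandpath("some/relative/file.txt", glob=False))
```

Because the wrapper only calls the individual functions, callers who want just one kind of expansion never pay for the others.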

@kmsquire
Member

Just to respond briefly to @JeffBezanson on abspath vs realpath: when working on a cluster with large files and multiple file systems, both of these are useful to have. I often symlink to large data files on different file systems, and occasionally those get moved from under me, so realpath is useful (also for testing whether a symlink points to the correct file, since many of the raw data files have the same name). I also tend to use abspath (in python) when processing files, as I've been bitten too many times when I try to use relative paths and forget to tell OpenGridEngine which directory to work in.
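The distinction drawn here can be shown with a small Python sketch, assuming a POSIX system where creating symlinks is permitted. The file names are invented for illustration.

```python
import os
import tempfile

# Resolve the temp directory first so the comparison below is exact.
base = os.path.realpath(tempfile.mkdtemp())
target = os.path.join(base, "raw_data.bin")
link = os.path.join(base, "current.bin")
with open(target, "w") as handle:
    handle.write("data")
os.symlink(target, link)

# abspath only normalizes the path string; the symlink name is kept.
via_abspath = os.path.abspath(link)
# realpath follows the symlink to the file it points at.
via_realpath = os.path.realpath(link)

# If the target is moved or deleted, the link itself remains,
# but exists() follows it and reports False.
os.remove(target)
dangling = os.path.islink(link) and not os.path.exists(link)

os.remove(link)
os.rmdir(base)
```

So abspath answers "where is this name?", while realpath answers "which file does this name currently refer to?".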

@johnmyleswhite
Member

Bump. We still need to agree on everything and implement it.

8 participants