Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write to directory instead of renaming temp dir #15

Merged
merged 3 commits into from
Apr 28, 2021
Merged

write to directory instead of renaming temp dir #15

merged 3 commits into from
Apr 28, 2021

Conversation

polettif
Copy link
Collaborator

Currently export_gtfs(as_dir = TRUE) renames the created temporary folder. However, I ran into issues with this approach:

library(gtfsio)
path1 <- system.file("extdata", "sample-feed-fixed.zip", package = "tidytransit")
path2 <- tempdir()
g1 <- import_gtfs(path1)
export_gtfs(g1, path2, as_dir = TRUE)
#> Warning message:
#> In file.rename(tmpd, path) :
#>   cannot rename file '/var/folders/p8/ck58x_354mj8vgdcqyyz_r540000gn/T//RtmpR8xTkj/gtfsio5bc7fc71ddf' 
#>   to '/var/folders/p8/ck58x_354mj8vgdcqyyz_r540000gn/T//RtmpR8xTkj', reason 'No such file or directory'

With this PR export_gtfs now writes all files directly to path if as_dir is set to TRUE instead of writing it to a temporary directory and then renaming that directory to path. Is there any drawback to this approach that I'm missing?

@dhersz
Copy link
Collaborator

dhersz commented Apr 16, 2021

Hi @polettif, thanks for the PR.

So, playing around with this issue I bit I realized that the current approach always works if path is not tempdir(). For example, the bit above works fine:

library(gtfsio)
path1 <- system.file("extdata", "sample-feed-fixed.zip", package = "tidytransit")
path2 <- tempfile()
g1 <- import_gtfs(path1)
export_gtfs(g1, path2, as_dir = TRUE)
list.files(path2)
#>  [1] "agency.txt"          "calendar_dates.txt"  "calendar.txt"       
#>  [4] "fare_attributes.txt" "fare_rules.txt"      "frequencies.txt"    
#>  [7] "routes.txt"          "shapes.txt"          "stop_times.txt"     
#> [10] "stops.txt"           "trips.txt"

That's because tempfile() creates a file/directory inside R temporary directory, whereas tempdir() returns the path to this temporary directory (and apparently creates it, if it had been removed for any reason - I'll call this directory Rtmp from now on). So when you set path = tempdir(), the function tries to move the directory where files were extracted to (which is inside Rtmp) to Rtmp itself. But as overwrite = TRUE (by default), it removes Rtmp and all files within it before attempting to move the files, and that's why we see this 'No such file or directory' error.

The approach you're following in this PR fixes this, but I'm not sure if we should even use tempdir() as a path. That's because all packages may store temporary files inside Rtmp, so if we remove/overwrite Rtmp we may be interfering with other packages' behaviour. For example, {geobr}, {r5r} and {aopdata} all use some temporary files inside Rtmp to cache some operations and run faster when used repetitively inside the same R session. If we use tempdir() as a path to export_gtfs() we'll be overwriting Rtmp and any files that other packages may be using.

oi <- geobr::read_country()
#> Loading required namespace: sf
#> Using year 2010

list.files(tempdir())
#> [1] "country_2010_simplified.gpkg" "file3f9d49265d57.so"         
#> [3] "gtfsio"                       "metadata_geobr.csv"

But the current approach also removes the Rtmp if it path = tmpdir(), and, even less usefully, throws an error instead of creating a gtfs-like directory.

Which leads me to think that perhaps we should check if path points to tmpdir(), and raise an error if so. Something along the lines of:

if (path == tempdir())
  stop("Please use 'path = tempfile()' instead of tempdir() to designate temporary directories")

What do you think?

@polettif
Copy link
Collaborator Author

Ah, I see. Thanks for looking into it! I wasn't aware of how differently tempdir() is handled.

Which leads me to think that perhaps we should check if path points to tmpdir(), and raise an error if so. Something along the lines of:

if (path == tempdir())
  stop("Please use 'path = tempfile()' instead of tempdir() to designate temporary directories")

Prevent using tempdir in general sounds good to me.

Just a general design thought: I'd still prefer writing files to a directory (as in this PR) instead of "renaming" it, seems cleaner to me and it fails faster if the output directory cannot be written to. There's probably no difference in practice, so there's no need to change it, just wanted to point it out.

@dhersz
Copy link
Collaborator

dhersz commented Apr 20, 2021

Thank you for flagging the issue!

I agree with you that writing directly to the desired path is better than renaming a previously created temporary directory, and I'll work on it as soon as I have some more time.

Alternatively, if you feel like adapting this PR to accommodate the tempdir() check and the proposed changes I'll more than gladly accept it :)

Cheers!

@polettif
Copy link
Collaborator Author

Alternatively, if you feel like adapting this PR to accommodate the tempdir() check and the proposed changes I'll more than gladly accept it :)

I'll do that 👍 cheers

@polettif
Copy link
Collaborator Author

I'd say this is good to merge if it's ok for you @dhersz

@dhersz
Copy link
Collaborator

dhersz commented Apr 28, 2021

Sorry about it, I've been quite busy lately and forgot to merge it. Thanks @polettif!

@dhersz dhersz merged commit 4e2eb9a into master Apr 28, 2021
@dhersz dhersz deleted the tmp-dir branch May 26, 2021 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants