Error (error: In path.expand(path)) from spThin::thin(..., write.files = TRUE) resulting from long file names when "spec.col" has many levels #29

Open
jeanKRS opened this issue Apr 8, 2023 · 0 comments

Hi,

I ran into a problem saving files from thin(), which I describe below, together with the modification I made to the "thin.R" script. The change alters how the output files are named so that they:

  • have unique file names
  • do not get too long

Current state

PROBLEM: saving thinned data by setting the option write.files = TRUE in spThin::thin(...)

  • The thinning function spThin::thin(...) has the option to save each species' thinned dataset as it is generated. However, judging from the source code (at https://github.com/mlammens/spThin/blob/master/R/thin.R), the way the file names are created makes each subsequent csv file name longer than the previous one, i.e.

  • If the first csv is saved as "new.csv", the second is saved as "new_new.csv", the third as "new_new_new.csv", etc. (line 185 in the source code). For datasets with very many levels under "spec.col", the file names therefore become too long as the number of species thinned increases, causing (error: In path.expand(path)).

  • This naming system is used to prevent overwriting, since the base name built at line 170 has no unique identifier for the species and may therefore give different csv files the same name. At line 185, the names are modified by appending "_new" for each subsequent thinned dataset.
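
The growth described above can be reproduced outside the package with a small sketch. The helper `next_free_name()` and the file names are hypothetical; this mimics the behaviour as I understand it and is not the actual spThin code:

```r
# Illustrative only: mimics the renaming behaviour described above,
# not the actual spThin source.
next_free_name <- function(path) {
  # Keep inserting "_new" before the extension until the name is unused
  while (file.exists(path)) {
    path <- sub("\\.csv$", "_new.csv", path)
  }
  path
}

out.dir <- tempfile("thin_demo")        # fresh, empty scratch directory
dir.create(out.dir)

base.file <- file.path(out.dir, "thinned_data_thin1.csv")
written <- character(0)
for (i in 1:5) {                        # pretend five species are thinned in turn
  f <- next_free_name(base.file)
  file.create(f)
  written <- c(written, f)
}

basename(written)                       # names pick up one more "_new" each time
diff(nchar(basename(written)))          # each name is 4 characters longer than the last
```

With a few hundred species the name gains a few thousand characters this way, which would plausibly exceed the path length limit of the operating system.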

SOLUTION PROPOSED

Modify the source code of thin() by changing how the files are named, so that the species name is included in the file name, i.e.:

At line 170, add the species name to the thinned output file name, i.e. replace:

csv.files <- paste( out.dir, out.base, "_thin", rep(1:n.csv), ".csv", sep="")

With:

csv.files <- paste( out.dir, out.base, "thin", gsub(" ", "", as.character(species)), rep(1:n.csv), ".csv", sep="")

This ensures every file name is unique, so line 185, which adds "_new" to every subsequent file name, becomes unnecessary and can be removed.
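
As a stand-alone sketch of the same idea, assuming a hypothetical helper `make_csv_names()` and made-up species names (neither is part of spThin), the naming could look like this. Unlike my one-line change above, this variant also adds separators and replaces any character that is not a letter or digit, which may be safer for species names containing punctuation:

```r
# Hypothetical helper, not part of spThin: builds one file name per
# thinned dataset, tagged with the species name.
make_csv_names <- function(out.dir, out.base, species, n.csv) {
  # Replace runs of anything other than letters and digits with "_"
  tag <- gsub("[^A-Za-z0-9]+", "_", as.character(species))
  paste0(out.dir, out.base, "_thin_", tag, "_", seq_len(n.csv), ".csv")
}

# Made-up example levels of "spec.col"
species.names <- c("Panthera leo", "Panthera pardus", "Loxodonta africana")

csv.files <- unlist(lapply(species.names, make_csv_names,
                           out.dir = "thinned/", out.base = "occ", n.csv = 2))
csv.files
anyDuplicated(csv.files) == 0   # TRUE when no two names collide
max(nchar(csv.files))           # depends on the species name, not on how many species ran
```

The file name length here is bounded by the longest species name, so it no longer grows with the number of levels in "spec.col".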

RESULTS:

  • This worked: the bias removal completed on several datasets that had failed before. Could this change be made in the package source code?

Regards
