You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
XLConnect is slow and memory hungry in some scenarios; that seems to be a known and nontrivial concern.
When the Excel files that are loaded change infrequently compared to R code changes, and data is only read and not written to Excel files, a viable solution might be the creation of sidecar memoization files.
Describe the solution you'd like
I suggest that the load function gets a new boolean parameter "memoize" (probably off-by-default). If that parameter is set, XLConnect comes up with a file name (maybe "oldfilename.xlsx.cache"). If that cache file does not exist, or is older than the file being loaded, regular XLConnect loading is performed, and the resulting data frame saveRDS()'d in the cache file, along with the list of any arguments that influence loading outcome. Otherwise, the cache file is loaded. The loading step contains a check for whether the arguments stored there match the ones in the function; if not, it falls back to regular loading and cache recreation.
Alternative Solutions
There are some variations on how the file name is best created, and what information is stored in the cache file:
It may be a good idea to store the XLConnect and R versions in there, so that after an upgrade, cache files are recreated. (Better err on the side of caution).
It may make sense to have a cache file per set of arguments (think "myfile.xlsx.readWorksheet(sheet=Tabelle).cache"), or some base64 encoding thereof; file name length limitations may force this into subfolders or through hashes. The main point is that if multiple loads are performed in a single application on a single file, the caches should not constantly invalidate each other.
It may make sense to store the cache files locally rather than next to the Excel file. On UNIX platforms, XDG_CACHE_HOME might be a good place.
A viable alternative is creating such a memoization function ad hoc or in some personal library; however, that adds hard-to-explain code to scripts primarily created by users who are more familiar with domain concepts than programming concepts.
Additional context
I'm unsure whether XLConnect supports loading data from Excel files that is spread out over multiple files (say, cross-file references or anything like that). If so, the cache file may need to keep a list of files that were loaded during the initial creation run, and check all their dates before using the cached file.
The text was updated successfully, but these errors were encountered:
What's the problem this feature will solve?
XLConnect is slow and memory hungry in some scenarios; that seems to be a known and nontrivial concern.
When the Excel files that are loaded change infrequently compared to R code changes, and data is only read and not written to Excel files, a viable solution might be the creation of sidecar memoization files.
Describe the solution you'd like
I suggest that the load function gets a new boolean parameter "memoize" (probably off-by-default). If that parameter is set, XLConnect comes up with a file name (maybe "oldfilename.xlsx.cache"). If that cache file does not exist, or is older than the file being loaded, regular XLConnect loading is performed, and the resulting data frame saveRDS()'d in the cache file, along with the list of any arguments that influence loading outcome. Otherwise, the cache file is loaded. The loading step contains a check for whether the arguments stored there match the ones in the function; if not, it falls back to regular loading and cache recreation.
Alternative Solutions
There are some variations on how the file name is best created, and what information is stored in the cache file:
A viable alternative is creating such a memoization function ad hoc or in some personal library; however, that adds hard-to-explain code to scripts primarily created by users who are more familiar with domain concepts than programming concepts.
Additional context
I'm unsure whether XLConnect supports loading data from Excel files that is spread out over multiple files (say, cross-file references or anything like that). If so, the cache file may need to keep a list of files that were loaded during the initial creation run, and check all their dates before using the cached file.
The text was updated successfully, but these errors were encountered: