read_csv crashes RStudio Session - larger csv #1141
Comments
Try reinstalling readr; there was an interaction between cpp11 and how RStudio saves and restores environments that should now be resolved. If the version of readr you have installed was compiled against the old version of cpp11, that could cause this behavior. The current CRAN binaries should be OK, so reinstalling will hopefully fix your issue.
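Before reinstalling, it can help to record which versions are currently installed. The following is a minimal sketch in base R; `installed_version` is a hypothetical helper name, not part of readr or cpp11. Note that it only reports installed versions and cannot tell which cpp11 release readr was compiled against.

```r
# Hypothetical helper: report the installed version of a package,
# or NA if the package is not installed. Does not load the package.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

pkgs <- c("readr", "cpp11")
versions <- vapply(pkgs, installed_version, character(1))
print(versions)

# To reinstall both from CRAN afterwards (run manually, needs network access):
# install.packages(c("cpp11", "readr"))
```

Comparing the printed versions before and after the reinstall gives a quick check that the packages were actually replaced.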
Thanks but that did not fix the issue. I also removed both I also deleted and remade the Rproject file and the hidden folder that gets created with Rprojects and still hangs up. |
Here is a reprex:
# load packages
library(readr)
# bring in dataframe
df <- read_csv(file = "http://www2.census.gov/programs-surveys/bds/tables/time-series/bds2018_msa_sector_fage.csv")
#>
#> -- Column specification --------------------------------------------------------
#> cols(
#> .default = col_character(),
#> year = col_double(),
#> msa = col_double()
#> )
#> i Use `spec()` for the full column specifications.
Created on 2020-10-28 by the reprex package (v0.3.0)
Does the example you posted in #1141 (comment) reproduce the issue or not? I am not clear. If you download the file locally first and read it from the file |
Similar problem; reinstalling (from source) helped, but I'm still seeing something a little weird going on. The input file is 60M (sensitive data, so it will take me a little while to come up with a reproducible example). update: if I increase
Error: Assigned data `all_colnames[problems$col]` must be compatible with existing data.
✖ Existing data has 1 row.
✖ Assigned data has 2408743 rows.
ℹ Row updates require a list value. Do you need `list()` or `as.list()`?
Backtrace:
█
1. ├─global::csvRead()
2. │ └─readr::read_csv(matchFile(pat, fl, exts), ...)
3. │ └─readr:::read_delimited(...)
4. │ └─readr:::name_problems(out, names(spec$cols), name)
5. │ ├─base::`$<-`(...)
6. │ └─tibble:::`$<-.tbl_df`(...)
7. │ └─tibble:::tbl_subassign(...)
8. │ └─tibble:::vectbl_recycle_rhs(...)
9. │ ├─base::withCallingHandlers(...)
10. │ └─vctrs::vec_recycle(value[[j]], nrow)
11. ├─vctrs:::stop_recycle_incompatible_size(...)
12. │ └─vctrs:::stop_vctrs(...)
13. │ └─rlang::abort(message, class = c(class, "vctrs_error"), ...)
14. │ └─rlang:::signal_abort(cnd)
15. │ └─base::signalCondition(cnd)
16. └─(function (cnd) ...
Execution halted
Sorry for not getting back to you. I was in the field without internet or cell service the past few days. The reprex I supplied isn't helpful: whether I download the file first or read it from the web, it doesn't cause the session to crash. I'm working on a more specific reproducible example by making a large toy data file with similar
Separately, while making this file, I've noticed that in a clean session, after loading packages, if I read in the toy data first (550 mb; I'll put this toy file on github), then a smaller data file (200 mb), and then the one that's causing the session to crash (571 mb), all of them load properly and the session doesn't crash. If I create a new clean session, load the same packages, and read the issue file (571 mb) first, it crashes. So something about the order in which the files are loaded is causing it to hang. The order in which the files are imported matters, as the files are downloaded at different times of the year that correspond to different metadata that later has to be added to the downloaded files and lined up properly.
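For anyone who wants to build a shareable toy file of this kind, here is a minimal sketch in base R. The column names (`timestamp`, `sensor_id`, `value`) and the one-minute sampling interval are assumptions for illustration, not the layout of the actual logger export, and `n` is kept small here; it would need to be far larger to approach the file sizes discussed above.

```r
set.seed(1)
n <- 1000  # assumption: raise this substantially to approach several hundred MB

# Hypothetical logger-style columns: a timestamp, a sensor label, a reading.
start <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
toy <- data.frame(
  timestamp = format(start + seq_len(n) * 60, "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  sensor_id = sample(c("A1", "B2", "C3"), n, replace = TRUE),
  value     = round(rnorm(n), 3),
  stringsAsFactors = FALSE
)

# Write to a temporary file and report its size in bytes.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
print(file.size(path))
```

A file generated this way contains no sensitive data, so it can be posted alongside a reprex.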
I've been trying to come up with a better reprex for this issue but the problem is the error is super inconsistent. I run it several times with Here is a reprex using the data that's causing it. I've made the csv's downloadable from dropbox, see link here file 1, file 2, and file 3
I usually use
This 100% of the time will crash |
Could you try re-installing readr and the cpp11 package? I have been trying to reproduce this crash and have been unable to do so with these files; it is possible it was an interaction with RStudio session restore that was fixed by the latest cpp11 release.
I thought that the crash wouldn't happen on your end, as it is so inconsistent on mine. I have made new projects and loaded in the same data, and sometimes it crashes and sometimes it doesn't. I have removed and reinstalled both cpp11 and readr and it still crashes. Last week I removed R, RStudio, and Rtools completely, deleted my entire package library, and deleted the temporary files that a session creates, which the RStudio website says to remove to reset RStudio (link here). On one of the laptops I tried, I upgraded R and RStudio and freshly installed both cpp11 and readr, as neither package had been installed previously, and it still crashed. I'm at the point of considering editing everything back to base R, as I need to keep working and this is holding me up. The one thing I really like about readr is that it recognizes the POSIXct timestamp and imports it properly, as well as the sensor unit column. Base R doesn't. I'm confused as to why this is happening, as until 3 weeks ago, before updating RStudio and readr, this was not happening. Thank you @jimhester for your help on this!
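If falling back to base R, the timestamp column can be converted after reading. This is a minimal sketch under stated assumptions: the column names (`timestamp`, `unit`, `value`) and the timestamp format are hypothetical stand-ins, since the real logger layout is not shown in this thread.

```r
# Write a tiny stand-in file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "timestamp,unit,value",
  "2020-10-28 12:00:00,degC,11.5",
  "2020-10-28 12:15:00,degC,11.7"
), path)

# read.csv() leaves the timestamp as character, so convert it explicitly.
df <- read.csv(path, stringsAsFactors = FALSE)
df$timestamp <- as.POSIXct(df$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

str(df)
```

The format string and time zone would need to match the actual export; a mismatch yields `NA` values rather than an error, so it is worth checking the result with `anyNA(df$timestamp)`.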
You can install the prior version of readr, |
Thank you for the suggestion. I have installed the previous version and it has yet to crash. Again, I'm not sure why it's crashing, but it seems like some type of memory issue between readr and RStudio. Would installing the development versions of both cpp11 and readr potentially address this, as suggested in closed issue #1145? I don't fully know the development side of readr, but these seem somewhat related, as it's clearly a memory issue.
Yes, installing the development version of cpp11 and readr would definitely be something to try if you were interested. |
It appears as if the development versions of both cpp11 and readr have caused the crash to stop occurring. If this changes I'll reopen this. I'll be deleting the link to the files I shared. Thanks again for your help on this. |
RStudio Session aborts when using `read_csv` on a batch of larger (200 - 500 mb) similar csv files, all exported from a data logger, which open fine with `read.csv()`. I'm currently using R v4.0.3 and RStudio v1.3.1093. Prior to updating packages and R versions last week, this code ran completely fine, so I'm not sure what the issue is. If I run this outside of RStudio, just in R, it loads properly, and if I load smaller csv files (1 mb) that don't require or display a progress bar, it loads them just fine. I uninstalled and reinstalled both R and RStudio as well.
I currently use the package `here` in tandem with Rprojects. I have tried removing the `here()` section of code and it still crashes in RStudio but not in R. I've provided my `session_info()` and would make a `reprex` of this, but I'm unsure how to do that for this situation, since the files are large and I'm unable to provide access to the files. I could make a `reprex` using a large datafile from online, I guess, but I don't know if that would result in the same error.