Crash with summarize(n=n()) #3717

Closed
momeara opened this issue Jul 18, 2018 · 8 comments

Labels
bug an unexpected problem or unintended behavior

Comments

momeara commented Jul 18, 2018

With the latest development version of dplyr (commit d3ded01), executing this:

library(dplyr)

z <- data.frame(
  x = c("a", "a"),
  y = c("1", "2"),
  z = c("b", "b"),
  stringsAsFactors=FALSE)  %>%
  dplyr::group_by(x, z) %>%
  dplyr::summarize(n=n())

gives

  *** caught segfault ***
 address (nil), cause 'unknown'

 Traceback:
  1: .Call(`_dplyr_summarise_impl`, df, dots)
  2: summarise_impl(.data, dots)
  3: summarise.tbl_df(., n = n())
  4: dplyr::summarize(., n = n())
  5: function_list[[1L]](value)
  6: freduce(value, `_function_list`)
  7: Recall(function_list[[1L]](value), function_list[-1L])
  8: freduce(value, `_function_list`)
  9: `_fseq`(`_lhs`)
 10: eval(quote(`_fseq`(`_lhs`)), env, env)
 11: eval(quote(`_fseq`(`_lhs`)), env, env)
 12: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 13: data.frame(x = c("a", "a"), y = c("1", "2"), z = c("b", "b"),     stringsAsFactors = FALSE) %>% dplyr::group_by(x, z) %>% dplyr::summarize(n = n())

 Possible actions:
 1: abort (with core dump, if enabled)
 2: normal R exit
 3: exit R without saving workspace
 4: exit R saving workspace
 Selection: 2
!Save workspace image? [y/n/c]: n

This segfault occurs on both Linux,

> sessionInfo()
 R version 3.4.1 (2017-06-30)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: CentOS release 6.7 (Final)

 Matrix products: default
 BLAS: /mnt/nfs/home/momeara/opt/lib64/R/lib/libRblas.so
 LAPACK: /mnt/nfs/home/momeara/opt/lib64/R/lib/libRlapack.so

 locale:
  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 loaded via a namespace (and not attached):
 [1] compiler_3.4.1 tools_3.4.1

and macOS

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.13.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dplyr_0.7.99.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17       bindr_0.1.1        knitr_1.20         magrittr_1.5
 [5] devtools_1.13.5    tidyselect_0.2.4   munsell_0.4.3      colorspace_1.3-2
 [9] ape_5.1            lattice_0.20-35    R6_2.2.2           rlang_0.2.1.9000
[13] httr_1.3.1         plyr_1.8.4         tools_3.4.0        parallel_3.4.0
[17] grid_3.4.0         ggtree_1.8.2       gtable_0.2.0       nlme_3.1-137
[21] git2r_0.21.0       withr_2.1.2        assertthat_0.2.0   lazyeval_0.2.1
[25] digest_0.6.15      tibble_1.4.2       treeio_1.0.2       bindrcpp_0.2.2
[29] purrr_0.2.5        ggplot2_2.2.1.9000 tidyr_0.8.1        curl_3.2
[33] memoise_1.1.0      glue_1.3.0         compiler_3.4.0     pillar_1.2.3
[37] rvcheck_0.1.0      scales_0.5.0.9000  jsonlite_1.5       pkgconfig_2.0.1
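
For reference, the result the crashing `summarize()` call should return can be cross-checked with base R alone. This is a minimal sketch using `stats::aggregate()`; it is not part of the original report and does not exercise the dplyr code path that segfaults. It only shows the expected one-row answer (group `x = "a"`, `z = "b"`, count 2).

```r
# Same input as the reproduction above
df <- data.frame(
  x = c("a", "a"),
  y = c("1", "2"),
  z = c("b", "b"),
  stringsAsFactors = FALSE
)

# Count rows per (x, z) group using base R only (no dplyr involved)
expected <- aggregate(y ~ x + z, data = df, FUN = length)

# Rename the count column to match summarize(n = n())
names(expected)[names(expected) == "y"] <- "n"
print(expected)
```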

momeara (author) commented Jul 18, 2018

I can confirm that the version currently on CRAN, 0.7.6, doesn't cause the segfault:

> sessionInfo()
 R version 3.4.1 (2017-06-30)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: CentOS release 6.7 (Final)

 Matrix products: default
 BLAS: /mnt/nfs/home/momeara/opt/lib64/R/lib/libRblas.so
 LAPACK: /mnt/nfs/home/momeara/opt/lib64/R/lib/libRlapack.so

 locale:
  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] bindrcpp_0.2.2.9000 dplyr_0.7.6

 loaded via a namespace (and not attached):
  [1] tidyselect_0.2.4      compiler_3.4.1        magrittr_1.5.0.9000
  [4] assertthat_0.2.0.9000 R6_2.2.2              tools_3.4.1
  [7] pillar_1.3.0.9000     glue_1.3.0            tibble_1.4.2.9004
 [10] crayon_1.3.4          Rcpp_0.12.17.4        pkgconfig_2.0.1
 [13] rlang_0.2.1.9000      purrr_0.2.4.9000      bindr_0.1.1.9000

krlmlr (member) commented Jul 21, 2018

Thanks, confirmed. @romainfrancois: Could you please take a look?

krlmlr added the bug (an unexpected problem or unintended behavior) label Jul 21, 2018

romainfrancois (member) commented:

Sure. I will look into it in August.

krlmlr (member) commented Aug 1, 2018

Bisection identifies 072f050 as the first bad commit. I'll take a look in the context of the failures seen after #3610.

krlmlr (member) commented Aug 1, 2018

This seems to be related to string columns and grouping by more than one column:

library(tidyverse)

# Works
tibble(x = "a", z = "b") %>%
  count(x)

# Works
tibble(x = 1, z = 2) %>%
  count(x, z)

# Segfaults
tibble(x = "a", z = "b") %>%
  count(x, z)

krlmlr (member) commented Aug 1, 2018

Not sure if it's related, but with an -O0 build (compiler optimizations disabled) I'm seeing a protection error with the following code:

library(tidyverse)

d <- tibble(x = "a", z = "b") %>% group_by(x, z)

gc()
gctorture2(1)
z <- summarize(d, n())

krlmlr (member) commented Aug 1, 2018

The protection error also happens with:

library(tidyverse)

d <- tibble(x = "a", z = "b") %>% group_by(x)

gc()
gctorture2(1)
z <- summarize(d, n())

and

library(tidyverse)

d <- tibble(x = 1, z = 2) %>% group_by(x)

gc()
gctorture2(1)
z <- summarize(d, n())

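Since the protection error also occurs when grouping by a single column and with numeric columns, it's unlikely to be related to this segfault. Opening a new issue.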

lock bot commented Jan 28, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

lock bot locked and limited conversation to collaborators Jan 28, 2019
3 participants