Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid an uninitialized bool for the CRAN UBSAN check #386

Merged
merged 4 commits into from
Nov 23, 2021
Merged

Avoid an uninitialized bool for the CRAN UBSAN check #386

merged 4 commits into from
Nov 23, 2021

Conversation

DavisVaughan
Copy link
Member

i.e. fixes:

/usr/local/bin/../include/c++/v1/__utility/pair.h:192:52: runtime error: load of value 128, which is not a valid value for type 'bool'
    #0 0x7f1daea6b953 in std::__1::pair<bool, bool>::pair<bool, bool&, false>(bool&&, bool&) /usr/local/bin/../include/c++/v1/__utility/pair.h:192:52
    #1 0x7f1daea6b3f9 in vroom::is_blank_or_comment_line(char const*, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) /data/gannet/ripley/R/packages/tests-clang-SAN/vroom/src/./utils.h:68:12

The first commit is the minimal change required to fix this. I just avoided creating has_comment entirely, since it didn't seem to have anything to do with the first half of the if statement.

The second commit slightly refactors the function in a way that I think makes it easier to follow. I originally had trouble knowing what the return value meant, and had to go look for locations where is_blank_or_comment_line() was being used to understand how the two bool values should be interpreted. Making them local variables with names makes it a bit clearer, IMO. I can revert this one if you disagree.

It also makes the return value a bit clearer, as originally I had to go look for places `is_blank_or_comment_line()` was being used to understand what the return value was being interpreted as
@DavisVaughan DavisVaughan changed the title Avoid an uninitialized bool for the CRAN USBAN check Avoid an uninitialized bool for the CRAN UBSAN check Nov 23, 2021
@jimhester
Copy link
Collaborator

Looks good to me and I like the refactoring as well.

@DavisVaughan DavisVaughan merged commit 1b83546 into tidyverse:main Nov 23, 2021
@DavisVaughan DavisVaughan deleted the fix/uninitialized-bool branch November 23, 2021 16:18
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request May 1, 2022
# vroom 1.5.7

* Jenny Bryan is now the official maintainer.

* Fix uninitialized bool detected by CRAN's UBSAN check
  (tidyverse/vroom#386)

* Fix buffer overflow when trying to parse an integer field that is
  over 64 characters long
  (tidyverse/readr#1326)

* Fix subset indexing when indexes span a file boundary multiple times
  (#383)

# vroom 1.5.6

* `vroom(col_select=)` now works if `col_names = FALSE` as intended (#381)

* `vroom(n_max=)` now correctly handles cases when reading from a
  connection and the file does _not_ end with a newline
  (tidyverse/readr#1321)

* `vroom()` no longer issues a spurious warning when the parsing needs
* to be restarted due to the presence of embedded newlines
* (tidyverse/readr#1313) Fix performance
* issue when materializing subsetted vectors (#378)

* `vroom_format()` now uses the same internal multi-threaded code as
  `vroom_write()`, improving its performance in most cases (#377)

* `vroom_fwf()` no longer omits the last line if it does _not_ end
  with a newline (tidyverse/readr#1293)

* Empty files or files with only a header line and no data no longer
  cause a crash if read with multiple files
  (tidyverse/readr#1297)

* Files with a header but no contents, or a empty file if `col_names =
  FALSE` no longer cause a hang when `progress = TRUE`
  (tidyverse/readr#1297)

* Commented lines with comments at the end of lines no longer hang R
  (tidyverse/readr#1309)

* Comment lines containing unpaired quotes are no longer treated as
  unterminated quotations
  (tidyverse/readr#1307)

* Values with only a `Inf` or `NaN` prefix but additional data
  afterwards, like `Inform` or no longer inappropriately guessed as
  doubles (tidyverse/readr#1319)

* Time types now support `%h` format to denote hour durations greater
  than 24, like readr (tidyverse/readr#1312)

* Fix performance issue when materializing subsetted vectors (#378)


# vroom 1.5.5

* `vroom()` now supports files with only carriage return newlines
  (`\r`). (#360, tidyverse/readr#1236)

* `vroom()` now parses single digit datetimes more consistently as
  readr has done (tidyverse/readr#1276)

* `vroom()` now parses `Inf` values as doubles
  (tidyverse/readr#1283)

* `vroom()` now parses `NaN` values as doubles
  (tidyverse/readr#1277)

* `VROOM_CONNECTION_SIZE` is now parsed as a double, which supports
  scientific notation (#364)

* `vroom()` now works around specifying a `\n` as the delimiter (#365,
  tidyverse/dplyr#5977)

* `vroom()` no longer crashes if given a `col_name` and `col_type`
  both less than the number of columns
  (tidyverse/readr#1271)

* `vroom()` no longer hangs if given an empty value for
  `locale(grouping_mark=)`
  (tidyverse/readr#1241)

* Fix performance regression when guessing with large numbers of rows
  (tidyverse/readr#1267)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants