Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbWriteTable() to load CSV data #151

Closed
krlmlr opened this issue Aug 21, 2016 · 5 comments
Closed

dbWriteTable() to load CSV data #151

krlmlr opened this issue Aug 21, 2016 · 5 comments
Milestone

Comments

@krlmlr
Copy link
Member

krlmlr commented Aug 21, 2016

as in v1.0.0.

Tests in #149.

@krlmlr krlmlr added this to the 1.1 milestone Aug 21, 2016
@krlmlr krlmlr closed this as completed in 1202e0b Aug 22, 2016
krlmlr added a commit that referenced this issue Aug 22, 2016
- Reimplement `dbSendPreparedQuery()` (with warning) for compatibility with existing packages (#153).
- Reimplement `dbWriteTable("SQLiteConnection", "character", "character")` for import of CSV files (#151).
- Reimplement `dbListResults()` (with warning) for compatibility with existing packages (#154).
- Soft-deprecate `dbGetInfo()`: The "Result" method is implemented by DBI, the methods for the other classes raise a warning (#137).
- Reimplement `dbGetPreparedQuery()` (with warning) using `dbSendQuery()`, `dbBind()` and `dbFetch()` (#100).
- Binding a factor with `dbBind()` converts to character, with warning.
- `sqliteCopyDatabase()` accepts character as `to` argument again, in this case a temporary connection is opened.
- Allow only one open result set (#150).
- Fix `dbExecute()` usage in examples.
- Reexport `dbDriver()` (#147).
- The `dbGetQuery()` function is reexported again from DBI to avoid revdep check errors (#148).
krlmlr added a commit that referenced this issue Nov 30, 2016
- New maintainer: Kirill Müller.

- RSQLite always builds with the included source, which is located in `src/sqlite3`. This prevents bugs due to API mismatches and considerably simplifies the build process.

- Current version: 3.11.1.

- Enable JSON1 extension (#152, @TigerToes).

- Include support for FTS5 (@mkuhn).

- Compilation limits `SQLITE_MAX_VARIABLE_NUMBER` and `SQLITE_MAX_COLUMN` have been reset to the defaults. The documentation suggests setting to such high values is a bad idea.

- Header files for `sqlite3` are no longer installed, linking to the package is not possible anymore. Packages that require access to the low-level sqlite3 API should bundle their own copy.

- `RSQLite()` no longer automatically attaches DBI when loaded. This is to
  encourage you to use `library(DBI); dbConnect(RSQLite::SQLite())`.

- Functions that take a table name, such as `dbWriteTable()` and `dbReadTable()`,
  now quote the table name via `dbQuoteIdentifier()`.
  This means that caller-quoted names should be marked as such with `DBI::SQL()`.

- RSQLite has been rewritten (essentially from scratch) in C++ with
  Rcpp. This has considerably reduced the amount of code, and allows us to
  take advantage of the more sophisticated memory management tools available in
  Rcpp. This rewrite should yield some minor performance improvements, but
  most importantly protect against memory leaks and crashes. It also provides
  a better base for future development. In particular, it is now technically
  possible to have multiple result sets per connection, although this feature
  is currently disabled (#150).

- You can now use SQLite's URL specification for databases. This allows you to
  create [shared in-memory](https://www.sqlite.org/inmemorydb.html) databases
  (#70).

- Queries (#69), query parameters and table data are always converted to UTF-8 before being sent to the database.

- Adapted to `DBI` 0.5, new code should use `dbExecute()` instead of `dbGetQuery()`, and `dbSendStatement()` instead of `dbSendQuery()` where appropriate.

- New strategy for prepared queries. Create a prepared query with `dbSendQuery()` or `dbSendStatement()` and bind values with `dbBind()`. The same query/statement can be executed efficiently multiple times by passing a data-frame-like object (#168, #178, #181).

- `dbSendQuery()`, `dbGetQuery()`, `dbSendStatement()` and `dbExecute()`
  also support inline parameterised queries,
  like `dbGetQuery(datasetsDb(), "SELECT * FROM mtcars WHERE cyl = :cyl",
  params = list(cyl = 4))`. This has no performance benefits but protects you
  from SQL injection attacks.

- Improve column type inference: the first non-`NULL` value decides the type of a column (#111). If there are no non-`NULL` values, the column affinity is used, determined according to sqlite3 rules (#160).

- `dbFetch()` uses the same row name strategy as `dbReadTable()` (#53).

- `dbColumnInfo()` will now return information even before you've retrieved any data.

- New `sqliteVersion()` prints the header and library versions of RSQLite.

- Deprecation warnings are given only once, with a clear reference to the source.

- `datasetsDb()` now returns a read-only database, to avoid modifications to the installed file.

- `make.db.names()` has been formally deprecated. Please use `dbQuoteIdentifier()` instead. This function is also used in `dbReadTable()`, `dbRemoveTable()`, and `dbListFields()` (#106, #132).

- `sqliteBuildTableDefinition()` has been deprecated. Use `DBI::sqlCreateTable()` instead.

- `dbGetException()` now raises a deprecation warning and always returns `list(errorNum = 0L, errorMsg = "OK")`, because querying the last SQLite error only works if an error actually occurred (#129).

- `dbSendPreparedQuery()` and `dbGetPreparedQuery()` have been reimplemented (with deprecation warning) using `dbSendQuery()`, `dbBind()` and `dbFetch()` for compatibility with existing packages (#100, #153, #168, #181). Please convert to the new API, because the old function may be removed completely very soon: They were never part of the official API, and do less argument checking than the new APIs. Both `dbSendPreparedQuery()` and `dbGetPreparedQuery()` ignore parameters not found in the query, with a warning (#174).

- Reimplemented `dbListResults()` (with deprecation warning) for compatibility with existing packages (#154).

- Soft-deprecated `dbGetInfo()`: The "Result" method is implemented by DBI, the methods for the other classes raise a warning (#137). It's now better to access the metadata with individual functions `dbHasCompleted()`, `dbGetRowCount()` and `dbGetRowsAffected()`.

- All `summary()` methods have been removed: the same information is now displayed in the `show()` methods, which were previously pretty useless.

- The `raw` data type is supported in `dbWriteTable()`, creates a `TEXT` column with a warning (#173).

- Numeric values for the `row.names` argument are converted to logical, with a warning (#170).

- If the number of data frame columns matches the number of existing columns for `dbWriteTable(append = TRUE)`, columns will be matched by position for compatibility, with a warning in case of a name mismatch (#164).

- `dbWriteTable()` supports the `field.types` argument when creating a new table (#171), and the `temporary` argument, default `FALSE` (#113).

- Reexporting `dbGetQuery()` and `dbDriver()` (#147, #148, #183).

- `sqliteCopyDatabase()` accepts character as `to` argument again, in this case a temporary connection is opened.

- Reimplemented `dbWriteTable("SQLiteConnection", "character", "character")` for import of CSV files, using a function from the old codebase (#151).

- `dbWriteTable("SQLiteConnection", "character", "data.frame")` looks
  for table names already enclosed in backticks and uses these,
  (with a warning), for compatibility with the sqldf package.

- The `dbExistsTable()` function now works faster by filtering the list of tables using SQL (#166).

- Start on a basic vignette: `vignette("RSQLite")` (#50).

- Reworked function and method documentation, removed old documentation (#121).

- Using `dbExecute()` in documentation and examples.

- Using both `":memory:"` and `":file::memory:"` in documentation.

- Added additional documentation and unit tests for
  [autoincrement keys](https://www.sqlite.org/autoinc.html) (#119, @wibeasley).

- Avoid warning about missing `long long` data type in C++98 by using a compound data type built from two 32-bit integers, with static assert that the size is 8 indeed.

- Remove all compilation warnings.

- All DBI methods contain an ellipsis `...` in their signature. Only the `name` argument to the transaction methods appears before the ellipsis for compatibility reasons.

- Using the `DBItest` package for testing (#105), with the new `constructor_relax_args` tweak.

- Using the `plogr` for logging at the C++ level, can be enabled via `RSQLite:::init_logging()`.

- Using new `sqlRownamesToColumn()` and `sqlColumnToRownames()` (r-dbi/DBI#91).

- Using `astyle` for code formatting (#159), also in tests (but only if sources can be located), stripped space at end of line in all source files.

- Tracking dependencies between source and header files (#138).

- Moved all functions from headers to modules (#162).

- Fixed all warnings in tests (#157).

- Checking message wording for deprecation warnings (#157).

- Testing simple and named transactions (#163).

- Using container-based builds and development version of `testthat` on Travis.

- Enabled AppVeyor testing.

- Differential reverse dependency checks.

- Added upgrade script for sqlite3 sources and creation script for the datasets database to the `data-raw` directory.
@cboettig
Copy link

@krlmlr Does this allow one to read in a (possibly compressed) tsv file without needing to load it into R first? (a la https://cran.r-project.org/web/packages/sqldf/) I'm not quite clear on how to invoke this. Seems like it doesn't like compressed files? And the sep argument doesn't recognize \\t tabs?

dbWriteTable(con, "taxa", "data/taxa.tsv", delim = "\\t")
Error in connection_import_file(conn@ptr, name, value, sep, eol, skip)

@krlmlr
Copy link
Member Author

krlmlr commented Jul 15, 2018

The import code comes from SQLite3's shell, https://www.sqlite.org/cli.html, section 8. Compressed files are not supported, does sep = "\t" work?

The documentation at http://rsqlite.r-dbi.org/reference/dbwritetable leaves room for improvement. We could consider supporting compressed files.

@cboettig
Copy link

@krlmlr Thanks!

My solution to the issue of reading large, possibly compressed text files in and out of databases over a DBI connection without needing to uncompress the whole thing or to load the whole thing into memory was to write this little helper package, arkdb. Works nicely with RSQLite or anything else that supports a DBI connection (I've recently become a fan of MonetDB & MonetDBLite). Probably not perfect but does work rather more generically than the miss-mash of different load/copy commands supported to such varying degrees by different databases. Feedback on the package / approach welcome! (It's in rOpenSci onboarding at the moment, hopefully off to CRAN after that).

@krlmlr
Copy link
Member Author

krlmlr commented Jul 21, 2018

Thanks for the pointer, this is a much more robust solution indeed. I wish data loading was supported in a more consistent way between databases (r-dbi/DBI#62), but there are differences regarding:

  • SQL syntax
  • CSV format
  • location of the file (client file system vs. server file system)

Especially the last issue makes it very brittle, and difficult to test.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 7, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants