Sanitize table names in copy_to() #611

richierocks · 2017-04-11T21:28:11Z

Dots in destination table names cause errors.

library(dplyr)
library(sparklyr)

local_tbl <- data_frame(
  x = runif(5)
)
spark_conn <- spark_connect("local")
spark_tbl <- copy_to(spark_conn, local_tbl, "foo.bar")
## Error: org.apache.spark.sql.AnalysisException: 
##    It is not allowed to add database prefix `foo` for the TEMPORARY view name.
spark_tbl <- copy_to(spark_conn, local_tbl, "foo.1")
##  Error: org.apache.spark.sql.catalyst.parser.ParseException: 
##    extraneous input '.1' expecting <EOF>(line 1, pos 3)

It would be nice if the table names were checked before they were sent to Spark. I'm not sure if the best behavior is to throw a helpful error or to fix the name.

kevinushey · 2017-04-11T21:37:52Z

Good idea -- we currently sanitize column names (https://github.com/rstudio/sparklyr/blob/7b1c6d8ff864612d8f5becacbc24fcd9d131ef21/R/utils.R#L116); it might make sense to use a similar transformation for table names.

chezou · 2017-04-15T10:35:00Z

I guess database name handling affects this problem. Is it usual thing in R world to name a table with dot? In SQL, dot is a separater syntax between database and table name.

richierocks · 2017-04-17T12:22:18Z

@chezou Dots in variable names are allowed in R, and it wasn't clear to me that they weren't allowed in Spark.

If I've understood the DBI package documentation correctly, I think that sparklyr just needs an improved method for dbQuoteIndentifier().

chezou · 2017-04-17T12:35:42Z

Thanks for the explanation. To be honest, I sometimes saw variables with dots in R, but I was not sure it is usual or not.

I heard Hadley is trying to decide how to handle DB identifier. r-dbi/DBI#24 It would solve this problem. In the SQL world (not only Spark world), it is very rare to use dots for a table name, and I didn't realize that problem. Not unlike usual DB, such as PostgreSQL, we usually use Spark to join with other tables.

chezou · 2017-06-21T15:47:50Z

With dplyr 0.7.0, we now have in_schema() for handling table name appropriately. We should change the way to hande table names.

yitao-li · 2021-03-30T22:43:13Z

Closing this issue because I don't think there is any sensible way to move forward with any of the proposed changes:

Possibility 1: replacing "." with some other valid character in view name -- I don't think that's a good idea, because if the name of the view is supposed to be some pre-determined string, and the name has "." in it, then it is better to fail (as Spark won't accept "." in the name) rather than replacing "." with something else and pretend the predetermined table/view name is acceptable for Spark
Possibility 2: improving quoting to make it work -- I don't think this is possible with Spark at the moment. See examples below:

library(sparklyr)

sc <- spark_connect(master = "local")

DBI::dbGetQuery(sc, "CREATE VIEW `foobar` AS SELECT 1") #works
DBI::dbGetQuery(sc, "CREATE VIEW `foo.bar` AS SELECT 1") # will throw an error

javierluraschi added data featurerequest labels May 10, 2017

kevinykuo mentioned this issue May 26, 2017

Issue while copying data to spark connection #704

Closed

yitao-li closed this as completed Mar 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sanitize table names in copy_to() #611

Sanitize table names in copy_to() #611

richierocks commented Apr 11, 2017

kevinushey commented Apr 11, 2017

chezou commented Apr 15, 2017

richierocks commented Apr 17, 2017

chezou commented Apr 17, 2017

chezou commented Jun 21, 2017

yitao-li commented Mar 30, 2021 •

edited

Loading

Sanitize table names in copy_to() #611

Sanitize table names in copy_to() #611

Comments

richierocks commented Apr 11, 2017

kevinushey commented Apr 11, 2017

chezou commented Apr 15, 2017

richierocks commented Apr 17, 2017

chezou commented Apr 17, 2017

chezou commented Jun 21, 2017

yitao-li commented Mar 30, 2021 • edited Loading

yitao-li commented Mar 30, 2021 •

edited

Loading