Make `/` behave in SQL as it does in R #1862

iangow · 2016-05-26T21:15:49Z

See #1799 (now closed).

There we discussed the fact that / will behave differently in SQL from how it behaves in R. This breaks a nice feature of dplyr, which is that analysis can be moved from R to SQL as it proceeds with little or no changes to code. I suggested that / should get translated so as to behave the same in both places:

@hadley responded: "Hmmm, that's not a bad idea, but what should it get translated to?"

I suggested that translating / as %* 1.0 /% might work:

> pg <- src_postgres()
> stuff <- tbl(pg, sql("SELECT 5 AS a, 2 AS b"))
> stuff
Source: postgres 9.5.2 [igow@localhost:5432/crsp]
From: <derived table> [?? x 2]

       a     b
   (int) (int)
1      5     2
..   ...   ...
> stuff %>% mutate(c = a %/% b)
Source: postgres 9.5.2 [igow@localhost:5432/crsp]
From: <derived table> [?? x 3]

       a     b     c
   (int) (int) (int)
1      5     2     2
..   ...   ...   ...
> stuff %>% mutate(c = a %* 1.0 /% b)
Source: postgres 9.5.2 [igow@localhost:5432/crsp]
From: <derived table> [?? x 3]

       a     b     c
   (int) (int) (dbl)
1      5     2   2.5
..   ...   ...   ...

I'm not sure if it was intended to close this issue along with the original "too esoteric" issue from #1799, so I am making this a new one (perhaps to be closed again) just in case.

The text was updated successfully, but these errors were encountered:

hadley · 2016-06-07T21:24:53Z

This just feels like a step too far to me. I can't put it precisely into words, but my intuition tells me its the wrong thing to do.

iangow · 2016-06-07T23:13:19Z

I think this XKCD applies here too. That is, this is perhaps best left as a matter of documentation in the databases vignette (e.g., / gets translated(?) into / in SQL, which does truncated division if the two arguments are integers, so SELECT 3/2 yields 1) or

> tbl(pg, sql("SELECT 3 AS a, 2 AS b")) %>% mutate(c=a/b)
Source: postgres 9.6.0 [igow@localhost:5432/crsp]
From: <derived table> [?? x 3]

       a     b     c
   (int) (int) (int)
1      3     2     1
..   ...   ...   ...

krlmlr · 2016-06-08T13:52:22Z

I would expect at least the basic operators to behave identically in all backends. However, if we change the behavior now, it's a truly breaking change: It affects both the results and the data type of the output of a division, and also all operations that are based on the result.

How about introducing and supporting a new %//% operator that works like R's / in both SQL and data frame? Then we could gradually phase out usage of the original division operator (by first giving a warning, later perhaps an error), or simply leave matters unchanged.

iangow · 2016-06-08T14:06:48Z

I think this may just be a matter of dividing the world into SQL and non-SQL backends. In SQL, / has it's well-defined meaning (i.e., truncated division for integers) and in non-SQL, it has the regular R meaning. Hadley's intuition may be that there are enough differences between SQL and non-SQL that it might be too much to try to stamp them all out (e.g., sum with NA/NULL values), as elegant as it would be if it could be done. (Perhaps it could be that there's a option that can be set to choose greater conformity.)

> pg <- src_postgres()
> df <- tbl(pg, sql("SELECT UNNEST(ARRAY[1,2,3,NULL]) AS a"))
> df
Source: postgres 9.5.3 [igow@localhost:5432/crsp]
From: <derived table> [?? x 1]

       a
   (int)
1      1
2      2
3      3
4     NA
..   ...
> df %>% summarize(sum(a))
Source: postgres 9.5.3 [igow@localhost:5432/crsp]
From: <derived table> [?? x 1]

   sum(a)
    (dbl)
1       6
..    ...
> df %>% collect() %>% summarize(sum(a))
Source: local data frame [1 x 1]

  sum(a)
   (int)
1     NA

(Note that sum quietly changes type from input to output in PostgreSQL.)

Of course, // could be confusing for Python (3) users (meanings would be flipped).

2009-Mac-Pro:~ igow$ ipython3
Python 3.4.4 (default, Mar  2 2016, 03:31:27) 
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: 3/2
Out[1]: 1.5

In [2]: 3//2
Out[2]: 1

In [3]: exit
2009-Mac-Pro:~ igow$ ipython2
Python 2.7.11 (default, Mar  1 2016, 18:40:10) 
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: 3/2
Out[1]: 1

In [2]: 3//2
Out[2]: 1

hadley · 2016-06-08T14:06:48Z

It's simply not possible for all operators to act identically - even basic things like mean and sum work differently. This is a delicate balancing act, but I just don't think it's worth it in this case.

krlmlr · 2016-06-08T14:21:08Z

I thought it's possible at least for operators, but perhaps not. How about printing a message if a backend's behavior is ~~incompatible with~~ different from R (e.g., using / with integer operators)?

hadley · 2016-06-08T14:23:03Z

I'm not sure even that's true - consider NA vs NaN vs NULL etc. Are the semantics always going to be identical?

krlmlr · 2016-06-08T14:38:09Z

NULL always "wins" in SQL.
Division by zero sometimes raises an exception
NaN roundtrips to NA (in RSQLite, Roundtrip NaN values r-dbi/DBItest#79)
Inf roundtrips to MAX_DOUBLE (in RMySQL, Can't roundtrip +Inf and -Inf values r-dbi/RMySQL#105)
...

Correct roundtrip should be done by the DBI backends. There are still rough edges, but I think valid input should give correct output, or warn.

With sum() and mean(): Here, a message could appear if the user doesn't explicitly state na.rm = TRUE, but that's probably overkill.

hadley closed this as completed Jun 7, 2016

lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `/` behave in SQL as it does in R #1862

Make `/` behave in SQL as it does in R #1862

iangow commented May 26, 2016 •

edited

Loading

hadley commented Jun 7, 2016

iangow commented Jun 7, 2016

krlmlr commented Jun 8, 2016

iangow commented Jun 8, 2016 •

edited

Loading

hadley commented Jun 8, 2016

krlmlr commented Jun 8, 2016 •

edited

Loading

hadley commented Jun 8, 2016

krlmlr commented Jun 8, 2016

Make / behave in SQL as it does in R #1862

Make / behave in SQL as it does in R #1862

Comments

iangow commented May 26, 2016 • edited Loading

hadley commented Jun 7, 2016

iangow commented Jun 7, 2016

krlmlr commented Jun 8, 2016

iangow commented Jun 8, 2016 • edited Loading

hadley commented Jun 8, 2016

krlmlr commented Jun 8, 2016 • edited Loading

hadley commented Jun 8, 2016

krlmlr commented Jun 8, 2016

Make `/` behave in SQL as it does in R #1862

Make `/` behave in SQL as it does in R #1862

iangow commented May 26, 2016 •

edited

Loading

iangow commented Jun 8, 2016 •

edited

Loading

krlmlr commented Jun 8, 2016 •

edited

Loading