Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL grouping by column derived by expression repeats expression #263

Closed
machow opened this issue Aug 13, 2020 · 1 comment
Closed

SQL grouping by column derived by expression repeats expression #263

machow opened this issue Aug 13, 2020 · 1 comment
Labels
Milestone

Comments

@machow
Copy link
Owner

machow commented Aug 13, 2020

Need to reduce to a simpler example, but this shows the issue.

from siuba.sql.utils import mock_sqlalchemy_engine
from siuba.sql import LazyTbl

engine = mock_sqlalchemy_engine('postgresql')
tbl_sql = LazyTbl(engine, 'some_table_name', ['date', 'snow', 'snwd'])
from siuba import _, filter, mutate, group_by, arrange, if_else, show_query, summarize, head
ALTA2_PRCP = 1111
season = 'some_season'

q = (
    tbl_sql
    >> mutate(month = _.date.dt.month, year = _.date.dt.year, fluffy = _.snow/ALTA2_PRCP,
              season = if_else(_.month > 6,
                               _.year.astype(str) + '-' + (_.year - 1).astype(str),
                               (_.year - 1).astype(str) + '-' + _.year.astype(str)
                              )
             )
    >> group_by(_.season)
    >> summarize(max_snow = _.snow.max(), snow_days = _.snow.count(), max_snwd = _.snwd.max(), mean_fluffy = _.fluffy.mean())
    >> arrange(-_.season)
    >> filter(_.season == season)
    >> show_query(simplify = True)
)

Results in a query with the same Case When expression multiple times...

SELECT anon_1.season, anon_1.max_snow, anon_1.snow_days, anon_1.max_snwd, anon_1.mean_fluffy 
FROM (
    SELECT season, max_snow, snow_days, max_snwd, mean_fluffy 
    FROM (

        -- first case when
        SELECT CASE WHEN (anon_3.month > 6) THEN CAST(anon_3.year AS TEXT) || '-' || CAST(anon_3.year - 1 AS TEXT) ELSE CAST(anon_3.year - 1 AS TEXT) || '-' || CAST(anon_3.year AS TEXT) END AS season, max(anon_3.snow) AS max_snow, count(anon_3.snow) AS snow_days, max(anon_3.snwd) AS max_snwd, avg(anon_3.fluffy) AS mean_fluffy 
FROM (SELECT date, snow, snwd, EXTRACT(month FROM some_table_name.date) AS month, EXTRACT(year FROM some_table_name.date) AS year, some_table_name.snow / 1111 AS fluffy 
FROM some_table_name) AS anon_3
        GROUP BY 

            -- case when repeated
            CASE WHEN (anon_3.month > 6) THEN CAST(anon_3.year AS TEXT) || '-' || CAST(anon_3.year - 1 AS TEXT) ELSE CAST(anon_3.year - 1 AS TEXT) || '-' || CAST(anon_3.year AS TEXT) END ORDER BY -CASE WHEN (anon_3.month > 6) THEN CAST(anon_3.year AS TEXT) || '-' || CAST(anon_3.year - 1 AS TEXT) ELSE CAST(anon_3.year - 1 AS TEXT) || '-' || CAST(anon_3.year AS TEXT) END) AS anon_2) AS anon_1 
WHERE anon_1.season = 'some_season'
@machow
Copy link
Owner Author

machow commented Aug 29, 2020

Query now looks like this! (note formatted for ease of reading :)

SELECT anon_1.season, 
       anon_1.max_snow, 
       anon_1.snow_days, 
       anon_1.max_snwd, 
       anon_1.mean_fluffy 
FROM   (SELECT season, 
               Max(anon_2.snow)   AS max_snow, 
               Count(anon_2.snow) AS snow_days, 
               Max(anon_2.snwd)   AS max_snwd, 
               Avg(anon_2.fluffy) AS mean_fluffy 
        FROM   (SELECT DATE, 
                       snow, 
                       snwd, 
                       month, 
                       year, 
                       fluffy, 
                       CASE 
                         WHEN ( anon_3.month > 6 ) THEN 
                         Cast(anon_3.year AS TEXT) 
                         || '-' 
                         || 
                         Cast(anon_3.year - 1 AS TEXT) 
                         ELSE Cast(anon_3.year - 1 AS TEXT) 
                              || '-' 
                              || Cast(anon_3.year AS TEXT) 
                       END AS season 
                FROM   (SELECT DATE, 
                               snow, 
                               snwd, 
                               Extract(month FROM some_table_name.DATE) AS month 
                               , 
                               Extract( 
                       year FROM some_table_name.DATE)  AS year, 
                               some_table_name.snow / 1111              AS 
                               fluffy 
                        FROM   some_table_name) AS anon_3) AS anon_2 
        GROUP  BY anon_2.season 
        ORDER  BY -anon_2.season) AS anon_1 
WHERE  anon_1.season = 'some_season' ```

@machow machow closed this as completed Aug 30, 2020
@machow machow added this to siuba Jan 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

No branches or pull requests

1 participant