KeyError when using Redshift #5308

minh5 · 2018-06-28T16:27:05Z

I'm trying to create line charts and time series charts with aggregates. In this case, I am summing the column itemsales over a date column, day. However, I keep getting this error:

- [x] I have checked the superset logs for python stacktraces and included it here as text if any

2018-06-28 12:07:29,604:INFO:root:Database.get_sqla_engine(). Masked URL: redshift+psycopg2://user:[email protected]:5439/testdb
2018-06-28 12:07:30,077:DEBUG:root:[stats_logger] (incr) loaded_from_source
2018-06-28 12:07:30,077:ERROR:root:u'SUM(itemsales)'
Traceback (most recent call last):
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/views/core.py", line 1107, in generate_json
    payload = viz_obj.get_payload()
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/viz.py", line 329, in get_payload
    payload['data'] = self.get_data(df)
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/viz.py", line 580, in get_data
    values=values)
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/pandas/core/frame.py", line 4468, in pivot_table
    margins_name=margins_name)
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/pandas/core/reshape/pivot.py", line 58, in pivot_table
    raise KeyError(i)
KeyError: u'SUM(itemsales)'

A bit of digging showed that the column names become lower case when the results are turned into a pandas DataFrame, but the metric name keeps its capitalization, as shown by the logs above. I set a trace and it showed exactly what I expected:

(Pdb) l
585  	                records=pt.to_dict(orient='index'),
586  	                columns=list(pt.columns),
587  	                is_group_by=len(fd.get('groupby')) > 0,
588  	            )
589  	        except:
590  ->	            import pdb; pdb.post_mortem()
591
592
593  	class PivotTableViz(BaseViz):
594
595  	    """A pivot table view, define your rows, columns and metrics"""
(Pdb) values
[u'SUM(itemsales)']
(Pdb) df.head()
                __timestamp  sum(itemsales)
0 2018-06-15 00:00:00+00:00             0.0
1 2018-06-11 00:00:00+00:00             0.0
2 2018-06-13 00:00:00+00:00             0.0
3 2018-06-09 00:00:00+00:00             0.0
4 2018-06-07 00:00:00+00:00             0.0
(Pdb) self.metrics
[u'SUM(itemsales)']
(Pdb) df.columns
Index([u'__timestamp', u'sum(itemsales)'], dtype='object')

The error occurred at line 578:

            pt = df.pivot_table(
                index=DTTM_ALIAS,
                columns=columns,
                values=values)
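
The mismatch can be reproduced outside Superset. The following is a minimal sketch, not Superset's code: the DataFrame is made-up data shaped like the `df.head()` output above, and the `columns` argument is left out for brevity. It only shows that `pivot_table` raises `KeyError` when a name in `values` differs from the column label by case.

```python
import pandas as pd

# Made-up result set: the aggregate column label comes back lower-cased,
# as in the df.columns output from the debugger session.
df = pd.DataFrame({
    "__timestamp": pd.to_datetime(["2018-06-07", "2018-06-09", "2018-06-11"]),
    "sum(itemsales)": [0.0, 0.0, 0.0],
})

# Metric name as configured in the datasource: upper-case aggregate.
values = ["SUM(itemsales)"]

missing_key = None
try:
    # pandas looks up each entry of `values` among the column labels;
    # the lookup is case-sensitive, so this raises KeyError.
    df.pivot_table(index="__timestamp", values=values)
except KeyError as exc:
    missing_key = exc.args[0]

print("pivot_table could not find:", missing_key)
```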

Make sure these boxes are checked before submitting your issue - thank you!

- [x] I have reproduced the issue with at least the latest released version of superset
- [x] I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

superset==0.25.6

Expected results

I expect either the metrics to be all lower-cased, or the column names of the result DataFrame to match the form used in the aggregate query.

Actual results

The DataFrame's column names are lower-cased, while the metrics retain their original capitalization.

Steps to reproduce

This was run on test data produced by a random number generator. I have seen this error in every case where I am using the SUM aggregation. The database is on Redshift and I have confirmed that I am using pandas==0.22.0.

I can push a fix to make the metrics lower cased or have the column name of the data frame match the metric but I'm not sure if that is the best way to approach this.
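
For illustration, the two options mentioned here could look roughly like this on the pandas side. This is only a sketch with made-up data; the variable names are hypothetical and nothing here is taken from Superset's code base.

```python
import pandas as pd

# Made-up result set with the lower-cased column label.
df = pd.DataFrame({
    "__timestamp": pd.to_datetime(["2018-06-07", "2018-06-09"]),
    "sum(itemsales)": [1.0, 2.0],
})
metrics = ["SUM(itemsales)"]

# Option 1: lower-case the metric names before pivoting.
values_lower = [m.lower() for m in metrics]
pt_lower = df.pivot_table(index="__timestamp", values=values_lower)

# Option 2: rename the DataFrame columns so they match the metric names,
# comparing case-insensitively; other columns are left untouched.
by_lower = {m.lower(): m for m in metrics}
df_renamed = df.rename(columns=lambda c: by_lower.get(c.lower(), c))
pt_renamed = df_renamed.pivot_table(index="__timestamp", values=metrics)

print(list(pt_lower.columns))
print(list(pt_renamed.columns))
```

Option 1 changes the label that ends up in the chart payload, while option 2 keeps the metric name as the user defined it.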


villebro · 2018-07-03T05:33:53Z

This sounds a lot like a similar problem with Snowflake/Oracle that was fixed in PR #4994. Can you try manually lowercasing the metric name from SUM(itemsales) to sum(itemsales) under the datasource properties? If that fixes the problem, I suggest adding the normalize_column_name override from the Snowflake/Oracle spec to the Redshift one; it forces the names to lower case, circumventing the problem for now.

minh5 · 2018-07-03T14:34:59Z

Yes, manually lowercasing the metric name fixes it. Where can I find the Snowflake/Oracle spec? I'll try to push a fix.

villebro · 2018-07-03T18:03:43Z

Just copy the following lines from the Snowflake spec to the Redshift one; that should do the trick. If it works, put through a PR.

https://github.com/apache/incubator-superset/blob/72d815c0f950ad655cb076ef8f8f8d01b65fe9aa/superset/db_engine_specs.py#L415-L417
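
As a rough illustration of the idea (not the linked lines themselves), an engine spec can expose a `normalize_column_name` hook that a subclass overrides to force lower case. The classes below are self-contained stand-ins written for this sketch; the exact signature and decorator used in Superset's `db_engine_specs.py` are assumptions here and should be checked against the linked source.

```python
class BaseEngineSpec:
    """Hypothetical stand-in for a base engine spec (not Superset's class)."""

    @classmethod
    def normalize_column_name(cls, column_name):
        # Default behaviour: leave the name as the user defined it.
        return column_name


class RedshiftEngineSpec(BaseEngineSpec):
    """Hypothetical stand-in showing the lower-casing override."""

    @classmethod
    def normalize_column_name(cls, column_name):
        # Lower-case the name so it matches the lower-cased column
        # labels seen in the result DataFrame.
        return column_name.lower()


metric = "SUM(itemsales)"
print(BaseEngineSpec.normalize_column_name(metric))
print(RedshiftEngineSpec.normalize_column_name(metric))
```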

minh5 · 2018-07-03T18:36:54Z

Thanks, @villebro. I tested it out and it worked beautifully, so I pushed up a PR.

mistercrunch · 2018-07-04T21:35:33Z

Merged the PR, closing issue

wyndhblb · 2018-10-22T14:19:35Z

Hi, does it seem like this merge got clobbered in a165aec#diff-6519edc75f2440a575cb22492f401100?

minh5 mentioned this issue Jul 3, 2018

normalize column names for Redshift #5337

Merged

mistercrunch closed this as completed Jul 4, 2018

minh5 mentioned this issue Sep 7, 2018

Improve support for BigQuery, Redshift, Oracle, Db2, Snowflake #5827

Merged

mrshu mentioned this issue Mar 5, 2021

KeyError when using uppercase Metrics with Athena (in Superset 1.0.1) #13481

Closed

3 tasks
