-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[explore] proper filtering of NULLs and '' #4651
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4651 +/- ##
==========================================
- Coverage 76.98% 76.91% -0.07%
==========================================
Files 44 44
Lines 8498 8522 +24
==========================================
+ Hits 6542 6555 +13
- Misses 1956 1967 +11
Continue to review full report at Codecov.
|
Definitely need this!! |
Did some work on the druid side. I can pick up the rest but i wanted to get some insight before i go on. https://github.com/hughhhh/incubator-superset/commit/3bc3bd46d05a96335f9663745f68429f10b5eff3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a [WIP] idea (tc-dc#8) for handling nulls that involves showing a tooltip next to the label, to distinguish between actual null
values and "null"
string values.
(I've only tested with druid, and so far only have it such that Enable Filter Select
must be on.)
@@ -61,7 +61,7 @@ export default class AlteredSliceTag extends React.Component { | |||
return '[]'; | |||
} | |||
return value.map((v) => { | |||
const filterVal = v.val.constructor === Array ? `[${v.val.join(', ')}]` : v.val; | |||
const filterVal = v.val && v.val.constructor === Array ? `[${v.val.join(', ')}]` : v.val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be my unfamiliarity, but why not use Array.isArray
like in other parts of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't write that line but FWIW someone on StackOverflow says it's the best way
https://stackoverflow.com/questions/767486/how-do-you-check-if-a-variable-is-an-array-in-javascript
f513bc8
to
bf32f56
Compare
e0956e6
to
aeb5bf2
Compare
superset/connectors/base/models.py
Outdated
def handle_single_value(v): | ||
# backward compatibility with previous <select> components | ||
if isinstance(v, basestring): | ||
v = v.strip().strip("'").strip('"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do a single strip to remove whitespace and single/double quotes:
v = v.strip(' \'"')
Or, to also take care of tabs and line feeds:
v = v.strip('\t\n \'"')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh neat
superset/connectors/base/models.py
Outdated
return v | ||
if isinstance(values, (list, tuple)): | ||
values = [handle_single_value(v) for v in values] | ||
values = handle_single_value(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing... I'd add the second call to an else
block (unless I'm missing something):
if isinstance(values, (list, tuple)):
values = [handle_single_value(v) for v in values]
else:
values = handle_single_value(values)
superset/connectors/base/models.py
Outdated
values = [handle_single_value(v) for v in values] | ||
values = handle_single_value(values) | ||
if is_list_target and not isinstance(values, (tuple, list)): | ||
values = [values] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if is_list_target
is true but values
is a tuple? In this case it would return a tuple, is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable should probably be called is_iterable_target
, both tuple and list will work here.
superset/connectors/druid/models.py
Outdated
eq = utils.string_to_num(eq) | ||
|
||
eq = cls.filter_values_handler( | ||
eq, is_list_target=op in ('in', 'not in'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit hard to read, can we define is_list_target
outside the function call, like you did for is_numeric_col
?
superset/connectors/sqla/models.py
Outdated
eq = self.filter_values_handler( | ||
flt.get('val'), | ||
target_column_is_numeric=col_obj.is_num, | ||
is_list_target=op in ('in', 'not in')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above here.
superset/viz.py
Outdated
@@ -1693,6 +1693,8 @@ def run_extra_queries(self): | |||
for flt in filters: | |||
qry['groupby'] = [flt] | |||
df = self.get_df_payload(query_obj=qry).get('df') | |||
print(df) | |||
print(df.dtypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d2d
tests/druid_func_tests.py
Outdated
filtr = {'col': 'A', 'op': '==', 'val': []} | ||
res = DruidDatasource.get_filters([filtr], []) | ||
self.assertEqual('', res.filter['filter']['value']) | ||
self.assertEqual(None, res.filter['filter']['value']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use assertIsNone
instead.
tests/druid_func_tests.py
Outdated
|
||
def test_get_filters_handles_none_for_string_types(self): | ||
filtr = {'col': 'A', 'op': '==', 'val': None} | ||
res = DruidDatasource.get_filters([filtr], []) | ||
self.assertEqual('', res.filter['filter']['value']) | ||
self.assertEqual(None, res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
@betodealmeida addressed your comments |
TODO: handling of Druid equivalents
Error "unorderable types: str() < int()" occurs when grouping by a numerical Druid colummn that contains null values. * druid/pydruid returns strings in the datafram with NAs for nulls * Superset has custom logic around get_fillna_for_col that fills in the NULLs based on declared column type (FLOAT here), so now we have a mixed bag of type in the series * pandas chokes on pivot_table or groupby operations as it cannot sorts mixed types The approach here is to stringify and fillna('<NULL>') to get a consistent series.
* [WiP] [explore] proper filtering of NULLs and '' TODO: handling of Druid equivalents * Unit tests * Some refactoring * [druid] fix 'Unorderable types' when col has nuls Error "unorderable types: str() < int()" occurs when grouping by a numerical Druid colummn that contains null values. * druid/pydruid returns strings in the datafram with NAs for nulls * Superset has custom logic around get_fillna_for_col that fills in the NULLs based on declared column type (FLOAT here), so now we have a mixed bag of type in the series * pandas chokes on pivot_table or groupby operations as it cannot sorts mixed types The approach here is to stringify and fillna('<NULL>') to get a consistent series. * typo * Fix druid_func tests * Addressing more comments * last touches
* [WiP] [explore] proper filtering of NULLs and '' TODO: handling of Druid equivalents * Unit tests * Some refactoring * [druid] fix 'Unorderable types' when col has nuls Error "unorderable types: str() < int()" occurs when grouping by a numerical Druid colummn that contains null values. * druid/pydruid returns strings in the datafram with NAs for nulls * Superset has custom logic around get_fillna_for_col that fills in the NULLs based on declared column type (FLOAT here), so now we have a mixed bag of type in the series * pandas chokes on pivot_table or groupby operations as it cannot sorts mixed types The approach here is to stringify and fillna('<NULL>') to get a consistent series. * typo * Fix druid_func tests * Addressing more comments * last touches
* [WiP] [explore] proper filtering of NULLs and '' TODO: handling of Druid equivalents * Unit tests * Some refactoring * [druid] fix 'Unorderable types' when col has nuls Error "unorderable types: str() < int()" occurs when grouping by a numerical Druid colummn that contains null values. * druid/pydruid returns strings in the datafram with NAs for nulls * Superset has custom logic around get_fillna_for_col that fills in the NULLs based on declared column type (FLOAT here), so now we have a mixed bag of type in the series * pandas chokes on pivot_table or groupby operations as it cannot sorts mixed types The approach here is to stringify and fillna('<NULL>') to get a consistent series. * typo * Fix druid_func tests * Addressing more comments * last touches
closes #4297