Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: insert/values does not convert Number to String #3282

Conversation

spena
Copy link
Member

@spena spena commented Aug 28, 2019

Description

Insert integer values to a String column throws a NPE.

ksql> create stream stream1(id varchar) ...
ksql> insert into stream1(id) values (1);
Failed to insert values into stream/table: STREAM1
Caused by: java.lang.NullPointerException

Other databases support this conversion. This PR fixes this to allow integers to be converted to strings in the above case.

Testing done

  • Added a test case
  • Verified manually on the CLI

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@spena spena requested review from agavra and a team August 28, 2019 18:29
@spena spena self-assigned this Aug 28, 2019
agavra
agavra previously approved these changes Aug 28, 2019
Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @spena!

@@ -58,6 +58,10 @@
return Optional.empty();
}

if (targetType.baseType() == SqlBaseType.STRING) {
return optional(String.valueOf(value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should coerce all types - what if it's a struct, or an array (I know we don't support these yet, but it might make sense to just explicitly convert the types that we what we want to)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is called after the value has been checked for a Number type (see the previous if). So at this point, we're only converting Number types to String. Or am I missing something?

@agavra agavra requested a review from a team August 28, 2019 20:04
@blueedgenick
Copy link
Contributor

woah, this feels rather strange.... i understand why some kinds of coercion have to happen when inserting literal but this seems like the principal one that isn't needed ? e.g. inserting to a date column (if we had such a thing yet) clearly can't expect the user to put a value "of type date" in their literal and so whatever she types is coerced to the correct form using some default rules. But surely in the case of numeric values we can sensibly make the distinction - isn't it basically the same rules as JSON, where "1" is a string and 1 is a number type ? Reason for asking is that i always like to err on the side of being cautious with introducing "magical" casting/coercing features, they always seem to return to bite one in the rear by surprising users in unforeseen corner cases more than they delight with their helpfulness

@spena
Copy link
Member Author

spena commented Sep 4, 2019

@blueedgenick It might not be needed, but I've seen this behavior from INT to STRING conversions in all the DBs, so I saw it useful to have it working in KSQL too if we want to behave like a DB instead of fixing the error message, right?.

Anyway, I'm open to change the error message if we see this coercion problematic for some users. What do you think?

@spena spena force-pushed the insert_values_coerce_numbers_to_string branch from 1e1ec8d to 66cc641 Compare September 4, 2019 16:32
Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @spena

I spotted this too when I was adding the new RestQueryTranslationTest that is waiting in PR #3291. I've included the additional tests I wanted to add at the bottom for reference.

I don't think we should get into the game of doing magic coercion to the correct type if the user supplies the wrong type. If the column type is a STRING, then the user should supply a STRING. If they supply a number, then we should not convert it to a STRING, IMHO.

I think we should just fix the NPE and return a better error message.

The complication comes about with the handling of key fields. If the user supplies a value for ROWKEY, (which is a string), but not for their key field, (i.e. the WITH(KEY=x)), and the key field is another type, e.g. INT, then what do we do? Currently the code fails with NPE.

Given the key field functionality is on its way out, I don't think we should invest too much time here. I think the best thing is just to return a better error message, saying:

The KEY field defined for the stream has a different type to ROWKEY. Please supply a valid value for the KEY field in the statement.

OR some such. (Of course, the code will still need to coerce the type to a STRING to check it matches).

@almog might be better placed to look into this, as he knows the code better.

Here's the additional tests I added to insert-values.json test, but then had to remove cos they failed!

    {
      "name": "should set key field from rowkey with type coercion",
      "statements": [
        "CREATE STREAM TEST (ID INT) WITH (kafka_topic='test_topic', value_format='JSON', PARTITIONS=1, KEY='ID');",
        "INSERT INTO TEST (ROWKEY) VALUES ('10');"
      ],
      "inputs": [
      ],
      "outputs": [
        {"topic": "test_topic", "key": "10", "value": {"ID": 10}}
      ]
    },
    {
      "name": "should set rowkey from key field with type coercion",
      "statements": [
        "CREATE STREAM TEST (ID INT) WITH (kafka_topic='test_topic', value_format='JSON', PARTITIONS=1, KEY='ID');",
        "INSERT INTO TEST (ID) VALUES (10);"
      ],
      "inputs": [
      ],
      "outputs": [
        {"topic": "test_topic", "key": "10", "value": {"ID": 10}}
      ]
    },
    {
      "name": "should fail if value can not be coerced to column type",
      "statements": [
        "CREATE STREAM TEST (ID INT) WITH (kafka_topic='test_topic', value_format='JSON', PARTITIONS=1, KEY='ID');",
        "INSERT INTO TEST (ROWKEY) VALUES (10);"
      ],
      "expectedError": {
        "type": "io.confluent.ksql.util.KsqlStatementException",
        "message": "Can not coerce thing"
      }
    },
    {
      "name": "should fail if rowkey can not be coerced to key field column type",
      "statements": [
        "CREATE STREAM TEST (ID INT) WITH (kafka_topic='test_topic', value_format='JSON', PARTITIONS=1, KEY='ID');",
        "INSERT INTO TEST (ROWKEY, ROWTIME) VALUES ('10', 1234);"
      ],
      "inputs": [
      ],
      "outputs": [
        {"topic": "test_topic", "timestamp": 1234, "key": "10", "value": {"ID": 10}}
      ]
    }

@big-andy-coates
Copy link
Contributor

Closing this as its been fixed by #3355

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants