Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-230] DBT seed converts empty strings to nulls #4725

Closed
snagapuri-mdsol opened this issue Feb 14, 2022 · 3 comments
Closed

[CT-230] DBT seed converts empty strings to nulls #4725

snagapuri-mdsol opened this issue Feb 14, 2022 · 3 comments
Labels
bug Something isn't working seeds Issues related to dbt's seed functionality

Comments

@snagapuri-mdsol
Copy link

Describe the bug

DBT seed process converts

  • ,"",
  • ,,
  • ,NULL,

all above cases to NULLs

I would say atleast quoted empty strings should be allowed to be as blank as its being explicit enough to be used as blank: ,"",

Steps to reproduce

Add any csv seed file with data like below:
column1,column2,column3
NULL,"",

All the above three columns produce NULLs in database table

Expected results

NULL,,NULL
Atleast second column in the above example can be left alone to be blank in order to have an option in seed process to add blanks in database tables

Actual results

NULL,NULL,NULL

System information

Which database are you using dbt with?

  • snowflake

The output of dbt --version:

installed version: 1.0.1
   latest version: 1.0.1

Up to date!

Plugins:
  - snowflake: 1.0.0
@snagapuri-mdsol snagapuri-mdsol added bug Something isn't working triage labels Feb 14, 2022
@joellabes
Copy link
Contributor

Hi @snagapuri-mdsol, thanks for opening this!

Although you can define columns' types using https://docs.getdbt.com/reference/resource-configs/column_types, it doesn't seem to treat empty strings as blanks, instead converting them to nulls.

#2165 describes similar behaviour, in particular #2165 (comment)

I suspect the quotes are removed before agate even sees it, in python's csv.reader. In that case the only way to fix the problem would be to re-implement csv parsing (and then agate on top of that, but quote-aware). That sounds like a lot.

I'm going to move this to the dbt Core repo, where the undesirable behaviour lives. It sounds like in the past we've decided against implementing a fix, but let's see whether there's more appetite this time!

@joellabes joellabes transferred this issue from dbt-labs/dbt-utils Feb 15, 2022
@github-actions github-actions bot changed the title DBT seed converts empty strings to nulls [CT-230] DBT seed converts empty strings to nulls Feb 15, 2022
@snagapuri-mdsol
Copy link
Author

Hi @snagapuri-mdsol, thanks for opening this!

Although you can define columns' types using https://docs.getdbt.com/reference/resource-configs/column_types, it doesn't seem to treat empty strings as blanks, instead converting them to nulls.

#2165 describes similar behaviour, in particular #2165 (comment)

I suspect the quotes are removed before agate even sees it, in python's csv.reader. In that case the only way to fix the problem would be to re-implement csv parsing (and then agate on top of that, but quote-aware). That sounds like a lot.

I'm going to move this to the dbt Core repo, where the undesirable behaviour lives. It sounds like in the past we've decided against implementing a fix, but let's see whether there's more appetite this time!

Thanks @joellabes will keep watching for new updates

@jtcohen6 jtcohen6 added the seeds Issues related to dbt's seed functionality label Feb 21, 2022
@ChenyuLInx
Copy link
Contributor

@snagapuri-mdsol thanks for opening this! We are still going to stick to the behavior for empty strings in CSVs. Since it is implemented in python standard CSV reader, and it seems to be standard outside python.
To achieve the specific result you want, you can opt for a "placeholder" empty string value (e.g. EMPTY), coupled with a staging model that wraps around the seed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working seeds Issues related to dbt's seed functionality
Projects
None yet
Development

No branches or pull requests

4 participants