[Bug] DBT Seed in dbt-athena-community is not working for .csv with encoding utf-8 #415
Comments
Thanks for reaching out @juliodias20! Special characters worked for me when I tried it. Example (see below for my output):

Create this file:

Run these commands:

dbt seed
dbt show --inline 'select * from {{ ref("my_seed") }}'

See this output:

| id | some_text |
| -- | --------- |
| 1  | ABC       |
| 2  | Ã Á Í Ç   |
Thanks for the help @dbeatty10! Following your comment, I tested the same case with the Databricks adapter and it works correctly, so it really looks like a problem with the Athena adapter.
Hello, it works on my side with dbt-athena on Windows.
Hello @e-quili, thank you so much for your help! I tested this solution and it works! I will use it in my local development environment, but I still think there is a bug, since other adapters can detect the encoding of the .csv file. What do you think?
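To make "detecting the encoding" of a seed file concrete, here is a minimal sketch of a hypothetical helper, `guess_csv_encoding`, that checks for a UTF-8 byte-order mark, then tries a strict UTF-8 decode, and only then falls back to cp1252. It is not taken from dbt-core or from any adapter; it only illustrates the idea.

```python
import codecs


def guess_csv_encoding(raw: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical helper: guess the text encoding of a CSV file's raw bytes."""
    # A UTF-8 byte-order mark (often written by Excel) is an unambiguous signal.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume a legacy Windows code page instead.
        return fallback


if __name__ == "__main__":
    sample = "id,some_text\n2,Ã Á Í Ç\n"
    print(guess_csv_encoding(sample.encode("utf-8")))                     # utf-8
    print(guess_csv_encoding(codecs.BOM_UTF8 + sample.encode("utf-8")))   # utf-8-sig
    print(guess_csv_encoding(sample.encode("cp1252")))                    # cp1252
```

The fallback is still a guess: most byte sequences decode to *something* in a single-byte code page, so a wrong fallback can produce wrong characters without raising an error.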
I have a similar problem: I cannot even upload the CSV file to S3 when it contains words like Sedlišćo pódzajtšo, Dolna Łužyca, etc. This does not work with the dbt-athena adapter even though the text is UTF-8. I checked:

os.environ["PYTHONIOENCODING"]
sys.getfilesystemencoding()
sys.getdefaultencoding()

and they are all set to utf8. The issue is within agate. The function … So here I am actually stuck: these are the rows that cannot be processed, because somehow it always uses cp1252, which cannot represent some of these characters (for example ć and Ł). Here is some test data:
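The failure described above can be reproduced with plain Python, independent of dbt: UTF-8 can represent every character in these words, while cp1252 (the usual ANSI code page on Western-European Windows installations) covers š, ó and ž but not ć or Ł, so encoding the text as cp1252 raises `UnicodeEncodeError`. The sketch only demonstrates codec behavior; the helper name `find_unencodable` is hypothetical, and the sketch does not show where dbt or agate picks the encoding.

```python
def find_unencodable(text: str, encoding: str) -> list:
    """Return the distinct characters in `text` that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


sample = "Sedlišćo pódzajtšo, Dolna Łužyca"

# UTF-8 can encode this whole string, so nothing is reported.
print(find_unencodable(sample, "utf-8"))   # []
# cp1252 has š, ó and ž, but no ć or Ł.
print(find_unencodable(sample, "cp1252"))  # ['ć', 'Ł']

try:
    sample.encode("cp1252")
except UnicodeEncodeError as exc:
    # Encoding the whole string fails at the first unsupported character.
    print("cp1252 failed at position", exc.start, repr(sample[exc.start]))
```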
One additional note: if I just read my CSV and write it again (as dbt does), it works:
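For comparison, a read-and-rewrite round trip preserves these characters when both sides agree on the encoding. The sketch below uses the standard-library `csv` module rather than agate (which dbt uses for seeds), and passes `encoding="utf-8"` explicitly on both `open()` calls so the result does not depend on the machine's locale. The helper `roundtrip_csv` and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def roundtrip_csv(src: Path, dst: Path, encoding: str = "utf-8") -> list:
    """Read a CSV and write it back out, using the same explicit encoding both ways."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(src, newline="", encoding=encoding) as f:
        rows = list(csv.reader(f))
    with open(dst, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "my_seed.csv"       # hypothetical seed file
    dst = Path(tmp) / "my_seed_out.csv"
    src.write_text("id,some_text\n1,ABC\n2,Sedlišćo pódzajtšo\n", encoding="utf-8")
    rows = roundtrip_csv(src, dst)
    print(rows[2])  # ['2', 'Sedlišćo pódzajtšo']
```

Omitting `encoding=` from either `open()` call makes Python fall back to the locale's default encoding, which is where a Windows code page can slip in.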
Maybe someone can point me in the right direction: where is the local CSV file actually read in dbt? I cannot find where the agate table is created.

UPDATE: Okay, it is actually PowerShell causing the issues. If I use Git Bash, it works fine.
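Since the same code behaved differently in PowerShell and Git Bash, it may help to print the encoding settings of the Python process in each shell. Note that `PYTHONIOENCODING` only affects the standard streams and `sys.getfilesystemencoding()` only affects file *names*, while `open()` without an explicit `encoding=` uses the value reported by `locale.getpreferredencoding(False)` (which reports UTF-8 when Python's UTF-8 mode is on). The sketch below is a diagnostic only; the helper name `encoding_report` is hypothetical, and which setting actually differs between the two shells is not established in this thread.

```python
import locale
import os
import sys


def encoding_report() -> dict:
    """Collect the encoding-related settings of the current Python process."""
    return {
        # Environment variables; present only if the shell exports them.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING", "<unset>"),
        "PYTHONUTF8": os.environ.get("PYTHONUTF8", "<unset>"),
        # 1 when UTF-8 mode is active (PYTHONUTF8=1 or -X utf8).
        "sys.flags.utf8_mode": str(sys.flags.utf8_mode),
        # Encoding of the console output stream (what PYTHONIOENCODING changes).
        "sys.stdout.encoding": str(getattr(sys.stdout, "encoding", None)),
        # Used for file *names*, not file contents.
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        # Default for str.encode()/bytes.decode(); always utf-8 on Python 3.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # What open() uses for file contents when no encoding= is given.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name:38} {value}")
```

Running it once from PowerShell and once from Git Bash and comparing the two outputs is one way to narrow down the difference.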
Okay, for everyone having issues with seeds and dbt-athena on Windows: set this:
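One documented Python setting that makes file I/O default to UTF-8 on Windows is UTF-8 mode (PEP 540), enabled with the environment variable `PYTHONUTF8=1` or the `-X utf8` interpreter flag. Whether this is the setting the comment above refers to is an assumption. The sketch below starts a child interpreter with and without `PYTHONUTF8=1` to show the effect on the default encoding used by `open()`; the helper name `run_child` is hypothetical.

```python
import os
import subprocess
import sys

# Child script: report whether UTF-8 mode is on and what open() would default to.
CHILD = "import locale, sys; print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"


def run_child(utf8: bool) -> str:
    """Run a fresh interpreter with or without PYTHONUTF8=1 and return its output."""
    env = dict(os.environ)
    if utf8:
        env["PYTHONUTF8"] = "1"
    else:
        env.pop("PYTHONUTF8", None)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Typically "0 cp1252" on Western-European Windows, but depends on the system.
    print("without PYTHONUTF8:", run_child(False))
    print("with PYTHONUTF8=1: ", run_child(True))
```

Because the variable is read at interpreter start-up, it has to be set in the shell (or the environment of the process that launches `dbt`) before running `dbt seed`, not from inside an already-running Python process.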
Is this a new bug in dbt-core?
Current Behavior
I would like to start by apologizing if there is already a bug report on this subject, but I couldn't find it.
Let's go: I have a .csv file (sds_teste.csv) that I am using as a seed, like this:
So I execute the command `dbt seed -s sds_teste`, and dbt executes successfully. But when I execute a `select` to see the table created by the `dbt seed` command, I can see that the table does not show the special characters (accented letters) correctly. I already tried some things that I found around the internet, like passing `encoding: utf-8`, but I did not find anything that works.

My profiles.yml:
Expected Behavior
The expected behavior is that `dbt seed` can read a .csv file encoded in UTF-8.
Should be:
A text with special characters, like Ã, Á, Í, or Ç
Instead of:
A text with special characters, like �, �, �, or �
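The replacement characters in the "Instead of" line are consistent with one specific mix-up: the text being serialized as cp1252 bytes and later read back as UTF-8. The sketch below reproduces that exact line from the expected text under this assumption; it does not establish where in dbt-athena or Athena such a mismatch would happen.

```python
expected = "A text with special characters, like Ã, Á, Í, or Ç"

# Assumed step 1: the text is written with the Windows code page cp1252.
# Ã, Á, Í and Ç each become a single byte (0xC3, 0xC1, 0xCD, 0xC7).
raw = expected.encode("cp1252")

# Assumed step 2: a UTF-8 reader decodes those bytes. None of the four bytes
# forms a valid UTF-8 sequence with the byte after it, so each one is replaced
# by U+FFFD, the Unicode replacement character.
seen = raw.decode("utf-8", errors="replace")

print(seen)  # A text with special characters, like �, �, �, or �

# Decoding with the matching codec recovers the original text.
assert raw.decode("cp1252") == expected
```

Under the same assumption, characters that cp1252 cannot represent at all (such as the ć and Ł reported in the comments) would fail earlier with a `UnicodeEncodeError` rather than turn into �.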
Steps To Reproduce
1 - Install Python 3.11.9 on a Windows computer
2 - Create a Python virtual environment with `python -m venv`
3 - Install `dbt-core==1.8.7` and `dbt-athena-community==1.8.4` with `pip install`
4 - Create a dbt project
5 - Create a .csv file in the `seeds/` folder and write an example with special characters
6 - Configure `profiles.yml` to connect to AWS Athena (storage: AWS S3)
7 - Run the `dbt seed` command

Relevant log output
Environment
Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
No response