read_csv rendering bad data in return values #8273
Attached is the actual JSON file created in task #2 (error only).
It looks like your Excel file contains a Byte Order Mark (https://en.wikipedia.org/wiki/Byte_order_mark#Usage). This was already reported in the past in #544, and fixed in #6662. The fix landed in community.general 6.x.y and 7.x.y, and is contained in all later releases. The 5.8.3 version you are using does not have this fix. Please upgrade your community.general version; the version you are using is End of Life and will no longer get updated.
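For reference, the BOM behaviour is easy to reproduce with Python's csv module directly. This is a minimal sketch; the sample bytes are invented, and using `utf-8-sig` here mirrors the spirit of the fix rather than quoting the actual patch:

```python
import csv
import io

# Simulated CSV bytes as Excel saves them: UTF-8 with a byte order mark.
raw = b"\xef\xbb\xbfname,subnet,netmask\r\nnet1,10.0.0.0,255.255.255.0\r\n"

# Decoding with plain 'utf-8' keeps the BOM, so it leaks into the first
# field name -- this is where a key like '\ufeffname' comes from.
with_bom = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
print(list(with_bom[0]))     # ['\ufeffname', 'subnet', 'netmask']

# 'utf-8-sig' strips the BOM before the csv parser ever sees it.
without_bom = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))
print(list(without_bom[0]))  # ['name', 'subnet', 'netmask']
```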
@felixfontein I was able to update community.general to the latest version, and after rerunning the playbook I'm still getting a return value whose dict contains `"name": {"name": "name", "netmask": "netmask", "subnet": "subnet"}` (see the debug output below). So while the '\ufeff' byte order mark has been removed, it still doesn't explain why the read_csv module seems to create a k/v pair where v=k. I can write a filter to deal with this behavior so that it doesn't cause errors in the playbooks that use the collection, but it seems this should maybe be investigated further. I'll run a few more tests with some subsets of the existing CSV to see exactly at which point the module starts adding these entries, and will post the results.
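For what it's worth, the filter mentioned above could look roughly like this. This is a hypothetical sketch, not part of community.general; the plugin path and the name `drop_header_rows` are invented. It drops any entry whose values merely echo the field names:

```python
# filter_plugins/csv_cleanup.py -- hypothetical filter plugin

def drop_header_rows(result_dict):
    """Drop entries where every value equals its own field name,
    i.e. rows that are just an echo of the CSV header."""
    return {
        key: row
        for key, row in result_dict.items()
        if not all(field == value for field, value in row.items())
    }

class FilterModule(object):
    def filters(self):
        return {"drop_header_rows": drop_header_rows}
```

In a playbook it would then be applied as something like `clean_dict: "{{ Data.dict | drop_header_rows }}"`.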
I'm not sure what you mean with "bad data" then. In your examples, the module seems to do exactly what it should do when used as in the documented example:

```yaml
# Example CSV file with header
#
# name,uid,gid
# dag,500,500
# jeroen,501,500

# Read a CSV file and access user 'dag'
- name: Read users from CSV file and return a dictionary
  community.general.read_csv:
    path: users.csv
    key: name
  register: users
  delegate_to: localhost

- ansible.builtin.debug:
    msg: 'User {{ users.dict.dag.name }} has UID {{ users.dict.dag.uid }} and GID {{ users.dict.dag.gid }}'
```
@felixfontein ok, it took me a minute, but I realize now what is happening... the module is creating a dictionary entry out of the first row of the CSV, the one containing the column headers. I ran a test with a file that contained the column headers in row 1 (name, subnet, netmask) and data in each column of row 2, and got back a "Data" dict that again contains the header row as an entry (`"name": {"name": "name", "netmask": "netmask", "subnet": "subnet"}`, as in the debug output below). I don't think this is the expected behavior for the module.
The module has behaved this way for many years, so I would argue that yes, this is the expected behavior for the module. That doesn't mean that this behavior is great, or that it shouldn't be configurable.
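For anyone following along, the mechanics are visible in Python's csv.DictReader, which (as far as I can tell) is what read_csv builds on; the sample data below is invented. When fieldnames is supplied, DictReader does not consume the first row as a header, so the header row comes back as an ordinary record whose values equal the field names:

```python
import csv
import io

data = "name,subnet,netmask\nnet1,10.0.0.0,255.255.255.0\n"

# Without fieldnames, the first row is consumed as the header.
print(list(csv.DictReader(io.StringIO(data))))
# [{'name': 'net1', 'subnet': '10.0.0.0', 'netmask': '255.255.255.0'}]

# With fieldnames supplied, the header row is treated as data, which
# yields exactly the key-equals-value entry shown in the debug output.
print(list(csv.DictReader(io.StringIO(data),
                          fieldnames=["name", "subnet", "netmask"])))
# [{'name': 'name', 'subnet': 'subnet', 'netmask': 'netmask'},
#  {'name': 'net1', 'subnet': '10.0.0.0', 'netmask': '255.255.255.0'}]
```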
Summary
I'm using the read_csv module in a playbook to create a usable vars file in JSON format for subsequent plays/tasks. There seems to be a problem with rendering the return values for larger datasets (e.g. a CSV with more than about 50 rows), as shown in the snippet below:
{"dict": {"\ufeffname": {"name": "\ufeffname", "subnet": "subnet", "netmask": "netmask"}}
I thought maybe the issue was occurring when the return data was passed to the copy module to be written to a JSON file in the next task, but running the debug module shows that read_csv itself is producing this bad data somewhere and returning it to the original caller. Playbook below:
```yaml
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    csv_to_import: 'test.csv'
    field_names: [name, subnet, netmask]  # values taken from the module invocation in the log below
  tasks:
    - name: Extract Subnet data from CSV
      read_csv:
        delimiter: ','
        dialect: excel
        path: "{{ csv_to_import }}"
        key: "{{ field_names[0] }}"
        fieldnames: "{{ field_names }}"
      register: Data

    - name: Print "Data"
      debug:
        var: Data
        verbosity: 4

    - name: Create JSON VARS file
      copy:
        content: "{{ Data | to_json }} "
        dest: test.json
        owner: user
        mode: '0777'
```
Clipped log output from playbook run below:
2024-04-22 09:17:25,349 p=13932 u=dgiardin n=ansible | ok: [localhost] => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"dict": {
"name": {
"name": "name",
"netmask": "netmask",
"subnet": "subnet"
}
},
"invocation": {
"module_args": {
"delimiter": ",",
"dialect": "excel",
"fieldnames": [
"name",
"subnet",
"netmask"
],
"key": "name",
"path": "hmf_sdwan_fwobj.csv",
"skipinitialspace": null,
"strict": null,
"unique": true
}
},
"list": []
}
2024-04-22 09:17:25,368 p=13932 u=user n=ansible | TASK [Print "Data"] ************************************************************************************************************************************************************************************************************
2024-04-22 09:17:25,714 p=13932 u=user n=ansible | ok: [localhost] => {
"Data": {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"dict": {
"name": {
"name": "name",
"netmask": "netmask",
"subnet": "subnet"
}
},
"failed": false,
"list": []
}
}
Issue Type
Bug Report
Component Name
read_csv, copy
Ansible Version
Community.general Version
Configuration
OS / Environment
Ubuntu WSL2
Steps to Reproduce
Expected Results
A CSV with hundreds of rows seems to trigger this issue. I haven't found the exact number of rows at which the problem starts happening.
Actual Results