read_csv rendering bad data in return values #8273
Attached is the actual JSON file created in task #2 (error only).
It looks like your Excel file contains a Byte Order Mark (https://en.wikipedia.org/wiki/Byte_order_mark#Usage). This was already reported in the past in #544, and fixed in #6662. The fix landed in community.general 6.x.y and 7.x.y, and is contained in all later releases. The 5.8.3 version you are using does not have this fix. Please upgrade your community.general version; the version you are using is End of Life and will no longer get updated.
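For reference, the BOM behaviour is easy to reproduce with Python's csv module directly. This is a minimal sketch; the sample bytes are invented, and using `utf-8-sig` here mirrors the spirit of the fix rather than quoting the actual patch:

```python
import csv
import io

# Simulated CSV bytes as Excel saves them: UTF-8 with a byte order mark.
raw = b"\xef\xbb\xbfname,subnet,netmask\r\nnet1,10.0.0.0,255.255.255.0\r\n"

# Decoding with plain 'utf-8' keeps the BOM, so it leaks into the first
# field name -- this is where a key like '\ufeffname' comes from.
with_bom = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
print(list(with_bom[0]))     # ['\ufeffname', 'subnet', 'netmask']

# 'utf-8-sig' strips the BOM before the csv parser ever sees it.
without_bom = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))
print(list(without_bom[0]))  # ['name', 'subnet', 'netmask']
```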
@felixfontein I was able to update community.general to the latest version, and after rerunning the playbook I'm still getting a return value whose dict contains `"name": {"name": "name", "netmask": "netmask", "subnet": "subnet"}` (see the debug output below). So while the '\ufeff' byte order mark has been removed, it still doesn't explain why the read_csv module seems to create a k/v pair where v=k. I can write a filter to deal with this behavior so that it doesn't cause errors in the playbooks that use the collection, but it seems this should maybe be investigated further. I'll run a few more tests with some subsets of the existing CSV to see exactly at which point the module starts adding these entries, and will post the results.
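For what it's worth, the filter mentioned above could look roughly like this. This is a hypothetical sketch, not part of community.general; the plugin path and the name `drop_header_rows` are invented. It drops any entry whose values merely echo the field names:

```python
# filter_plugins/csv_cleanup.py -- hypothetical filter plugin

def drop_header_rows(result_dict):
    """Drop entries where every value equals its own field name,
    i.e. rows that are just an echo of the CSV header."""
    return {
        key: row
        for key, row in result_dict.items()
        if not all(field == value for field, value in row.items())
    }

class FilterModule(object):
    def filters(self):
        return {"drop_header_rows": drop_header_rows}
```

In a playbook it would then be applied as something like `clean_dict: "{{ Data.dict | drop_header_rows }}"`.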
I'm not sure what you mean with "bad data" then. In your examples, the module seems to do exactly what it should do when used as in the documented example:

```yaml
# Example CSV file with header
#
# name,uid,gid
# dag,500,500
# jeroen,501,500

# Read a CSV file and access user 'dag'
- name: Read users from CSV file and return a dictionary
  community.general.read_csv:
    path: users.csv
    key: name
  register: users
  delegate_to: localhost

- ansible.builtin.debug:
    msg: 'User {{ users.dict.dag.name }} has UID {{ users.dict.dag.uid }} and GID {{ users.dict.dag.gid }}'
```
@felixfontein ok, it took me a minute, but I realize now what is happening... the module is creating a dictionary entry out of the first row of the CSV, the one containing the column headers. I ran a test with a file that contained the column headers in row 1 (name, subnet, netmask) and data in each column of row 2, and got back a "Data" dict that again contains the header row as an entry (`"name": {"name": "name", "netmask": "netmask", "subnet": "subnet"}`, as in the debug output below). I don't think this is the expected behavior for the module.
The module has behaved this way for many years, so I would argue that yes, this is the expected behavior for the module. That doesn't mean that this behavior is great, or that it shouldn't be configurable.
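For anyone following along, the mechanics are visible in Python's csv.DictReader, which (as far as I can tell) is what read_csv builds on; the sample data below is invented. When fieldnames is supplied, DictReader does not consume the first row as a header, so the header row comes back as an ordinary record whose values equal the field names:

```python
import csv
import io

data = "name,subnet,netmask\nnet1,10.0.0.0,255.255.255.0\n"

# Without fieldnames, the first row is consumed as the header.
print(list(csv.DictReader(io.StringIO(data))))
# [{'name': 'net1', 'subnet': '10.0.0.0', 'netmask': '255.255.255.0'}]

# With fieldnames supplied, the header row is treated as data, which
# yields exactly the key-equals-value entry shown in the debug output.
print(list(csv.DictReader(io.StringIO(data),
                          fieldnames=["name", "subnet", "netmask"])))
# [{'name': 'name', 'subnet': 'subnet', 'netmask': 'netmask'},
#  {'name': 'net1', 'subnet': '10.0.0.0', 'netmask': '255.255.255.0'}]
```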
Summary
I'm using the read_csv module in a playbook to create a usable vars file in JSON format for subsequent plays/tasks. There seems to be a problem with rendering the return values for larger datasets (e.g. a CSV with more than about 50 rows), as shown in the snippet below:
{"dict": {"\ufeffname": {"name": "\ufeffname", "subnet": "subnet", "netmask": "netmask"}}
I thought maybe the issue was occurring when the return data was passed to the copy module to be written to a JSON file in the next task, but running the debug module shows that read_csv itself is producing this bad data somewhere and returning it to the original caller. Playbook below:
```yaml
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    csv_to_import: 'test.csv'
    field_names: [name, subnet, netmask]  # values taken from the module invocation in the log below
  tasks:
    - name: Extract Subnet data from CSV
      read_csv:
        delimiter: ','
        dialect: excel
        path: "{{ csv_to_import }}"
        key: "{{ field_names[0] }}"
        fieldnames: "{{ field_names }}"
      register: Data

    - name: Print "Data"
      debug:
        var: Data
        verbosity: 4

    - name: Create JSON VARS file
      copy:
        content: "{{ Data | to_json }} "
        dest: test.json
        owner: user
        mode: '0777'
```
Clipped log output from playbook run below:
2024-04-22 09:17:25,349 p=13932 u=dgiardin n=ansible | ok: [localhost] => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"dict": {
"name": {
"name": "name",
"netmask": "netmask",
"subnet": "subnet"
}
},
"invocation": {
"module_args": {
"delimiter": ",",
"dialect": "excel",
"fieldnames": [
"name",
"subnet",
"netmask"
],
"key": "name",
"path": "hmf_sdwan_fwobj.csv",
"skipinitialspace": null,
"strict": null,
"unique": true
}
},
"list": []
}
2024-04-22 09:17:25,368 p=13932 u=user n=ansible | TASK [Print "Data"] ************************************************************************************************************************************************************************************************************
2024-04-22 09:17:25,714 p=13932 u=user n=ansible | ok: [localhost] => {
"Data": {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"dict": {
"name": {
"name": "name",
"netmask": "netmask",
"subnet": "subnet"
}
},
"failed": false,
"list": []
}
}
Issue Type
Bug Report
Component Name
read_csv, copy
Ansible Version
Community.general Version
Configuration
OS / Environment
Ubuntu WSL2
Steps to Reproduce
Expected Results
A CSV with hundreds of rows seems to trigger this issue. I haven't found the exact number of rows at which the problem starts happening.
Actual Results