
read_csv rendering bad data in return values #8273

Closed
innanetguru opened this issue Apr 22, 2024 · 8 comments
Labels
bug (This issue/PR relates to a bug) · module · plugins (plugin, any type)

Comments

@innanetguru

Summary

I'm using the read_csv module in a playbook to create a usable vars file in json format in subsequent plays/tasks. There seems to be a problem with rendering the return values in larger datasets (e.g. CSV with more than about 50 rows), as shown in the snippet below:

{"dict": {"\ufeffname": {"name": "\ufeffname", "subnet": "subnet", "netmask": "netmask"}}

I thought maybe the issue was occurring when the return data was passed to the copy module to be written in a json file in the next task, but running the debug module shows that the read_csv is producing this bad data somewhere and returning it to the original caller. Playbook below:


---
- name: Extract Subnet Data from CSV
  hosts: localhost
  connection: local
  gather_facts: no
  vars:
    - csv_to_import: 'test.csv'
    - field_names:
        - name
        - subnet
        - netmask
  tasks:
    - name: Extract Subnet data from CSV
      read_csv:
        delimiter: ','
        dialect: excel
        path: "{{ csv_to_import }}"
        key: "{{ field_names[0] }}"
        fieldnames: "{{ field_names }}"
      register: Data

    - name: Print "Data"
      debug:
        var: Data
        verbosity: 4

    - name: Create JSON VARS file
      copy:
        content: "{{ Data | to_json }} "
        dest: test.json
        owner: user
        mode: 0777

Clipped log output from playbook run below:

2024-04-22 09:17:25,349 p=13932 u=dgiardin n=ansible | ok: [localhost] => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "dict": {
        "name": {
            "name": "name",
            "netmask": "netmask",
            "subnet": "subnet"
        }
    },
    "invocation": {
        "module_args": {
            "delimiter": ",",
            "dialect": "excel",
            "fieldnames": [
                "name",
                "subnet",
                "netmask"
            ],
            "key": "name",
            "path": "hmf_sdwan_fwobj.csv",
            "skipinitialspace": null,
            "strict": null,
            "unique": true
        }
    },
    "list": []
}
2024-04-22 09:17:25,368 p=13932 u=user n=ansible | TASK [Print "Data"] ************************************************************************************************************************************************************************************************************
2024-04-22 09:17:25,714 p=13932 u=user n=ansible | ok: [localhost] => {
    "Data": {
        "ansible_facts": {
            "discovered_interpreter_python": "/usr/bin/python3"
        },
        "changed": false,
        "dict": {
            "name": {
                "name": "name",
                "netmask": "netmask",
                "subnet": "subnet"
            }
        },
        "failed": false,
        "list": []
    }
}

github_read_csv_issue

Issue Type

Bug Report

Component Name

read_csv, copy

Ansible Version

$ ansible --version
ansible [core 2.13.13]
  config file = /mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg
  configured module search path = ['/mnt/c/Users/user/Desktop/projects/ans-network-cloud/modules']
  ansible python module location = /home/user/.local/share/virtualenvs/ans-network-cloud-4k5bpl0c/lib/python3.8/site-packages/ansible
  ansible collection location = /home/user/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/user/.local/share/virtualenvs/ans-network-cloud-4k5bpl0c/bin/ansible
  python version = 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0]
  jinja version = 3.1.3
  libyaml = True

Community.general Version

$ ansible-galaxy collection list community.general
# /home/user/.local/share/virtualenvs/ans-network-cloud-4k5bpl0c/lib/python3.8/site-packages/ansible_collections
Collection        Version
----------------- -------
community.general 5.8.3  

Configuration

$ ansible-config dump --only-changed
DEFAULT_DEBUG(env: ANSIBLE_DEBUG) = False
DEFAULT_FILTER_PLUGIN_PATH(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = ['/mnt/c/Users/user/Desktop/projects/ans-network-cloud/filter_plugins']
DEFAULT_GATHERING(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = implicit
DEFAULT_HOST_LIST(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = ['/mnt/c/Users/user/Desktop/projects/ans-network-cloud/inventory']
DEFAULT_LOG_PATH(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = /mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.log
DEFAULT_MODULE_PATH(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = ['/mnt/c/Users/user/Desktop/projects/ans-network-cloud/modules']
DEFAULT_NO_TARGET_SYSLOG(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = True
DEFAULT_ROLES_PATH(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = ['/mnt/c/Users/user/Desktop/projects/ans-network-cloud/roles']
DEPRECATION_WARNINGS(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = False
HOST_KEY_CHECKING(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = False
PERSISTENT_COMMAND_TIMEOUT(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = 1000
PERSISTENT_CONNECT_TIMEOUT(/mnt/c/Users/user/Desktop/projects/ans-network-cloud/ansible.cfg) = 1000

OS / Environment

Ubuntu WSL2

Steps to Reproduce

---
- name: Extract Subnet Data from CSV
  hosts: localhost
  connection: local
  gather_facts: no 
  vars:
    - csv_to_import: 'test.csv'
    - field_names:
        - name
        - subnet
        - netmask
  tasks:
    - name: Extract Subnet data from CSV
      read_csv:
        delimiter: ','
        dialect: excel
        path: "{{ csv_to_import }}"
        key: "{{ field_names[0] }}"
        fieldnames: "{{ field_names }}"
      register: Data

    - name: Print "Data"
      debug:
        var: Data
        verbosity: 4
    
    - name: Create JSON VARS file 
      copy: 
        content: "{{ Data | to_json }} "
        dest: test.json
        owner: user
        mode: 0777

Expected Results

A CSV with hundreds of rows seems to trigger this issue. I haven't found the exact number of rows at which the problem starts happening.

Actual Results

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@ansibullbot
Collaborator

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

@ansibullbot
Collaborator

@ansibullbot ansibullbot added bug This issue/PR relates to a bug module module plugins plugin (any type) labels Apr 22, 2024
@innanetguru
Author

Attached is the actual JSON file created in task #2 (error only):
read_csv_error.json

@felixfontein
Collaborator

It looks like your Excel file contains a Byte Order Mark (https://en.wikipedia.org/wiki/Byte_order_mark#Usage). This was already reported in the past in #544, and fixed in #6662. The fix landed in community.general 6.x.y and 7.x.y, and is contained in all later releases. The 5.8.3 version you are using does not have this fix.

Please upgrade your community.general version; the version you are using is End of Life and will no longer get updated.
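For context on how the stray `\ufeff` arises, here is a minimal Python sketch of the mechanism (the assumption that the module uses Python's `csv` machinery and a plain `utf-8` decode in the affected versions is mine, inferred from the symptom):

```python
import csv
import io

# CSV bytes as Excel typically saves UTF-8: with a leading byte order mark.
raw = b"\xef\xbb\xbfname,subnet,netmask\nSDWAN_agg1,10.0.10.0/24,255.255.254.0\n"

# Decoding with plain 'utf-8' keeps the BOM, so it sticks to the first header field.
bad = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
print(list(bad[0])[0])   # '\ufeffname'

# Decoding with 'utf-8-sig' strips the BOM, which is what the fixed versions achieve.
good = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))
print(list(good[0])[0])  # 'name'
```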

@innanetguru
Author

@felixfontein I was able to update community.general to the latest version and after rerunning the playbook, I'm still getting a return value that contains this in the returned dict:

"name": {
    "name": "name",
    "netmask": "netmask",
    "subnet": "subnet"
}

So while the '\ufeff' byte order mark has been removed, it still doesn't explain why the read_csv module seems to create a k/v pair where v=k. I can write a filter to deal with this behavior so that it doesn't cause errors in the playbooks that use the collection, but it seems this should be investigated further.

I'll run a few more tests with some subsets of the existing CSV to see exactly at which point the module starts adding these entries, and will post the results.
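Such a filter could be sketched as a filter plugin like the following (a hypothetical example: the name `drop_header_rows` and its file under `filter_plugins/` are illustrative, not part of the collection):

```python
# Hypothetical filter plugin sketch, e.g. filter_plugins/csv_cleanup.py.
# Drops dictionary entries whose row merely echoes the field names,
# i.e. the entry produced when a header row is parsed as data.
def drop_header_rows(data):
    """Remove entries where every value equals its own key."""
    return {
        key: row
        for key, row in data.items()
        if not all(field == value for field, value in row.items())
    }


class FilterModule(object):
    """Standard Ansible filter plugin entry point."""

    def filters(self):
        return {"drop_header_rows": drop_header_rows}
```

In the playbook it could then be applied as `{{ Data.dict | drop_header_rows }}` before the result is passed to `to_json`.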

@felixfontein
Collaborator

I'm not sure what you mean by "bad data" then. In your examples, the module seems to do exactly what it should when the key option is used. See the first example in the official module docs:

# Example CSV file with header
#
#   name,uid,gid
#   dag,500,500
#   jeroen,501,500

# Read a CSV file and access user 'dag'
- name: Read users from CSV file and return a dictionary
  community.general.read_csv:
    path: users.csv
    key: name
  register: users
  delegate_to: localhost

- ansible.builtin.debug:
    msg: 'User {{ users.dict.dag.name }} has UID {{ users.dict.dag.uid }} and GID {{ users.dict.dag.gid }}'

@innanetguru
Author

innanetguru commented Apr 24, 2024

@felixfontein ok, it took me a minute, but I realize now what is happening....

The module is creating a dictionary entry out of the first row of the CSV, which contains the column headers. I ran a test with a file that contained the column headers in row 1 (name, subnet, netmask) and data in each column of row 2, and got this back:

"Data": {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "dict": {
        "SDWAN_agg1": {
            "name": "SDWAN_agg1",
            "netmask": "255.255.254.0",
            "subnet": "10.0.10.0/24"
        },
        "name": {
            "name": "name",
            "netmask": "netmask",
            "subnet": "subnet"
        }
    },
    "failed": false,
    "list": []
}

I don't think this is the expected behavior for the module.
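The behavior is consistent with Python's `csv.DictReader`, which the module appears to be built on (an assumption about the module internals): when `fieldnames` is passed explicitly, the first line of the file is not treated as a header and is parsed as an ordinary data row.

```python
import csv
import io

data = "name,subnet,netmask\nSDWAN_agg1,10.0.10.0/24,255.255.254.0\n"

# With explicit fieldnames, DictReader does NOT skip the first line:
# the header row comes back as a data row whose values equal the field names.
rows = list(csv.DictReader(io.StringIO(data), fieldnames=["name", "subnet", "netmask"]))
print(rows[0])  # {'name': 'name', 'subnet': 'subnet', 'netmask': 'netmask'}

# Without fieldnames, the first line is consumed as the header instead.
rows = list(csv.DictReader(io.StringIO(data)))
print(rows[0])  # {'name': 'SDWAN_agg1', 'subnet': '10.0.10.0/24', 'netmask': '255.255.254.0'}
```

So omitting `fieldnames` when the CSV already carries a header row would avoid the extra `name` entry.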

@felixfontein
Collaborator

The module has behaved this way for many years, so I would argue that yes, this is the expected behavior for the module.

That doesn't mean that this behavior is great, or shouldn't be configurable.
