Skip to content
This repository has been archived by the owner on Jun 24, 2022. It is now read-only.

Fix systemd stat symlink #460

Merged
merged 3 commits into from
Jun 27, 2018
Merged

Conversation

toadjaune
Copy link

@toadjaune toadjaune commented Jun 26, 2018

Issue

Elasticsearch version: 6.3.0

Role version: 6.3.0

JVM version (java -version):

java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

OS version (uname -a if on a Unix-like system):

Linux hostname 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux

Ansible version :

ansible 2.5.5
  config file = None
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.5/dist-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.5.3 (default, Jan 19 2017, 14:11:04) [GCC 6.3.0 20170118]

Description of the problem including expected versus actual behaviour:

When deploying with systemd (use_system_d is set to true by the playbook, default behaviour on Debian 8 and above), if the elasticsearch systemd unit file does not exist, the playbook fails instead of creating it as a symlink.

Playbook:
The playbook we actually played is quite complicated, and was run number of times to overcome the numerous issues we met during an upgrade from ElasticSearch 5.6.2 to 6.3.0.

The following playbook however reproduces the issue we met :

- hosts: localhost
  tasks:
    # Setup
    - name: Create a dummy source file
      file:
        path: source
        state: touch

    - name: Make sure dest file doesn't exist
      file:
        path: dest
        state: absent

    # Issue reproduction
    - name: Check if dest file exists
      stat:
        path: dest
      register: stat_result

    - name: Remove if it is a normal file
      file:
        path: dest
        state: absent
      when: not stat_result.stat.islnk

    - name: Create a symlink
      file:
        state: link
        src: source
        path: dest
      when: not stat_result.stat.islnk

Provide logs from Ansible:
The above playbook gives the exact same error than the real one (except for the file paths), which is :

PLAY [localhost] ******************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************
ok: [localhost]

TASK [Create a dummy source file] *************************************************************************************************************************************
changed: [localhost]

TASK [Make sure dest file doesn't exist] ******************************************************************************************************************************
ok: [localhost]

TASK [Check if dest file exists] **************************************************************************************************************************************
ok: [localhost]

TASK [Remove if it is a normal file] **********************************************************************************************************************************
fatal: [localhost]: FAILED! => {"msg": "The conditional check 'not stat_result.stat.islnk' failed. The error was: error while evaluating conditional (not stat_result.stat.islnk): 'dict object' has no attribute 'islnk'\n\nThe error appears to have been in '/home/user/test_playbook.yml': line 20, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Remove if it is a normal file\n      ^ here\n"}
        to retry, use: --limit @/home/user/test_playbook.retry

PLAY RECAP ************************************************************************************************************************************************************
localhost                  : ok=4    changed=1    unreachable=0    failed=1

Proposed fix

According to the stat module documentation, most values in the return dict, including islnk are only defined if the target file exists. When it doesn't, the current code checks a non-existing dict entry, and ansible fails.

I can't explain with certitude why we ended up with no systemd unit file in the first place, but it is definitely possible (happened at some point of the upgrade on every single node we had), and ansible should not fail on something so easily fixable.

First version

Here is the fix I first came up with :

  • add a presence check with sysd_stat_result.stat.exists, to ensure not to test a non-existing dict key
  • remove the second test, because it is useless :
    • the file module is idempotent, and won't change anything if it is not required, nor declare anything has changed unless it is the case.
    • this can actually correct an incorrect symlink, which is probably more fail-proof than the current code.

Second version

The second version is the current PR, and is in my opinion way more elegant : we just get rid of all the test and the manual file suppression, and use the force option of the file module instead.

The only functional difference with the first solution is that ansible will silently create the symlink even if the file it points to does not exist (instead of failing), which is not supposed to be possible (because this file is created earlier in the same task file), so it should be safe.

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@toadjaune toadjaune changed the title Fix stat symlink Fix systemd stat symlink Jun 26, 2018
@Crazybus
Copy link
Contributor

jenkins test this please

@Crazybus
Copy link
Contributor

@toadjaune Thank you so much for the great writeup with all of the detail. I really appreciate a well written PR.

I'm going to let the tests run but I believe this is going to fail on the multi test suite. This playbook supports running multiple instances of elasticsearch on the same machine which is why this logic is more complex than it needs to be.

Create a symbolic link to the default systemd location to the first instance running on this host

This code you removed makes sure that the symbolic link is only created once for the first machine. It is needed to allow uninstalls of the elasticsearch package to work properly (it needs to do a systemctl stop elasticsearch) so that it can switch between the elasticsearch and elasticsearch-oss package.

Lets see what the tests say first, however I think this might need to be solved in a more ugly way :(

@Crazybus
Copy link
Contributor

Yup as I suspected. All of the systemd multi tests are failing.

image

If you want to test this locally we have some documentation on how to run these tests yourself. https://github.com/elastic/ansible-elasticsearch#testing

To reproduce this locall you would want to run.

make verify PATTERN=mutli-ubuntu-1604

@bilsch bilsch mentioned this pull request Jun 26, 2018
@Crazybus
Copy link
Contributor

jenkins test this please

@toadjaune
Copy link
Author

toadjaune commented Jun 27, 2018

As you should have seen, this is a simpler fix, just making sure that we don't try to access stat.islnk if it is not defined, without any change to the underlying logic.

I've been trying to run the tests locally for the last hour, then gave up. I'll be waiting for Jenkins again ! :)

@Crazybus
Copy link
Contributor

@toadjaune Thanks so much for the fix here! Once again I really really appreciate the great detailed pull request description :)

@Crazybus Crazybus merged commit 5889348 into elastic:master Jun 27, 2018
@toadjaune toadjaune deleted the fix-stat-symlink branch June 27, 2018 10:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants