Raise FileExistsError when running multiple workers #2039

Closed
elnikkis opened this issue Feb 24, 2017 · 4 comments
@elnikkis
Contributor

elnikkis commented Feb 24, 2017

When running multiple workers, the mkdir call in FileSystemTarget(LocalTarget).temporary_path() sometimes raises FileExistsError.

The output files of my parallel tasks share the same parent directory.
It seems that "raise_if_exists=False" in LocalFileSystem.mkdir does not work properly when running multiple workers.

test_luigi_bug.py:

import os.path
from subprocess import run
import luigi


class TestTask1(luigi.Task):
    param = luigi.IntParameter()

    def requires(self):
        return None

    def output(self):
        return luigi.LocalTarget(os.path.join('data', 'test_data', 'task1', 'sample{}.txt'.format(self.param)))

    def run(self):
        with self.output().temporary_path() as temp_output_path:
            run(['touch', temp_output_path])


class TestTask2(luigi.WrapperTask):
    def requires(self):
        return [TestTask1(param=i) for i in range(10)]

I executed this command:

PYTHONPATH='.' luigi --local-scheduler --module test_luigi_bug TestTask2 --workers 4 > luigi_bug.txt 2>&1

and I got this output:
luigi_bug.txt

Version

$ pip list | grep luigi
luigi (2.6.0)
$ python --version
Python 3.6.0
@kzuberi

kzuberi commented Apr 4, 2017

I am also encountering this (luigi 2.6.1). It looks like a race: the earlier check for path existence returns false, another task then creates the path first, and the os.makedirs() call fails because the leaf folder already exists.
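To make the race easier to see, here is a minimal sketch that reproduces it deterministically in a single process. The helper `racy_mkdir` and its `between_check_and_create` hook are hypothetical names for illustration only; they mimic a check-then-create pattern and are not luigi's actual code. The hook stands in for a second worker creating the directory inside the window between the existence check and the os.makedirs() call.

```python
import os
import tempfile


def racy_mkdir(path, between_check_and_create=None):
    """Check-then-create pattern (hypothetical helper, not luigi's code)."""
    if os.path.exists(path):
        return
    # Window in which another worker may create the same directory.
    if between_check_and_create is not None:
        between_check_and_create()
    # Raises FileExistsError on Python 3 if the leaf directory now exists.
    os.makedirs(path)


def simulate_race():
    """Return the name of the exception raised, or 'no error'."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, 'data', 'test_data', 'task1')
        try:
            # The lambda plays the role of the competing worker.
            racy_mkdir(target, between_check_and_create=lambda: os.makedirs(target))
        except FileExistsError as exc:
            return type(exc).__name__
        return 'no error'
```

With real workers the window is only occasionally hit, which would be consistent with the error showing up "sometimes" rather than on every run.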

@Tarrasch
Contributor

Tarrasch commented Apr 5, 2017

Ah, yes, the bad things with non-atomicity. Luckily this isn't super severe, as the tasks will be retried on the next luigi run. Anyhow, if we can make it atomic, that would be great. :)

@kalvdans
Contributor

kalvdans commented Apr 5, 2017

In Python 3.2 and above, we can use the exist_ok=True flag to os.makedirs(). In Python 2.7 I guess we have to catch the error, check errno, and reraise if errno != EEXIST.
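A rough sketch of the two approaches mentioned here, assuming a standalone helper (the name `makedirs_tolerant` is made up for illustration, and this is not necessarily what the eventual luigi fix looks like). The try/except variant works on both Python 2.7 and Python 3; the extra os.path.isdir() check is an added assumption so that a regular file sitting at the target path is still reported as an error.

```python
import errno
import os
import tempfile


def makedirs_tolerant(path):
    """Create path and its parents, ignoring 'already exists' (hypothetical helper)."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Re-raise unless the error is EEXIST and the path really is a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def demo():
    """Create the same directory repeatedly and report whether it exists."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, 'data', 'test_data', 'task1')
        makedirs_tolerant(target)
        makedirs_tolerant(target)  # second call hits EEXIST and is ignored
        os.makedirs(target, exist_ok=True)  # Python 3.2+ built-in equivalent
        return os.path.isdir(target)
```

Neither variant needs a prior os.path.exists() check, so there is no window between checking and creating for another worker to slip into.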

elnikkis added a commit to elnikkis/luigi that referenced this issue Apr 20, 2017
elnikkis added a commit to elnikkis/luigi that referenced this issue Apr 21, 2017
FileExistsError has the errno attribute.
dlstadther pushed a commit that referenced this issue Sep 19, 2017
* Fix os.makedirs issue in multithreading (#2039)

* Remove unnecessary attribute checking

FileExistsError has the errno attribute.

* Reduce complexity
@stale

stale bot commented Jul 31, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

@stale stale bot added the wontfix label Jul 31, 2018
@stale stale bot closed this as completed Aug 14, 2018
This was referenced Jun 29, 2022