Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib/ftp: Clean up temporary files #2755

Merged
merged 4 commits into from
Aug 19, 2019
Merged

Conversation

jethron
Copy link
Contributor

@jethron jethron commented Aug 8, 2019

Description

On Python3, the target now stores the temporary files in a tempfile.TemporaryDirectory; once the target is destroyed/garbage collected the TemporaryDirectory should be too, and remove itself and all its contents, so the temporary files don't hang around any more using disk space. A better solution would be to rewrite the module to actually try and stream the data, but this seemed nontrivial with ftplib (though possibly easier with pysftp's Connection.getfo()).

Motivation and Context

I started using the contrib/ftp targets only to discover that it doesn't actually 'stream' data as the documentation implies, but it downloads entire files to a file in /tmp which doesn't get cleaned up automatically, even after tasks complete. That worker's disk filled up and took it down temporarily.

Have you tested this? If so, how?

I'm running a similar patch and it works for me, but that one didn't keep the old behaviour for Python2 like this one does. Not sure if there's a good solution that easily works for both, but since Py2 is EOL in a few months and this behaviour is already ~5 years old, I didn't bother too much.

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution! Could you verify that this code section is covered by unittests?

I added the SFTP support many years ago. Candidly, I didn't do a very good job. So feel free to improve upon it more! :)

luigi/contrib/ftp.py Outdated Show resolved Hide resolved
@jethron
Copy link
Contributor Author

jethron commented Aug 15, 2019

Thanks @dlstadther I've added checks to a unittest and found a bug in the process. Yay tests. :)

Ran with pipenv run python -m _test_ftp & and then pipenv run python -m unittest _test_ftp.py after installing pyftpdlib.

For future people testing with virtualenvs, if you don't want to run the FTP server as root (required to bind to port 21) you can update all the port references to something higher like 2100 and it works as a normal user.

Also let test work in Python 3.
Also make the mtime offset larger because it fails in my timezone.
Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dlstadther dlstadther merged commit 9e0061d into spotify:master Aug 19, 2019
GoodDok pushed a commit to GoodDok/luigi that referenced this pull request Aug 23, 2019
* contrib/ftp: Clean up temporary files

* contrib/ftp: generate temporary filenames once

* contrib/ftp: slightly cleaner temporary directory names

* contrib/ftp: add testcase for tempfile cleanup

Also let test work in Python 3.
Also make the mtime offset larger because it fails in my timezone.
GoodDok pushed a commit to GoodDok/luigi that referenced this pull request Aug 23, 2019
* contrib/ftp: Clean up temporary files

* contrib/ftp: generate temporary filenames once

* contrib/ftp: slightly cleaner temporary directory names

* contrib/ftp: add testcase for tempfile cleanup

Also let test work in Python 3.
Also make the mtime offset larger because it fails in my timezone.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants