Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for the Docker Python SDK #2158

Merged
merged 50 commits into from
Oct 15, 2017
Merged

Support for the Docker Python SDK #2158

merged 50 commits into from
Oct 15, 2017

Conversation

apierleoni
Copy link
Contributor

@apierleoni apierleoni commented Jun 15, 2017

Description

Hi there, we did some work to run a data processing pipeline using Docker containers with Luigi using the Docker Python SDK and we would like to contribute it back.

Motivation and Context

These PR allows to use the Docker Python SDK to run Docker containers instead of launching them from a subprocess.
The main advantage over calling an external process is that all the communications done with the Docker daemon are through the API and not a subprocess, being able to handle exceptions properly and potentially connect to remote APIs (we did not test this scenario)
It also solve non trivial problems with the docker SDK like mounting a single file, and makes sure that there is enough space in the container to run the job by mounting a temporary directory from the host , avoiding the container size limit of 10Gb.

Have you tested this? If so, how?

Unit tests are included. But they require a Docker daemon to run. Not sure it is available in Travis.
We are currently running this in our staging environment and we are ready to run production pipelines with this contrib.

@dlstadther
Copy link
Collaborator

@apierleoni It would be great if you could separate "companies who use luigi" into another PR - that one can be merged quickly.

Also, looks like you've had some Travis errors; could you take a look and resolve? Thanks!

@elipapa
Copy link

elipapa commented Sep 11, 2017

@dlstadther the travis errors are not ours and apply to every PR. @Tarrasch has already acknowledged that they have yet to be fixed (see comment above: june 23)

on separating the PR: we have no problem waiting for this one to be merged for our "companies with luigi" link to show up. We'd rather keep them in the same PR for simplicity and just wait for the review for it to be merged.

@dlstadther
Copy link
Collaborator

@elipapa Those errors were resolved elsewhere. Your current errors are flake8.

@apierleoni
Copy link
Contributor Author

flake8 errors fixed now, all tests passed on travis

tox.ini Outdated
@@ -10,6 +10,7 @@ deps=
moto<1.0
HTTPretty==0.8.10
nose<2.0
docker
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to specify a specific version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes thanks, well spotted!

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no objections, I'm ok to merge this.

Copy link

@drobakowski drobakowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @apierleoni thanks a lot for all this work! I'm using this DockerTask for quite some time and it is working fine overall so far. Even though I had to adapt some parts to make it work properly for our use case. I will start with this first small review because I think the test are not working correctly when they are fixed. See my comments (volumes to binds) in the tests. After that I will add some additional comments and suggestions, for some later improvements or extensions.

image = "busybox"
name = "MountLocalFileAsVolume"
# volumes= {'/tmp/local_file_test': {'bind': local_file.name, 'mode': 'rw'}}
volumes = [local_file.name + ':/tmp/local_file_test']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be renamed to binds?

dummyopt = luigi.Parameter()
image = "busybox"
name = "MountLocalFileAsVolumeWithParam"
volumes = [local_file.name + ':/tmp/local_file_test']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be renamed to binds?

name = "MountLocalFileAsVolumeWithParamRedef"

@property
def volumes(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be renamed to binds?

# derive volumes (ie. list of container destination paths) from
# specified binds
self._volumes = [b.split(':')[1] for b in self._binds]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One addition I had to make to get this properly working in our scenario was to include a custom instance attribute self._complete = False and set this to True when the container exited.

**self.container_options)
self._client.start(container['Id'])

exit_status = self._client.wait(container['Id'])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+self._complete = True

I'm not sure anymore if I included this simply to True on purpose or if this should be set in subject to the exit_status.

In our use case we're chaining multiple DockerTask consecutive so that each task depend on the previous one and its output. As I'm quite new to working with luigi I don't have that much deep knowledge and experience so far, but I think I had some problems with the luigi scheduler while the scheduler was trying to resolve and start the DockerTask process chain properly. After including the previous changes and overwriting the complete methods, the chain worked fine for me. As far as I remember correctly, I've even got the tests to run fine after renaming volumes to binds. Perhaps this is use case specific, but maybe someone with better knowledge of luigi can overlook this better.

class ImportProcess(DockerTask):
    ....
    
    def complete(self):
        return self.output().exists() or self._complete
    ...

Of course this could also be implemented in the DockerTask class directly with maybe some modification like this:

class DockerTask(luigi.Task):
    ....
    def complete(self):
        return super(DockerTask, self).complete() or self._complete

@drobakowski
Copy link

I do have two more smaller additions, but in order to keep this PR simple, I will create a PR after this one will be merged. Thanks again for DockerTask, this helps us a lot! 👍

@dlstadther
Copy link
Collaborator

@apierleoni Where does this PR stand given @drobakowski 's review and comments?

@apierleoni
Copy link
Contributor Author

@drobakowski
I am glad DockerTask was useful for you, and than you for the review.
Yes I have renamed volume to binds, and changed default behaviour for tmp directory mounting. Tests are now passing in my system.
I have added docker to the services in travis, so now the test should run in CI

Regarding the second issue, I am not 100% sure I understood what you want to do with the self._complete attribute. The way we use DockerTask is to make sure that the Target is labeled as completed, so that Luigi will know if he needs to rerun the task or not.
Would you prefer to trigger an error in case the container quit status 0? how do you know it is not completed at that stage?

@dlstadther
tests should be improved now, maybe you can help here on what is the preferred way to store a task to be completed.

@apierleoni
Copy link
Contributor Author

some tests are failing for an error that looks related to boto non importing the google cloud platform module. this might not be limited to this PR
https://travis-ci.org/spotify/luigi/jobs/279538792#L565

@apierleoni
Copy link
Contributor Author

merged with current master, still failing for boto

@apierleoni
Copy link
Contributor Author

@dlstadther
I have fixed the errors with boto in the tests and resolved conflicts with master.
Should be clear to merge if you want.

@apierleoni
Copy link
Contributor Author

I guess the codecov issue is related with this https://github.com/codecov/support/issues/180

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I'm not concerned about that codecov thing. The tests pass, so let's merge and allow others the goodness of Luigi v1 DockerTask support!

@dlstadther dlstadther merged commit 4801fec into spotify:master Oct 15, 2017
@dlstadther
Copy link
Collaborator

Hey @apierleoni , do you have any insight into why two of the Docker tests started failing recently?

It'd be much appreciated if you could check it out! Thanks

@apierleoni
Copy link
Contributor Author

@dlstadther sorry for the late reply, is it still an issue? I cannot find the failing tests.

@dlstadther
Copy link
Collaborator

@apierleoni They've since been fixed!

This was referenced Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants