jobs: adding job class #117

Merged
merged 2 commits into from
Apr 2, 2019

roksys
Contributor

@roksys roksys commented Mar 25, 2019

  • The jobManager superclass is responsible for job creation,
    execution/submission, deletion/stop, etc. This class
    should be inherited by child classes for specific backends
    (K8s, HTCondor, Slurm, etc.). Closes multiple job backend support #118

Co-authored-by: Diego Rodriguez Rodriguez [email protected]
Signed-off-by: Rokas Maciulaitis [email protected]
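
A minimal sketch of the kind of abstraction described above, assuming an abstract base class that backends subclass. The method names (`execute`, `stop`), the constructor arguments, and the `InMemoryJobManager` toy backend are illustrative assumptions, not the code from this PR.

```python
from abc import ABC, abstractmethod


class JobManager(ABC):
    """Hypothetical base class; names are illustrative, not the PR's API."""

    def __init__(self, docker_img, cmd):
        self.docker_img = docker_img
        self.cmd = cmd

    @abstractmethod
    def execute(self):
        """Run the job on the backend and return a backend job ID."""

    @abstractmethod
    def stop(self, backend_job_id):
        """Stop a running job on the backend."""


class InMemoryJobManager(JobManager):
    """Toy backend standing in for K8s, HTCondor or Slurm."""

    def __init__(self, docker_img, cmd):
        super().__init__(docker_img, cmd)
        self.jobs = {}  # backend job ID -> status string

    def execute(self):
        backend_job_id = "job-{}".format(len(self.jobs) + 1)
        self.jobs[backend_job_id] = "running"
        return backend_job_id

    def stop(self, backend_job_id):
        self.jobs[backend_job_id] = "stopped"
```

Because `execute` and `stop` are abstract, instantiating `JobManager` directly raises `TypeError`, so each backend has to provide its own implementation.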

@roksys roksys added this to the v0.5.0 milestone Mar 25, 2019
@ghost ghost added the Status: in review label Mar 25, 2019
@roksys roksys force-pushed the job-class-abstraction branch from 9407aa5 to 3eee834 Compare March 25, 2019 15:50
return result
return wrapper

def before_submission(self):
Member

s/submission/execution/g everywhere? It seems more natural to take the r-j-controller's internal point of view and talk about executing jobs that were "already submitted" to it by the r-w-engine, e.g. thinking of k8s or docker execution scenarios first. For HPC/HTC scenarios, the jobs will indeed be sent (i.e. submitted) elsewhere, so from SLURM's point of view there will be another submission. From the r-j-c's internal point of view, however, this is just a way of executing the jobs it was asked to run; the r-j-c simply chooses to proxy them to another backend. (In other words, a "primary submission" happens between r-w-e and r-j-c, and only a "secondary submission" happens between r-j-c and SLURM for some backends, so to speak.) Hence my preference for speaking about executing rather than submitting jobs inside r-j-c in general.
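
With the proposed naming, the hook wrapper could look roughly like the sketch below. It assumes a decorator that calls `before_execution` before the wrapped method, which is what the `return result` / `return wrapper` fragment in the diff suggests; the decorator name `execution_hook` and the `ToyJobManager` class are hypothetical.

```python
import functools


def execution_hook(fn):
    """Call the before-execution hook, then the wrapped method."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        self.before_execution()
        result = fn(self, *args, **kwargs)
        return result
    return wrapper


class ToyJobManager:
    """Hypothetical manager used only to exercise the decorator."""

    def __init__(self):
        self.events = []

    def before_execution(self):
        """Before job execution hook."""
        self.events.append("before_execution")

    @execution_hook
    def execute(self):
        self.events.append("execute")
        return "backend-job-id"
```

Whether the backend runs the job itself (k8s, docker) or proxies it elsewhere (SLURM), the caller only sees `execute`, which matches the r-j-c internal point of view.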

@roksys roksys force-pushed the job-class-abstraction branch 2 times, most recently from 1ffee4e to ad6e794 Compare March 27, 2019 10:53
* The jobManager superclass is responsible for job creation,
  execution/submission, deletion/stop, etc. This class
  should be inherited by child classes for specific backends
  (K8s, HTCondor, Slurm, etc.).

Signed-off-by: Rokas Maciulaitis <[email protected]>
@roksys roksys force-pushed the job-class-abstraction branch 3 times, most recently from 8cdc3a1 to 5baa2e0 Compare March 28, 2019 16:52
return wrapper

def before_execution(self):
"""Before job submission hook."""
Member

Note that @lukasheinrich was trying out git-based syncing of the needed input/output files instead of rsync-based syncing.

@roksys roksys force-pushed the job-class-abstraction branch 3 times, most recently from 1467251 to 9ec8795 Compare March 29, 2019 14:17
@diegodelemos
Member

Works locally and looks good 👍 We just need to make Travis happy, polish a couple of things, and create issues for what needs to be done afterwards:

For now:

  • Make the Travis build pass: there are a couple of isort problems and we need to amend the tests (for instance, the after execution hook is no longer called, so the test will fail).
  • I've been checking why we write the status name as a string to the DB instead of using the JobStatus enum, and I can't find a reason. If possible, we should use the enum so the code is more readable and less error-prone.
  • Clean up the k8s.py module, removing the functionality that has been moved to kubernetes_job_manager.py.
  • Polish the documentation, removing HTCondor and perhaps adding a short description of the Kubernetes implementation.
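
To illustrate the JobStatus point, here is a minimal sketch of passing the enum around and converting it only at the storage boundary. The enum members are illustrative and may not match the real `JobStatus`; the standard-library `sqlite3` module and the `store_status` / `load_status` helpers are stand-ins, not the project's actual DB layer.

```python
import enum
import sqlite3


class JobStatus(enum.Enum):
    """Illustrative members; the real enum may define others."""
    created = 0
    running = 1
    finished = 2
    failed = 3


def store_status(conn, job_id, status):
    """Persist a status; reject anything that is not a JobStatus member."""
    if not isinstance(status, JobStatus):
        raise TypeError("status must be a JobStatus member")
    conn.execute(
        "INSERT OR REPLACE INTO job (id, status) VALUES (?, ?)",
        (job_id, status.name),
    )


def load_status(conn, job_id):
    """Read the stored name back and convert it to the enum member."""
    row = conn.execute(
        "SELECT status FROM job WHERE id = ?", (job_id,)
    ).fetchone()
    return JobStatus[row[0]]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id TEXT PRIMARY KEY, status TEXT)")
store_status(conn, "job-1", JobStatus.running)
```

A misspelled string such as `"runing"` is rejected before it reaches the database, which is the kind of error a bare string would let through.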

For later, to translate into issues:

  • Refactor k8s_watch_jobs. It is not yet clear how we are going to implement the other backends: either by calling them directly from RJC, or by creating Kubernetes jobs that are capable of running against specific backends. If we call the backends directly from RJC, we will need a job watcher per backend (e.g. htcondor_watch_jobs), so this implementation should serve as a reference.
  • Manage Kubernetes objects as we do in RWC, for example here, with the official library instead of using dicts.
  • Plug after_execution_hook into k8s_watch_jobs so that we actually execute it once the job is completely done.
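
The last item could be sketched as follows, assuming the watcher receives a stream of `(job_id, status)` events and calls the hook once a job reaches a final state. The function `watch_jobs`, the `FINAL_STATES` set, and the hook signature are hypothetical and do not reflect the current `k8s_watch_jobs` code.

```python
FINAL_STATES = {"finished", "failed"}


class ToyJobManager:
    """Hypothetical manager that records after-execution hook calls."""

    def __init__(self):
        self.after_hook_calls = []

    def after_execution(self, job_id, status):
        self.after_hook_calls.append((job_id, status))


def watch_jobs(events, job_manager):
    """Consume (job_id, status) events; fire the hook once per finished job."""
    done = set()
    for job_id, status in events:
        if status in FINAL_STATES and job_id not in done:
            done.add(job_id)
            job_manager.after_execution(job_id, status)
    return done
```

A per-backend watcher (e.g. for HTCondor) would only need to produce the same kind of event stream to reuse this logic.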

@roksys roksys force-pushed the job-class-abstraction branch 7 times, most recently from 6c52d82 to 0d21b8d Compare April 2, 2019 12:22
@roksys roksys force-pushed the job-class-abstraction branch from 0d21b8d to cb9d4d0 Compare April 2, 2019 12:31
@diegodelemos diegodelemos merged commit cb9d4d0 into reanahub:master Apr 2, 2019
@ghost ghost removed the Status: in review label Apr 2, 2019