Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #657 - Refactor Jobs interface #686

Merged
merged 26 commits into from
Aug 14, 2018

Conversation

atilag
Copy link
Member

@atilag atilag commented Jul 25, 2018

  • Now there's only one constructor (no classmethod constructors).
  • Now there's a new submit() method that needs to be called in order
    to send the job to the backend, in both IBMQJob and LocalJob for
    consistency.
  • Removed exeception() property. Now if there's something wrong
    while submitting, checking for the status, or results, an exception
    will be thrown in this very moment.
  • Calling IBMQJob.id property will block until we have an ID from the
    API server.
  • IBMQjob.status() has changed. Now is a method and just returns the
    JobStatus enum. If there's still no job ID, it will return the
    current status immediately, so calling properties like done(),
    running(), queued(), etc, will return False.
  • Added new method: IBMQJob.queue_position() the will be populated
    during IBMQJob.status() and will return the position in the queue
    if the job has been queued.
  • Some minor internal refactorig to improve readability
  • Changed all tests accordingly to the new behavior

Fixes #657.

Details and comments

There are some more tests I'd like to add before merging this, but the functionality is finished.

if self._id is None:
if self._future is None:
raise JobError("You have to submit before asking for status or results!")
submit_info = self._future.result(timeout=60)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if a shorter timeout would be sufficient here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I guess it depends, for example if the user forgot to use job.done() to check for completion, then it won't be enough. But if she went through the common workflow then yeah, we can lower the timeout by a lot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about passing this as a parameter defaulting in 60.

Returns the position in the server queue

Returns:
Number: Position in the queue. 0 = No queued.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 = Not queued yet.

if self._id is None:
if self._future is None:
raise JobError("You have to submit before asking for status or results!")
submit_info = self._future.result(timeout=60)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can actually make this timeout user configurable by passing it in as an optional parameter to __init__

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this should be user configurable, but our current users won't instantiate Jobs manually... I think that in general we need to come up with a global solution for passing user configurations to "qiskit" as a whole, where some of the parameters will be this timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, I would make it a parameter of the method only.


def _create_job_from_circuit(circuit):
""" Helper function that creates a special Job required by the API, from a circuit """
job = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't blocking but "Job" in the code tends to indicate something which derives from BaseJob which can be confusing. Perhaps we could call api-type job data something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, makes sense.

return LocalJob(self._run_job, qobj)
local_job = LocalJob(self._run_job, qobj)
local_job.submit()
return local_job

def _run_job(self, qobj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should simplify all the non-async calls in the local backends to just _run(self, qobj)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's kind of repetitive.. I can change this.

@atilag atilag force-pushed the jobs_simplification branch from bfdbd95 to cb3c478 Compare July 27, 2018 14:47
atilag added 4 commits July 28, 2018 16:57
* Now there's only one constructor (no classmethod constructors).
* Now there's a new submit() method that needs to be called in order
  to send the job to the backend, in both IBMQJob and LocalJob for
  consistency.
* Removed exeception() property. Now if there's something wrong
  while submitting, checking for the status, or results, an exception
  will be thrown in this very moment.
* Calling IBMQJob.id property will block until we have an ID from the
  API server.
* IBMQjob.status() has changed. Now is a method and just returns the
  JobStatus enum. If there's still no job ID, it will return the
  current status immediately, so calling properties like done(),
  running(), queued(), etc, will return False.
* Added new method: IBMQJob.queue_position() the will be populated
  during IBMQJob.status() and will return the position in the queue
  if the job has been queued.
* Some minor internal refactorig to improve readability
* Changed all tests accordingly to the new behavior
  submiting the job to the API server.
* Added JobError exception
* Minor refactoring
* Test fixing
* Better naming variables
* Typos
@atilag atilag force-pushed the jobs_simplification branch from 5d2e034 to f8fecb4 Compare July 28, 2018 14:58
@atilag atilag changed the title [WIP] Issue 657 - Refactor Jobs interface Issue 657 - Refactor Jobs interface Jul 31, 2018
@atilag
Copy link
Member Author

atilag commented Jul 31, 2018

Ready for final reviews! :)

@atilag atilag added this to the 0.6 milestone Jul 31, 2018
@atilag atilag changed the title Issue 657 - Refactor Jobs interface Issue #657 - Refactor Jobs interface Aug 1, 2018
if something wrong happened to the Job, it's server responsability to answer an error response
to a status query from Qiskit.
But, if the exception is thrown before having a valid Job ID, there's nothing we can do,
with this Job instance, as we cannot query the server for information using this Job instance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the detailed docstring for the class, it is something we should expand to other parts of the codebase at some point 🎉

... which prompted me to pick up a nit which probably was overlooked in previous PRs: can you remove the

_final_states (list(JobStatus)): terminal states of async jobs

line a bit down below? That attribute seems not to be present anymore.

pass
self._creation_date = submit_info.get('creationDate')
self._status = JobStatus.QUEUED
self._id = submit_info.get('id')


def _reorder_bits(result):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not technically part of your changes, and more as a "wishlist" item: can you check line 511 in this function:

raise ResultError("creg sizes don't add up in result header.")

It seems ResultError expects a dict instead of a string: maybe it would make sense either to throw a JobError if it makes sense conceptually, or to revise the argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, a JobError makes sense here.

# TODO No need for this conversion, just use the new equivalent members above
if qobj is not None:
old_qobj = qobj_to_dict(qobj, version='0.0.1')
self._job_data = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we ensure that self._job_data is always present after instantiation, even if it is an empty dict of None or a similar measure? self.submit() accesses it directly assuming it is present, and even though it seems those lines are not really reachable if it is not present (thanks to the if at the beginning of the function), it will be great for maintenance to be 100% sure that all instances have the same attributes, and that they are defined in __init__.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, would make things clearer I guess.

if self._id is None:
if self._future is None:
raise JobError("You have to submit before asking for status or results!")
submit_info = self._future.result(timeout=60)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be needed to intercept the potential TimeoutError that might be raised by this line if the timeout is reached, even if unlikely? It seems it can ripple through the upwards and for example end up raising an uncontroled Exception when checking the .id property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very common use-case could be that the user would like to distinguish between a JobError() and a Timeout(), as the latter might be a temporary problem that can be solved by retrying.
A solution could be re-raising a JobTimeoutError (which will inherit from JobError, so QISKitError), this will give the semantic meaning the user needs, and comply with our exception policy: "everything thrown from the SDK must inherit from QISKitError".
Does it make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - however there seems it still might be possible that a property call raises an exception (ie. a call to job.id)? I think we tend to avoid it if possible, so if it is a concern, it might be just a matter of intercepting it during .id(self).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we will probably end up removing all properties and keep everything as methods, we are just discussing this right now.

@delapuente delapuente self-assigned this Aug 3, 2018
Copy link
Contributor

@delapuente delapuente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this PR includes a lot of changes and just considering the size I would say it's risky. However, I value the intention of simplifying the Job API and I think this is the correct direction. Let's discuss a little bit some of the changes. I'm afraid some of them could lead to patterns where a lot of network requests are done.

CHANGELOG.rst Outdated
@@ -33,6 +33,7 @@ Changed
- update the ``Qobj`` JSON schema. (#668, #677, #703, #709)
- update the local simulators for accepting ``Qobj`` as input. (#667)
- Use ``get_status_job()`` for checking IBMQJob status. (#641)
- Jobs refactoring for simplifying its usage and some minor bug fixes. (#686)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace its usage with their usage or refer to the ``Job`` API instead of Jobs to preserve the singular possessive.

@@ -44,77 +43,85 @@
class IBMQJob(BaseJob):
"""IBM Q Job class

This class represents the Jobs that will be executed on IBM Q backends
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for capitalizing Jobs here, change it to lowercase, please. Also, double check it is IBM Q and not IBM-Q, I'm not entirely sure to be honest.

This class represents the Jobs that will be executed on IBM Q backends
(simulators or real chips). It has some asynchronous properties, and some
blocking behavior. This is a common use-case example:
.. highlight::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a blank line before the directive.

except JobError as ex:
log("Something wrong happened!: {}".format(ex))

About the status in case of Error or Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the content and I agree with the reasoning but I'm not sure about the purpose of this clarification. What's the caveat, message or idea that the user should be aware of after reading these paragraphs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a clarification for the user so he knows exactly how our Error strategy works. I wanted to clearly state that receiving an Exception from a status() or result() call (or whatever other call that could imply a request to the server), doesn't mean that the submitted job has failed it's execution. So the user is expected to try again at some point. However, if we have a JobStatus.ERROR, we do know that the Job has finally failed, and that's a final state (this was not the case in the older implementation).

again about the status of our Job, that could be in state: queued, running or done.
if something wrong happened to the Job, it's server responsability to answer an error response
to a status query from Qiskit.
But, if the exception is thrown before having a valid Job ID, there's nothing we can do,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a blank line between paragraphs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm.. we haven't changed the issue being exposed. Does a blank like means that we are going to talk about something different? Maybe I should not open a new paragraph, and continue in the above one.. what do you think?

result = job.result()
self.assertEqual(result.get_status(), 'ERROR')
self.assertStatus(job, JobStatus.ERROR)
with self.assertRaises(JobError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this test, I understand the new behaviour is to throw when calling result() and something bad happens and that's great but, could you add new assertions to check that the new state of the job is the one expected? I think it could be ERROR but I'm not sure. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An Exception should not change the Job state. If result() raises is due to a Timeout, or due to something wrong happened while requesting the status/job info to the server, we don't really know in which state the Job could be. Is server responsibility to give us this state, because it could be in any state (except INITIALIZING).

Copy link
Contributor

@delapuente delapuente Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's leave this test as it is now and add another one testing that if there is a non-job-related error, the state of the job does not change. Ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing it but where is the test that checks the state of a job does not progress if there is a non-job-related error?

Copy link
Member Author

@atilag atilag Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, too many things to review I guess, I've probably forgot it... but I thought that we were not finally doing this test, because it's kind of pointless/artificial, as a Job can actually progress if we are getting a non-job-related error, we just don't know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reminder to myself, there is no way of getting an out-of-date status except it raises, in which case we don't get a new status at all. It makes sense to omit this test.

job = self.run_with_api(ThrowingInitializationAPI())

# TODO: Seems inconsistent, should throw while initializating?
job = self.run_with_api(ThrowingGetJobAPI())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check the previous and new version instead of replacing the former with the new one? What do you think?

self.wait_for_initialization(job)
self.assertIsInstance(job.exception, ApiError)
self.assertStatus(job, JobStatus.ERROR)
""" If there's an exeception in the thread that will submit the job, it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two scenarios and both have to be tested. One is a throwing API. AFAIK, that means something uncontrolled happened such as a server error. That implies the job is not progressing. You can restore the previous test and change the final assertions to not check for job.exception, check for a raising JobError and check for the job is still initializing.

The other scenario is to test what happened if an error response (those of the form { 'error': '...' }) is returned. In that case, the job will progress to ERROR with no exception being thrown.

self.wait_for_initialization(job)
self.assertIsInstance(job.exception, IBMQJobError)
self.assertStatus(job, JobStatus.ERROR)
with self.assertRaises(JobError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not testing the same. The error should happen after initialization, when the first API request after run_job is done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now the status() method could rise. In this case: self.wait_for_initialization() is going to call job.status() to check that JobStatus.INITIALIZATION state is transitioned to something else:

...
    while job.status() == JobStatus.INITIALIZING:

If we have received an UNKNOWN state here, that might be caused by a temporary server error so we throw an exception.

self._future_exception = err
return {'error': str(err)}

# Fail fast!!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate to be the killjoy but remove this comment, please. :P

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gngngngn

atilag added 4 commits August 7, 2018 11:19
* Added a decortator for LocalJob class to check that a submit
  has been performed before calling to some of the methods.
* docstring fixes
* Added a JobTimeoutError exception
* Removed _is_commercial() method
* Fixed some tests logic
* Adding helper functions to improve readability
* Fixed some docstrings
* Improved code readability
* Added a test-case simulating API server temporary failure
* Added an error_msg() method for IBMQJobs, that will have an error
  message when the Job has errored.
* Refactored localjob status()
* Updated docstrings
* Updated tests
Copy link
Contributor

@delapuente delapuente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent job! The code is in good shape and, assuming we are introducing breaking changes, I think the new API looks lean and clear.

Aside from these comments, I've detected others regarding docstrings, comments and messages but we will pass through them at the end of the review.

Thank you for your work.

'shots': old_qobj['config']['shots'],
'max_credits': old_qobj['config']['max_credits']
}
self._future_exception = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always read this property as "the exception that happens in a future time", what about _future_captured_exception?

"""
super().__init__()
self._qobj = qobj
self._job_data = None
# TODO No need for this conversion, just use the new equivalent members above
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this TODO one line below to keep it close to the conversion, please.

old_qobj = qobj_to_dict(qobj, version='0.0.1')
self._job_data = {
'circuits': old_qobj['circuits'],
'hpc': None if 'hpc' not in old_qobj['config'] else old_qobj['config']['hpc'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use 'hpc': old_qobj['config'].get('hoc'). The dictionary method get will return None if the parameter is not a key of the dictionary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch, love it.

self._job_data = {
'circuits': old_qobj['circuits'],
'hpc': None if 'hpc' not in old_qobj['config'] else old_qobj['config']['hpc'],
'seed': old_qobj['circuits'][0]['config']['seed'], # TODO <-- [0] ???
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just guessing, I think our backends don't understand having different seeds, one per experiment so we use the one from the first experiment, thus the [0]. Now we are moving towards Qobj 1.0.0, we should double check if the backends are ready to handle several seeds. Can you confirm this and clarify the TODO?

Copy link
Member Author

@atilag atilag Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test would have failed, right? ... Confirming it could probably take time as there are other people involved. Let's write this down and solve it in a follow up.

self._exception = self._future_submit.exception()
self._status_msg = str(self.exception)
# TODO: This seems to be an inconsistency in the API package.
self._api_error_msg = api_job.get('error') or api_job.get('Error')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we use _api_error_message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! msg is one of those acronyms that are globally accepted :)
@diego-plan9 will break! Shall we change to this extremely verbose and time consuming word message? Or leave it's compact, readable and shiny acronym globally accepted by everyone except one: msg...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll cheer for both of you equally ⚔️

Copy link
Contributor

@delapuente delapuente Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I give up. :P I proposed it because the public method is named error_message but it's up to you.


def test_status_flow_for_invalid_job(self):
job = self.run_with_api(UnableToInitializeAPI())
self.assertStatus(job, JobStatus.INITIALIZING)
# TODO This is very risky, if the status changes to ERROR too fast
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you file a but for trying to remove race conditions while testing job state flow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Issue #775

@@ -251,18 +258,39 @@ def test_only_final_states_cause_datailed_request(self):

with mock.patch.object(self._current_api, 'get_job',
wraps=self._current_api.get_job):
_ = job.status
_ = job.status()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the _ = part. It was needed because the linter did not understand that reading a member would cause a side effect so it forced me to do something with the value.

we should be able to retrieve the job"""

job = self.run_with_api(TemporaryThrowingServerButJobFinishedAPI())
job._wait_for_submission()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use an internal for waiting until initialization. Instead, use self.wait_for_initialization.

Copy link
Member Author

@atilag atilag Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the first thing what I thought, but unfortunately wait_for_initialization() doesn't fit here (it calls job.status() for example). I just want to make sure that I have an ID, so wrapping this internal in a function like:

def wait_for_submission():
     job._wait_for_submission()

... felt like overkilling.

{'status': 'COMPLETED'}
]

def get_status_job(self, job_id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need to override this method for this specific API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I need it. BaseFakeApi.get_status_job() calls get_job(), which, btw, is not what happens in the real API. Calling get_job() means, that, for example, job.status() will end up calling get_job() so throwing an ApiError, making impossible to test what I wanted here.

result = job.result()
self.assertEqual(result.get_status(), 'ERROR')
self.assertStatus(job, JobStatus.ERROR)
with self.assertRaises(JobError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing it but where is the test that checks the state of a job does not progress if there is a non-job-related error?

* Fixing tests
* Refactored some places to improve readability
Copy link
Contributor

@delapuente delapuente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited about merging this PR. Thank you @atilag for all the hard work. 👍

@delapuente delapuente merged commit 09af5d7 into Qiskit:master Aug 14, 2018
@@ -57,6 +58,7 @@ def setUp(self, qe_token, qe_url):
qc.measure(qr, cr)
self._qc = qc
self._provider = IBMQProvider(qe_token, qe_url)
self._local_provider = LocalProvider()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._local_provider is not used in any method. Can this line be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! you are right. It shouldn't be there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify Job interface
5 participants