
Refactor queue resume/stop/start #786

Merged: 4 commits, Feb 12, 2020
Conversation

sssoleileraaa
Contributor

Description

This is purely a refactor.

  • updated and added more documentation on how we pause, stop, and resume queues.
  • removed unnecessary code:
    • removed resume_queues in on_authentication_success since we were already starting the queues in that method
    • removed clear_error_status from all the job success handlers and added it to the Controller's logout and resume_queues methods.
  • split on_queue_paused into on_file_download_queue_paused and on_main_queue_paused so we have clear debug logs indicating which queue is being paused
  • updated logs and log levels for clarity
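The handler split described above can be sketched in plain Python (without the real Qt signal machinery; the method names follow the PR, but the `notify_gui` callback and logger wiring are illustrative stand-ins):

```python
import logging

logger = logging.getLogger("securedrop_client.queue")


class ApiJobQueue:
    """Illustrative sketch only: one pause handler per queue, so the
    debug log names the queue that actually paused."""

    def __init__(self, notify_gui):
        # notify_gui stands in for the signal emitted to the Controller.
        self.notify_gui = notify_gui

    def on_main_queue_paused(self) -> None:
        logger.debug('Paused main queue')
        self.notify_gui()

    def on_file_download_queue_paused(self) -> None:
        logger.debug('Paused file download queue')
        self.notify_gui()
```

With a shared handler the log line cannot say which queue paused; with one handler per queue, each debug message is unambiguous.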

@sssoleileraaa
Contributor Author

Whoops, I meant for this to be a WIP PR. Submitting for early review; please focus on the docstrings and let me know if they make sense. In the meantime I'll work on tests and updating the architecture docs.

@sssoleileraaa changed the title from "Fixup queue resume" to "Refactor queue resume/stop/start" on Feb 8, 2020
@sssoleileraaa
Contributor Author

This is now ready for review and can be merged after #787 is done.

super().__init__()
self.api_client = api_client
self.session_maker = session_maker
# If the maxsize provided to PriorityQueue is 0, there will be no upper bound
self.queue = PriorityQueue(maxsize=size) # type: PriorityQueue[Tuple[int, ApiJob]]
Contributor

Since we were passing maxsize only for the metadata queue (which had maxsize=1), we can also remove the maxsize functionality if we want.

Contributor

(This can be done in a follow-up or at a later date.)

Also, handling the Full exception in the job-adding methods is only relevant when the queue can be full, so that can be removed as well.
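For context, Python's queue.PriorityQueue treats maxsize=0 (the default) as unbounded, so once the maxsize argument is gone, put() can never raise queue.Full and that exception handling becomes dead code. A quick illustration:

```python
from queue import PriorityQueue

q = PriorityQueue()  # maxsize defaults to 0, i.e. no upper bound
for i in range(10_000):
    q.put((1, i))  # never raises queue.Full on an unbounded queue

print(q.qsize())  # 10000
```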

@sssoleileraaa
Contributor Author

I removed support for maxsize and updated the logs to make them less confusing.

Now, when a timeout occurs:

2020-02-10 18:43:49,855 - securedrop_client.queue:236(enqueue) DEBUG: Added SendReplyJob to main queue
2020-02-10 18:43:49,856 - securedrop_client.queue:127(process) DEBUG: SendReplyJobTimeoutError: test
2020-02-10 18:43:49,865 - securedrop_client.queue:127(process) DEBUG: SendReplyJobTimeoutError: test
2020-02-10 18:43:49,871 - securedrop_client.logic:787(on_reply_failure) DEBUG: b082aa29-0400-4bee-be7b-f10faf32a9e6 failed to send
2020-02-10 18:43:49,872 - securedrop_client.queue:201(on_main_queue_paused) DEBUG: Paused main queue
2020-02-10 18:43:49,876 - securedrop_client.logic:787(on_reply_failure) DEBUG: b082aa29-0400-4bee-be7b-f10faf32a9e6 failed to send
2020-02-10 18:43:49,876 - securedrop_client.queue:201(on_main_queue_paused) DEBUG: Paused main queue

instead of...

2020-02-10 18:44:30,051 - securedrop_client.queue:221(enqueue) DEBUG: Adding job to main queue
2020-02-10 18:44:30,054 - securedrop_client.queue:141(process) DEBUG: Job <securedrop_client.queue.RunnableQueue object at 0x7f66f41bec30> raised an exception: SendReplyJobTimeoutError: test
2020-02-10 18:44:30,056 - securedrop_client.queue:129(process) DEBUG: Paused queue
2020-02-10 18:44:30,057 - securedrop_client.queue:123(process) DEBUG: Beginning queue processing loop
2020-02-10 18:44:30,058 - securedrop_client.queue:141(process) DEBUG: Job <securedrop_client.queue.RunnableQueue object at 0x7f66f41bec30> raised an exception: SendReplyJobTimeoutError: test
2020-02-10 18:44:30,059 - securedrop_client.queue:129(process) DEBUG: Paused queue
2020-02-10 18:44:30,061 - securedrop_client.queue:123(process) DEBUG: Beginning queue processing loop
2020-02-10 18:44:30,062 - securedrop_client.queue:141(process) DEBUG: Job <securedrop_client.queue.RunnableQueue object at 0x7f66f41bec30> raised an exception: SendReplyJobTimeoutError: test
2020-02-10 18:44:30,063 - securedrop_client.queue:129(process) DEBUG: Paused queue
2020-02-10 18:44:30,067 - securedrop_client.logic:794(on_reply_failure) DEBUG: 89a116ae-a156-449f-9492-11dac632703a failed to send
2020-02-10 18:44:30,075 - securedrop_client.logic:794(on_reply_failure) DEBUG: 89a116ae-a156-449f-9492-11dac632703a failed to send
2020-02-10 18:44:30,078 - securedrop_client.logic:794(on_reply_failure) DEBUG: 89a116ae-a156-449f-9492-11dac632703a failed to send

Now, when we resume:

2020-02-10 18:43:53,125 - securedrop_client.queue:217(resume_queues) DEBUG: Resuming main queue
2020-02-10 18:43:53,125 - securedrop_client.queue:220(resume_queues) DEBUG: Resuming download queue

instead of...

2020-02-10 18:44:33,522 - securedrop_client.queue:123(process) DEBUG: Beginning queue processing loop

The reason we see the SendReplyJobTimeoutError twice is that we re-add the SendReplyJob, and it can get processed before the PauseQueueJob, which has a higher priority, even though we programmatically add the pause job first. As a follow-up we could work on a way to make sure PauseQueueJobs are enqueued before any other jobs are added. Curious what your thoughts are here @redshiftzero. Otherwise this is ready for re-review.
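The race above comes down to when the worker dequeues relative to the two put() calls; priority only orders jobs that are already enqueued. A sketch with placeholder priority numbers (not the real values from the queue module):

```python
from queue import PriorityQueue

PAUSE_PRIORITY = 0        # placeholder: lower number = dequeued first
SEND_REPLY_PRIORITY = 13  # placeholder, not the real priority value

q = PriorityQueue()
q.put((SEND_REPLY_PRIORITY, "SendReplyJob"))  # failed job, re-added for retry
q.put((PAUSE_PRIORITY, "PauseQueueJob"))

# Once both jobs sit in the queue, the pause job always dequeues first:
assert q.get()[1] == "PauseQueueJob"
```

The duplicate timeout error appears when the worker dequeues the re-added SendReplyJob before the PauseQueueJob has actually landed in the queue; at that point there is nothing for the priority ordering to act on.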

@sssoleileraaa
Contributor Author

See a CI error, which I'll have to investigate tomorrow unless someone gets to it first:

make[2]: Entering directory '/home/circleci/debbuild/packaging/securedrop-client'
dh_virtualenv \
	--python /usr/bin/python3 \
	--setuptools \
	--use-system-packages \
	--index-url https://pypi.securedrop.org/simple \
	--extra-pip-arg "--ignore-installed" \
	--extra-pip-arg "--no-deps" \
	--extra-pip-arg "--no-cache-dir" \
	--requirements build-requirements.txt
usage: virtualenv [--version] [-v | -q] [--discovery {builtin}] [-p py] [--creator {builtin,cpython3-posix,venv}] [--seeder {app-data,pip}] [--no-seed] [--activators comma_separated_list] [--clear] [--system-site-packages]
                  [--symlinks | --copies] [--download | --no-download] [--extra-search-dir d [d ...]] [--pip version] [--setuptools version] [--wheel version] [--no-pip] [--no-setuptools] [--no-wheel] [--clear-app-data] [--prompt prompt]
                  [-h]
                  [dest]
virtualenv: error: argument --setuptools: expected one argument
Traceback (most recent call last):
  File "/usr/bin/dh_virtualenv", line 110, in <module>
    sys.exit(main() or 0)
  File "/usr/bin/dh_virtualenv", line 87, in main
    deploy.create_virtualenv()
  File "/usr/lib/python2.7/dist-packages/dh_virtualenv/deployment.py", line 157, in create_virtualenv
    subprocess.check_call(virtualenv)
  File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['virtualenv', '--system-site-packages', '--setuptools', '--python', '/usr/bin/python3', 'debian/securedrop-client/opt/venvs/securedrop-client']' returned non-zero exit status 2
make[2]: *** [debian/rules:7: override_dh_virtualenv] Error 1
make[2]: Leaving directory '/home/circleci/debbuild/packaging/securedrop-client'
make[1]: *** [debian/rules:4: binary] Error 2
make[1]: Leaving directory '/home/circleci/debbuild/packaging/securedrop-client'
dpkg-buildpackage: error: fakeroot debian/rules binary subprocess returned exit status 2
make: *** [Makefile:13: securedrop-client] Error 2

Exited with code exit status 2

@sssoleileraaa
Contributor Author

(rebased after CI fix)

@sssoleileraaa
Contributor Author

rebased and retested, everything still works!

Emit the paused signal if the main queue has been paused.
'''
logger.debug('Paused main queue')
self.paused.emit()
Contributor

This is a little hard to follow because both ApiJobQueue and RunnableQueue have signals called paused that do slightly different things: the first (defined on ApiJobQueue) signals the Controller object, "hey, I paused the queues, update the GUI"; the other (defined on RunnableQueue) is from a queue to signal ApiJobQueue, "hey, I hit a consistent request timeout, let's stop the queues and let the GUI know".

However it looks like these queues are pausing independently: one can be stopped while the other is running. Meanwhile the GUI shows that the server is unreachable. Is this intentional? Or is the idea that if one queue is stopped due to network failure the other will eventually stop too?

Let me know if you think I'm missing something here.

@sssoleileraaa (Contributor Author) commented on Feb 11, 2020

Or is the idea that if one queue is stopped due to network failure the other will eventually stop too?

Yes, this was the idea.

The way it has been implemented is that there is a paused signal in RunnableQueue, which maintains a priority queue and pauses when it encounters a request timeout error. Meanwhile, other queues do not know about this, and cannot know unless ApiJobQueue (which really should be renamed to ApiJobQueueManager) tells them that another queue was paused. To keep things simple, the queue manager just sends a signal back to the GUI so the user can retry. It could also tell the other queue to pause by enqueuing a PauseQueueJob, but I think this is risky because the other queue might be adding a PauseQueueJob at the same time, having also encountered a timeout error.

I think changing this behavior should be discussed further, and if we decide to change it I would advocate for doing so outside of this PR in order to keep it a refactor-only PR.

@sssoleileraaa (Contributor Author) commented on Feb 12, 2020

One thing we could do as a small additional refactor in this PR is replace the signal from RunnableQueue with a callback that ApiJobQueue passes to it, like so:

--- a/securedrop_client/queue.py
+++ b/securedrop_client/queue.py
@@ -53,16 +53,15 @@ class RunnableQueue(QObject):
         ReplyDownloadJob: 17,
     }
 
-    # Signal that is emitted when processing stops
-    paused = pyqtSignal()
-
     # Signal that is emitted to resume processing jobs
     resume = pyqtSignal()
 
-    def __init__(self, api_client: API, session_maker: scoped_session) -> None:
+    def __init__(self, api_client: API, session_maker: scoped_session, on_pause_callback) -> None:
         super().__init__()
         self.api_client = api_client
         self.session_maker = session_maker
+        self.on_pause_callback = on_pause_callback
+
         self.queue = PriorityQueue()  # type: PriorityQueue[Tuple[int, ApiJob]]
         # `order_number` ensures jobs with equal priority are retrived in FIFO order. This is needed
         # because PriorityQueue is implemented using heapq which does not have sort stability. For
@@ -113,7 +112,7 @@ class RunnableQueue(QObject):
             priority, job = self.queue.get(block=True)
 
             if isinstance(job, PauseQueueJob):
-                self.paused.emit()
+                self.on_pause_callback()
                 return
 
             try:
@@ -154,8 +153,9 @@ class ApiJobQueue(QObject):
         self.main_thread = QThread()
         self.download_file_thread = QThread()
 
-        self.main_queue = RunnableQueue(api_client, session_maker)
-        self.download_file_queue = RunnableQueue(api_client, session_maker)
+        self.main_queue = RunnableQueue(api_client, session_maker, self.on_main_queue_paused)
+        self.download_file_queue = RunnableQueue(
+            api_client, session_maker, self.on_file_download_queue_paused)
 
         self.main_queue.moveToThread(self.main_thread)
         self.download_file_queue.moveToThread(self.download_file_thread)

@redshiftzero (Contributor) left a comment

Thanks for your patience. I'm going to approve this; we can continue to test the pause/unpause behavior on master and decide what to do about the queues pausing independently. (I agree that it's been this way since the beginning; I'm only just realizing it now.)

@redshiftzero redshiftzero merged commit c5c4bb0 into master Feb 12, 2020
@redshiftzero redshiftzero deleted the fixup-queue-resume branch February 12, 2020 15:34