Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor async code synchronization #499

Merged
merged 6 commits into from
Feb 8, 2023
Merged

Conversation

ansasaki
Copy link
Contributor

@ansasaki ansasaki commented Jan 13, 2023

Use messages for synchronization instead of mutexes and condvar. Add
separate workers for keys, payload, and revocation execution, as well as
a worker to handle the revocations coming from ZeroMQ channel.

The workers hold the state locally instead of storing all data in server
context (QuoteData).

The keys worker holds the received U and K keys and is responsible for
combining them to obtain K. It also provides the K key when requested
by the /keys/verify endpoint to prove that the resulting K is correct.

When the U and V keys combination is successful (the HMAC verification
is successful), the keys worker wakes the payloads worker sending the
obtained K key and the encrypted payload.

The payloads worker is responsible for decrypting and running the
payload sent by the keys worker.

After decrypting and running the payload, the payloads worker will send
messages to the revocation worker and ZMQ worker to start listening for
revocations.

This also implements graceful shutdown on Ctrl + C (SIGINT) signal.
A dedicated task is spawned to receive the signal and stop the other workers.

Unit tests were implemented for the changes.

Fixes: #363, #447

keylime-agent/src/common.rs Outdated Show resolved Hide resolved
keylime-agent/src/common.rs Outdated Show resolved Hide resolved
keylime-agent/src/payloads.rs Outdated Show resolved Hide resolved
keylime-agent/src/payloads.rs Outdated Show resolved Hide resolved
keylime-agent/src/common.rs Show resolved Hide resolved
keylime-agent/src/common.rs Outdated Show resolved Hide resolved
keylime-agent/src/keys_handler.rs Outdated Show resolved Hide resolved
keylime-agent/src/keys_handler.rs Outdated Show resolved Hide resolved
keylime-agent/src/keys_handler.rs Outdated Show resolved Hide resolved
keylime-agent/src/keys_handler.rs Outdated Show resolved Hide resolved
keylime-agent/src/keys_handler.rs Outdated Show resolved Hide resolved
@ansasaki ansasaki force-pushed the async_refactor branch 3 times, most recently from c7ba7ef to 25df334 Compare January 18, 2023 13:10
@ansasaki ansasaki changed the title WIP: Refactor async code synchronization Refactor async code synchronization Jan 18, 2023
@ansasaki
Copy link
Contributor Author

It seems the change I introduced in e3ebc06 makes tarpaulin to hang (see xd009642/tarpaulin#1056).

I'll revert it for now.

@ansasaki
Copy link
Contributor Author

@ueno @aplanas @lkatalin Could you please review this?

@aplanas
Copy link
Contributor

aplanas commented Jan 30, 2023

It seems the change I introduced in e3ebc06 makes tarpaulin to hang (see xd009642/tarpaulin#1056).

I'll revert it for now.

I still see the error?

Jan 30 13:45:40.153 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response

@ansasaki
Copy link
Contributor Author

It seems the change I introduced in e3ebc06 makes tarpaulin to hang (see xd009642/tarpaulin#1056).
I'll revert it for now.

I still see the error?

Jan 30 13:45:40.153 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response

It seems this is back (sigh). It is difficult to debug this as I can't reproduce locally... I will investigate

keylime-agent/src/common.rs Outdated Show resolved Hide resolved
keylime-agent/src/revocation.rs Outdated Show resolved Hide resolved
@ansasaki ansasaki force-pushed the async_refactor branch 4 times, most recently from cda679d to c9f2aa4 Compare January 31, 2023 10:58
@ansasaki
Copy link
Contributor Author

@aplanas I did not find the root cause that makes tarpaulin to hang. It definitely is something related with the instrumentation for test coverage, as it works fine when running the tests without it. There are multiple related issues open upstream on tarpaulin repo.

The only thing I could do was to update the image used for testing to a newer version (from Fedora 34 to Fedora 37).

If the issue makes the tests to fail frequently, let's disable the tests that runs external binaries or move the coverage measurement to a separate job which is not required to pass.

@ansasaki ansasaki force-pushed the async_refactor branch 2 times, most recently from 954cd7d to 7f8b26b Compare February 1, 2023 11:59
Copy link
Contributor

@aplanas aplanas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what I see LGTM

@ansasaki
Copy link
Contributor Author

ansasaki commented Feb 1, 2023

@ueno @aplanas @lkatalin Could you please review/approve this?

@aplanas
Copy link
Contributor

aplanas commented Feb 2, 2023

@ueno @aplanas @lkatalin Could you please review/approve this?

AFAICT LGTM, yes

@ansasaki
Copy link
Contributor Author

ansasaki commented Feb 2, 2023

@THS-on Could you please review/approve this?

Copy link
Member

@THS-on THS-on left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General LGTM, but have not run it locally.
I have not worked with much Rust sync code, so I cannot really comment on that part.

keylime-agent/src/common.rs Show resolved Hide resolved
keylime-agent/src/main.rs Show resolved Hide resolved
@ansasaki
Copy link
Contributor Author

ansasaki commented Feb 8, 2023

@ashcrow @mbestavros @lukehinds @lkatalin @puiterwijk @mpeters @ueno

Could someone please review/approve this?

I believe we need to review/expand the reviewers who can approve changes to get merged. I'm trying to get this merged for weeks, but cannot get the required approval.

Also I would like to make changes to the repository (e.g. creating a template for issues, creating a template for PR checklist) but I do not have the required permission. Could someone with the right permissions please contact me?

@ansasaki ansasaki added bug Something isn't working enhancement labels Feb 8, 2023
Separate the ZeroMQ service from the worker task.

Signed-off-by: Anderson Toshiyuki Sasaki <[email protected]>
@ansasaki ansasaki force-pushed the async_refactor branch 2 times, most recently from 1f7da3d to 035b378 Compare February 8, 2023 17:11
@ashcrow
Copy link
Contributor

ashcrow commented Feb 8, 2023

I'm happy to approve given I see 3 approvals already and all outstanding comments have been resolved. My only caveat is that the Fedora tests failed. If this is a known issue then I'll add my approval, if not, let's get that fixed first 🙏

@lkatalin Is there a discussion already on going about approvers by chance?

Add the payloads module which contains all payload-related functions.

Signed-off-by: Anderson Toshiyuki Sasaki <[email protected]>
Use messages for synchronization instead of mutexes and condvar.  Add
separate workers for keys and payload execution, holding the state
locally instead of storing all data in server context (QuoteData).

The keys worker holds the received U and K keys and is responsible for
combining them to obtain K.  It also provides the K key when requested
by the /keys/verify endpoint to prove that the resulting K is correct.

When the U and V keys combination is successful (the HMAC verification
is successful), the keys worker wakes the payloads worker sending the
obtained K key and the encrypted payload.

The payloads worker is responsible for decrypting and running the
payload sent by the keys worker.

Signed-off-by: Anderson Toshiyuki Sasaki <[email protected]>
Use a separate task to handle the SIGINT (Ctrl + C) signal and wait for
all the tasks to finish.

Signed-off-by: Anderson Toshiyuki Sasaki <[email protected]>
Modify the ZeroMQ listener to poll the socket for events before trying
to read instead of blocking.

Create a revocation worker which is reponsible for running the
revocation payloads.

The payloads worker will send a message to the revocation worker after
decrypting the payload which may contain the revocation certificate.

The payloads worker will also send a message to the ZeroMQ worker to
start listening for revocation notifications.

All revocation messages received through both ZeroMQ and Rest API are
sent to the revocation worker which will process them one by one.  This
is to avoid running revocation scripts concurrently, which may conflict.

Signed-off-by: Anderson Toshiyuki Sasaki <[email protected]>
Cleanup U and V keys processing and add tests for processing with bogus
keys in the vectors of received keys.

Signed-off-by: Anderson Toshiyuki Sasaki <[email protected]>
@ansasaki
Copy link
Contributor Author

ansasaki commented Feb 8, 2023

I'm happy to approve given I see 3 approvals already and all outstanding comments have been resolved. My only caveat is that the Fedora tests failed. If this is a known issue then I'll add my approval, if not, let's get that fixed first pray

@ashcrow Done!

Copy link
Contributor

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review based on previous reviewers.

@ashcrow ashcrow merged commit 4104f5b into keylime:master Feb 8, 2023
@ansasaki ansasaki deleted the async_refactor branch February 9, 2023 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Graceful shutdown of the agent
7 participants