Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and flag missed notifications from RabbitMQ #4317

Conversation

elland
Copy link
Contributor

@elland elland commented Oct 30, 2024

https://wearezeta.atlassian.net/browse/WPB-11167

Checklist

  • Add a new entry in an appropriate subdirectory of changelog.d
  • Read and follow the PR guidelines

@echoes-hq echoes-hq bot added the echoes/initiative: scale Enterprise Readiness Initiatives label Oct 30, 2024
@zebot zebot added the ok-to-test Approved for running tests in CI, overrides not-ok-to-test if both labels exist label Oct 31, 2024
@elland elland force-pushed the dead-notifications branch 2 times, most recently from 9eed61b to 05f0c1b Compare October 31, 2024 14:20
@elland elland force-pushed the dead-notifications branch 10 times, most recently from 3a1b28d to 2f0d7e3 Compare November 4, 2024 14:40
@elland elland force-pushed the dead-notifications branch from 2f0d7e3 to cfda2da Compare November 4, 2024 15:03
@elland elland force-pushed the dead-notifications branch from d060f45 to 5c9b3db Compare November 6, 2024 15:02
@elland elland force-pushed the dead-notifications branch from aca5dfe to 23d552c Compare November 7, 2024 08:57
@elland elland force-pushed the dead-notifications branch from ca03c82 to 73ea202 Compare November 7, 2024 14:21
@elland elland marked this pull request as ready for review November 11, 2024 14:09
integration/test/Test/Events.hs Outdated Show resolved Hide resolved
( Log.msg (Log.val "Could not find x-last-death-queue in headers")
. Log.field "error_configuring_dead_letter_exchange" (show headers)
)
error "Could not find x-last-death-queue in headers"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to throw instead, so this can be handled by our middlewares.

Log.err env.logger $
Log.msg (Log.val "Could not parse msgHeaders into uid/cid for dead letter exchange message")
. Log.field "error_parsing_message" (show dat)
error "Could not parse msgHeaders into uid/cid for dead letter exchange message"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

Copy link
Contributor

@mdimjasevic mdimjasevic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a changelog.

🚀

@elland
Copy link
Contributor Author

elland commented Nov 12, 2024

Let's add the changelog to the parent.

@elland elland merged commit 61bacf2 into WPB-10308-use-rabbti-mq-classic-queues-for-notifications Nov 12, 2024
10 checks passed
@elland elland deleted the dead-notifications branch November 12, 2024 09:33
mdimjasevic added a commit that referenced this pull request Nov 12, 2024
* create rabbitmq exchange

* set up DLX

* wip

* create user queues on client add/update

* make it internal server error

* make galley compile

* make rabbitmq mandatory

* fix integration tests

* set correct DLX headers

* wip

* Add a WIP test to consume notifs via `GET /events`

* cannon: Roughly implement subscribing notifs from RabbitMQ

* gundeck: Start implementing push to rabbitmq

* integration: Fix assertion to assert on a real event

* integration: Use correct vHost in cannon

* cannon: Ensure exchange exists and publish event correctly on WS

* NotificationSubsystem: Align names for queues and extract them as top level bindings

* gundeck: Push events to RabbitMQ for compatible clients

* integration: Assert that acked events don't come back

* cannon: Forward client acks to rabbitmq

* WIP: Get rid of channel as an explicit param for NotificationSubsystem actions

* Get galley to compile.

* Fix some easy TODOs.

* Use these library better.

* resolve rebase conflict better.

* fix ghc errors.

* Extract rabbitmq channel lookup into helper.

* Move setupConsumableNotificationsClient from subsystems to gundeck.

* Revert "Get galley to compile."

This reverts commit 8a1840d.

* wire-subsystems: Fix compile errors in tests

* gundeck: doesn't depend on wire-subsystems (yet?)

* brig: Remove unnecessary import

* cannon: Don't create the queue for clients, expect it to already be there

* cannon: close ws connection when something goes wrong

* Funky!

* Undo funkiness: Create Wire.API.WebSocket with types for comms on the new websocket

* WIP: cannon: try to use the new types from wire-api

The case split is not the nicest, perhaps we can solve it with one these things:

- Bring back the funkiness for couple of commits ago
- Use separate types for server to client and client to server messages

* wire-api: Cabal file

* gen nix stuff

* ormolu

* Is this a good way of representing websocket messages?

* Resolve TODO.

amqp doesn't offer a bulk push operation.  instead it makes individual
pushes performant enough.  https://www.rabbitmq.com/docs/publishers

* Refactor test, extend coverage.

* Suggestions for better module name; removed "websocket" reference from some names.

* Roundtrip tests for Message*To*.

* Source comments.

* debug failing test [WIP]

* refactor tests a bit, fix ack, fix typos

* fix test, handle connection closed, format

* ping pong test

* Maintain stable connection to rabbitmq from cannon.  [WIP]

* Fix typo.

* Tune tests.

* Remove ping-pong stuff.

* Revert "Maintain stable connection to rabbitmq from cannon.  [WIP]"

This reverts commit 861ee10.

problems with this approach:
- there is a maximum number of chans / conn.
- this is all very complicated and should be done separately.

* Test multiple acks and out of order acks

* cannon: Refactor code a little and more logging

* integration: Deal with events websocket more gracefully, ensure all acks are sent

* cannon: Easier to understand cleanup code

* Add TODO for tomorrow

* cannon: Ensure invalid messages don't accumulate

* small re-org of code

* Avoid using unsafeRange

* Reduce top level functions

* integration: Test that old and new clients can co-exist

* gundeck: Optimize number of calls to brig

* gundeck: Try to not kill brig

* Fix typo

* gundeck: Remove pairing comment

* wire-api: Rename Wire.API.WebSocket -> Wire.API.Event.WebSocketProtocol

* gundeck: Don't configure dead-lettering while declaring queues

This should be done via Policies:
https://www.rabbitmq.com/docs/parameters#policies

When done with Policies, we can change our mind about how to deal with
dead-lettering later because queues cannot be redeclared with with different
headers.

* More TODOs

* Use direct exchange for user notifications

Topic exchange is not very useful for our usecase

* integration: Test that users only get notifs meant for them

* integration/Notifications: Allow waiting for notifs without a client

* Deflake newly written tests

* cannon: Don't declare the exchange, its not needed

* integration: Make cannon logLevel Warn again

* integrations: Throw error if websocket responds with invalid JSON

* Add ticket number to FUTUREWORK

* Add notification Id to rabbitmq notifs

This can be used by clients to detect duplicate deliveries.

* Makefile/clean-rabbit: Also cleanup exchanges

* gundeck: Get preexisting unit tests to pass

* Set correct expiration on events pushed to RabbitMQ

* fix a typo in the changelog

* Fix a release note

* Add changelogs for public and internal API changes

* Move around routing key helper functions

* Remove commented out code

* Detect and flag missed notifications from RabbitMQ (#4317)

* Update a change log to reflect changes from PR #4317

---------

Co-authored-by: Akshay Mankar <[email protected]>
Co-authored-by: Matthias Fischmann <[email protected]>
Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: Igor Ranieri <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
echoes/initiative: scale Enterprise Readiness Initiatives ok-to-test Approved for running tests in CI, overrides not-ok-to-test if both labels exist
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants