fix: Fix flaky test test_acl_revoke_pub_sub_while_subscribed
#3768
async with async_timeout.timeout(10):
    while total_msgs != 10:
        try:
            res = await channel.get_message(ignore_subscribe_messages=True, timeout=5)
            if res is None:
So get_message can return None even if it doesn't time out 😱
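For illustration, here is a minimal sketch of a receive loop that tolerates a None result instead of asserting on it. FakeChannel is a hypothetical stand-in written for this example (it is not the redis client and not the test's real fixture); only the get_message call shape is taken from the snippet above.

```python
import asyncio


class FakeChannel:
    """Hypothetical stand-in for a PubSub object; NOT the real redis client."""

    def __init__(self, replies):
        self._replies = list(replies)

    async def get_message(self, ignore_subscribe_messages=True, timeout=None):
        # Yield to the event loop, as a real network read would.
        await asyncio.sleep(0)
        # Return the next scripted reply, or None when nothing is queued.
        return self._replies.pop(0) if self._replies else None


async def collect(channel, expected):
    """Collect `expected` payloads, skipping None results."""
    received = []
    while len(received) < expected:
        res = await channel.get_message(ignore_subscribe_messages=True, timeout=5)
        if res is None:
            # Nothing usable came back; poll again rather than failing.
            continue
        received.append(res["data"])
    return received


async def main():
    # Scripted replies: None can show up between real messages.
    replies = [None, {"data": "message0"}, None, {"data": "message1"}]
    channel = FakeChannel(replies)
    # An outer timeout keeps the loop from spinning forever.
    return await asyncio.wait_for(collect(channel, expected=2), timeout=10)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Skipping None only addresses empty reads; it does not recover messages that were never delivered to the subscriber.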
Yep, I've seen it, but it still doesn't explain everything. See the failure: it shows receiving message4 while expecting message0:
2024-09-23T10:58:07.4929895Z > assert res["data"] == f"message{total_msgs}"
2024-09-23T10:58:07.4930555Z E AssertionError: assert equals failed
2024-09-23T10:58:07.4931314Z E 'message�^4�' 'message�^0�'
(ignore unprintable coloring characters)
wow
The test failed because, in some rare cases, the subscriber did not get the first few messages from the publisher. This is likely due to the relative timing of subscribe and publish, which run on different connections / threads. Given that Pub/Sub has very weak guarantees, it's probably OK as is, so I just added a sleep to make the test pass reliably.
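The race described above can be sketched with a toy in-memory broker. Everything here is an assumption made for illustration: ToyBroker, simulate, settle_delay and the latency values are hypothetical names and numbers, and the code is not a model of the actual test or server. It only shows that fire-and-forget delivery drops messages published before the subscription is registered, and that waiting before publishing hides the race.

```python
import asyncio
import contextlib


class ToyBroker:
    """Hypothetical in-memory broker with fire-and-forget delivery."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, channel):
        queue = asyncio.Queue()
        self.queues.setdefault(channel, []).append(queue)
        return queue

    def publish(self, channel, data):
        # Only subscribers registered right now get the message;
        # with no subscriber it is silently dropped.
        receivers = self.queues.get(channel, [])
        for queue in receivers:
            queue.put_nowait(data)
        return len(receivers)


async def simulate(settle_delay, subscribe_latency=0.025):
    """Publish 10 messages after waiting `settle_delay` seconds."""
    broker = ToyBroker()
    received = []

    async def subscriber():
        # Simulated delay before the subscription takes effect.
        await asyncio.sleep(subscribe_latency)
        queue = broker.subscribe("channel")
        while True:
            received.append(await queue.get())

    sub_task = asyncio.create_task(subscriber())
    await asyncio.sleep(settle_delay)
    for i in range(10):
        broker.publish("channel", f"message{i}")
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.05)  # let the subscriber drain its queue
    sub_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await sub_task
    return received


racy = asyncio.run(simulate(settle_delay=0))
fixed = asyncio.run(simulate(settle_delay=0.3))
print("without wait:", racy)
print("with wait:   ", fixed)
```

Without the wait, the subscriber's first message is a later one rather than message0, which matches the shape of the failure in the log; with the wait, all ten arrive in order. A sleep narrows the window but does not give a delivery guarantee.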
446d77a to 465c375
Good work!
@@ -707,6 +710,10 @@ async def subscribe_worker(channel: aioredis.client.PubSub):
    subscriber_obj = subscriber.pubsub()
    await subscriber_obj.subscribe("channel")

    # There's a rare timing issue if we don't wait here, but given the weak guarantees of Pub/Sub,
    # that's probably OK.
    await asyncio.sleep(1)
If you look back at my PR that added the logs, I had the exact same suspicion. I added the asyncio.sleep on line 698 hoping it would help the producer side (so that the subscriber would get all the messages). Little did I know that we also needed one here, because I imagined that the subscribe above would be enough to receive all of the messages. Oh well 🤷 :)
Yes, I agree that if it were a different command, requiring such a sleep would have been a bug. But Pub/Sub has very weak guarantees.
Fixes #3678