Service Bus v7.4.0 (the latest version) hangs receiving messages #20274

Closed
winterAndvelvet opened this issue Feb 9, 2022 · 31 comments
Labels: Client, customer-reported, needs-author-feedback, no-recent-activity, question, Service Bus

@winterAndvelvet

  • Package Name: @azure/service-bus
  • Package Version: 7.4.0
  • Operating system: Linux (Debian)
  • nodejs
    • version: 14.15.0
  • browser
    • name/version:
  • typescript
    • version:
  • Is the bug related to documentation in

Describe the bug
receiveMessages hangs and does not return, even with maxWaitTimeInMs set.

To Reproduce
Steps to reproduce the behavior:

Code like:

  this.receiver = this.sbClient.createReceiver(this.queueName, {
      receiveMode: 'peekLock',
      maxAutoLockRenewalDurationInMs: 0 // set to 0 to disable auto lock renewal, because it does not work as expected
  });
  while (true) {
            try {
                // NOTE: asking for 20 messages does not guarantee that we will return
                // all 20 at once so we must loop until we get all the messages we expected.
                /**
                * @throws Error if the underlying connection, client or receiver is closed.
                * @throws Error if current receiver is already in state of receiving messages.
                * @throws `ServiceBusError` if the service returns an error while receiving messages.
                 */
                const messages = await this.receiver.receiveMessages(messageCount, {
                    maxWaitTimeInMs: 2 * 1000,
                });
                if (!messages.length) {
                    logger.debug('queue is empty');
                    continue;
                }
                messages.map(msg=>onMessageHandler(msg));
            } catch (err) {
                await onErrorHandler(new QueueError(err));
            }
        }

Expected behavior
receiveMessages should return within maxWaitTimeInMs instead of hanging.


@ghost ghost added the needs-triage, customer-reported, and question labels Feb 9, 2022
@azure-sdk azure-sdk added the Client, needs-team-triage, and Service Bus labels Feb 9, 2022
@ghost ghost removed the needs-triage label Feb 9, 2022
@ramya-rao-a
Contributor

Thanks for reporting @winterAndvelvet

While we look to replicate the issue, I noticed that you are calling your own message and error handlers. If you already have these handlers set up, can you share why you chose to use the receiveMessages() method in a loop instead of the subscribe() method that will take your handlers?
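For reference, a minimal sketch of the subscribe() alternative being suggested here (it reuses the onMessageHandler/onErrorHandler names from the original snippet; the concurrency setting is illustrative, not the reporter's configuration):

const receiver = sbClient.createReceiver(queueName, { receiveMode: "peekLock" });

receiver.subscribe(
  {
    // Invoked once per received message.
    processMessage: async (msg) => {
      await onMessageHandler(msg);
    },
    // Invoked for errors raised by the receiver or reported by the service.
    processError: async (args) => {
      await onErrorHandler(args.error);
    },
  },
  {
    // Limit how many messages are handled concurrently.
    maxConcurrentCalls: 1,
  }
);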

@ramya-rao-a ramya-rao-a removed the needs-team-triage label Feb 9, 2022
@winterAndvelvet
Author

winterAndvelvet commented Feb 10, 2022

For easier concurrency control, we chose the polling style. For the first few hours it worked well, but receiving started to hang once the incoming message count in the queue grew to around 4k. I don't know whether the hang is related to the number of incoming messages in the queue. In addition, I tried wrapping receiveMessages in a timeout, but when the timeout fires, the next loop iteration throws "The receiver is already receiving messages". This also suggests that receiveMessages really is hanging.
Code like:

 async startGetMessageLoop(onMessageHandler, onErrorHandler, options){
        let {trafficController, getMsgTimer}= options;
        while (true) {
            try {
                if(global.isReceiveExit){
                    logger.warn('receive sigterm,break rec msg loop');
                    break;
                }
                await trafficController.waitTrafficBlockLeave();
                let messageCount = trafficController.nextTrafficWindow();
                logger.debug(`next traffic window:${messageCount}`);
                // NOTE: asking for 20 messages does not guarantee that we will return
                // all 20 at once so we must loop until we get all the messages we expected.
                /**
                * @throws Error if the underlying connection, client or receiver is closed.
                * @throws Error if current receiver is already in state of receiving messages.
                * @throws `ServiceBusError` if the service returns an error while receiving messages.
                 */
                //bug:https://github.com/Azure/azure-sdk-for-js/issues/20274
                const messages = await Promise.race([
                    this.receiver.receiveMessages(messageCount, {
                        maxWaitTimeInMs: 2 * 1000,
                    }),
                    new Promise((_, reject) => {
                        setTimeout(() => reject(new Error('receive timeout')), 5000);
                    }),
                ]);
                if (!messages.length) {
                    logger.debug('queue is empty');
                    continue;
                }
                messages.map(msg=>onMessageHandler(msg));
            } catch (err) {
                if(err.message === 'receive timeout'){
                    logger.error(`receive msg timeout,continue`);
                    continue;
                }
                await onErrorHandler(new QueueError(err));
            }
            await sleep(getMsgTimer);
        }
    } 


@winterAndvelvet
Author

By the way, when using:

await this.receiver.receiveMessages(messageCount, {
                    maxWaitTimeInMs: 2 * 1000,
                })

I also get the "current receiver is already in state of receiving messages" exception. Does maxWaitTimeInMs not work?
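One possible reason for that error (an observation, not a confirmed diagnosis): a previous receiveMessages() promise may still be pending when the loop calls it again, for example after the Promise.race timeout above abandons it. A sketch of a hypothetical guard that waits for the in-flight call to settle first:

let pendingReceive = null;

async function safeReceive(receiver, messageCount) {
  if (pendingReceive) {
    // Wait for the earlier call to settle instead of triggering
    // "The receiver is already receiving messages".
    await pendingReceive.catch(() => {});
  }
  pendingReceive = receiver.receiveMessages(messageCount, { maxWaitTimeInMs: 2 * 1000 });
  try {
    return await pendingReceive;
  } finally {
    pendingReceive = null;
  }
}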

@jeremymeng
Member

@winterAndvelvet thank you for the detailed information! I will look into this.

@jeremymeng jeremymeng added this to the [2022] March milestone Feb 10, 2022
@winterAndvelvet
Author

@jeremymeng thank you

@jeremymeng
Member

@winterAndvelvet You are right that the "isReceivingMessages" flag is only cleared at the end of the receiveMessages() call. I would expect maxWaitTimeInMs to work, but from the above code and logs I could not see what went wrong. Is it possible for you to enable more logging and see if anything unusual or erroneous shows up when the issue starts happening? Please refer to the troubleshooting section for more information on getting debug logs:

https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/servicebus/service-bus/README.md#troubleshooting

It is probably useful to start with this configuration:

export DEBUG=azure*,rhea*,-rhea:raw,-rhea:message,-azure:core-amqp:datatransformer
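If setting an environment variable is inconvenient, the SDK-level portion of this logging can also be enabled from code (a sketch, assuming a dependency on @azure/logger can be added; the rhea* protocol logs above still require the DEBUG variable):

const { setLogLevel } = require("@azure/logger");

// Emit verbose azure:* logs (including azure:service-bus) to stderr.
setLogLevel("verbose");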

@jeremymeng
Member

@winterAndvelvet while we are figuring out the root cause of the issue, one workaround is to pass an abort signal to the receiveMessages() call and then handle the error:

import { AbortController } from "@azure/abort-controller";

// ...
        const messages = await this.receiver.receiveMessages(messageCount, {
          maxWaitTimeInMs: 3 * 1000,
          abortSignal: AbortController.timeout(2 * 1000),
        });
// ...
      } catch (err) {
        if (err.name === "AbortError") {
          console.log("2 seconds passed, abort. will try receiving again");
          continue;
        }
        await onErrorHandler(new Error(err));
      }

@jeremymeng
Member

@winterAndvelvet have you had a chance to enable more detailed logging or to try out the workaround using an abort signal?

@jeremymeng jeremymeng modified the milestones: [2022] March, [2022] April Mar 9, 2022
@jeremymeng jeremymeng added the needs-author-feedback label Mar 23, 2022
@ghost ghost added the no-recent-activity label Mar 30, 2022
@ghost

ghost commented Mar 30, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@jeremymeng jeremymeng removed this from the [2022] April milestone Apr 7, 2022
@ghost ghost removed the no-recent-activity label Apr 7, 2022
@jeremymeng jeremymeng added this to the [2022] May milestone Apr 7, 2022
@ghost ghost added the no-recent-activity label Apr 14, 2022
@ghost

ghost commented Apr 14, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@CodeJuggernaut

CodeJuggernaut commented Apr 20, 2022

Hello. I'm having the same issue. receiver.receiveMessages freezes if it tries to receive more than 2000 messages at once, and maxWaitTimeInMs is not respected, so it hangs and requires a restart. It works fine when receiving fewer than 2000 messages.

It also freezes if I batch-process 250 messages at a time until roughly 2000 have been requested. Once about 2000 messages are locked, it will not receive any more from the queue (the 2000 limit is approximate; it might be somewhat more or less in reality).

Receiving messages...
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.493ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.339ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.357ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.324ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.384ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.319ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.336ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.342ms
  Error [AbortError]: The operation was aborted.
    at new AbortError
 Finally called

Here you can see that it receives the messages in a loop perfectly fine, until exactly 2000 have been received. At this point, the receiver.receiveMessages hangs and the abort controller is called.

Code:

import { AbortController } from "@azure/abort-controller";

let messageBatchSize = 250;
let messagesRequiredToStartProcessing = 4446;
let allMessages = [];

while (true) {
    try {
        const messages = await receiver.receiveMessages(messageBatchSize, {
            maxWaitTimeInMs: 0.5 * 1000,
            abortSignal: AbortController.timeout(10 * 1000),
        });

        console.time(`Processing all messages took`);

        console.log(`Received ${messages.length} messages`);
        allMessages.push(...messages);

        if (allMessages.length < messagesRequiredToStartProcessing) {
            // Don't process until `messagesRequiredToStartProcessing` messages have been received
            console.log(`Not enough messages to process yet, waiting for ${messagesRequiredToStartProcessing} total messages`);
            console.timeEnd(`Processing all messages took`);
            continue;
        }
    } catch (error) {
        console.log("Assigned Seating Queue Error: ", error);
    } finally {
        console.log('Assigned Seating Queue: Finally called');
    }
}

@ghost ghost removed the no-recent-activity label Apr 20, 2022
@ghost

ghost commented May 24, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@ghost ghost closed this as completed Jun 8, 2022
@zfyseu

zfyseu commented Oct 14, 2022

Hello @jeremymeng

The workaround of passing an abort signal to the receiveMessages() call does not work. After the receiveMessages() call aborts, the next receiveMessages() call hangs again soon, as shown in the following figure. I feel I need to create a new receiver.

[screenshot attached]
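A minimal sketch (not code from this thread) of the "create a new receiver" recovery path mentioned above, assuming the sbClient, queueName, and AbortController usage from the earlier snippets:

let receiver = sbClient.createReceiver(queueName, { receiveMode: "peekLock" });

async function receiveOrRecreate(messageCount) {
  try {
    return await receiver.receiveMessages(messageCount, {
      maxWaitTimeInMs: 2 * 1000,
      abortSignal: AbortController.timeout(7 * 1000),
    });
  } catch (err) {
    if (err.name === "AbortError") {
      // The link appears stuck: drop the old receiver and create a fresh one
      // before the next attempt.
      await receiver.close();
      receiver = sbClient.createReceiver(queueName, { receiveMode: "peekLock" });
      return [];
    }
    throw err;
  }
}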

@ghost ghost removed the no-recent-activity label Oct 14, 2022
@jeremymeng
Member

@zfyseu is it possible for you to try the other mentioned workaround of processing at most 2000 messages in a batch?
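A minimal sketch (an interpretation of that workaround, not code from the SDK team; receiver and onMessageHandler are the objects from the earlier snippets) of keeping at most ~2000 messages locked at a time by settling each batch before requesting more:

const maxLockedMessages = 2000;
let buffered = [];

while (buffered.length < maxLockedMessages) {
  const batch = await receiver.receiveMessages(250, { maxWaitTimeInMs: 2 * 1000 });
  if (!batch.length) break;
  buffered.push(...batch);
}

// Settle everything that is locked before asking for the next window.
for (const msg of buffered) {
  await onMessageHandler(msg);
  await receiver.completeMessage(msg);
}
buffered = [];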

@zfyseu

zfyseu commented Oct 16, 2022

@jeremymeng My receiver only receives one message at a time.

@jeremymeng
Member

@zfyseu ah sorry I thought you were having the batching issue. Is it possible for you to turn on more detailed logging and share the logs when the SDK stops receiving messages?

export DEBUG=azure*,rhea*,-rhea:raw,-rhea:message,-azure:core-amqp:datatransformer

@jeremymeng jeremymeng reopened this Oct 17, 2022
@zfyseu

zfyseu commented Oct 20, 2022

@jeremymeng OK, I'm a little busy these days. I'll collect the logs later.

@ghost ghost added the no-recent-activity label Oct 27, 2022
@ghost

ghost commented Oct 27, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@zfyseu

zfyseu commented Oct 27, 2022

Hi @jeremymeng, the receiver hang occurred around 2022-10-26T17:31:40 UTC. I took screenshots of some logs near that time.

[8 screenshots of debug logs attached: debugLogSnipaste_001 through debugLogSnipaste_008]

@ghost ghost removed the no-recent-activity label Oct 27, 2022
@jeremymeng
Member

@zfyseu Thank you for sharing the log. Just to confirm, this is with the abortSignal workaround? Do you see any more logs for "[connection-8]" afterwards?

@zfyseu

zfyseu commented Oct 29, 2022

@jeremymeng Yes, this is with the abortSignal workaround, and the abort signal fires 5 seconds after maxWaitTimeInMs.


@ghost ghost added the no-recent-activity label Nov 5, 2022
@ghost

ghost commented Nov 5, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@ghost ghost closed this as completed Nov 20, 2022
@jeremymeng jeremymeng reopened this Nov 21, 2022
@ghost ghost removed the no-recent-activity label Nov 21, 2022
@ghost ghost added the no-recent-activity label Nov 28, 2022
@ghost

ghost commented Nov 28, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@jeremymeng jeremymeng removed the no-recent-activity label Nov 28, 2022
@ghost ghost added the no-recent-activity label Dec 6, 2022
@ghost

ghost commented Dec 6, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@jeremymeng jeremymeng removed the no-recent-activity label Dec 16, 2022
@ghost ghost added the no-recent-activity label Dec 23, 2022
@ghost

ghost commented Dec 23, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@ghost ghost closed this as completed Jan 7, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023