Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deadlock on latest release #17

Closed
scgm11 opened this issue Sep 7, 2017 · 14 comments
Closed

deadlock on latest release #17

scgm11 opened this issue Sep 7, 2017 · 14 comments

Comments

@scgm11
Copy link

scgm11 commented Sep 7, 2017

Im experiencing a deadlock with the latest release 1.5

Channel Context Extension Prio State Application Data CallerID Duration Accountcode PeerAccount BridgeID
SIP/QUALITY1-0000005 functions FUNC_DIGITE_O_DI 5 Up MRCPRecog "/usr/local/unimrcp/data/ 912732278 07:26:42 QUALITY1

attached traces I will compile it with better debug options and add them

core-asterisk-running-2017-09-07T16-59-48+0200-brief.txt
core-asterisk-running-2017-09-07T16-59-48+0200-full.txt
core-asterisk-running-2017-09-07T16-59-48+0200-locks.txt
core-asterisk-running-2017-09-07T16-59-48+0200-thread1.txt

@scgm11
Copy link
Author

scgm11 commented Sep 7, 2017

Seems to there is no deadlock but anyway the channel is stuck, seems to be when a grammar cant be loaded

ERROR[32387][C-00000005]: app_mrcprecog.c:1203 app_recog_exec: (ASR-13) Unable to load grammar

then the channel get stuck

core-asterisk-running-2017-09-07T19-51-35+0200-brief.txt
core-asterisk-running-2017-09-07T19-51-35+0200-full.txt
core-asterisk-running-2017-09-07T19-51-35+0200-locks.txt
core-asterisk-running-2017-09-07T19-51-35+0200-thread1.txt

@achaloyan
Copy link
Contributor

Could you provide Asterisk/UniMRCP logs with debug log level enabled.
I assume you have built everything from source, right. Just in case, what version of Asterisk do you use?

@scgm11
Copy link
Author

scgm11 commented Sep 7, 2017

it's not really a deadlock in the log attached is clear, the channels is waiting for a timeout maybe some issue on the backend, anyway I think should clear the channel earlier because even asterisk can't hangup the call neither manually using hangup request (channel.
I don't see any lock hold as I show on the log.

Maybe is an engine issue but didn't happen on the previous version or at least not as usual to be aware of.
ast 13.17

log.txt

@achaloyan
Copy link
Contributor

I doubt this issue is related to the latest release. Actually, there is no response to session termination request. The following log statement is a very bad indication, anyway.

speech_channel.c:370 speech_channel_destroy: (ASR-57) MRCP session has not terminated after 30000 ms

If you enable UniMRCP client logs (set log-level to DEBUG in mrcp.conf), I'll tell you what is going on precisely. Most likely there is no response from the MRCP server you use.

@scgm11
Copy link
Author

scgm11 commented Sep 13, 2017

Ill try to reproduce it again I haven't been able yet, Im closing the issue for the time being I will reopen if needed.

@scgm11 scgm11 closed this as completed Sep 13, 2017
@scgm11 scgm11 reopened this Sep 19, 2017
@scgm11
Copy link
Author

scgm11 commented Sep 19, 2017

Im seeing this in a loaded environment that is hard to get all the debug, anyway why cant I hangup request the call? is there no global timeout that can be added so in case not response hangup the channel?

@achaloyan
Copy link
Contributor

If you set the request-timeout in mrcp.conf, for example, to 5000 msec, then any request with no response received within the specified timeout will be cancelled. As a result, the channel will ultimately hang up.

@scgm11
Copy link
Author

scgm11 commented Sep 20, 2017

channels are not releasing even with the request-timeout feature, I will try to get more log

@scgm11
Copy link
Author

scgm11 commented Sep 22, 2017

attached full log of a call that the channel get stuck, even with the timeout params

I think there is other case that hangs the channel indefinitly this is released about 10 minutes later

fulllog.txt
logother.txt

[Sep 22 13:42:18] NOTICE[21049]: src/rtsp_client.c:524 : Push RTSP Request to In-Progress Queue 0x7f33700344a8 CSeq:1
[Sep 22 13:42:18] DEBUG[21049]: src/apt_poller_task.c:244 : Wait for Messages [uni4] timeout [5000]
[Sep 22 13:42:23] NOTICE[21049]: src/rtsp_client.c:764 : Cancel RTSP Request 0x7f33700344a8 CSeq:1 [408]
[Sep 22 13:42:23] NOTICE[21049]: src/rtsp_client.c:549 : Pop In-Progress RTSP Request 0x7f33700344a8 CSeq:1
[Sep 22 13:42:23] DEBUG[21049]: src/apt_task.c:263 : Signal Message to [MRCP Client] [0x7f3380003330;1;0]
[Sep 22 13:42:23] DEBUG[21049]: src/apt_poller_task.c:249 : Wait for Messages [uni4]
[Sep 22 13:42:23] DEBUG[21043]: src/apt_task.c:335 : Process Message [MRCP Client] [0x7f3380003330;1;0]
[Sep 22 13:42:23] NOTICE[21043]: src/mrcp_client_session.c:149 : Receive Answer ASR-186 [c:0 a:0 v:0] Status 408
[Sep 22 13:42:23] NOTICE[21043]: src/mrcp_client_session.c:453 : Raise App Response ASR-186 [2] FAILURE [2]
[Sep 22 13:42:23] DEBUG[21043]: app_mrcprecog.c:253 speech_on_channel_add: (ASR-186) speech_on_channel_add
[Sep 22 13:42:23] ERROR[21043]: app_mrcprecog.c:294 speech_on_channel_add: (ASR-186) Channel error status=2, response code=408!
[Sep 22 13:42:23] DEBUG[21043]: speech_channel.c:109 speech_channel_set_state_unlocked: (ASR-186) CLOSED ==> ERROR
[Sep 22 13:42:23] DEBUG[21630][C-00000049]: speech_channel.c:507 speech_channel_open: (ASR-186) Terminating MRCP session
[Sep 22 13:42:23] DEBUG[21043]: src/apt_consumer_task.c:135 : Wait for Messages [MRCP Client]
[Sep 22 13:42:23] DEBUG[21630][C-00000049]: src/apt_task.c:263 : Signal Message to [MRCP Client] [0x7f344c01de80;4;0]

@achaloyan
Copy link
Contributor

While I could not find anything obviously wrong in the attached txt files, it is interesting what happened after the last log statement that you dumped in the comments.

[Sep 22 13:42:23] DEBUG[21630][C-00000049]: src/apt_task.c:263 : Signal Message to [MRCP Client] [0x7f344c01de80;4;0]

For every Signal Message, there should be a Process Message. If you have the remaining logs, look for 0x7f344c01de80;4;0 if you can see it there.

You may also to track thread identifiers. Process Message should have been called from [21043], and in your logother.txt I can see that this thread continues to process messages, but there is a gap from Sep 22 13:42:23 to Sep 22 13:48:32. Not sure what happened in the meantime.

Needless to say, it is also interesting what happened with thread [21630]. Do you see any logs from this thread afterwards.

@scgm11
Copy link
Author

scgm11 commented Sep 23, 2017

I attach the logs

There is also a possibility that when this happens, the process of the Voice Engine is killed, but Im not sure yet. so it may be in the middle of something and killed the process... but why the timeout is never honored?

0x7f344.txt
21043.txt
21630.txt

@achaloyan
Copy link
Contributor

Well, now I can track this session to the end, and there is nothing wrong in this flow, besides the fact that channel_opened returned error, which is OK. MRCPRecog did complete in all these cases. Why would you think that your problem is MRCP-related.

@scgm11
Copy link
Author

scgm11 commented Sep 23, 2017

Because the channel gets stuck at MRCPRecog, and even when I see a hangup the channel is still stuck there... and cant even hangup request....

@achaloyan
Copy link
Contributor

OK, then provide the identified of a stuck channel along with the logs.

In all the attempts I have seen so far, MRCPRecog() did exit. Here is the last statement which you may track

MRCPRecog() exiting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants