Kickoff+Forward: st_thread_join unsuccessful triggers assert causing SRS crash #2369

Closed
shitizenlism opened this issue May 20, 2021 · 5 comments
Labels
Bug It might be a bug. TransByAI Translated by AI/GPT. Won't fix We won't fix it.

shitizenlism commented May 20, 2021

With SRS 4.0.116, kicking off a stream crashes SRS with relatively high probability.

build: 2021-05-19 20:54:49, configure: --x86-x64 --stream-caster=on --stat=on --without-utest --cxx11=on --srt=off --rtc=on --ssl=on --sys-ssl=off, uname: Linux localhost.localdomain 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux, osx: 0

[centos@ip-10-3-1-237 gss_rel]$ sudo gdb objs/srs runlog/core.5047
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 5047]
[New LWP 5049]
[New LWP 5048]
[New LWP 5051]
[New LWP 5052]
[New LWP 5053]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./objs/srs -c ./conf/live.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007f3890dcf3d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-323.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64
(gdb) bt
#0  0x00007f3890dcf3d7 in raise () from /lib64/libc.so.6
#1  0x00007f3890dd0ac8 in abort () from /lib64/libc.so.6
#2  0x00007f3890dc81a6 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f3890dc8252 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000531ec7 in SrsFastCoroutine::stop (this=0x2d15f20) at src/app/srs_app_st.cpp:222
#5  0x00000000005319f0 in SrsSTCoroutine::stop (this=0x2cb0860) at src/app/srs_app_st.cpp:117
#6  0x00000000005211a8 in SrsForwarder::on_unpublish (this=0x2d0d650) at src/app/srs_app_forward.cpp:125
#7  0x0000000000506dcb in SrsOriginHub::destroy_forwarders (this=0x2cb1980) at src/app/srs_app_source.cpp:1522
#8  0x000000000050563c in SrsOriginHub::on_unpublish (this=0x2cb1980, m3u8_end_flag=0) at src/app/srs_app_source.cpp:1178
#9  0x000000000050eec1 in SrsLiveSource::on_unpublish (this=0x2d03bb0) at src/app/srs_app_source.cpp:2939
#10 0x00000000004f8d27 in SrsRtmpConn::release_publish (this=0x2c50b10, source=0x2d03bb0) at src/app/srs_app_rtmp_conn.cpp:1500
#11 0x00000000004f737d in SrsRtmpConn::publishing (this=0x2c50b10, source=0x2d03bb0) at src/app/srs_app_rtmp_conn.cpp:1306
#12 0x00000000004f450a in SrsRtmpConn::stream_service_cycle (this=0x2c50b10) at src/app/srs_app_rtmp_conn.cpp:686
#13 0x00000000004f35b7 in SrsRtmpConn::service_cycle (this=0x2c50b10) at src/app/srs_app_rtmp_conn.cpp:428
#14 0x00000000004f2248 in SrsRtmpConn::do_cycle (this=0x2c50b10) at src/app/srs_app_rtmp_conn.cpp:241
#15 0x00000000004fc094 in SrsRtmpConn::cycle (this=0x2c50b10) at src/app/srs_app_rtmp_conn.cpp:2056
#16 0x00000000005320a1 in SrsFastCoroutine::cycle (this=0x2d11920) at src/app/srs_app_st.cpp:270
#17 0x0000000000532124 in SrsFastCoroutine::pfn (arg=0x2d11920) at src/app/srs_app_st.cpp:285
#18 0x00000000006a44a4 in _st_thread_main () at sched.c:363
#19 0x00000000006a4d17 in st_thread_create (start=0x6a3e20 <_st_vp_schedule+170>, arg=0x2cab510, joinable=1, stk_size=65536) at sched.c:694
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) quit

void SrsFastCoroutine::stop()
{
    if (disposed) {
        return;
    }
    disposed = true;
    
    interrupt();

    // When not started, the trd is NULL.
    if (trd) {
        void* res = NULL;
        int r0 = st_thread_join((st_thread_t)trd, &res);
        srs_assert(!r0); // The crash is triggered by this assertion.

        srs_error_t err_res = (srs_error_t)res;
        if (err_res != srs_success) {
            // When the worker cycle is done, the error has already been overridden,
            // so trd_err should equal err_res.
            srs_assert(trd_err == err_res);
        }
    }
    
    // If the worker reported no error and its cycle did not finish, mark it as terminated.
    if (trd_err == srs_success && !cycle_done) {
        trd_err = srs_error_new(ERROR_THREAD_TERMINATED, "terminated");
    }
    
    return;
}
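The assertion fires whenever `st_thread_join()` returns nonzero. Based on our reading of the State Threads `sched.c` bundled with SRS (an assumption worth checking against the version in your tree), the join fails without joining in four cases: the target is not joinable (`EINVAL`), a coroutine tries to join itself (`EDEADLK`), another coroutine is already waiting on the same target (`EINVAL`), or the joining coroutine is interrupted while it waits (`EINTR`). The minimal C++ model below mirrors those checks with hypothetical types (`ModelThread`, `ModelCaller`, `model_join`); it does not call the ST API.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical, simplified model of the checks st_thread_join() performs
// in State Threads (our reading of sched.c; not the real API).
struct ModelThread {
    bool joinable = true;    // created with joinable=1, so it has a "term" condvar
    bool has_waiter = false; // another coroutine is already blocked joining it
    bool zombie = false;     // its start function has already returned
};

struct ModelCaller {
    const ModelThread* self = nullptr; // the coroutine that calls join
    bool interrupted = false;          // st_thread_interrupt() was called on it
};

// Returns 0 on success, or -1 with `err` set, like st_thread_join().
// A real join blocks until the target exits; this model assumes that an
// uninterrupted wait eventually succeeds.
int model_join(const ModelCaller& caller, const ModelThread& target, int& err) {
    if (!target.joinable) { err = EINVAL; return -1; }        // not joinable
    if (caller.self == &target) { err = EDEADLK; return -1; } // joining itself
    if (target.has_waiter) { err = EINVAL; return -1; }       // already being joined
    if (!target.zombie && caller.interrupted) {               // wait interrupted
        err = EINTR;
        return -1;
    }
    err = 0;
    return 0;
}
```

In the reported backtrace the caller appears to be the RTMP connection coroutine (frames #16 and #17) and the target is the forwarder's coroutine (frame #4), so the interrupted-wait case is the one most consistent with a kickoff arriving during unpublish. That is an inference from the backtrace, not something confirmed in this thread.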

The live.conf is as follows:

srs_log_tank        file;
srs_log_file        ./logs/live.log;
listen              1935;
max_connections     1000;
pid                 ./objs/live.pid;

srt_server {
    enabled off;
    listen 6080;
    maxbw 1000000000;
    connect_timeout 4000;
    peerlatency 300;
    recvlatency 300;
}

http_api {
    enabled on;
    listen 1985;
    raw_api {
        enabled             on;
        allow_reload        on;
        allow_query         on;
        allow_update        on;
    }
}


http_server {
    enabled         on;
    listen          8880;
    dir             ./objs/nginx/html;
}
heartbeat {
    enabled         on;
    interval        60;
    url             http://127.0.0.1:8000/api/gss/heartbeat;
    device_id       "11223344";
    summaries       on;
}
stats {
    network         0;
    disk            sda sdb nvme0n1 nvme1n1;
}

# the hls window in seconds, the number of ts in m3u8.
vhost __defaultVhost__ {
    forward {
        enabled on;
        destination 127.0.0.1:1936;
    }

}


winlinvip (Member) commented May 21, 2021

This looks like a problem with how the HTTP API kickoff interacts with the Forward coroutine.

Would it be convenient to upload the core files and logs?
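The suspected race can be sketched as an event model. It assumes, without confirmation in this thread, that kicking off the client interrupts the RTMP connection coroutine while that coroutine is stopping the forwarder during `release_publish` (frames #4 to #10), and that the interrupt flag is still pending when the join waits. The event names, `JoinResult`, and `replay()` are hypothetical and are not SRS or State Threads code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event model of the suspected Kickoff+Forward race.
enum class Event {
    ConnStartsJoin,        // conn coroutine calls stop() on the forwarder coroutine
    KickoffInterruptsConn, // assumed: kickoff interrupts the conn coroutine
    ForwarderExits         // forwarder coroutine returns from its cycle
};

enum class JoinResult { Joined, FailedEintr, NotFinished };

// Replays events in order and reports what the conn coroutine's join returns.
JoinResult replay(const std::vector<Event>& events) {
    bool interrupted = false;    // pending interrupt flag on the conn coroutine
    bool forwarder_done = false; // forwarder has exited (zombie)
    bool joining = false;        // conn is blocked waiting in join
    for (Event e : events) {
        switch (e) {
        case Event::ConnStartsJoin:
            if (forwarder_done) return JoinResult::Joined;   // no wait needed
            if (interrupted) return JoinResult::FailedEintr; // wait fails at once
            joining = true;
            break;
        case Event::KickoffInterruptsConn:
            if (joining) return JoinResult::FailedEintr;     // blocked wait wakes with EINTR
            interrupted = true;
            break;
        case Event::ForwarderExits:
            forwarder_done = true;
            if (joining) return JoinResult::Joined;          // normal join completion
            break;
        }
    }
    return JoinResult::NotFinished; // join never started or is still blocked
}
```

Under these assumptions, only orderings where the interrupt reaches the connection coroutine while the forwarder is still running produce the -1 return that trips `srs_assert(!r0)` and aborts with signal 6; if the forwarder has already exited, the join succeeds regardless of the interrupt.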


@winlinvip winlinvip changed the title from "st_thread_join unsuccessful triggers assert causing SRS crash" to "Kickoff+Forward: st_thread_join unsuccessful triggers assert causing SRS crash" May 21, 2021
@winlinvip winlinvip added the Bug It might be a bug. label May 21, 2021
shitizenlism (Author) commented May 21, 2021

I'll upload them the next time I reproduce it. I have made some local modifications; they don't affect the stream kickoff functionality, but my logs will look quite different from stock SRS.


@winlinvip winlinvip added this to the SRS 4.0 release milestone Aug 26, 2021
winlinvip (Member) commented Aug 26, 2021

Have you been able to reproduce it?


winlinvip (Member) commented Dec 26, 2021

The logging has already been improved. We need to wait for the problem to occur again, which happens with relatively low probability.

Postponing to SRS 5.0.


@winlinvip winlinvip modified the milestones: 4.0, 5.0 Dec 26, 2021
winlinvip (Member) commented:

The HTTP RAW API has been removed.

@winlinvip winlinvip added the Won't fix We won't fix it. label Jan 2, 2023
@winlinvip winlinvip changed the title from the original Chinese to "Kickoff+Forward: st_thread_join unsuccessful triggers assert causing SRS crash" Jul 28, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 28, 2023