[gcoap-dtls] Posting a message yields a stack overflow on the samr21-xpro with ECC #18292

Closed
valentinpi opened this issue Jul 3, 2022 · 10 comments
Labels
Area: network Area: Networking Area: security Area: Security-related libraries and subsystems Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

@valentinpi

Description

The gcoap-dtls example leads to a stack overflow after executing a post command.

Steps to reproduce the issue

  1. Flash the gcoap-dtls example on a board and open the terminal:
$ BOARD=samr21-xpro SERIAL=... make clean all flash term
  2. Execute:
# coap post fc00:: 5684 / Hi!

Expected results

No matter whether a server is listening on fc00:: or not, the non-confirmable POST message should just be sent out and appear in Wireshark.

Actual results

This output:

# coap post fc00:: 5684 / Hi!
# gcoap_cli: sending msg ID 3222, 11 bytes
# scheduler(): stack overflow detected, pid=1
# scheduler(): stack overflow detected, pid=1
# scheduler(): stack overflow detected, pid=1

Versions

The current RIOT master branch.

Additional Information

I am currently writing a small DTLS proxy for a Node.js backend, since DTLS support there is quite poor. Using tinydtls-rs, I have developed a small Rust application that listens for DTLS packets from the board and decrypts them for the backend. With my application, however, I also get a stack overflow in pid=6, which is the coap thread, followed by a hard fault. I therefore decided to test this example, which gave me the problem above. The Rust application itself is working and sends the handshake, but the client does not respond.

@valentinpi
Author

Just to add a few more details here: the firmware I am using (about 170 lines, so I do not want to post all of it) roughly does the following in one thread:

void *data_thread(void *arg) {
    (void)arg;

    uint8_t buf[CONFIG_GCOAP_PDU_BUF_SIZE];
    memset(buf, 0, CONFIG_GCOAP_PDU_BUF_SIZE);

    // Put packet metadata
    coap_pkt_t pdu = {};
    gcoap_req_init(&pdu, buf, CONFIG_GCOAP_PDU_BUF_SIZE, COAP_POST, "/data");
    coap_opt_add_format(&pdu, COAP_FORMAT_CBOR);
    coap_hdr_set_type(pdu.hdr, COAP_TYPE_NON);
    ssize_t meta_len = coap_opt_finish(&pdu, COAP_OPT_FINISH_PAYLOAD);
    while (true) {
        // Write some data to `buf`
        size_t payload_len = ...;

        // Post data
        gcoap_req_send(buf, meta_len + payload_len, &host_ep, NULL, NULL);

        // Some cleanup
    }

    return NULL;
}

And the DTLS proxy I am working on roughly does this in its write-callback (non-RIOT code):

unsafe extern "C" fn server_write_callback(
    ctx: *mut dtls_context_t,
    session: *mut session_t,
    buf: *mut u8,
    len: c_size_t,
) -> c_int {
    debug_println!("WRITE");

    let socket = (*ctx).app as *mut UdpSocket;
    let addr = session.as_ref().unwrap().addr.sin6.as_ref();

    assert!(addr.sin6_family == AF_INET6 as u16);

    (*socket)
        .send_to(
            std::slice::from_raw_parts(buf, len as usize),
            SocketAddrV6::new(
                Ipv6Addr::from(addr.sin6_addr.s6_addr),
                u16::from_be(addr.sin6_port),
                addr.sin6_flowinfo,
                addr.sin6_scope_id,
            ),
        )
        .expect(debug_fmt!("Failed to send message"));

    0
}

@kaspar030
Contributor

Can you increase the stack size a lot, and then get ps output?

@valentinpi
Author

I already tried that. I do not have the output at hand, but I increased the stack size of the coap thread to more than 4096 bytes (in the RIOT source, there are some additions on top of that), and by the time of the hard fault it had used up all of it. Before that, it used about 700 bytes, I think. I can provide ps output tomorrow, but this should be easily reproducible.

@cgundogan
Member

Can't do much testing here but could you try moving buf outside the function scope? E.g., as a global variable.
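
A minimal sketch of what moving the buffer to file scope could look like, written in plain C so it builds without RIOT. `PDU_BUF_SIZE` is a placeholder standing in for `CONFIG_GCOAP_PDU_BUF_SIZE`, and `pdu_buf_acquire()` is a hypothetical helper, not a gcoap function:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: placeholder for CONFIG_GCOAP_PDU_BUF_SIZE; 128 is arbitrary here. */
#define PDU_BUF_SIZE 128

/* File-scope buffer: it has static storage duration, so it does not take
 * space on the stack of the thread that uses it. */
static uint8_t pdu_buf[PDU_BUF_SIZE];

/* Hypothetical helper: zero the shared buffer and hand it to the caller.
 * Not thread-safe; only one thread should use the buffer at a time. */
static uint8_t *pdu_buf_acquire(size_t *len)
{
    memset(pdu_buf, 0, sizeof(pdu_buf));
    if (len != NULL) {
        *len = sizeof(pdu_buf);
    }
    return pdu_buf;
}
```

In the thread function, the local `uint8_t buf[...]` would then be replaced by a pointer obtained from this helper. This only removes the buffer itself from the thread's stack; it does not reduce whatever stack the DTLS handshake needs.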

@valentinpi
Author

valentinpi commented Jul 5, 2022

Excuse the late reply. It is still not working, but we are now switching to PSKs, and that does not seem to crash. Anyway, @kaspar030, here is the ps output of gcoap_dtls while idle:

2022-07-05 17:58:50,061 # main(): This is RIOT! (Version: 2022.07-devel-949-g1e17aa)                                            
2022-07-05 17:58:50,061 # gcoap example app                                                                                     
2022-07-05 17:58:50,062 # All up, running the shell now                                                                         
> ps                                                                                                                            
2022-07-05 18:02:42,887 # ps                                                                                                    
2022-07-05 18:02:42,896 #       pid | name                 | state    Q | pri | stack  ( used) ( free) | base addr  | current    | 
2022-07-05 18:02:42,904 #         - | isr_stack            | -        - |   - |    512 (  296) (  216) | 0x20000000 | 0x200001c0 | 
2022-07-05 18:02:42,913 #         1 | main                 | running  Q |   7 |   1536 (  680) (  856) | 0x20000730 | 0x20000b4c | 
2022-07-05 18:02:42,922 #         2 | event                | bl anyfl _ |   6 |    512 (  196) (  316) | 0x20000e98 | 0x20000fd4 | 
2022-07-05 18:02:42,932 #         3 | 6lo                  | bl rx    _ |   3 |   1024 (  528) (  496) | 0x20004348 | 0x2000462c | 
2022-07-05 18:02:42,941 #         4 | ipv6                 | bl rx    _ |   4 |   1024 (  448) (  576) | 0x20001c10 | 0x20001ed4 | 
2022-07-05 18:02:42,950 #         5 | udp                  | bl rx    _ |   5 |   1024 (  280) (  744) | 0x2000474c | 0x20004a34 | 
2022-07-05 18:02:42,959 #         6 | coap                 | bl anyfl _ |   6 |   2144 (  332) ( 1812) | 0x200013ac | 0x20001b1c | 
2022-07-05 18:02:42,968 #         7 | at86rf2xx            | bl anyfl _ |   2 |   1024 (  580) (  444) | 0x20002234 | 0x200024f4 | 
2022-07-05 18:02:42,975 #           | SUM                  |            |     |   8800 ( 3340) ( 5460)                          

This stack usage seems reasonable. I cannot see more, since after crashing I can only restart the board.
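
For reference, the `( free)` column is simply stack size minus used bytes. The following host-side sketch recomputes it for the thread rows copied from the table above and picks the thread with the least headroom. The struct and function names are hypothetical and not part of RIOT:

```c
#include <stddef.h>

/* One row of the ps table: thread name, stack size, and bytes used. */
typedef struct {
    const char *name;
    unsigned size;
    unsigned used;
} stack_row_t;

/* Thread rows copied from the idle ps output above (isr_stack omitted). */
static const stack_row_t ps_rows[] = {
    { "main",      1536, 680 },
    { "event",      512, 196 },
    { "6lo",       1024, 528 },
    { "ipv6",      1024, 448 },
    { "udp",       1024, 280 },
    { "coap",      2144, 332 },
    { "at86rf2xx", 1024, 580 },
};

/* Remaining headroom of one thread in bytes. */
static unsigned stack_free(const stack_row_t *row)
{
    return row->size - row->used;
}

/* Return the row with the smallest headroom, or NULL if n is 0. */
static const stack_row_t *tightest(const stack_row_t *rows, size_t n)
{
    const stack_row_t *min = NULL;
    for (size_t i = 0; i < n; i++) {
        if (min == NULL || stack_free(&rows[i]) < stack_free(min)) {
            min = &rows[i];
        }
    }
    return min;
}
```

These are idle figures only. They say nothing about how deep the stack of a thread goes while the `coap post` command and the ECC handshake are running.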

I then multiplied GCOAP_STACK_SIZE by 4, which yields the following ps output:

> ps                                                                                                                             
2022-07-05 18:14:19,525 # ps                                                                                                     
2022-07-05 18:14:19,534 #       pid | name                 | state    Q | pri | stack  ( used) ( free) | base addr  | current    |
2022-07-05 18:14:19,543 #         - | isr_stack            | -        - |   - |    512 (  280) (  232) | 0x20000000 | 0x200001c0 |
2022-07-05 18:14:19,552 #         1 | main                 | running  Q |   7 |   1536 (  712) (  824) | 0x20000730 | 0x20000b4c |
2022-07-05 18:14:19,561 #         2 | event                | bl anyfl _ |   6 |    512 (  196) (  316) | 0x20000e98 | 0x20000fd4 |
2022-07-05 18:14:19,570 #         3 | 6lo                  | bl rx    _ |   3 |   1024 (  420) (  604) | 0x20004f48 | 0x2000522c |
2022-07-05 18:14:19,579 #         4 | ipv6                 | bl rx    _ |   4 |   1024 (  448) (  576) | 0x20002810 | 0x20002ad4 |
2022-07-05 18:14:19,588 #         5 | udp                  | bl rx    _ |   5 |   1024 (  280) (  744) | 0x2000534c | 0x20005634 |
2022-07-05 18:14:19,598 #         6 | coap                 | bl anyfl _ |   6 |   5216 (  332) ( 4884) | 0x200013ac | 0x2000271c |
2022-07-05 18:14:19,607 #         7 | at86rf2xx            | bl anyfl _ |   2 |   1024 (  580) (  444) | 0x20002e34 | 0x200030f4 |
2022-07-05 18:14:19,613 #           | SUM                  |            |     |  11872 ( 3248) ( 8624)

And the above command still crashes.

@cgundogan I tried your idea, it still crashes. :(

@miri64 miri64 changed the title [gcoap-dtls] Posting a message yields a stack overflow on the samr21-xpro [gcoap-dtls] Posting a message yields a stack overflow on the samr21-xpro with ECC Jul 11, 2022
@miri64
Member

miri64 commented Jul 11, 2022

> Excuse the late reply, Still not working, but we are now switching to PSKs and that does not seem to crash.

ECC is known to be flaky with TinyDTLS, so I think it is good to keep this open as a known issue.

@miri64 miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: network Area: Networking Area: security Area: Security-related libraries and subsystems labels Jul 11, 2022
@miri64
Member

miri64 commented Jul 11, 2022

There is more info on ECC on microcontrollers on the forum btw. Could also be of interest to you.

@maribu
Member

maribu commented May 18, 2023

@valentinpi sorry for the late reply. Could you try again with an increased stack size? But this time, please increase the size of the main stack rather than the coap stack, as

# scheduler(): stack overflow detected, pid=1

indicates that the main stack rather than the coap stack was overflowing. Thx :)
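
To illustrate why the pid in that message identifies the overflowing stack: as far as I understand, the scheduler check compares a marker word at the lowest address of the active thread's stack against its expected value. The following host-side simulation sketches that idea. All names are hypothetical, this is not RIOT's implementation, and an "overflow" is modelled as stack usage reaching the marker word:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a thread and its stack area (not RIOT's thread_t). */
typedef struct {
    int pid;
    uintptr_t *stack_start; /* lowest address of the stack area */
    size_t words;           /* stack size in words */
} sim_thread_t;

/* Set up the stack and place a marker in its lowest word.
 * Assumption: the marker is the address of that word itself. */
static void sim_thread_init(sim_thread_t *t, int pid, uintptr_t *stack, size_t words)
{
    t->pid = pid;
    t->stack_start = stack;
    t->words = words;
    memset(stack, 0, words * sizeof(*stack));
    stack[0] = (uintptr_t)stack;
}

/* Simulate a descending stack that dirties `used_words` words from the top.
 * Usage is clamped so the simulation never writes outside the array. */
static void sim_thread_use_stack(sim_thread_t *t, size_t used_words)
{
    if (used_words > t->words) {
        used_words = t->words;
    }
    for (size_t i = 0; i < used_words; i++) {
        t->stack_start[t->words - 1 - i] = 0xAAu;
    }
}

/* The check: the marker word must still hold its original value. */
static bool sim_stack_overflowed(const sim_thread_t *t)
{
    return t->stack_start[0] != (uintptr_t)t->stack_start;
}
```

Under this model, growing the stack of a different thread (here: coap, pid=6) cannot make the check for pid=1 pass, which is why the main stack is the one to enlarge.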

@valentinpi
Author

valentinpi commented Jun 4, 2023

Thank you so much for the reply, but sadly I cannot access my board right now :(. Could we close the issue, and could I reopen it if I get back to this?

@maribu
Member

maribu commented Jun 4, 2023

Sure. If the issue comes up again, I'm happy to help solve it :)

@maribu maribu closed this as completed Jun 4, 2023
5 participants