netdev_event_thread: add support for event_thread irq handler #13573

jia200x · 2020-03-06T13:00:58Z

Contribution description

As promised in #13562, this PR adds an optional event_thread based IRQ handler.

Now it's possible to write radio applications without an explicit recv thread for calling dev->driver->isr. Also, network stacks are not required to handle their radio events if this module is enabled, so the glue code only requires an implementation of event_callback. With some minor tricks we could easily expand the device support for LWIP and OpenThread.

As an example, the `tests/driver_at86rf2xx` looks like this:

/* ... */
static void _event_cb(netdev_t *dev, netdev_event_t event)
{
    switch (event) {
        case NETDEV_EVENT_RX_COMPLETE:
        {
            recv(dev);

            break;
        }
        default:
            puts("Unexpected event received");
            break;
    }
}

int main(void)
{
    puts("AT86RF2xx device driver test");

    unsigned dev_success = 0;
    for (unsigned i = 0; i < AT86RF2XX_NUM; i++) {
        netopt_enable_t en = NETOPT_ENABLE;
        const at86rf2xx_params_t *p = &at86rf2xx_params[i];
        netdev_t *dev = (netdev_t *)(&devs[i]);

        printf("Initializing AT86RF2xx radio at SPI_%d\n", p->spi);
        at86rf2xx_setup(&devs[i], p);

        /* Just call this and voilà! */
        netdev_event_thread_init(dev);

        dev->event_callback = _event_cb;
        if (dev->driver->init(dev) < 0) {
            continue;
        }
        dev->driver->set(dev, NETOPT_RX_END_IRQ, &en, sizeof(en));
        dev_success++;
    }

    if (!dev_success) {
        puts("No device could be initialized");
        return 1;
    }

    /* start the shell */
    puts("Initialization successful - starting the shell now");

    char line_buf[SHELL_DEFAULT_BUFSIZE];
    shell_run(shell_commands, line_buf, SHELL_DEFAULT_BUFSIZE);

    return 0;
}

Testing procedure

I used this patch to run tests/driver_at86rf2xx with 2 samr21-xpro boards:

Patch

diff --git a/tests/driver_at86rf2xx/Makefile b/tests/driver_at86rf2xx/Makefile
index 76daf84719..d4578e841a 100644
--- a/tests/driver_at86rf2xx/Makefile
+++ b/tests/driver_at86rf2xx/Makefile
@@ -6,6 +6,7 @@ USEMODULE += od
 USEMODULE += shell
 USEMODULE += shell_commands
 USEMODULE += ps
+USEMODULE += netdev_event_thread
 
 # define the driver to be used for selected boards
 ifneq (,$(filter samr21-xpro,$(BOARD)))
diff --git a/tests/driver_at86rf2xx/main.c b/tests/driver_at86rf2xx/main.c
index 7edab95bb5..9acd2754b3 100644
--- a/tests/driver_at86rf2xx/main.c
+++ b/tests/driver_at86rf2xx/main.c
@@ -31,9 +31,6 @@
 #define _STACKSIZE      (THREAD_STACKSIZE_DEFAULT + THREAD_EXTRA_STACKSIZE_PRINTF)
 #define MSG_TYPE_ISR    (0x3456)
 
-static char stack[_STACKSIZE];
-static kernel_pid_t _recv_pid;
-
 at86rf2xx_t devs[AT86RF2XX_NUM];
 
 static const shell_command_t shell_commands[] = {
@@ -44,44 +41,16 @@ static const shell_command_t shell_commands[] = {
 
 static void _event_cb(netdev_t *dev, netdev_event_t event)
 {
-    if (event == NETDEV_EVENT_ISR) {
-        msg_t msg;
-
-        msg.type = MSG_TYPE_ISR;
-        msg.content.ptr = dev;
-
-        if (msg_send(&msg, _recv_pid) <= 0) {
-            puts("gnrc_netdev: possibly lost interrupt.");
-        }
-    }
-    else {
-        switch (event) {
-            case NETDEV_EVENT_RX_COMPLETE:
-            {
-                recv(dev);
-
-                break;
-            }
-            default:
-                puts("Unexpected event received");
-                break;
-        }
-    }
-}
+    switch (event) {
+        case NETDEV_EVENT_RX_COMPLETE:
+        {
+            recv(dev);
 
-void *_recv_thread(void *arg)
-{
-    (void)arg;
-    while (1) {
-        msg_t msg;
-        msg_receive(&msg);
-        if (msg.type == MSG_TYPE_ISR) {
-            netdev_t *dev = msg.content.ptr;
-            dev->driver->isr(dev);
-        }
-        else {
-            puts("unexpected message type");
+            break;
         }
+        default:
+            puts("Unexpected event received");
+            break;
     }
 }
 
@@ -97,6 +66,7 @@ int main(void)
 
         printf("Initializing AT86RF2xx radio at SPI_%d\n", p->spi);
         at86rf2xx_setup(&devs[i], p);
+        netdev_event_thread_init(dev);
         dev->event_callback = _event_cb;
         if (dev->driver->init(dev) < 0) {
             continue;
@@ -110,15 +80,6 @@ int main(void)
         return 1;
     }
 
-    _recv_pid = thread_create(stack, sizeof(stack), THREAD_PRIORITY_MAIN - 1,
-                              THREAD_CREATE_STACKTEST, _recv_thread, NULL,
-                              "recv_thread");
-
-    if (_recv_pid <= KERNEL_PID_UNDEF) {
-        puts("Creation of receiver thread failed");
-        return 1;
-    }
-
     /* start the shell */
     puts("Initialization successful - starting the shell now");

Issues/PRs references

Requires #13565 and #13562

jia200x · 2020-03-06T13:19:47Z

I will introduce a netdev_init wrapper to automatically init this (if available), call dev->driver->init, etc.

kaspar030 · 2020-03-06T13:22:58Z

needs rebase

jia200x · 2020-03-06T13:26:13Z

needs rebase

Yup. I will rebase when #13565 and #13562 are merged.

kaspar030 · 2020-03-06T14:40:49Z

Yup. I will rebase when #13565 and #13562 are merged.

Yould you giv it a ninbetween rebase now that #13562 is merged?

bergzand · 2020-03-12T09:47:05Z

@jia200x This PR allows the netdev code to move the netdev::isr call to a separate thread. How are race conditions between the netif and the event thread prevented? From what I see, at least the at86rf2xx driver modifies the driver state during the netdev::isr call (in the at86rf2xx_set_state() function) and this is not guarded.

If you want to continue with this I think we should first make sure that any driver context modifications are guarded. Furthermore I think there should also be a modification to the netdev documentation to point out that the netdev::isr function can be called from a different thread and that the necessary measures should be in place.

bergzand · 2020-03-12T09:48:55Z

@jia200x Do you have any measurements on what the impact is on performance when netdev::isr is handled in a separate thread? Both on flash/ram and radio reception to the end of the netif recv latency?

jia200x · 2020-03-12T13:35:42Z

@jia200x This PR allows the netdev code to move the netdev::isr call to a separate thread. How are race conditions between the netif and the event thread prevented? From what I see, at least the at86rf2xx driver modifies the driver state during the netdev::isr call (in the at86rf2xx_set_state() function) and this is not guarded.

This problem pops up because most of our drivers are not thread safe :/
The only way to solve this would be to design them to be like that.

I agree, that in the current state we cannot just plug in a netif on top.

If you want to continue with this I think we should first make sure that any driver context modifications are guarded. Furthermore I think there should also be a modification to the netdev documentation to point out that the netdev::isr function can be called from a different thread and that the necessary measures should be in place.

Maybe we should also start making the drivers thread safe. I know some of them are (ethernet devices).

jia200x · 2020-03-12T13:39:32Z

@jia200x Do you have any measurements on what the impact is on performance when netdev::isr is handled in a separate thread? Both on flash/ram and radio reception to the end of the netif recv latency?

It's hard to do a fair comparison now (since the standard "recv" thread is heavier than the event_thread thread), but I wouldn't expect big differences (besides the fact you cannot miss an event_post and the fact they are more efficient compared to IPC messages).

Radio reception shouldn't be affected, since it's the same principle (a thread wakes up ASAP to process the radio ISR)

bergzand · 2020-03-12T14:46:01Z

This problem pops up because most of our drivers are not thread safe :/
The only way to solve this would be to design them to be like that.

Yeah, there was never a need to design them in a thread safe way. It would be good to have a rough idea of the impact here as making them thread safe is not free.

I agree, that in the current state we cannot just plug in a netif on top.

I would phrase this as that we can't plug this PR on top of netdev in the current state, but yeah. Do you want me to block this to prevent accidental merges?

It's hard to do a fair comparison now (since the standard "recv" thread is heavier than the event_thread thread), but I wouldn't expect big differences (besides the fact you cannot miss an event_post and the fact they are more efficient compared to IPC messages).

I still think it is good to have this confirmed. Current netif uses messages, but it could easily use events for the netdev::isr part and reuse the bits here (but keep it in a single thread). That might even be a good intermediate step which doesn't require thread safe drivers (except for the actual irq part, but that's already covered).

Radio reception shouldn't be affected, since it's the same principle (a thread wakes up ASAP to process the radio ISR)

Well, in this case I think it could be interesting to get some performance measurements in situations where two network interfaces (or any two drivers) are competing for attention from this event thread. So is there a worst case situation for the event handler here and how does it perform compared to the current implementation. This way we can determine whether it is worth the additional overhead, same for making the drivers thread safe.

miri64 · 2020-03-12T15:07:58Z

I still think it is good to have this confirmed. Current netif uses messages, but it could easily use events for the netdev::isr part and reuse the bits here (but keep it in a single thread). That might even be a good intermediate step which doesn't require thread safe drivers (except for the actual irq part, but that's already covered).

See #9326

bergzand · 2020-03-12T15:45:55Z

a thread wakes up ASAP to process the radio ISR

In case of the event thread architecture, this is not necessarily true and it is possible that the event thread does not pre-empt the 'normal' netif thread to handle the netdev::isr event.

In a situation with two network interfaces, the event thread could be blocked on a mutex lock because it was triggered while that mutex was already locked (for example during a relative long frame buffer load of an SPI-connected radio). This means that the event thread is blocked on the mutex waiting for a lower priority thread to exit the critical section. In this situation if a second event (from a different radio) would be submitted, it would not be handled 'ASAP' but after the first event is done. Now assuming both radio's have an identical priority this should . But if there is a situation where these two threads do not have equal priority, the netif thread could be scheduled first to handle a new netdev::send request from a higher layer.

The second netif thread here is only an example and this could be swapped for anything using the same event thread, such as a graphics stack (lvgl) requiring a callback after rendering is done or a high priority sensor read. This example applies to any shared usage of the event thread which includes the usage of blocking functions in the event handlers.

I can draw this example in a timing schedule as I can imagine that it is not entirely clear. Let me know if you want me to do that.

Edit: okay, maybe it is still ASAP, but it might not be the same ASAP you're thinking of here.

benpicco · 2020-03-12T15:51:13Z

Mind you that currently it's the responsibility of the radio driver to make calls to send() block while a transmission is ongoing, so this would also be an issue with multiple interfaces.

(Ideally the driver would return -EBUSY and the upper layer would would queue the packet and transmit it when it receives the TX complete event from the driver)

jia200x · 2020-03-12T16:54:10Z

In a situation with two network interfaces, the event thread could be blocked on a mutex lock because it was triggered while that mutex was already locked (for example during a relative long frame buffer load of an SPI-connected radio). This means that the event thread is blocked on the mutex waiting for a lower priority thread to exit the critical section. In this situation if a second event (from a different radio) would be submitted, it would not be handled 'ASAP' but after the first event is done. Now assuming both radio's have an identical priority this should . But if there is a situation where these two threads do not have equal priority, the netif thread could be scheduled first to handle a new netdev::send request from a higher layer.

But if you are loading a packet, it's because you already put your radio in "TX_ON" mode. That's why the PD-DATA command of the IEEE802.15.4 PHY specification (section 6.2.1.1), requires a PLME.SET-TRX-STATE to TX_ON before.
So, the ISR won't be triggered randomly.

As @benpicco describes, the MAC layer should queue the packet and retransmit if the radio was receiving.

Also, ideally the radio won't be in RX_ON or TRX_OFF if it didn't receive a TX_DONE event (the radio will wait until the state transition, so there aren't spurious ISRs).

Basically, we can avoid all kind of priority inversion if we are careful enough

jia200x · 2020-03-12T17:04:19Z

Anyway, we can evaluate the effect of thread safe drivers and several devices being processed by the same event_thread

jia200x · 2020-04-03T14:32:04Z

On seconds thoughts, the whole problem would be solved if the scheduler is not affected by priority inversion

kaspar030 · 2020-04-03T14:45:50Z

On seconds thoughts, the whole problem would be solved if the scheduler is not affected by priority inversion

You mean for example, when the radio SPI is also used by a low-priority driver thread? Otherwise, which mutex would the radio event handler be blocked on that would benefit from priority inversion?

jia200x · 2020-04-03T15:08:14Z

You mean for example, when the radio SPI is also used by a low-priority driver thread? Otherwise, which mutex would the radio event handler be blocked on that would benefit from priority inversion?

Let's say you have 3 threads A, B, C (A high prio, B mid, C low).

C uses the SPI configures the HW address of the radio. An interrupt from the radio arrives so A (e.g event_thread) tries to access the radio, which is blocked by C. So far so good.

The problem happens when B gets control and starts doing a long task (let's say there's a 5 seconds spin lock). C won't get back the control until B finishes. So, A will be locked until B finishes.

When B finishes, C will release the mutex and A will be released. But the event thread was locked all the time

jia200x · 2020-04-03T15:12:34Z

It' could be ok that A waits for some time, but what shouldn't be tolerable is the fact that A can be blocked by B

kaspar030 · 2020-04-08T13:42:36Z

Let's say you have 3 threads A, B, C (A high prio, B mid, C low).

C uses the SPI configures the HW address of the radio. An interrupt from the radio arrives so A (e.g event_thread) tries to access the radio, which is blocked by C. So far so good.

The problem happens when B gets control and starts doing a long task (let's say there's a 5 seconds spin lock). C won't get back the control until B finishes. So, A will be locked until B finishes.

When B finishes, C will release the mutex and A will be released. But the event thread was locked all the time

Hm, if B has a higher priority than C, it might have more important stuff to do than B C. RIOT's scheduler guarantees that in that case, B stays active. Temporarily giving C the priority of A messes with that guarantee.

C should send something to A, asking it to please "use the SPI to configure the HW address of the radio" when there is time.

Or, C should have higher priority than B.

jia200x · 2020-04-08T14:29:55Z

Hm, if B has a higher priority than C, it might have more important stuff to do than B C. RIOT's scheduler guarantees that in that case, B stays active. Temporarily giving C the priority of A messes with that guarantee.

This is not entirely true. B was defined on purpose as a thread with lower priority than C. There was a reason why the code in B was not in C.
Actually what is happening here is a violation of the priority model. A lower priority thread (B) takes the control over the highest priority thread.

C should send something to A, asking it to please "use the SPI to configure the HW address of the radio" when there is time.

This is the way we handle things now. Are we OK with this procedure?, considering that:

There's an overhead of passing messages to a higher priority thread (runtime cost and memory consumption)
It makes things more complicated (a user could simply cally a "send" function from the thread instead of requiring an interface to pass instructions to the SPI

Or, C should have higher priority than B.

E.g C is the IRQ handler, B is a beacon sender (with timers) and A is the main thread. You cannot give more priority to B in this case

jia200x · 2020-04-08T14:32:16Z

Note this is regardless of the outcome of #9379, but to me this IS indeed a use case for #7445

kaspar030 · 2020-04-08T14:35:58Z

E.g C is the IRQ handler, B is a beacon sender (with timers) and A is the main thread. You cannot give more priority to B in this case

I think you mixed the priorities, in your previous example, "A" had the highest. I think you mean this:

E.g A is the IRQ handler, B is a beacon sender (with timers) and C is the main thread. You cannot give more priority to B in this case

The issue was A waiting on a resource shared with C, and B jumping in, causing C and thus A to wait. If that is an issue, consider incresing the priority of C to something higher than B.

bergzand · 2020-04-08T14:49:43Z

The issue was A waiting on a resource shared with C, and B jumping in, causing C and thus A to wait. If that is an issue, consider increasing the priority of C to something higher than B.

This only makes sense when C is locking the shared resources is locked most of the time while active. If C is active for a duration where the shared resource is only a small portion of this time, reversing the priority of C and B would make B block for this full duration instead of only the duration the shared resource is required.

jia200x · 2020-04-08T15:26:42Z

Ah yes! I swapped A and C. C should have the highest

The issue was A waiting on a resource shared with C, and B jumping in, causing C and thus A to wait. If that is an issue, consider incresing the priority of C to something higher than B.

Consider this:

A is a thread that gets the value of a sensor, performs heavy calculation and send a packet. Let's say the whole process takes 700 ms, but sending is in the order of 1 ms.
B is a thread that checks every 100 ms that a sensor value doesn't exceed a threshold and activate an alarm if happened. The process take 30 ms.
C is a time sensitive MAC layer (e.g TSCH) that must receive packets with timestamps (with reception windows of ~10 ms)

Clearly C needs the highest priority here.
If we give A a higher priority than B, B won't be able to detect a threshold violation in 700 ms, so B requires a higher priority than A.

Let's say A blocks the radio for 1 ms, and just before state changes a packet arrives. C will try to access the radio, but it's locked by the mutex in A. At the same time, the timer of B triggers and the thread starts measuring. C cannot be released until B finished (30 ms).
So, without a priority inheritance mechanism, we potentially lost 2-3 packets.

With priority inversion, A will take back the control and finish sending (1 ms). Then, C is scheduled again (packet is not lost) and finally B gets control.

EDIT: Just noticed @bergzand comment. It's basically what he describes

benpicco · 2020-10-02T11:57:29Z

This needs a rebase.

stale · 2021-06-03T19:59:57Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

jia200x · 2021-06-09T14:29:56Z

I think we can close this PR. There are currently more mechanisms that replace this (gnrc_netif_events and the newer approaches with event threads).

jia200x added Area: network Area: Networking Type: new feature The issue requests / The PR implemements a new feature for RIOT State: waiting for other PR State: The PR requires another PR to be merged first Area: drivers Area: Device drivers labels Mar 6, 2020

jia200x requested a review from benpicco March 6, 2020 13:00

jia200x requested review from aabadie, bergzand, daniel-k, dylad, fjmolinas, gschorcht, haukepetersen, maribu, miri64 and smlng as code owners March 6, 2020 13:00

jia200x requested review from kaspar030 and PeterKietzmann March 6, 2020 13:17

jia200x added 3 commits March 9, 2020 11:07

netdev: make netdev_% variants a submodule

4bb6536

netdev_tap: make it a netdev submodule

da1c9f1

netdev_event_thread: add support for event_thread irq handler

f448c07

jia200x force-pushed the pr/netdev_event_thread branch from 5af0690 to f448c07 Compare March 9, 2020 10:24

fixup! netdev_event_thread: add support for event_thread irq handler

dcf8b16

jia200x mentioned this pull request Mar 9, 2020

gnrc_netif: add netif set and get function #13582

Closed

fixup! netdev_event_thread: add support for event_thread irq handler

24a0338

jia200x mentioned this pull request Apr 3, 2020

scheduler: priority inversion problem #7365

Closed

stale bot added the State: stale State: The issue / PR has no activity for >185 days label Jun 3, 2021

jia200x closed this Jun 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

netdev_event_thread: add support for event_thread irq handler #13573

netdev_event_thread: add support for event_thread irq handler #13573

jia200x commented Mar 6, 2020

jia200x commented Mar 6, 2020

kaspar030 commented Mar 6, 2020

jia200x commented Mar 6, 2020

kaspar030 commented Mar 6, 2020

bergzand commented Mar 12, 2020

bergzand commented Mar 12, 2020

jia200x commented Mar 12, 2020

jia200x commented Mar 12, 2020

bergzand commented Mar 12, 2020

miri64 commented Mar 12, 2020

bergzand commented Mar 12, 2020 •

edited

Loading

benpicco commented Mar 12, 2020

jia200x commented Mar 12, 2020 •

edited

Loading

jia200x commented Mar 12, 2020

jia200x commented Apr 3, 2020

kaspar030 commented Apr 3, 2020

jia200x commented Apr 3, 2020

jia200x commented Apr 3, 2020

kaspar030 commented Apr 8, 2020 •

edited

Loading

jia200x commented Apr 8, 2020

jia200x commented Apr 8, 2020

kaspar030 commented Apr 8, 2020

bergzand commented Apr 8, 2020

jia200x commented Apr 8, 2020 •

edited

Loading

benpicco commented Oct 2, 2020

stale bot commented Jun 3, 2021

jia200x commented Jun 9, 2021

netdev_event_thread: add support for event_thread irq handler #13573

netdev_event_thread: add support for event_thread irq handler #13573

Conversation

jia200x commented Mar 6, 2020

Contribution description

Testing procedure

Issues/PRs references

jia200x commented Mar 6, 2020

kaspar030 commented Mar 6, 2020

jia200x commented Mar 6, 2020

kaspar030 commented Mar 6, 2020

bergzand commented Mar 12, 2020

bergzand commented Mar 12, 2020

jia200x commented Mar 12, 2020

jia200x commented Mar 12, 2020

bergzand commented Mar 12, 2020

miri64 commented Mar 12, 2020

bergzand commented Mar 12, 2020 • edited Loading

benpicco commented Mar 12, 2020

jia200x commented Mar 12, 2020 • edited Loading

jia200x commented Mar 12, 2020

jia200x commented Apr 3, 2020

kaspar030 commented Apr 3, 2020

jia200x commented Apr 3, 2020

jia200x commented Apr 3, 2020

kaspar030 commented Apr 8, 2020 • edited Loading

jia200x commented Apr 8, 2020

jia200x commented Apr 8, 2020

kaspar030 commented Apr 8, 2020

bergzand commented Apr 8, 2020

jia200x commented Apr 8, 2020 • edited Loading

benpicco commented Oct 2, 2020

stale bot commented Jun 3, 2021

jia200x commented Jun 9, 2021

bergzand commented Mar 12, 2020 •

edited

Loading

jia200x commented Mar 12, 2020 •

edited

Loading

kaspar030 commented Apr 8, 2020 •

edited

Loading

jia200x commented Apr 8, 2020 •

edited

Loading