-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
netdev_event_thread: add support for event_thread irq handler #13573
Conversation
I will introduce a |
needs rebase |
5af0690
to
f448c07
Compare
@jia200x This PR allows the netdev code to move the If you want to continue with this I think we should first make sure that any driver context modifications are guarded. Furthermore I think there should also be a modification to the netdev documentation to point out that the |
@jia200x Do you have any measurements on what the impact is on performance when |
This problem pops up because most of our drivers are not thread safe :/ I agree, that in the current state we cannot just plug in a netif on top.
Maybe we should also start making the drivers thread safe. I know some of them are (ethernet devices). |
It's hard to do a fair comparison now (since the standard "recv" thread is heavier than the event_thread thread), but I wouldn't expect big differences (besides the fact you cannot miss an Radio reception shouldn't be affected, since it's the same principle (a thread wakes up ASAP to process the radio ISR) |
Yeah, there was never a need to design them in a thread safe way. It would be good to have a rough idea of the impact here as making them thread safe is not free.
I would phrase this as that we can't plug this PR on top of netdev in the current state, but yeah. Do you want me to block this to prevent accidental merges?
I still think it is good to have this confirmed. Current netif uses messages, but it could easily use events for the
Well, in this case I think it could be interesting to get some performance measurements in situations where two network interfaces (or any two drivers) are competing for attention from this event thread. So is there a worst case situation for the event handler here and how does it perform compared to the current implementation. This way we can determine whether it is worth the additional overhead, same for making the drivers thread safe. |
See #9326 |
In case of the event thread architecture, this is not necessarily true and it is possible that the event thread does not pre-empt the 'normal' netif thread to handle the In a situation with two network interfaces, the event thread could be blocked on a mutex lock because it was triggered while that mutex was already locked (for example during a relative long frame buffer load of an SPI-connected radio). This means that the event thread is blocked on the mutex waiting for a lower priority thread to exit the critical section. In this situation if a second event (from a different radio) would be submitted, it would not be handled 'ASAP' but after the first event is done. Now assuming both radio's have an identical priority this should . But if there is a situation where these two threads do not have equal priority, the netif thread could be scheduled first to handle a new The second netif thread here is only an example and this could be swapped for anything using the same event thread, such as a graphics stack (lvgl) requiring a callback after rendering is done or a high priority sensor read. This example applies to any shared usage of the event thread which includes the usage of blocking functions in the event handlers. I can draw this example in a timing schedule as I can imagine that it is not entirely clear. Let me know if you want me to do that. Edit: okay, maybe it is still ASAP, but it might not be the same ASAP you're thinking of here. |
Mind you that currently it's the responsibility of the radio driver to make calls to (Ideally the driver would return |
But if you are loading a packet, it's because you already put your radio in "TX_ON" mode. That's why the PD-DATA command of the IEEE802.15.4 PHY specification (section 6.2.1.1), requires a PLME.SET-TRX-STATE to TX_ON before. As @benpicco describes, the MAC layer should queue the packet and retransmit if the radio was receiving. Also, ideally the radio won't be in RX_ON or TRX_OFF if it didn't receive a TX_DONE event (the radio will wait until the state transition, so there aren't spurious ISRs). Basically, we can avoid all kind of priority inversion if we are careful enough |
Anyway, we can evaluate the effect of thread safe drivers and several devices being processed by the same event_thread |
On seconds thoughts, the whole problem would be solved if the scheduler is not affected by priority inversion |
You mean for example, when the radio SPI is also used by a low-priority driver thread? Otherwise, which mutex would the radio event handler be blocked on that would benefit from priority inversion? |
Let's say you have 3 threads A, B, C (A high prio, B mid, C low). C uses the SPI configures the HW address of the radio. An interrupt from the radio arrives so A (e.g event_thread) tries to access the radio, which is blocked by C. So far so good. The problem happens when B gets control and starts doing a long task (let's say there's a 5 seconds spin lock). C won't get back the control until B finishes. So, A will be locked until B finishes. When B finishes, C will release the mutex and A will be released. But the event thread was locked all the time |
It' could be ok that A waits for some time, but what shouldn't be tolerable is the fact that A can be blocked by B |
Hm, if B has a higher priority than C, it might have more important stuff to do than C should send something to A, asking it to please "use the SPI to configure the HW address of the radio" when there is time. Or, C should have higher priority than B. |
This is not entirely true. B was defined on purpose as a thread with lower priority than C. There was a reason why the code in B was not in C.
This is the way we handle things now. Are we OK with this procedure?, considering that:
E.g C is the IRQ handler, B is a beacon sender (with timers) and A is the main thread. You cannot give more priority to B in this case |
I think you mixed the priorities, in your previous example, "A" had the highest. I think you mean this:
The issue was A waiting on a resource shared with C, and B jumping in, causing C and thus A to wait. If that is an issue, consider incresing the priority of C to something higher than B. |
This only makes sense when C is locking the shared resources is locked most of the time while active. If C is active for a duration where the shared resource is only a small portion of this time, reversing the priority of C and B would make B block for this full duration instead of only the duration the shared resource is required. |
Ah yes! I swapped A and C. C should have the highest
Consider this:
Clearly C needs the highest priority here. Let's say A blocks the radio for 1 ms, and just before state changes a packet arrives. C will try to access the radio, but it's locked by the mutex in A. At the same time, the timer of B triggers and the thread starts measuring. C cannot be released until B finished (30 ms). With priority inversion, A will take back the control and finish sending (1 ms). Then, C is scheduled again (packet is not lost) and finally B gets control. EDIT: Just noticed @bergzand comment. It's basically what he describes |
This needs a rebase. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions. |
I think we can close this PR. There are currently more mechanisms that replace this (gnrc_netif_events and the newer approaches with event threads). |
Contribution description
As promised in #13562, this PR adds an optional
event_thread
based IRQ handler.Now it's possible to write radio applications without an explicit
recv
thread for callingdev->driver->isr
. Also, network stacks are not required to handle their radio events if this module is enabled, so the glue code only requires an implementation ofevent_callback
. With some minor tricks we could easily expand the device support for LWIP and OpenThread.As an example, the `tests/driver_at86rf2xx` looks like this:
Testing procedure
I used this patch to run
tests/driver_at86rf2xx
with 2samr21-xpro
boards:Patch
Issues/PRs references
Requires #13565 and #13562