Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/cc2538: wait for ongoing transmission before flushing TX FIFO #5725

Merged
merged 1 commit into from
Aug 30, 2016

Conversation

aeneby
Copy link
Member

@aeneby aeneby commented Aug 5, 2016

For larger packet sizes and/or fast consecutive sends, the mutex lock is not enough to guarantee exclusive TX FIFO access. Since the TX operation is non-blocking, the lock may be released and re-taken again before transmission has completed. Flushing the TX FIFO while a transmission is in progress will result in TX underflow errors and lost packets, so make sure there's no active transmission before we claim it.

Perform the wait here rather than after the send, in case a non-netdev TX operation is in progress (e.g. AUTOACK).

@PeterKietzmann PeterKietzmann added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: drivers Area: Device drivers labels Aug 5, 2016
@PeterKietzmann PeterKietzmann self-assigned this Aug 5, 2016
@PeterKietzmann
Copy link
Member

Can't test before next week

@LucaZulberti
Copy link

Hi, i've tested with this:

ping6 -c 3 -s 120 fe80::0:212:4b00:433:eef8%lowpan0

Now the cc2538 sends the first TX frame, but the second one is not sent.

tcpdump output:

tcpdump: listening on wpan0, link-type IEEE802_15_4_NOFCS (IEEE 802.15.4 without FCS), capture size 262144 bytes
09:07:02.232921 IEEE 802.15.4 Data packet seq b3 0023:00:12:4b:00:04:33:ee:f8 < 0023:07:06:05:04:03:02:01:00 
    0x0000:  c0a8 0028 7a33 3a80 00de 6a04 4100 01c6  ...(z3:...j.A...
    0x0010:  59a8 5709 8d03 0008 090a 0b0c 0d0e 0f10  Y.W.............
    0x0020:  1112 1314 1516 1718 191a 1b1c 1d1e 1f20  ................
    0x0030:  2122 2324 2526 2728 292a 2b2c 2d2e 2f30  !"#$%&'()*+,-./0
    0x0040:  3132 3334 3536 3738 393a 3b3c 3d3e 3f40  123456789:;<=>?@
    0x0050:  4142 4344 4546 4748 494a 4b4c 4d4e 4f50  ABCDEFGHIJKLMNOP
    0x0060:  5152 5354 5556 57                        QRSTUVW
09:07:02.239903 IEEE 802.15.4 Data packet seq b4 0023:00:12:4b:00:04:33:ee:f8 < 0023:07:06:05:04:03:02:01:00 
    0x0000:  e0a8 0028 1158 595a 5b5c 5d5e 5f60 6162  ...(.XYZ[\]^_`ab
    0x0010:  6364 6566 6768 696a 6b6c 6d6e 6f70 7172  cdefghijklmnopqr
    0x0020:  7374 7576 77                             stuvw
09:07:02.252503 IEEE 802.15.4 Data packet seq 0e 0023:07:06:05:04:03:02:01:00 < 0023:00:12:4b:00:04:33:ee:f8 
    0x0000:  c0a8 0004 7a33 3a81 00dd 6a04 4100 01c6  ....z3:...j.A...
    0x0010:  59a8 5709 8d03 0008 090a 0b0c 0d0e 0f10  Y.W.............
    0x0020:  1112 1314 1516 1718 191a 1b1c 1d1e 1f20  ................
    0x0030:  2122 2324 2526 2728 292a 2b2c 2d2e 2f30  !"#$%&'()*+,-./0
    0x0040:  3132 3334 3536 3738 393a 3b3c 3d3e 3f40  123456789:;<=>?@
    0x0050:  4142 4344 4546 4748 494a 4b4c 4d4e 4f    ABCDEFGHIJKLMNO
09:07:03.249038 IEEE 802.15.4 Data packet seq b6 0023:00:12:4b:00:04:33:ee:f8 < 0023:07:06:05:04:03:02:01:00 
    0x0000:  c0a8 0029 7a33 3a80 00c6 2a04 4100 02c7  ...)z3:...*.A...
    0x0010:  59a8 5720 cc03 0008 090a 0b0c 0d0e 0f10  Y.W.............
    0x0020:  1112 1314 1516 1718 191a 1b1c 1d1e 1f20  ................
    0x0030:  2122 2324 2526 2728 292a 2b2c 2d2e 2f30  !"#$%&'()*+,-./0
    0x0040:  3132 3334 3536 3738 393a 3b3c 3d3e 3f40  123456789:;<=>?@
    0x0050:  4142 4344 4546 4748 494a 4b4c 4d4e 4f50  ABCDEFGHIJKLMNOP
    0x0060:  5152 5354 5556 57                        QRSTUVW
09:07:03.256270 IEEE 802.15.4 Data packet seq b7 0023:00:12:4b:00:04:33:ee:f8 < 0023:07:06:05:04:03:02:01:00 
    0x0000:  e0a8 0029 1158 595a 5b5c 5d5e 5f60 6162  ...).XYZ[\]^_`ab
    0x0010:  6364 6566 6768 696a 6b6c 6d6e 6f70 7172  cdefghijklmnopqr
    0x0020:  7374 7576 77                             stuvw
09:07:03.267581 IEEE 802.15.4 Data packet seq 10 0023:07:06:05:04:03:02:01:00 < 0023:00:12:4b:00:04:33:ee:f8 
    0x0000:  c0a8 0005 7a33 3a81 00c5 2a04 4100 02c7  ....z3:...*.A...
    0x0010:  59a8 5720 cc03 0008 090a 0b0c 0d0e 0f10  Y.W.............
    0x0020:  1112 1314 1516 1718 191a 1b1c 1d1e 1f20  ................
    0x0030:  2122 2324 2526 2728 292a 2b2c 2d2e 2f30  !"#$%&'()*+,-./0
    0x0040:  3132 3334 3536 3738 393a 3b3c 3d3e 3f40  123456789:;<=>?@
    0x0050:  4142 4344 4546 4748 494a 4b4c 4d4e 4f    ABCDEFGHIJKLMNO
09:07:04.249035 IEEE 802.15.4 Data packet seq b9 0023:00:12:4b:00:04:33:ee:f8 < 0023:07:06:05:04:03:02:01:00 
    0x0000:  c0a8 002a 7a33 3a80 00c8 2904 4100 03c8  ...*z3:...).A...
    0x0010:  59a8 571d cc03 0008 090a 0b0c 0d0e 0f10  Y.W.............
    0x0020:  1112 1314 1516 1718 191a 1b1c 1d1e 1f20  ................
    0x0030:  2122 2324 2526 2728 292a 2b2c 2d2e 2f30  !"#$%&'()*+,-./0
    0x0040:  3132 3334 3536 3738 393a 3b3c 3d3e 3f40  123456789:;<=>?@
    0x0050:  4142 4344 4546 4748 494a 4b4c 4d4e 4f50  ABCDEFGHIJKLMNOP
    0x0060:  5152 5354 5556 57                        QRSTUVW
09:07:04.257248 IEEE 802.15.4 Data packet seq ba 0023:00:12:4b:00:04:33:ee:f8 < 0023:07:06:05:04:03:02:01:00 
    0x0000:  e0a8 002a 1158 595a 5b5c 5d5e 5f60 6162  ...*.XYZ[\]^_`ab
    0x0010:  6364 6566 6768 696a 6b6c 6d6e 6f70 7172  cdefghijklmnopqr
    0x0020:  7374 7576 77                             stuvw
09:07:04.268271 IEEE 802.15.4 Data packet seq 12 0023:07:06:05:04:03:02:01:00 < 0023:00:12:4b:00:04:33:ee:f8 
    0x0000:  c0a8 0006 7a33 3a81 00c7 2904 4100 03c8  ....z3:...).A...
    0x0010:  59a8 571d cc03 0008 090a 0b0c 0d0e 0f10  Y.W.............
    0x0020:  1112 1314 1516 1718 191a 1b1c 1d1e 1f20  ................
    0x0030:  2122 2324 2526 2728 292a 2b2c 2d2e 2f30  !"#$%&'()*+,-./0
    0x0040:  3132 3334 3536 3738 393a 3b3c 3d3e 3f40  123456789:;<=>?@
    0x0050:  4142 4344 4546 4748 494a 4b4c 4d4e 4f    ABCDEFGHIJKLMNO

@aeneby
Copy link
Member Author

aeneby commented Aug 9, 2016

Hi @LucaZulberti, two things:

  1. You need to invest in an external sniffer :) your packet trace only proves that the packet isn't arriving at the wpan0 interface, not that it isn't being sent.

  2. My level of confidence in the upstream at86rf230 Linux driver (which I guess you are using to ping from) is not very high. I've seen some strange behaviour with this driver on an RPi 3 + OpenLabs radio, like random bytes being appended to packets and therefore failing their FCF checks. Granted, this was compiled from git master, but it was the only way to get it working on the Pi.

In my tests, the ICMP response packets are in fact being sent/received correctly from/by the CC2538. I can see all packets using an external sniffer, as well as by using debug output on the CC2538. I can also ping between two CC2538 nodes successfully, using the same size payload (120). So I can only conclude that the at86rf230 driver is not processing the second packet of the ICMP response correctly, and therefore not registering it as a reply to the ping.

Actually, it turns out this is a real Heisenbug - try setting ENABLE_DEBUG to 1 in cc2538/radio/cc2538_rf_netdev.c, and perform your test again. You should find it succeeds this time. My only theory for this is that the debug output introduces a very slight delay between packets, and this enables the at86rf230 driver to process them quickly enough. So perhaps this driver is not able to handle multiple packets in the RX buffer either?

Hope that makes sense.

@LucaZulberti
Copy link

Hi @aeneby:

  1. You're right! :-)
  2. I'm sorry I didn't specified that now I'm using the mrf24j40 Microchip transceiver on the Linux host.
    2a) I've tried the ping6 also from the samr21-xpro board to the cc2538 and the result is the same.

Setting ENABLE_DEBUG to 1 resolves the problem.
I think the bug is in the cc2538, because between the mrf24j40 (Linux host) and the at86rf233 (samr21-xpro board) all is working without errors.

Unfortunately I will not be able to test the patch with another cc2538 board before Semptember.

@aeneby
Copy link
Member Author

aeneby commented Aug 9, 2016

Thanks @LucaZulberti - I will have to look into this more closely given your comments. I'm confused as to why it would work between two CC2538 nodes, but not other transceivers. If only I had more hardware to test with...

@PeterKietzmann
Copy link
Member

I just gave it a quick try. When pinging between a zolertia remote-reva and a samr21-xpro board, timeouts still appeared for greater payloads. Somehow I cannot even ping between two remotes any more :/ ... Don't have time to debug this. Just give me a hint when you need me to test something specific

@aeneby
Copy link
Member Author

aeneby commented Aug 25, 2016

Rebased to master.

I'm confused. I can ping happily between cc2538 nodes, with both small and large payloads (up to about 1000 bytes - see below). I can see the OTA traffic with an external sniffer. It looks fine. I really have no clue as to why it shouldn't work for you.

I did notice one strange thing, though - RIOT cannot respond to ipv6 pings where the reply requires greater than 9 packets (say, a payload of 1024 bytes). I say "RIOT" here and not "driver" because the problem seems to be further up the network stack - if I enable debug in sys/net/gnrc/netapi/gnrc_netapi.c, and try pinging that node with a payload of 1024, I get messages in the console like:

gnrc_netapi: dropped message to 7 (receiver queue is full)

One for each packet missing from the reply, which seems to be every packet after the 9th for some reason. This bug is probably unrelated to this PR but I wanted to mention it, in case you are trying to test with payloads that big.

@PeterKietzmann
Copy link
Member

Which board do you use?

@aeneby
Copy link
Member Author

aeneby commented Aug 30, 2016

openmote-cc2538

@PeterKietzmann PeterKietzmann added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Aug 30, 2016
@PeterKietzmann
Copy link
Member

With two remote revision A boards and gnrc_networking I still get big losses for "larger" packets. However, this PR does not make things worse and according to the user guide the delay makes sense to me. Will open a separate issue for the remotes. ACK for this PR.

@PeterKietzmann
Copy link
Member

Issue is in #5786

@PeterKietzmann
Copy link
Member

ACK and go

@PeterKietzmann PeterKietzmann merged commit 28638be into RIOT-OS:master Aug 30, 2016
@aeneby
Copy link
Member Author

aeneby commented Aug 30, 2016

Thanks @PeterKietzmann - I'll see if I can replicate the packet loss and report on the newly created issue.

@aeneby aeneby deleted the tx_underrun_fix branch August 30, 2016 20:10
@aeneby
Copy link
Member Author

aeneby commented Aug 30, 2016

@PeterKietzmann BTW, did you not encounter the issue with gnrc_netapi dropping messages?

@OlegHahm OlegHahm added this to the Release 2016.10 milestone Aug 31, 2016
@PeterKietzmann
Copy link
Member

@aeneby honestly I didn't test it and I don't find the time currently. The default message queue size is 8 which seems to be exhausted in your scenario. Obviously the queue is faster than messages are precessed (probably slowed 6LoWPAN fragmentation?). Would you open an issue for that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: drivers Area: Device drivers CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants