net: tcp2: Lock connection when running from work queue #28595

Merged

@jukkar jukkar commented Sep 22, 2020

We run various TCP functions from the work queue. Make sure the
connection lock is taken before accessing the connection.

Signed-off-by: Jukka Rissanen <[email protected]>

Possible fix to #28587
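
For readers unfamiliar with the pattern, here is a minimal sketch of "take the connection lock inside the work handler". It is not the tcp2.c code: it uses POSIX threads as a stand-in for the Zephyr work queue and mutex, and `struct conn`, `struct work_item`, `send_work_handler()` and `run_two_workers()` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work item, standing in for a work-queue entry. */
struct work_item {
	int id;
};

/* Hypothetical connection object; not the real Zephyr TCP struct. */
struct conn {
	pthread_mutex_t lock;       /* protects the fields below */
	unsigned long unacked_len;  /* example state touched by handlers */
	struct work_item send_work; /* work item embedded in the connection */
};

/* Recover the owning connection from its embedded work item. */
#define CONN_FROM_WORK(w) \
	((struct conn *)((char *)(w) - offsetof(struct conn, send_work)))

#define ROUNDS 100000

/* Handler as a work queue would call it: lock, access, unlock. */
static void send_work_handler(struct work_item *work)
{
	struct conn *c = CONN_FROM_WORK(work);

	pthread_mutex_lock(&c->lock);
	c->unacked_len++;
	pthread_mutex_unlock(&c->lock);
}

/* Each thread plays the role of one context running the handler. */
static void *worker(void *arg)
{
	struct work_item *work = arg;

	for (int i = 0; i < ROUNDS; i++) {
		send_work_handler(work);
	}
	return NULL;
}

/* Run the handler from two threads and return the final counter. */
unsigned long run_two_workers(void)
{
	struct conn c = { .unacked_len = 0 };
	pthread_t a, b;
	int rc;

	pthread_mutex_init(&c.lock, NULL);
	rc = pthread_create(&a, NULL, worker, &c.send_work);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, worker, &c.send_work);
	assert(rc == 0);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&c.lock);
	return c.unacked_len;
}
```

Because every access to `unacked_len` happens with the lock held, `run_two_workers()` always returns `2 * ROUNDS`. Without the lock in the handler, the two contexts could interleave and lose updates.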

@jukkar jukkar added this to the v2.4.0 milestone Sep 22, 2020

jukkar commented Sep 22, 2020

Saw some crashes on a heavily loaded system; added new commits that fix them.

@jukkar jukkar requested a review from MaureenHelm September 22, 2020 15:23
@jukkar jukkar force-pushed the bug-28587-tcp2-data-corruption branch from bf0c9c2 to b6d0d9b Compare September 23, 2020 07:35

jukkar commented Sep 23, 2020

Removed the crash checks, as those crashes cannot happen any more once commit c54a511 is merged.
I spoke too soon: there was a crash after I removed the two commits, so I am restoring them.

@jukkar jukkar force-pushed the bug-28587-tcp2-data-corruption branch from b6d0d9b to 9ceec3f Compare September 23, 2020 07:45
We run various TCP functions from the work queue. Make sure the
connection lock is taken before accessing the connection.

Signed-off-by: Jukka Rissanen <[email protected]>

Saw this crash on a heavily loaded system on mimxrt1050_evk:

<err> os: ***** MPU FAULT *****
<err> os:   Data Access Violation
<err> os:   MMFAR Address: 0xc
<err> os: r0/a1:  0x80000ab0  r1/a2:  0x800f6a60  r2/a3:  0x00000000
<err> os: r3/a4:  0x800f72a0 r12/ip:  0x00000000 r14/lr:  0x6000eb43
<err> os:  xpsr:  0x41000000
<err> os: Faulting instruction address (r15/pc): 0x6000dc82
<err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
<err> os: Current thread: 0x80001a18 (rx_workq)
<err> os: Halting system

The faulting address 0x6000dc82 is in ethernet_recv():

	uint16_t type = ntohs(hdr->type);
6000dc82:	89ab      	ldrh	r3, [r5, #12]
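
The MMFAR value 0xc is consistent with `hdr` being NULL: the faulting `ldrh r3, [r5, #12]` reads from `r5 + 12`, and in a standard Ethernet II header the type field follows two 6-byte MAC addresses, so it sits at offset 12. A small sketch of that arithmetic, using a hypothetical `struct eth_hdr_sketch` (assumed layout, not the Zephyr definition):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed Ethernet II header layout; the struct name is hypothetical. */
struct eth_hdr_sketch {
	uint8_t dst[6];  /* destination MAC address */
	uint8_t src[6];  /* source MAC address */
	uint16_t type;   /* EtherType, network byte order */
};

/* The type field comes right after the two 6-byte addresses. */
static_assert(offsetof(struct eth_hdr_sketch, type) == 12,
	      "type field expected at offset 12");

/* Address the CPU touches when reading hdr->type for a given hdr. */
uintptr_t type_field_address(uintptr_t hdr_base)
{
	return hdr_base + offsetof(struct eth_hdr_sketch, type);
}
```

With a NULL header pointer, `type_field_address(0)` gives 0xc, which matches the address reported in the fault.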

Signed-off-by: Jukka Rissanen <[email protected]>

Saw this crash on a heavily loaded system on nucleo_f767zi:

<err> os: ***** MPU FAULT *****
<err> os:   Data Access Violation
<err> os:   MMFAR Address: 0x0
<err> os: r0/a1:  0x800f6d30  r1/a2:  0x80005d84  r2/a3:  0x00000006
<err> os: r3/a4:  0x00000000 r12/ip:  0x00000001 r14/lr:  0x60013f69
<err> os:  xpsr:  0x61000000
<err> os: Faulting instruction address (r15/pc): 0x60014304
<err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
<err> os: Current thread: 0x80001a18 (rx_workq)
<err> os: Halting system

The faulting address 0x60014304 is in net_conn_input():

   } else if (IS_ENABLED(CONFIG_NET_TCP) && proto == IPPROTO_TCP) {
	src_port = proto_hdr->tcp->src_port;
60014300:	f8d9 3000 	ldr.w	r3, [r9]
60014304:	881a      	ldrh	r2, [r3, #0]
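
Here r3 is 0 in the register dump and MMFAR is 0x0, which is consistent with `proto_hdr->tcp` being NULL when `src_port` is read. A minimal sketch of a guard against that case follows. It is an illustration only, not necessarily the check added by this commit: `struct tcp_hdr_sketch`, `union proto_hdr_sketch` and `tcp_src_port_checked()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TCP header: only the port fields. */
struct tcp_hdr_sketch {
	uint16_t src_port;
	uint16_t dst_port;
};

/* Hypothetical protocol header holder, mirroring proto_hdr->tcp. */
union proto_hdr_sketch {
	struct tcp_hdr_sketch *tcp;
};

/*
 * Read the source port only if the TCP header pointer is valid.
 * Returns 0 on success, -EINVAL if the header is missing.
 */
int tcp_src_port_checked(const union proto_hdr_sketch *proto_hdr,
			 uint16_t *src_port)
{
	assert(src_port != NULL); /* output pointer is a caller bug if NULL */

	if (proto_hdr == NULL || proto_hdr->tcp == NULL) {
		return -EINVAL;
	}

	*src_port = proto_hdr->tcp->src_port;
	return 0;
}
```

A caller would drop the packet on `-EINVAL` instead of dereferencing the missing header.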

Signed-off-by: Jukka Rissanen <[email protected]>

Check that the Ethernet header is in the first net_buf fragment.
A missing header is very unlikely, as the device driver is expected
to deliver only proper Ethernet frames to the upper stack.

Signed-off-by: Jukka Rissanen <[email protected]>
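
As a rough illustration of the check described in this commit message, the sketch below validates that the first fragment is large enough to hold a full Ethernet header before reading the type field. It does not use the real net_buf API: `struct frag_sketch`, `ETH_HDR_LEN_SKETCH` and `eth_type_from_first_frag()` are hypothetical, and the 14-byte header length assumes an untagged Ethernet II frame.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed untagged Ethernet II header: 6 + 6 + 2 bytes. */
#define ETH_HDR_LEN_SKETCH 14

/* Hypothetical fragment descriptor, standing in for a net_buf. */
struct frag_sketch {
	const uint8_t *data;      /* start of this fragment's payload */
	size_t len;               /* bytes available in this fragment */
	struct frag_sketch *next; /* next fragment, or NULL */
};

/*
 * Extract the EtherType only if the whole header lies in the first
 * fragment. Returns 0 on success, -EINVAL otherwise.
 */
int eth_type_from_first_frag(const struct frag_sketch *first,
			     uint16_t *type)
{
	assert(type != NULL); /* output pointer is a caller bug if NULL */

	if (first == NULL || first->data == NULL ||
	    first->len < ETH_HDR_LEN_SKETCH) {
		return -EINVAL;
	}

	/* Bytes 12..13 hold the type in network (big-endian) order. */
	*type = (uint16_t)((first->data[12] << 8) | first->data[13]);
	return 0;
}
```

Rejecting the frame early keeps later code from reading header fields through a pointer that does not cover the full header.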
@jukkar jukkar force-pushed the bug-28587-tcp2-data-corruption branch from 2d349c0 to 96c0aad Compare September 24, 2020 08:47
@MaureenHelm MaureenHelm modified the milestones: v2.4.0, v2.5.0 Sep 27, 2020
@jukkar jukkar merged commit 3b64d57 into zephyrproject-rtos:master Sep 28, 2020
@jukkar jukkar deleted the bug-28587-tcp2-data-corruption branch September 28, 2020 11:25