Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ethernet hardware PTP support not working #4

Open
rogergreen2003 opened this issue Jul 5, 2023 · 2 comments
Open

Ethernet hardware PTP support not working #4

rogergreen2003 opened this issue Jul 5, 2023 · 2 comments

Comments

@rogergreen2003
Copy link

I have been making several tests to validate the use of the MA35D1 to transport very low latency audio (<= 1ms) over ethernet, which needs a high reliable clock synchronization, hardware assisted method, using IEEE1588 protocol (PTP).

Using the latest 5.10 linux kernel from OpenNuvoton, and the 10/100 ethernet port of the NuMaker-IoT-MA35D1-A1 board (eth1),
the output of ethtool is:

/ # ethtool -T eth1
Time stamping parameters for eth1:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
        off                   (HWTSTAMP_TX_OFF)
        on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
        none                  (HWTSTAMP_FILTER_NONE)
        all                   (HWTSTAMP_FILTER_ALL)
        ptpv1-l4-event        (HWTSTAMP_FILTER_PTP_V1_L4_EVENT)
        ptpv1-l4-sync         (HWTSTAMP_FILTER_PTP_V1_L4_SYNC)
        ptpv1-l4-delay-req    (HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ)
        ptpv2-l4-event        (HWTSTAMP_FILTER_PTP_V2_L4_EVENT)
        ptpv2-l4-sync         (HWTSTAMP_FILTER_PTP_V2_L4_SYNC)
        ptpv2-l4-delay-req    (HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ)
        ptpv2-event           (HWTSTAMP_FILTER_PTP_V2_EVENT)
        ptpv2-sync            (HWTSTAMP_FILTER_PTP_V2_SYNC)
        ptpv2-delay-req       (HWTSTAMP_FILTER_PTP_V2_DELAY_REQ)

Which is correct: it shows that RX and TX HW TIMESTAMP is available on this interface. But running ptp4l, it doesn't synchronize to other masters if launched as slave, and other slaves don't synchronize to it if launched as master.

After some more verbose debugging, It seems that there is a problem with the reception of PTP events in the driver, but I don't know how to fix it because I haven't the complete technical information about this IP module.

The driver is based on the stmmac implementation of STMicroelectronics, which is used in a lot of other devices, and we are sure the PTP works fine on some of them, as in the STM32MP1.

Could someone try to fix this issue, please?

@rogergreen2003
Copy link
Author

Finally, I have found where is the problem when using PTP on this platform: rx multicast filtering is not working.

By default, multicast filtering is used to avoid processing all multicast traffic, so only the joined multicast addresses should be filtered in hardware. This filter uses a hash capability of the stmmac driver.

The problem comes when looking the HW Feature Register: bit 4 (HASSEL) is 0, which means that this implementation lacks the HW multicast filtering. It can be seen easily running cat /sys/kernel/debug/stmmaceth/eth0/dma_cap | grep -i "Hash Filter:"

    Hash Filter: N

PTP implementation used in typical low latency audio protocols (DANTE, RAVENNA) uses multicast addressing, so it fails on this platform.

The solution, then, is easy: to disable the HW filter of stmmac with ifconfig eth0 allmulti

@ychuang3
Copy link
Contributor

We have also been debugging the PTP issue for a while. Thank you for the information you provided, which has also resolved the problem we've been experiencing with PTP not functioning properly. After setting 'ifconfig allmulti,' our PTP tests are now functioning correctly. It works.

Best Regards,
Jacky Huang

jserv pushed a commit to jserv/linux-ma35d1 that referenced this issue Sep 23, 2024
The two commits below add up to a cpuset might_sleep() splat for RT:

8447a0f cpuset: convert callback_mutex to a spinlock
344736f cpuset: simplify cpuset_node_allowed API

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G            E   4.10.0-rt1-rt OpenNuvoton#4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
 ? dump_stack+0x5c/0x81
 ? ___might_sleep+0xf4/0x170
 ? rt_spin_lock+0x1c/0x50
 ? __cpuset_node_allowed+0x66/0xc0
 ? ___slab_alloc+0x390/0x570 <disables IRQs>
 ? anon_vma_fork+0x8f/0x140
 ? copy_page_range+0x6cf/0xb00
 ? anon_vma_fork+0x8f/0x140
 ? __slab_alloc.isra.74+0x5a/0x81
 ? anon_vma_fork+0x8f/0x140
 ? kmem_cache_alloc+0x1b5/0x1f0
 ? anon_vma_fork+0x8f/0x140
 ? copy_process.part.35+0x1670/0x1ee0
 ? _do_fork+0xdd/0x3f0
 ? _do_fork+0xdd/0x3f0
 ? do_syscall_64+0x61/0x170
 ? entry_SYSCALL64_slow_path+0x25/0x25

The later ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.

One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0f changelog states:

The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.

Cc: [email protected]
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Luis Claudio R. Goncalves <[email protected]>
jserv pushed a commit to jserv/linux-ma35d1 that referenced this issue Sep 23, 2024
The two commits below add up to a cpuset might_sleep() splat for RT:

8447a0f cpuset: convert callback_mutex to a spinlock
344736f cpuset: simplify cpuset_node_allowed API

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G            E   4.10.0-rt1-rt OpenNuvoton#4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
 ? dump_stack+0x5c/0x81
 ? ___might_sleep+0xf4/0x170
 ? rt_spin_lock+0x1c/0x50
 ? __cpuset_node_allowed+0x66/0xc0
 ? ___slab_alloc+0x390/0x570 <disables IRQs>
 ? anon_vma_fork+0x8f/0x140
 ? copy_page_range+0x6cf/0xb00
 ? anon_vma_fork+0x8f/0x140
 ? __slab_alloc.isra.74+0x5a/0x81
 ? anon_vma_fork+0x8f/0x140
 ? kmem_cache_alloc+0x1b5/0x1f0
 ? anon_vma_fork+0x8f/0x140
 ? copy_process.part.35+0x1670/0x1ee0
 ? _do_fork+0xdd/0x3f0
 ? _do_fork+0xdd/0x3f0
 ? do_syscall_64+0x61/0x170
 ? entry_SYSCALL64_slow_path+0x25/0x25

The later ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.

One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0f changelog states:

The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.

Cc: [email protected]
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Luis Claudio R. Goncalves <[email protected]>
jserv pushed a commit to jserv/linux-ma35d1 that referenced this issue Nov 15, 2024
The two commits below add up to a cpuset might_sleep() splat for RT:

8447a0f cpuset: convert callback_mutex to a spinlock
344736f cpuset: simplify cpuset_node_allowed API

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G            E   4.10.0-rt1-rt OpenNuvoton#4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
 ? dump_stack+0x5c/0x81
 ? ___might_sleep+0xf4/0x170
 ? rt_spin_lock+0x1c/0x50
 ? __cpuset_node_allowed+0x66/0xc0
 ? ___slab_alloc+0x390/0x570 <disables IRQs>
 ? anon_vma_fork+0x8f/0x140
 ? copy_page_range+0x6cf/0xb00
 ? anon_vma_fork+0x8f/0x140
 ? __slab_alloc.isra.74+0x5a/0x81
 ? anon_vma_fork+0x8f/0x140
 ? kmem_cache_alloc+0x1b5/0x1f0
 ? anon_vma_fork+0x8f/0x140
 ? copy_process.part.35+0x1670/0x1ee0
 ? _do_fork+0xdd/0x3f0
 ? _do_fork+0xdd/0x3f0
 ? do_syscall_64+0x61/0x170
 ? entry_SYSCALL64_slow_path+0x25/0x25

The later ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.

One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0f changelog states:

The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.

Cc: [email protected]
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Luis Claudio R. Goncalves <[email protected]>
jserv pushed a commit to jserv/linux-ma35d1 that referenced this issue Nov 27, 2024
The two commits below add up to a cpuset might_sleep() splat for RT:

8447a0f cpuset: convert callback_mutex to a spinlock
344736f cpuset: simplify cpuset_node_allowed API

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G            E   4.10.0-rt1-rt OpenNuvoton#4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
 ? dump_stack+0x5c/0x81
 ? ___might_sleep+0xf4/0x170
 ? rt_spin_lock+0x1c/0x50
 ? __cpuset_node_allowed+0x66/0xc0
 ? ___slab_alloc+0x390/0x570 <disables IRQs>
 ? anon_vma_fork+0x8f/0x140
 ? copy_page_range+0x6cf/0xb00
 ? anon_vma_fork+0x8f/0x140
 ? __slab_alloc.isra.74+0x5a/0x81
 ? anon_vma_fork+0x8f/0x140
 ? kmem_cache_alloc+0x1b5/0x1f0
 ? anon_vma_fork+0x8f/0x140
 ? copy_process.part.35+0x1670/0x1ee0
 ? _do_fork+0xdd/0x3f0
 ? _do_fork+0xdd/0x3f0
 ? do_syscall_64+0x61/0x170
 ? entry_SYSCALL64_slow_path+0x25/0x25

The later ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.

One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0f changelog states:

The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.

Cc: [email protected]
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Luis Claudio R. Goncalves <[email protected]>
jserv pushed a commit to jserv/linux-ma35d1 that referenced this issue Nov 27, 2024
The two commits below add up to a cpuset might_sleep() splat for RT:

8447a0f cpuset: convert callback_mutex to a spinlock
344736f cpuset: simplify cpuset_node_allowed API

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G            E   4.10.0-rt1-rt OpenNuvoton#4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
 ? dump_stack+0x5c/0x81
 ? ___might_sleep+0xf4/0x170
 ? rt_spin_lock+0x1c/0x50
 ? __cpuset_node_allowed+0x66/0xc0
 ? ___slab_alloc+0x390/0x570 <disables IRQs>
 ? anon_vma_fork+0x8f/0x140
 ? copy_page_range+0x6cf/0xb00
 ? anon_vma_fork+0x8f/0x140
 ? __slab_alloc.isra.74+0x5a/0x81
 ? anon_vma_fork+0x8f/0x140
 ? kmem_cache_alloc+0x1b5/0x1f0
 ? anon_vma_fork+0x8f/0x140
 ? copy_process.part.35+0x1670/0x1ee0
 ? _do_fork+0xdd/0x3f0
 ? _do_fork+0xdd/0x3f0
 ? do_syscall_64+0x61/0x170
 ? entry_SYSCALL64_slow_path+0x25/0x25

The later ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.

One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0f changelog states:

The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.

Cc: [email protected]
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Luis Claudio R. Goncalves <[email protected]>
jserv pushed a commit to jserv/linux-ma35d1 that referenced this issue Dec 27, 2024
commit 5a22fbcc10f3f7d94c5d88afbbffa240a3677057 upstream.

When LAN9303 is MDIO-connected two callchains exist into
mdio->bus->write():

1. switch ports 1&2 ("physical" PHYs):

virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})->
  lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested

2. LAN9303 virtual PHY:

virtual MDIO bus (lan9303_phy_{read|write}) ->
  lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write}

If the latter functions just take
mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP
false-positive splat. It's false-positive because the first
mdio_lock in the second callchain above belongs to virtual MDIO bus, the
second mdio_lock belongs to physical MDIO bus.

Consequent annotation in lan9303_mdio_{read|write} as nested lock
(similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus)
prevents the following splat:

WARNING: possible circular locking dependency detected
5.15.71 OpenNuvoton#1 Not tainted
------------------------------------------------------
kworker/u4:3/609 is trying to acquire lock:
ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex
but task is already holding lock:
ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> OpenNuvoton#1 (&bus->mdio_lock){+.+.}-{3:3}:
       lock_acquire
       __mutex_lock
       mutex_lock_nested
       lan9303_mdio_read
       _regmap_read
       regmap_read
       lan9303_probe
       lan9303_mdio_probe
       mdio_probe
       really_probe
       __driver_probe_device
       driver_probe_device
       __device_attach_driver
       bus_for_each_drv
       __device_attach
       device_initial_probe
       bus_probe_device
       deferred_probe_work_func
       process_one_work
       worker_thread
       kthread
       ret_from_fork
-> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}:
       __lock_acquire
       lock_acquire.part.0
       lock_acquire
       __mutex_lock
       mutex_lock_nested
       regmap_lock_mutex
       regmap_read
       lan9303_phy_read
       dsa_slave_phy_read
       __mdiobus_read
       mdiobus_read
       get_phy_device
       mdiobus_scan
       __mdiobus_register
       dsa_register_switch
       lan9303_probe
       lan9303_mdio_probe
       mdio_probe
       really_probe
       __driver_probe_device
       driver_probe_device
       __device_attach_driver
       bus_for_each_drv
       __device_attach
       device_initial_probe
       bus_probe_device
       deferred_probe_work_func
       process_one_work
       worker_thread
       kthread
       ret_from_fork
other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&bus->mdio_lock);
                               lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
                               lock(&bus->mdio_lock);
  lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
*** DEADLOCK ***
5 locks held by kworker/u4:3/609:
 #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work
 OpenNuvoton#1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work
 OpenNuvoton#2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach
 OpenNuvoton#3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch
 OpenNuvoton#4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
stack backtrace:
CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 OpenNuvoton#1
Workqueue: events_unbound deferred_probe_work_func
Call trace:
 dump_backtrace
 show_stack
 dump_stack_lvl
 dump_stack
 print_circular_bug
 check_noncircular
 __lock_acquire
 lock_acquire.part.0
 lock_acquire
 __mutex_lock
 mutex_lock_nested
 regmap_lock_mutex
 regmap_read
 lan9303_phy_read
 dsa_slave_phy_read
 __mdiobus_read
 mdiobus_read
 get_phy_device
 mdiobus_scan
 __mdiobus_register
 dsa_register_switch
 lan9303_probe
 lan9303_mdio_probe
...

Cc: [email protected]
Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support")
Signed-off-by: Alexander Sverdlin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jserv pushed a commit to jserv/linux-ma35d1 that referenced this issue Dec 27, 2024
commit 7fed14f7ac9cf5e38c693836fe4a874720141845 upstream.

The following warning appears when using buffered events:

[  203.556451] WARNING: CPU: 53 PID: 10220 at kernel/trace/ring_buffer.c:3912 ring_buffer_discard_commit+0x2eb/0x420
[...]
[  203.670690] CPU: 53 PID: 10220 Comm: stress-ng-sysin Tainted: G            E      6.7.0-rc2-default OpenNuvoton#4 56e6d0fcf5581e6e51eaaecbdaec2a2338c80f3a
[  203.670704] Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
[  203.670709] RIP: 0010:ring_buffer_discard_commit+0x2eb/0x420
[  203.735721] Code: 4c 8b 4a 50 48 8b 42 48 49 39 c1 0f 84 b3 00 00 00 49 83 e8 01 75 b1 48 8b 42 10 f0 ff 40 08 0f 0b e9 fc fe ff ff f0 ff 47 08 <0f> 0b e9 77 fd ff ff 48 8b 42 10 f0 ff 40 08 0f 0b e9 f5 fe ff ff
[  203.735734] RSP: 0018:ffffb4ae4f7b7d80 EFLAGS: 00010202
[  203.735745] RAX: 0000000000000000 RBX: ffffb4ae4f7b7de0 RCX: ffff8ac10662c000
[  203.735754] RDX: ffff8ac0c750be00 RSI: ffff8ac10662c000 RDI: ffff8ac0c004d400
[  203.781832] RBP: ffff8ac0c039cea0 R08: 0000000000000000 R09: 0000000000000000
[  203.781839] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  203.781842] R13: ffff8ac10662c000 R14: ffff8ac0c004d400 R15: ffff8ac10662c008
[  203.781846] FS:  00007f4cd8a67740(0000) GS:ffff8ad798880000(0000) knlGS:0000000000000000
[  203.781851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  203.781855] CR2: 0000559766a74028 CR3: 00000001804c4000 CR4: 00000000001506f0
[  203.781862] Call Trace:
[  203.781870]  <TASK>
[  203.851949]  trace_event_buffer_commit+0x1ea/0x250
[  203.851967]  trace_event_raw_event_sys_enter+0x83/0xe0
[  203.851983]  syscall_trace_enter.isra.0+0x182/0x1a0
[  203.851990]  do_syscall_64+0x3a/0xe0
[  203.852075]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  203.852090] RIP: 0033:0x7f4cd870fa77
[  203.982920] Code: 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 43 0e 00 f7 d8 64 89 01 48
[  203.982932] RSP: 002b:00007fff99717dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
[  203.982942] RAX: ffffffffffffffda RBX: 0000558ea1d7b6f0 RCX: 00007f4cd870fa77
[  203.982948] RDX: 0000000000000000 RSI: 00007fff99717de0 RDI: 0000558ea1d7b6f0
[  203.982957] RBP: 00007fff99717de0 R08: 00007fff997180e0 R09: 00007fff997180e0
[  203.982962] R10: 00007fff997180e0 R11: 0000000000000246 R12: 00007fff99717f40
[  204.049239] R13: 00007fff99718590 R14: 0000558e9f2127a8 R15: 00007fff997180b0
[  204.049256]  </TASK>

For instance, it can be triggered by running these two commands in
parallel:

 $ while true; do
    echo hist:key=id.syscall:val=hitcount > \
      /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger;
  done
 $ stress-ng --sysinfo $(nproc)

The warning indicates that the current ring_buffer_per_cpu is not in the
committing state. It happens because the active ring_buffer_event
doesn't actually come from the ring_buffer_per_cpu but is allocated from
trace_buffered_event.

The bug is in function trace_buffered_event_disable() where the
following normally happens:

* The code invokes disable_trace_buffered_event() via
  smp_call_function_many() and follows it by synchronize_rcu(). This
  increments the per-CPU variable trace_buffered_event_cnt on each
  target CPU and grants trace_buffered_event_disable() the exclusive
  access to the per-CPU variable trace_buffered_event.

* Maintenance is performed on trace_buffered_event, all per-CPU event
  buffers get freed.

* The code invokes enable_trace_buffered_event() via
  smp_call_function_many(). This decrements trace_buffered_event_cnt and
  releases the access to trace_buffered_event.

A problem is that smp_call_function_many() runs a given function on all
target CPUs except on the current one. The following can then occur:

* Task X executing trace_buffered_event_disable() runs on CPU 0.

* The control reaches synchronize_rcu() and the task gets rescheduled on
  another CPU 1.

* The RCU synchronization finishes. At this point,
  trace_buffered_event_disable() has the exclusive access to all
  trace_buffered_event variables except trace_buffered_event[CPU0]
  because trace_buffered_event_cnt[CPU0] is never incremented and if the
  buffer is currently unused, remains set to 0.

* A different task Y is scheduled on CPU 0 and hits a trace event. The
  code in trace_event_buffer_lock_reserve() sees that
  trace_buffered_event_cnt[CPU0] is set to 0 and decides the use the
  buffer provided by trace_buffered_event[CPU0].

* Task X continues its execution in trace_buffered_event_disable(). The
  code incorrectly frees the event buffer pointed by
  trace_buffered_event[CPU0] and resets the variable to NULL.

* Task Y writes event data to the now freed buffer and later detects the
  created inconsistency.

The issue is observable since commit dea499781a11 ("tracing: Fix warning
in trace_buffered_event_disable()") which moved the call of
trace_buffered_event_disable() in __ftrace_event_enable_disable()
earlier, prior to invoking call->class->reg(.. TRACE_REG_UNREGISTER ..).
The underlying problem in trace_buffered_event_disable() is however
present since the original implementation in commit 0fc1b09
("tracing: Use temp buffer when filtering events").

Fix the problem by replacing the two smp_call_function_many() calls with
on_each_cpu_mask() which invokes a given callback on all CPUs.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]

Cc: [email protected]
Fixes: 0fc1b09 ("tracing: Use temp buffer when filtering events")
Fixes: dea499781a11 ("tracing: Fix warning in trace_buffered_event_disable()")
Signed-off-by: Petr Pavlu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jserv pushed a commit to jserv/linux-ma35d1 that referenced this issue Dec 27, 2024
[ Upstream commit 14694179e561b5f2f7e56a0f590e2cb49a9cc7ab ]

Trying to suspend to RAM on SAMA5D27 EVK leads to the following lockdep
warning:

 ============================================
 WARNING: possible recursive locking detected
 6.7.0-rc5-wt+ #532 Not tainted
 --------------------------------------------
 sh/92 is trying to acquire lock:
 c3cf306c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100

 but task is already holding lock:
 c3d7c46c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&irq_desc_lock_class);
   lock(&irq_desc_lock_class);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by sh/92:
  #0: c3aa0258 (sb_writers#6){.+.+}-{0:0}, at: ksys_write+0xd8/0x178
  OpenNuvoton#1: c4c2df44 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x138/0x284
  OpenNuvoton#2: c32684a0 (kn->active){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x148/0x284
  OpenNuvoton#3: c232b6d4 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x13c/0x4e8
  OpenNuvoton#4: c387b088 (&dev->mutex){....}-{3:3}, at: __device_suspend+0x1e8/0x91c
  OpenNuvoton#5: c3d7c46c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100

 stack backtrace:
 CPU: 0 PID: 92 Comm: sh Not tainted 6.7.0-rc5-wt+ #532
 Hardware name: Atmel SAMA5
  unwind_backtrace from show_stack+0x18/0x1c
  show_stack from dump_stack_lvl+0x34/0x48
  dump_stack_lvl from __lock_acquire+0x19ec/0x3a0c
  __lock_acquire from lock_acquire.part.0+0x124/0x2d0
  lock_acquire.part.0 from _raw_spin_lock_irqsave+0x5c/0x78
  _raw_spin_lock_irqsave from __irq_get_desc_lock+0xe8/0x100
  __irq_get_desc_lock from irq_set_irq_wake+0xa8/0x204
  irq_set_irq_wake from atmel_gpio_irq_set_wake+0x58/0xb4
  atmel_gpio_irq_set_wake from irq_set_irq_wake+0x100/0x204
  irq_set_irq_wake from gpio_keys_suspend+0xec/0x2b8
  gpio_keys_suspend from dpm_run_callback+0xe4/0x248
  dpm_run_callback from __device_suspend+0x234/0x91c
  __device_suspend from dpm_suspend+0x224/0x43c
  dpm_suspend from dpm_suspend_start+0x9c/0xa8
  dpm_suspend_start from suspend_devices_and_enter+0x1e0/0xa84
  suspend_devices_and_enter from pm_suspend+0x460/0x4e8
  pm_suspend from state_store+0x78/0xe4
  state_store from kernfs_fop_write_iter+0x1a0/0x284
  kernfs_fop_write_iter from vfs_write+0x38c/0x6f4
  vfs_write from ksys_write+0xd8/0x178
  ksys_write from ret_fast_syscall+0x0/0x1c
 Exception stack(0xc52b3fa8 to 0xc52b3ff0)
 3fa0:                   00000004 005a0ae8 00000001 005a0ae8 00000004 00000001
 3fc0: 00000004 005a0ae8 00000001 00000004 00000004 b6c616c0 00000020 0059d190
 3fe0: 00000004 b6c61678 aec5a041 aebf1a26

This warning is raised because pinctrl-at91-pio4 uses chained IRQ. Whenever
a wake up source configures an IRQ through irq_set_irq_wake, it will
lock the corresponding IRQ desc, and then call irq_set_irq_wake on "parent"
IRQ which will do the same on its own IRQ desc, but since those two locks
share the same class, lockdep reports this as an issue.

Fix lockdep false positive by setting a different class for parent and
children IRQ

Fixes: 7761808 ("pinctrl: introduce driver for Atmel PIO4 controller")
Signed-off-by: Alexis Lothoré <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants