Changes 27/12/21 #3

Vylpes · 2021-12-28T14:14:33Z

No description provided.

…xfer_start() Coverity complains of an uninitialized variable: 5. uninit_use_in_call: Using uninitialized value config.dst_per when calling axi_chan_config_write. [show details] 6. uninit_use_in_call: Using uninitialized value config.hs_sel_src when calling axi_chan_config_write. [show details] CID 121164 (#1-3 of 3): Uninitialized scalar variable (UNINIT) 7. uninit_use_in_call: Using uninitialized value config.src_per when calling axi_chan_config_write. [show details] 418 axi_chan_config_write(chan, &config); Fix this by initializing the structure to 0 which should at least be benign in axi_chan_config_write(). Also fix what looks like a cut-n-paste error when initializing config.hs_sel_dst. Fixes: 8243516 ("dmaengine: dw-axi-dmac: support DMAX_NUM_CHANNELS > 8") Cc: Eugeniy Paltsev <[email protected]> Cc: Vinod Koul <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Tim Gardner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>

…ent() The commit in the Fixes: tag has changed the logic of the code and now it is likely that the probe will return an early success (0), even if not completely executed. This should lead to a crash or similar issue later on when the code accesses to some never allocated resources. Change the '!err' into a 'err' when checking if 'dma_set_mask_and_coherent()' has failed or not. While at it, simplify the code and remove the "can't success code" related to 32 DMA mask. As stated in [1], 'dma_set_mask_and_coherent(DMA_BIT_MASK(64))' can't fail if 'dev->dma_mask' is non-NULL. And if it is NULL, it would fail for the same reason when tried with DMA_BIT_MASK(32). [1]: https://lkml.org/lkml/2021/6/7/398 Fixes: ecb8c88 ("dmaengine: dw-edma-pcie: switch from 'pci_' to 'dma_' API") Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/935fbb40ae930c5fe87482a41dcb73abf2257973.1636492127.git.christophe.jaillet@wanadoo.fr Signed-off-by: Vinod Koul <[email protected]>

Dan reports that smatch has found idxd_wq_quiesce() is being called inside the idxd->dev_lock. idxd_wq_quiesce() calls wait_for_completion() and therefore it can sleep. Move the call outside of the spinlock as it does not need device lock. Fixes: 5b0c68c ("dmaengine: idxd: support reporting of halt interrupt") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/163716858508.1721911.15051495873516709923.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <[email protected]>

Per HiFive Unleashed schematics, the card detect signal of the micro SD card is connected to gpio pin torvalds#11, which should be reflected in the DT via the <gpios> property, as described in Documentation/devicetree/bindings/mmc/mmc-spi-slot.txt. [1] https://sifive.cdn.prismic.io/sifive/c52a8e32-05ce-4aaf-95c8-7bf8453f8698_hifive-unleashed-a00-schematics-1.pdf Signed-off-by: Bin Meng <[email protected]> Fixes: d573b55 ("riscv: dts: add initial board data for the SiFive HiFive Unmatched") Cc: [email protected] Signed-off-by: Palmer Dabbelt <[email protected]>

Per HiFive Unmatched schematics, the card detect signal of the micro SD card is connected to gpio pin torvalds#15, which should be reflected in the DT via the <gpios> property, as described in Documentation/devicetree/bindings/mmc/mmc-spi-slot.txt. [1] https://sifive.cdn.prismic.io/sifive/6a06d6c0-6e66-49b5-8e9e-e68ce76f4192_hifive-unmatched-schematics-v3.pdf Signed-off-by: Bin Meng <[email protected]> Fixes: d573b55 ("riscv: dts: add initial board data for the SiFive HiFive Unmatched") Cc: [email protected] Signed-off-by: Palmer Dabbelt <[email protected]>

When CONFIG_FSL_PMC is set to n, no value is assigned to cpu_up_prepare in the mpc85xx_pm_ops structure. As a result, oops is triggered in smp_85xx_start_cpu(). smp: Bringing up secondary CPUs ... kernel tried to execute user page (0) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch (NULL pointer?) Faulting instruction address: 0x00000000 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [00000000] 0x0 LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568 Call Trace: [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable) [c1051de8] [c0011460] __cpu_up+0xc0/0x228 [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224 [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8 [c1051eb8] [c07e67bc] smp_init+0x30/0x78 [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8 [c1051f18] [c00032d8] kernel_init+0x14/0x124 [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c Fixes: c45361a ("powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n") Reported-by: Martin Kennedy <[email protected]> Signed-off-by: Xiaoming Ni <[email protected]> Tested-by: Martin Kennedy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]

While converting bindings to dtschema, the buck regulators lost "op_mode" property. The "op_mode" is a valid property for all regulators (both LDOs and bucks), so add it. Reported-by: Rob Herring <[email protected]> Fixes: fab58de ("regulator: dt-bindings: samsung,s5m8767: convert to dtschema") Cc: <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>

The corresponding API for clk_prepare is clk_unprepare, other than clk_disable_unprepare. Fix this by changing clk_disable_unprepare to clk_unprepare. Fixes: 5762ab7 ("spi: Add support for Armada 3700 SPI Controller") Signed-off-by: Dongliang Mu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>

After commit 9f76779 ("MIPS: implement architecture-specific 'pci_remap_iospace()'"), there exists the following warning on the Loongson64 platform: loongson-pci 1a000000.pci: IO 0x0018020000..0x001803ffff -> 0x0000020000 loongson-pci 1a000000.pci: MEM 0x0040000000..0x007fffffff -> 0x0040000000 ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1 at arch/mips/pci/pci-generic.c:55 pci_remap_iospace+0x84/0x90 resource start address is not zero ... Call Trace: [<ffffffff8020dc78>] show_stack+0x40/0x120 [<ffffffff80cf4a0c>] dump_stack_lvl+0x58/0x74 [<ffffffff8023a0b0>] __warn+0xe0/0x110 [<ffffffff80cee02c>] warn_slowpath_fmt+0xa4/0xd0 [<ffffffff80cecf24>] pci_remap_iospace+0x84/0x90 [<ffffffff807f9864>] devm_pci_remap_iospace+0x5c/0xb8 [<ffffffff808121b0>] devm_of_pci_bridge_init+0x178/0x1f8 [<ffffffff807f4000>] devm_pci_alloc_host_bridge+0x78/0x98 [<ffffffff80819454>] loongson_pci_probe+0x34/0x160 [<ffffffff809203cc>] platform_probe+0x6c/0xe0 [<ffffffff8091d5d4>] really_probe+0xbc/0x340 [<ffffffff8091d8f0>] __driver_probe_device+0x98/0x110 [<ffffffff8091d9b8>] driver_probe_device+0x50/0x118 [<ffffffff8091dea0>] __driver_attach+0x80/0x118 [<ffffffff8091b280>] bus_for_each_dev+0x80/0xc8 [<ffffffff8091c6d8>] bus_add_driver+0x130/0x210 [<ffffffff8091ead4>] driver_register+0x8c/0x150 [<ffffffff80200a8c>] do_one_initcall+0x54/0x288 [<ffffffff811a5320>] kernel_init_freeable+0x27c/0x2e4 [<ffffffff80cfc380>] kernel_init+0x2c/0x134 [<ffffffff80205a2c>] ret_from_kernel_thread+0x14/0x1c ---[ end trace e4a0efe10aa5cce6 ]--- loongson-pci 1a000000.pci: error -19: failed to map resource [io 0x20000-0x3ffff] We can see that the resource start address is 0x0000020000, because the ISA Bridge used the zero address which is defined in the dts file arch/mips/boot/dts/loongson/ls7a-pch.dtsi: ISA Bridge: /bus@10000000/isa@18000000 IO 0x0000000018000000..0x000000001801ffff -> 0x0000000000000000 Based on the above analysis, the architecture-specific pci_remap_iospace() is not suitable for Loongson64, we should only define pci_remap_iospace() for Ralink on MIPS based on the commit background. Fixes: 9f76779 ("MIPS: implement architecture-specific 'pci_remap_iospace()'") Suggested-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Tested-by: Sergio Paracuellos <[email protected]> Acked-by: Sergio Paracuellos <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>

This reverts commit b3484d2. That change attempted to improve the DRM drivers fbdev emulation device names to avoid having confusing names like "simpledrmdrmfb" in /proc/fb. But unfortunately, there are user-space programs such as pm-utils that match against the fbdev names and so broke after the mentioned commit. Since the names in /proc/fb are used by tools that consider it an uAPI, let's restore the old names even when this lead to silly names like the one mentioned above. Fixes: b3484d2 ("drm/fb-helper: improve DRM fbdev emulation device names") Reported-by: Johannes Stezenbach <[email protected]> Signed-off-by: Javier Martinez Canillas <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Smatch reports below warnings [1] wrt dereferencing rm_res when it can potentially be ERR_PTR(). This is possible when entire range is allocated to Linux Fix this case by making sure, there is no deference of rm_res when its ERR_PTR(). [1]: drivers/dma/ti/k3-udma.c:4524 udma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR() drivers/dma/ti/k3-udma.c:4537 udma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR() drivers/dma/ti/k3-udma.c:4681 bcdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR() drivers/dma/ti/k3-udma.c:4696 bcdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR() drivers/dma/ti/k3-udma.c:4711 bcdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR() drivers/dma/ti/k3-udma.c:4848 pktdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR() drivers/dma/ti/k3-udma.c:4861 pktdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR() Reported-by: Nishanth Menon <[email protected]> Signed-off-by: Vignesh Raghavendra <[email protected]> Acked-by: Peter Ujfalusi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>

Ming reported that with the abort path of the descriptor submission, there can be a window where a completed descriptor can be missed to be completed by the irq completion thread: CPU A CPU B Submit (successful) Submit (fail) irq_process_work_list() // empty llist_abort_desc() // remove all descs from pending list irq_process_pending_llist() // empty exit idxd_wq_thread() with no processing Add opportunistic descriptor completion in the abort path in order to remove the missed completion. Fixes: 6b4b87f ("dmaengine: idxd: fix submission race window") Reported-by: Ming Li <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/163898288714.443911.16084982766671976640.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <[email protected]>

modprobe can't handle spaces in aliases. Fixes: 6b4cd72 ("dmaengine: st_fdma: Add STMicroelectronics FDMA engine driver support") Signed-off-by: Alyssa Ross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>

Pixel clock has to be set in kHz. Signed-off-by: Alejandro Concepcion-Rodriguez <[email protected]> Fixes: 11e8f5f ("drm: Add simpledrm driver") Signed-off-by: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

When listening for notifications through netlink of a new interface being registered, sporadically, it is possible for the MAC to be read as zero. The zero MAC address lasts a short period of time and then switches to a valid random MAC address. This causes problems for netd in Android, which assumes that the interface is malfunctioning and will not use it. In the good case we get this log: InterfaceController::getCfg() ifName usb0 hwAddr 92:a8:f0:73:79:5b ipv4Addr 0.0.0.0 flags 0x1002 In the error case we get these logs: InterfaceController::getCfg() ifName usb0 hwAddr 00:00:00:00:00:00 ipv4Addr 0.0.0.0 flags 0x1002 netd : interfaceGetCfg("usb0") netd : interfaceSetCfg() -> ServiceSpecificException (99, "[Cannot assign requested address] : ioctl() failed") The reason for the issue is the order in which the interface is setup, it is first registered through register_netdev() and after the MAC address is set. Fixed by first setting the MAC address of the net_device and after that calling register_netdev(). Fixes: bcd4a1c ("usb: gadget: u_ether: construct with default values and add setters/getters") Cc: [email protected] Signed-off-by: Marian Postevca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

There is warning of 'list_del corruption' when enable list debug (CONFIG_DEBUG_LIST=y), fix it by using list_del_init() Fixes: 4ce1866 ("usb: xhci-mtk: Do not use xhci's virt_dev in drop_endpoint") Cc: stable <[email protected]> Signed-off-by: Chunfeng Yun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

Patch restrict calling of cdnsp_died function during removing modules or software disconnect. This function was called because after transition controller to HALT state the driver starts handling the deferred interrupt. In this case such interrupt can be simple ignored. Fixes: 3d82904 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") cc: <[email protected]> Reviewed-by: Peter Chen <[email protected]> Signed-off-by: Pawel Laszczak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

Patch fixes incorrect order of __entry->stream_id and __entry->state parameters in TP_printk macro. Fixes: 3d82904 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") cc: <[email protected]> Reviewed-by: Peter Chen <[email protected]> Signed-off-by: Pawel Laszczak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

Patch fixes incorrect status for control request. Without this fix all usb_request objects were returned to upper drivers with usb_reqest->status field set to -EINPROGRESS. Fixes: 3d82904 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") cc: <[email protected]> Reported-by: Ken (Jian) He <[email protected]> Reviewed-by: Peter Chen <[email protected]> Signed-off-by: Pawel Laszczak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

We have two io-wq creation paths: - On queue enqueue - When a worker goes to sleep The latter invokes worker creation with the wqe->lock held, but that can run into problems if we end up exiting and need to cancel the queued work. syzbot caught this: ============================================ WARNING: possible recursive locking detected 5.16.0-rc4-syzkaller #0 Not tainted -------------------------------------------- iou-wrk-6468/6471 is trying to acquire lock: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187 but task is already holding lock: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&wqe->lock); lock(&wqe->lock); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by iou-wrk-6468/6471: #0: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700 stack backtrace: CPU: 1 PID: 6471 Comm: iou-wrk-6468 Not tainted 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2956 [inline] check_deadlock kernel/locking/lockdep.c:2999 [inline] validate_chain+0x5984/0x8240 kernel/locking/lockdep.c:3788 __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5027 lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5637 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154 io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187 io_wq_cancel_tw_create fs/io-wq.c:1220 [inline] io_queue_worker_create+0x3cf/0x4c0 fs/io-wq.c:372 io_wq_worker_sleeping+0xbe/0x140 fs/io-wq.c:701 sched_submit_work kernel/sched/core.c:6295 [inline] schedule+0x67/0x1f0 kernel/sched/core.c:6323 schedule_timeout+0xac/0x300 kernel/time/timer.c:1857 wait_woken+0xca/0x1b0 kernel/sched/wait.c:460 unix_msg_wait_data net/unix/unix_bpf.c:32 [inline] unix_bpf_recvmsg+0x7f9/0xe20 net/unix/unix_bpf.c:77 unix_stream_recvmsg+0x214/0x2c0 net/unix/af_unix.c:2832 sock_recvmsg_nosec net/socket.c:944 [inline] sock_recvmsg net/socket.c:962 [inline] sock_read_iter+0x3a7/0x4d0 net/socket.c:1035 call_read_iter include/linux/fs.h:2156 [inline] io_iter_do_read fs/io_uring.c:3501 [inline] io_read fs/io_uring.c:3558 [inline] io_issue_sqe+0x144c/0x9590 fs/io_uring.c:6671 io_wq_submit_work+0x2d8/0x790 fs/io_uring.c:6836 io_worker_handle_work+0x808/0xdd0 fs/io-wq.c:574 io_wqe_worker+0x395/0x870 fs/io-wq.c:630 ret_from_fork+0x1f/0x30 We can safely drop the lock before doing work creation, making the two contexts the same in that regard. Reported-by: [email protected] Fixes: 71a8538 ("io-wq: check for wq exit after adding new worker task_work") Signed-off-by: Jens Axboe <[email protected]>

The driver supports a "direct" mode of operation, where the SMP req frame is directly copied into the command payload (and vice-versa for the SMP resp). To get at the SMP req frame data in the scatterlist the driver uses phys_to_virt() on the DMA mapped memory dma_addr_t . This is broken, and subsequently crashes as follows when an IOMMU is enabled: Unable to handle kernel paging request at virtual address ffff0000fcebfb00 ... pc : pm80xx_chip_smp_req+0x2d0/0x3d0 lr : pm80xx_chip_smp_req+0xac/0x3d0 pm80xx_chip_smp_req+0x2d0/0x3d0 pm8001_task_exec.constprop.0+0x368/0x520 pm8001_queue_command+0x1c/0x30 smp_execute_task_sg+0xdc/0x204 sas_discover_expander.part.0+0xac/0x6cc sas_discover_root_expander+0x8c/0x150 sas_discover_domain+0x3ac/0x6a0 process_one_work+0x1d0/0x354 worker_thread+0x13c/0x470 kthread+0x17c/0x190 ret_from_fork+0x10/0x20 Code: 371806e1 910006d6 6b16033f 54000249 (38766b05) ---[ end trace b91d59aaee98ea2d ]--- note: kworker/u192:0[7] exited with preempt_count 1 Instead use kmap_atomic(). -- Difference to v1: - use kmap_atomic() in both locations Difference to v2: - add whitespace around arithmetic (Damien) Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: John Garry <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

The return value of kzalloc() needs to be checked. To avoid use of null pointer '&ast_state->base' in case of the failure of alloc. Fixes: f0adbc3 ("drm/ast: Allocate initial CRTC state of the correct size") Signed-off-by: Jiasheng Jiang <[email protected]> Signed-off-by: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

When generalising GPIO support and adding support for CP2102N, the GPIO registration for some CP2105 devices accidentally broke. Specifically, when all the pins of a port are in "modem" mode, and thus unavailable for GPIO use, the GPIO chip would now be registered without having initialised the number of GPIO lines. This would in turn be rejected by gpiolib and some errors messages would be printed (but importantly probe would still succeed). Fix this by initialising the number of GPIO lines before registering the GPIO chip. Note that as for the other device types, and as when all CP2105 pins are muxed for LED function, the GPIO chip is registered also when no pins are available for GPIO use. Reported-by: Maarten Brock <[email protected]> Link: https://lore.kernel.org/r/[email protected] Fixes: c8acfe0 ("USB: serial: cp210x: implement GPIO support for CP2102N") Cc: [email protected] # 4.19 Cc: Karoly Pados <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Greg Kroah-Hartman <[email protected]> Tested-by: Maarten Brock <[email protected]> Signed-off-by: Johan Hovold <[email protected]>

Add the following Telit FN990 compositions: 0x1070: tty, adb, rmnet, tty, tty, tty, tty 0x1071: tty, adb, mbim, tty, tty, tty, tty 0x1072: rndis, tty, adb, tty, tty, tty, tty 0x1073: tty, adb, ecm, tty, tty, tty, tty Signed-off-by: Daniele Palmas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]>

…tive. smatch warning: drivers/gpu/drm/i915/display/intel_dmc.c:601 parse_dmc_fw() warn: unsigned 'fw->size - offset' is never less than zero Firmware size is size_t and offset is u32. So the subtraction is unsigned which can never be less than zero. Fixes: 3d5928a ("drm/i915/xelpd: Pipe A DMC plugging") Signed-off-by: Harshit Mogalapalli <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Signed-off-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 87bb2a4) Signed-off-by: Rodrigo Vivi <[email protected]>

Livepatching a loaded module involves applying relocations through apply_relocate_add(), which attempts to write to read-only memory when CONFIG_STRICT_MODULE_RWX=y. Work around this by performing these writes through the text poke area by using patch_instruction(). R_PPC_REL24 is the only relocation type generated by the kpatch-build userspace tool or klp-convert kernel tree that I observed applying a relocation to a post-init module. A more comprehensive solution is planned, but using patch_instruction() for R_PPC_REL24 on should serve as a sufficient fix. This does have a performance impact, I observed ~15% overhead in module_load() on POWER8 bare metal with checksum verification off. Fixes: c35717c ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX") Cc: [email protected] # v5.14+ Reported-by: Joe Lawrence <[email protected]> Signed-off-by: Russell Currey <[email protected]> Tested-by: Joe Lawrence <[email protected]> [mpe: Check return codes from patch_instruction()] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]

Masking all unused MSI-X entries is done to ensure that a crash kernel starts from a clean slate, which correponds to the reset state of the device as defined in the PCI-E specificion 3.0 and later: Vector Control for MSI-X Table Entries -------------------------------------- "00: Mask bit: When this bit is set, the function is prohibited from sending a message using this MSI-X Table entry. ... This bit’s state after reset is 1 (entry is masked)." A Marvell NVME device fails to deliver MSI interrupts after trying to enable MSI-X interrupts due to that masking. It seems to take the MSI-X mask bits into account even when MSI-X is disabled. While not specification compliant, this can be cured by moving the masking into the success path, so that the MSI-X table entries stay in device reset state when the MSI-X setup fails. [ tglx: Move it into the success path, add comment and amend changelog ] Fixes: aa8092c ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: Stefan Roese <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Bjorn Helgaas <[email protected]> Cc: Michal Simek <[email protected]> Cc: Marek Vasut <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]

PCI_MSIX_FLAGS_MASKALL is set in the MSI-X control register at MSI-X interrupt setup time. It's cleared on success, but the error handling path only clears the PCI_MSIX_FLAGS_ENABLE bit. That's incorrect as the reset state of the PCI_MSIX_FLAGS_MASKALL bit is zero. That can be observed via lspci: Capabilities: [b0] MSI-X: Enable- Count=67 Masked+ Clear the bit in the error path to restore the reset state. Fixes: 4385539 ("PCI/MSI: Enable and mask MSI-X early") Reported-by: Stefan Roese <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Stefan Roese <[email protected]> Cc: [email protected] Cc: Bjorn Helgaas <[email protected]> Cc: Michal Simek <[email protected]> Cc: Marek Vasut <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/87tufevoqx.ffs@tglx

The donation calculation logic assumes that the donor has non-zero after-donation hweight, so the lowest active hweight a donating cgroup can have is 2 so that it can donate 1 while keeping the other 1 for itself. Earlier, we only donated from cgroups with sizable surpluses so this condition was always true. However, with the precise donation algorithm implemented, f1de243 ("blk-iocost: revamp donation amount determination") made the donation amount calculation exact enabling even low hweight cgroups to donate. This means that in rare occasions, a cgroup with active hweight of 1 can enter donation calculation triggering the following warning and then a divide-by-zero oops. WARNING: CPU: 4 PID: 0 at block/blk-iocost.c:1928 transfer_surpluses.cold+0x0/0x53 [884/94867] ... RIP: 0010:transfer_surpluses.cold+0x0/0x53 Code: 92 ff 48 c7 c7 28 d1 ab b5 65 48 8b 34 25 00 ae 01 00 48 81 c6 90 06 00 00 e8 8b 3f fe ff 48 c7 c0 ea ff ff ff e9 95 ff 92 ff <0f> 0b 48 c7 c7 30 da ab b5 e8 71 3f fe ff 4c 89 e8 4d 85 ed 74 0 4 ... Call Trace: <IRQ> ioc_timer_fn+0x1043/0x1390 call_timer_fn+0xa1/0x2c0 __run_timers.part.0+0x1ec/0x2e0 run_timer_softirq+0x35/0x70 ... iocg: invalid donation weights in /a/b: active=1 donating=1 after=0 Fix it by excluding cgroups w/ active hweight < 2 from donating. Excluding these extreme low hweight donations shouldn't affect work conservation in any meaningful way. Signed-off-by: Tejun Heo <[email protected]> Fixes: f1de243 ("blk-iocost: revamp donation amount determination") Cc: [email protected] # v5.10+ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>

Line 1169 (#3) allocates a memory chunk for victim_name by kmalloc(), but when the function returns in line 1184 (#4) victim_name allocated by line 1169 (#3) is not freed, which will lead to a memory leak. There is a similar snippet of code in this function as allocating a memory chunk for victim_name in line 1104 (#1) as well as releasing the memory in line 1116 (#2). We should kfree() victim_name when the return value of backref_in_log() is less than zero and before the function returns in line 1184 (#4). 1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans, 1058 struct btrfs_root *root, 1059 struct btrfs_path *path, 1060 struct btrfs_root *log_root, 1061 struct btrfs_inode *dir, 1062 struct btrfs_inode *inode, 1063 u64 inode_objectid, u64 parent_objectid, 1064 u64 ref_index, char *name, int namelen, 1065 int *search_done) 1066 { 1104 victim_name = kmalloc(victim_name_len, GFP_NOFS); // #1: kmalloc (victim_name-1) 1105 if (!victim_name) 1106 return -ENOMEM; 1112 ret = backref_in_log(log_root, &search_key, 1113 parent_objectid, victim_name, 1114 victim_name_len); 1115 if (ret < 0) { 1116 kfree(victim_name); // #2: kfree (victim_name-1) 1117 return ret; 1118 } else if (!ret) { 1169 victim_name = kmalloc(victim_name_len, GFP_NOFS); // #3: kmalloc (victim_name-2) 1170 if (!victim_name) 1171 return -ENOMEM; 1180 ret = backref_in_log(log_root, &search_key, 1181 parent_objectid, victim_name, 1182 victim_name_len); 1183 if (ret < 0) { 1184 return ret; // #4: missing kfree (victim_name-2) 1185 } else if (!ret) { 1241 return 0; 1242 } Fixes: d3316c8 ("btrfs: Properly handle backref_in_log retval") CC: [email protected] # 5.10+ Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Jianglei Nie <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>

…ernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: - only enable pci_remap_iospace() for Ralink devices * tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Only define pci_remap_iospace() for Ralink

…/linux/kernel/git/tip/tip Pull signal handlign fix from Borislav Petkov: - Prevent lock contention on the new sigaltstack lock on the common-case path, when no changes have been made to the alternative signal stack. * tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: signal: Skip the altstack update when not needed

…scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix the rtmutex condition checking when the optimistic spinning of a waiter needs to be terminated * tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()

…cm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never positive * tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Really make sure wall_to_monotonic isn't positive

…linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Clear the PCI_MSIX_FLAGS_MASKALL bit too on the error path so that it is restored to its reset state - Mask MSI-X vectors late on the init path in order to handle out-of-spec Marvell NVME devices which apparently look at the MSI-X mask even when MSI-X is disabled * tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error PCI/MSI: Mask MSI-X vectors only on success

Pull block revert from Jens Axboe: "It turns out that the fix for not hammering on the delayed work timer too much caused a performance regression for BFQ, so let's revert the change for now. I've got some ideas on how to fix it appropriately, but they should wait for 5.17" * tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block: Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"

Pull kvm fixes from Paolo Bonzini: "Two small fixes, one of which was being worked around in selftests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Retry page fault if MMU reload is pending and root has no sp KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES

The fixed counter 3 is used for the Topdown metrics, which hasn't been enabled for KVM guests. Userspace accessing to it will fail as it's not included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines, which have this counter. To reproduce it on ICX+ machines, ./state_test reports: ==== Test Assertion Failure ==== lib/x86_64/processor.c:1078: r == nmsrs pid=4564 tid=4564 - Argument list too long 1 0x000000000040b1b9: vcpu_save_state at processor.c:1077 2 0x0000000000402478: main at state_test.c:209 (discriminator 6) 3 0x00007fbe21ed5f92: ?? ??:0 4 0x000000000040264d: _start at ??:? Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c) With this patch, it works well. Signed-off-by: Wei Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

An overlook from the previous commit: we don't even parse or start the device, meaning that the device is not presented to user space. Fixes: 9302095 ("HID: check for valid USB device for many HID drivers") Cc: [email protected] Link: https://bugs.archlinux.org/task/73048 Link: https://bugzilla.kernel.org/show_bug.cgi?id=215341 Link: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Benjamin Tissoires <[email protected]>

The return value of devm_kzalloc() needs to be checked. To avoid hdev->dev->driver_data to be null in case of the failure of alloc. Fixes: 14c9c01 ("HID: add vivaldi HID driver") Cc: [email protected] Signed-off-by: Jiasheng Jiang <[email protected]> Signed-off-by: Benjamin Tissoires <[email protected]> Link: https://lore.kernel.org/r/[email protected]

After dropping mmu_lock in the TDP MMU, restart the iterator during tdp_iter_next() and do not advance the iterator. Advancing the iterator results in skipping the top-level SPTE and all its children, which is fatal if any of the skipped SPTEs were not visited before yielding. When zapping all SPTEs, i.e. when min_level == root_level, restarting the iter and then invoking tdp_iter_next() is always fatal if the current gfn has as a valid SPTE, as advancing the iterator results in try_step_side() skipping the current gfn, which wasn't visited before yielding. Sprinkle WARNs on iter->yielded being true in various helpers that are often used in conjunction with yielding, and tag the helper with __must_check to reduce the probabily of improper usage. Failing to zap a top-level SPTE manifests in one of two ways. If a valid SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(), the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of marking a struct page as dirty/accessed after it has been put back on the free list. This directly triggers a WARN due to encountering a page with page_count() == 0, but it can also lead to data corruption and additional errors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Note, the underlying bug existed even before commit 1af4a96 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still incorrectly advance past a top-level entry when yielding on a lower-level entry. But with respect to leaking shadow pages, the bug was introduced by yielding before processing the current gfn. Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or callers could jump to their "retry" label. The downside of that approach is that tdp_mmu_iter_cond_resched() _must_ be called before anything else in the loop, and there's no easy way to enfornce that requirement. Ideally, KVM would handling the cond_resched() fully within the iterator macro (the code is actually quite clean) and avoid this entire class of bugs, but that is extremely difficult do while also supporting yielding after tdp_mmu_set_spte_atomic() fails. Yielding after failing to set a SPTE is very desirable as the "owner" of the REMOVED_SPTE isn't strictly bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE may block operations on the SPTE for a significant amount of time. Fixes: faaf05b ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: 1af4a96 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") Reported-by: Ignat Korchagin <[email protected]> Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

The kvm_run struct's if_flag is a part of the userspace/kernel API. The SEV-ES patches failed to set this flag because it's no longer needed by QEMU (according to the comment in the source code). However, other hypervisors may make use of this flag. Therefore, set the flag for guests with encrypted registers (i.e., with guest_state_protected set). Fixes: f1c6366 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Signed-off-by: Marc Orr <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]>

Attempting to compile on a non-x86 architecture fails with include/kvm_util.h: In function â€˜vm_compute_max_gfnâ€™: include/kvm_util.h:79:21: error: dereferencing pointer to incomplete type â€˜struct kvm_vmâ€™ return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1; ^~ This is because the declaration of struct kvm_vm is in lib/kvm_util_internal.h as an effort to make it private to the test lib code. We can still provide arch specific functions, though, by making the generic function symbols weak. Do that to fix the compile error. Fixes: c8cc43c ("selftests: KVM: avoid failures due to reserved HyperTransport region") Cc: [email protected] Signed-off-by: Andrew Jones <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

Revert a relatively recent change that set vmx->fail if the vCPU is in L2 and emulation_required is true, as that behavior is completely bogus. Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong: (a) it's impossible to have both a VM-Fail and VM-Exit (b) vmcs.EXIT_REASON is not modified on VM-Fail (c) emulation_required refers to guest state and guest state checks are always VM-Exits, not VM-Fails. For KVM specifically, emulation_required is handled before nested exits in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect, i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored. Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit() firing when tearing down the VM as KVM never expects vmx->fail to be set when L2 is active, KVM always reflects those errors into L1. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548 nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547 Modules linked in: CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547 Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80 Call Trace: vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline] nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330 vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799 kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989 kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline] kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline] kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220 kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489 __fput+0x3fc/0x870 fs/file_table.c:280 task_work_run+0x146/0x1c0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0x705/0x24f0 kernel/exit.c:832 do_group_exit+0x168/0x2d0 kernel/exit.c:929 get_signal+0x1740/0x2120 kernel/signal.c:2852 arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300 do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c8607e4 ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry") Reported-by: [email protected] Reviewed-by: Maxim Levitsky <[email protected]> Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

Synthesize a triple fault if L2 guest state is invalid at the time of VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs guest state via ioctls(), e.g. KVM_SET_SREGS. KVM should never emulate invalid guest state, since from L1's perspective, it's architecturally impossible for L2 to have invalid state while L2 is running in hardware. E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit or #GP. Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that can trigger this scenario, as nested VM-Enter correctly rejects any attempt to enter L2 with invalid state. RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits via RSM results in shutdown, i.e. there is precedent for KVM's behavior. Following AMD's SMRAM layout is important as AMD's layout saves/restores the descriptor cache information, including CS.RPL and SS.RPL, and also defines all the fields relevant to invalid guest state as read-only, i.e. so long as the vCPU had valid state before the SMI, which is guaranteed for L2, RSM will generate valid state unless SMRAM was modified. Intel's layout saves/restores only the selector, which means that scenarios where the selector and cached RPL don't match, e.g. conforming code segments, would yield invalid guest state. Intel CPUs fudge around this issued by stuffing SS.RPL and CS.RPL on RSM. Per Intel's SDM on the "Default Treatment of RSM", paraphrasing for brevity: IF internal storage indicates that the [CPU was post-VMXON] THEN enter VMX operation (root or non-root); restore VMX-critical state as defined in Section 34.14.1; set to their fixed values any bits in CR0 and CR4 whose values must be fixed in VMX operation [unless coming from an unrestricted guest]; IF RFLAGS.VM = 0 AND (in VMX root operation OR the “unrestricted guest” VM-execution control is 0) THEN CS.RPL := SS.DPL; SS.RPL := SS.DPL; FI; restore current VMCS pointer; FI; Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM will sythesize TRIPLE_FAULT in this scenario. KVM's behavior is allowed as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the only way for CR0 and/or CR4 to have illegal values is if they were modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map" section states "modifying these registers will result in unpredictable behavior". KVM's ioctl() behavior is less straightforward. Because KVM allows ioctls() to be executed in any order, rejecting an ioctl() if it would result in invalid L2 guest state is not an option as KVM cannot know if a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE. Ideally, KVM would reject KVM_RUN if L2 contained invalid guest state, but that carries the risk of a false positive, e.g. if RSM loaded invalid guest state and KVM exited to userspace. Setting a flag/request to detect such a scenario is undesirable because (a) it's extremely unlikely to add value to KVM as a whole, and (b) KVM would need to consider ioctl() interactions with such a flag, e.g. if userspace migrated the vCPU while the flag were set. Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

Update the documentation for kvm-intel's emulate_invalid_guest_state to rectify the description of KVM's default behavior, and to document that the behavior and thus parameter only applies to L1. Fixes: a27685c ("KVM: VMX: Emulate invalid guest state by default") Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

…tate Add a selftest to attempt to enter L2 with invalid guests state by exiting to userspace via I/O from L2, and then using KVM_SET_SREGS to set invalid guest state (marking TR unusable is arbitrary chosen for its relative simplicity). This is a regression test for a bug introduced by commit c8607e4 ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry"), which incorrectly set vmx->fail=true when L2 had invalid guest state and ultimately triggered a WARN due to nested_vmx_vmexit() seeing vmx->fail==true while attempting to synthesize a nested VM-Exit. The is also a functional test to verify that KVM sythesizes TRIPLE_FAULT for L2, which is somewhat arbitrary behavior, instead of emulating L2. KVM should never emulate L2 due to invalid guest state, as it's architecturally impossible for L1 to run an L2 guest with invalid state as nested VM-Enter should always fail, i.e. L1 needs to do the emulation. Stuffing state via KVM ioctl() is a non-architctural, out-of-band case, hence the TRIPLE_FAULT being rather arbitrary. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

…en/tip Merge xen fixes from Juergen Gross: "Fixes for two issues related to Xen and malicious guests: - Guest can force the netback driver to hog large amounts of memory - Denial of Service in other guests due to event storms" * 'xsa' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/netback: don't queue unlimited number of packages xen/netback: fix rx queue stall detection xen/console: harden hvc_xen against event channel storms xen/netfront: harden netfront against event channel storms xen/blkfront: harden blkfront against event channel storms

…inux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "Binding fix for v5.16 This fixes problems validating DT bindings using op_mode which wasn't described as it should have been when converting to DT schema" * tag 'regulator-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: dt-bindings: samsung,s5m8767: add missing op_mode to bucks

…ernel/git/broonie/spi Pull spi fix from Mark Brown: "One small fix for a long standing issue with error handling on probe in the Armada driver" * tag 'spi-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: change clk_disable_unprepare to clk_unprepare

…t/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "Last fixes before holidays. Nothing very exciting: - Work around a HW bug in HNS HIP08 - Recent memory leak regression in qib - Incorrect use of kfree() for vmalloc memory in hns" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Replace kfree() with kvfree() IB/qib: Fix memory leak in qib_user_sdma_queue_pkts() RDMA/hns: Fix RNR retransmission issue for HIP08

…/git/hid/hid Pull HID fixes from Jiri Kosina: - NULL pointer dereference fix in Vivaldi driver (Jiasheng Jiang) - regression fix for device probing in Holtek driver (Benjamin Tissoires) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: potential dereference of null pointer HID: holtek: fix mouse probing

Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU. The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g. vfio_msihandler() | |-> eventfd_signal() | |-> ... | |-> irqfd_wakeup() | |->kvm_arch_set_irq_inatomic() | |-> kvm_irq_delivery_to_apic_fast() | |-> kvm_apic_set_irq() This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups. Fixes: 379a3c8 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: [email protected] Reported-by: Longpeng (Mike) <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

…git/cel/linux Pull nfsd fix from Chuck Lever: "Address a buffer overrun reported by Anatoly Trosinenko" * tag 'nfsd-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix READDIR buffer overflow

Pull kvm fixes from Paolo Bonzini: - Fix for compilation of selftests on non-x86 architectures - Fix for kvm_run->if_flag on SEV-ES - Fix for page table use-after-free if yielding during exit_mm() - Improve behavior when userspace starts a nested guest with invalid state - Fix missed wakeup with assigned devices but no VT-d posted interrupts - Do not tell userspace to save/restore an unsupported PMU MSR * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required KVM: VMX: Always clear vmx->fail on emulation_required selftests: KVM: Fix non-x86 compiling KVM: x86: Always set kvm_run->if_flag KVM: x86/mmu: Don't advance iterator after restart due to yielding KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all

…git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a recent regression causing the loop in dpm_prepare() to become infinite if one of the device ->prepare() callbacks returns an error" * tag 'pm-5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: sleep: Fix error handling in dpm_prepare()

…linux-master

rtg-canonical and others added 30 commits November 22, 2021 11:22

torvalds and others added 28 commits December 19, 2021 11:40

Linux 5.16-rc6

a7904a5

Merge branch 'master' of https://github.com/archlinux/linux into arch…

218a9f2

…linux-master

Vylpes merged commit 3a34597 into master Dec 28, 2021

Vylpes deleted the archlinux-master branch December 28, 2021 14:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changes 27/12/21 #3

Changes 27/12/21 #3

Vylpes commented Dec 28, 2021

Changes 27/12/21 #3

Changes 27/12/21 #3

Conversation

Vylpes commented Dec 28, 2021